Stable Diffusion Fix Hands Without ControlNet and Inpainting (Easy) | SDXL FREE! (Automatic1111)

Xclbr Xtra
23 Apr 2024 · 06:50

TLDR: In this video, the host demonstrates a straightforward method for generating realistic hands with Stable Diffusion, without resorting to ControlNet or inpainting. The process uses two models: RealVis XL for the initial generation and DreamShaper Turbo for refinement. The host emphasizes that while the method will not produce highly complex hand poses, it is effective at creating natural-looking hands free of extra fingers or deformities. The video walks through the exact settings, including 50 sampling steps, the DPM++ 3M SDE Exponential sampler, and a CFG scale of 6 to 7. It also covers why a full-body prompt matters for keeping the hands in frame, and how negative prompting avoids unwanted elements. The final result is a more realistic image with properly formed hands, suitable for most use cases that do not require professional-level perfection.

Takeaways

  • 🎨 Use the RealVis XL model to generate decent hands in simple poses.
  • 🚫 Avoid ControlNet and inpainting techniques; the process stays simple.
  • 🌟 Start with a full-body prompt to ensure hands are visible and not just a close-up.
  • 🔍 Apply Midjourney Mimic at 0.5 for an aesthetic feel without overdoing it.
  • 🚫 Skip detailers initially, since upscaling is done later with a different model.
  • 🔢 Set 50 sampling steps with the DPM++ 3M SDE Exponential sampler for the first model.
  • 🔄 Enable Forge's integrated self-attention guidance for better results.
  • 📈 Use a batch count of two and a CFG scale of around 6 to 7 for the initial generation.
  • 🔄 For the second pass, switch to the DreamShaper Turbo model with a batch count of four.
  • 📊 Drop the CFG scale to 1 for the turbo model; that is sufficient for realistic results.
  • 🖌️ Consider inpainting for further improvement if needed, though this process covers most use cases (see the API sketch after this list).
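
For readers who drive Automatic1111 or Forge programmatically, the two-pass recipe above maps directly onto the web UI's HTTP API. The following is a minimal sketch, assuming the UI was launched with the --api flag on the default port; the checkpoint filenames, the LoRA tag for Midjourney Mimic, the resolution, and the 0.35 denoising strength are illustrative assumptions rather than values confirmed in the video.

```python
import base64
import requests

API = "http://127.0.0.1:7860"  # assumes A1111/Forge started with --api

PROMPT = ("full body shot of a fairy tale woman with long blue hair "
          "<lora:midjourney_mimic:0.5>")  # hypothetical LoRA filename
NEGATIVE = "nsfw, blurry, bad hands, extra fingers, deformed"

# Pass 1: txt2img on the RealVis XL checkpoint, 50 steps, CFG ~6-7.
r1 = requests.post(f"{API}/sdapi/v1/txt2img", json={
    "prompt": PROMPT,
    "negative_prompt": NEGATIVE,
    "steps": 50,
    "sampler_name": "DPM++ 3M SDE",   # older builds: "DPM++ 3M SDE Exponential"
    "scheduler": "Exponential",       # separate field on A1111 1.9+ / Forge
    "cfg_scale": 6.5,
    "n_iter": 2,                      # batch count of two
    "width": 832,
    "height": 1216,                   # assumed portrait SDXL resolution
    "override_settings": {"sd_model_checkpoint": "realvisxl.safetensors"},
})
base_image = r1.json()["images"][0]   # base64-encoded PNG

# Pass 2: img2img on the turbo checkpoint at CFG 1 to refine skin and hands.
r2 = requests.post(f"{API}/sdapi/v1/img2img", json={
    "init_images": [base_image],
    "prompt": PROMPT,
    "negative_prompt": NEGATIVE,
    "steps": 8,
    "cfg_scale": 1,                   # turbo models expect very low CFG
    "denoising_strength": 0.35,       # assumed value; tune to taste
    "n_iter": 4,                      # batch count of four
    "override_settings": {"sd_model_checkpoint": "dreamshaperxl_turbo.safetensors"},
})

with open("refined.png", "wb") as f:
    f.write(base64.b64decode(r2.json()["images"][0]))
```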

Q & A

  • What is the main topic of the video?

    -The main topic of the video is showing how to create proper hands in images using the Stable Diffusion model without using ControlNet or inpainting.

  • Which model is suggested for generating decent hands?

    -The RealVis XL model is suggested for generating decent hands.

  • What is the issue with the RealVis XL model according to the speaker?

    -The issue is that its results, while decent, do not feel as realistic as the speaker wants them to be.

  • What are the two models used in the process mentioned in the video?

    -The two models used in the process are RealVis XL and DreamShaper Turbo.

  • What is the purpose of using a detailer in the process?

    -The purpose of using a detailer is to enhance the details of the image after it has been upscaled.

  • What is the significance of the 'Midjourney Mimic' setting?

    -The 'Midjourney Mimic' setting gives the image an aesthetic feel and is set at 0.5 so that the effect is not too strong.

  • How many sampling steps are used for the RealVis XL model?

    -50 sampling steps are used, with the DPM++ 3M SDE Exponential sampler.

  • What is the batch count used in the process?

    -A batch count of two is used for the initial generation, and four for the turbo pass.

  • What feature should be enabled in the Forge web UI?

    -The integrated self-attention guidance feature should be enabled in the Forge web UI.

  • What is the purpose of the 'clip skip' setting?

    -The 'clip skip' setting slightly improves the details in the image, although it does not affect the hands.

  • How does the turbo model contribute to the final image?

    -The turbo model denoises the image to the strength provided, which makes the skin look more realistic.
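
As a concrete illustration of that denoising step, here is a minimal sketch of the same idea in the diffusers library; the Hugging Face model id and the strength of 0.35 are assumptions for illustration, not values taken from the video.

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

# Hypothetical model id: any SDXL turbo-style checkpoint in diffusers format.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "Lykon/dreamshaper-xl-turbo", torch_dtype=torch.float16
).to("cuda")

init = load_image("base_render.png")  # output of the first-pass model

# strength is the denoising strength: 0.0 returns the input unchanged,
# 1.0 redraws it from pure noise. diffusers only runs the last
# int(num_inference_steps * strength) steps, so 8 steps at 0.35 ~= 3 steps.
out = pipe(
    prompt="full body shot of a fairy tale woman with long blue hair",
    image=init,
    strength=0.35,        # light pass: keeps composition, retextures skin
    guidance_scale=1.0,   # turbo models expect CFG ~1
    num_inference_steps=8,
).images[0]
out.save("refined.png")
```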

  • What is the speaker's final recommendation for users who are not using the process professionally?

    -For non-professional use, the speaker recommends the described process, as it should cover almost all use cases without the need for further inpainting.

Outlines

00:00

🎨 Creating Realistic Hands in Art with SDXL Model

The first paragraph introduces the video's purpose: demonstrating how to create realistic hands with the RealVis XL model. The creator acknowledges that while the model can produce decent hands, they may not feel as realistic as desired. The process is kept simple and avoids techniques like ControlNet or inpainting; the goal is hands that look normal, with no disfigurements. Two models are used to achieve the result, with an example prompt of a fairy-tale woman with long blue hair. A full-body shot is emphasized so the hands remain visible. The settings are then covered: Midjourney Mimic at 0.5, negative prompting to avoid NSFW and blurry images, 50 sampling steps with the DPM++ 3M SDE Exponential sampler, and a CFG scale of 6 to 7. The section closes by noting that the Forge web UI's integrated features simplify the process.

05:02

🚀 Enhancing Image Realism with Turbo Models

The second paragraph details how the realism of the generated image is enhanced with a turbo model. The creator upscales the initial image and then applies ADetailer for further refinement. The model is switched to DreamShaper Turbo while the aesthetic settings are kept consistent. The turbo settings are adjusted to eight steps, a Karras sampler schedule, and an increased batch count for better image quality. Finding the right balance in CFG scale is emphasized, with a suggestion to set it to 1 for this scenario. The FreeU Integrated and self-attention guidance features are also used alongside ADetailer. The creator finds that turbo models can give skin texture a more realistic feel, and concludes that for most use cases the described process should suffice, with additional inpainting available for further improvement if necessary.


Keywords

💡Stable Diffusion

Stable Diffusion is a machine learning model that generates images from textual descriptions. In the video it is the core technology the presenter uses to create images with proper hands, without additional control mechanisms like ControlNet or inpainting.

💡Hands

In the video, 'hands' are a critical focus as the presenter aims to generate images where the hands are depicted correctly and not disfigured. The challenge is to create realistic and well-formed hands in the generated images, which is a common issue in AI-generated imagery.

💡ControlNet

ControlNet is a term that refers to a neural network architecture used to control and guide the output of generative models. The video mentions that the process for generating better hands does not require the use of ControlNet, indicating a simpler approach to achieving the desired results.

💡Inpainting

Inpainting is an image-processing technique for filling in missing or damaged parts of an image. The video specifies that the demonstrated method does not involve inpainting, offering an alternative route to properly formed hands.
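
For reference, the technique being skipped is itself only a few lines; a minimal diffusers inpainting sketch, where the model id and the image/mask filenames are placeholders:

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16
).to("cuda")

image = load_image("render.png")      # the image to repair
mask = load_image("hand_mask.png")    # white where the hands should be redrawn

fixed = pipe(
    prompt="detailed realistic hands",
    image=image,
    mask_image=mask,
    strength=0.8,   # how aggressively to repaint the masked region
).images[0]
fixed.save("fixed.png")
```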

💡RealVis XL Model

The RealVis XL model is a checkpoint mentioned in the video that can generate images with decent hands. However, the presenter notes that the realism of the hands could be improved, which is the subject of the tutorial.

💡Midjourney Mimic

This refers to a setting in the image-generation process that gives the image an aesthetic feel. The presenter uses it at 0.5 to keep the effect from becoming too strong, making it a tool for controlling the style of the generated image.

💡Negative Prompting

Negative prompting is a technique used when working with generative models where the user specifies what they do not want to appear in the generated image. In the video, the presenter uses negative prompting to avoid NSFW content, blurriness, and bad hands in the generated images.
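
In code, a negative prompt is just a second text input passed next to the positive one; a hypothetical call, assuming pipe is an already-loaded SDXL pipeline:

```python
# Each denoising step is steered away from the concepts in negative_prompt.
image = pipe(
    prompt="full body shot of a fairy tale woman with long blue hair",
    negative_prompt="nsfw, blurry, bad hands, extra fingers, deformed",
    guidance_scale=6.5,  # negative prompts only take effect when CFG > 1
).images[0]
```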

💡Sampling Steps

Sampling steps refer to the number of denoising iterations the generative model performs to create an image. The video uses 50 sampling steps with the DPM++ 3M SDE Exponential sampler to achieve better hand depiction.

💡CFG Scale

CFG stands for 'classifier-free guidance', and the scale is a parameter that controls how strongly the prompt steers the image-generation process. The presenter adjusts the CFG scale to find a balance that works well: around 6 to 7 for the base model and 1 for the turbo pass.
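
The mechanism behind the scale is compact enough to write down: each denoising step predicts the noise twice, once without the prompt and once with it, and the CFG scale extrapolates between the two predictions. A schematic sketch (not any particular library's internals):

```python
def cfg_mix(eps_uncond, eps_cond, cfg_scale):
    """Classifier-free guidance: extrapolate from the unconditional noise
    prediction toward the prompt-conditioned one.

    cfg_scale = 1 returns the conditioned prediction unchanged, which is
    why the turbo pass here uses 1, while 6-7 (the first pass) pushes the
    result harder toward the prompt.
    """
    return eps_uncond + cfg_scale * (eps_cond - eps_uncond)
```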

💡Image to Image

This term refers to a process where an existing image is used as a starting point to generate a new image with specific modifications or enhancements. In the video, the presenter uses the 'Image to Image' process to refine the hands in the generated image.

💡Denoising

Denoising is the process of removing noise or unwanted elements from an image to improve its quality. The video discusses using a 'turbo' model to denoise the image and make the skin appear more realistic, which is a crucial step in enhancing the final output.

Highlights

The video demonstrates an easy way to generate proper hands without using ControlNet or inpainting.

The process is suitable for creating simple hand poses without extra fingers or disfigurements.

The RealVis XL model is used for generating decent hands, but improvements are needed for realism.

Two models are utilized in the process to enhance the hand generation quality.

The video emphasizes the importance of a full-body prompt to ensure hands are visible in the generated image.

Midjourney Mimic is applied at a 0.5 setting for an aesthetic feel without being too strong.

Negative prompting is employed to avoid NSFW content, blurriness, and poorly generated hands.

The first pass uses the regular RealVis XL model with 50 sampling steps and the DPM++ 3M SDE Exponential sampler.

The CFG scale is set between 6 and 7 for the initial generation, which is later upscaled.

Self-attention guidance integration is enabled for better image quality.

Clip skip is set to two for better detail in the initial image generation.

The generated image is then sent to the image-to-image tab for further enhancement.

DreamShaper Turbo is used for the image-to-image pass with settings tuned for realism.

The turbo model is found to make the skin look more realistic compared to the initial model.

The final images have mostly okay hands, with some variations in quality.

Inpainting can be used for further improvement, but the method covers most use cases for non-professional work.

The video concludes with a summary of the process and an invitation for viewers to try it for themselves.