How to use Stable Diffusion. Automatic1111 Tutorial

Sebastian Kamph
1 Jun 2023 · 27:09

TLDR: This video tutorial offers a comprehensive guide to creating generative AI art with Stable Diffusion. It covers the installation process, selecting and using different models, and the essential features of the interface. The video delves into text-to-image generation, exploring prompts, styles, and advanced settings such as sampling methods and the CFG scale. It also introduces image-to-image transformation, upscaling, and the use of ControlNet for consistency, providing tips for achieving high-quality results and recommending workflows for different scenarios.


  • 📌 Stable Diffusion is a tool for creating generative AI art, with various models and settings to customize the output.
  • 🔧 Installation of Stable Diffusion and its extensions was covered in a previous video and is essential before using the tool.
  • 🎨 The Stable Diffusion user interface offers different models to choose from, selectable through a dropdown menu.
  • 🖼️ The 'Text to Image' tab is the primary function for generating images, where users input positive and negative prompts.
  • 🛠️ Sampling methods and steps are crucial to the image generation process, with different samplers trading off detail and speed.
  • 🎨 Styles can be applied to the generated images to modify their appearance, with options like 'Digital Oil Painting' available.
  • 🔄 The 'Image to Image' tab allows users to upscale or modify existing images while retaining certain characteristics.
  • 🔄 The 'Upscale' function in the 'Extras' tab can enlarge images, but for significant detail enhancement, methods like 'Hires fix' or 'Image to Image' are recommended.
  • 🎭 The 'Inpaint' feature enables users to manually edit parts of an image, adding details or changing elements, such as turning a shape into a heart.
  • 📈 The 'CFG Scale' slider adjusts how closely the generated image adheres to the prompt; higher values increase adherence but can sacrifice creativity.
  • 🔄 The 'Hires fix' option is a quick way to upscale images while adding detail, bypassing the need for manual adjustments.

Q & A

  • What is the main focus of the video?

    -The main focus of the video is to teach viewers how to use Stable Diffusion for creating generative AI art.

  • What is the first step in using Stable Diffusion?

    -The first step is to install Stable Diffusion; the previous video covers how to install the extensions and the first model.

  • What is the significance of the Stable Diffusion checkpoint?

    -The Stable Diffusion checkpoint is the model used for generating images, and users can select different models by using the dropdown menu in the interface.

  • How can users find more styles to use with Stable Diffusion?

    -Users can find additional styles in the video description, which are created by the video creator and their community.

  • What are sampling methods and sampling steps in Stable Diffusion?

    -Sampling methods are the algorithms that turn noise into an image, guided by the prompt and model, over a set number of steps; sampling steps are the individual stages the image passes through on the way to completion.

  • Why is the DPM++ 2M Karras sampler recommended for beginners?

    -The DPM++ 2M Karras sampler is recommended for beginners because it produces good images quickly, typically within 15 to 25 steps, and is a converging sampler, meaning it consistently works toward the same image.

  • What is the purpose of the CFG scale in Stable Diffusion?

    -The CFG scale determines how much the Stable Diffusion model will listen to the prompt. Higher settings force the prompt more, potentially resulting in less creative but more consistent images, while lower settings allow for more creative freedom.

  • How can users upscale their images while maintaining quality?

    -Users can use the Hires fix feature, which first generates a low-resolution image and then upscales it, adding more detail in the process. Alternatively, they can use the Image to Image feature at a higher resolution setting and adjust the denoising strength.

  • What is the role of ControlNet in Stable Diffusion?

    -ControlNet allows users to recreate an image by reading from a provided image and generating new images with similar composition or features, based on the chosen model and preprocessor.

  • How can users make changes to specific parts of an image?

    -Users can use the Inpaint feature to select and modify parts of an image, either masking the original content or introducing new content through the latent noise option.

  • What is the benefit of using the PNG info tab in Stable Diffusion?

    -The PNG info tab allows users to view and reuse all the settings from a previously generated image, making it easier to recreate or modify that image with the exact same parameters.
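The settings the PNG info tab reads are stored by Automatic1111 as a text block inside the image file. As a rough illustration of what that parameters text looks like and how it could be parsed (the field names and layout here are assumptions based on typical output; the exact set varies by version):

```python
def parse_infotext(text: str) -> dict:
    """Parse an Automatic1111-style parameters string into the prompt,
    negative prompt, and key/value settings (illustrative sketch)."""
    lines = text.strip().split("\n")
    settings_line = lines[-1]      # e.g. "Steps: 20, Sampler: DPM++ 2M Karras, ..."
    negative = ""
    prompt_lines = []
    for line in lines[:-1]:
        if line.startswith("Negative prompt: "):
            negative = line[len("Negative prompt: "):]
        else:
            prompt_lines.append(line)
    settings = {}
    for part in settings_line.split(", "):
        if ": " in part:
            key, _, value = part.partition(": ")
            settings[key] = value
    return {"prompt": "\n".join(prompt_lines),
            "negative_prompt": negative,
            "settings": settings}

example = (
    "a cat in a spacesuit, digital oil painting\n"
    "Negative prompt: blurry, low quality\n"
    "Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1234, Size: 512x512"
)
info = parse_infotext(example)
```

Dropping such a file into the PNG info tab surfaces these same fields with a button to send them back to Text to Image, which is what makes exact reproduction of an earlier image possible.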



🎨 Introduction to Stable Diffusion and Generative AI Art

The video begins with an introduction to Stable Diffusion, a tool for creating generative AI art. The speaker guides viewers on how to install the necessary extensions and models, referencing a previous video for detailed installation steps. The focus is on using Stable Diffusion to generate unique AI art, with an emphasis on the importance of a good checkpoint and model selection for achieving quality results.


πŸ› οΈ Understanding Stable Diffusion Interface and Settings

This paragraph delves into the Stable Diffusion user interface, explaining the various models and settings available to users. The speaker clarifies the difference between model numbers and Stable Diffusion versions, and provides guidance on how to select models and adjust settings such as the VAE, LoRA, and Hypernetwork options. The importance of understanding these settings for achieving desired outcomes in generative AI art is emphasized.


πŸ–ŒοΈ Text-to-Image Generation with Stable Diffusion

The speaker introduces the text-to-image feature in Stable Diffusion, which allows users to generate images based on textual prompts. The process involves using positive and negative prompts to guide the image generation. The speaker demonstrates the basic functionality and then explores the use of styles and the impact of the checkpoint model on the quality of the generated images. The importance of using appropriate prompts and understanding the role of the CFG scale in influencing the final image is discussed.
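When the web UI is launched with the --api flag, these same text-to-image inputs can also be submitted programmatically to its /sdapi/v1/txt2img endpoint. A minimal sketch of assembling such a request body (default values here are illustrative, not the video's recommendations):

```python
import json

def build_txt2img_payload(prompt, negative_prompt="", steps=20,
                          sampler_name="DPM++ 2M Karras",
                          cfg_scale=7.0, width=512, height=512, seed=-1):
    """Assemble a JSON body for Automatic1111's /sdapi/v1/txt2img
    endpoint (available when the web UI is started with --api).
    Field names follow the API schema; defaults are illustrative."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "sampler_name": sampler_name,
        "cfg_scale": cfg_scale,
        "width": width,
        "height": height,
        "seed": seed,  # -1 asks the server to pick a random seed
    }

payload = build_txt2img_payload(
    "portrait of a knight, digital oil painting",
    negative_prompt="blurry, deformed",
)
# ready to POST to http://127.0.0.1:7860/sdapi/v1/txt2img
body = json.dumps(payload)
```

The positive/negative prompt split in the UI maps directly onto the `prompt` and `negative_prompt` fields here.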


πŸ” Samplers and Image Resolution in Stable Diffusion

This section focuses on the role of samplers in the image generation process within Stable Diffusion. The speaker explains how samplers transform noise into images and the significance of step counts in this transformation. The differences between convergent and non-convergent samplers are highlighted, along with recommendations for which samplers to use for quick and consistent results. The paragraph also touches on the impact of the CFG scale on image quality and the importance of setting it appropriately for different models.
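The idea of a converging sampler can be illustrated with a toy numeric analogy: if each step removes a fixed fraction of the remaining noise, runs with more steps (or slightly different starting noise) approach the same final result rather than drifting to a different one. This is not real diffusion math, just a sketch of the convergence behavior described above:

```python
def toy_denoise(start: float, target: float, steps: int) -> float:
    """Toy analogy for a converging sampler: each step removes a fixed
    fraction of the remaining 'noise' (distance to the target).
    More steps -> closer to the same final value."""
    x = start
    for _ in range(steps):
        x += 0.3 * (target - x)   # remove 30% of the remaining error
    return x

few  = toy_denoise(1.0, 0.0, 5)    # few steps: still noticeably off
many = toy_denoise(1.0, 0.0, 25)   # many steps: essentially converged
```

A non-convergent (ancestral) sampler, by contrast, keeps injecting fresh randomness each step, so adding steps can keep changing the image.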


🚀 Advanced Workflows for High-Quality Image Generation

The speaker presents advanced techniques for improving the quality and resolution of images generated with Stable Diffusion. The concept of 'highres fix' is introduced as a method for upscaling images while maintaining detail. Two recommended workflows for achieving higher quality images are discussed: using highres fix for a single image or using image-to-image generation for more control over the final composition. The speaker also provides tips on using different settings for batch count and batch size, depending on the user's hardware capabilities and desired output.


🎯 Fine-Tuning and Upscaling Generated Images

The final paragraph covers the fine-tuning of generated images using the 'inpainting' feature and the upscaling of images using various methods. The speaker demonstrates how to make specific changes to an image by using the paint mask and latent noise tools. The process of upscaling images while retaining quality is discussed, with recommendations on the best upscalers to use for different types of images. The paragraph concludes with a summary of the video's content and encourages viewers to explore further resources for learning more about generative AI art.



💡Stable Diffusion

Stable Diffusion is a type of generative AI model that specializes in creating images from textual descriptions. It is considered the 'king of generative AI art' and is the primary focus of the video. The script provides a guide on how to install and use Stable Diffusion, including the necessary steps to generate AI art.


💡Checkpoint

In the context of the video, a checkpoint refers to a specific model or version of the Stable Diffusion AI that users can select for image generation. The script mentions versions like 1.5, 2.0, and 2.1, which users can choose from based on their preferences and requirements.


💡Prompt

A prompt in the context of the video is a textual description or input provided by the user that guides the Stable Diffusion AI in generating an image. It is a crucial element in the image creation process and can be both positive (what the user wants in the image) and negative (what the user wants to avoid in the image).

💡Sampling Method

The sampling method is a technique used by the Stable Diffusion AI to interpret the prompt and model and convert it into an image. It involves a series of steps or iterations that progressively refine the image from a noise state to a more detailed and prompt-aligned output.

💡CFG Scale

CFG Scale, or classifier-free guidance scale, is a parameter in Stable Diffusion that determines how closely the generated image adheres to the prompt. A higher CFG scale makes the model follow the prompt more strictly, potentially at the cost of creativity, while a lower scale allows for more creative freedom, sometimes at the expense of prompt accuracy.
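At its core, classifier-free guidance blends the model's unconditional prediction with its prompt-conditioned prediction, and the CFG scale is the blend factor. A scalar sketch of the formula (real implementations apply it per element of the predicted noise tensor):

```python
def apply_cfg(uncond: float, cond: float, scale: float) -> float:
    """Classifier-free guidance on a single value: push the prediction
    away from the unconditional output and toward the prompt-conditioned
    one. Scalars are used here only to show the formula."""
    return uncond + scale * (cond - uncond)
```

At scale 1 the result is exactly the conditioned prediction; larger scales exaggerate the prompt's influence, which is why very high CFG values can over-saturate or distort the image.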


💡Upscaling

Upscaling refers to the process of increasing the resolution of an image, typically to enhance its detail and quality. In the context of the video, upscaling is achieved through features like 'Hires fix' or 'Image to Image', which allow users to generate higher-resolution images from lower-resolution inputs.
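A two-pass upscale such as Hires fix generates at a base resolution and then scales both sides by a factor. A small helper sketching the target-resolution arithmetic, assuming each side is rounded down to a multiple of 8 (a common constraint for latent-space models):

```python
def hires_target(width: int, height: int, scale: float) -> tuple:
    """Compute the upscaled resolution for a two-pass 'Hires fix' style
    workflow, rounding each side down to a multiple of 8
    (illustrative helper, not the web UI's exact code)."""
    def round8(v: float) -> int:
        return int(v) // 8 * 8
    return round8(width * scale), round8(height * scale)
```

So a 512x512 base image with a 2x factor targets 1024x1024 for the second, detail-adding pass.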

💡ControlNet

ControlNet is a tool used in conjunction with Stable Diffusion to maintain consistency in the generated images. It works by using a reference image to guide the AI in creating new images that are similar in composition or features to the original.

💡Denoising Strength

Denoising strength is a parameter in the image-to-image feature of Stable Diffusion that controls the degree of change introduced to the input image when generating a new one. A higher value results in more significant changes, while a lower value retains more of the original image's characteristics.
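A rough way to think about denoising strength in image-to-image is that it sets how far back into the noise the input image is pushed, so only a fraction of the sampling steps actually run. A simplified model of that relationship (the exact behavior depends on the sampler and implementation):

```python
def img2img_effective_steps(steps: int, denoising_strength: float) -> int:
    """Simplified model of denoising strength in image-to-image:
    the input is noised part of the way, and roughly steps * strength
    denoising steps run. Low strength keeps most of the original;
    strength 1.0 repaints almost everything."""
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    return round(steps * denoising_strength)
```

This is why a strength around 0.3 to 0.5 tends to preserve composition while refreshing detail, whereas values near 1.0 behave almost like a fresh text-to-image generation.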


💡Inpainting

Inpainting is a feature in Stable Diffusion that allows users to manually edit or refine parts of a generated image. This can involve adding or changing elements within the image to achieve a desired outcome, such as transforming a specific part of the image into a heart.


💡Extras Tab

The Extras tab in Stable Diffusion provides additional tools for manipulating and enhancing images, such as upscaling and improving image quality. It serves as a collection of utilities that can be applied to a previously generated image.


Introduction to stable diffusion for creating generative AI art

Explanation of the stable diffusion interface and model selection

Use of positive and negative prompts for image generation

Importance of choosing the right checkpoint for quality images

Application of styles to enhance image generation

Understanding the role of sampling methods and steps in image creation

Recommendation of DPM++ 2M Karras as a reliable sampler

Explanation of the CFG scale and its impact on image adherence to prompts

Adjusting image dimensions and aspect ratios for desired outputs

Use of batch count and batch size for efficient image generation

Introduction to the Hires fix feature for upscaling images

Demonstration of image to image workflow for resolution enhancement

Utilizing denoising strength for controlled image alterations

In-depth guide on inpainting for detailed image modifications

Explanation of extras tab for additional image processing

Overview of PNG info tab for revisiting and reusing previous settings