Stable Diffusion Beginner Tutorial: Get Started with AI Painting in 1 Hour

10 Sept 2023, 62:11

TLDR: This tutorial introduces AI art through Stable Diffusion. The creator compares it with MidJourney, noting that Stable Diffusion is open-source and free but harder to learn. The video walks through installing the launcher on Windows, covering hardware requirements, the benefit of an NVIDIA GPU, and the need for ample disk space. It then explains models: where to download them, how to install them, and how to switch between them. Next come prompts, including their syntax and plugins such as 'prompt all-in-one', followed by the role of Lora and VAE in refining art style and color palette. The video closes with tips on high-resolution image generation and on using seed values for consistent results.


  • 🎨 Stable Diffusion and MidJourney are two prominent AI painting tools with different characteristics.
  • 🚀 MidJourney is beginner-friendly and produces high-quality images but requires payment.
  • 🤖 Stable Diffusion is open-source and free but has a higher learning curve.
  • 💻 For Stable Diffusion, a powerful GPU (NVIDIA recommended) with at least 8GB VRAM is essential, while CPU requirements are minimal.
  • 🧠 A minimum of 16GB RAM and 100GB disk space are advised for optimal performance.
  • 🔍 Downloading and installing the Stable Diffusion launcher involves finding the right resources and following a series of steps.
  • 📚 Understanding the role of models, such as Checkpoints, and how to download, install, and switch between them is crucial for achieving desired styles in AI-generated images.
  • 🌐 Prompts (or 'prompt words') are at the core of Stable Diffusion, guiding the AI in generating images based on specific descriptions.
  • 🔧 The use of positive and negative prompts allows for fine-tuning the generated images to include or exclude certain elements.
  • 🔗 The script introduces the concept of Lora, a method to refine the style of generated images beyond what text prompts can achieve.
  • 🎭 VAE models serve as filters or color adjusters to enhance the visual appeal of the AI-generated images.

Q & A

  • What are the two AI painting tools mentioned in the script?

    -The two AI painting tools mentioned are MidJourney and Stable Diffusion.

  • What is the main difference between MidJourney and Stable Diffusion in terms of user accessibility?

    -MidJourney has a lower entry threshold and produces good images but requires payment, while Stable Diffusion is open-source and free but has a higher entry threshold.

  • What are the system requirements for running Stable Diffusion effectively?

    -For Stable Diffusion, it is recommended to have a GPU from NVIDIA (preferably a 3060 or higher with at least 8 GB of VRAM), 16 GB of RAM or more, and at least 100 GB of disk space.

  • What happens if you use an AMD GPU for Stable Diffusion?

    -If you use an AMD GPU, Stable Diffusion can still run, but the drawing process will be slower and it will use the CPU for computations, which can cause overheating issues.

  • How can you download the Stable Diffusion launcher package?

    -You can download the Stable Diffusion launcher package by searching for 'SD launcher' on Bilibili, finding a video by the author '秋叶', and extracting the download link from the video description or by messaging the author with specific keywords.

  • What is the role of the 'prompt' in Stable Diffusion?

    -The 'prompt' is a core element in Stable Diffusion that defines the subject and style of the generated image. It can include positive and negative keywords to guide the AI in creating the desired output.

  • What are 'Lora' and 'VAE' in the context of Stable Diffusion?

    -Lora is a model that captures a specific style, usually based on a set of images, and VAE is a filter or color adjustment tool that can enhance the color vibrancy of the generated images.

  • How can you install and use Lora in Stable Diffusion?

    -Lora can be downloaded from websites like Liblib or 'C站' (literally 'C site', commonly understood to mean Civitai), and then placed in the 'models/lora' folder within the Stable Diffusion directory. It can be used either by inserting its information directly into the prompt field or by enabling it as a plugin in the 'Additional networks' section.

  • What is the purpose of the 'CLIP termination layer' parameter in Stable Diffusion?

    -The 'CLIP termination layer' parameter affects the interpretation of the prompt by the AI. A value of 2 is generally recommended as it provides a good balance between interpretability and image quality.

  • How can you improve the resolution of the generated images without causing image degradation?

    -You can use the 'High-resolution repair' feature with a lower base resolution (like 540x960) and an upscaling factor of 2. This method helps avoid common issues like image degradation while enhancing the resolution.

  • What is the significance of the 'Seed' value in Stable Diffusion?

    -The 'Seed' value determines the randomness of the image generation. Locking a specific Seed value allows you to recreate the same image with different resolutions or other modifications without altering the core image content.



🎨 Introduction to AI Art and Stable Diffusion

The speaker discusses their recent focus on learning AI art, specifically highlighting two prominent tools: MidJourney and Stable Diffusion. They contrast MidJourney's low entry barrier and high-quality outputs with its cost, against Stable Diffusion's free, open-source nature and higher entry barrier. The speaker shares their experience learning Stable Diffusion and introduces a tutorial aimed at getting beginners started within an hour. They proceed to detail the hardware requirements for running Stable Diffusion, emphasizing the importance of a powerful GPU and sufficient disk space for storing models and the software.
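The hardware recommendations above can be turned into a quick self-check. The following is a minimal sketch, not part of the video: the `Machine` class and `check_machine` function are hypothetical names, and the thresholds are simply the figures quoted in this summary (8 GB VRAM, 16 GB RAM, 100 GB disk, NVIDIA GPU). Only the disk figure is read from the system here; the GPU and RAM values must be supplied by hand.

```python
import shutil
from dataclasses import dataclass

# Thresholds quoted in the summary; treat them as recommendations, not hard limits.
MIN_VRAM_GB = 8
MIN_RAM_GB = 16
MIN_DISK_GB = 100


@dataclass
class Machine:
    """Hypothetical description of a PC, filled in by hand."""
    gpu_vendor: str       # e.g. "NVIDIA" or "AMD"
    vram_gb: float
    ram_gb: float
    free_disk_gb: float


def check_machine(m: Machine) -> list[str]:
    """Return human-readable warnings; an empty list means every recommendation is met."""
    warnings = []
    if m.gpu_vendor.upper() != "NVIDIA":
        warnings.append("GPU is not NVIDIA: generation may be slow")
    if m.vram_gb < MIN_VRAM_GB:
        warnings.append(f"VRAM {m.vram_gb} GB is below {MIN_VRAM_GB} GB")
    if m.ram_gb < MIN_RAM_GB:
        warnings.append(f"RAM {m.ram_gb} GB is below {MIN_RAM_GB} GB")
    if m.free_disk_gb < MIN_DISK_GB:
        warnings.append(f"Free disk {m.free_disk_gb} GB is below {MIN_DISK_GB} GB")
    return warnings


def free_disk_gb(path: str = ".") -> float:
    """Free space on the drive that holds `path`, in GB (standard library only)."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    pc = Machine("NVIDIA", vram_gb=12, ram_gb=32, free_disk_gb=free_disk_gb("."))
    for line in check_machine(pc) or ["All recommendations met"]:
        print(line)
```

Keeping the thresholds as named constants makes it easy to adjust them if a particular model's author recommends something different.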


🔧 Installation and Initial Setup of Stable Diffusion

The speaker guides the audience through the process of downloading and installing the Stable Diffusion launcher, providing specific instructions for locating the installation package on Bilibili and extracting it. They explain the steps to install the launcher on a Windows computer, including running the necessary dependencies and launching the application. The speaker also demonstrates how to check if the installation was successful by looking for the launcher's icon and starting the application, which should automatically open a web UI for further operations.


🖌️ Exploring Stable Diffusion's Features and Model Management

The speaker delves into the features of Stable Diffusion's web UI, including model management and the importance of selecting the right model, referred to as the 'Checkpoint' or 'base model'. They discuss where to download models, how to install them, and how to switch between them using the UI. The speaker recommends specific models for different styles and explains the role of prompts in defining the output, emphasizing the need for precise and effective prompt construction for desired results.
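Installing a model amounts to placing the downloaded file in the right subfolder of the WebUI directory. As a rough sketch, the helper below maps a model type to its destination path. The folder names are an assumption based on the layout commonly used by the AUTOMATIC1111 WebUI (the summary itself only names 'models/lora'), and `target_path` and the example file names are hypothetical; check the folders in your own install before copying anything.

```python
from pathlib import Path

# Assumed folder layout; verify against your own installation.
MODEL_DIRS = {
    "checkpoint": Path("models") / "Stable-diffusion",
    "lora": Path("models") / "Lora",
    "vae": Path("models") / "VAE",
}


def target_path(webui_root: str, kind: str, filename: str) -> Path:
    """Return where a downloaded model file of the given kind should be placed."""
    try:
        subfolder = MODEL_DIRS[kind.lower()]
    except KeyError:
        raise ValueError(f"unknown model kind: {kind!r}") from None
    return Path(webui_root) / subfolder / filename


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(target_path("sd-webui", "checkpoint", "my_base_model.safetensors"))
    print(target_path("sd-webui", "lora", "my_style.safetensors"))
```

After the file is in place, the model list in the UI usually needs a refresh before the new entry appears.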


📝 Understanding Prompts and Their Syntax in Stable Diffusion

The speaker provides an in-depth explanation of prompts, their syntax, and their critical role in Stable Diffusion. They cover positive and negative prompts, the use of full sentences versus individual words, and the concept of 'prompt weight' through the use of parentheses and colons for scaling. The speaker introduces a plugin called 'prompt all-in-one' to assist with prompt construction, including automatic translation and the ability to save and load presets for efficiency.
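The parenthesis-and-colon weighting described above can be illustrated with a small string builder. This is only a sketch: it assumes the `(term:weight)` form used by the AUTOMATIC1111 WebUI, and the function names and example terms are hypothetical rather than taken from the video.

```python
def weighted(term: str, weight: float = 1.0) -> str:
    """Wrap a term as (term:weight); leave it bare when the weight is exactly 1."""
    if weight == 1.0:
        return term
    return f"({term}:{weight:g})"


def build_prompt(terms: dict[str, float]) -> str:
    """Join weighted terms into one comma-separated prompt string."""
    return ", ".join(weighted(term, w) for term, w in terms.items())


if __name__ == "__main__":
    # Positive prompt: what should appear. Negative prompt: what should not.
    positive = build_prompt({"portrait": 1.0, "red dress": 1.3, "garden": 0.8})
    negative = build_prompt({"lowres": 1.0, "bad hands": 1.4})
    print(positive)  # portrait, (red dress:1.3), (garden:0.8)
    print(negative)  # lowres, (bad hands:1.4)
```

Building prompts from a dictionary keeps each term next to its weight, which makes it easier to tune one element at a time.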


🌟 Advanced Prompt Techniques and Features

The speaker continues the discussion on prompts, introducing advanced techniques such as prompt plugins and other related features within the Stable Diffusion UI. They explain the functionality of various UI elements, including the ability to copy generation data from example images, the use of presets, and the importance of understanding the visual representation of prompts. The speaker also touches on the concept of 'Lora' as a way to convey complex visual styles directly to the AI.


👗 Demonstrating Lora's Impact on AI Art

The speaker demonstrates the practical application of Lora by creating AI art that reflects specific styles, such as traditional Chinese clothing and accessories. They explain the process of selecting a Lora style, adjusting its weight, and combining it with prompts to generate images. The speaker also discusses the importance of 'trigger words' in activating certain elements within a Lora and the impact of Lora weight on the final output.
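Combining a Lora with trigger words and a weight can be sketched as a prompt-building step. The `<lora:name:weight>` tag below is the form the AUTOMATIC1111 WebUI accepts in the prompt field; the summary does not spell out the exact syntax, so treat it as an assumption. The function name, Lora name, and trigger word are hypothetical.

```python
def with_lora(prompt: str, lora_name: str, weight: float, trigger_words: list[str]) -> str:
    """Append trigger words and a <lora:name:weight> tag to an existing prompt."""
    parts = [prompt, *trigger_words, f"<lora:{lora_name}:{weight:g}>"]
    # Skip empty pieces so an empty base prompt does not leave a stray comma.
    return ", ".join(p for p in parts if p)


if __name__ == "__main__":
    # Hypothetical Lora file name and trigger word, for illustration only.
    print(with_lora("a woman in a garden", "hanfu_style", 0.7, ["hanfu"]))
    # a woman in a garden, hanfu, <lora:hanfu_style:0.7>
```

Lowering the weight softens the Lora's influence, while the trigger words tell the Lora which of its learned elements to activate.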


🎨 Fine-Tuning with VAE and Additional Parameters

The speaker introduces VAE (Variational Autoencoder) as a tool for color enhancement in AI-generated images, providing a visual comparison to illustrate its effects. They discuss the importance of following model authors' recommendations regarding the use of VAE and detail the process of downloading and installing VAE models. The speaker also touches on other parameters such as CLIP termination layer, iteration steps, and high-resolution repair, offering practical advice and observations from their experiences.


🔄 Regenerating and Saving Artwork

The speaker concludes the tutorial by explaining how to regenerate images with the same characteristics by locking the random seed number and using high-resolution repair to enhance image quality. They also demonstrate how to revert to a random generation mode by resetting the seed number and discuss the importance of saving generated images to avoid losing them. The speaker provides a final overview of the Stable Diffusion interface and its functionalities, wrapping up the introductory course.
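Why a locked seed reproduces an image can be shown with an analogy. The sketch below is not Stable Diffusion code: it uses Python's standard `random` module to stand in for the starting noise, and it assumes the common WebUI convention that a seed of -1 means "pick a new random seed each time". Function names are hypothetical.

```python
import random


def starting_noise(seed: int, n: int = 4) -> list[float]:
    """Stand-in for the initial noise: the same seed always yields the same values."""
    rng = random.Random(seed)
    return [round(rng.gauss(0.0, 1.0), 6) for _ in range(n)]


def resolve_seed(seed: int) -> int:
    """Assumed convention: -1 requests a fresh random seed, anything else is kept."""
    return random.randrange(2**32) if seed == -1 else seed


if __name__ == "__main__":
    locked = resolve_seed(1234)
    # Same seed, same noise: the basis for regenerating the same picture.
    print(starting_noise(locked) == starting_noise(locked))  # True
    # Resetting to -1 gives a different starting point on each run.
    print(resolve_seed(-1))
```

Because the starting noise is fixed by the seed, other settings such as resolution can be changed while the core composition stays largely the same.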



💡AI Painting

AI Painting refers to the use of artificial intelligence to create visual art. In the context of the video, it specifically refers to the use of AI tools like MidJourney and Stable Diffusion for generating digital images. These tools utilize machine learning models to interpret user inputs, such as text prompts, and produce corresponding images.


💡MidJourney

MidJourney is an AI painting tool known for its low entry barrier and high-quality image output. However, it is a paid service, which contrasts with the open-source and free nature of Stable Diffusion. It is one of the two AI painting products mentioned in the video and is used as a point of comparison to illustrate the different options available to users.

💡Stable Diffusion

Stable Diffusion is an open-source AI painting tool that allows users to generate images from text prompts. It is characterized by its high entry barrier and the requirement of specific hardware configurations, such as NVIDIA GPUs with at least 8 GB of VRAM. The video provides a tutorial on how to install and use Stable Diffusion, making it accessible to viewers.


💡GPU

GPU stands for Graphics Processing Unit, which is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. In the context of the video, a GPU is essential for running Stable Diffusion, with a recommendation for an NVIDIA GPU with at least 8 GB of VRAM for optimal performance.


💡Checkpoint

In the context of AI and machine learning, a Checkpoint refers to a point during the training process where the model's state is saved. This saved state can then be used to resume training or to run inference on new data without starting from scratch. In the video, the term is used to describe a type of model file for Stable Diffusion that users can download and use to generate images with specific styles or characteristics.


💡Prompt

In the context of AI painting tools like Stable Diffusion, a Prompt is a text input provided by the user that guides the AI in generating an image. Prompts can be simple descriptions, phrases, or a combination of words that the AI interprets to create visual content. The effectiveness of the prompt directly influences the output image.


💡WebUI

WebUI stands for Web User Interface, which refers to the visual and interactive part of a web application that users can navigate and use to interact with the system. In the context of the video, WebUI is the interface through which users interact with the Stable Diffusion AI painting tool, inputting prompts and adjusting settings to generate images.


💡Lora

Lora (short for Low-Rank Adaptation) is a small add-on model trained on a set of images that share a specific style. The Lora is not the image set itself: it is an additional network file that is loaded alongside the base model so that the AI painting tool reproduces that style when generating new images.


💡VAE

VAE stands for Variational Autoencoder, which is a type of generative model used in machine learning and AI. In the context of AI painting, VAE models are used as filters or color adjustments to modify the color palette and overall visual aesthetic of the generated images, making them more vibrant or stylistically consistent.

💡High-Resolution Repair

High-Resolution Repair is a feature in AI painting tools that allows users to upscale the resolution of generated images while maintaining or improving the quality and detail. This process is particularly useful for creating images suitable for high-definition displays or large-format printing.
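The arithmetic behind this feature is simple: generate at a lower base resolution, then multiply both sides by the upscaling factor. The sketch below works through the 540x960 base and factor of 2 mentioned in the Q&A; the function names are hypothetical.

```python
from math import gcd


def hires_target(width: int, height: int, scale: float = 2.0) -> tuple[int, int]:
    """Final resolution after upscaling both sides by `scale`."""
    return round(width * scale), round(height * scale)


def aspect_ratio(width: int, height: int) -> str:
    """Reduce width:height to its simplest whole-number ratio."""
    d = gcd(width, height)
    return f"{width // d}:{height // d}"


if __name__ == "__main__":
    base = (540, 960)
    final = hires_target(*base, scale=2)
    print(final)                  # (1080, 1920)
    print(aspect_ratio(*base))    # 9:16
    print(aspect_ratio(*final))   # 9:16, the shape is preserved
```

Note that doubling each side quadruples the pixel count, which is why the base pass is kept small and only the second pass works at full size.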

💡Prompt Weight

Prompt Weight refers to the influence a particular prompt has on the generation of an AI painting. By adjusting the weight, users can control the importance of certain elements in the final image. Higher weights make the AI pay more attention to those prompts, while lower weights give the AI more freedom to interpret the prompt creatively.
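Besides an explicit `(term:1.3)` weight, nesting brackets also changes emphasis. The sketch below assumes the convention used by the AUTOMATIC1111 WebUI, where each pair of parentheses multiplies a term's weight by 1.1 and each pair of square brackets divides it by 1.1; the video summary does not state these exact numbers, and the function name is hypothetical.

```python
def emphasis_multiplier(parens: int = 0, brackets: int = 0, step: float = 1.1) -> float:
    """Effective weight for a term wrapped in nested () and [] pairs.

    Assumed convention: each () multiplies by `step`, each [] divides by `step`.
    """
    return round(step ** (parens - brackets), 4)


if __name__ == "__main__":
    print(emphasis_multiplier(parens=1))    # (term)     -> 1.1
    print(emphasis_multiplier(parens=2))    # ((term))   -> 1.21
    print(emphasis_multiplier(brackets=1))  # [term]     -> 0.9091
```

Beyond two or three levels of nesting, writing the weight explicitly with a colon tends to be easier to read and adjust.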


The speaker discusses two prominent AI painting tools, MidJourney and Stable Diffusion, with MidJourney being user-friendly but paid, while Stable Diffusion is open-source and free but has a higher learning curve.

The speaker provides a one-hour tutorial on getting started with Stable Diffusion, aiming to give viewers a working beginner's grasp of the software by the end.

For Stable Diffusion, the speaker recommends a computer configuration with at least 16GB of RAM and an NVIDIA GPU with 8GB VRAM or higher, specifically suggesting models above the 3060 series.

The speaker warns against using AMD GPUs with Stable Diffusion, as it can cause slow rendering and overheating issues with the CPU.


The speaker shares a personal computer build list that cost around 6,000 RMB and can generate a 512x512 image in about 3 seconds.

A detailed guide on downloading and installing the Stable Diffusion launcher is provided, including specific steps for Windows users.

The speaker explains how to download and install models for Stable Diffusion from recommended websites and how to switch between them within the WebUI interface.

The concept of positive and negative prompts in Stable Diffusion is introduced, with examples of how to use them to generate desired elements and avoid undesired ones in the AI-generated images.

The speaker discusses the importance of prompt syntax in Stable Diffusion, including the use of parentheses, brackets, and colons to adjust the weight of prompts.

The use of prompt plugins, such as 'prompt all-in-one', is demonstrated to enhance the ease of inputting and organizing prompts for Stable Diffusion.

The speaker introduces Lora (both native Lora support and the Lora plugin) and its role in capturing the style of a set of images to apply to AI-generated content, overcoming the limitations of textual prompts.

The process of downloading and applying VAE (Variational Autoencoder) models to enhance the color and vibrancy of generated images is explained, with recommendations on when to use VAE based on the base model.

The speaker provides insights on various parameters and features in Stable Diffusion, such as CLIP termination layer, iteration steps, and the DPM++ 2M Karras sampling method, with practical advice on their usage.

The importance of aspect ratio and resolution in generating high-quality images with Stable Diffusion is discussed, along with techniques to avoid common issues like broken or distorted images ('崩图').

The tutorial concludes with tips on how to save and download generated images, ensuring that users can preserve their creations.