Run Stable Diffusion 3 Locally! | ComfyUI Tutorial

Markury AI
12 Jun 2024 · 03:48

TLDR: This tutorial shows viewers how to run Stable Diffusion 3 Medium locally using ComfyUI. The process begins with downloading the necessary files from Hugging Face, including sd3_medium.safetensors and the text encoders. After updating ComfyUI, the models are installed, and the tutorial demonstrates generating an image from a prompt. The video highlights the impressive results, encourages users to raise the licensing issues with Stability AI, and ends with a call to action for the community.

Takeaways

  • 🌐 Visit Hugging Face to access the Stable Diffusion 3 Medium model, which requires filling out a form and agreeing to the terms of the gated repository.
  • 📁 Download the essential files from the Hugging Face repository: 'sd3_medium.safetensors' plus the text encoders 'clip_g', 'clip_l', and 't5xxl' (the fp16 version).
  • 🔄 Update ComfyUI by navigating to its directory and running the 'update_comfy_ui.bat' script to ensure compatibility with the new models.
  • 📂 Organize downloaded models by placing them in the appropriate folders within the ComfyUI directory, such as the 'clip' and 'checkpoints' folders.
  • 🚀 Start ComfyUI by running the 'run_nvidia_gpu.bat' script so the application launches with GPU support.
  • 🔍 Load the 'sd3_medium.safetensors' checkpoint in ComfyUI to integrate the Stable Diffusion 3 Medium model for image generation.
  • 📝 Use natural language prompts in ComfyUI for better response and image generation, as demonstrated by the example prompt about a female character with northern lights-like hair.
  • 🎨 Witness the generation of high-quality images with the Stable Diffusion 3 model, showcasing its impressive capabilities.
  • 📝 Note the licensing issue mentioned in the video; consider opening an issue or contacting Stability AI to address the licensing concerns.
  • 🔧 The video suggests that the community should work together to help update the license for the model's proper use and distribution.
  • 👋 The tutorial concludes with a reminder to enjoy the capabilities of the newly released Stable Diffusion 3 model and to have a great day.
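The folder placement described in the takeaways can be summarized as a simple mapping. Below is a hedged sketch in Python; the file names follow those in the Hugging Face repository, and the folders are relative to your ComfyUI install directory:

```python
# File -> destination mapping for the SD3 Medium files, relative to
# the ComfyUI install directory. File names are as they appear in the
# Hugging Face repository; adjust if your downloads are named differently.
PLACEMENT = {
    "clip_g.safetensors":     "models/clip",
    "clip_l.safetensors":     "models/clip",
    "t5xxl_fp16.safetensors": "models/clip",
    "sd3_medium.safetensors": "models/checkpoints/sd3",
}

for name, folder in PLACEMENT.items():
    print(f"{name:24} -> {folder}")
```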

Q & A

  • What is the main topic of the video tutorial?

    -The main topic of the video tutorial is how to use Stable Diffusion 3 Medium and ComfyUI locally.

  • Where should one go to access the Stable Diffusion 3 model?

    -To access the Stable Diffusion 3 model, one should go to Hugging Face and fill out the form to gain access to the repository.

  • What files does the user need to download from Hugging Face for Stable Diffusion 3 Medium?

    -The user needs to download the sd3_medium.safetensors checkpoint and the text encoders: CLIP G, CLIP L, and T5 XXL (the fp16 version).
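As a sketch, the download step could be scripted with the `huggingface_hub` package. The repository id and in-repo file paths below are my assumptions about the gated repository, and the real call only works after you have accepted the license on Hugging Face and logged in (for example via `huggingface-cli login`):

```python
# Hypothetical download plan for the SD3 Medium files. The repo id and
# in-repo paths are assumptions about the gated Hugging Face repository.
REPO_ID = "stabilityai/stable-diffusion-3-medium"
FILES = [
    "sd3_medium.safetensors",
    "text_encoders/clip_g.safetensors",
    "text_encoders/clip_l.safetensors",
    "text_encoders/t5xxl_fp16.safetensors",
]

for path in FILES:
    print(f"would download {REPO_ID}/{path}")
    # Real call (requires huggingface_hub and an authenticated session):
    # from huggingface_hub import hf_hub_download
    # hf_hub_download(repo_id=REPO_ID, filename=path, local_dir="downloads")
```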

  • What is the purpose of updating ComfyUI before installing new models?

    -Updating ComfyUI ensures that the software is compatible with the new models and provides the latest features and bug fixes.

  • How does one update ComfyUI according to the tutorial?

    -To update ComfyUI, one should go to the ComfyUI directory, navigate to the 'update' folder, and run the 'update_comfy_ui.bat' file.
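For context, on a manual (git-based) ComfyUI install the update step amounts to pulling the latest code and reinstalling the requirements, which is roughly what the portable build's batch file wraps. A hedged sketch follows, with the actual commands left commented out since they need a live install; the install path is a placeholder:

```python
from pathlib import Path

# Placeholder install location; point this at your own ComfyUI checkout.
comfy_dir = Path("ComfyUI")

# Roughly what updating a git-based install involves.
cmds = [
    ["git", "-C", str(comfy_dir), "pull"],
    ["python", "-m", "pip", "install", "-r", str(comfy_dir / "requirements.txt")],
]

for cmd in cmds:
    print("would run:", " ".join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # on a real install
```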

  • What is the recommended workflow to use with the Stable Diffusion 3 Medium model?

    -The tutorial recommends using the 'basic inference workflow' with the Stable Diffusion 3 Medium model.

  • Where should the downloaded models and checkpoints be placed within the ComfyUI directory structure?

    -The downloaded text encoders should be placed in the 'clip' folder under 'models', and the checkpoint in the 'checkpoints' folder, preferably inside a new 'sd3' subfolder.
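The placement step can be sketched with `pathlib`. The snippet below runs in a throwaway temp directory with empty placeholder files standing in for the real downloads, so it is safe to execute as-is; substitute your actual download and ComfyUI paths:

```python
import tempfile
from pathlib import Path

# Sandbox standing in for your real filesystem.
root = Path(tempfile.mkdtemp())
downloads = root / "downloads"
downloads.mkdir()
clip_dir = root / "ComfyUI" / "models" / "clip"
ckpt_dir = root / "ComfyUI" / "models" / "checkpoints" / "sd3"
clip_dir.mkdir(parents=True)
ckpt_dir.mkdir(parents=True)

# Empty placeholders standing in for the downloaded files.
for name in ("clip_g.safetensors", "clip_l.safetensors",
             "t5xxl_fp16.safetensors", "sd3_medium.safetensors"):
    (downloads / name).touch()

# Text encoders go to models/clip; the checkpoint to models/checkpoints/sd3.
for f in sorted(downloads.glob("*.safetensors")):
    dest = ckpt_dir if f.name.startswith("sd3") else clip_dir
    f.rename(dest / f.name)

print(sorted(p.name for p in clip_dir.iterdir()))
```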

  • What is the significance of the 'Queue Prompt' button in the ComfyUI workflow?

    -Clicking 'Queue Prompt' submits the current workflow for execution, so the model generates an image from the description entered in the prompt node.

  • What type of image is generated in the example provided in the script?

    -The example generates an image of a female character with long flowing hair made of ethereal swirling patterns resembling the Northern Lights or Aurora Borealis.

  • What issue is mentioned regarding the licensing of the Stable Diffusion 3 model?

    -The issue mentioned is that the licensing is a bit unclear or 'messed up', and the community is encouraged to open an issue or contact Stability AI to update the license.

  • How does the tutorial describe the generated image quality of the Stable Diffusion 3 model?

    -The tutorial describes the generated image quality as 'really amazing' and expresses excitement about the release of the model's weights for free.

Outlines

00:00

🎨 Introduction to Using Stable Diffusion 3 Medium

The video begins with an introduction to the Stable Diffusion 3 Medium model, which has just been released. The host guides viewers on how to access this gated model by visiting Hugging Face, filling out a form, and agreeing to the repository terms. The process includes downloading the necessary files: 'sd3_medium.safetensors', the text encoders ('clip_g', 'clip_l', and 't5xxl_fp16'), and the example ComfyUI workflows. The host also explains the need to close and update ComfyUI before installing the new models.

🔄 Updating Comfy UI and Installing Models

This paragraph details the steps to update ComfyUI and install the new models. The host instructs viewers to navigate to the ComfyUI directory and run the 'update_comfy_ui.bat' file to ensure they have the latest version. After updating, the host installs the CLIP models by placing the downloaded files into the appropriate folders within the ComfyUI directory. Additionally, a new 'sd3' folder is created inside the checkpoints folder for the 'sd3_medium.safetensors' file, preparing the system for the next steps.

🚀 Starting Comfy UI with the New Model

The host demonstrates how to start using the new Stable Diffusion 3 Medium model with ComfyUI. After launching ComfyUI via the 'run_nvidia_gpu.bat' script, the host switches to another machine to load the 'sd3_medium.safetensors' checkpoint and the CLIP files. The workflow is set up to use a natural language prompt, which differs from the traditional 'booru tag' style, and the host uses an example prompt provided by the model developers to generate an image of a female character with hair resembling the northern lights. The host expresses excitement about the model's capabilities and encourages the community to help address licensing issues by opening issues or contacting Stability AI.

Keywords

💡Stable Diffusion 3

Stable Diffusion 3 refers to the third iteration of a generative model that uses a diffusion process to create images from textual descriptions. It is a significant update in the field of AI-generated art, providing more detailed and realistic image outputs. In the video, the presenter is excited to demonstrate how to use this new model, indicating its importance and novelty in the context of AI art generation.

💡ComfyUI

ComfyUI is a user interface that simplifies the process of interacting with AI models like Stable Diffusion. It is designed to be user-friendly, allowing users to run AI models locally on their machines. The script mentions updating ComfyUI, which is crucial for integrating the new Stable Diffusion 3 model and ensuring compatibility with the latest features.

💡Hugging Face

Hugging Face is a platform that hosts a wide range of AI models, including Stable Diffusion 3. It requires users to fill out a form to access gated models, emphasizing the exclusive nature of some AI technologies. In the script, the presenter guides viewers through the process of accessing the Stable Diffusion 3 model from Hugging Face, highlighting the platform's role in distributing AI models.

💡Text Encoders

Text encoders are components of AI models that convert textual descriptions into a format the model can understand. In the context of the video, the presenter downloads the CLIP (Contrastive Language-Image Pre-training) and T5 encoders, which are essential for the Stable Diffusion 3 model to interpret text prompts and generate images accordingly.

💡CLIP Models

CLIP models are neural networks trained on a large corpus of text-image pairs, enabling them to understand the relationship between text and images. The script mentions downloading the CLIP G and CLIP L models, along with the T5 XXL text encoder (a pure language model rather than a CLIP model), all of which are necessary for Stable Diffusion 3 to generate images from text prompts.

💡Safe Tensors

Safetensors is a file format, developed by Hugging Face, for storing model weights (tensors) safely and efficiently; unlike pickle-based checkpoints, loading a safetensors file cannot execute arbitrary code. In the video, the presenter downloads 'sd3_medium.safetensors', the core weights file the Stable Diffusion 3 model needs for image generation.
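To make the format concrete, here is a minimal stdlib-only sketch of the safetensors file layout: an 8-byte little-endian header length, a JSON header with dtypes, shapes, and byte offsets, then the raw tensor bytes. Real code should use the `safetensors` package; this is purely illustrative:

```python
import json
import struct
import tempfile
from pathlib import Path

def write_safetensors(path, tensors):
    """tensors: name -> (dtype_str, shape, raw_bytes)."""
    header, blobs, offset = {}, [], 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        offset += len(raw)
        blobs.append(raw)
    head = json.dumps(header).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(head)))  # 8-byte header length
        f.write(head)                          # JSON header
        f.write(b"".join(blobs))               # raw tensor data

def read_header(path):
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n))

demo = Path(tempfile.mkdtemp()) / "demo.safetensors"
# One float32 tensor named "w" with two elements (8 bytes of data).
write_safetensors(demo, {"w": ("F32", [2], struct.pack("<2f", 1.0, 2.0))})
print(read_header(demo))
```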

💡Checkpoints

In the context of AI models, checkpoints are snapshots of a model's training progress, including its learned parameters. The script discusses placing the 'sd3_medium.safetensors' file into the checkpoints folder, which is essential for loading the Stable Diffusion 3 model's weights so it can generate images.

💡Nvidia GPU

Nvidia GPUs are graphics processing units designed by Nvidia, known for their high-performance capabilities in handling the complex computations required by AI models. The script mentions running the 'run_nvidia_gpu.bat' script, indicating the use of an Nvidia GPU to accelerate the Stable Diffusion 3 model's image generation.

💡Workflow

A workflow in the context of AI and software interfaces refers to a sequence of steps or procedures to accomplish a task. The video script mentions a 'basic inference workflow' for using Stable Diffusion 3, which guides users through the process of generating images from text prompts using the model.

💡Queue Prompt

'Queue Prompt' is the ComfyUI button that submits the current workflow for execution. The script's example prompt describes a female character with hair made of northern-lights patterns, demonstrating how users describe the desired image in text before queueing the generation.
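Under the hood, queueing a prompt sends the workflow graph to the local ComfyUI server over HTTP. Below is a hedged sketch of that call from Python; the endpoint is ComfyUI's local API address, but the workflow dict here is only a placeholder, not a valid SD3 graph (export a real one from ComfyUI in API format):

```python
import json
import urllib.request

# Placeholder workflow; a real one is exported from ComfyUI in API format.
workflow = {"placeholder": "export a real workflow in API format"}

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",  # default local ComfyUI address
    data=json.dumps({"prompt": workflow}).encode(),
    headers={"Content-Type": "application/json"},
)
print("would POST to", req.full_url)
# urllib.request.urlopen(req)  # uncomment with ComfyUI running locally
```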

💡Ethereal

Ethereal refers to something being extremely delicate and light, often associated with a sense of being otherworldly or celestial. In the script, the example Q prompt uses 'ethereal' to describe the swirling patterns in the character's hair, illustrating how adjectives can influence the style and mood of the generated image.

💡Aurora Borealis

Aurora Borealis, also known as the northern lights, is a natural light display in the Earth's sky, predominantly seen in the high-latitude regions. The script uses 'Aurora Borealis' in the Q prompt to inspire the Stable Diffusion 3 model to create an image with patterns resembling this natural phenomenon, showcasing the model's ability to interpret and visualize complex descriptions.

Highlights

Introduction to using Stable Diffusion 3 Medium and ComfyUI.

Accessing the gated model on Hugging Face and filling out the form to gain access.

Downloading required files such as sd3_medium.safetensors and the text encoders.

Instructions on updating ComfyUI to the latest version.

Installing CLIP models into the ComfyUI directory.

Creating a new 'sd3' folder for sd3_medium.safetensors in the checkpoints directory.

Starting ComfyUI with the run_nvidia_gpu.bat script.

Loading the sd3_medium.safetensors checkpoint in ComfyUI.

Using the example prompt for generating an image with natural language.

Explanation of the model's response to the prompt featuring a female character with northern lights hair.

Observation of the model's incredible image generation capabilities.

Discussion on the model's licensing issues and the need for community involvement.

Encouragement to open an issue with Stability AI regarding the license.

Final thoughts on the tutorial and a reminder to have a great day.