Stable Diffusion Animation Use SDXL Lightning And AnimateDiff In ComfyUI

Future Thinker @Benji
7 Mar 2024 · 22:42

TL;DR: This tutorial video guides viewers through an improved workflow for creating animations with Stable Diffusion using SDXL Lightning and AnimateDiff in ComfyUI. The presenter details the process of loading and resizing video, selecting appropriate checkpoints and control net models, and setting up the advanced control net custom nodes. The video also covers the creation of conditioning groups for text prompts and the use of the AI pre-processor for different pre-processor types. The workflow uses Juggernaut XL as the SDXL checkpoint and integrates the HS XL temporal motion models. The presenter emphasizes the importance of selecting the correct control net models and provides tips for enhancing the animation's quality through detailer groups and denoising. The video concludes with a demonstration of the workflow on a hand dance video, showcasing improved synchronization and reduced noise in the final animation.


  • 📈 **Workflow Improvement**: The script discusses an improved workflow for using SDXL Lightning with AnimateDiff and HS XL temporal motion models, addressing previous performance issues.
  • 🔍 **Community Collaboration**: The improvement was made possible through ideas shared by the AI community on Discord, highlighting the importance of collaboration.
  • 📹 **Video and Image Processing**: The workflow involves loading and resizing videos and images, which are fundamental steps for creating animations.
  • 🔗 **Checkpoints and Models**: It's essential to load checkpoint models with noise and use custom nodes for the SDXL Lightning process.
  • 🎨 **Styling with Text Prompts**: The use of text prompts and conditioning groups helps in controlling the style and content of the animations.
  • 🔍 **Advanced Control Net**: An advanced control net is used for preprocessing images for the control net models, which is crucial for the animation's quality.
  • 📊 **Resolution Management**: Pixel Perfect resolutions are used to maintain accurate width and height of the image frames throughout the workflow.
  • 🔄 **Iterative Enhancement**: The script describes a two-step sampling process with detailers to reduce noise and enhance specific parts of the animation, like hands and faces.
  • 🛠️ **Technical Considerations**: There's a caution to use only control net models trained for SDXL, since SD 1.5-trained models are not compatible.
  • 📈 **Sampling and Scheduling**: The script emphasizes the need for specific sampling methods and schedulers, such as DPM++ 2M with the "simple" scheduler, to achieve optimal results with SDXL Lightning.
  • 📝 **Workflow Organization**: The importance of organizing and aligning the workflow for better visualization and to avoid confusion is highlighted.

Q & A

  • What is the main focus of the tutorial video?

    -The main focus of the tutorial video is to demonstrate how to create Stable Diffusion animations with SDXL Lightning and AnimateDiff in ComfyUI.

  • What was the issue with the previous workflow?

    -The previous workflow did not perform well in detail and had some issues with the quality of the output.

  • What is the role of the AI community in the development of this workflow?

    -The AI community on Discord provided ideas and collaborated to build and improve the workflow.

  • What are the basic requirements for creating animations with this workflow?

    -The basic requirements include loading a video, upscaling or resizing the image, and using checkpoint models for SDXL Lightning.

  • What is the significance of using the correct control net models?

    -Using control net models of the correct SDXL type is crucial for the workflow to function properly, as models trained for SD 1.5 are not compatible.

  • How does the video guide the user through the process of setting up the workflow?

    -The video provides a step-by-step guide, starting from an empty workflow, loading necessary components, and connecting them in a specific order to create the desired animation.

  • What is the purpose of the 'conditioning groups' in the workflow?

    -The 'conditioning groups' contain the text prompts (positive and negative) and control net settings, which are essential for guiding the AI to generate the desired animation style.
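
As a sketch, such a conditioning group in ComfyUI's API (JSON) prompt format could look like the following; the node IDs, link indices, and prompt text here are hypothetical placeholders:

```python
# Two CLIPTextEncode nodes forming a conditioning group: one positive
# and one negative prompt, both fed by the checkpoint loader's CLIP
# output (node "4", output index 1 in this sketch).
conditioning = {
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a dancer, studio lighting, best quality",
                     "clip": ["4", 1]}},
    "7": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "blurry, extra fingers, watermark",
                     "clip": ["4", 1]}},
}
```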

  • How does the video help users troubleshoot potential issues?

    -The video identifies common issues, such as incorrect model selection, and provides solutions, such as ensuring that the correct type of control net models is used.

  • What is the importance of the 'KSampler' in the workflow?

    -The KSampler is the key node that connects the positive and negative conditioning to the sampling process, which generates the final animation frames.

  • How does the video address the synchronization of frame rates in animations?

    -The video suggests adjusting the frame rate to match the desired output, as seen when the presenter lowers the frame rate to 16 for better synchronization.
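
The frame-rate reduction mentioned above amounts to keeping evenly spaced frames from the source clip. This is an illustrative helper, not the node the video uses:

```python
def select_frames(n_frames: int, src_fps: float, dst_fps: float) -> list[int]:
    """Indices of source frames to keep when resampling to a lower fps."""
    if dst_fps >= src_fps:
        return list(range(n_frames))
    step = src_fps / dst_fps   # advance through the source at this rate
    indices = []
    pos = 0.0
    while round(pos) < n_frames:
        indices.append(round(pos))
        pos += step
    return indices

# 60 frames of a 30 fps clip (2 seconds) resampled to 16 fps
kept = select_frames(60, 30.0, 16.0)   # 32 frames, ~every 1.875th
```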

  • What are the steps taken to enhance the quality of the animation?

    -The presenter uses detailer groups to enhance specific parts of the animation, such as the face and hands, and suggests using denoising to improve the overall image quality.

  • How does the video encourage community involvement and collaboration?

    -The video encourages viewers to join the Discord group for discussions, brainstorming, and to be part of the AI community, emphasizing a welcoming environment for those who contribute positively.



🔧 Introduction and Workflow Setup

The video begins with an introduction to the improved AnimateDiff workflow for the Stable Diffusion model SDXL Lightning. The presenter shares that the previous workflow didn't perform well, but thanks to the AI community on Discord, they have a new approach. The workflow involves loading a video, resizing images, and using custom nodes with the Juggernaut XL model. The presenter also explains the process of connecting CLIP layers with text prompts and creating conditioning groups for text prompts and control net.


📈 Advanced Control Net and Pre-Processing

The second paragraph delves into the specifics of setting up the advanced control net and pre-processors. The presenter discusses the use of the AI pre-processor for selecting different types of pre-processors and connecting the resized image to them. Pixel Perfect resolutions are used to ensure accurate image dimensions. The video also covers the process of duplicating control net models and using Video Combine for output, while avoiding saving unnecessary control net outputs.
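
The "Pixel Perfect" idea can be sketched as a small helper: it picks the preprocessor's detection resolution so the control map lines up with the frame being generated. This is a simplified version of the logic; the actual node also handles resize modes and rounding:

```python
def pixel_perfect_resolution(src_w: int, src_h: int,
                             target_w: int, target_h: int) -> int:
    """Choose a preprocessor detection resolution so the control map
    matches the generated frame (inner-fit scaling, simplified)."""
    k = min(target_w / src_w, target_h / src_h)  # scale factor to fit
    return int(round(min(src_w, src_h) * k))     # scaled shortest edge

# a 1920x1080 source frame generated at 1024x576
res = pixel_perfect_resolution(1920, 1080, 1024, 576)  # 576
```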


🎨 Animation and Style Adaptation

The third paragraph focuses on the animation and style adaptation process. The presenter explains the use of Evolved Sampling and the Gen 2 AnimateDiff custom nodes. They also discuss the importance of selecting the correct motion model, specifically the HS XL temporal motion model, for compatibility with SDXL Lightning. The presenter then details the use of the IP adapter for style transfer without text prompts, and the process of connecting the models and setting up the image for CLIP Vision.


🧩 Post-Processing and Final Touches

The fourth paragraph describes the post-processing steps to refine the animation. The presenter talks about using detailer groups to enhance image quality, focusing on the character's hands and face. They also mention creating second sampling groups for further detail enhancement and noise reduction. The presenter emphasizes the importance of aligning the workflow for clarity and using Video Combine for the final output. They also share their experience with a hand dance video and how they adjusted settings to synchronize the frame rate and reduce noise.


πŸ“ Conclusion and Community Engagement

In the final paragraph, the presenter concludes the tutorial by summarizing the workflow and encouraging viewers to join their Discord group for further discussions and brainstorming. They highlight the community's role in refining the workflow and share their plans to post progress updates. The presenter also discusses the output results from the detailers, showing the improvement from the first to the second sampling and the final enhancement of the hands. They invite viewers to ask questions on Patreon and Discord and express their intention to continue sharing their work in future videos.



💡Stable Diffusion

Stable Diffusion refers to a type of machine learning model used for generating images from textual descriptions. In the context of the video, it is the core technology being used to create animations, with SDXL Lightning being a specific variant or enhancement of this technology.


💡AnimateDiff

AnimateDiff is a technique that adds motion modules to a Stable Diffusion model so that the generated frames remain temporally consistent. In the video it is used to animate the generated images, although the previous version of the workflow had performance issues in detail.

💡SDXL Lightning

SDXL Lightning is a distilled variant of the SDXL model designed to generate high-quality images in very few sampling steps. The video discusses how it has been configured to work effectively with AnimateDiff.


💡Checkpoints

In machine learning, checkpoints are saved states of a model during training which can be loaded later to continue training or to use the model for inference. In the video, loading checkpoints is a step in setting up the workflow to use specific models for image and animation generation.

💡Custom Nodes

Custom nodes are user-defined components or modules within a workflow that perform specific tasks. The video mentions using custom nodes for various steps in the animation creation process, such as loading images, processing them, and generating the final output.

💡Text Prompt

A text prompt is a textual description that guides the Stable Diffusion model to generate images or animations that match the description. The video discusses using both positive (desired outcome) and negative (unwanted features) text prompts to refine the generation process.

💡Control Net

Control Net (ControlNet) refers to auxiliary models that constrain the structure of the generated frames using maps, such as poses or edges, extracted from the source video. It is mentioned in the context of using specific models and settings to control the output of the animations.


💡Pre-processors

Pre-processors are tools or functions that prepare or format data before it is used by a model. In the video, an AI pre-processor is used to process images for the Control Net models, which is a crucial step in the animation pipeline.

💡IP Adapter

The IP Adapter (IP-Adapter) conditions generation on a reference image instead of, or alongside, text prompts. In the video it is used without text prompts, stylizing the animation based on an input image.

💡KSampler

The KSampler is the ComfyUI node that performs the sampling (denoising) step of generation. It is discussed in the context of connecting the conditioning and models to produce the final output.

💡VAE Decode

VAE stands for Variational Autoencoder, a type of neural network that can learn to compress and reproduce data. In the video, VAE Decode is a step used to transform the latent representations back into image data as part of generating the animations.
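
For intuition, the spatial bookkeeping can be sketched: the VAE used by SDXL compresses each frame 8× per side into a 4-channel latent, and VAE Decode inverts this. A minimal shape helper, assuming the standard 8× factor:

```python
def latent_shape(batch: int, width: int, height: int) -> tuple[int, int, int, int]:
    """Shape of the latent tensor (batch, channels, h, w) for a batch of
    frames, using the SDXL VAE's 8x spatial compression and 4 channels."""
    assert width % 8 == 0 and height % 8 == 0, "dims must be multiples of 8"
    return (batch, 4, height // 8, width // 8)

# 16 frames at 1024x576 sample in latents of shape (16, 4, 72, 128)
shape = latent_shape(16, 1024, 576)
```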


The tutorial introduces an improved workflow for using SDXL Lightning with AnimateDiff and HS XL temporal motion models.

The workflow has been optimized to perform better in detail, thanks to contributions from the AI community on Discord.

The video demonstrates how to set up the workflow from an empty state, including loading videos and upscaling images.

Juggernaut XL is highlighted as a recommended checkpoint model for SDXL Lightning.

The process involves creating conditioning groups for text prompts and control net integration.

The use of an AI pre-processor for different types of pre-processing is explained.

Pixel Perfect resolutions are utilized to maintain accurate image dimensions throughout the workflow.

The importance of using the correct type of control net models for SDXL is emphasized.

The video covers how to connect the conditioning steps to the KSampler for the first stage of sampling.

AnimateDiff groups are set up with Evolved Sampling and the Gen 2 AnimateDiff custom nodes.

Looped Uniform context options are selected for compatibility with SDXL Lightning.
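
Context options exist because motion models work on a fixed temporal window (typically 16 frames), so longer clips are processed in overlapping windows. The helper below is a rough sketch of that batching under those assumptions; the exact scheduling in the AnimateDiff custom nodes differs in detail:

```python
def uniform_contexts(num_frames: int, length: int = 16, overlap: int = 4,
                     looped: bool = True) -> list[list[int]]:
    """Sliding windows of frame indices for a motion model with a fixed
    temporal window; 'looped' wraps the last window back to the start."""
    stride = length - overlap
    windows = []
    start = 0
    while start < num_frames:
        win = [(start + i) % num_frames if looped
               else min(start + i, num_frames - 1)
               for i in range(length)]
        windows.append(win)
        start += stride
    return windows

# 32 frames, window 16, overlap 4 -> three overlapping windows
windows = uniform_contexts(32, 16, 4)
```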

An IP adapter is used to stylize animations without the need for text prompts.

The video explains how to connect the model outputs to the control net and the AnimateDiff sampling.

Motion models are loaded for the AnimateDiff groups, with the HS XL temporal motion model tested and confirmed to work with SDXL Lightning.

The KSampler output requires a VAE Decode; a VAE Encode converts the resized image frames into the latents used for sampling.

The video demonstrates how to align and organize the workflow for better clarity and aesthetics.

A Video Combine node is used to gather all the image frames and compile them into a video output.

The tutorial shows how to test the workflow step by step, adjusting settings as needed for synchronization and noise reduction.

Detailer groups are introduced to enhance image quality and clean up details like faces and hands.

The process of creating second sampling groups for further detail enhancement and noise reduction is explained.
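
The second sampling pass can be sketched as another KSampler that mirrors the first, with the decoded frames re-encoded as its latent input and the denoise lowered so it refines detail rather than repainting the frame. All values here are illustrative placeholders, not the video's exact settings:

```python
# Second-pass KSampler sketch: same sampler/scheduler as the first pass,
# but low denoise so composition is preserved while noise is cleaned up.
second_pass = {
    "class_type": "KSampler",
    "inputs": {
        "seed": 42,
        "steps": 6,
        "cfg": 2.0,
        "sampler_name": "dpmpp_2m",
        "scheduler": "simple",
        "denoise": 0.45,                    # refine, don't repaint
        "model": ["anim", 0],               # placeholder link ids
        "positive": ["pos", 0],
        "negative": ["neg", 0],
        "latent_image": ["vae_encode_2", 0],  # re-encoded first-pass frames
    },
}
```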

The video concludes with a comparison of the output from different stages of the workflow, showcasing the improvements made.