How to Make AI VIDEOS (with AnimateDiff, Stable Diffusion, ComfyUI, Deepfakes, Runway)

3 Dec 2023 · 10:30

TLDR: The video provides a comprehensive guide to creating AI videos with technologies such as AnimateDiff, Stable Diffusion, ComfyUI, and deepfake tools. It distinguishes between an easy approach using a hosted service like Runway and a more involved method of running your own Stable Diffusion instance. The video demonstrates how to use ComfyUI with Stable Diffusion to restyle an existing video and generate AI videos, explores Civit AI as a source of pre-trained art styles, and introduces Runway for simpler, hosted video generation. It also covers additional tools such as Wav2Lip for syncing audio with video and Replicate for voice cloning, and concludes with a mention of the latest Stable Diffusion XL Turbo model for real-time image generation.


  • 📈 AI videos are a trending topic in tech, with deep fakes and animated videos being particularly popular.
  • 🚀 There are two ways to create AI videos: an easy way using a service like Runway, and a more complex way involving running your own Stable Diffusion instance.
  • 🖥️ Stable Diffusion is an open-source project that can be used for both simple and complex AI video creation processes.
  • 🌐 Runway offers a cloud-based, fully managed version of Stable Diffusion, simplifying the process for users.
  • 🎨 Tools like AnimateDiff, Stable Diffusion, and ComfyUI are used to generate AI videos, with ComfyUI being a node-based editor.
  • 📂 The process involves selecting a UI for Stable Diffusion, loading videos or images, and refining the images and parameters through various nodes.
  • 🔍 Checkpoints are used to style the type of images desired, with different styles available like Disney Pixar cartoon style.
  • 🚀 SDXL models are a different model family, so checkpoints and styles built for earlier Stable Diffusion versions may not be compatible with them.
  • 🌟 Civit AI offers pre-trained art styles for video generation, which can be integrated into the workflow.
  • 📹 Runway's Gen 2 feature allows for video generation using text, images, or both, providing an easier alternative for some users.
  • 🎥 For deep fake videos, tools like Wav2Lip can sync lips to a video, making the process straightforward.
  • 🔊 Replicate offers voice cloning and text-to-speech generation, useful for creating custom audio tracks for videos.
  • ⚡ Stable Diffusion XL Turbo is a recent advancement that enables real-time text-to-image generation, offering faster processing speeds.

Q & A

  • What are the two primary ways to create AI videos as mentioned in the transcript?

    -The two primary ways to create AI videos mentioned are the easy way, which involves using a service like Runway, and the hard way, which involves running your own Stable Diffusion instance on your computer.

  • What is AnimateDiff and how is it used in the process of creating AI videos?

    -AnimateDiff is a framework for animating images. It is used in conjunction with Stable Diffusion, a text-to-image AI generator, to produce AI videos by animating a set of images or an existing video.
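The video drives AnimateDiff through ComfyUI rather than code, but the same pairing of a motion module with a Stable Diffusion checkpoint can be sketched with the open-source diffusers library. A minimal sketch — the model IDs, prompt, and settings below are illustrative assumptions, not the video's exact setup, and a CUDA GPU is assumed:

```python
def build_prompt(subject: str, style: str) -> str:
    """Combine a subject and a style keyword into one text prompt."""
    return f"{subject}, {style}, best quality"

def animate(prompt: str, out_path: str = "animation.gif") -> None:
    """Generate a short clip with AnimateDiff; requires diffusers and a CUDA GPU.

    Heavy imports are deferred so the helper above stays usable without them.
    """
    import torch
    from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
    from diffusers.utils import export_to_gif

    # The motion adapter supplies the learned motion prior;
    # the base checkpoint supplies the image style.
    adapter = MotionAdapter.from_pretrained(
        "guoyww/animatediff-motion-adapter-v1-5-2"
    )
    pipe = AnimateDiffPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        motion_adapter=adapter,
        torch_dtype=torch.float16,
    ).to("cuda")
    pipe.scheduler = DDIMScheduler.from_config(
        pipe.scheduler.config, beta_schedule="linear", clip_sample=False
    )
    result = pipe(prompt=prompt, num_frames=16,
                  num_inference_steps=25, guidance_scale=7.5)
    export_to_gif(result.frames[0], out_path)
```

Calling `animate(build_prompt("a fox running through snow", "watercolor"))` writes a 16-frame GIF; the model weights download on first run.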

  • What is the role of ComfyUI in the AI video generation process?

    -ComfyUI is a node-based editor used for the AI video generation project. It allows for a visual, drag-and-drop interface to manage the workflow and parameters of the images and processes involved in generating AI videos.
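Beyond the drag-and-drop editor, ComfyUI exposes the same node graph over a small HTTP API, which is handy for scripting. A minimal sketch, assuming a ComfyUI server running on its default local port (8188) and a workflow exported from the editor via "Save (API Format)":

```python
import json
import urllib.request

def encode_workflow(workflow: dict) -> bytes:
    """Wrap a node graph in the payload shape ComfyUI's /prompt endpoint expects."""
    return json.dumps({"prompt": workflow}).encode("utf-8")

def queue_workflow(workflow: dict, host: str = "127.0.0.1:8188") -> bytes:
    """Queue a workflow on a locally running ComfyUI server and return its reply."""
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=encode_workflow(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

With a server running, `queue_workflow(json.load(open("workflow_api.json")))` queues the graph exactly as pressing "Queue Prompt" in the editor would.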

  • How does Runway simplify the AI video generation process?

    -Runway simplifies the process by offering a hosted version of Stable Diffusion. It provides a user interface that lets users generate videos from text, images, or both, without running their own instances or managing complex command-line interfaces.

  • What is a checkpoint in the context of Stable Diffusion?

    -A checkpoint in the context of Stable Diffusion is a snapshot of a pre-trained model. It is used to style the type of images that the user wants to generate, allowing different artistic styles to be applied to the generated content.

  • How can Civit AI be used to enhance AI video generation?

    -Civit AI provides a collection of pre-trained art styles that can be used to generate videos. Users can search for specific styles, such as 'dark Sushi mix' for anime styles, and incorporate these styles into their AI video generation process by downloading the model into their workspace.
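To use such a downloaded checkpoint outside a UI, the diffusers library can load a single `.safetensors` file from Civit AI directly. A minimal sketch — the file path in the usage note is a hypothetical download location, and a CUDA GPU is assumed:

```python
from pathlib import Path

def is_checkpoint(path: str) -> bool:
    """Civit AI distributes checkpoints as .safetensors (preferred) or .ckpt files."""
    return Path(path).suffix in {".safetensors", ".ckpt"}

def load_checkpoint(path: str):
    """Load a downloaded checkpoint file into a Stable Diffusion pipeline.

    Requires diffusers and a CUDA GPU; imports are deferred so the helper
    above stays importable without them.
    """
    import torch
    from diffusers import StableDiffusionPipeline

    if not is_checkpoint(path):
        raise ValueError(f"not a recognized checkpoint file: {path}")
    return StableDiffusionPipeline.from_single_file(
        path, torch_dtype=torch.float16
    ).to("cuda")
```

`load_checkpoint("models/darkSushiMix.safetensors")` (hypothetical filename) returns a pipeline whose generated images carry the downloaded style.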

  • What is the motion brush feature in Runway used for?

    -The motion brush feature in Runway is used to animate specific areas of an image. Users can select the area they want to animate and choose the direction of motion (closer, further, left, or right) to add dynamic elements to their AI-generated videos.

  • How does the Wav2Lip tool work in creating deepfake videos?

    -Wav2Lip works by syncing the lips in a video to an uploaded voice sample. It is a plug-and-play tool that allows users to create deepfake videos in which the subject's lip movements match the provided audio track.
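Wav2Lip itself is driven from the command line: its `inference.py` script takes the face video, the audio track, and a pretrained checkpoint. A sketch of building and running that command from Python — the flag names follow the project's README, and the file paths are placeholders:

```python
import subprocess

def wav2lip_command(face: str, audio: str, outfile: str,
                    checkpoint: str = "checkpoints/wav2lip_gan.pth") -> list[str]:
    """Build the Wav2Lip inference command (run from inside the cloned repo)."""
    return [
        "python", "inference.py",
        "--checkpoint_path", checkpoint,
        "--face", face,        # video (or image) containing the face to re-lip
        "--audio", audio,      # speech the lips should be synced to
        "--outfile", outfile,  # resulting lip-synced video
    ]

def run_wav2lip(face: str, audio: str, outfile: str) -> None:
    """Invoke Wav2Lip; assumes the repo is cloned and the checkpoint downloaded."""
    subprocess.run(wav2lip_command(face, audio, outfile), check=True)
```

`run_wav2lip("face.mp4", "narration.wav", "result.mp4")` would then produce the lip-synced output video.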

  • What is the purpose of the Replicate tool mentioned in the transcript?

    -The Replicate tool is used for cloning voices and generating speech from text. It allows users to input text, upload a voice sample, and then generate an audio file with the cloned voice speaking the provided text.
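Replicate also exposes its hosted models through a small Python client. A hedged sketch — the model slug and input field names below are placeholders (pick an actual voice-cloning model on replicate.com and use its documented inputs), and an API token is assumed in `REPLICATE_API_TOKEN`:

```python
def build_voice_input(text: str, voice_sample_url: str) -> dict:
    """Assemble a typical payload: the text to speak plus a reference voice sample.

    Field names vary per model; "text" and "speaker" here are placeholders.
    """
    return {"text": text, "speaker": voice_sample_url}

def clone_and_speak(model_slug: str, text: str, voice_sample_url: str):
    """Run a hosted voice-cloning model on Replicate.

    Requires the replicate package (pip install replicate) and
    REPLICATE_API_TOKEN in the environment; the import is deferred
    so the helper above works without it.
    """
    import replicate

    return replicate.run(model_slug, input=build_voice_input(text, voice_sample_url))
```

The returned value is typically a URL to the generated audio file, ready to feed into a tool like Wav2Lip.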

  • What is the latest development in stable diffusion technology mentioned in the transcript?

    -The latest development mentioned is Stable Diffusion XL Turbo, a real-time text-to-image generation model. It improves on previous models by generating images dramatically faster, often in a single sampling step.
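With the diffusers library, the distinctive part of using SDXL Turbo is its sampler settings: the model is distilled for a single sampling step with classifier-free guidance disabled. A minimal sketch (a CUDA GPU is assumed, and the prompt is illustrative; weights download from Hugging Face on first use):

```python
def turbo_settings() -> dict:
    """SDXL Turbo is distilled for one-step sampling, with guidance turned off."""
    return {"num_inference_steps": 1, "guidance_scale": 0.0}

def generate(prompt: str, out_path: str = "turbo.png") -> None:
    """Text-to-image with SDXL Turbo; requires diffusers and a CUDA GPU.

    The heavy imports are deferred so turbo_settings stays usable without them.
    """
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")
    image = pipe(prompt, **turbo_settings()).images[0]
    image.save(out_path)
```

Because only one denoising step runs, `generate("a cinematic photo of a lighthouse at dusk")` returns in a fraction of the time a standard SDXL pass takes, which is what makes the near-real-time preview demos possible.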

  • How does the user interface of AI tools impact the creative process?

    -The user interface of AI tools greatly impacts the creative process by making it more accessible and easier to use, especially for creative types. A well-designed UI allows for quicker previews, easier manipulation of styles and parameters, and a more intuitive workflow for generating AI art and videos.

  • What are some additional tools and services mentioned for creating AI videos or deepfakes?

    -Additional tools and services mentioned include DALL·E and any number of other AI image generators for image-to-image generation, and ElevenLabs for voice AI generation. These tools offer functionality such as animating photographs, creating subtitles, and generating voiceovers for videos.



🎬 Introduction to AI Video Generation

The video introduces AI video generation as a hot trend in technology. It discusses the process of creating animated videos and text-to-video content. The speaker shares their experience with AI art and guides viewers on how to make their own videos using AI. Two methods are presented: an easy way using a service like Runway ML, or a more complex approach involving running a Stable Diffusion instance on one's own computer. The speaker also mentions using a hosted version of Stable Diffusion, tools like AnimateDiff and ComfyUI, and the significance of checkpoints in styling images.


🖼️ Exploring AI Art Styles and Video Generation

The second paragraph delves into the process of using AI to stylize and generate videos. It covers the use of Civit AI's pre-trained art styles and how to integrate them with Runway ML's hosted version of stable diffusion. The speaker demonstrates how to use Runway's Gen 2 feature for generating videos from text and images, and also touches on the motion brush tool for adding camera motion to still images. Additionally, the paragraph mentions other tools for creating deep fake videos and voice cloning, highlighting the ease of use and the importance of a good user interface for creative AI tools.


🚀 Advanced AI Video Generation Techniques

The final paragraph provides a basic primer on advanced AI video and art generation. It emphasizes the ease of starting with tools like Runway ML, which offers various functionalities such as text-to-video generation, video-to-video, and image-to-image generation. The speaker also invites viewers to share other interesting tools or ask questions in the comments. The video concludes with a brief mention of the latest stable diffusion model, SDXL Turbo, which enables real-time image generation, and encourages viewers to explore these advanced workflows on their own.



💡AI Videos

AI Videos refer to videos that are generated or manipulated using artificial intelligence. In the context of the video, AI videos are created using various AI tools and techniques to animate or generate videos from text or existing video footage. It is a hot trend in tech, showcasing the capabilities of AI in creating deep fakes and animated content.


💡AnimateDiff

AnimateDiff is a framework mentioned in the video that is used for animating images. It plays a role in the process of creating AI videos by enabling the transformation of static images into animated sequences, which can then be incorporated into the final video output.

💡Stable Diffusion

Stable Diffusion is an open-source AI model used for generating images from text descriptions. It is a key component in the video's discussion on AI video creation, as it serves as the text-to-image AI generator that can produce the initial visuals needed for video generation.


💡ComfyUI

ComfyUI is described as a node-based editor used in the video for managing the complex workflows involved in AI video generation. It allows for a more intuitive and visual way to manipulate and refine the parameters and processes that lead to the final video output.


💡Deepfakes

Deepfakes are synthetic media in which a person's likeness is replaced with another's using AI. The video touches on the use of AI for creating deepfake videos, which are a significant application of AI video generation technology, showcasing how AI can be used to manipulate visual content.

💡Runway ML

Runway ML is a hosted platform for AI model deployment and is mentioned as an easier alternative to running a Stable Diffusion instance on one's own computer. It offers services like text-to-video generation and video-to-video generation, making it a user-friendly option for creating AI videos.


💡Checkpoints

In the context of AI video generation, checkpoints are snapshots of pre-trained models that dictate the style of the generated images. The video discusses how different checkpoints can be chosen to style the type of images desired, such as a Disney Pixar cartoon style.



💡VAE

VAE, or Variational Autoencoder, is the component of a Stable Diffusion pipeline that encodes images into a compressed latent space and decodes generated latents back into images. It appears in the complex workflow for creating AI videos, since every generated frame is decoded through it to produce the final pixels.

💡Civit AI

Civit AI is a website that hosts pre-trained art styles for AI video generation. The video script mentions it as a resource where users can find and utilize different styles, such as 'dark Sushi mix' for anime styles, to generate their own videos.


💡Replicate

Replicate is a platform for hosted machine learning models, including a tool for cloning voices and generating speech from text. It is showcased in the video as a means to create custom audio tracks for AI videos by uploading a voice sample and specifying the desired text.

💡Stable Diffusion XL Turbo

Stable Diffusion XL Turbo is an advancement in AI image generation models, offering real-time text-to-image generation. The video highlights it as a fast and efficient way to create images, which can be rapidly altered and regenerated based on user input.


AI videos are a hot trend in tech, with deep fakes and animated videos being a significant part of this movement.

The video provides a primer on how to familiarize oneself with the latest technologies for creating AI videos.

AnimateDiff, Stable Diffusion, ComfyUI, Deepfakes, and Runway are the key technologies discussed for AI video generation.

Stable Diffusion is an open-source project that forms the basis for both easy and hard methods of creating AI videos.

Runway is introduced as an easy-to-use service for creating AI videos without the need for local setup.

ComfyUI is a node-based editor used in conjunction with Stable Diffusion to refine images and parameters.

The process involves loading a video or set of images into the system to generate AI videos.

Checkpoints are used to style the type of images desired in the final AI video output.

Different models, such as Disney Pixar cartoon style, are available to achieve various artistic styles.

Civit AI offers pre-trained art styles for generating videos, including an anime style known as Dark Sushi Mix.

Runway's Gen 2 feature allows for video generation using text, images, or both.

Motion can be added to AI-generated images using Runway's tools, including a motion brush for selecting areas of animation.

Replicate offers a tool for generating speech from text and cloning voices from MP3 files.

Wav2Lip is a tool that synchronizes lip movements in videos with a provided voice sample.

Stable Diffusion XL Turbo is a recent advancement in real-time text-to-image generation.

ClipDrop is a sample website where users can experiment with Stable Diffusion XL Turbo's capabilities.

The video concludes with a recommendation of Runway for beginners due to its ease of use and creative potential.