SECRET FREE Stable Diffusion 2.1 TRICK! As Good As MIDJOURNEY!

16 Dec 2022 · 17:55

TL;DR: The video provides a detailed guide on enhancing the capabilities of the Stable Diffusion 2.1 model to generate high-quality images rivaling those of Midjourney, using a technique called 'textual inversion embeddings'. The host apologizes for previous harsh comments about the model and demonstrates how adding specific keywords to prompts can drastically improve the output. Textual inversion embeddings are small files containing neural network data that can be downloaded or trained by users to customize the model's output. The video explains how to find, download, and apply these embeddings, and even how to train your own. Several recommended embeddings are listed, each producing unique styles when used in prompts. The host also shares a step-by-step process for training new embeddings, from pre-processing images to training and saving the final model. The video concludes with a positive outlook on the potential of Stable Diffusion 2.1, encouraging viewers to experiment with the technique.


  • πŸ“· **Photorealism Boost**: Stable Diffusion 2.1 can generate images comparable to MidJourney when specific prompts are used.
  • πŸ” **Simple Solution**: Adding 'mid-journey CGI underscore animation' to the prompt significantly improves the image quality.
  • βš™οΈ **Textual Inversion Embeddings**: Small files that contain trend data of a neural network part, enhancing the model's performance.
  • πŸ“š **Community Resources**: Textual inversion embeddings have been improved since the 2.0 model, offering better results than previous versions.
  • 🀝 **Combining Embeddings**: Multiple embeddings can be used together, allowing for a wide range of style combinations.
  • 🚫 **Compatibility Note**: Embeddings trained on 1.4 or 1.5 models do not work with 2.0 models and vice versa.
  • πŸ’Ύ **Downloading Embeddings**: Users can find and download embeddings from community model-sharing websites.
  • πŸ› οΈ **Training Your Own**: It's possible to train your own embeddings with a powerful graphics card and the right parameters.
  • πŸ” **Finding the Best**: Experimenting with different embeddings and checkpoints can lead to the best results for your needs.
  • 🌐 **Online Communities**: Users like Shadow X Shinigami and conflict X are known for creating high-quality embeddings.
  • 🎨 **Creative Freedom**: The ability to mix and match embeddings opens up a world of creative possibilities for image generation.

Q & A

  • What is the main subject of the video transcript?

    -The main subject of the video transcript is enhancing the capabilities of the Stable Diffusion 2.1 model using textual inversion embeddings to generate high-quality images similar to those produced by Midjourney.

  • How does adding embedding keywords like 'midjourney' and 'cgi_animation' to the prompt improve the image generated by Stable Diffusion 2.1?

    -These keywords trigger textual inversion embeddings that change the style of the generated image, bringing it closer to the aesthetic of Midjourney, which is known for its high-quality photorealistic images.

  • What are textual inversion embeddings?

    -Textual inversion embeddings are small files, usually a few kilobytes, that contain the trained data for a small part of a neural network. They allow users to influence the output of AI models like Stable Diffusion by adding specific keywords to the prompt.

  • Why are the embeddings trained on the 2.0 model considered better than those trained on the 1.4 or 1.5 model?

    -Embeddings are tied to the text encoder they were trained with: the 2.0 model uses a different text encoder (OpenCLIP) than the 1.4/1.5 models (OpenAI's CLIP), so embeddings are not interchangeable across versions. Embeddings trained on the 2.0 model therefore produce higher-quality results when used with 2.x checkpoints.

  • How can users find and download textual inversion embeddings?

    -Users can find and download textual inversion embeddings from community websites where members share models they have trained. Users can filter by model type to find textual inversion models specifically designed for Stable Diffusion 2.0.

  • What are the recommended steps to use a downloaded textual inversion embedding in Stable Diffusion?

    -After downloading the embedding files, users should paste them into the 'embeddings' folder within their Stable Diffusion web UI directory. Upon launching Stable Diffusion, the embeddings will be loaded and ready for use. To use an embedding, users add the corresponding keyword to their prompt and then generate a new image.
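
    The installation step above can be sketched in Python. This is a hedged sketch: the `embeddings` folder layout follows the AUTOMATIC1111 web UI convention described in the video, while the `install_embedding` helper and the file names are illustrative, not from the video.

    ```python
    from pathlib import Path
    import shutil
    import tempfile

    def install_embedding(downloaded_file: Path, webui_dir: Path) -> str:
        """Copy a textual inversion file into the web UI's 'embeddings'
        folder and return the prompt keyword (the file name stem)."""
        embeddings_dir = webui_dir / "embeddings"
        embeddings_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(downloaded_file, embeddings_dir / downloaded_file.name)
        # In the AUTOMATIC1111 web UI, the trigger keyword is simply the
        # file name without its extension.
        return downloaded_file.stem

    # Demo with a throwaway directory and a placeholder embedding file.
    tmp = Path(tempfile.mkdtemp())
    fake_embedding = tmp / "vikingpunk.pt"
    fake_embedding.write_bytes(b"\x00")

    keyword = install_embedding(fake_embedding, tmp / "stable-diffusion-webui")
    print(keyword)  # the keyword to add to the prompt
    ```

    After restarting (or reloading) the web UI, typing that keyword in the prompt activates the embedding.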

  • How can users train their own textual inversion embeddings?

    -Users can train their own embeddings directly from the Stable Diffusion interface by clicking on the 'train' tab, creating a new embedding, and following the steps to pre-process images, set training parameters, and initiate the training process. This requires a powerful graphics card and is similar in complexity to training a model using DreamBooth.

  • What are some of the recommended textual inversion embeddings mentioned in the transcript?

    -Some of the recommended embeddings include Midjourney, Anthro, Remix, CGI Animation, Knollingcase, Viking Punk, and V-Ray Render. These embeddings offer a variety of styles and effects that can be mixed and matched to generate unique images.

  • How does the process of training a textual inversion embedding differ from using DreamBooth?

    -Training a textual inversion embedding exposes more options and requires more precise settings and a deeper understanding of the process than using DreamBooth, which is why the video defers detailed instructions to a future video. DreamBooth is generally considered the more straightforward of the two.

  • What is the significance of using multiple embeddings at the same time?

    -Using multiple embeddings at the same time allows for a nearly infinite combination of styles, enabling users to create highly unique and varied images. This flexibility and customization are significant advantages of textual inversion embeddings.

  • How can the strength of a keyword be adjusted in the prompt to achieve different results?

    -The strength of a keyword can be adjusted by lowering its weight in the prompt to lessen its impact, or by stacking another keyword on top of the embeddings already in use. This allows fine-tuning of each embedding's influence on the final image.
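
    In the AUTOMATIC1111 web UI, keyword strength is commonly adjusted with parenthesis weighting. A hedged sketch of what such prompts might look like (the embedding keywords here are illustrative file names, not necessarily the exact ones from the video):

    ```
    a portrait of a warrior, midjourney, (vikingpunk:1.3)        <- stronger Viking Punk influence
    a portrait of a warrior, midjourney, (vikingpunk:0.6)        <- weaker Viking Punk influence
    a portrait of a warrior, midjourney, vikingpunk, vray-render <- stacking several embeddings
    ```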



πŸ“ˆ Enhancing Stable Diffusion 2.1 with Textual Inversion Embeddings

The speaker begins by apologizing for previous harsh comments about the Stable Diffusion 2.1 model and introduces a method to significantly improve its image generation capabilities. By adding embedding keywords such as 'midjourney' and 'cgi_animation' to the prompt, the model can produce images comparable to those of Midjourney. This improvement is attributed to 'textual inversion embeddings', small files containing the trained data for a small part of a neural network. These embeddings, when added to the prompt, allow for near-infinite combinations of styles, leading to highly customizable and detailed images. The community has found these embeddings to be particularly effective with the 2.0 model and later versions. The speaker also points to a community website as a resource for finding and downloading these embeddings.


πŸ–ΌοΈ Using and Training Textual Inversion Embeddings

The paragraph explains how to download and use textual inversion embeddings. It mentions a second community site as a source for these embeddings, highlighting two users known for their quality work in this area. Downloading an embedding involves cutting and pasting its files into the 'embeddings' folder of the Stable Diffusion web UI. To use an embedding, one must include its associated keyword in the prompt. The paragraph also covers training one's own embeddings using the Train tab in Stable Diffusion, noting the requirement of a powerful graphics card and the time commitment involved. It outlines the steps for creating a new embedding, preprocessing images, and the parameters to use for training, emphasizing the importance of matching the embedding to the correct version of the model.


πŸ› οΈ Training Process and Parameters for Textual Inversion Embeddings

This section details the steps for training textual inversion embeddings, including setting up the source and destination directories for image preprocessing, creating flipped copies of images, and using auto focal point crop for precise cropping. It also covers the use of BLIP for generating captions and the importance of these steps in enhancing the training process. The speaker provides guidance on selecting the embedding, learning rate, batch size, and gradient accumulation steps. They also explain how to use the dataset directory and set parameters for image logging during training. The process concludes with saving the trained embedding and using checkpoints to select the best training result.
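
As a conceptual aside, the core idea behind the training described above is that textual inversion optimizes only one new embedding vector while the entire network stays frozen. The toy NumPy sketch below illustrates that principle only; it is not the actual trainer, and the shapes, learning rate, and step count are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(size=(8, 4))   # frozen "model" weights, never updated
target = rng.normal(size=8)   # activations we want the new keyword to evoke
v = np.zeros(4)               # the new embedding vector being trained

learning_rate = 0.01
for step in range(500):
    pred = W @ v                        # forward pass through the frozen model
    grad = 2 * W.T @ (pred - target)    # gradient w.r.t. the embedding only
    v -= learning_rate * grad           # batching/gradient accumulation omitted

loss = float(np.sum((W @ v - target) ** 2))
print(round(loss, 4))
```

The real process differs in scale (a diffusion model, image batches, checkpoint saving), but the asymmetry is the same: the few-kilobyte embedding is the only thing that learns, which is why the resulting files are so small.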


🎨 Exploring Various Embeddings and Their Applications

The speaker shares their personal selection of the best embeddings they have used, which significantly enhance the capabilities of the Stable Diffusion 2.1 model. These include the Midjourney embedding for beautiful artistic results, the Anthro embedding for creating anthropomorphic animals, the Remix embedding for varied and interesting image outcomes, the CGI Animation embedding for Disney-like character enhancements, the Knollingcase embedding for generating images within a case or translucent box, the Viking Punk embedding for a Viking cyberpunk theme, and the V-Ray Render embedding for a 3D effect. The paragraph concludes with an appreciation for the support from the audience and an invitation to join a Discord server for participation in weekly challenges.



πŸ’‘Stable Diffusion 2.1

Stable Diffusion 2.1 is an AI model used for generating images from textual descriptions. In the video, it's initially criticized but later praised for its improved capabilities when used with specific techniques, becoming a central theme as the host demonstrates how to enhance its performance.

πŸ’‘Textual Inversion Embeddings

Textual Inversion Embeddings are small files containing training data of a neural network. They allow users to influence the style of image generation in AI models like Stable Diffusion 2.1. The video explains how these embeddings can significantly improve the output quality, making it competitive with other models like MidJourney.


πŸ’‘Midjourney

Midjourney is referenced as a high-quality image generation model. The video aims to show that, with certain tricks, Stable Diffusion 2.1 can produce images of similar quality to Midjourney, which is considered a benchmark for photorealistic image generation.

πŸ’‘Negative Prompts

Negative prompts are additional parameters added to the image generation prompt to exclude certain elements or styles from the output. In the video, they are used to refine the image generation process, creating more tailored results.

πŸ’‘Viking Punk 63

Viking Punk 63 is an example of a specific Textual Inversion Embedding used in the video. It represents a style or theme that, when added to the prompt, influences the AI to generate images in the Viking Punk aesthetic.

πŸ’‘A community model hub is mentioned in the video where users can find and download various models trained by the community, including Textual Inversion Embeddings. It serves as a resource for enhancing AI image generation capabilities.

πŸ’‘Another platform is referenced for finding Textual Inversion Embeddings. Although the video notes that inconsistent tagging can make specific embeddings harder to find there, it still provides valuable resources for AI model enhancement.

πŸ’‘Training Embeddings

Training Embeddings involves creating new Textual Inversion Embeddings by training the AI model with a specific set of images and styles. The video outlines the process of training these embeddings, which allows for the creation of personalized image generation styles.


πŸ’‘Dreambooth

Dreambooth is a method used to train AI models with a specific dataset to generate images in a particular style. While not the main focus of the video, it is mentioned as a comparison to the process of training Textual Inversion Embeddings.

πŸ’‘Anthropomorphic Animals

Anthropomorphic Animals is a style that gives human characteristics to animals. One of the embeddings mentioned in the video, 'Anthro', is used to generate images of such animals, showcasing the versatility of embeddings in creating specific thematic content.

πŸ’‘V-Ray Render

V-Ray Render is an embedding that adds a 3D rendering effect to the generated images, making them appear as if they are from a high-quality video game. It demonstrates the ability of embeddings to introduce advanced visual styles to AI-generated content.


An apology is offered for previous harsh criticism of the Stable Diffusion 2.1 model.

A simple method to significantly improve the 2.1 model's image generation quality is introduced.

Adding embedding keywords such as 'midjourney' and 'cgi_animation' to prompts enhances the image style dramatically.

Textual inversion embeddings are small files that contain the trained data for a small part of a neural network.

Embeddings work better with the 2.0 model compared to the older 1.4 or 1.5 models.

Multiple embeddings can be used simultaneously for infinite style combinations.

Embeddings can be downloaded from the community or trained personally.

A community model hub is recommended for finding and downloading trained models.

A second site is another source for embeddings, though inconsistent tagging can make them hard to find.

Specific users known for creating high-quality embeddings are mentioned.

A step-by-step guide on how to download and use embeddings with Stable Diffusion is provided.

The process of training one's own embeddings directly from Stable Diffusion is outlined.

A detailed explanation of the training parameters and process for creating an embedding is given.

The importance of using specific keywords for different embeddings is emphasized.

Several recommended embeddings are listed with their unique features and keywords.

The 'mid-journey' embedding produces high-quality images comparable to Midjourney's outputs.

The 'Viking Punk' and 'V-Ray Render' embeddings offer unique styles for image generation.

The video concludes by acknowledging the potential of Stable Diffusion 2.1 with the right embeddings.