This new Open Source Model is better than Midjourney or SD3?! | Flux local ComfyUI Install Guide

Endangered AI
3 Aug 2024 · 16:30

TLDR: The video discusses the emergence of the Flux model by Black Forest Labs, an open-source alternative to Midjourney and Stable Diffusion 3. It compares Flux with other models, highlighting its image quality, particularly when rendering human hands and faces. The tutorial walks viewers through installing Flux in ComfyUI, covering which models to download and how to set up the workflow. The video also shows the model's output with different prompts and settings, and argues that Flux has strong potential in the open-source AI image generation community.

Takeaways

  • 🌐 The release of Black Forest Labs' Flux 1.0 model has been seen as a significant advancement in open-source image generation, rivaling Midjourney and SD3.
  • 📈 Black Forest Labs offers three versions of the Flux model: a non-commercial Dev model, a commercial-ready Schnell model, and a closed-source version accessible via API.
  • 🔍 The Flux model has been praised for its improved capabilities, particularly in generating more realistic and correctly proportioned images compared to previous models.
  • 🚀 The Schnell model, described in the video as lightning-based, stands out because changing the number of steps produces substantially different images rather than simply refining the same one.
  • 🔑 The Dev model, despite being non-commercial, shows great potential and is considered by some to be superior to the commercial Schnell model.
  • 📝 The video provides a detailed guide on how to install and run the Flux model in ComfyUI, including the steps needed to set up the model and text encoder.
  • 🎨 Comparisons between Flux and other models like AuraFlow and Kolors show Flux's superior performance in generating high-quality images with correct hand depictions.
  • 🔧 The video transcript includes a workflow setup for testing different models with the same prompt to compare their outputs and capabilities.
  • 👍 The Flux model's text encoding capabilities are highlighted, with successful rendering of text within generated images, showcasing the model's understanding of context.
  • 🔄 The script discusses the potential for Flux to integrate with tools like ControlNet and the anticipation for future improvements in AI-generated image quality.

Q & A

  • What is the significance of the newly released open-source model by Black Forest Labs?

    -The newly released open-source model by Black Forest Labs, known as Flux 1.0, is significant because it is being hailed as a superior alternative to Stable Diffusion 3. It addresses issues that were prevalent in previous models and is noted for its high-quality image generation capabilities.

  • What are the different versions of the Flux model released by Black Forest Labs?

    -Black Forest Labs has released three versions of the Flux model: the Dev model, which is non-commercial but can be licensed for use; the Schnell model, which is a commercial-ready, lightning-based model; and a closed-source version available via their API.

  • What is the main issue that the Flux model seems to solve compared to Stable Diffusion 3?

    -The Flux model notably solves the problem that Stable Diffusion 3 had with generating images of women on grass, which was a common issue in the previous model's outputs.

  • How does the installation process of the Flux model in ComfyUI differ from other models?

    -The installation process differs in that the model files are placed in the 'unet' folder instead of the 'checkpoints' or 'stable diffusion' folder. Additionally, the text encoder has to be loaded separately via the DualCLIPLoader node.

  • What are the recommended steps to set up the Flux model in ComfyUI?

    -To set up the Flux model in ComfyUI, one should download the model from Black Forest Labs' Hugging Face page, place the downloaded files in the appropriate folders within the ComfyUI models directory, and follow the example workflow provided on ComfyUI's GitHub page.

  • Why is the T5 XXL model used in conjunction with the Flux model?

    -The T5 XXL model serves as the text encoder for Flux, which the model needs in order to interpret text prompts and generate images from them. In ComfyUI it is loaded through a CLIP loader node. It is the same text encoder used with Stable Diffusion 3.

  • What are the differences between the Flux model and the AuraFlow model in terms of image generation?

    -The Flux model is noted for its ability to produce better quality images with correct human proportions and fewer issues with details like fingers. It also has a higher success rate in generating accurate images based on text prompts compared to the AuraFlow model.

  • How does the Flux model handle the number of steps in image generation?

    -The Flux model, particularly the Schnell version, shows significant changes in the image output when the number of steps is adjusted. This is different from other models where the number of steps typically refines the image without substantial changes.

  • What are the limitations of the Dev model of Flux in terms of commercial use?

    -The Dev model of Flux, while impressive in its image generation capabilities, is non-commercial and does not have commercial terms available. This limits its use for monetization, which could be a drawback for community members looking to build on top of it.

  • How does the Flux model compare to other open-source models in terms of development and potential?

    -The Flux model is considered to be a step ahead of other open-source models like Kolors and AuraFlow in terms of image quality and capabilities. It represents a significant advancement in the field and has the potential to drive further development in the same way that competition in the large language model space has accelerated progress.

Outlines

00:00

🎪 Open Source Image Generation Models: Flux and Beyond

The paragraph discusses recent developments in open-source image generation models, highlighting the release of the AuraFlow model as a significant advancement. It contrasts this with the perceived shortcomings of Stable Diffusion 3. The emergence of Black Forest Labs, a company formed by the former SDXL team, is noted, along with their release of Flux 1.0. The paragraph also covers the different versions of the model offered by Black Forest Labs, including the non-commercial Dev model, the commercial-ready Schnell model, and a closed-source version. The author expresses disappointment that the Dev model lacks commercial terms but acknowledges the impressive capabilities of both the Dev and Schnell models, particularly in addressing issues with generating images of women on grass.

05:00

🛠️ Setting Up Flux Models in ComfyUI

This paragraph provides a step-by-step guide on how to set up the Flux models in ComfyUI. It begins with instructions on downloading the model from Black Forest Labs' Hugging Face page, selecting either the Dev or Schnell model. The process involves downloading specific files and placing them in the correct folders within the ComfyUI models directory. The paragraph also explains that the T5 XXL text encoder must be downloaded separately, because Flux uses a non-traditional model loader that does not bundle it. It details the workflow setup within ComfyUI, including the configuration of nodes such as the sampler, guider, and sigmas. The author assures viewers that despite the initial complexity, the setup is quite straightforward and that the ComfyUI team has provided example pages and files to facilitate the process.
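The file placement described above can be sketched as a small Python helper. This is only an illustration under stated assumptions: the file names are hypothetical stand-ins for whatever was actually downloaded, `clip_l.safetensors` and `ae.sft` are not named in the video summary, and the `vae` folder is an assumption. Only the `unet` folder for the Flux weights and a separately loaded text encoder come from the video.

```python
from pathlib import Path

# Hypothetical file names mapped to sub-folders of <ComfyUI>/models.
# "unet" for the Flux weights follows the video; the other entries are
# assumptions and should be checked against the ComfyUI example page.
DESTINATIONS = {
    "flux1-dev.sft": "unet",
    "flux1-schnell.sft": "unet",
    "t5xxl_fp16.safetensors": "clip",
    "clip_l.safetensors": "clip",  # assumption: second text encoder
    "ae.sft": "vae",               # assumption: autoencoder file
}


def plan_install(comfy_root, downloaded):
    """Map each downloaded file to its target path under <comfy_root>/models."""
    models_dir = Path(comfy_root) / "models"
    plan = {}
    for name in downloaded:
        file_name = Path(name).name
        folder = DESTINATIONS.get(file_name)
        if folder is None:
            raise ValueError(f"No known destination for {name!r}")
        plan[name] = models_dir / folder / file_name
    return plan


if __name__ == "__main__":
    files = ["flux1-schnell.sft", "t5xxl_fp16.safetensors"]
    for src, dst in plan_install("ComfyUI", files).items():
        print(f"{src} -> {dst}")
```

The helper only computes target paths and moves nothing, so the plan can be reviewed before any multi-gigabyte file is copied.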

10:02

🖌️ Comparing Image Generation Outputs Across Models

The paragraph focuses on the comparison of image generation outputs from different models, including Flux, AuraFlow, and others like Kolors. The author runs the same prompt and seed across these models to evaluate their performance, particularly in rendering human features like hands and faces. The Flux model is noted for its superior ability to produce realistic hands and faces, with a high success rate. The paragraph also touches on the variations in art style between different versions of AuraFlow and the potential for future improvements. The author expresses a preference for the aesthetics of AuraFlow 0.1 over 0.2, despite the latter being a more recent version. The comparison serves to highlight the strengths and weaknesses of each model and the potential for further development in the field.
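The comparison method, holding the prompt and seed fixed while varying only the model, can be sketched in Python as follows. The function name, model labels, and step counts are hypothetical and chosen only for illustration. The sketch builds the list of runs and does not generate any images.

```python
from itertools import product


def comparison_jobs(models, prompt, seed, step_counts=(4,)):
    """Build one job per (model, step count) pair with a shared prompt and seed."""
    return [
        {"model": model, "prompt": prompt, "seed": seed, "steps": steps}
        for model, steps in product(models, step_counts)
    ]


if __name__ == "__main__":
    # Model labels are placeholders, not file names from the video.
    jobs = comparison_jobs(
        models=["flux-schnell", "auraflow-0.2", "kolors"],
        prompt="a female knight resting in a meadow",
        seed=1234,
        step_counts=(4, 8),
    )
    for job in jobs:
        print(job)
```

Keeping the seed constant means differences between outputs can be attributed to the model or the step count rather than to the starting noise.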

15:02

🏛️ Exploring Realism and Text Encoding in Flux Models

In this paragraph, the author explores the realism and text encoding capabilities of the Flux models by experimenting with different prompts and settings. The results from Flux are described as highly impressive, with accurate human proportions, clear text, and good facial details. The paragraph also discusses the model's ability to handle various prompts, including a female knight and a female pirate, and how the model's output changes with different seeds and settings. The author notes the model's potential when combined with tools like ControlNet and expresses excitement for future developments in the open-source image generation community. The paragraph concludes with a reflection on the rapid pace of development in the field and the challenge for content creators to keep up with these advancements.

Keywords

💡Open Source Image Generation Models

These are AI models used to generate images based on text prompts and are freely available for the public to use and modify. The video discusses the evolution and competition among such models, particularly in the wake of dissatisfaction with Stable Diffusion 3. The mention of models like AuraFlow and Flux 1.0 highlights the active development and innovation in this space.

💡Stable Diffusion 3

Stable Diffusion 3 is a version of an image generation model developed by Stability AI. The video critiques this model for its perceived shortcomings, particularly in handling certain image elements like 'putting women on grass,' and contrasts it with newer models like Flux 1.0, which are seen as more advanced.

💡Flux 1.0

Flux 1.0 is an image generation model developed by Black Forest Labs, a company formed by the former SDXL team. It is presented as a significant advancement over other open-source models, with better handling of complex image elements such as human fingers and more refined outputs overall. The video emphasizes Flux 1.0's superiority and its commercial and non-commercial versions.

💡AuraFlow Model

The AuraFlow model is another open-source image generation model that was released as an alternative to Stable Diffusion 3. It is discussed in the video as a significant improvement over its predecessors, particularly before the release of Flux 1.0. The video compares it with other models, highlighting its strengths and weaknesses.

💡Black Forest Labs

Black Forest Labs is the company behind the Flux 1.0 model. Comprising the former SDXL team, the company is portrayed as a major player in the open-source image generation community, despite having some commercial interests. The video highlights Black Forest Labs' commitment to the open-source community through the release of different versions of the Flux model.

💡ComfyUI

ComfyUI is a user interface framework that allows users to interact with and utilize various AI models, including those for image generation. The video explains how to install and use the Flux models within ComfyUI, offering a step-by-step guide for setting up the necessary files and configurations.

💡Schnell Model

The Schnell Model is a commercial version of the Flux 1.0 model, designed for ready use in commercial projects. The video contrasts it with the non-commercial 'Dev' model, noting that while the Schnell Model is impressive, it doesn't reach the same level of refinement as the Dev Model.

💡Safetensors Files (.sft)

This is the file format used to store the models and their associated data. Users need to download these files and place them in specific folders to use the models within ComfyUI. The video guides users on how to handle these files when setting up the Flux models.

💡Text Encoder

A text encoder is a component of AI models that converts text prompts into a format that the model can understand and use to generate images. The video discusses the importance of using the correct text encoder, such as T5 XXL, to ensure optimal performance when generating images with Flux and other models.

💡SamplerCustomAdvanced

This is a specific sampler node used within the image generation workflow in ComfyUI. It plays a crucial role in defining how the model generates images based on the input parameters, including the model, conditioning, and noise. The video explains how to set up and use this sampler when working with the Flux models.
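How these nodes connect can be sketched as a Python dictionary in the style of a ComfyUI API-format graph. Treat this as an assumption-heavy illustration, not the workflow from the video: the node class names and input names reflect my understanding of the ComfyUI Flux example and should be checked against that example, while the file names, the step count, and the VAE loader are hypothetical additions.

```python
def build_flux_graph(prompt, seed=0, steps=4, width=1024, height=1024):
    """Return a node graph; a link is written as [source_node_id, output_index]."""
    # Assumption: class and input names match the installed ComfyUI version.
    return {
        "unet": {"class_type": "UNETLoader",
                 "inputs": {"unet_name": "flux1-schnell.sft",  # hypothetical file
                            "weight_dtype": "default"}},
        "clip": {"class_type": "DualCLIPLoader",
                 "inputs": {"clip_name1": "t5xxl_fp16.safetensors",  # hypothetical
                            "clip_name2": "clip_l.safetensors",      # hypothetical
                            "type": "flux"}},
        "vae": {"class_type": "VAELoader",
                "inputs": {"vae_name": "ae.sft"}},  # assumption, not in the summary
        "prompt": {"class_type": "CLIPTextEncode",
                   "inputs": {"text": prompt, "clip": ["clip", 0]}},
        "guider": {"class_type": "BasicGuider",
                   "inputs": {"model": ["unet", 0],
                              "conditioning": ["prompt", 0]}},
        "noise": {"class_type": "RandomNoise",
                  "inputs": {"noise_seed": seed}},
        "sampler": {"class_type": "KSamplerSelect",
                    "inputs": {"sampler_name": "euler"}},
        "sigmas": {"class_type": "BasicScheduler",
                   "inputs": {"model": ["unet", 0], "scheduler": "simple",
                              "steps": steps, "denoise": 1.0}},
        "latent": {"class_type": "EmptyLatentImage",
                   "inputs": {"width": width, "height": height,
                              "batch_size": 1}},
        "sample": {"class_type": "SamplerCustomAdvanced",
                   "inputs": {"noise": ["noise", 0], "guider": ["guider", 0],
                              "sampler": ["sampler", 0], "sigmas": ["sigmas", 0],
                              "latent_image": ["latent", 0]}},
        "decode": {"class_type": "VAEDecode",
                   "inputs": {"samples": ["sample", 0], "vae": ["vae", 0]}},
        "save": {"class_type": "SaveImage",
                 "inputs": {"images": ["decode", 0],
                            "filename_prefix": "flux"}},
    }


def dangling_links(graph):
    """Return (node_id, input_name) pairs whose link targets a missing node."""
    missing = []
    for node_id, node in graph.items():
        for input_name, value in node["inputs"].items():
            is_link = isinstance(value, list) and len(value) == 2
            if is_link and value[0] not in graph:
                missing.append((node_id, input_name))
    return missing


if __name__ == "__main__":
    graph = build_flux_graph("a female pirate on a ship deck", seed=42)
    print(dangling_links(graph))
```

In this sketch the sampler node receives the model and conditioning indirectly through the guider, and the step count through the sigmas schedule, which is why those pieces appear as separate nodes instead of settings on a single sampler.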

Highlights

The release of the AuraFlow model has been seen as superior to Stable Diffusion 3.

Black Forest Labs, formed by the former SDXL team, released Flux 1.0, which is considered 'Next Level'.

Black Forest Labs offers three versions of the model: Dev (non-commercial), Schnell (commercial-ready), and a closed-source API version.

The Dev model, despite being non-commercial, is highly impressive and may surpass the capabilities of the Schnell model.

The Schnell model is ready for commercial use and comes with clear terms of use.

The model solves the issue of generating images with women on grass, which was a problem for Stable Diffusion 3.

Instructions are provided for installing the model in ComfyUI, including downloading the model files and setting up the workflow.

The T5 XXL text encoder is recommended for use with the Flux model, similar to its use with Stable Diffusion 3.

A detailed workflow is provided for setting up the Flux model in ComfyUI, including the use of custom nodes.

The Flux model is praised for its ability to generate images with correct human proportions and fewer issues with hands.

Comparisons between Flux, AuraFlow, and other models show Flux producing higher quality images.

The Dev model is noted for its exceptional quality, despite being non-commercial.

The Schnell model is shown to have unique capabilities, such as changing the number of steps leading to different image outcomes.

The Flux model is tested with various prompts, demonstrating its text encoding capabilities and the ability to handle different art styles.

The video concludes with the presenter's excitement for the future of open-source image generation models and the potential for rapid development.