SD3 Medium Base Model in ComfyUI: Not as Good as Expected – Better to Wait for Fine-Tuned Versions
TLDR: Stability AI's SD3, initially met with high expectations, has faced setbacks including leadership changes and financial struggles. Despite these, it was released as scheduled, featuring photorealism and improved text generation capabilities. The SD3 Medium model, which requires separate CLIP text-encoder downloads, shows promise but has flaws, particularly in generating human figures. The community awaits fine-tuned versions for better performance, with the future hinging on third-party model adoption.
Takeaways
- 😀 Stability AI announced the release of SD3, a new major version expected to be widely used.
- 😔 The company faced leadership changes and financial difficulties, leading to concerns about SD3's future.
- 📅 SD3 was officially open-sourced and released on June 12th, as scheduled.
- 🖼️ SD3 showcases excellent photorealistic effects and adherence to complex prompts.
- 📝 Improvements in text generation are evident, with no artifacts or spelling errors in the examples provided.
- 🔧 The new architecture used by SD3 is the Multimodal Diffusion Transformer (MMDiT), which contributes to its advantages.
- 🔗 The official recommendation for using SD3 is through ComfyUI.
- 📚 Three model files were released, with the smallest being SD3 Medium at 4.34 GB, requiring separate CLIP downloads for ComfyUI.
- 💻 Users need to upgrade ComfyUI to the latest version for SD3 support.
- 🐑 SD3 demonstrates good understanding of spatial relationships and text prompts, as shown in the sheep with a yellow hat example.
- 😞 However, SD3 has flaws, particularly in generating human figures, which has been a point of complaint.
- 🔮 The future of SD3 depends on the adoption of third-party models and tools like ControlNet, with the hope for fine-tuned versions soon.
Q & A
What was the initial anticipation for SD3 based on previous versions?
-The anticipation was that SD3 would be another widely used major version, following in the footsteps of SD 1.5 and SDXL.
What challenges did Stability AI face leading up to the release of SD3?
-Stability AI faced several challenges, including the resignation of founder and CEO Emad Mostaque, the departure of the core research team, and funding difficulties stemming from its free, open-source business model, which put the company's financial situation in jeopardy.
When was SD3 officially released by Stability AI?
-SD3 was officially released by Stability AI on June 12th.
What are the notable capabilities showcased in the initial images of SD3?
-The initial images of SD3 showcased its excellent photorealistic effect, adherence to complex prompts involving spatial relationships, compositional elements, actions, and styles, and an evident improvement in text generation without artifacts or spelling errors.
What is the Multimodal Diffusion Transformer (MMDiT) and why is it significant for SD3?
-The Multimodal Diffusion Transformer (MMDiT) is the new architecture used by SD3. It is significant because it underlies the advantages described above, such as photorealism and prompt adherence.
How many model files were released for SD3 and what are their sizes?
-Three model files were released for SD3: sd3_medium at 4.34 GB, sd3_medium_incl_clips at 5.97 GB, and the largest package, sd3_medium_incl_clips_t5xxlfp8, which additionally bundles the T5-XXL (fp8) text encoder.
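For readers who prefer fetching these files from the command line rather than a browser, below is a minimal Python sketch using the huggingface_hub library. The repository id, file names, and the ComfyUI folder layout are assumptions based on the standard Hugging Face release and a default ComfyUI install (the repo is gated, so a logged-in token is needed); adjust them to your setup.

```python
# Minimal sketch: download SD3 Medium plus the separate text encoders
# into a default ComfyUI install. Repo id, file names, and target folders
# are assumptions -- verify them against your own setup.
from huggingface_hub import hf_hub_download

REPO = "stabilityai/stable-diffusion-3-medium"  # assumed (gated) repo id
COMFY = "ComfyUI"                               # assumed ComfyUI root directory

# The 4.34 GB checkpoint that ships without bundled text encoders.
hf_hub_download(REPO, "sd3_medium.safetensors",
                local_dir=f"{COMFY}/models/checkpoints")

# The three text encoders ComfyUI loads separately (clip_l, clip_g, T5-XXL fp8).
for name in ("clip_l.safetensors", "clip_g.safetensors",
             "t5xxl_fp8_e4m3fn.safetensors"):
    hf_hub_download(REPO, f"text_encoders/{name}",
                    local_dir=f"{COMFY}/models/clip")
```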
What is the recommended software to use with the released SD3 model?
-The official recommendation for using the released SD3 model is ComfyUI.
What are the hardware requirements for using SD3 in ComfyUI?
-Users should ensure they have a graphics card with sufficient VRAM to handle the model and the text encoders; peak usage was observed at around 15.2 GB, roughly the size of the model plus the three text encoders.
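Since the observed peak is about 15.2 GB with all three text encoders loaded, a quick check like the following can help decide whether to enable the T5 encoder. This is a hedged sketch using PyTorch; the 15 GB threshold is only a rule of thumb taken from the observation above, not an official requirement.

```python
# Rough VRAM check before picking an SD3 workflow variant.
# The ~15 GB threshold is a rule of thumb, not an official requirement.
import torch

def recommend_sd3_setup(device: int = 0) -> str:
    if not torch.cuda.is_available():
        return "No CUDA GPU detected; expect very slow CPU-only generation."
    total_gb = torch.cuda.get_device_properties(device).total_memory / 1024**3
    if total_gb >= 15:
        return f"{total_gb:.1f} GB VRAM: model + clip_l + clip_g + T5 should fit."
    return f"{total_gb:.1f} GB VRAM: consider skipping T5 (clip_l + clip_g only)."

print(recommend_sd3_setup())
```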
What was the outcome when testing SD3's text generation ability with a specific prompt?
-When testing SD3's text generation ability with the prompt of a sheep with a yellow hat that says 'Mimi', SD3 correctly wrote the nickname on the hat, demonstrating its text generation capability.
What are some of the performance issues reported with SD3?
-There have been complaints about SD3's poor performance in generating human figures, with the results being described as scary or broken, even with different seeds.
What is the future outlook for SD3 and what factors will influence it?
-The future of SD3 depends on the adoption and speed of third-party models and fine-tunes, and on support from tools like LoRA and ControlNet.
Why can't the Pony series model author adapt to SD3?
-The author of the Pony series models, AstraliteHeart, confirmed that they cannot adapt them to SD3 due to license issues.
Outlines
🚀 Launch of Stability AI's SD3 Model
Stability AI faced significant setbacks, including the resignation of its CEO and core research team, which led to funding issues and doubts about whether SD3 would ship. The company nevertheless launched SD3 on June 12th as promised. The new model offers enhanced photorealism, improved prompt adherence, and better text generation. It is built on a Multimodal Diffusion Transformer (MMDiT) architecture, which is credited for these advances. The video walks viewers through downloading and installing SD3 in ComfyUI, compares its image quality to Midjourney, and discusses hardware requirements. It also covers the different model files available, the smallest being SD3 Medium at 4.34 GB, which requires separate CLIP text-encoder downloads for use in ComfyUI.
🐑 Testing SD3's Features and Limitations
The video continues with a hands-on demonstration of SD3's capabilities, including its text generation feature, which successfully produced an image of a sheep with a hat labeled 'Mimi'. It also highlights SD3's limitations, particularly in generating human figures, where results remained unsatisfactory even after multiple attempts with different seeds. The script touches on the workflow's added complexity, such as the handling of negative prompts and the CLIP Text Encode setup with SD3's three separate prompt fields. Despite these flaws, the video expresses hope for fine-tuned versions of SD3 and stresses the importance of third-party models, LoRA, and ControlNet for its future. It closes with a note of caution about licensing issues that prevent certain models, such as the Pony series, from being adapted to SD3.
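For readers curious what the three separate prompt fields look like outside the graph editor, here is a hedged sketch of the prompt-encoding portion of an SD3 workflow expressed in ComfyUI's API (JSON) format as a Python dict. The node class names and input keys (TripleCLIPLoader, CLIPTextEncodeSD3, CLIPTextEncode) follow the stock SD3 example workflow and may differ across ComfyUI versions; the file names and prompts are placeholders.

```python
# Hedged sketch of the prompt-encoding part of an SD3 workflow in
# ComfyUI's API (JSON) format. Node class names, input keys, and file
# names follow the stock SD3 example workflow and may vary by version.
prompt_nodes = {
    # Load the three text encoders that sd3_medium.safetensors does not bundle.
    "1": {"class_type": "TripleCLIPLoader",
          "inputs": {"clip_name1": "clip_l.safetensors",
                     "clip_name2": "clip_g.safetensors",
                     "clip_name3": "t5xxl_fp8_e4m3fn.safetensors"}},
    # Positive prompt: the SD3 encode node exposes one text field per encoder.
    "2": {"class_type": "CLIPTextEncodeSD3",
          "inputs": {"clip": ["1", 0],
                     "clip_l": "a sheep wearing a yellow hat that says 'Mimi'",
                     "clip_g": "a sheep wearing a yellow hat that says 'Mimi'",
                     "t5xxl": "a sheep wearing a yellow hat that says 'Mimi'",
                     "empty_padding": "none"}},
    # Negative prompt: a plain CLIP Text Encode node is used here.
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["1", 0], "text": "blurry, low quality, artifacts"}},
}
```

The conditioning outputs of nodes 2 and 3 would then feed the sampler's positive and negative inputs, as in the default SD3 workflow graph.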
Keywords
💡SD3
💡ComfyUI
💡Photorealistic effect
💡Prompt adherence
💡Text generation
💡Multimodal Diffusion Transformer (MMDiT)
💡Checkpoints
💡Hardware requirements
💡Fine-tuned versions
💡Third-party models
💡License issues
Highlights
Stability AI announced the release of SD3 in February, expected to be a major version like SD 1.5 and SDXL.
The company faced leadership and team changes, with the CEO stepping down and the core research team resigning.
Funding difficulties arose due to a free open-source business model, putting the company's financial situation in jeopardy.
SD3 was officially released on June 12th as scheduled, despite the challenges.
The SD3 Medium model showcases excellent photorealistic effects, with fully photo-level realism.
The model demonstrates prompt adherence, understanding complex prompts involving spatial relationships and compositional elements.
Text generation in SD3 has improved, with no artifacts or spelling errors in generated text.
The new architecture used by SD3 is the Multimodal Diffusion Transformer (MMDiT), responsible for its advantages.
ComfyUI is the officially recommended tool for running the SD3 model.
Three checkpoints of the SD3 model were released, the smallest being 4.34 GB and requiring separate CLIP text-encoder downloads.
The largest model includes all necessary components, making it the 'Supreme full package.'
ComfyUI was updated to support SD3, and users are advised to upgrade for compatibility.
The SD3 model's VRAM usage peaks at around 15.2 GB, roughly the size of the model plus the three text encoders.
Users with low-VRAM graphics cards are advised not to enable the T5 encoder for optimal performance.
SD3 has been criticized for its poor performance in generating human figures.
The model shows an understanding of spatial relationships and prompts, as demonstrated in image generation.
SD3's future depends on the adoption and speed of third-party models and of tools such as LoRA and ControlNet.
Due to license issues, the author of the Pony series models, AstraliteHeart, confirmed they cannot adapt them to SD3.