I Ran Stable Diffusion 3 Prompts in Midjourney | SD3 vs. Midjourney Prompt Battle

Lexie AI
31 Mar 2024 · 03:50

TLDR: The video compares Stable Diffusion 3 and Midjourney, two AI art generation tools, across five prompts. The prompts range from a badass Elven Archer to a stack of animals, with Stable Diffusion 3 generally producing more accurate and detailed images. The video highlights the strengths and weaknesses of each tool, ultimately favoring Stable Diffusion 3 in the majority of the matchups, while also encouraging viewers to explore the potential of AI in art creation.

Takeaways

  • 🎨 The video compares Stable Diffusion 3 (SD3) and Midjourney (MJ), two AI art generation models.
  • 🚀 Stable Diffusion 3 is not yet broadly available to the public, but some people have early preview access.
  • 🤖 The video compares five different image prompts processed by SD3 and MJ, highlighting the strengths and weaknesses of each.
  • 🏹 In the 'Badass Elven Archer' face-off, SD3 missed a crucial detail (the bow), while MJ had a minor issue (an arrow going through the archer's thumb).
  • 🦙 The 'Llama Kid' prompt shows SD3's image as more accurate and appealing compared to MJ's less convincing desert setting and character.
  • 👮‍♂️ 'Alien Banana Cop' prompt reveals SD3's creative interpretation with a Xenomorph police officer enjoying bananas in Hawaii, whereas MJ's version lacked the 'cop' aspect.
  • 👩‍🎤 For the 'Let's Go Girl' prompt, SD3 produced a near-perfect anime-style girl, while MJ's text rendering was still poor despite an overall decent image.
  • 🎄 The final prompt, a stack of animals, showed SD3 struggling with the turtle, while MJ produced a humorous and memorable scene of a chicken on a dog on a turtle.
  • 🏆 Throughout the video, SD3 generally outperforms MJ, winning most of the matchups based on accuracy and creativity.
  • 📺 The video is part of a series exploring AI-generated art and encourages viewers to like and subscribe for more content.
  • 🎉 The host also promotes their '17 minutes of Sora vids', suggesting more AI-generated content is available and forthcoming.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is a comparison of the images produced by Stable Diffusion 3 (SD3) and Midjourney, two AI image generation models, from the same set of prompts.

  • How does the video begin?

    -The video begins with a dramatic introduction in which the creator says they ran Stable Diffusion 3 prompts in Midjourney and found the results mind-boggling.

  • What is unique about Stable Diffusion 3?

    -Stable Diffusion 3 is unique because it is not broadly available to the public, and the video creator has obtained prompts from someone with early preview access.

  • What is the first prompt comparison discussed in the video?

    -The first prompt comparison discussed in the video is of a badass Elven Archer, with a description of braided platinum hair, a rune-etched bow, glowing eyes, and aiming at a roaring Dragon.

  • What issue was found with Stable Diffusion 3's image of the Elven Archer?

    -In Stable Diffusion 3's image of the Elven Archer, the bow is missing, and one of the arrows appears to be the elf's middle finger.

  • Which model performed better for the Elven Archer prompt?

    -Midjourney version 6 performed better for the Elven Archer prompt. Despite the arrow going through the elf's thumb, it captured more elements of the prompt correctly than Stable Diffusion 3 did.

  • What is the second prompt comparison about?

    -The second prompt comparison is about a digital art picture of a child riding a llama with a bell on its tail through a desert.

  • What critique does the video offer for the Midjourney version of the 'llama kid' prompt?

    -The critique of the Midjourney version of the 'llama kid' prompt is that it is hard to tell whether the figure is a kid, and the setting does not really look like a desert. The bell is also considered out of place.

  • Which model won the 'alien banana cop' prompt comparison?

    -Stable Diffusion 3 won the 'alien banana cop' prompt comparison, as it created an image of a Xenomorph police officer enjoying bananas in Hawaii during the golden hour, despite the uncertainty of how terrifying aliens eat bananas.

  • What is the final prompt comparison in the video?

    -The final prompt comparison in the video is a humorous one, depicting a stack of animals with a rooster standing on a cat, which is standing on a dog, which is standing on a mule, which is standing on a turtle.

  • How does the video conclude?

    -The video concludes by highlighting the amusing result of the stack of animals prompt, particularly Midjourney's image of a chicken on a dog on a turtle, and by encouraging viewers to like and subscribe to the channel.

Outlines

00:00

🎨 Stable Diffusion 3 Art Comparison

The video compares the outputs of Stable Diffusion 3 and Midjourney, two AI art generation tools, across five prompts. The first is an Elven Ranger: Stable Diffusion 3's image is missing the bow and includes the humorous detail of a middle finger, while Midjourney's version has an arrow going through the elf's thumb. The second prompt involves a child riding a llama through a desert, with Stable Diffusion 3's image being more accurate to the prompt, while Midjourney's version lacks the desert setting. The third prompt is about an alien banana cop, where Stable Diffusion 3 creates a more accurate and humorous image, while Midjourney's version misses the 'cop' aspect. The fourth prompt is an anime girl, with Stable Diffusion 3 delivering a near-perfect image, and Midjourney improving but still lacking in text rendering. The final prompt humorously describes a stack of animals, with Stable Diffusion 3 having minor issues, but Midjourney creating a notably better image. The video ends with a call to action to like and subscribe to the channel.

Keywords

💡Stable Diffusion 3

Stable Diffusion 3 is an advanced AI image generation model that creates detailed and complex images from textual prompts. In the video, it is compared with another model, Midjourney, on the same set of prompts. The video showcases its results, highlighting its strengths in rendering specific details and adhering closely to the prompts given.

💡prompts

In the context of the video, prompts are the textual descriptions that are input into the AI models to generate specific images. They specify the subjects, details, and style that guide the AI in creating the visual content. The effectiveness of the AI models is judged based on how well they can interpret and fulfill these prompts.

💡Midjourney version 6

Midjourney version 6, often abbreviated as 'MJ', is the other AI model discussed in the video. It is compared with Stable Diffusion 3 to evaluate which model better executes the given prompts. The video critiques the performance of both models based on their ability to generate images that closely match the details specified in the prompts.

💡roaring Dragon

A 'roaring Dragon' is a mythical creature often depicted as powerful and intimidating. In the video, it is part of the prompt for the 'Elven Ranger' image, where the Elven Ranger is described as aiming at a roaring Dragon. This concept is used to challenge the AI models to create a dynamic and action-packed scene.

💡rune etched bow

A 'rune etched bow' refers to a bow that has mystical symbols or inscriptions carved into it, often associated with magical or enchanted weapons in fantasy settings. In the video, it is a specific detail from the prompt for the 'Elven Ranger' image, indicating the level of detail and fantasy elements that the AI models are expected to incorporate into their generated images.

💡digital art

Digital art refers to the creation of artistic compositions or designs using digital technology or computer software. In the video, the term is used to describe the type of output that the AI models are generating, which are digital images based on textual prompts.

💡Xenomorph police officer

A 'Xenomorph police officer' is a fictional and creative concept that combines the idea of an extraterrestrial creature from the 'Alien' franchise with the role of a law enforcement officer. In the video, this concept is used as a prompt to challenge the AI models to generate an image that combines these two distinct elements in a humorous and imaginative way.

💡anime style

Anime style refers to a form of animation and art that originates from Japan, characterized by colorful artwork, fantastical themes, and vibrant characters. In the video, the term is used to describe the visual aesthetic that the AI models are tasked with replicating in their generated images.

💡speech bubble

A speech bubble is a graphic element often used in comics, cartoons, and other forms of visual storytelling to indicate spoken words or thoughts of a character. In the video, it is a specific detail included in the prompt for the 'let's go girl' image, demonstrating the AI models' ability to incorporate narrative elements into their visual outputs.

💡stack of animals

The term 'stack of animals' refers to a humorous and whimsical concept of different animals piled on top of each other, which is used as a prompt in the video. It challenges the AI models to create a visually amusing and unconventional scene that combines elements of fantasy and surrealism.

💡golden hour

Golden hour refers to a period shortly after sunrise or before sunset when the sunlight is softer, warmer, and often considered ideal for photography and other visual arts. In the video, it is part of the prompt for the 'alien banana cop' image, indicating the desired lighting and mood for the generated scene.

Highlights

Introduction to Stable Diffusion 3 and its early preview access.

Comparison of Stable Diffusion 3 and Midjourney version 6 in image generation.

Description of the first prompt featuring an Elven Ranger with a unique detail.

Evaluation of the results for the Elven Ranger prompt, with a humorous observation about the bow.

Discussion of the second prompt involving a child riding a llama through a desert.

Critique of the placement of the bell in the desert scene and the geographical accuracy.

Presentation of the third prompt with an alien banana cop in a surreal setting.

Mention of the perplexing way aliens might eat bananas in the generated image.

Analysis of the fourth prompt, which depicts an anime style girl with a live stage setting.

Comment on the improvement in text generation by Midjourney, despite some shortcomings.

Description of the final prompt involving a stack of animals, adding humor to the discussion.

Observation of Stable Diffusion 3's difficulty with the stack of animals image.

Praise for Midjourney's creative interpretation of the stack of animals prompt.

Invitation to explore more generated content from OpenAI and anticipation for future developments.

Closing remarks with a call to like and subscribe for updates on similar content.