InvokeAI 3.4 Release - LCM LoRAs, Multi-Image IP Adapter, SD1.5 High Res, and more

Invoke
22 Nov 2023 · 15:34

TLDR: The video discusses the release of version 3.4, highlighting new features such as the LCM scheduler for image generation, the return of the high-resolution fix, and the ability to use ControlNets and T2I adapters simultaneously. It also introduces multi-image IP adapters for blending concepts and notes contributions from the community, including language translations and bug fixes.

Takeaways

  • Introduction of LCM (Latent Consistency Model) support for optimizing the diffusion process with a new scheduler, reducing the steps needed to generate images.
  • Quality trade-off with LCM: while more efficient, there is a slight loss in image detail compared to non-LCM generated images.
  • The LCM LoRA can be downloaded from the latent-consistency Hugging Face repository and works with both SDXL and SD 1.5 models.
  • Changes to the CFG scale affect adherence to the prompt and image saturation, with recommended values staying in the lower ranges.
  • Return of the high-resolution fix feature for SD 1.5 models, enhancing image quality without repeating patterns.
  • The ControlNet and T2I adapter features are now compatible with each other, allowing for more complex and nuanced image generation.
  • Multi-image IP adapters are introduced for blending different concepts into a single image, enhancing creativity in image generation.
  • Smaller features include recallable VAE metadata, RGBA value fields in the color picker, and numerous bug fixes and translations.
  • Backend updates have improved the efficiency of various engine functions and reduced LoRA and text encoder loading times.
  • Future updates are teased, encouraging users to stay tuned and join the community on Discord.
  • The importance of community contributions is highlighted, with thanks given to various contributors for their work on the release.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is the release of version 3.4 and its new features, particularly focusing on the LCM technique and how it optimizes the diffusion process.

  • What does LCM stand for and what does it do?

    -LCM stands for Latent Consistency Model. It is a new technique used to optimize the diffusion process and make it more efficient, reducing the number of steps needed to generate an image.

  • What is the LCM scheduler and how does it affect image generation?

    -The LCM scheduler is a new component introduced in version 3.4 that works with the LCM technique. It helps to make the image generation process extremely efficient but may result in some loss of detail compared to non-LCM generated images.

  • How does the video demonstrate the difference in quality between regular generation and LCM generation?

    -The video demonstrates this by generating four images using a normal generation process and then four more images using the LCM technique with adjusted settings. The comparison shows that while LCM generation is faster, it may lose some of the detailed elements of the image.

  • What is the significance of the CFG scale in LCM?

    -With LCM, the CFG scale affects adherence to the prompt and the quality of the generated image. Higher values increase adherence but can lead to over-saturation and degraded quality, so it is recommended to stay in the lower ranges for optimal results.

  • What high-resolution fix feature is introduced in version 3.4?

    -The high-resolution fix feature allows for increasing the size of an original generation from the model's original training size of 512 x 512 to a larger width and height of the user's choice. It uses a two-step process involving upscaling and then denoising at the higher resolution.

  • How do the ControlNet feature and the T2I adapter feature work together in version 3.4?

    -In version 3.4, the ControlNet feature and the T2I adapter feature are no longer mutually exclusive, meaning they can be used simultaneously on the same generation. This allows for more versatility and control over the image generation process.

  • What is the T2I color adapter mentioned in the video?

    -The T2I color adapter is an adapter that guides the color of the generated image. It can be used to adjust the image's colors based on a threshold and a selected color, allowing for creative control over the final output.

  • What new nodes have been added to the workflow editor for advanced users?

    -The workflow editor now supports multi-image IP adapters, allowing users to add multiple IP adapters and blend different concepts together by adjusting the weights. This capability is referred to as "instant LoRAs" in the community.

  • How does the multi-image IP adapter feature work?

    -The multi-image IP adapter feature enables users to pass multiple images of the same concept into the same IP adapter. It averages the general gist of the concept and blends that into the image, allowing for a more nuanced and detailed generation process.

  • What are some other updates and improvements included in version 3.4?

    -Other updates in version 3.4 include speed increases for LoRA and text encoder loading times, backend updates that make certain engine functions more efficient, and new language translations, with Dutch, Italian, and Chinese being almost fully complete.

Outlines

00:00

Introduction to Release 3.4 and the LCM Scheduler

The video begins with an introduction to the delayed release of version 3.4 and an overview of the numerous features packed into this update. The first feature discussed is Latent Consistency Models (LCM), a new technique for optimizing the diffusion process using the LCM scheduler. This scheduler reduces the steps needed to generate an image, allowing for the efficient generation of the high-quality images seen recently around the internet. The video then demonstrates the quality of a model before and after applying the LCM scheduler, highlighting the trade-off between efficiency and detail loss. The process of generating four images with different settings is shown, emphasizing the impact of the CFG scale and the use of the LCM LoRA from the latent-consistency Hugging Face repo. The video also provides recommendations on the optimal CFG scale range for best results.
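
For readers who want to try the same recipe outside of the Invoke UI, the following is a minimal sketch using the diffusers library; the model ID, prompt, and exact step/CFG values are illustrative assumptions rather than settings shown in the video. It swaps in the LCM scheduler, loads the LCM LoRA from the latent-consistency Hugging Face repo, and keeps the step count and CFG scale low, as recommended.

```python
# Minimal sketch of LCM LoRA generation with diffusers (not InvokeAI's UI).
# Model ID and exact values are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline, LCMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swap the default scheduler for the LCM scheduler and attach the LCM LoRA.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# LCM needs far fewer steps and a low CFG scale (roughly 1-2).
image = pipe(
    "a portrait of a cyborg king, detailed, dramatic lighting",
    num_inference_steps=4,
    guidance_scale=1.5,
).images[0]
image.save("lcm_sample.png")
```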

05:01

High-Resolution Fix and New Features in 3.4

This paragraph covers the return of the high-resolution fix in version 3.4, a feature that allows for the upscaling of images from the original training size to a larger resolution. The video demonstrates the process with an example of a cyborg king image, showing the absence of the repeating patterns often seen in high-resolution generation. It is mentioned that the ControlNet feature and the T2I adapter feature are no longer mutually exclusive, meaning they can be used simultaneously for more versatile image generation. The paragraph also discusses the use of the T2I color adapter, showcasing its application on an image and its impact on color processing and resolution.
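
As a rough illustration of what a two-pass high-resolution fix does, here is a sketch with diffusers; the model ID, target size, and denoising strength are assumptions, and a plain Lanczos resize stands in for ESRGAN. The composition is generated at 512 x 512, upscaled, then denoised again at the larger size.

```python
# Rough two-pass "high-res fix" sketch with diffusers (not InvokeAI's own code);
# the model ID, target size, and denoising strength are illustrative assumptions.
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline, AutoPipelineForImage2Image

prompt = "a cyborg king on an ornate throne, intricate detail"
base = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Pass 1: build the core composition at the model's native 512 x 512.
low_res = base(prompt, width=512, height=512).images[0]

# Upscale the result (a plain Lanczos resize here; an ESRGAN upscaler could be
# swapped in instead of a straight resize).
upscaled = low_res.resize((1024, 1024), Image.LANCZOS)

# Pass 2: denoise at the higher resolution so detail is added without the
# repeating patterns a single large-canvas generation tends to produce.
img2img = AutoPipelineForImage2Image.from_pipe(base)
high_res = img2img(prompt, image=upscaled, strength=0.45).images[0]
high_res.save("highres_fix.png")
```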

10:03

Advanced Workflow Editor Features and Multi-Image IP Adapters

The video delves into the advanced features added to the workflow editor, highlighting the ability to use multiple image prompt (IP) adapters for blending different concepts. The community-named "instant LoRAs" technique is explained, demonstrating how passing multiple images of the same concept averages them so that they blend into the generated image. The video provides examples of blending two distinct concepts, showing the potential for creating unique and complex images. The paragraph also touches on the importance of adjusting the weight of each concept to achieve a desired outcome, whether it is a subtle blend or a dramatic shift in the image's focus.
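
Conceptually, the blending comes from the image encoder: each reference image is encoded, and the resulting embeddings are combined before conditioning the generation. The sketch below shows just that averaging step with a generic CLIP vision model from transformers; the encoder ID, the file names, and the simple mean are illustrative assumptions, not InvokeAI's exact implementation.

```python
# Conceptual sketch of the "average the general gist" step behind a multi-image
# IP adapter; a simplification, not InvokeAI's actual code path. The encoder ID
# and the reference file names are placeholders for illustration.
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

encoder_id = "openai/clip-vit-large-patch14"
processor = CLIPImageProcessor.from_pretrained(encoder_id)
encoder = CLIPVisionModelWithProjection.from_pretrained(encoder_id)

references = [Image.open(p) for p in ["spider_1.png", "spider_2.png", "yeti_1.png"]]
inputs = processor(images=references, return_tensors="pt")

with torch.no_grad():
    embeds = encoder(**inputs).image_embeds  # one embedding per reference image

# Averaging the per-image embeddings yields a single "blended concept" vector,
# which the IP adapter's projection layers then use to condition the generation.
blended = embeds.mean(dim=0, keepdim=True)
print(blended.shape)
```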

15:05

Community Contributions and Updates in 3.4

The final paragraph of the video script acknowledges the contributions of the community in the development of version 3.4. It mentions the addition of the VAE metadata recall feature, the expansion of the color picker with RGBA value fields, and the translation efforts that have led to almost fully complete translations of the InvokeAI app in Dutch, Italian, and Chinese. The video also notes other smaller features and improvements, such as speed increases for LoRA and text encoder loading times and more efficient backend updates for certain functions within the engine. The video ends with a call to action for viewers to join the community on Discord and stay tuned for future updates.

Keywords

LCM

LCM stands for Latent Consistency Model, a technique introduced in this release for optimizing the diffusion process. It is used together with a new scheduler called the LCM scheduler, which reduces the number of steps needed to generate an image, thereby increasing efficiency. However, this method may lose some detail compared to other generation processes. In the context of the video, the presenter demonstrates how LCM can speed up image generation while also discussing its trade-offs in terms of quality.

CFG scale

CFG scale refers to the classifier-free guidance scale, a parameter that controls how closely the generated image adheres to the input prompt. Adjusting the CFG scale changes the output, with higher values increasing adherence but also risking over-saturation and degraded quality. The video emphasizes the importance of finding a balance in the CFG scale for optimal results, and of keeping it in the lower ranges when using LCM.
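
For reference, the CFG scale is the guidance weight applied at each denoising step when combining the unconditional and prompt-conditioned noise predictions; a minimal sketch of that blend is shown below (function and variable names are illustrative).

```python
# Minimal sketch of how a CFG scale is applied at each denoising step
# (classifier-free guidance); names are illustrative.
import torch

def apply_cfg(noise_uncond: torch.Tensor, noise_cond: torch.Tensor, cfg_scale: float) -> torch.Tensor:
    # Start from the unconditional prediction and push it toward the
    # prompt-conditioned one; a larger cfg_scale means stronger prompt
    # adherence, at the cost of over-saturation when pushed too high.
    return noise_uncond + cfg_scale * (noise_cond - noise_uncond)
```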

High-resolution fix

The high-resolution fix is a feature that allows for the upscaling of images generated by the model. It works by first creating the core composition at a lower resolution and then enlarging it using ESRGAN or a straight resize, followed by a denoising pass at the higher resolution. This feature is particularly useful for achieving larger, detailed images without having to build complex workflows.

ControlNet

The ControlNet feature, as mentioned in the video, is a mechanism that guides generation with a control image and can now be used in conjunction with other features like the T2I adapter. Its weight and settings can be adjusted to achieve desired effects, such as reducing jagged edges or altering the style of the image, while keeping control over the overall composition.

T2I adapter

The T2I adapter (Text-to-Image adapter) is a lightweight control mechanism, similar in spirit to ControlNet, that conditions the generation on a control image such as a sketch, depth map, or color map. In this release it is no longer mutually exclusive with ControlNet, so both can influence the same generation, and the color variant can be used to steer the palette of the output.
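
As a concrete example of the kind of control image such an adapter consumes, the color variant is typically driven by a blocky color map of a reference image. One simple way to build such a map is sketched below; the 64-pixel tile size is an assumption for illustration.

```python
# Sketch of building a blocky color control map like the one a T2I color
# adapter consumes; the 64-pixel tile size is an assumption for illustration.
from PIL import Image

def color_control_map(image: Image.Image, tile: int = 64) -> Image.Image:
    w, h = image.size
    # Shrink so each tile collapses to its average color, then scale back up
    # with nearest-neighbour sampling to get a flat grid of color blocks.
    small = image.resize((max(1, w // tile), max(1, h // tile)), Image.BICUBIC)
    return small.resize((w, h), Image.NEAREST)

if __name__ == "__main__":
    color_control_map(Image.open("reference.png")).save("color_map.png")
```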

Workflow editor

The workflow editor is an advanced tool within the software that allows users to construct complex image generation processes by connecting different nodes and settings. It provides more control over the generation process and enables the use of multiple inputs and features, such as IP adapters, to create detailed and nuanced images.

Multi-image IP adapters

Multi-image IP adapters are a feature that allows users to input multiple images into a single IP adapter. This enables the blending of different concepts or styles into the generated image, creating a composite that represents an average of the input images. The feature is powerful for creating new and unique images by combining various visual elements.

Instant LoRAs

"Instant LoRAs" is a term used in the community to describe passing multiple images of the same concept into an IP adapter. The feature extracts the general idea from the input images and blends that concept into the final image, giving a more coherent representation of the input concept without training an actual LoRA.

VAE

VAE stands for Variational Autoencoder, the model component responsible for encoding images into and decoding them from the latent space used during generation. According to the updates mentioned in the video, the VAE used for a generation can now be recalled from the image's metadata.

Color Picker

The Color Picker is a tool within the unified canvas that allows users to select and input colors for use in their image generation process. The video highlights a new feature where the Color Picker now includes RGBA (Red, Green, Blue, Alpha) value fields, enhancing the precision and flexibility of color selection.

Community and Contributors

The community and contributors refer to the group of users and developers who actively participate in the improvement and translation of the software. The video acknowledges the significant contributions made by various individuals, including those who have helped with bug fixes, translations, and other enhancements.

Highlights

Introduction of LCM, a new technique for optimizing the diffusion process with a new scheduler called the LCM scheduler.

LCM scheduler reduces the number of steps needed to generate an image, leading to more efficient generation processes.

Quality loss is a trade-off when using the LCM scheduler, with some detail being lost in the generation process.

Demonstration of image generation using the LCM scheduler and the addition of the LCM LoRA, available from the latent-consistency Hugging Face repo.

Adjusting the CFG scale can change the adherence to the prompt and affect the quality of the generated images.

Recommendation to stay in the lower ranges of the CFG scale for optimal results.

Return of a simple high-resolution fix, allowing for larger images to be generated from the linear UI without complex workflows.

The ControlNet feature and the T2I adapter feature are now compatible for use simultaneously in the same generation.

Introduction of multi-image IP adapters in the workflow editor, enabling the blending of different concepts into a single image.

Instant LoRAs, a community-adopted term, refers to the ability to pass multiple images of the same concept into the same IP adapter for a blended result.

Demonstration of blending two distinct concepts (spiders and Yeti-like creatures) using multi-image IP adapters.

Adjusting the weight of concepts in the IP adapter can significantly alter the resulting image, allowing for fine-tuning of concept blending.

Mention of new contributors and their contributions, such as the recall of VAE metadata and the addition of RGBA value fields in the Color Picker.

Acknowledgment of the community and contributors for their work on translations, making the InvokeAI app available in multiple languages, including Dutch, Italian, and Chinese.

Upcoming updates for the InvokeAI app, including more speed increases and backend improvements for efficiency.

Invitation to join the Discord community for further engagement and updates.