How to Use Generative Audio | Runway Academy

Runway
8 May 2024 · 03:07

TLDR: Runway Academy's tutorial introduces generative audio, covering text-to-speech, custom voice models, and lip-sync video creation. Users type in text and select a voice to generate spoken audio, which is saved automatically, and can train a custom voice from a few minutes of clean audio. The tool also creates lip-sync videos from an image or video of a person, driven by text-to-speech, recorded, or uploaded audio. The tutorial closes with a tip for the video workflow and points to the Runway community on Discord for further learning.

Takeaways

  • 🎙️ Use Runway's generative audio tool to convert text into spoken audio.
  • 🔍 Preview and select from a list of default voices before generating audio.
  • ⏱️ Audio generation time varies based on script length but is typically quick.
  • 📂 Audio files are automatically saved in the 'generative audio' folder within the assets directory.
  • 🔄 Custom voice models can be trained with a few minutes of clean audio.
  • 📝 Ensure the audio used for training is as clear as possible for best results.
  • 🖼️ Create lip-sync videos with an image or video of a person, ensuring full face visibility.
  • 🎥 Lip-sync can be applied to text-to-speech, recorded, or uploaded audio.
  • 🔁 If the audio is longer than the video, the video reverses back to the beginning once it ends, so that it covers the remaining audio.
  • 🎨 For video workflows, avoid camera motion parameters and use subject motion with a motion brush for a smoother effect.
  • 📢 Join the Runway community on Discord for more resources and support.

Q & A

  • What is the generative audio tool in Runway?

    -The generative audio tool is a Runway feature that lets users convert text into spoken audio files, create custom voice models, and produce lip-sync videos.

  • How do you access the generative audio tool in Runway?

    -You can access the generative audio tool by clicking on it from the top of your Runway dashboard.

  • What is the process for generating spoken audio from text?

    -After typing in the text you want to convert, you can preview the default voices, choose one from the list, and then click the generate button. The generation time varies based on the script length.

  • Where are the audio generations saved by default in Runway?

    -Audio generations are automatically saved to the 'generative audio' folder inside your main assets folder in Runway.

  • Can you save the audio generations in a different location?

    -Yes, you can choose to save the audio generations in a different location by selecting an alternative from the drop-down menu.

  • What is required to train a custom voice model in Runway?

    -To train a custom voice model, you need a few minutes of clean audio, which can be imported into Runway or recorded directly within the generative audio tool.

  • How should the audio be when recording for a custom voice model?

    -The audio should be as clean as possible, ensuring that the voice is clear and free from background noise for the best training results.

  • What is the purpose of creating a lip-sync video in Runway?

    -The purpose of creating a lip-sync video is to synchronize the audio with the movements of a person's lips in an image or video, making it appear as if they are speaking the audio content.

  • What are the requirements for the image or video used in lip-sync?

    -The image or video used for lip-sync should have the full face of the person clearly visible within the frame.

  • Can lip-sync be used with different types of audio?

    -Yes, lip-sync can be used with generated audio from text to speech, recorded audio, or uploaded audio.

  • What happens if the audio is longer than the video in a lip-sync project?

    -If the audio is longer than the video, once the video ends, it will reverse and go back to the beginning to continue syncing with the audio for the remaining duration.

  • What is a pro tip for using the video workflow in Runway's generative audio tool?

    -A pro tip is to avoid using camera motion parameters and instead add subject motion with a motion brush to make the reversing effect less noticeable.

  • How can users find more information or get help with Runway?

    -Users can join the Runway community on Discord for more information, experimentation, and to find specific answers to their questions. They can also use the help button on their dashboard at any time.

Outlines

00:00

🎙️ Introduction to Generative Audio

The video opens by introducing generative audio in Runway, which covers text-to-speech, custom voice models, and lip sync videos. Users open the generative audio tool from the Runway dashboard, type in text, and pick a voice from the default list to generate a spoken audio file. Generation is usually quick, with the time depending on script length, and files are saved automatically to the 'generative audio' folder inside the main assets folder. The section also explains that a custom voice model can be trained from a few minutes of clean audio and stresses that audio quality matters for effective training.

🔍 Custom Voice Model Training

This section delves deeper into the process of training a custom voice model. It explains that users can import their own audio or record directly within the tool, and it's crucial to ensure the audio is clean for the best results. The script provides instructions on naming the voice model and mentions the quick turnaround time for the model to become ready for use with text-to-speech. The video also touches on the ability to create lip sync videos using an image or video of a person, with the full face visible, and offers the option to use preset characters or upload custom media.
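Before importing a recording for voice training, it can help to check it locally. The sketch below is not part of Runway; it is a minimal, hypothetical pre-check using Python's standard `wave` module, and the function name `describe_wav` is an assumption made for illustration. It reports the duration and peak level of a 16-bit PCM WAV file. Duration and clipping are only rough proxies for "clean" audio, and background noise is not detected.

```python
import struct
import wave


def describe_wav(source) -> dict:
    """Return duration and peak level for a 16-bit PCM WAV.

    `source` is a file path or a binary file-like object.
    Hypothetical helper for illustration; not a Runway API.
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_frames = wav.getnframes()
        rate = wav.getframerate()
        raw = wav.readframes(n_frames)

    # Samples are little-endian signed 16-bit; channels are interleaved,
    # so the peak below is taken across all channels.
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    peak = max((abs(s) for s in samples), default=0)

    return {
        "seconds": n_frames / rate,   # total duration
        "peak": peak / 32768,         # 1.0 means full scale
        "clipped": peak >= 32767,     # a full-scale sample suggests clipping
    }
```

Calling `describe_wav` with the path to a recording returns a dictionary such as `{"seconds": ..., "peak": ..., "clipped": ...}`. A recording of at least a few minutes with no clipped samples is a reasonable starting point, though listening to it remains the real test.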

🎥 Creating Lip Sync Videos

The script then moves on to creating lip sync videos: add an image or video, then synchronize it with generated or recorded audio. It demonstrates adding new text-to-speech content and selecting a voice before generating the audio. One tip is to create a video with Gen 2 and then upload it to the generative audio tool to add lip sync. When the audio is longer than the video, the video reverses back to the beginning once it reaches the end; to make this less noticeable, the video recommends avoiding camera motion parameters and adding subject motion with a motion brush instead.
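The reversing behaviour described above can be pictured as a "ping-pong" playback order. The following is a minimal sketch of that idea, not Runway's implementation: the function names are hypothetical, and it assumes that after reversing to the first frame the video plays forward again if audio remains, which the tutorial does not state.

```python
import math


def pingpong_frame(i: int, n_frames: int) -> int:
    """Map output frame i to a source frame index, forward then backward."""
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    if n_frames == 1:
        return 0
    # One full cycle goes 0..n-1 and back to 1, without repeating the ends.
    period = 2 * (n_frames - 1)
    pos = i % period
    return pos if pos < n_frames else period - pos


def frames_for_audio(video_frames: int, fps: float, audio_seconds: float) -> list[int]:
    """Source frame indices needed to cover the whole audio duration."""
    total = math.ceil(audio_seconds * fps)
    return [pingpong_frame(i, video_frames) for i in range(total)]


# A 4-frame clip at 1 fps stretched over 9 seconds of audio.
print(frames_for_audio(4, 1.0, 9.0))  # [0, 1, 2, 3, 2, 1, 0, 1, 2]
```

The sketch also suggests why camera motion makes the effect stand out: a pan or zoom played backward visibly undoes itself, while small subject motion reads more naturally in both directions.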

📚 Conclusion and Additional Resources

The video concludes by thanking viewers and inviting them to join the Runway community on Discord for more resources and experimentation. It also points viewers to the help button on the dashboard for specific answers to their questions.

Keywords

💡Generative Audio

Generative audio refers to the creation of audio content using AI tools. In the video, it includes text to speech, custom voice models, and creating lip sync videos, highlighting the versatility of AI in audio production.

💡Text to Speech

Text to speech is a technology that converts written text into spoken audio. In the video, it is demonstrated as a tool in Runway where users can type text, choose a voice, and generate an audio file, making it useful for creating voiceovers.

💡Custom Voice Models

Custom voice models involve training AI to replicate specific voices using a few minutes of clean audio. The video shows how to import or record audio in Runway to create a personalized voice model for more tailored text-to-speech applications.

💡Lip Sync Videos

Lip sync videos synchronize the movement of a person's lips with generated audio. The video explains how to use Runway to create lip sync videos using images or videos, making it possible to create realistic talking characters.

💡Runway Dashboard

The Runway dashboard is the main interface for accessing Runway tools. The video starts by guiding users to the generative audio tool via the dashboard, emphasizing its role as the control center for various AI-powered creative tools.

💡Generate Button

The generate button initiates the creation of the audio file or model in Runway. The video mentions clicking the generate button after selecting a voice, which starts the audio generation process, demonstrating the ease of use.

💡Clean Audio

Clean audio refers to high-quality, noise-free sound recordings. The video advises using clean audio for training custom voice models, as clear audio ensures better accuracy and performance of the AI-generated voice.

💡Main Assets Folder

The main assets folder is where generated audio files are automatically saved in Runway. The video highlights this default save location, helping users manage their projects and find their generated content easily.

💡Motion Brush

Motion brush is a tool in Runway used to add subject motion to videos. The video recommends using motion brush instead of camera motion parameters for a smoother lip sync effect, illustrating a practical tip for better video quality.

💡Gen 2

Gen 2 is Runway's video generation model. The video mentions using Gen 2 to turn an image into a video, which can then be uploaded to the generative audio tool and lip-synced with audio.

Highlights

Introduction to Runway Academy's tutorial on generative audio.

Exploring text to speech, custom voice models, and lip sync videos in Runway.

Accessing the generative audio tool from the Runway dashboard.

Typing in text and converting it into a spoken audio file.

Previewing and selecting a voice from the default voice list.

Understanding the generation time based on script length.

Automatic saving of audio generations to the generative audio folder.

Customizing the save location using the drop-down menu.

Training a custom voice model with a few minutes of clean audio.

Recording or importing audio for the custom voice model.

Instructions on ensuring clean audio for the voice model.

Naming the voice model and its readiness for text to speech.

Creating a lip sync video with an image or video of a person.

Using preset characters or uploading custom media for lip sync.

Adding text to speech and selecting a voice for lip sync.

Generating a personalized message for the lip sync video.

Using Gen 2 to convert an image into a video for lip sync.

Adding lip sync to a video and handling audio longer than the video.

Professional tip on avoiding camera motion parameters for smoother reversing effect.

Invitation to join the Runway community on Discord for further resources and experimentation.

Finding specific answers at any time through the help button on the dashboard.