Run Hugging Face Spaces Demos on your own Colab GPU or Locally

28 Oct 2022 · 09:29

TL;DR: The video tutorial guides viewers on how to run popular Hugging Face Spaces demos in their own Google Colab environment to avoid waiting in queues. It covers creating a Google Colab notebook, selecting GPU hardware acceleration, cloning the Hugging Face Spaces repository, installing the necessary libraries, and making adjustments to share an external URL. The process is detailed for both Gradio and Streamlit applications, with tips on improving efficiency by separating the model download from the application run to avoid repeated downloads.


  • 🚀 **Popular Hugging Face Spaces**: The script introduces a popular Hugging Face Spaces demo with many users, leading to potential queues and wait times.
  • 🛠 **Running Demos Locally**: To avoid queues and utilize the same compute resources, the video suggests running the demo in a personal Google Colab environment.
  • 💡 **Google Colab Advantages**: Google Colab can provide a T4 GPU, allowing users to skip queues and achieve similar speeds as if they were first in line on Hugging Face Spaces.
  • 📚 **Setting Up Google Colab**: The first step is to create a new Google Colab notebook and select GPU hardware acceleration if the demo requires it.
  • 🔍 **Checking GPU Availability**: The script explains how to verify GPU availability in Google Colab using `nvidia-smi` or by importing torch and checking for CUDA availability.
  • 📦 **Cloning the Hugging Face Spaces Repo**: The tutorial demonstrates how to clone the relevant Hugging Face Spaces repository into the Google Colab notebook using `!git clone`.
  • 📂 **Navigating Directories**: It's important to navigate to the correct directory within the cloned repository to access the necessary files for the demo.
  • 📝 **Installing Required Libraries**: The script outlines the necessity of installing required libraries using `pip install -r requirements.txt` and possibly additional packages like `gradio` or `streamlit`.
  • 🔑 **Handling Hugging Face Tokens**: If the demo requires a Hugging Face token, the script explains how to perform a notebook login using `from huggingface_hub import notebook_login`.
  • 🔄 **Running the Demo**: The process of running the demo is detailed, including modifying the app file (typically `app.py`) to enable sharing an external URL for non-local access.
  • 💻 **Downloading and Running Models**: The script discusses the initial model download time and suggests separating the download process from the application run for efficiency.
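The GPU check from the bullets above can be sketched as a single Colab cell. This is a hedged example combining both probes mentioned in the video (the `nvidia-smi` tool and PyTorch's CUDA flag); the `gpu_available` helper name is my own, not from the video:

```python
# Check whether a CUDA GPU is visible to the runtime, combining the two
# probes from the video: the nvidia-smi tool and PyTorch's CUDA flag.
import shutil
import subprocess

def gpu_available() -> bool:
    """Return True if an NVIDIA GPU is visible to this runtime."""
    # Probe 1: nvidia-smi exits with code 0 when a driver and GPU are present.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(["nvidia-smi"], capture_output=True)
        if result.returncode == 0:
            return True
    # Probe 2: fall back to PyTorch if it is installed (it is by default on Colab).
    try:
        import torch
        return torch.cuda.is_available()
    except ImportError:
        return False

print("GPU available:", gpu_available())
```

On a Colab CPU runtime this prints `False`; after switching the runtime type to GPU it should print `True`.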

Q & A

  • What is the main topic of the video?

    -The main topic of the video is about running a popular Hugging Face Spaces demo on Google Colab to avoid waiting in queues and to utilize the GPU resources available there.

  • Why might running the demo on Google Colab be beneficial?

    -Running the demo on Google Colab can be beneficial because it allows users to skip queues and potentially get access to a Tesla T4 GPU, which can significantly speed up the processing time.

  • How can one check if their Google Colab has a GPU available?

    -To check if a GPU is available in Google Colab, users can run the command `nvidia-smi` or import the torch library and check whether CUDA is available (`torch.cuda.is_available()`), which indicates that the GPU is accessible.

  • What is the first step in setting up the demo on Google Colab?

    -The first step is to create a new Google Colab notebook and, if the demo requires a GPU, change the runtime type to 'GPU' in the 'Runtime' menu.

  • How does one clone the Hugging Face Spaces repository in Google Colab?

    -To clone the repository, users can copy the repository URL and use the command `!git clone <repository URL>` in a Google Colab cell.

  • What is the purpose of the 'requirements.txt' file in the Hugging Face Spaces demo?

    -The 'requirements.txt' file lists all the necessary libraries and dependencies needed for the demo to run. Users can install these by using the command 'pip install -r requirements.txt'.

  • When might a Hugging Face token be required for running the demo?

    -A Hugging Face token might be required if the demo involves accessing private resources or models on the Hugging Face platform. In such cases, users need to authenticate using the `notebook_login` function from the `huggingface_hub` library.

  • How can users run the demo application with an external URL that can be shared?

    -To run the demo with an externally shareable URL, users need to modify the app file (typically `app.py`) and set the `share` parameter to `True` when launching the application.

  • What is a potential enhancement to the process described in the video?

    -A potential enhancement is to separate the model downloading process from the running of the application. This way, the model is downloaded only once, and subsequent runs do not need to re-download the model, saving time and resources.

  • What is the final outcome of following the tutorial?

    -The final outcome is that users can successfully run the Hugging Face Spaces demo on their Google Colab notebook, utilizing a GPU without waiting in queues, and potentially share their application with others via an external URL.

  • How can users avoid downloading the model multiple times in their Google Colab notebook?

    -Users can avoid multiple downloads by carefully examining the code, identifying the section responsible for model downloading, and separating it from the main application code. This way, the model is downloaded only once, and the application can run without re-downloading the model in subsequent runs.
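The clone-and-install steps from the Q&A can be captured as a small helper that assembles the shell commands a Colab cell would run (each prefixed with `!`). This is a sketch; the repository URL is a placeholder, not a specific Space:

```python
# Assemble the shell commands a Colab cell would run (prefixed with "!")
# to clone a Space and install its dependencies. The URL is a placeholder.
def setup_commands(repo_url: str, repo_dir: str) -> list[str]:
    return [
        f"git clone {repo_url}",            # clone the Space's repository
        f"cd {repo_dir}",                   # enter the cloned directory
        "pip install -r requirements.txt",  # install the listed dependencies
    ]

for cmd in setup_commands("https://huggingface.co/spaces/<user>/<space>", "<space>"):
    print(f"!{cmd}")
```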



🚀 Running Hugging Face Demos on Google Colab

This paragraph discusses running a popular Hugging Face demo on Google Colab to avoid waiting in queues on Hugging Face Spaces. It highlights the benefits of Google Colab, such as potentially getting a Tesla T4 GPU, which can offer the same speed as being first in the queue. The speaker guides the audience through creating a Google Colab notebook, selecting GPU hardware acceleration, and checking for GPU availability using `nvidia-smi` or PyTorch. The paragraph also covers the steps to clone the Hugging Face Spaces repo and the importance of installing the required libraries from the `requirements.txt` file.


📚 Customizing and Launching the Demo

The second paragraph delves into customizing the demo for a smoother experience. It explains the need to check the app file for any Hugging Face token requirements and the necessity of logging in via the Hugging Face Hub if a token is used. It also emphasizes modifying the `demo.launch` command to include `share=True` so an external URL is generated, making the application accessible outside of the local environment. The speaker demonstrates how to run the application, upload an image, select a model, and execute the diffusion process, all on a GPU without waiting in a queue. Finally, the paragraph suggests enhancements such as separating the model download from the application run to improve efficiency and avoid repeated downloads.
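The `share=True` change described here looks like this in a minimal Gradio app. This is a hedged sketch: `greet` and `launch_demo` are stand-ins for the Space's actual inference code, not the demo from the video:

```python
# Minimal Gradio app illustrating the share=True change from the video.
# greet() is a placeholder for the Space's real inference function.
def greet(name: str) -> str:
    return f"Hello, {name}!"

def launch_demo(share: bool = True) -> None:
    import gradio as gr
    demo = gr.Interface(fn=greet, inputs="text", outputs="text")
    # share=True makes Gradio print a public URL alongside the local one,
    # so the app is reachable from outside the Colab VM.
    demo.launch(share=share)

# In a Colab cell you would call: launch_demo(share=True)
```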



💡 Hugging Face Spaces

Hugging Face Spaces refers to a platform where developers and researchers can share and explore various machine learning models, particularly in the field of natural language processing. In the context of the video, it is a place where a popular demo with fine-tuned diffusion models is hosted, which many people are trying out, leading to potential queues for access.

💡 Google Colab

Google Colab is a cloud-based platform for machine learning and data analysis that allows users to run Python in a notebook-like environment. It provides free access to GPUs, which can be utilized for running computationally intensive tasks. In the video, Google Colab is suggested as an alternative to running Hugging Face Spaces demos to avoid queues and potentially get access to a Tesla T4 GPU.

💡 Diffusion Models

Diffusion models are a class of generative models used in machine learning for generating data, such as images or text. They work by gradually transforming a random noise distribution into the desired data distribution. In the video, the focus is on fine-tuned diffusion models available in a particular Hugging Face Spaces demo.

💡 GPU (Graphics Processing Unit)

A Graphics Processing Unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. In the context of the video, a GPU is important for running computationally intensive tasks, such as machine learning models, more efficiently.


💡 Queue

In the context of the video, a queue refers to the waiting list for access to computational resources or services. When a Hugging Face Spaces demo is very popular, users may have to wait in a queue to use it. The video suggests using Google Colab to avoid such queues and gain immediate access to the demo.


💡 Clone

To clone, in the context of the video, refers to the process of creating a copy of a repository, such as the one on Hugging Face Spaces. This action is essential for replicating the demo's environment on a different platform like Google Colab.


💡 requirements.txt

A requirements.txt file is a common way to list dependencies for Python projects. It specifies the libraries (and often exact versions) required for a project to run correctly. In the video, the requirements.txt file from the Hugging Face Spaces demo is used to install the necessary libraries on Google Colab.

💡 Notebook Login

Notebook login refers to the process of authenticating with a platform like Hugging Face to access certain resources or features. In some cases, a token is required to use specific models or datasets. The video explains that for the discussed demo, a notebook login is not necessary, but it might be required for other Hugging Face Spaces demos.
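When a token is required, the login step is a one-liner from `huggingface_hub`. This sketch wraps it in a hypothetical helper (`login_if_needed` is my own name) so the interactive prompt only appears when the demo actually needs a token:

```python
# Run the Hugging Face notebook login only when the demo needs a token.
def login_if_needed(token_required: bool) -> str:
    if not token_required:
        return "no login needed"
    # notebook_login() opens an interactive token prompt inside the notebook.
    from huggingface_hub import notebook_login
    notebook_login()
    return "login prompt shown"

print(login_if_needed(False))
```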

💡 Share Parameter

The share parameter, as mentioned in the video, is a setting that allows the user to share a local application URL as an external URL. This is useful for making the application accessible to others or for avoiding manual tunneling setup.

💡 Model Download

Model download refers to the process of acquiring the necessary machine learning models required for a particular application. In the video, it is mentioned that when running the demo for the first time, the user will need to download the models, which can take a significant amount of time.
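The download-once idea the video suggests can be expressed as a generic helper that skips the fetch when the target already exists on disk. This is a sketch of the pattern, not the demo's code; `fetch` stands in for whatever download function the demo uses (Hugging Face's `from_pretrained` calls cache weights this way automatically):

```python
# Generic "download once" pattern: the fetch callback runs only if the
# target path does not exist yet, so repeated app runs reuse the local copy.
from pathlib import Path
from typing import Callable

def ensure_downloaded(path: Path, fetch: Callable[[Path], None]) -> Path:
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(path)  # the slow network download happens at most once
    return path
```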


💡 Streamlit

Streamlit is an open-source Python library used to create and share custom web applications quickly. It is particularly useful for data scientists and machine learning engineers to present their work in an interactive format. In the video, the user is prompted to install Streamlit if it is not included in the requirements.txt file.
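For Spaces that use Streamlit instead of Gradio, a minimal app looks like this. A hedged sketch, not a specific Space's code: save it as the app file and start it with `streamlit run`:

```python
# Minimal Streamlit app; save as app.py and start with: streamlit run app.py
def make_greeting(name: str) -> str:
    return f"Hello from Streamlit, {name}!"

def render_app() -> None:
    # Imports are deferred so the helper above stays importable
    # even when Streamlit is not installed.
    import streamlit as st
    name = st.text_input("Your name", "Colab user")
    st.write(make_greeting(name))

# Streamlit executes the whole script on each interaction, so in the real
# app.py you would simply call render_app() at module level.
```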


Learning how to take a demo from Hugging Face Spaces and run it on your own Google Colab.

Popularity of Hugging Face Spaces leading to queues and shared compute consumption.

The possibility of skipping queues by running the code on a personal Google Colab with a Tesla T4 GPU.

Creating a new Google Colab notebook and selecting GPU hardware acceleration if required.

Verifying GPU availability using `nvidia-smi` or by checking `torch.cuda.is_available()`.

Cloning the Hugging Face Spaces repository into the Google Colab notebook.

Entering the specific directory of the fine-tuned diffusion model within the cloned repository.

Checking the contents of the directory and identifying necessary files.

Installing required libraries using pip and the provided requirements.txt file.

Installing additional libraries such as Gradio or Streamlit if they are not listed in requirements.txt.

The necessity of using a Hugging Face token for certain models and performing a notebook login.

Modifying the app file to enable sharing the application via an external URL.

Running the app file to download the models and set up the application.

Accessing the local and public URLs to use the application without waiting in a queue.

Uploading an image and selecting a style to apply diffusion models for image generation.

The potential to separate model downloading from application running for efficiency.

The overall goal of running Hugging Face models on Google Colab to avoid queues and utilize free GPU resources.