An LLM journey speed run: Going from Hugging Face to Vertex AI

Google for Developers
16 May 2024 | 38:23

TLDR

In this session, Google Cloud's Solutions Architect Rajesh Thallam and Solution Manager Skander Hannachi discuss integrating open source models from Hugging Face with Vertex AI to enhance a Gemini-based generative AI solution. They focus on adding time series forecasting capabilities using Google's TimesFM model. The talk covers the Vertex AI platform's features for building generative AI applications, the importance of model selection, and a practical demonstration of integrating TimesFM for demand forecasting in retail and e-commerce, showcasing the platform's ability to handle complex tasks and provide scalable solutions.

Takeaways

  • Skander Hannachi and Rajesh Thallam from Google Cloud's Vertex AI team discussed integrating open source models from Hugging Face with Gemini-based large language models (LLMs).
  • They highlighted the use of Vertex AI to augment an e-commerce product catalog with a time series forecasting model called TimesFM.
  • Vertex AI offers a platform for building generative AI applications, providing tools for model training, deployment, and augmentation with extensions and grounding.
  • The session differentiated between Google AI Studio, a prototyping tool, and Vertex AI, an end-to-end machine learning platform on Google Cloud.
  • The Vertex AI platform includes the Model Garden, Model Builder, and Agent Builder layers, catering to a wide range of users from business analysts to ML researchers.
  • TimesFM is a new open-source time series foundation model by Google that can perform zero-shot learning for various time series forecasting tasks.
  • The presentation showcased a practical example of enhancing product catalog data with demand forecasting capabilities using TimesFM and Gemini.
  • The importance of choosing the right model for specific tasks was emphasized, as no single model can cover all use cases effectively.
  • The process of deploying the TimesFM model from Hugging Face to Vertex AI and integrating it with other tools was demonstrated step by step.
  • The integration of function calling with Gemini models and the use of orchestration frameworks like LangChain were discussed to build scalable and production-ready agents.

Q & A

  • What is the main topic discussed in the video?

    -The main topic discussed in the video is how to augment a Gemini-based generative AI solution with open source models from Hugging Face, specifically by adding a time series forecasting model to enhance retail and e-commerce product catalog metadata and content.

  • Who are the speakers in the video?

    -The speakers in the video are Rajesh Thallam, a Solutions Architect at Google Cloud, and Skander Hannachi, a Solution Manager with Google Cloud.

  • What is Vertex AI?

    -Vertex AI is an end-to-end machine learning platform on Google Cloud that offers tools for model training, deployment, and augmentation. In the video, it is used to enhance retail and e-commerce product catalog metadata and content with demand forecasting capabilities.

  • What is the role of Google's open source time series foundation model called TimesFM in the discussed solution?

    -TimesFM is used to add demand forecasting capabilities to a solution that uses Vertex Gemini to augment and enhance product catalog data in an e-commerce context. It supports zero-shot forecasting across a variety of time series tasks.

  • What is the purpose of Google AI Studio and Vertex AI Studio?

    -Google AI Studio is a prototyping tool that lets developers and data scientists interact with and test Gemini models. Vertex AI Studio offers a similar experience within Vertex AI, the end-to-end machine learning platform on Google Cloud that provides tools for model training, deployment, and more, along with enterprise security and data governance.

  • How does the integration between Vertex AI and Hugging Face benefit users?

    -The integration lets users bring the models they need into the Vertex AI platform and choose those best suited to their use cases. Models are deployed within the secure Google Cloud environment, without users having to manage infrastructure or servers directly.

  • What is the significance of the 'Model Garden' in Vertex AI?

    -The 'Model Garden' in Vertex AI is a repository of all different foundational models available within the platform, providing users with a variety of models to choose from for their generative AI applications.

  • How does the video demonstrate the practical steps of integrating the TimesFM model into a generative AI application?

    -The video demonstrates the practical steps by showing the process of deploying the TimesFM model from Hugging Face on Vertex AI, defining Python functions as tools to interact with BigQuery and call the TimesFM model, and deploying the agent on Vertex AI's Reasoning Engine.

  • What is the importance of function calling in the context of the video?

    -Function calling is important as it allows the transformation of natural language into structured data and back again, enabling the Gemini model to return structured data responses that can be used to call external systems or APIs.

  • What is the role of LangChain in building generative AI agents as discussed in the video?

    -LangChain is used to define agents and bind models with tools, allowing for the creation of more complex, multi-step tasks that can be performed by the agent without constant human input, making the interactions with generative models more scalable and production-ready.

Outlines

00:00

🌐 Introduction to Vertex AI and Hugging Face Integration

The video begins with Skander Hannachi and Rajesh Thallam introducing themselves as part of Google Cloud's Vertex AI applied engineering team. They express excitement about discussing the augmentation of a Gemini-based generative AI solution with open source models from Hugging Face, specifically focusing on integrating a time series forecasting model into a large language model. The conversation aims to guide viewers on utilizing the Vertex AI platform to develop generative AI applications, with a spotlight on a recent use case involving Google's open source time series foundation model, TimesFM, launched on Hugging Face. The hosts walk through the steps of integrating TimesFM into a generative AI application, emphasizing the importance of choosing the right models for specific use cases and the benefits of integrating Vertex AI with Hugging Face.

05:01

Building Generative AI Applications with Vertex AI

Skander delves into the thought process behind building generative AI applications using the Vertex AI platform. He outlines the challenges of designing a Gemini-based solution, such as deciding whether to fine-tune a model or use a retrieval augmented generation approach. The discussion highlights the importance of not relying on a single model for all tasks, but instead considering a variety of models available through Vertex AI and Hugging Face. Rajesh then distinguishes between Google AI Studio and Vertex AI Studio, explaining that Google AI Studio is a prototyping tool, while Vertex AI is an end-to-end machine learning platform. He emphasizes the full data control, enterprise security, and data governance provided by Vertex AI Studio. Skander further elaborates on the Vertex AI platform's components, including the Vertex Model Garden, Model Builder, and Agent Builder, catering to a wide range of users from business analysts to ML researchers.

10:03

Enhancing E-commerce with Time Series Forecasting

The focus shifts to a concrete industry example, where Skander discusses the challenges of managing product catalogs in e-commerce and how the Vertex AI team has addressed these with a solution that uses Vertex Gemini to enhance product catalog data. The solution streamlines the process of generating and enhancing new product content using a category detection module, a filtering process for detailed attributes, and Gemini to generate detailed product descriptions. Skander describes the system design as a single path state machine, where each part of the user journey represents a specific state. He then explores the idea of adding demand forecasting capabilities to the system, noting the distinction between listing new products and forecasting demand, and introduces the concept of an intent detection layer to determine the merchant's goals.
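
The routing idea described above can be sketched in a few lines. This is a minimal illustration, not the design shown in the talk: the two intents, the state names, and the keyword-based detector are all assumptions made for the example. In the solution discussed, a Gemini call would presumably play the role of the intent detection layer.

```python
from enum import Enum


class Intent(Enum):
    LIST_PRODUCT = "list_product"
    FORECAST_DEMAND = "forecast_demand"


# Hypothetical single-path journeys: each intent maps to an ordered list of states.
JOURNEYS = {
    Intent.LIST_PRODUCT: [
        "detect_category",
        "filter_attributes",
        "generate_description",
    ],
    Intent.FORECAST_DEMAND: [
        "fetch_sales_history",
        "run_forecast",
        "summarize_forecast",
    ],
}

# Keyword stand-in for the intent detection layer (an assumption for this sketch).
FORECAST_KEYWORDS = ("forecast", "demand", "next month", "sales trend")


def detect_intent(message: str) -> Intent:
    """Pick the journey the merchant most likely wants."""
    text = message.lower()
    if any(keyword in text for keyword in FORECAST_KEYWORDS):
        return Intent.FORECAST_DEMAND
    return Intent.LIST_PRODUCT


def plan_journey(message: str) -> list[str]:
    """Return the ordered states the merchant's request will pass through."""
    return list(JOURNEYS[detect_intent(message)])


if __name__ == "__main__":
    print(plan_journey("Forecast demand for item 42"))
    print(plan_journey("Add a new blue cotton shirt to my catalog"))
```

Keeping each journey as a single ordered path preserves the state machine framing, while the detector decides which path a request enters.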

15:04

TimesFM: A New Approach to Time Series Forecasting

Skander introduces TimesFM, Google's newly released open source time series foundation model, which uses a decoder-only architecture similar to that of large language models and was pretrained on a large corpus of time series data that includes synthetic series. He explains that TimesFM supports zero-shot forecasting across a variety of time series tasks: it can generate forecasts on the fly without first being trained on the merchant's own historical data. This is particularly useful for merchants who cannot afford the operational overhead of traditional forecasting tools. Skander also mentions that TimesFM has been benchmarked against dedicated models and, according to the talk, has shown superior accuracy while being able to generate forecasts instantly.

20:06

Practical Integration of TimesFM with Vertex AI

Rajesh takes over to demonstrate the practical steps of integrating the TimesFM model from Hugging Face with Vertex AI. He outlines the user journey for reviewing item performance by building an agent using foundation models from Vertex AI and Hugging Face. Rajesh explains the concept of a generative AI agent, which is an application that attempts to complete complex tasks or user queries by understanding intent and acting on it with the help of external systems or APIs. He discusses the components needed to build performant Gen AI applications, including models, tools, orchestration, and runtime. Rajesh then walks through the process of deploying the TimesFM model on Vertex AI, defining Python functions as tools, and using LangChain templates to create executable tools for the model to perform information retrieval or transactions via API calls.
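
To make the "Python functions as tools" step concrete, here is a minimal sketch of how a function's signature and docstring can be turned into a tool declaration that a model could be shown. It uses only the standard library and is not the LangChain or Vertex AI implementation; `describe_tool` and `get_sales_history` are hypothetical names introduced for this example.

```python
import inspect
from typing import get_type_hints

# Map Python annotations to JSON-Schema-style type names.
_TYPE_NAMES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func) -> dict:
    """Build a declaration (name, description, parameters) from a function."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string".
        properties[name] = {"type": _TYPE_NAMES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def get_sales_history(item_id: str, days: int = 90) -> list:
    """Return daily unit sales for an item (hypothetical tool)."""
    return []  # a real tool would query a data warehouse such as BigQuery


if __name__ == "__main__":
    print(describe_tool(get_sales_history))
```

The docstring doubles as the description the model reads when deciding whether to use the tool, which is one reason clear docstrings matter for tool functions.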

25:07

Deploying the TimesFM Model and Building an Agent

The video script details the process of deploying the TimesFM model from Hugging Face onto Vertex AI's prediction endpoint. It includes steps for data preparation, such as downloading a dataset from Kaggle, transforming it for time series forecasting, and uploading it to BigQuery. Rajesh demonstrates how to deploy the model using Vertex AI SDK, create a custom predictor, and test it locally before pushing it to the model registry and deploying it to a Vertex AI endpoint. The script also covers defining functions as tools to interact with BigQuery and call the TimesFM model, and using these tools to fetch data and generate forecasts. The process of defining a LangChain agent, binding the model with these tools, and deploying the agent on Vertex AI's Reasoning Engine is also explained, highlighting the ability to productionize and scale the agent for business-critical applications.
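
The "create a custom predictor and test it locally" step can be pictured with the sketch below. It is loosely modeled on the load / preprocess / predict / postprocess shape of a custom predictor and uses the `instances` / `predictions` envelope of Vertex AI online prediction, but it imports nothing from the Vertex AI SDK. The `history` and `horizon` field names are assumptions, and `StubForecaster` is a placeholder that stands in for the real TimesFM model.

```python
class StubForecaster:
    """Placeholder for the real model: repeats the last observed value."""

    def forecast(self, series, horizon):
        last = float(series[-1])
        return [last] * horizon


class ForecastPredictor:
    """Load, preprocess, predict, and postprocess, in the style of a custom predictor."""

    def __init__(self):
        self._model = None

    def load(self, artifacts_uri: str) -> None:
        # A real predictor would load model weights from artifacts_uri here.
        self._model = StubForecaster()

    def preprocess(self, request: dict) -> list:
        # Validate the request body before it reaches the model.
        instances = request["instances"]
        for instance in instances:
            if not instance.get("history"):
                raise ValueError("each instance needs a non-empty 'history'")
        return instances

    def predict(self, instances: list) -> list:
        return [
            self._model.forecast(inst["history"], inst.get("horizon", 7))
            for inst in instances
        ]

    def postprocess(self, forecasts: list) -> dict:
        return {"predictions": [{"forecast": f} for f in forecasts]}

    def handle(self, request: dict) -> dict:
        """Run the full request path, as a local test would."""
        return self.postprocess(self.predict(self.preprocess(request)))


if __name__ == "__main__":
    predictor = ForecastPredictor()
    predictor.load("unused-in-this-sketch")
    print(predictor.handle({"instances": [{"history": [3, 5, 4], "horizon": 2}]}))
```

Exercising `handle` locally with a sample request is a cheap way to catch payload-shape mistakes before a model is pushed to a registry and deployed.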

30:09

Conclusion and Call to Action

In the concluding part of the video script, Skander and Rajesh summarize the key takeaways from their discussion. They emphasize that there is no single foundational model that can handle all tasks, and the importance of using a variety of models to augment Gemini-based solutions. They also highlight the ease of deploying models from Hugging Face directly to Vertex AI and the potential of combining generative AI with predictive models to unlock the full potential of Gen AI solutions. The hosts conclude by encouraging viewers to explore function calling and Reasoning Engine for building scalable Gen AI agents and invite them to seek additional information on the use case presented.

Keywords

LLM (Large Language Model)

A large language model (LLM) is an AI model trained on vast amounts of text to understand and generate human-like language. In the context of the video, LLMs are central to the discussion of augmenting AI solutions with capabilities such as time series forecasting. The video specifically covers integrating TimesFM, an open-source time series foundation model hosted on Hugging Face, with a Gemini-based LLM solution on Vertex AI.

Vertex AI

Vertex AI is Google Cloud's platform for building and deploying machine learning models. It is highlighted in the video as a key component in the process of augmenting AI applications. The speakers discuss how Vertex AI's platform can be used to integrate and deploy models, such as the time series forecasting model TimesFM, to enhance business solutions.

Hugging Face

Hugging Face is a company that provides a platform for developers to build, train, and deploy machine learning models, particularly in the field of natural language processing. In the video, it is mentioned as the source of the open-source time series forecasting model, TimesFM, which the speakers demonstrate how to integrate with Vertex AI.

Time Series Forecasting

Time series forecasting is a method used in statistics and machine learning to predict future data points based on previously observed values. The video focuses on how to add this capability to a Gemini-based large language model using the TimesFM model. This is showcased as a way to enhance retail and e-commerce product catalog metadata with demand forecasting.
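
As a small worked example of predicting future points from past observations, the sketch below implements a seasonal-naive baseline: it repeats the most recent week of values. This is a classical baseline swapped in for illustration, not TimesFM, and the sales figures are made up for the example.

```python
def seasonal_naive_forecast(history, horizon, season_length=7):
    """Forecast by repeating the most recent full season of observations."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Two weeks of illustrative daily unit sales.
    daily_units = [12, 15, 14, 13, 18, 25, 22, 11, 16, 15, 12, 19, 27, 21]
    print(seasonal_naive_forecast(daily_units, horizon=3))
```

A baseline like this is a useful yardstick: a more capable forecasting model earns its place only if its error on held-out data is lower than the baseline's.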

Gemini

Gemini is Google's family of large language models, available on Google Cloud through the Vertex AI platform. The video discusses how to augment a Gemini-based solution with additional capabilities, such as time series forecasting, to create more robust AI applications.

Model Garden

The Model Garden is a component of the Vertex AI platform that serves as a repository of various foundational models available for use within Vertex AI. The video mentions the integration of Hugging Face Hub with Vertex AI's Model Garden, allowing users to choose and deploy models with a single click.

Function Calling

Function calling is a feature of the Gemini API that allows the model to return structured data responses, which can then be used to interact with external systems or APIs. The video demonstrates how function calling can be used to transform natural language prompts into actionable data that can be used by the model to perform tasks like querying a database or making API calls.
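
The mechanics can be sketched without any cloud dependency. In the example below, `fake_model` is a stand-in for the model and simply returns a hard-coded structured call; the tool name, argument names, and JSON layout are assumptions for illustration rather than the Gemini API's actual response format. The point it shows is that the model only names a function and its arguments, while the application performs the call.

```python
import json


def get_item_forecast(item_id: str, horizon_days: int) -> dict:
    """Hypothetical tool: returns a flat placeholder forecast."""
    return {"item_id": item_id, "forecast": [10] * horizon_days}


# Registry of callables the application is willing to run on the model's behalf.
TOOLS = {"get_item_forecast": get_item_forecast}


def fake_model(prompt: str) -> str:
    """Stand-in for the model: returns a structured function call as JSON."""
    return json.dumps(
        {
            "name": "get_item_forecast",
            "args": {"item_id": "sku-123", "horizon_days": 3},
        }
    )


def run_function_call(prompt: str) -> dict:
    """Parse the model's structured response and dispatch it to a real function."""
    call = json.loads(fake_model(prompt))
    func = TOOLS.get(call["name"])
    if func is None:
        raise KeyError(f"model requested unknown tool: {call['name']}")
    return func(**call["args"])


if __name__ == "__main__":
    print(run_function_call("Forecast sku-123 for the next 3 days"))
```

Looking the function up in an explicit registry, rather than evaluating whatever name comes back, keeps the set of actions the model can trigger under the application's control.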

LangChain

LangChain is a framework for building AI agents that can perform complex tasks by orchestrating multiple AI models and tools. In the video, LangChain is used in conjunction with function calling to create an agent that can interact with BigQuery and the TimesFM model to perform time series forecasting.
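
The multi-step behavior an orchestration framework manages can be illustrated with a plain-Python loop. This is not LangChain code: `scripted_planner` is a hard-coded stand-in for the LLM's decisions, and both tools are hypothetical stubs (one for a BigQuery lookup, one for a call to a deployed forecasting endpoint).

```python
def fetch_history(item_id):
    # Hypothetical stand-in for a BigQuery lookup.
    return {"sku-123": [4, 6, 5, 7]}.get(item_id, [])


def forecast(history, horizon):
    # Hypothetical stand-in for a call to a deployed forecasting endpoint.
    average = sum(history) / len(history)
    return [average] * horizon


TOOLS = {"fetch_history": fetch_history, "forecast": forecast}


def scripted_planner(question, observations):
    """Stand-in for the LLM: choose the next tool call from what is known so far."""
    if "fetch_history" not in observations:
        return {"tool": "fetch_history", "args": {"item_id": "sku-123"}}
    if "forecast" not in observations:
        return {
            "tool": "forecast",
            "args": {"history": observations["fetch_history"], "horizon": 2},
        }
    return {"final": f"Expected daily demand: {observations['forecast']}"}


def run_agent(question, planner=scripted_planner, max_steps=5):
    """Alternate between planning and tool execution until a final answer appears."""
    observations = {}
    for _ in range(max_steps):
        step = planner(question, observations)
        if "final" in step:
            return step["final"]
        observations[step["tool"]] = TOOLS[step["tool"]](**step["args"])
    raise RuntimeError("agent did not finish within max_steps")


if __name__ == "__main__":
    print(run_agent("How will sku-123 sell over the next two days?"))
```

The `max_steps` cap is a simple guard against a planner that never produces a final answer, which matters once the planner is a real model rather than a script.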

Reasoning Engine

The Reasoning Engine is a component of Vertex AI that allows for the deployment of AI agents at scale with the necessary security and reliability for business-critical applications. The video explains how agents built with function calling and orchestration frameworks like LangChain can be deployed using the Reasoning Engine for use as remote services.

E-commerce Product Catalog

An e-commerce product catalog refers to the collection of all the products that an online retailer offers for sale. The video discusses a use case where Vertex AI and Hugging Face models are used to augment and enhance product catalog data, making it more engaging and effective for consumers.

Highlights

Vertex AI and Hugging Face collaboration allows integrating open source models with Gemini-based solutions.

Introduction of a time series forecasting model to enhance large language models.

Practical steps for integrating the TimesFM model into a generative AI application.

Google's open source time series foundation model, TimesFM, launched on Hugging Face.

Vertex AI platform's role in building Gen AI applications and recent use cases in retail and e-commerce.

The importance of choosing the right model for specific tasks in generative AI applications.

Google AI Studio and Vertex AI Studio's functionalities and distinctions.

Vertex AI's unified platform for both generative and predictive AI.

Components of the Vertex AI platform: Model Garden, Model Builder, and Agent Builder.

130-plus curated foundation models available in the Vertex Model Garden.

Integration of Hugging Face Hub with Vertex AI for easy deployment of models.

Industry example of integrating an open source model to add demand forecasting to an e-commerce product catalog.

Challenges in managing product catalogs in e-commerce and how Gemini can help.

The process of generating new product content and enhancing it with Gemini and Imagen.

System design perspective on using Gemini to create a state machine for user journeys.

The need for intent detection in adding demand forecasting capabilities to product catalog solutions.

TimesFM's ability for zero-shot learning in time series forecasting tasks.

Comparison of TimesFM with other dedicated models in terms of accuracy and on-the-fly forecasting.

Practical demonstration of integrating TimesFM with Vertex AI to review item performance.

Defining generative AI agents and their components for complex task completion.

Function calling as a native feature of Gemini models to return structured data.

Using LangChain on Vertex AI to turn Python functions into executable tools for models.

Deploying agents on Vertex AI's Reasoning Engine for scalability and reliability.

Steps to deploy the TimesFM model from Hugging Face on Vertex AI prediction endpoint.

Defining Python functions as tools to interact with BigQuery and call the TimesFM model.

Creating a LangChain agent, binding the model with tools, and deploying it on Reasoning Engine.

Testing the defined agent locally and deploying it as a remote service on Cloud Run.

Final thoughts on the importance of combining different models and deploying scalable Gen AI agents.