An LLM journey speed run: Going from Hugging Face to Vertex AI
TLDR
In this session, Google Cloud's Solutions Architect Rajesh Thallam and Solution Manager Skander Hannachi discuss integrating open source models from Hugging Face with Vertex AI to enhance a Gemini-based generative AI solution. They focus on adding time series forecasting capabilities using Google's TimesFM model. The talk covers the Vertex AI platform's features for building generative AI applications, the importance of model selection, and a practical demonstration of integrating TimesFM for demand forecasting in retail and e-commerce, showcasing the platform's ability to handle complex tasks and provide scalable solutions.
Takeaways
- Skander Hannachi and Rajesh Thallam from Google Cloud's Vertex AI team discussed integrating open source models from Hugging Face with Gemini-based large language models (LLMs).
- They highlighted the use of Vertex AI to augment an e-commerce product catalog with a time series forecasting model called TimesFM.
- Vertex AI offers a platform for building generative AI applications, providing tools for model training, deployment, and augmentation with extensions and grounding.
- The session differentiated between Google AI Studio, a prototyping tool, and Vertex AI, an end-to-end machine learning platform on Google Cloud.
- The Vertex AI platform includes the Model Garden, Model Builder, and Agent Builder layers, catering to a wide range of users from business analysts to ML researchers.
- TimesFM is a new open-source time series foundation model by Google that can perform zero-shot learning for various time series forecasting tasks.
- The presentation showcased a practical example of enhancing product catalog data with demand forecasting capabilities using TimesFM and Gemini.
- The importance of choosing the right model for specific tasks was emphasized, as no single model can cover all use cases effectively.
- The process of deploying the TimesFM model from Hugging Face to Vertex AI and integrating it with other tools was demonstrated step by step.
- The integration of function calling with Gemini models and the use of orchestration frameworks like LangChain were discussed to build scalable and production-ready agents.
Q & A
What is the main topic discussed in the video?
-The main topic discussed in the video is how to augment a Gemini-based generative AI solution with open source models from Hugging Face, specifically by adding a time series forecasting model to enhance retail and e-commerce product catalog metadata and content.
Who are the speakers in the video?
-The speakers in the video are Rajesh Thallam, a Solutions Architect at Google Cloud, and Skander Hannachi, a Solution Manager with Google Cloud.
What is Vertex AI?
-Vertex AI is an end-to-end machine learning platform on Google Cloud that offers tools for model training, deployment, and augmenting models with extensions and grounding, covering both generative and predictive AI.
What is the role of Google's open source time series foundation model called TimesFM in the discussed solution?
-TimesFM is used to add demand forecasting capabilities to a solution that uses Vertex Gemini to augment and enhance product catalog data in an e-commerce context. It is capable of zero-shot learning for various time series forecasting tasks, generating forecasts without prior training on a merchant's historical data.
What is the purpose of Google AI Studio and Vertex AI Studio?
-Google AI Studio is a prototyping tool for developers and data scientists to interact with and test Gemini models, while Vertex AI Studio is part of Vertex AI, Google Cloud's end-to-end machine learning platform, which offers tools for model training and deployment along with enterprise security and data governance.
How does the integration between Vertex AI and Hugging Face benefit users?
-The integration allows users to bring the models they need into the Vertex AI platform and choose those most suitable for their use cases, deploying them within the secure Google Cloud environment without having to manage any infrastructure or servers.
What is the significance of the 'Model Garden' in Vertex AI?
-The 'Model Garden' in Vertex AI is a repository of all different foundational models available within the platform, providing users with a variety of models to choose from for their generative AI applications.
How does the video demonstrate the practical steps of integrating the TimesFM model into a generative AI application?
-The video demonstrates the practical steps by showing the process of deploying the TimesFM model from Hugging Face on Vertex AI, defining Python functions as tools to interact with BigQuery and call the TimesFM model, and deploying the agent on Vertex AI's Reasoning Engine.
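The notebook code itself isn't reproduced here, but a minimal sketch of the deployment step with the Vertex AI Python SDK looks roughly like the following; the project, region, and serving container image are placeholders, and the container is assumed to wrap the custom TimesFM predictor described in the demo.

```python
from google.cloud import aiplatform

# Placeholder project and region.
aiplatform.init(project="my-project", location="us-central1")

# Register the custom-predictor container (built around the Hugging Face
# TimesFM checkpoint) in the Vertex AI Model Registry.
model = aiplatform.Model.upload(
    display_name="timesfm-forecaster",
    serving_container_image_uri=(
        "us-central1-docker.pkg.dev/my-project/serving/timesfm-predictor:latest"
    ),
)

# Deploy the registered model to an online prediction endpoint.
endpoint = model.deploy(machine_type="n1-standard-8")
print(endpoint.resource_name)
```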
What is the importance of function calling in the context of the video?
-Function calling is important as it allows the transformation of natural language into structured data and back again, enabling the Gemini model to return structured data responses that can be used to call external systems or APIs.
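As a rough illustration of that flow, here is a minimal Gemini function-calling sketch using the Vertex AI SDK; the get_demand_forecast declaration, its parameters, and the model version are hypothetical stand-ins for the tools used in the session.

```python
from vertexai.generative_models import (
    FunctionDeclaration,
    GenerativeModel,
    Tool,
)

# Declare a hypothetical forecasting tool so Gemini can return a structured
# function call instead of free-form text.
get_demand_forecast = FunctionDeclaration(
    name="get_demand_forecast",
    description="Forecast demand for a product over the next N days.",
    parameters={
        "type": "object",
        "properties": {
            "product_id": {"type": "string"},
            "horizon_days": {"type": "integer"},
        },
        "required": ["product_id"],
    },
)

model = GenerativeModel(
    "gemini-1.5-pro",
    tools=[Tool(function_declarations=[get_demand_forecast])],
)

response = model.generate_content(
    "How many units of SKU-1234 should I expect to sell in the next two weeks?"
)

# The model responds with structured data: the function name and arguments
# the application should use to call the external system or API.
function_call = response.candidates[0].content.parts[0].function_call
print(function_call.name, dict(function_call.args))
```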
What is the role of LangChain in building generative AI agents as discussed in the video?
-LangChain is used to define agents and bind models with tools, allowing for the creation of more complex, multi-step tasks that can be performed by the agent without constant human input, making the interactions with generative models more scalable and production-ready.
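A minimal LangChain sketch of binding a Gemini model to a Python tool might look like the following, assuming the langchain-google-vertexai integration; the tool body is a placeholder for the BigQuery and TimesFM calls from the demo.

```python
from langchain_core.tools import tool
from langchain_google_vertexai import ChatVertexAI


@tool
def get_demand_forecast(product_id: str, horizon_days: int = 14) -> str:
    """Return a demand forecast for a product over the given horizon."""
    # Placeholder: in the demo this would query BigQuery for sales history
    # and call the deployed TimesFM endpoint.
    return f"Forecast for {product_id}: ~120 units over {horizon_days} days."


llm = ChatVertexAI(model_name="gemini-1.5-pro")
llm_with_tools = llm.bind_tools([get_demand_forecast])

ai_msg = llm_with_tools.invoke(
    "What demand should I plan for product SKU-1234 over the next two weeks?"
)
# The model decides which tool to call and with what arguments; the agent
# loop then executes the tool and feeds the result back to the model.
print(ai_msg.tool_calls)
```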
Outlines
Introduction to Vertex AI and Hugging Face Integration
The video begins with Skander Hannachi and Rajesh Thallam introducing themselves as part of Google Cloud's Vertex AI applied engineering team. They express excitement about discussing the augmentation of a Gemini-based generative AI solution with open source models from Hugging Face, specifically focusing on integrating a time series forecasting model into a large language model. The conversation aims to guide viewers on utilizing the Vertex AI platform to develop generative AI applications, with a spotlight on a recent use case involving Google's open source time series foundation model, TimesFM, launched on Hugging Face. The hosts walk through the steps of integrating TimesFM into a generative AI application, emphasizing the importance of choosing the right models for specific use cases and the benefits of integrating Vertex AI with Hugging Face.
Building Generative AI Applications with Vertex AI
Skander delves into the thought process behind building generative AI applications using the Vertex AI platform. He outlines the challenges of designing a Gemini-based solution, such as deciding whether to fine-tune a model or use a retrieval augmented generation approach. The discussion highlights the importance of not relying on a single model for all tasks, but instead considering a variety of models available through Vertex AI and Hugging Face. Rajesh then distinguishes between Google AI Studio and Vertex AI Studio, explaining that Google AI Studio is a prototyping tool, while Vertex AI is an end-to-end machine learning platform. He emphasizes the full data control, enterprise security, and data governance provided by Vertex AI Studio. Skander further elaborates on the Vertex AI platform's components, including the Vertex Model Garden, Model Builder, and Agent Builder, catering to a wide range of users from business analysts to ML researchers.
Enhancing E-commerce with Time Series Forecasting
The focus shifts to a concrete industry example, where Skander discusses the challenges of managing product catalogs in e-commerce and how the Vertex AI team has addressed these with a solution that uses Vertex Gemini to enhance product catalog data. The solution streamlines the process of generating and enhancing new product content using a category detection module, a filtering process for detailed attributes, and Gemini to generate detailed product descriptions. Skander describes the system design as a single path state machine, where each part of the user journey represents a specific state. He then explores the idea of adding demand forecasting capabilities to the system, noting the distinction between listing new products and forecasting demand, and introduces the concept of an intent detection layer to determine the merchant's goals.
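One lightweight way to implement the intent detection layer described here is a Gemini classification step; the sketch below and its intent labels are illustrative assumptions rather than the production design from the talk.

```python
from vertexai.generative_models import GenerativeModel

INTENTS = ["list_new_product", "forecast_demand", "other"]


def detect_intent(user_message: str) -> str:
    """Classify the merchant's request into one of the supported intents."""
    model = GenerativeModel("gemini-1.5-flash")
    prompt = (
        "Classify the merchant request into exactly one of these intents: "
        f"{', '.join(INTENTS)}.\n"
        f"Request: {user_message}\n"
        "Answer with the intent label only."
    )
    label = model.generate_content(prompt).text.strip()
    return label if label in INTENTS else "other"


print(detect_intent("How many of these sneakers will I sell next month?"))
```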
TimesFM: A New Approach to Time Series Forecasting
Skander introduces TimesFM, Google's newly released open source time series foundation model, which is pretrained on a large corpus of real-world and synthetic time series data and uses a decoder-only architecture similar to large language models. He explains that TimesFM is capable of zero-shot learning for various time series forecasting tasks, allowing it to generate forecasts on the fly without prior training on a merchant's historical data sets. This capability is particularly beneficial for merchants who cannot take on the operational overhead of traditional forecasting tools. Skander also mentions that TimesFM has been benchmarked against dedicated forecasting models and has shown strong accuracy along with the ability to generate forecasts instantly.
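For context, zero-shot forecasting with the open source timesfm package looks roughly like the sketch below; the hyperparameters follow the 1.0 checkpoint's published configuration and may differ in later releases of the library.

```python
import numpy as np
import timesfm

# Load the published checkpoint from Hugging Face; argument names follow the
# timesfm 1.0 README and may have changed in later releases.
tfm = timesfm.TimesFm(
    context_len=512,
    horizon_len=28,
    input_patch_len=32,
    output_patch_len=128,
    num_layers=20,
    model_dims=1280,
    backend="cpu",
)
tfm.load_from_checkpoint(repo_id="google/timesfm-1.0-200m")

# Zero-shot forecast: no fine-tuning on this series, just raw history in,
# point and quantile forecasts out.
history = np.sin(np.arange(120) / 7.0) * 50 + 100  # synthetic daily demand
point_forecast, quantile_forecast = tfm.forecast(
    [history],
    freq=[0],  # 0 = high-frequency (e.g. daily) series
)
print(point_forecast.shape)
```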
Practical Integration of TimesFM with Vertex AI
Rajesh takes over to demonstrate the practical steps of integrating the TimesFM model from Hugging Face with Vertex AI. He outlines the user journey for reviewing item performance by building an agent using foundation models from Vertex AI and Hugging Face. Rajesh explains the concept of a generative AI agent, which is an application that attempts to complete complex tasks or user queries by understanding intent and acting on it with the help of external systems or APIs. He discusses the components needed to build performant Gen AI applications, including models, tools, orchestration, and runtime. Rajesh then walks through the process of deploying the TimesFM model on Vertex AI, defining Python functions as tools, and using LangChain templates to create executable tools for the model to perform information retrieval or transactions via API calls.
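The exact demo code isn't shown here, but the kind of Python function that gets wrapped as a tool might look like this sketch; the BigQuery table, endpoint ID, and request schema are placeholders.

```python
from google.cloud import aiplatform, bigquery


def forecast_item_demand(item_id: str, horizon_days: int = 28) -> list[float]:
    """Fetch an item's sales history from BigQuery and forecast future demand
    with the deployed TimesFM endpoint (table and endpoint are placeholders)."""
    bq = bigquery.Client()
    rows = bq.query(
        """
        SELECT sale_date, units_sold
        FROM `my-project.retail.daily_sales`
        WHERE item_id = @item_id
        ORDER BY sale_date
        """,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("item_id", "STRING", item_id)
            ]
        ),
    ).result()
    history = [float(r.units_sold) for r in rows]

    endpoint = aiplatform.Endpoint(
        "projects/my-project/locations/us-central1/endpoints/1234567890"
    )
    # The instance schema is whatever the custom TimesFM predictor expects.
    response = endpoint.predict(
        instances=[{"context": history, "horizon": horizon_days}]
    )
    return list(response.predictions[0])
```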
Deploying the TimesFM Model and Building an Agent
The video script details the process of deploying the TimesFM model from Hugging Face onto Vertex AI's prediction endpoint. It includes steps for data preparation, such as downloading a dataset from Kaggle, transforming it for time series forecasting, and uploading it to BigQuery. Rajesh demonstrates how to deploy the model using Vertex AI SDK, create a custom predictor, and test it locally before pushing it to the model registry and deploying it to a Vertex AI endpoint. The script also covers defining functions as tools to interact with BigQuery and call the TimesFM model, and using these tools to fetch data and generate forecasts. The process of defining a LangChain agent, binding the model with these tools, and deploying the agent on Vertex AI's Reasoning Engine is also explained, highlighting the ability to productionize and scale the agent for business-critical applications.
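A hedged sketch of that last step, using the preview reasoning_engines module available at the time of the talk, might look like this; the tool list, model name, and requirements are placeholders.

```python
import vertexai
from vertexai.preview import reasoning_engines

vertexai.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",
)

# Wrap the Gemini model and the Python tool(s) in a LangChain-based agent.
agent = reasoning_engines.LangchainAgent(
    model="gemini-1.5-pro",
    tools=[forecast_item_demand],  # the BigQuery/TimesFM tool defined earlier
)

# Test the agent locally before deploying.
print(agent.query(input="How will item SKU-1234 perform over the next month?"))

# Deploy as a managed, scalable remote service on Reasoning Engine.
remote_agent = reasoning_engines.ReasoningEngine.create(
    agent,
    requirements=[
        "google-cloud-aiplatform[langchain,reasoningengine]",
        "google-cloud-bigquery",
    ],
)
print(remote_agent.query(input="How will item SKU-1234 perform over the next month?"))
```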
Conclusion and Call to Action
In the concluding part of the video script, Skander and Rajesh summarize the key takeaways from their discussion. They emphasize that there is no single foundational model that can handle all tasks, and the importance of using a variety of models to augment Gemini-based solutions. They also highlight the ease of deploying models from Hugging Face directly to Vertex AI and the potential of combining generative AI with predictive models to unlock the full potential of Gen AI solutions. The hosts conclude by encouraging viewers to explore function calling and Reasoning Engine for building scalable Gen AI agents and invite them to seek additional information on the use case presented.
Keywords
LLM (Large Language Model)
Vertex AI
Hugging Face
Time Series Forecasting
Gemini
Model Garden
Function Calling
LangChain
Reasoning Engine
E-commerce Product Catalog
Highlights
Vertex AI and Hugging Face collaboration allows integrating open source models with Gemini-based solutions.
Introduction of a time series forecasting model to enhance large language models.
Practical steps for integrating the TimesFM model into a generative AI application.
Google's open source time series foundation model, TimesFM, launched on Hugging Face.
Vertex AI platform's role in building Gen AI applications and recent use cases in retail and e-commerce.
The importance of choosing the right model for specific tasks in generative AI applications.
Google AI Studio and Vertex AI Studio's functionalities and distinctions.
Vertex AI's unified platform for both generative and predictive AI.
Components of the Vertex AI platform: Model Garden, Model Builder, and Agent Builder.
130-plus curated foundation models available in the Vertex Model Garden.
Integration of Hugging Face Hub with Vertex AI for easy deployment of models.
Industry example of integrating an open source model to add demand forecasting to an e-commerce product catalog.
Challenges in managing product catalogs in e-commerce and how Gemini can help.
The process of generating new product content and enhancing it with Gemini and Imagen.
System design perspective on using Gemini to create a state machine for user journeys.
The need for intent detection in adding demand forecasting capabilities to product catalog solutions.
TimesFM's ability for zero-shot learning in time series forecasting tasks.
Comparison of TimesFM with other dedicated models in terms of accuracy and on-the-fly forecasting.
Practical demonstration of integrating TimesFM with Vertex AI to review item performance.
Defining generative AI agents and their components for complex task completion.
Function calling as a native feature of Gemini models to return structured data.
Using LangChain on Vertex AI to turn Python functions into executable tools for models.
Deploying agents on Vertex AI's Reasoning Engine for scalability and reliability.
Steps to deploy the TimesFM model from Hugging Face on Vertex AI prediction endpoint.
Defining Python functions as tools to interact with BigQuery and call the TimesFM model.
Creating a LangChain agent, binding the model with tools, and deploying it on Reasoning Engine.
Testing the defined agent locally and deploying it as a remote service on Cloud Run.
Final thoughts on the importance of combining different models and deploying scalable Gen AI agents.