Skip to main content

One post tagged with "gcp"

View All Tags

LLMs in Production: Deploying the TitanML Takeoff server on Google Cloud Run

· 10 min read
Fergus Finn
CTO, TitanML

Hi! In the last post, we deployed a simple LLM endpoint using the Titan Takeoff server using an AWS EC2 instance. We compared performance between a GPU enabled instance and a CPU only instance, and between the Takeoff server, and raw huggingface/pytorch. In this post, we'll look at another cloud provider, and try out their tooling for deploying LLM endpoints. We'll use the same Takeoff server, but this time we'll deploy it using Google Cloud, specifically, their Google Cloud Run service . On the way, we'll discuss a little bit about serverless, and how the Takeoff server means that we can use serverless tools to deploy LLM endpoints.