Version: 0.21.x

Deployments

We have two levels of guides that you can follow to deploy your LLM applications.

| | The Takeoff Stack | The Takeoff Engine |
|---|---|---|
| Summary | For users who want LLM APIs with all production features. | Best-in-class single-container solution for running LLM models. |
| Target | Anyone looking to deploy production LLM APIs to use and build applications upon. | Researchers/developers who want to run LLM models on a machine or a single node. |
| Cloud agnostic | Kubernetes engine | Docker engine |
| Metrics | Cluster-wide Prometheus instance. | Container-level metrics available at `/metrics`. |
| Horizontal Scaling | Set custom scale-up and scale-down behaviours and thresholds, and decide which key metrics to base scaling decisions on. | Single-container solution; needs involved development to achieve. |
| Scale to Zero | Set additional policies, separate from Horizontal Scaling, to scale to zero and save costs. | Single-container solution; needs involved development to achieve. |
| Logging | Accrue logs from all distributed instances into Loki for smart searching. | Meaningful logs from the container streamed to standard output. |
| Alerting | Alert on request status codes, increased processing duration, or any metric exposed from the cluster. | Single-container solution; needs involved development to achieve. |
| Multi-stage | Different deployments can be partitioned into their own namespaces, or given prefixes to distinguish each stage's resources. | Single-container solution; needs involved development to achieve. |
| Multi-node | Uses Kubernetes, so it can operate across any collection of nodes. | Single-container solution; needs involved development to achieve. |
| Self-healing | Meaningful health checks that increase resilience and awareness, and help resolve issues faster. | Single-container solution; needs involved development to achieve. |
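Since the engine exposes container-level metrics at `/metrics` and the stack scrapes metrics into Prometheus, a small client can poll that endpoint directly. The sketch below parses simple Prometheus text-format metric lines; the sample payload and metric names (`requests_total`, `queue_depth`) are illustrative assumptions, not actual Takeoff metric names.

```python
# Minimal sketch: parse simple (unlabelled) Prometheus exposition-format
# lines, as one might receive from the engine's /metrics endpoint.
# The metric names used here are hypothetical examples.

def parse_metrics(text: str) -> dict[str, float]:
    """Return {metric_name: value} for plain 'name value' lines."""
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            # Skip blanks and # HELP / # TYPE comment lines.
            continue
        name, _, value = line.rpartition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # ignore lines that are not 'name value'
    return metrics

sample = """\
# HELP requests_total Total requests served.
# TYPE requests_total counter
requests_total 42
queue_depth 3
"""

print(parse_metrics(sample))  # {'requests_total': 42.0, 'queue_depth': 3.0}
```

In a real deployment you would fetch the text over HTTP (for example with `urllib.request`) from the container's `/metrics` endpoint rather than hard-coding a sample; labelled metrics need a fuller parser such as the official `prometheus_client` library.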