Deployments
We offer two levels of guides you can follow to deploy your LLM applications.
| | The Takeoff Stack | The Takeoff Engine |
|---|---|---|
| Summary | Users who want LLM APIs with all production features. | Best-in-class single-container solution for running LLM models. |
| Target | Anyone looking to deploy production LLM APIs to use and build applications upon. | Researchers and developers who want to run LLM models on a single machine or node. |
| Cloud agnostic | ✅ Kubernetes engine | ✅ Docker engine |
| Metrics | ✅ Cluster-wide Prometheus instance | ⭕ Container-level metrics available at `/metrics` |
| Horizontal Scaling | ✅ Configure custom scale-up and scale-down behaviors, thresholds, and the key metrics that drive scaling decisions. | ❌ Single-container solution; requires significant development effort to achieve. |
| Scale to Zero | ✅ Set scale-to-zero policies, separate from your horizontal scaling policies, to save costs when idle. | ❌ Single-container solution; requires significant development effort to achieve. |
| Logging | ✅ Aggregate logs from all distributed instances into Loki for smart searching. | ⭕ Meaningful logs from the container streamed to standard output. |
| Alerting | ✅ Alert on request status codes, increased processing duration, or any metric exposed by the cluster. | ❌ Single-container solution; requires significant development effort to achieve. |
| Multi-stage | ✅ Deployments can be partitioned into their own namespaces or given prefixes to distinguish each stage's resources. | ❌ Single-container solution; requires significant development effort to achieve. |
| Multi-node | ✅ Uses Kubernetes, so it can operate across any collection of nodes. | ❌ Single-container solution; requires significant development effort to achieve. |
| Self-healing | ✅ Meaningful health checks that increase resilience and awareness and help resolve issues faster. | ❌ Single-container solution; requires significant development effort to achieve. |
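Both the Stack's Prometheus instance and the Engine's `/metrics` endpoint expose metrics in the Prometheus text exposition format. As a minimal sketch of what that format looks like and how you might consume it, the snippet below parses a sample scrape; the metric names in the sample are illustrative assumptions, not Takeoff's actual metric names.

```python
def parse_prometheus_text(text: str) -> dict:
    """Map each metric line (name, plus any labels) to its numeric value."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics


# Sample scrape output (hypothetical metric names for illustration).
sample = """\
# HELP requests_total Total requests served.
# TYPE requests_total counter
requests_total 42
request_duration_seconds_sum 3.5
"""

print(parse_prometheus_text(sample)["requests_total"])  # -> 42.0
```

In practice you would fetch the text from the running container's `/metrics` endpoint (or let Prometheus scrape it for you) rather than parse a hard-coded string.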