Version: 0.21.x

Deployments

We have two levels of guides that you can follow to deploy your LLM applications.

| | The Takeoff Stack | The Takeoff Engine |
|---|---|---|
| Summary | For users who want LLM APIs with all production features. | Best-in-class single-container solution for running LLM models. |
| Target | Anyone looking to deploy production LLM APIs to use and build applications upon. | Researchers/developers who want to run LLM models on a machine or a single node. |
| Cloud agnostic | Kubernetes engine | Docker engine |
| Metrics | Cluster-wide Prometheus instance. | Container-level metrics available at `/metrics`. |
| Horizontal Scaling | Set custom scale-up and scale-down behaviours and thresholds, and decide which key metrics to base scaling decisions on. | Single-container solution; needs involved development to achieve. |
| Scale to Zero | Set additional policies, separate from Horizontal Scaling, to scale to zero and save costs. | Single-container solution; needs involved development to achieve. |
| Logging | Accrue logs from all distributed instances into Loki for smart searching. | Meaningful logs from the container streamed to standard output. |
| Alerting | Alert on request status codes, increased processing duration, or any metric exposed from the cluster. | Single-container solution; needs involved development to achieve. |
| Multi-stage | Different deployments can be partitioned into their own namespaces, or given prefixes to distinguish each stage's resources. | Single-container solution; needs involved development to achieve. |
| Multi-node | Uses Kubernetes, so it can operate across any collection of nodes. | Single-container solution; needs involved development to achieve. |
| Self-healing | Meaningful health checks that increase resilience and awareness, and help resolve issues faster. | Single-container solution; needs involved development to achieve. |
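Since the engine exposes container-level metrics at `/metrics` and the stack scrapes metrics into Prometheus, a small client can poll that endpoint directly. The sketch below parses simple Prometheus text-format metric lines; the sample payload and metric names (`requests_total`, `queue_depth`) are illustrative assumptions, not actual Takeoff metric names.

```python
# Minimal sketch: parse simple (unlabelled) Prometheus exposition-format
# lines, as one might receive from the engine's /metrics endpoint.
# The metric names used here are hypothetical examples.

def parse_metrics(text: str) -> dict[str, float]:
    """Return {metric_name: value} for plain 'name value' lines."""
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            # Skip blanks and # HELP / # TYPE comment lines.
            continue
        name, _, value = line.rpartition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # ignore lines that are not 'name value'
    return metrics

sample = """\
# HELP requests_total Total requests served.
# TYPE requests_total counter
requests_total 42
queue_depth 3
"""

print(parse_metrics(sample))  # {'requests_total': 42.0, 'queue_depth': 3.0}
```

In a real deployment you would fetch the text over HTTP (for example with `urllib.request`) from the container's `/metrics` endpoint rather than hard-coding a sample; labelled metrics need a fuller parser such as the official `prometheus_client` library.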