Version: 0.21.x

Model Management via Config Manifest


Takeoff can be configured to run multiple models on a single machine by specifying each reader individually. Readers are listed under readers_config as named ReaderConfig entries in a config.yaml file. That file is then mounted to /code/config.yaml inside the Takeoff container. More details on using config manifest files can be found here.

Example

This example launches two consumer groups, one for embedding and one for generation. The embedding group contains a single embedding model. The generation group contains two copies of one LLaMA model. That gives three readers hosted concurrently on a single machine. On a machine with two GPUs, one copy of Llama-2-7b is placed on each GPU and the smaller embedding model runs on the CPU, all administered from a single Takeoff container.

config.yaml

```yaml
takeoff:
  server_config: # Shared across readers

  readers_config:
    reader1:
      model_name: "intfloat/e5-small-v2"
      device: "cpu"
      consumer_group: "embed"
      max_sequence_length: 1024
      max_batch_size: 64
    reader2:
      model_name: "meta-llama/Llama-2-7b-chat-hf"
      device: "cuda"
      quant_type: "awq"
      consumer_group: "generate"
      max_batch_size: 32
      max_sequence_length: 1024
      cuda_visible_devices: "0" # Put on the first GPU, i.e. device_id 0
    reader3:
      model_name: "meta-llama/Llama-2-7b-chat-hf"
      device: "cuda"
      quant_type: "awq"
      consumer_group: "generate"
      max_batch_size: 32
      max_sequence_length: 1024
      cuda_visible_devices: "1" # Put on the second GPU, i.e. device_id 1
```
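The readers_config above follows a regular pattern: one CPU embedding reader, plus one generation reader per GPU, each pinned with cuda_visible_devices. This sketch uses a hypothetical helper, build_readers_config, which is not part of Takeoff. It rebuilds that pattern as a plain Python dict for an arbitrary GPU count. It assumes that every GPU reader uses the same settings as in the example. Serializing the result to config.yaml would need a YAML library such as PyYAML's yaml.safe_dump, which is assumed to be installed separately.

```python
# Hypothetical helper (not part of Takeoff): rebuilds the readers_config
# pattern from the config.yaml example for any number of GPUs.
def build_readers_config(
    gen_model: str = "meta-llama/Llama-2-7b-chat-hf",
    embed_model: str = "intfloat/e5-small-v2",
    num_gpus: int = 2,
) -> dict:
    # One embedding reader on the CPU, in the "embed" consumer group.
    readers = {
        "reader1": {
            "model_name": embed_model,
            "device": "cpu",
            "consumer_group": "embed",
            "max_sequence_length": 1024,
            "max_batch_size": 64,
        }
    }
    # One copy of the generation model per GPU, all in the "generate" group,
    # each pinned to a single device index via cuda_visible_devices.
    for gpu_id in range(num_gpus):
        readers[f"reader{gpu_id + 2}"] = {
            "model_name": gen_model,
            "device": "cuda",
            "quant_type": "awq",
            "consumer_group": "generate",
            "max_batch_size": 32,
            "max_sequence_length": 1024,
            "cuda_visible_devices": str(gpu_id),
        }
    # server_config is left empty (null), as in the example.
    return {"takeoff": {"server_config": None, "readers_config": readers}}


if __name__ == "__main__":
    readers = build_readers_config()["takeoff"]["readers_config"]
    for name, reader in readers.items():
        if reader["device"] == "cpu":
            where = "cpu"
        else:
            where = f"GPU {reader['cuda_visible_devices']}"
        print(f"{name}: {reader['consumer_group']} {reader['model_name']} on {where}")
```

With the defaults, this reproduces the three readers from the example. Raising num_gpus adds one more "generate" reader per extra device.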

This file is then mounted into the container, and Takeoff is launched with docker run. This example also forwards port 3001, which lets you manage the launched readers through the Management API.

Example launch with multiple models

```bash
# -p 3000:3000                         forward the inference port
# -p 3001:3001                         forward the model management (Management API) port
# -v ~/.takeoff_cache:/code/models     volume mount for the models folder
# -v ./config.yaml:/code/config.yaml   volume mount for the config file
# Choose the gpu or cpu image tag as appropriate.
docker run --gpus all \
  -p 3000:3000 \
  -p 3001:3001 \
  -v ~/.takeoff_cache:/code/models \
  -v ./config.yaml:/code/config.yaml \
  tytn/takeoff-pro:0.21.0-gpu
```
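If you launch Takeoff from a script, one option is to build the same docker run invocation as an argument list. Each flag can then carry its own comment, which a backslash-continued shell line cannot. This is a minimal sketch: build_takeoff_cmd is a hypothetical helper, and the flags and image tag are taken from the command above. The list is passed to subprocess without a shell. For that reason the sketch expands "~" itself and makes the config path absolute.

```python
import os
import shlex


def build_takeoff_cmd(
    image: str = "tytn/takeoff-pro:0.21.0-gpu",
    cache_dir: str = "~/.takeoff_cache",
    config_path: str = "./config.yaml",
) -> list[str]:
    # No shell is involved when this list is passed to subprocess, so "~"
    # would not be expanded; resolve both host paths explicitly.
    cache_dir = os.path.expanduser(cache_dir)
    config_path = os.path.abspath(config_path)
    return [
        "docker", "run",
        "--gpus", "all",
        "-p", "3000:3000",  # inference port
        "-p", "3001:3001",  # Management API port
        "-v", f"{cache_dir}:/code/models",  # model cache
        "-v", f"{config_path}:/code/config.yaml",  # config manifest
        image,  # gpu or cpu image
    ]


if __name__ == "__main__":
    cmd = build_takeoff_cmd()
    # Print a copy-pasteable shell line; to launch instead, use
    # subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing the list straight to subprocess.run avoids shell quoting issues entirely. shlex.join is only used here to display an equivalent shell command.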