Version: 0.13.x

Model management via config manifest

Takeoff can be configured to run multiple models on a single machine by specifying each reader individually. The readers are specified as an array (readers_config) of ReaderConfigs within a config.yaml file, which is then mounted to /code/config.yaml inside the Takeoff container. More details on using config manifest files can be found here.

Example

This example launches two consumer groups, one for embedding and one for generation. The embedding group holds a single embedding model, and the generation group holds two copies of one Llama model, so three models are hosted concurrently on a single machine. A copy of Llama-2-7b is placed on each of the available GPUs, and the smaller embedding model is hosted on the CPU, all administered from a single Takeoff container.

config.yaml
takeoff:
  server_config: #Shared across readers
    batch_duration_millis: 200
    max_batch_size: 64 #Will apply across all embedding models (there's only 1 here though)
  readers_config:
    reader1:
      model_name: "intfloat/e5-small-v2"
      device: "cpu"
      consumer_group: "embed"
      max_sequence_length: 1024
    reader2:
      model_name: "meta-llama/Llama-2-7b-chat-hf"
      device: "cuda"
      quant_type: "awq"
      consumer_group: "generate"
      max_batch_size: 32
      max_sequence_length: 1024
      cuda_visible_devices: "0" #Put on first gpu i.e. with device_id 0
    reader3:
      model_name: "meta-llama/Llama-2-7b-chat-hf"
      device: "cuda"
      quant_type: "awq"
      consumer_group: "generate"
      max_batch_size: 32
      max_sequence_length: 1024
      cuda_visible_devices: "1"
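
Before mounting a manifest like the one above, it can help to check how the readers fall into consumer groups and GPUs. The following is a minimal sketch, not part of Takeoff: the helper names `group_readers` and `gpu_assignments` are hypothetical, and the config is mirrored as a plain Python dict (an assumption made so the sketch needs no YAML parser), with only the fields relevant to placement.

```python
from collections import defaultdict

# Hand-written mirror of the readers_config section of the example config.yaml.
# Only the fields relevant to grouping and device placement are included.
READERS_CONFIG = {
    "reader1": {
        "model_name": "intfloat/e5-small-v2",
        "device": "cpu",
        "consumer_group": "embed",
    },
    "reader2": {
        "model_name": "meta-llama/Llama-2-7b-chat-hf",
        "device": "cuda",
        "consumer_group": "generate",
        "cuda_visible_devices": "0",
    },
    "reader3": {
        "model_name": "meta-llama/Llama-2-7b-chat-hf",
        "device": "cuda",
        "consumer_group": "generate",
        "cuda_visible_devices": "1",
    },
}


def group_readers(readers_config):
    """Return {consumer_group: [reader names]} in config order (hypothetical helper)."""
    groups = defaultdict(list)
    for name, reader in readers_config.items():
        groups[reader["consumer_group"]].append(name)
    return dict(groups)


def gpu_assignments(readers_config):
    """Return {cuda_visible_devices value: [reader names]} for CUDA readers only."""
    assignments = defaultdict(list)
    for name, reader in readers_config.items():
        if reader["device"] == "cuda":
            assignments[reader["cuda_visible_devices"]].append(name)
    return dict(assignments)


if __name__ == "__main__":
    for group, names in group_readers(READERS_CONFIG).items():
        print(f"{group}: {', '.join(names)}")
    for gpu, names in gpu_assignments(READERS_CONFIG).items():
        print(f"gpu {gpu}: {', '.join(names)}")
```

For the example manifest, the grouping shows one reader in the embed group and two in the generate group, with the two generation readers on separate GPUs.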

This file can then be mounted into the container, and Takeoff launched with docker run. Note that in this example we also forward port 3001, allowing us to manage the launched readers via the Management API.

Example launch with multiple models
# Ports: 3000 is forwarded for inference, 3001 for model management.
# Volumes: the models folder and the config file are mounted into the container.
# Image: specify the gpu or cpu image.
docker run --gpus all \
    -p 3000:3000 \
    -p 3001:3001 \
    -v ~/.takeoff_cache:/code/models \
    -v ./config.yaml:/code/config.yaml \
    tytn/takeoff-pro:0.13.1-gpu
