
Running Takeoff with LoRAs


This guide covers how to start Takeoff with Low-Rank Adapter (LoRA) modules attached to a model. LoRAs are a popular way to fine-tune LLMs at low cost while performing comparably to the much more resource-intensive full fine-tuning.

LoRA training works by freezing the original weights of a model and training only small adapters that are added to it. These adapters are much smaller than the original model, typically 1-5% of its size.
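As a rough illustration of where that saving comes from: a LoRA replaces the full update of a d x k weight matrix with two low-rank factors, B (d x r) and A (r x k). The back-of-the-envelope comparison below uses hypothetical dimensions, not those of any particular model:

import math

# Trainable parameters for one d x k weight matrix, full fine-tuning vs. rank-r LoRA.
d, k, r = 4096, 4096, 16
full_ft = d * k           # every weight in the matrix is trainable
lora = r * (d + k)        # B is d x r, A is r x k
print(f"LoRA trains {lora / full_ft:.2%} of the parameters")  # ~0.78%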

You can load many LoRAs simultaneously with a single model, and specify in each request which LoRA you want to interact with.

Launching Takeoff with LoRAs

To launch a container that attaches LoRAs to a generative model, add an environment variable called TAKEOFF_LORAS. This is a comma-separated list of Hugging Face LoRA repository names, passed as a single string.

docker run \
-e TAKEOFF_MODEL_NAME=meta-llama/Llama-3.2-3B \
-e TAKEOFF_DEVICE=cuda \
-e TAKEOFF_LORAS=hf-repo/your-lora-1,hf-repo/your-lora-2,hf-repo/your-lora-3 \
-p 3000:3000 \
-v ~/.takeoff_cache:/code/models \
-it \
--gpus all \
tytn/takeoff-pro:0.21.2-gpu

The above command attaches the three specified LoRAs to the base model.

If you are using a manifest file, you can pass in the LoRAs in the following way:

takeoff:
  server_config:
  readers_config:
    reader1:
      model_name: "meta-llama/Llama-3.2-3B"
      loras: "hf-repo/your-lora-1,hf-repo/your-lora-2,hf-repo/your-lora-3"
      device: "cuda"
      consumer_group: "generate"

Restrictions on LoRAs

To use a LoRA with a model, the LoRA's base model must match the model specified in the TAKEOFF_MODEL_NAME parameter. You can check that the base models match by looking in the LoRA's adapter_config.json file: the base_model_name_or_path field must match the model name. Beyond that, the LoRAs may apply to different weights within the model and may use different values of r and alpha, and they can still be served in parallel.
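If you want to verify this before launching the container, here is a minimal sketch using the huggingface_hub client (the repository names are the placeholders from the examples above):

import json
from huggingface_hub import hf_hub_download

BASE_MODEL = "meta-llama/Llama-3.2-3B"
LORAS = ["hf-repo/your-lora-1", "hf-repo/your-lora-2", "hf-repo/your-lora-3"]

for repo in LORAS:
    # adapter_config.json is written alongside the adapter weights when it is saved
    path = hf_hub_download(repo_id=repo, filename="adapter_config.json")
    with open(path) as f:
        cfg = json.load(f)
    if cfg["base_model_name_or_path"] != BASE_MODEL:
        raise ValueError(
            f"{repo} was trained against {cfg['base_model_name_or_path']}, not {BASE_MODEL}"
        )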

Interacting with LoRAs

When you make a generation request, you can target a particular LoRA attached to the model by specifying the lora_id parameter in the request.

Here is an example request that requests a specific LoRA:

curl -X POST \
"http://localhost:3000/generate_stream" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d "{
\"text\":\"List 3 things to do in London.\",
\"sampling_temperature\":0.1,
\"lora_id\":\"hf-repo/your-lora-1\"
}"

This will use the LoRA named hf-repo/your-lora-1. If you specify no LoRA, the request is sent to the base model, with no LoRAs attached.
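The same request can be made from Python. This is a sketch using the requests library, assuming the server is reachable on localhost:3000 as in the curl example above:

import requests

response = requests.post(
    "http://localhost:3000/generate_stream",
    json={
        "text": "List 3 things to do in London.",
        "sampling_temperature": 0.1,
        "lora_id": "hf-repo/your-lora-1",  # drop this key to query the base model
    },
    stream=True,  # the endpoint streams tokens back as they are generated
)
response.raise_for_status()
for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
    print(chunk, end="")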