Version: 0.20.x

Model Cards

In this section, we showcase our domain-specific models through interactive model cards that provide essential information at a glance. Each model card highlights key attributes such as model type, size, compatible hardware, and use cases, allowing users to quickly explore and evaluate the models available in our library.

Popular models

Huggingface models

Takeoff supports most generation & embedding models natively supported by HuggingFace Transformers, which includes most models available on the HuggingFace Hub.

Models from the Llama-2, Mistral or Mixtral families benefit from further optimisations to the base Takeoff optimisations.

Multi-GPU support is also available for models from these families, enabled by specifying the devices to use with the TAKEOFF_CUDA_VISIBLE_DEVICES variable.

Models which are quantized using AWQ are supported, with AWQ being the recommended method with which to run large models on smaller hardware. Read more about AWQ and Quantization here. Suitable AWQ models are available here.

Choosing the right model

Selecting the right model requires optimising performance under your hardware constraints. Models are often issued in different sizes, and can be quantized to different levels, each affecting the performance and memory usage. We discuss balancing these factors in more details here.

To help you avoid Out of Memory errors, we have also created a memory calculator that will estimate the amount of memory a model will use. This can be accessed from the Takeoff inference GUI. You can also specify your hardware's specifications to determine if a specific model can be run on your configuration. See more about using the calculator here.

Using your own models

How can I use a model I have saved locally?

If you have fine-tuned a model already, you might want to run that in the Takeoff server instead of a huggingface model. There are two ways to do this.

Volume mounting
Upload to Huggingface

Save the model locally and volume mount it.

Example:

Lets say we have trained a model and saved it locally. For example, using this python code:

model = AutoModelForCausalLM.from_pretrained('...')

# Your training code ...

tokenizer.save_pretrained('my_model')
model.save_pretrained('my_model')

Then on the command line when running the takeoff server you can mount the model directory onto Takeoff's internal /code/models/jf folder.

docker run --gpus all \
    -v /path/to/\<my_model>:/code/models/jf/<my_model> \
    -e TAKEOFF_MODEL_NAME=my_model \
    -e TAKEOFF_DEVICE=cuda \
    tytn/takeoff-pro:0.20.0-gpu

Upload the model to a private Huggingface Hub, and pass in your token to allow Takeoff to download the model.

Example:

docker run --gpus all \
    -e TAKEOFF_MODEL_NAME=<My-HF-Account/My-Model> \
    -e TAKEOFF_ACCESS_TOKEN=<My-HF-Token> \
    -e TAKEOFF_DEVICE=cuda \
    tytn/takeoff-pro:0.20.0-gpu

Supported Models Table

Here's a list of all the supported models categorized by size that are readily available on our Hugging Face Hub. These models are divided into three categories based on their parameter size: Small Models (less than 5 billion parameters), Medium Models (between 5 billion and 13 billion parameters), and Large Models (more than 13 billion parameters).

Our Hugging Face Hub hosts various pre-trained models optimized for different use cases, from natural language processing tasks like text generation, question answering, and embeddings, to advanced multi-modal models for specific domains. This allows developers, researchers, and machine learning engineers to deploy these models directly or fine-tune them according to their needs, saving time and computational resources.

We strive to keep these models updated and continuously add new versions with optimized performance and smaller memory footprints (like quantized models). If you're looking for a model that fits your project’s specific requirements, explore the comprehensive list below:

Model Size	Generative	VLM	Model Size	Embedding	Reranker
Small (< 5B)	TitanML/Qwen2-1.5B	TitanML/Qwen2-VL-1.5B	Small (< 100m)	jinaai/jina-embeddings-v2-small-en	jinaai/jina-reranker-v1-tiny-en
	TitanML/Qwen2-Math-1.5B	TitanML/InternVL-1B		sentence-transformers/all-MiniLM-L6-v2	jinaai/jina-reranker-v1-turbo-en
	TitanML/gemma-2-2b	google/paligemma-3b-pt-896		mixedbread-ai/mxbai-embed-large-v1
	Qwen/Qwen2-0.5B	Qwen/Qwen2-VL-2B-Instruct
		OpenGVLab/InternVL2-4B
Medium (5B ~ 13B)	TitanML/Meta-Llama-3.1-8B	llava-hf/llava-v1.6-mistral-7b-hf	Medium(100m ~ 300m)	TitanML/jina-v2-base-en-embed	BAAI/bge-reranker-base
	TitanML/Qwen2-7B	llava-hf/llama3-llava-next-8b-hf		jinaai/jina-embeddings-v2-base-en	ibm/re2g-reranker-nq
	TitanML/Mistral-7B-Instruct-v0.3-AWQ-4bit	llava-hf/llava-1.5-7b-hf		intfloat/multilingual-e5-small
	mistralai/Mistral-7B-v0.3	OpenGVLab/InternVL2-8B		nomic-ai/nomic-embed-text-v1.5
Large (> 13B)	TitanML/Meta-Llama-3.1-70B-Instruct	TitanML/InternVL2-Llama3-76B-AWQ	Large(> 300m)	intfloat/multilingual-e5-large-instruct	BAAI/bge-reranker-v2-m3
	meta-llama/Meta-Llama-3.1-405B-Instruct	Qwen/Qwen2-VL-72B-Instruct		BAAI/bge-large-en-v1.5	BAAI/bge-reranker-v2-gemma
	TitanML/Qwen-72B-Chat	llava-hf/llava-1.5-13b-hf		BAAI/bge-reranker-v2.5-gemma2-lightweight	openbmb/MiniCPM-Reranker
	Qwen/Qwen2-57B-A14B	OpenGVLab/InternVL2-26B
		OpenGVLab/InternVL2-40B