
Model Cards


In this section, we showcase our domain-specific models through interactive model cards that provide essential information at a glance. Each model card highlights key attributes such as model type, size, compatible hardware, and use cases, allowing users to quickly explore and evaluate the models available in our library.


Hugging Face models


Takeoff supports most generation and embedding models natively supported by Hugging Face Transformers, which includes most models available on the Hugging Face Hub.

Models from the Llama-2, Mistral or Mixtral families benefit from further optimisations on top of the base Takeoff optimisations.

Multi-GPU support is also available for models from these families, enabled by specifying the devices to use with the TAKEOFF_CUDA_VISIBLE_DEVICES variable.
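As a minimal sketch of how such a launch command might be assembled, the Python helper below builds a docker run argument list that sets TAKEOFF_CUDA_VISIBLE_DEVICES. The helper name build_takeoff_command is hypothetical, and the comma-separated value format (mirroring the usual CUDA_VISIBLE_DEVICES convention) is an assumption rather than something this page confirms; the image tag and the other variables are taken from the docker example later on this page.

```python
import shlex


def build_takeoff_command(model_name, device_ids, image="tytn/takeoff-pro:0.20.0-gpu"):
    """Assemble a `docker run` argument list for a multi-GPU Takeoff launch.

    Hypothetical helper for illustration only; it is not part of Takeoff.
    """
    if not device_ids:
        raise ValueError("at least one GPU index is required")
    # ASSUMPTION: the variable takes comma-separated GPU indices, like CUDA_VISIBLE_DEVICES.
    visible = ",".join(str(i) for i in device_ids)
    return [
        "docker", "run", "--gpus", "all",
        "-e", f"TAKEOFF_MODEL_NAME={model_name}",
        "-e", "TAKEOFF_DEVICE=cuda",
        "-e", f"TAKEOFF_CUDA_VISIBLE_DEVICES={visible}",
        image,
    ]


# Print the command so it can be reviewed before running it by hand.
command = build_takeoff_command("mistralai/Mistral-7B-v0.3", [0, 1])
print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so it can also be passed directly to subprocess.run without shell quoting issues.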

AWQ-quantized models are supported, and AWQ is the recommended way to run large models on smaller hardware. Read more about AWQ and Quantization here. Suitable AWQ models are available here.

Choosing the right model


Selecting the right model means optimising performance within your hardware constraints. Models are often released in several sizes and can be quantized to different levels, each of which affects performance and memory usage. We discuss balancing these factors in more detail here.

To help you avoid Out of Memory errors, we have also created a memory calculator that will estimate the amount of memory a model will use. This can be accessed from the Takeoff inference GUI. You can also specify your hardware's specifications to determine if a specific model can be run on your configuration. See more about using the calculator here.
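To give a feel for the arithmetic involved, here is a rough Python sketch of a weights-only estimate. It is not the Takeoff memory calculator: it only multiplies the parameter count by the bits stored per parameter, and the overhead_fraction margin for activations and cache is a hypothetical placeholder, not a measured value.

```python
def estimate_weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed to hold the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1024**3


def fits_on_gpu(num_params, bits_per_param, gpu_memory_gib, overhead_fraction=0.2):
    """Rough check: weights plus an assumed margin must fit in GPU memory.

    overhead_fraction is an illustrative guess, not a Takeoff figure.
    """
    needed = estimate_weight_memory_gib(num_params, bits_per_param) * (1 + overhead_fraction)
    return needed <= gpu_memory_gib


# Compare a 7B-parameter model at 16-bit and 4-bit precision on a 24 GiB card.
for bits in (16, 4):
    size = estimate_weight_memory_gib(7e9, bits)
    print(f"7B params at {bits}-bit: {size:.1f} GiB, fits on 24 GiB: {fits_on_gpu(7e9, bits, 24)}")
```

The estimate shows why quantization matters: storing each parameter in 4 bits rather than 16 cuts the weight footprint to a quarter.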

Using your own models


How can I use a model I have saved locally?

If you have already fine-tuned a model, you may want to run it in the Takeoff server instead of a Hugging Face model. There are two ways to do this.

Save the model locally and volume mount it.

Example:

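Let's say we have trained a model and saved it locally, for example using this Python code:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and its tokenizer ('...' is the checkpoint you started from)
model = AutoModelForCausalLM.from_pretrained('...')
tokenizer = AutoTokenizer.from_pretrained('...')

# Your training code ...

# Write the tokenizer and model files to ./my_model
tokenizer.save_pretrained('my_model')
model.save_pretrained('my_model')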

Then, when starting the Takeoff server from the command line, mount the model directory onto Takeoff's internal /code/models/jf folder:

docker run --gpus all \
    -v /path/to/<my_model>:/code/models/jf/<my_model> \
    -e TAKEOFF_MODEL_NAME=my_model \
    -e TAKEOFF_DEVICE=cuda \
    tytn/takeoff-pro:0.20.0-gpu
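Before mounting, it can help to confirm that the local directory actually contains what save_pretrained normally writes. The sketch below is a hedged sanity check, not a Takeoff requirement: the function name check_saved_model_dir is hypothetical, and the file names it looks for (config.json, tokenizer_config.json, and *.safetensors or *.bin weights) reflect typical Hugging Face Transformers output rather than anything Takeoff documents.

```python
from pathlib import Path


def check_saved_model_dir(model_dir):
    """Return a list of problems; an empty list means the directory looks complete."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # Files typically written by model.save_pretrained / tokenizer.save_pretrained.
    for name in ("config.json", "tokenizer_config.json"):
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    # Weights may be saved as safetensors or as PyTorch .bin files, possibly sharded.
    has_weights = any(root.glob("*.safetensors")) or any(root.glob("*.bin"))
    if not has_weights:
        problems.append("no *.safetensors or *.bin weight files found")
    return problems


if __name__ == "__main__":
    # 'my_model' matches the directory name used in the example above.
    for problem in check_saved_model_dir("my_model"):
        print(problem)
```

If the function returns an empty list, the directory is at least structurally plausible and can be passed to the -v mount shown above.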

Supported Models Table


Here's a list of the supported models, categorized by size, that are readily available on our Hugging Face Hub. Generative models and VLMs are divided into three categories by parameter count: Small (fewer than 5 billion parameters), Medium (between 5 billion and 13 billion parameters), and Large (more than 13 billion parameters). Embedding and reranker models use smaller thresholds: Small (fewer than 100 million parameters), Medium (between 100 million and 300 million), and Large (more than 300 million).

Our Hugging Face Hub hosts various pre-trained models optimized for different use cases, from natural language processing tasks like text generation, question answering, and embeddings, to advanced multi-modal models for specific domains. This allows developers, researchers, and machine learning engineers to deploy these models directly or fine-tune them according to their needs, saving time and computational resources.

We strive to keep these models updated and continuously add new versions with optimized performance and smaller memory footprints (like quantized models). If you're looking for a model that fits your project’s specific requirements, explore the comprehensive list below:

| Model Size | Generative | VLM | Model Size | Embedding | Reranker |
| --- | --- | --- | --- | --- | --- |
| Small (< 5B) | TitanML/Qwen2-1.5B | TitanML/Qwen2-VL-1.5B | Small (< 100m) | jinaai/jina-embeddings-v2-small-en | jinaai/jina-reranker-v1-tiny-en |
| | TitanML/Qwen2-Math-1.5B | TitanML/InternVL-1B | | sentence-transformers/all-MiniLM-L6-v2 | jinaai/jina-reranker-v1-turbo-en |
| | TitanML/gemma-2-2b | google/paligemma-3b-pt-896 | | mixedbread-ai/mxbai-embed-large-v1 | |
| | Qwen/Qwen2-0.5B | Qwen/Qwen2-VL-2B-Instruct | | | |
| | | OpenGVLab/InternVL2-4B | | | |
| Medium (5B ~ 13B) | TitanML/Meta-Llama-3.1-8B | llava-hf/llava-v1.6-mistral-7b-hf | Medium (100m ~ 300m) | TitanML/jina-v2-base-en-embed | BAAI/bge-reranker-base |
| | TitanML/Qwen2-7B | llava-hf/llama3-llava-next-8b-hf | | jinaai/jina-embeddings-v2-base-en | ibm/re2g-reranker-nq |
| | TitanML/Mistral-7B-Instruct-v0.3-AWQ-4bit | llava-hf/llava-1.5-7b-hf | | intfloat/multilingual-e5-small | |
| | mistralai/Mistral-7B-v0.3 | OpenGVLab/InternVL2-8B | | nomic-ai/nomic-embed-text-v1.5 | |
| Large (> 13B) | TitanML/Meta-Llama-3.1-70B-Instruct | TitanML/InternVL2-Llama3-76B-AWQ | Large (> 300m) | intfloat/multilingual-e5-large-instruct | BAAI/bge-reranker-v2-m3 |
| | meta-llama/Meta-Llama-3.1-405B-Instruct | Qwen/Qwen2-VL-72B-Instruct | | BAAI/bge-large-en-v1.5 | BAAI/bge-reranker-v2-gemma |
| | TitanML/Qwen-72B-Chat | llava-hf/llava-1.5-13b-hf | | BAAI/bge-reranker-v2.5-gemma2-lightweight | openbmb/MiniCPM-Reranker |
| | Qwen/Qwen2-57B-A14B | OpenGVLab/InternVL2-26B | | | |
| | | OpenGVLab/InternVL2-40B | | | |