Version: 0.21.x

Supported Hardware

Takeoff is designed to work across as wide a range of hardware as possible, to lower the barrier to getting started with LLMs. However, to maximize performance on commonly used hardware, Takeoff sometimes relies on hardware-specific optimizations, so not every hardware type supports every model optimization. The biggest difference is between Ampere-generation (and later) GPUs on one side, and pre-Ampere GPUs and CPUs on the other.

Post-Ampere-specific optimizations are used for most commonly used model types and for all AWQ models. As a result, on CPUs and pre-Ampere GPUs you cannot use models such as Llama, Mistral, or Mixtral, or any AWQ model, out of the box.

Hardware Type	Can Use Base Model?	Can Use AWQ Model?
Post-Ampere GPUs	✔️	✔️
Pre-Ampere GPUs	✔️	❌
CPUs	✔️	❌
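
The compatibility table can be expressed as a simple lookup. The sketch below is illustrative only: the `HardwareType` enum, `SUPPORT_MATRIX`, and `can_run` names are hypothetical and are not part of Takeoff's API. It encodes just the two table columns, so it does not capture the separate caveat that models such as Llama, Mistral, or Mixtral need post-Ampere optimizations to work out of the box.

```python
from enum import Enum


class HardwareType(Enum):
    """Hardware categories from the compatibility table (hypothetical names)."""
    POST_AMPERE_GPU = "post-ampere-gpu"
    PRE_AMPERE_GPU = "pre-ampere-gpu"
    CPU = "cpu"


# (can use base model, can use AWQ model) for each hardware type, copied from the table above.
SUPPORT_MATRIX = {
    HardwareType.POST_AMPERE_GPU: (True, True),
    HardwareType.PRE_AMPERE_GPU: (True, False),
    HardwareType.CPU: (True, False),
}


def can_run(hardware: HardwareType, is_awq: bool) -> bool:
    """Return True if the table marks this kind of model as usable on this hardware."""
    base_ok, awq_ok = SUPPORT_MATRIX[hardware]
    return awq_ok if is_awq else base_ok


if __name__ == "__main__":
    for hw in HardwareType:
        print(f"{hw.value}: base={can_run(hw, False)}, awq={can_run(hw, True)}")
```

Keeping the matrix as data rather than branching logic makes it easy to compare against the table when the support list changes between versions.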

Pre-Ampere GPUs are those from the Turing and Volta generations or earlier. This includes the T4, V100, and Quadro RTX 8000 GPUs. It also includes any GPU from the 10xx and 20xx series.

Post-Ampere GPUs are those from the Ampere, Ada Lovelace, or Hopper generations or later. This includes the A10, A6000, A100, H100, L4, and L40S GPUs. It also includes any GPU from the 30xx and 40xx series.
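
One way to tell which group a given NVIDIA GPU falls into is its CUDA compute capability: Ampere is the first generation with a major version of 8, while Turing (7.5) and Volta (7.0) are below that. The sketch below assumes that "post-Ampere" maps to a major compute capability of 8 or higher; this is an illustrative heuristic, not Takeoff's own detection logic. The `KNOWN_GPUS` table and `is_post_ampere` function are hypothetical helpers. On a machine with PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair for the current device, which could be passed to `is_post_ampere`; pass `None` for a CPU-only machine.

```python
from typing import Optional, Tuple

# CUDA compute capability (major, minor) of the GPUs named in this section,
# taken from NVIDIA's published values and listed here for illustration.
KNOWN_GPUS = {
    "T4": (7, 5),               # Turing
    "V100": (7, 0),             # Volta
    "Quadro RTX 8000": (7, 5),  # Turing
    "A10": (8, 6),              # Ampere
    "A6000": (8, 6),            # Ampere
    "A100": (8, 0),             # Ampere
    "H100": (9, 0),             # Hopper
    "L4": (8, 9),               # Ada Lovelace
    "L40S": (8, 9),             # Ada Lovelace
}

# Assumption: Ampere is the first generation with compute capability 8.x,
# so "Ampere or later" is treated as a major version >= 8.
AMPERE_MIN_MAJOR = 8


def is_post_ampere(capability: Optional[Tuple[int, int]]) -> bool:
    """Return True for Ampere-or-later GPUs; None stands for a CPU-only machine."""
    if capability is None:
        return False
    major, _minor = capability
    return major >= AMPERE_MIN_MAJOR


if __name__ == "__main__":
    for name, cap in KNOWN_GPUS.items():
        group = "post-Ampere" if is_post_ampere(cap) else "pre-Ampere"
        print(f"{name} (compute {cap[0]}.{cap[1]}): {group}, AWQ supported: {is_post_ampere(cap)}")
    print(f"CPU: AWQ supported: {is_post_ampere(None)}")
```

Checking the major version alone is deliberately coarse; it groups Ampere, Ada Lovelace, and Hopper together, matching how this section groups them.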