0.17.0

  • Added a detokenization endpoint: use Takeoff to turn tokens back into text.
  • Enhanced Gemma 2 support.
  • Chunked prefilling is now enabled by default.
  • Various internal optimizations: throughput should increase across Takeoff.
  • Decreased memory usage for prefix caching.
  • Fixed chat templates for distributed Takeoff setups.
  • Fixed a bug that could reduce performance for long-context Llama 3.1.
  • Fixed some overly verbose logging.
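
For the new detokenization endpoint, a minimal client-side sketch follows. The endpoint path (`/detokenize`), payload shape (`{"tokens": [...]}`), and default port are assumptions for illustration, not confirmed by these notes; check your Takeoff deployment's API reference for the actual schema.

```python
import json

# Hypothetical helper: builds the URL and JSON body for a detokenization
# request. Path and payload field names are assumed, not documented here.
def build_detokenize_request(tokens, base_url="http://localhost:3000"):
    url = f"{base_url}/detokenize"
    body = json.dumps({"tokens": tokens})
    return url, body

url, body = build_detokenize_request([101, 7592, 102])
# To send (requires a running Takeoff server):
#   requests.post(url, data=body, headers={"Content-Type": "application/json"})
```

The helper only constructs the request; the actual HTTP call is left commented out so the sketch runs without a live server.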