Takeoff 0.21.2 is released 🎉 Speak with us to find out more: hello@titanml.co
- Added a detokenization endpoint: use Takeoff to turn token IDs back into text.
- Enhanced Gemma 2 support.
- Chunked prefilling is now enabled by default.
- Various internal optimizations: you should see increased throughput across Takeoff.
- Decreased memory usage for prefix caching.
- Fixed chat templates for distributed Takeoff setups.
- Fixed a bug that could reduce performance with long-context Llama 3.1.
- Fixed some overly verbose logging.
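The new detokenization endpoint can be exercised over HTTP. A minimal Python sketch follows; the `/detokenize` path, the default port, and the `{"tokens": ...}` request/response shape are assumptions for illustration, not confirmed Takeoff API details — check the Takeoff docs for the exact route.

```python
import json
import urllib.request

# Sketch only: the /detokenize path and the {"tokens": ...} payload shape
# are assumptions, not taken from the Takeoff documentation.

def build_detokenize_request(tokens):
    """Build the JSON body for a detokenization call."""
    return {"tokens": tokens}

def detokenize(tokens, base_url="http://localhost:3000"):
    """POST token IDs to the assumed detokenization route and return the text."""
    body = json.dumps(build_detokenize_request(tokens)).encode()
    req = urllib.request.Request(
        f"{base_url}/detokenize",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

For example, `detokenize([101, 2023, 102])` would return the decoded string for those token IDs, assuming a Takeoff server is listening locally.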