0.17.0

  • Added a detokenization endpoint: use Takeoff to turn tokens back into text.
  • Enhanced Gemma 2 support.
  • Chunked prefilling is now enabled by default.
  • Various internal optimizations: throughput should increase across Takeoff.
  • Decreased memory usage for prefix caching.
  • Fixed chat templates for distributed Takeoff setups.
  • Fixed a bug that could reduce performance for long-context Llama 3.1.
  • Fixed some overly verbose logging.
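
For the new detokenization endpoint, a minimal client-side sketch follows. The endpoint path (`/detokenize`), payload shape (`{"tokens": [...]}`), and default port are assumptions for illustration, not confirmed by these notes; check your Takeoff deployment's API reference for the actual schema.

```python
import json

# Hypothetical helper: builds the URL and JSON body for a detokenization
# request. Path and payload field names are assumed, not documented here.
def build_detokenize_request(tokens, base_url="http://localhost:3000"):
    url = f"{base_url}/detokenize"
    body = json.dumps({"tokens": tokens})
    return url, body

url, body = build_detokenize_request([101, 7592, 102])
# To send (requires a running Takeoff server):
#   requests.post(url, data=body, headers={"Content-Type": "application/json"})
```

The helper only constructs the request; the actual HTTP call is left commented out so the sketch runs without a live server.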