CHANGELOG

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

0.15.1

  • Bugfixes and performance improvements

0.15.0

  • Distributed takeoff: distribute a set of takeoff containers over multiple machines

0.14.4

  • Snowflake Integration with Takeoff! See our docs for more information.
  • New AWQ kernels with improved performance.
  • Internal throughput optimisations.

0.14.3

  • Internal bugfixes and optimisations relating to Docker permissions when volume mounting the model cache, better Python GIL management, and token caching.

0.14.1

  • Support for Llama 3

0.14.0

  • Fully enabled SSD for static models
  • Tokenization endpoint to get tokenized text for any live reader
  • Support for Llava 1.6 models
  • Introduce new AWQ kernel with significantly lower memory overhead.
  • Updated LangChain integration: TitanTakeoff and TitanTakeoffPro are now unified, the integrations use the management API to spin up models, and text embedding support has been added with TitanTakeoffEmbed.

0.13.2

  • Fixed an issue with multi-GPU inference for models that have a bias in their attention linear layers.

0.13.1

  • Fixed the configuration issue with the entrypoint for Mistral embedding models.
  • Fixed the issue with continuous batching that was causing performance degradation.
  • Added tokenization endpoint in takeoff.

0.13.0

  • Support for inline images in image-to-text models. You can now supply an image to the image_generate (and image_generate_stream) endpoints in the form: <image:https://url.com/image.jpg>.
  • Debug script for diagnosing issues with takeoff deployments.
  • Support for Jina's long-context embedding models.
  • Support for Mistral-based embedding models.
  • Support for API-based (OpenAI) model calls from takeoff.
  • Changes to default memory usage parameters to reduce the likelihood of OOM errors.
  • Fix a bug where model downloading was not properly atomic. This means that a failed model download will no longer cause issues for subsequent launches.
  • Fix a bug where the CPU container was larger than it should have been
  • Assorted performance improvements and bugfixes
  • Remove the ability to manually specify the backend that's used by takeoff.
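
The inline image form introduced in 0.13.0 can be built with a few lines of client code. The sketch below only constructs the prompt string. The helper name with_inline_image is hypothetical, and placing the tag before the prompt text is an assumption, since the entry above gives only the tag form. The request payload for image_generate is not described in this changelog, so it is left out here.

```python
def with_inline_image(prompt: str, image_url: str) -> str:
    """Return the prompt with an inline image tag of the form <image:URL>.

    Hypothetical helper: putting the tag in front of the prompt text is an
    assumption, as the changelog specifies only the tag form itself.
    """
    # Only accept http(s) URLs, matching the example in the changelog entry.
    if not image_url.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got: {image_url!r}")
    return f"<image:{image_url}>{prompt}"


if __name__ == "__main__":
    print(with_inline_image("Describe this image.", "https://url.com/image.jpg"))
```

The resulting string would then be passed as the prompt text to image_generate or image_generate_stream. See the takeoff docs for the exact request fields.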