CHANGELOG

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

0.15.1

July 10, 2024

Bugfixes and performance improvements

0.15.0

July 9, 2024

Distributed Takeoff: distribute a set of Takeoff containers over multiple machines.

0.14.4

June 5, 2024

Snowflake Integration with Takeoff! See our docs for more information.
New AWQ kernels with improved performance.
Internal throughput optimisations.

0.14.3

May 1, 2024

Internal bugfixes and optimisations relating to Docker permissions when volume mounting the model cache, better Python GIL management, and token caching.

0.14.2

April 24, 2024

Support for question answering (QA) models in Takeoff. See the classification docs for examples of how to use QA models.

0.14.1

April 19, 2024

Support for Llama 3

0.14.0

April 17, 2024

Fully enabled SSD for static models
Tokenization endpoint to get tokenized text for any live reader
Support for Llava 1.6 models
Introduced a new AWQ kernel with significantly lower memory overhead.
Updated LangChain integration: unified TitanTakeoff and TitanTakeoffPro, integrations now use the management API to spin up models, and added text embedding support with TitanTakeoffEmbed.

0.13.2

March 27, 2024

Fixed an issue with multi-GPU inference for models that have a bias in their attention linear layers.

0.13.1

March 25, 2024

Fixed the configuration issue with the entrypoint for Mistral embedding models.
Fixed the issue with continuous batching that was causing performance degradation.
Added a tokenization endpoint to Takeoff.

0.13.0

March 21, 2024

Support for inline images in image-to-text models. You can now supply an image to the image_generate (and image_generate_stream) endpoint in the form: <image:https://url.com/image.jpg>.
Debug script for diagnosing issues with Takeoff deployments.
Support for Jina's long context embedding models.
Support for Mistral-based embedding models.
Support for API-based (OpenAI) model calls from Takeoff.
Changes to default memory usage parameters to reduce the likelihood of OOM errors.
Fix a bug where model downloading was not properly atomic. This means that a failed model download will no longer cause issues for subsequent launches.
Fix a bug where the CPU container was larger than it should have been.
Assorted performance improvements and bugfixes.
Remove the ability to manually specify the backend that's used by Takeoff.
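
The inline image form described for 0.13.0 can be sketched in Python as a small string helper. This is only an illustration: the helper name with_inline_image, placing the tag before the prompt text, and separating the two with a single space are assumptions, not something this changelog specifies. The request payload for image_generate is not shown here, so check the Takeoff docs for the exact request shape.

```python
def with_inline_image(prompt: str, image_url: str) -> str:
    """Return the prompt prefixed with an inline image tag of the form <image:URL>.

    Hypothetical helper: tag placement (before the prompt) and the single
    space separator are assumptions made for illustration.
    """
    # Reject URLs that would break the <image:...> tag syntax.
    if not image_url or ">" in image_url or any(ch.isspace() for ch in image_url):
        raise ValueError(f"URL cannot be embedded in an inline image tag: {image_url!r}")
    return f"<image:{image_url}> {prompt}"


if __name__ == "__main__":
    # Uses the placeholder URL from the changelog entry.
    print(with_inline_image("Describe this image.", "https://url.com/image.jpg"))
```

The resulting string is what would be supplied as the text input to image_generate or image_generate_stream, under the assumptions above.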