CHANGELOG
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Bugfixes and performance improvements
- Distributed takeoff: distribute a set of takeoff containers over multiple machines
- Snowflake integration with Takeoff! See our docs for more information.
- New AWQ kernels with improved performance.
- Internal throughput optimisations.
- Internal bugfixes and optimisations relating to: Docker permissions when volume mounting the model cache, better Python GIL management, and token caching.
- Fully enabled SSD for static models
- Tokenization endpoint to get tokenized text for any live reader
- Support for Llava 1.6 models
- Introduce new AWQ kernel with significantly lower memory overhead.
- Updated LangChain integration: unified TitanTakeoff and TitanTakeoffPro, integrations now use the management API to spin up models, and added text embedding support with TitanTakeoffEmbed.
- Fixed an issue with multi-GPU inference for models that have a bias in their attention linear layers.
- Fixed the configuration issue with the entrypoint for Mistral embedding models.
- Fixed the issue with continuous batching that was causing performance degradation.
- Added tokenization endpoint in takeoff.
- Support for inline images in image to text models.
You can now supply an image to the image_generate (and image_generate_stream) endpoint in the form: <image:https://url.com/image.jpg>.
- Debug script for diagnosing issues with takeoff deployments.
- Support for Jina's long context embedding models.
- Support for Mistral-based embedding models
- Support for API-based (OpenAI) model calls from takeoff.
- Changes to default memory usage parameters to reduce the likelihood of OOM errors.
- Fix a bug where model downloading was not properly atomic.
This means that a failed model download will no longer cause issues for subsequent launches.
- Fix a bug where the CPU container was larger than it should have been
- Assorted performance improvements and bugfixes
- Remove the ability to manually specify the backend that's used by takeoff.
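
As a rough illustration of the inline image entry above (the image_generate and image_generate_stream endpoints), the tag can be built as a plain string. This is only a sketch: the helper name with_inline_image is hypothetical, and placing the tag before the prompt text is an assumption. The changelog only specifies the <image:URL> form, so see the docs for the actual request format.

```python
def with_inline_image(prompt: str, image_url: str) -> str:
    """Return the prompt with an inline image tag of the form <image:URL>."""
    # The <image:...> form is taken from the changelog entry.
    # Putting the tag in front of the prompt text is an assumption.
    return f"<image:{image_url}>{prompt}"


if __name__ == "__main__":
    text = with_inline_image("Describe this picture.", "https://url.com/image.jpg")
    print(text)
```

The resulting string would then be sent as the text input to the image-to-text endpoint.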