0.5.0
Features
This release focused on tools for integrating RAG functionality into Takeoff. We add support for embedding models with the BERT architecture, which gives an easy way to embed thousands of documents quickly. A single GPU can host a BERT model alongside one or more generative models, so one GPU can power multiple applications.
We also introduce controlled generation to the API. You can specify a regex string or a JSON schema in the API request, and the output is guaranteed to match that regex or schema.
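As a rough illustration of what this guarantee means for a caller, the sketch below builds a request body carrying a constraint and checks sample outputs against it. The field names `text`, `regex_string` and `json_schema`, and the helpers `build_request`, `matches_regex` and `has_required_keys`, are assumptions made for this example; they are not taken from these notes, so check the API reference for the actual parameter names. No request is sent.

```python
import json
import re


def build_request(prompt, regex=None, schema=None):
    """Build a generation request body with an optional output constraint.

    The keys "text", "regex_string" and "json_schema" are hypothetical names
    used for illustration only.
    """
    body = {"text": prompt}
    if regex is not None:
        re.compile(regex)  # fail early if the pattern itself is invalid
        body["regex_string"] = regex
    if schema is not None:
        body["json_schema"] = schema
    return body


def matches_regex(output, regex):
    """True if the whole output matches the pattern, not just a substring."""
    return re.fullmatch(regex, output) is not None


def has_required_keys(output, schema):
    """Minimal check: output is a JSON object containing the schema's required keys.

    This is not a full JSON Schema validator.
    """
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    required = schema.get("required", [])
    return isinstance(data, dict) and all(key in data for key in required)


date_pattern = r"\d{4}-\d{2}-\d{2}"
date_request = build_request("When was the release?", regex=date_pattern)

person_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}
person_request = build_request("Describe the author.", schema=person_schema)
```

With constrained generation, a client-side check such as `matches_regex(output, date_pattern)` should always pass, so parsing code downstream does not need a fallback path for free-form text.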
- Add structured generation: JSON + regex outputs
- Support multiple readers dynamically
- Add "prompt_max_tokens" generation parameter across backends, which truncates prompts to a maximum number of tokens
- Frontend for model management, model selection for chat and playground UI
- Embedding (BERT) model support
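The effect of the "prompt_max_tokens" parameter listed above can be pictured with a small sketch. It makes two assumptions that these notes do not state: whitespace splitting stands in for the model's real tokenizer, and truncation keeps the start of the prompt and drops the end. The helper `truncate_prompt` is hypothetical.

```python
def truncate_prompt(tokens, prompt_max_tokens):
    """Keep at most prompt_max_tokens tokens.

    Assumption: tokens are dropped from the end of the prompt. The backend
    may truncate from the other side.
    """
    if prompt_max_tokens < 0:
        raise ValueError("prompt_max_tokens must be non-negative")
    return tokens[:prompt_max_tokens]


# Whitespace splitting is only a stand-in for a real tokenizer.
tokens = "summarise the following document in one short sentence".split()
truncated = truncate_prompt(tokens, 4)
```

A prompt that is already within the limit passes through unchanged.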