0.10.0

  • Introduced a new custom takeoff inference engine, which standardizes backend processes and offers an enhanced interface for generation models.
  • With the unified backend, continuous batching now works for all generation models.
  • Implemented GPU/CPU utilization tracking metrics.
  • Released takeoff_client, a Python client package on PyPI for server interaction.
  • Removed the option to select backends from the management frontend.
  • Overhauled all documentation and added an API References section.
  • Added support for Mixtral.