Version: 0.13.x

Embedding endpoints

Embedding with Takeoff requires making a request to the single embedding endpoint: embed.

The endpoint takes only a consumer_group (which defaults to primary) and the text to be embedded, and returns a JSON response containing the embedding of that text.


These are the docs for interfacing with embedding models. If you wish to interface with generative models, see the docs here, or for classification & reranking models, see here.


You can interact with Takeoff through the REST API, the GUI, or our Python client.

# Ensure the 'takeoff_client' package is installed
# To install it, use the command: `pip install takeoff_client`
from takeoff_client import TakeoffClient

client = TakeoffClient(base_url="http://localhost", port=3000)
input_text = 'How expensive was the Hubble telescope to build?'

embedding_response = client.embed(input_text, consumer_group='primary')
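For readers who prefer to call the REST API directly, the same request can be sketched with the Python standard library. This is a minimal illustration, not the official client: the /embed route and the text and consumer_group field names are assumptions inferred from the endpoint description above, and build_embed_request and embed are hypothetical helper names.

```python
import json
import urllib.request


def build_embed_request(text, consumer_group="primary",
                        base_url="http://localhost", port=3000):
    """Build (but do not send) a POST request for the embed endpoint.

    Assumption: the route is /embed and the JSON body uses the keys
    "text" and "consumer_group".
    """
    payload = {"text": text, "consumer_group": consumer_group}
    return urllib.request.Request(
        url=f"{base_url}:{port}/embed",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, consumer_group="primary"):
    """Send the request to a running server and return the parsed JSON."""
    request = build_embed_request(text, consumer_group)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no server; calling embed(...) does.
request = build_embed_request("How expensive was the Hubble telescope to build?")
```

With a Takeoff server listening on port 3000, calling embed(input_text) would send this request and return the decoded JSON response.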

Batching requests

Embedding models use dynamic batching. In dynamic batching, the batch size is fixed, and incoming requests are buffered until either a full batch is waiting or a timeout is reached, whichever comes first. This allows for optimal hardware utilisation. For more information, including how to choose a suitable timeout value, see our conceptual guide to batching.
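The buffering rule described above can be illustrated with a small, self-contained sketch. It is not Takeoff's implementation: collect_batch is a hypothetical helper, and for simplicity the timer starts when collection begins, which may differ from how the server measures its timeout.

```python
import queue
import time


def collect_batch(requests, max_batch_size, timeout_s):
    """Gather up to max_batch_size items, waiting at most timeout_s in total.

    Returns early as soon as the batch is full; otherwise returns whatever
    has arrived once the timeout expires.
    """
    batch = []
    deadline = time.monotonic() + timeout_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout reached: dispatch a partial batch
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # nothing else arrived before the deadline
    return batch


# Five pending requests, batches of at most three, 100 ms timeout.
pending = queue.Queue()
for i in range(5):
    pending.put(f"request-{i}")

first = collect_batch(pending, max_batch_size=3, timeout_s=0.1)
second = collect_batch(pending, max_batch_size=3, timeout_s=0.1)
```

Here the first batch fills immediately with three requests, while the second is dispatched with only two once the timeout expires. A longer timeout gives batches more chance to fill at the cost of extra waiting for early arrivals.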

The timeout and max batch size can be configured by setting the TAKEOFF_BATCH_DURATION_MILLIS and TAKEOFF_MAX_BATCH_SIZE environment variables:

# Timeout of 100ms and max batch size of 32
docker run -it --gpus all -e TAKEOFF_BATCH_DURATION_MILLIS=100 -e TAKEOFF_MAX_BATCH_SIZE=32...