Version: 0.10.x

Embedding endpoints

To embed text with Takeoff, send a request to its single embedding endpoint: embed.

The endpoint takes only a consumer_group (defaults to primary) and the text to be embedded, and returns a JSON response containing the embedding of that text.


These are the docs for interfacing with embedding models. If you wish to interface with generative models, see the docs here.


Takeoff can be accessed via the REST API, the GUI, or our Python client. The example below uses the Python client:

# Ensure the 'takeoff_client' package is installed
# To install it, use the command: `pip install takeoff_client`
from takeoff_client import TakeoffClient

client = TakeoffClient(base_url="http://localhost", port=3000)
input_text = 'How expensive was the Hubble telescope to build?'

embedding_response = client.embed(input_text, consumer_group='primary')
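The same request can also be sent to the REST API directly. The following is a minimal sketch using only the Python standard library. It assumes the endpoint is served at the path /embed on port 3000 and accepts a JSON body with the fields text and consumer_group; these names are inferred from the endpoint and parameter names above, so check them against the API reference for your Takeoff version. The helper names build_embed_request and embed are hypothetical.

```python
import json
import urllib.request


def build_embed_request(text, consumer_group="primary", base_url="http://localhost:3000"):
    """Build (but do not send) a POST request for the embed endpoint.

    Assumption: the endpoint lives at `<base_url>/embed` and expects a JSON
    body with `text` and `consumer_group` fields.
    """
    payload = {"text": text, "consumer_group": consumer_group}
    return urllib.request.Request(
        url=f"{base_url}/embed",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, consumer_group="primary", base_url="http://localhost:3000"):
    """Send the request and return the decoded JSON response."""
    request = build_embed_request(text, consumer_group, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running Takeoff server with an embedding model loaded.
    print(embed("How expensive was the Hubble telescope to build?"))
```

Separating request construction from sending makes it easy to inspect the payload without a running server.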

Batching requests

Embedding models use dynamic batching. In dynamic batching, the maximum batch size is fixed, and incoming requests are buffered until either a full batch is waiting or a timeout is reached. This lets the hardware process many requests at once without making any single request wait indefinitely. For more information, including how to choose a suitable timeout value, see our conceptual guide to batching.
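To make the buffering rule concrete, here is a small, self-contained simulation of the idea. It is an illustrative sketch, not Takeoff's implementation: the function name simulate_dynamic_batching is hypothetical, and it assumes the timeout clock starts when the first request of a batch arrives.

```python
def simulate_dynamic_batching(arrival_times_ms, max_batch_size, timeout_ms):
    """Group request arrival times into batches.

    A batch is dispatched when it reaches `max_batch_size`, or when
    `timeout_ms` has elapsed since its first request arrived (assumption).
    Returns a list of (dispatch_time_ms, [arrival_times]) tuples.
    """
    batches = []
    current = []      # requests buffered for the batch being assembled
    deadline = None   # time at which the current batch times out

    for t in sorted(arrival_times_ms):
        # The buffered batch timed out before this request arrived.
        if current and t >= deadline:
            batches.append((deadline, current))
            current = []
        # The first request of a new batch starts the timeout clock.
        if not current:
            deadline = t + timeout_ms
        current.append(t)
        # A full batch is dispatched immediately.
        if len(current) == max_batch_size:
            batches.append((t, current))
            current = []

    # Any leftover requests are dispatched when their timeout expires.
    if current:
        batches.append((deadline, current))
    return batches


if __name__ == "__main__":
    for dispatch_time, batch in simulate_dynamic_batching(
        [0, 10, 20, 30, 200, 350], max_batch_size=4, timeout_ms=100
    ):
        print(f"dispatched at {dispatch_time}ms: {batch}")
```

In this run, the first four requests fill a batch and are dispatched at 30ms, while the later, sparser requests are each dispatched alone once their 100ms timeout expires. A longer timeout would group more of the sparse requests together at the cost of added latency.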

The timeout and max batch size can be configured by setting the TAKEOFF_BATCH_DURATION_MILLIS and TAKEOFF_MAX_BATCH_SIZE environment variables:

# Timeout of 100ms and max batch size of 32
docker run -it --gpus all -e TAKEOFF_BATCH_DURATION_MILLIS=100 -e TAKEOFF_MAX_BATCH_SIZE=32...