Version: 0.16.x

Embedding endpoints

Embedding with Takeoff requires making a request to the single embedding endpoint: embed.

The endpoint takes only the text to be embedded and an optional consumer_group (defaults to primary), and returns a JSON response containing the resulting embedding.

note

These are the docs for interfacing with embedding models. If you wish to interface with generative models, see the docs here, or for classification & reranking models, see here.

Examples

You can interact with Takeoff via the REST API, the GUI, or our Python client.

Python (API Client)
Python (requests)
Javascript
cURL

# Ensure the 'takeoff_client' package is installed
# To install it, use the command: `pip install takeoff_client`
from takeoff_client import TakeoffClient

client = TakeoffClient(base_url="http://localhost", port=3000)
input_text = 'How expensive was the Hubble telescope to build?'

embedding_response = client.embed(input_text, consumer_group='primary')
print(embedding_response)

generation_parameters.py
import requests

input_text = 'How expensive was the Hubble telescope to build?'
url = "http://localhost:3000/embed"

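# Build the JSON payload; consumer_group is optional and defaults to "primary"
payload = {
    "text": input_text,
}

# Send the payload as a JSON body (json=) rather than form-encoded (data=)
response = requests.post(url, json=payload)
print(response.json())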

const axios = require('axios');

const inputText = 'How expensive was the Hubble telescope to build?';
const url = 'http://localhost:3000/embed';

// Build the JSON payload
const data = {
  'text': inputText,
};

axios.post(url, data)
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error('Error:', error);
  });

generation_parameters.sh
curl -X POST "http://localhost:3000/embed" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d "{\"text\":\"How expensive was the Hubble telescope to build?\", \"consumer_group\":\"primary\"}"

Batching requests

Embedding models use dynamic batching. In dynamic batching, the batch size is fixed, and incoming requests are buffered until a full batch is waiting or a timeout is reached, which allows for good hardware utilisation. For more information, including how to choose a suitable timeout value, see our conceptual guide to batching.
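The buffering rule described above can be illustrated with a small simulation. This is only a sketch of the general technique, not Takeoff's implementation: the function name simulate_dynamic_batching is hypothetical, and it assumes the timeout is measured from the arrival of the first request in the pending batch.

```python
from typing import List, Tuple


def simulate_dynamic_batching(
    arrivals_ms: List[int], max_batch_size: int, timeout_ms: int
) -> List[Tuple[int, List[int]]]:
    """Group request arrival times (in ms) into batches.

    A batch is dispatched when it holds max_batch_size requests, or when
    timeout_ms has elapsed since its first request arrived (an assumption
    made for this sketch). Returns (dispatch_time_ms, arrivals_in_batch) pairs.
    """
    batches: List[Tuple[int, List[int]]] = []
    current: List[int] = []
    for t in sorted(arrivals_ms):
        # The pending batch timed out before this request arrived: flush it.
        if current and t - current[0] >= timeout_ms:
            batches.append((current[0] + timeout_ms, current))
            current = []
        current.append(t)
        # The batch is full: dispatch immediately without waiting for the timeout.
        if len(current) == max_batch_size:
            batches.append((t, current))
            current = []
    # Any leftover requests are dispatched when their timeout expires.
    if current:
        batches.append((current[0] + timeout_ms, current))
    return batches
```

With arrivals at 0, 10, 20, 30, 200 and 450 ms, a batch size of 4 and a timeout of 100 ms, the first four requests fill a batch that is dispatched at 30 ms, while the two later requests are each dispatched alone once their timeout expires (at 300 ms and 550 ms). A longer timeout fills batches better under light traffic, at the cost of extra latency for the requests that wait.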

The timeout and max batch size can be configured by setting the TAKEOFF_BATCH_DURATION_MILLIS and TAKEOFF_MAX_BATCH_SIZE environment variables:

Example

# Timeout of 100ms and max batch size of 32
docker run -it --gpus all -e TAKEOFF_BATCH_DURATION_MILLIS=100 -e TAKEOFF_MAX_BATCH_SIZE=32...
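Since batches are formed from requests that arrive close together, a client with many texts to embed may benefit from sending them concurrently rather than one after another. The following is a minimal sketch under stated assumptions: it reuses the http://localhost:3000/embed URL and the text and consumer_group fields from the examples above, while embed_many, post_json and max_workers are hypothetical names introduced here for illustration.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Same endpoint as in the examples above; adjust host and port for your deployment.
EMBED_URL = "http://localhost:3000/embed"


def post_json(url, payload):
    """POST a JSON body to url and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def embed_many(texts, consumer_group="primary", send=post_json, max_workers=8):
    """Send one embed request per text concurrently.

    Responses are returned in the same order as texts. The send argument
    is injectable so the function can be exercised without a running server.
    """
    payloads = [{"text": text, "consumer_group": consumer_group} for text in texts]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even if requests finish out of order.
        return list(pool.map(lambda payload: send(EMBED_URL, payload), payloads))
```

Calling embed_many(["first text", "second text"]) against a running server returns one parsed JSON response per input. Whether concurrent requests actually land in the same batch depends on the configured timeout and batch size.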
