Embedding endpoints
Embedding with Takeoff requires making a request to the single embedding endpoint: embed.
The endpoint takes a consumer_group (which defaults to primary) and the text to be embedded, and returns a JSON response containing the embedding of that text.
These are the docs for interfacing with embedding models. If you wish to interface with generative models, see the docs here.
Examples
Takeoff can be interfaced with via the REST API, the GUI, or through our Python client.
- Python (API Client)
- Python (requests)
- Javascript
- cURL
# Ensure the 'takeoff_client' package is installed
# To install it, use the command: `pip install takeoff_client`
from takeoff_client import TakeoffClient
client = TakeoffClient(base_url="http://localhost", port=3000)
input_text = 'How expensive was the Hubble telescope to build?'
embedding_response = client.embed(input_text, consumer_group='primary')
print(embedding_response)
import requests

input_text = 'How expensive was the Hubble telescope to build?'

url = "http://localhost:3000/embed"

# add the text to embed (and, optionally, the consumer group) to the JSON payload
payload = {
    "text": input_text,
    "consumer_group": "primary",
}

response = requests.post(url, json=payload)
print(response.json())
const axios = require('axios');
const inputText = 'How expensive was the Hubble telescope to build?';
const url = 'http://localhost:3000/embed';
// add the text to embed to the JSON payload
const data = {
'text': inputText,
};
axios.post(url, data)
.then(response => {
console.log(response.data);
})
.catch(error => {
console.error('Error:', error);
});
curl -X POST "http://localhost:3000/embed" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d "{\"text\":\"How expensive was the Hubble telescope to build?\", \"consumer_group\":\"primary\"}"
Batching requests
Embedding models use dynamic batching: the batch size is fixed, and incoming requests are buffered until a full batch has accumulated or a timeout is reached, allowing for optimal hardware utilisation. For more information - including how to choose a suitable timeout value - see our conceptual guide to batching.
The timeout and max batch size can be configured by setting the TAKEOFF_BATCH_DURATION_MILLIS and TAKEOFF_MAX_BATCH_SIZE environment variables:
# Timeout of 100ms and max batch size of 32
docker run -it --gpus all -e TAKEOFF_BATCH_DURATION_MILLIS=100 -e TAKEOFF_MAX_BATCH_SIZE=32...
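Because batching happens server-side, clients do not need to change anything to benefit from it: sending several embed requests concurrently simply lets the server fill batches sooner. The sketch below is a minimal illustration using requests and a thread pool, assuming a Takeoff server is running on localhost:3000 as in the examples above.

# Minimal sketch: firing several embed requests concurrently so the server
# can group them into batches (assumes Takeoff is running on localhost:3000).
from concurrent.futures import ThreadPoolExecutor

import requests

url = "http://localhost:3000/embed"

texts = [
    "How expensive was the Hubble telescope to build?",
    "When was the Hubble telescope launched?",
    "Which agency operates the Hubble telescope?",
]

def embed(text):
    # Each request is buffered server-side until a batch fills or the timeout expires
    response = requests.post(url, json={"text": text, "consumer_group": "primary"})
    response.raise_for_status()
    return response.json()

with ThreadPoolExecutor(max_workers=len(texts)) as pool:
    results = list(pool.map(embed, texts))

for text, result in zip(texts, results):
    print(text, result)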