OpenAI Compatibility API


Takeoff has an integrated API layer that provides compatibility with OpenAI's Chat Completion API. Developers can therefore use OpenAI's existing client libraries, or minimally adapt existing codebases, to interact seamlessly with Takeoff through this layer.

A full API schema for the compatibility layer is provided here.
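
For example, OpenAI's Python client can be pointed at Takeoff's compatibility port instead of api.openai.com. The sketch below assumes the openai Python package (v1.x) and a Takeoff instance running locally with the compatibility layer on port 3003; the api_key value is a placeholder, as this layer does not use OpenAI authentication.

Chat completion via the OpenAI Python client
from openai import OpenAI

# Point the client at Takeoff's OpenAI-compatibility layer rather than api.openai.com.
# The api_key is a placeholder: this layer does not check OpenAI credentials.
client = OpenAI(base_url="http://localhost:3003/v1", api_key="placeholder")

response = client.chat.completions.create(
    model="primary",  # a Takeoff consumer group, not an OpenAI model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is an inference server?"},
    ],
)
print(response.choices[0].message.content)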

Launch Takeoff with OpenAI compatibility layer


The OpenAI-compatibility layer listens on port 3003 by default. As with the other ports used by Takeoff, this port must be forwarded (using -p) so that the layer can be reached from outside the container, as in the example below:

Launching with OpenAI compatibility
docker run --gpus all \
-e TAKEOFF_MODEL_NAME=gpt-3.5-turbo \
-e TAKEOFF_DEVICE=cuda \
-e LICENSE_KEY=[your_license_key] \
-e TAKEOFF_MAX_SEQUENCE_LENGTH=1024 \
-p 3000:3000 \
-p 3003:3003 \
-v ~/.takeoff_cache:/code/models \
tytn/takeoff-pro:0.21.0-gpu

With this setup, you can pass queries in the OpenAI query schema directly to Takeoff, which processes the requests and returns results in the OpenAI response schema. This gives developers an easy and effective way to leverage Takeoff's capabilities while maintaining compatibility with existing OpenAI-based workflows.

Interfacing with Takeoff in OpenAI schema


For Non-Streaming Response

curl --location 'http://localhost:3003/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "primary",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "What is an inference server?"
        }
    ],
    "stream": false
}'
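
A successful request returns a response in OpenAI's chat completion response schema, along the lines of the following illustrative example (all field values are placeholders):

{
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "primary",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "An inference server is..."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 25,
        "completion_tokens": 50,
        "total_tokens": 75
    }
}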

For Streaming Response

curl --location 'http://localhost:3003/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "primary",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "What is an inference server?"
        }
    ],
    "stream": true
}'
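
The streaming request can also be made from OpenAI's Python client by setting stream=True, which yields the response incrementally as chunks. A minimal sketch under the same assumptions as above (openai v1.x, Takeoff on port 3003):

Streaming chat completion via the OpenAI Python client
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3003/v1", api_key="placeholder")

# With stream=True the client returns an iterator of chunk objects; each chunk
# carries a delta containing the newly generated tokens.
stream = client.chat.completions.create(
    model="primary",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is an inference server?"},
    ],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()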

Parameter mapping

Important Details on Supported Parameters

The OpenAI-compatibility API supports a specific set of parameters:

  • model: This parameter identifies the consumer group within Takeoff that you wish to use. By default, it is set to 'primary', directing requests to the main processing group.
  • messages: This is an array containing the sequence of messages that represent the conversation history. It's crucial for contextual continuity in interactions.
  • stream: If set to true, the response is returned as a stream of server-sent events. Defaults to false.
  • temperature: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
  • top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.

Other parameters (such as function_call or functions) are not supported and will be disregarded if passed to Takeoff.
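
The supported sampling parameters can be passed exactly as with OpenAI's API; anything outside the supported set is silently dropped. A short sketch under the same assumptions as the earlier Python examples:

Passing sampling parameters
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3003/v1", api_key="placeholder")

response = client.chat.completions.create(
    model="primary",  # routed to the 'primary' consumer group
    messages=[
        {"role": "user", "content": "Summarise what an inference server does."},
    ],
    temperature=0.2,  # lower values give more focused, deterministic output
    top_p=0.9,        # nucleus sampling: restrict to the top 90% probability mass
)
print(response.choices[0].message.content)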