Generate (Streamed)
POST /generate_stream
The /generate_stream endpoint is used to communicate with the LLM. Use this endpoint when you want to receive a stream of responses from the LLM, token by token. If you want the response returned all at once, see the /generate endpoint.
The text field can be either a string or an array of strings; pass an array to send a batch of requests at once. The server also supports dynamic batching: requests arriving within a short time interval are processed together as a single batch.
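As a minimal sketch, the two accepted shapes of the text field look like this (the variable names are illustrative, not part of the API):

```python
import json

# Single prompt: "text" is a plain string.
single_request = {"text": "1 2 3 4"}

# Explicit batch: "text" is an array of strings; each entry is a
# separate prompt, identified in the response stream by its index.
batch_request = {"text": ["1 2 3 4", "a b c d"]}

# Either dict serializes to a valid JSON request body.
print(json.dumps(batch_request))
```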
The response is a stream of server sent events, where each event is a token generated by the LLM. If you've supplied a batch of inputs:
{
"text": ["1 2 3 4", "a b c d"]
}
The server-sent events' data fields will be a stream of JSON payloads, each with a text field containing the token and a batch_id field containing the index of the input that the token belongs to.
data:{"text": "5", "batch_id": 0}
data:{"text": "e", "batch_id": 1}
data:{"text": "6", "batch_id": 0}
data:{"text": "f", "batch_id": 1}
The order in which tokens from different batch entries are returned is not guaranteed.
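A client must therefore demultiplex events by batch_id when reading the stream. A minimal Python sketch, using the data: lines shown above (the helper name demux_stream is hypothetical, and transport code is omitted):

```python
import json

def demux_stream(lines):
    """Group streamed tokens by the batch entry they belong to.

    Each line looks like: data:{"text": "5", "batch_id": 0}
    """
    outputs = {}
    for line in lines:
        if not line.startswith("data:"):
            continue  # ignore blank keep-alives and SSE comments
        payload = json.loads(line[len("data:"):])
        outputs.setdefault(payload["batch_id"], []).append(payload["text"])
    return outputs

events = [
    'data:{"text": "5", "batch_id": 0}',
    'data:{"text": "e", "batch_id": 1}',
    'data:{"text": "6", "batch_id": 0}',
    'data:{"text": "f", "batch_id": 1}',
]
print(demux_stream(events))  # → {0: ['5', '6'], 1: ['e', 'f']}
```

In a real client the lines would come from the HTTP response body, read line by line as they arrive; how tokens are joined back into text depends on the model's tokenizer, so they are collected as lists here.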
Request

Content type: application/json

Body (required)
- text (required) — one of: string, array of strings
Responses

- 200 — Takes in a JSON payload and returns the response token by token, as a stream of server-sent events.
  Content type: application/json. Example (from schema):
  {
  "text": "string"
  }
- 400 — Bad request
- 422 — Malformed request body
- 503 — The server is not ready to process requests yet.