Version: 0.21.x

Generate (Streamed)

POST /generate_stream

The /generate_stream endpoint sends a prompt to the LLM and streams the response back token by token. To receive the whole response at once, use the /generate endpoint instead.

The text field accepts either a single string or an array of strings; pass an array to send a batch of requests at once. The server also supports dynamic batching, where requests that arrive within a short time interval are processed together as a single batch.
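As a minimal sketch of the two accepted shapes of the text field, the helper below builds a request body from either a single prompt or a list of prompts. The name build_payload is hypothetical and not part of the API; only the text field itself comes from the description above.

```python
import json
from typing import List, Union


def build_payload(text: Union[str, List[str]]) -> str:
    """Serialise a request body whose `text` is a string or a list of strings."""
    is_string = isinstance(text, str)
    is_string_list = isinstance(text, list) and all(isinstance(t, str) for t in text)
    if not (is_string or is_string_list):
        raise TypeError("text must be a string or a list of strings")
    return json.dumps({"text": text})


single = build_payload("1 2 3 4")              # one prompt
batch = build_payload(["1 2 3 4", "a b c d"])  # a batch of two prompts
print(batch)  # {"text": ["1 2 3 4", "a b c d"]}
```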

The response is a stream of server-sent events, where each event carries a token generated by the LLM. For example, if you supply a batch of inputs:

{
"text": ["1 2 3 4", "a b c d"]
}

The data field of each server-sent event is a JSON payload with a text field containing the token and a batch_id field containing the index, within the batch, of the input that the token belongs to:

data:{"text": "5", "batch_id": 0}

data:{"text": "e", "batch_id": 1}

data:{"text": "6", "batch_id": 0}

data:{"text": "f", "batch_id": 1}

The order in which tokens belonging to different batch_id values are interleaved is not guaranteed.
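Because events for different inputs can arrive interleaved, a client typically has to group tokens by batch_id before reassembling each completion. The sketch below does this for the sample events shown above. The function name collect_tokens is hypothetical, and joining tokens with an empty string is an assumption about how tokens carry their own whitespace.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def collect_tokens(lines: Iterable[str]) -> Dict[int, str]:
    """Group streamed tokens by batch_id, keeping arrival order per input."""
    pieces = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data SSE fields
        event = json.loads(line[len("data:"):])
        pieces[event["batch_id"]].append(event["text"])
    # Assumption: tokens are concatenated as-is, with no separator added.
    return {batch_id: "".join(tokens) for batch_id, tokens in pieces.items()}


sample = [
    'data:{"text": "5", "batch_id": 0}',
    "",
    'data:{"text": "e", "batch_id": 1}',
    "",
    'data:{"text": "6", "batch_id": 0}',
    "",
    'data:{"text": "f", "batch_id": 1}',
]
print(collect_tokens(sample))  # {0: '56', 1: 'ef'}
```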

Request

Body (required)

    constrained_decoding_backend (string, nullable)
    consumer_group (string, nullable)
    image_paths (string[], nullable)
    json_schema (nullable)
    max_new_tokens (int64, nullable)
    min_new_tokens (int64, nullable)
    no_repeat_ngram_size (int64, nullable)
    prompt_max_tokens (int64, nullable)
    regex_string (string, nullable)
    repetition_penalty (float, nullable)
    sampling_temperature (float, nullable)
    sampling_topk (int64, nullable)
    sampling_topp (float, nullable)

    text (object, required)

    InputText is a convenience type, so that users do not have to construct the more cumbersome PayloadText themselves. A mapping is provided to convert InputText to PayloadText.

    oneOf: string
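To illustrate how the required text field combines with the optional, nullable fields, here is a hedged sketch that assembles a request body and rejects field names that are not in the listing above. The helper make_request_body and the example values are hypothetical. Omitting unset fields rather than sending them as null is also an assumption.

```python
import json
from typing import Any, Dict, List, Union

# Optional field names copied from the request body listing above.
OPTIONAL_FIELDS = {
    "constrained_decoding_backend", "consumer_group", "image_paths",
    "json_schema", "max_new_tokens", "min_new_tokens",
    "no_repeat_ngram_size", "prompt_max_tokens", "regex_string",
    "repetition_penalty", "sampling_temperature", "sampling_topk",
    "sampling_topp",
}


def make_request_body(text: Union[str, List[str]], **options: Any) -> Dict[str, Any]:
    """Build a request body, dropping options that were left as None."""
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    body: Dict[str, Any] = {"text": text}
    body.update({k: v for k, v in options.items() if v is not None})
    return body


body = make_request_body(
    "List three primes:",
    max_new_tokens=32,         # illustrative value
    sampling_temperature=0.7,  # illustrative value
    sampling_topk=None,        # left unset, so it is omitted
)
print(json.dumps(body))
# {"text": "List three primes:", "max_new_tokens": 32, "sampling_temperature": 0.7}
```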

Responses

Takes in a JSON payload and returns the response token by token, as a stream of server-sent events.
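A client for this endpoint might look like the following sketch, which uses only the Python standard library. The function names are hypothetical, and the Content-Type and Accept headers are assumptions about what the server expects. The parser is exercised here against canned bytes, so the block runs without a server.

```python
import io
import json
import urllib.request
from typing import Any, Dict, Iterable, Iterator


def iter_events(stream: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Yield the decoded JSON payload of every `data:` line in an SSE stream."""
    for raw in stream:
        line = raw.decode("utf-8").strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):])


def generate_stream(base_url: str, body: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """POST `body` to /generate_stream and yield events as they arrive."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/generate_stream",
        data=json.dumps(body).encode("utf-8"),
        # Assumed headers; adjust to whatever the deployment requires.
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        yield from iter_events(response)  # the response iterates line by line


# Offline check of the parser with canned bytes, no server needed.
canned = io.BytesIO(b'data:{"text": "5", "batch_id": 0}\n\n')
print(list(iter_events(canned)))  # [{'text': '5', 'batch_id': 0}]
```

Against a running server, usage would be along the lines of iterating over generate_stream(base_url, {"text": "1 2 3 4"}) and printing each event's text field as it arrives, where base_url is the address of your own deployment.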

Schema

    text (object, required)

    InputText is a convenience type, so that users do not have to construct the more cumbersome PayloadText themselves. A mapping is provided to convert InputText to PayloadText.

    oneOf: string
