Version: 0.20.x

Generate

POST /invocations

Effectively a proxy for the /generate endpoint of the inference API.

The /generate endpoint is used to communicate with the LLM. Use this endpoint when you want to receive a full response from the LLM, all at once.

The text field can be either a single string or an array of strings; pass an array to send a batch of requests at once. The server also supports dynamic batching, where requests that arrive within a short time interval are processed as a single batch.
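As a minimal sketch of the two accepted shapes of the text field, the snippet below builds (but does not send) a POST /invocations request for a single prompt and for a batch. The base URL and the helper name `build_invocation` are assumptions made for illustration and are not part of the documented API.

```python
import json
import urllib.request

# Hypothetical address; replace with wherever your server is listening.
BASE_URL = "http://localhost:8000"


def build_invocation(text, base_url=BASE_URL):
    """Build (but do not send) a POST /invocations request.

    `text` may be a single prompt (str) or a batch of prompts (list of str).
    """
    if isinstance(text, str):
        body = {"text": text}
    elif isinstance(text, (list, tuple)) and all(isinstance(t, str) for t in text):
        body = {"text": list(text)}
    else:
        raise TypeError("text must be a string or a sequence of strings")
    return urllib.request.Request(
        url=f"{base_url}/invocations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One prompt: "text" is a plain string.
single = build_invocation("List three primary colours.")
# Several prompts in one call: "text" is an array of strings.
batch = build_invocation(["Translate 'cat' to French.", "Translate 'dog' to French."])
```

To actually send either request, pass it to `urllib.request.urlopen` against a running server.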

Request

Body (required)

    consumer_group string, nullable
    json_schema nullable
    max_new_tokens int64, nullable
    min_new_tokens int64, nullable
    no_repeat_ngram_size int64, nullable
    prompt_max_tokens int64, nullable
    regex_string string, nullable
    repetition_penalty float, nullable
    sampling_temperature float, nullable
    sampling_topk int64, nullable
    sampling_topp float, nullable

    text object, required
        oneOf: string, array of strings
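The body fields above can be assembled as in the following sketch. The helper `make_body` is hypothetical, and leaving out unset options instead of sending explicit nulls is a choice made here, not something the documentation prescribes. The example values are arbitrary.

```python
import json

# Optional, nullable fields listed in the request body.
OPTIONAL_FIELDS = {
    "consumer_group", "json_schema", "max_new_tokens", "min_new_tokens",
    "no_repeat_ngram_size", "prompt_max_tokens", "regex_string",
    "repetition_penalty", "sampling_temperature", "sampling_topk",
    "sampling_topp",
}


def make_body(text, **options):
    """Return a request body dict containing `text` plus any set options."""
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    body = {"text": text}  # the only required field
    # Drop options left as None so they are omitted from the payload.
    body.update({k: v for k, v in options.items() if v is not None})
    return body


body = make_body(
    "Write a haiku about rivers.",
    max_new_tokens=64,
    sampling_temperature=0.7,
    sampling_topk=None,  # unset, so it is omitted from the payload
)
payload = json.dumps(body)  # JSON string to use as the POST body
```

Rejecting unknown field names locally catches typos such as `max_tokens` before the request leaves the client.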

Responses

Takes in a JSON payload and returns the response all at once.

Schema

    text object, required
        oneOf: string
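A response body can be unpacked as sketched below. This assumes the response mirrors the request, so that text is a string for a single prompt and an array of strings for a batch; the schema above confirms only the string case. The helper `extract_texts` and the sample bodies are made up for illustration and are not real server output.

```python
import json


def extract_texts(raw):
    """Return the generated texts as a list, for one result or many."""
    payload = json.loads(raw)
    text = payload["text"]  # required by the response schema
    if isinstance(text, str):
        return [text]
    # Assumption: batched requests come back as an array of strings.
    if isinstance(text, list) and all(isinstance(t, str) for t in text):
        return text
    raise ValueError("unexpected type for 'text'")


# Hand-written sample bodies, not captured from a server.
single_result = extract_texts('{"text": "Paris"}')
batch_result = extract_texts('{"text": ["chat", "chien"]}')
```

Normalising both cases to a list lets calling code iterate over results without checking which form was returned.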
