Generate

POST /invocations

This endpoint is effectively a proxy for the /generate endpoint of the inference API.

The /generate endpoint is used to communicate with the LLM. Use it when you want to receive the LLM's full response all at once, rather than incrementally.
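A minimal sketch of a single call, using only the Python standard library. The base URL http://localhost:3000 is a hypothetical placeholder for your deployment, and the request body assumes the prompt goes in the "text" field described in the batching note below. The response schema is not documented in this section, so the helper returns the parsed JSON without assuming any particular keys.

```python
import json
import urllib.request
from typing import Any

# Hypothetical base URL: replace with the address of your deployment.
BASE_URL = "http://localhost:3000"


def build_invocation_request(base_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST /invocations request with a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/invocations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke(base_url: str, text: str, timeout: float = 60.0) -> Any:
    """Send the request and return the parsed JSON response.

    The full generation arrives in one reply, so a single blocking read
    is enough. The keys of the returned object are not assumed here.
    """
    request = build_invocation_request(base_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires a running server):
# result = invoke(BASE_URL, "What is the capital of France?")
```

Splitting request construction from sending keeps the payload easy to inspect or log before any network traffic happens.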

To send a batch of requests at once, set the "text" field to an array of strings instead of a single string. The server also supports dynamic batching: requests that arrive within a short time interval are processed together as a single batch.
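The two batching modes can be sketched as follows. For an explicit batch, the "text" field carries a list of strings. For dynamic batching, the client just sends independent single-prompt requests concurrently and leaves any grouping to the server. The send function is an assumption of this sketch: a hypothetical caller-supplied function that posts one JSON payload to POST /invocations and returns the parsed response, injected here so the logic can be exercised without a server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Union

# A payload's "text" field may be a single prompt or a list of prompts.
Payload = Dict[str, Union[str, List[str]]]
# Hypothetical sender: posts one payload to POST /invocations and returns
# the parsed JSON response (for example, implemented with urllib).
Sender = Callable[[Payload], Any]


def single_payload(prompt: str) -> Payload:
    """A single prompt: the "text" field is a plain string."""
    return {"text": prompt}


def batch_payload(prompts: List[str]) -> Payload:
    """Explicit batching: one request whose "text" field is a list of strings."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    return {"text": list(prompts)}


def send_concurrently(prompts: List[str], send: Sender, max_workers: int = 8) -> List[Any]:
    """Client side of dynamic batching: issue independent single-prompt
    requests at roughly the same time. Any batching happens on the server.
    ThreadPoolExecutor.map returns results in the same order as the prompts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: send(single_payload(p)), prompts))
```

With a real sender, both approaches target the same endpoint: the explicit batch sends all prompts in one request, while the concurrent version keeps the client simple and relies on the server's dynamic batching.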

The endpoint takes a JSON payload and returns the full response all at once.
