Generate
POST /generate_vertex
Generates a full response. Designed for use with Vertex AI, this endpoint is effectively a proxy for the /generate endpoint on the inference API, with some tweaks for compatibility.
The /generate
endpoint is used to communicate with the LLM. Use this endpoint when you want to
receive a full response from the LLM, all at once.
To send a batch of requests in a single call, the text field can be either a string or an array of strings. The server also supports dynamic batching, where requests arriving within a short time interval are processed as a single batch.
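A minimal sketch of the two request shapes described above, using only the field names from the schema (instances, text); the prompt strings themselves are made up:

```python
import json

# Single request: "text" is a plain string.
single = {"instances": [{"text": "Hello, world"}]}

# Batched request: "text" is an array of strings, so one call
# carries several prompts at once. (Prompts here are illustrative.)
batch = {"instances": [{"text": ["First prompt", "Second prompt"]}]}

# Either dict serializes to the application/json body expected by
# POST /generate_vertex.
print(json.dumps(single))
print(json.dumps(batch))
```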
Request
- application/json
Body
required
- instances (object[], required)
  - text (object, required) — Input text, provided so users don't have to construct the clunkier PayloadText directly. A mapping from InputText to PayloadText is provided below.
    - oneOf: string | string[]
Responses
- 200
- 400
- 422
- 503
Takes in a JSON payload and returns the response all at once.
- application/json
Schema
- predictions (object[], required)
  - text (object, required) — Input text, provided so users don't have to construct the clunkier PayloadText directly. A mapping from InputText to PayloadText is provided below.
    - oneOf: string | string[]
{
  "predictions": [
    {
      "text": "string"
    }
  ]
}
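Reading a response body of this shape can be sketched as follows; the response text here is a made-up example, and a real body would come from the HTTP response to POST /generate_vertex:

```python
import json

# Hypothetical response body matching the schema above.
raw = '{"predictions": [{"text": "Paris is the capital of France."}]}'

payload = json.loads(raw)
# Each prediction carries a "text" field, one per request instance.
texts = [p["text"] for p in payload["predictions"]]
print(texts)
```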
400: Bad request
422: Malformed request body
503: The server is not ready to process requests yet.