Generate

POST /generate_vertex
Generate a full response. Designed for use with Vertex AI, this endpoint is effectively a proxy for the /generate endpoint on the inference API, with some tweaks for compatibility.
The /generate
endpoint is used to communicate with the LLM. Use this endpoint when you want to
receive a full response from the LLM, all at once.
To send a batch of requests all at once, the text field can be either a string or an array of strings. This server also supports dynamic batching, where requests arriving within a short time interval are processed as a single batch.
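The request shape above can be sketched with the standard library. This is a minimal example, not a definitive client: the server address is a placeholder, and the only field assumed is the text field described here (a string for a single request, or a list of strings for an explicit batch).

```python
import json
from urllib.request import Request, urlopen

# Placeholder address; point this at your actual deployment.
BASE_URL = "http://localhost:3000"

def build_payload(text):
    # `text` may be a single string or a list of strings;
    # a list is processed by the server as one batch.
    return {"text": text}

def generate(text, base_url=BASE_URL):
    # POST the JSON payload to /generate_vertex and return
    # the full response all at once (no streaming).
    req = Request(
        base_url + "/generate_vertex",
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:  # raises HTTPError on 4xx/5xx statuses
        return json.loads(resp.read())

# A single request vs. an explicit batch:
single = build_payload("Why is the sky blue?")
batch = build_payload(["Why is the sky blue?", "What is entropy?"])
```

Note that the same helper covers both cases: batching is expressed purely through the type of the text field, so no separate batch endpoint is needed.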
Request

Responses
- 200: Takes in a JSON payload and returns the response all at once.
- 400: Bad request
- 422: Malformed request body
- 503: The server is not ready to process requests yet.