Generate
POST /generate_vertex
Generates a full response. Designed for use with Vertex AI, this endpoint is effectively a proxy for the /generate endpoint on the inference API, with some tweaks for compatibility.
The /generate
endpoint is used to communicate with the LLM. Use this endpoint when you want to
receive a full response from the LLM, all at once.
To send a batch of requests in a single call, the text field can be either a string or an array of strings. The server also supports dynamic batching, where requests arriving within a short time interval are processed as a single batch.
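A minimal sketch of the two request shapes described above, using only the field names from the schema (instances, text); the prompt strings themselves are made up:

```python
import json

# Single request: "text" is a plain string.
single = {"instances": [{"text": "Hello, world"}]}

# Batched request: "text" is an array of strings, so one call
# carries several prompts at once. (Prompts here are illustrative.)
batch = {"instances": [{"text": ["First prompt", "Second prompt"]}]}

# Either dict serializes to the application/json body expected by
# POST /generate_vertex.
print(json.dumps(single))
print(json.dumps(batch))
```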
Request
- application/json
Body
required
- instances (object[], required)
  - text (object, required) — Input text, provided so users don't have to construct the clunkier PayloadText directly. A mapping from InputText to PayloadText is provided below.
    - oneOf: string | string[]
Responses
- 200
- 400
- 422
- 503
Takes in a JSON payload and returns the response all at once.
- application/json
Schema
- predictions (object[], required)
  - text (object, required) — Input text, provided so users don't have to construct the clunkier PayloadText directly. A mapping from InputText to PayloadText is provided below.
    - oneOf: string | string[]
{
  "predictions": [
    {
      "text": "string"
    }
  ]
}
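Reading a response body of this shape can be sketched as follows; the response text here is a made-up example, and a real body would come from the HTTP response to POST /generate_vertex:

```python
import json

# Hypothetical response body matching the schema above.
raw = '{"predictions": [{"text": "Paris is the capital of France."}]}'

payload = json.loads(raw)
# Each prediction carries a "text" field, one per request instance.
texts = [p["text"] for p in payload["predictions"]]
print(texts)
```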
400: Bad request
422: Malformed request body
503: The server is not ready to process requests yet.