Version: 0.21.x

This module contains the TakeoffClient class, which is used to interact with the Takeoff server.

TakeoffClient Objects

class TakeoffClient()

__init__

def __init__(base_url: str = "http://localhost",
port: int = 3000,
mgmt_port: int = None)

TakeoffClient is used to interact with the Takeoff server.

Arguments:

  • base_url str, optional - base URL that the Takeoff server runs on. Defaults to "http://localhost".
  • port int, optional - port that the main server runs on. Defaults to 3000.
  • mgmt_port int, optional - port that the management API runs on. Usually port + 1. Defaults to None.
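The "usually port + 1" convention for mgmt_port can be made explicit when preparing the constructor arguments. The sketch below is only an illustration: the helper name client_kwargs is hypothetical, and falling back to port + 1 when mgmt_port is None is an assumption drawn from the note above, not documented client behaviour.

```python
from typing import Optional


def client_kwargs(base_url: str = "http://localhost",
                  port: int = 3000,
                  mgmt_port: Optional[int] = None) -> dict:
    """Build keyword arguments for TakeoffClient (hypothetical helper).

    Assumption: when mgmt_port is not given, port + 1 is a reasonable guess,
    following the "usually port + 1" convention described above.
    """
    if mgmt_port is None:
        mgmt_port = port + 1
    return {"base_url": base_url, "port": port, "mgmt_port": mgmt_port}


kwargs = client_kwargs(port=3000)
print(kwargs)

# With the client package installed, the client could then be built as
# (import path assumed):
#   from takeoff_client import TakeoffClient
#   client = TakeoffClient(**kwargs)
```

Passing mgmt_port explicitly always wins, so a deployment that maps the management API to a different port is unaffected by the fallback.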

get_readers

def get_readers() -> dict

Get a list of information about all readers.

Returns:

  • dict - List of information about all readers.

embed

def embed(text: Union[str, List[str]],
consumer_group: str = "primary") -> dict

Embed a batch of text.

Arguments:

  • text str | List[str] - Text to embed.
  • consumer_group str, optional - consumer group to use. Defaults to "primary".

Returns:

  • dict - Embedding response.

classify

def classify(text: Union[str, List[str], List[List[str]]],
consumer_group: str = "classify") -> dict

Classify a batch of text.

Text that is passed in as a list of lists of strings will be concatenated on the innermost list, and the outermost list is treated as a batch of concatenated strings.

Concatenation happens server-side, as it needs information from the model tokenizer.

Arguments:

  • text str | List[str] | List[List[str]] - Text to classify.
  • consumer_group str, optional - consumer group to use. Defaults to "classify".

Returns:

  • dict - Classification response.
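The nested-list input form is easiest to see with sentence pairs. The following is a minimal sketch, assuming a use case where one query is paired with several documents: make_pairs and RecordingClient are hypothetical names, and the stub only echoes its input because the real response format is not described here.

```python
from typing import List


def make_pairs(query: str, documents: List[str]) -> List[List[str]]:
    """Pair one query with each document: one inner list per batch item."""
    return [[query, doc] for doc in documents]


class RecordingClient:
    """Hypothetical stand-in for TakeoffClient that only records its input."""

    def classify(self, text, consumer_group: str = "classify") -> dict:
        # The real response format is not documented here, so this stub
        # simply echoes what it was given.
        return {"received": text, "consumer_group": consumer_group}


pairs = make_pairs(
    "What is the capital of France?",
    ["Paris is the capital of France.", "Fish is very nutritious."],
)
response = RecordingClient().classify(pairs)

# Two batch items, each a [query, document] pair to be joined server-side.
print(len(response["received"]))
```

The outer list has one entry per batch item, so the batch size equals the number of documents, while each inner list holds the strings the server will concatenate.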

generate

def generate(text: Union[str, List[str]],
sampling_temperature: float = None,
sampling_topp: float = None,
sampling_topk: int = None,
repetition_penalty: float = None,
no_repeat_ngram_size: int = None,
max_new_tokens: int = None,
min_new_tokens: int = None,
regex_string: str = None,
json_schema: Any = None,
prompt_max_tokens: int = None,
consumer_group: str = "primary",
image_path: Optional[Path] = None) -> dict

Generates text, seeking a completion for the input prompt. Buffers output and returns at once.

Arguments:

  • text str | List[str] - Input prompt from which to generate
  • sampling_temperature float, optional - Sample with randomness. Higher temperatures are associated with more randomness.
  • sampling_topp float, optional - Sample from the smallest set of most probable tokens whose cumulative probability exceeds this value
  • sampling_topk int, optional - Sample predictions from the top K most probable candidates
  • repetition_penalty float, optional - Penalise the generation of tokens that have been generated before. Set to > 1 to penalize.
  • no_repeat_ngram_size int, optional - Prevent repetitions of ngrams of this size.
  • max_new_tokens int, optional - The maximum number of (new) tokens that the model will generate.
  • min_new_tokens int, optional - The minimum number of (new) tokens that the model will generate.
  • regex_string str, optional - The regex string which generations will adhere to as they decode.
  • json_schema dict, optional - The JSON Schema which generations will adhere to as they decode. Ignored if regex_string is set.
  • prompt_max_tokens int, optional - The maximum length (in tokens) for this prompt. Prompts longer than this value will be truncated.
  • consumer_group str, optional - The consumer group to which to send the request.
  • image_path Path, optional - Path to the image file to be used as input. Only available if the running model supports image-to-text generation, for example LLaVA models. Defaults to None.

Returns:

  • Output dict - The response from Takeoff containing the generated text as a whole.
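To make the three sampling parameters concrete, here is a small, self-contained sketch of how temperature, top-k and top-p filtering are commonly combined over a toy token distribution. It is not Takeoff's implementation: the function name filter_distribution, the order in which the filters are applied, and the toy logits are all assumptions chosen for illustration.

```python
import math
import random
from typing import Dict, Optional


def filter_distribution(logits: Dict[str, float],
                        temperature: float = 1.0,
                        top_k: Optional[int] = None,
                        top_p: Optional[float] = None) -> Dict[str, float]:
    """Return the renormalised token distribution left after filtering."""
    # Temperature: divide logits before the softmax; higher values flatten it.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda item: item[1], reverse=True)

    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept

    norm = sum(prob for _, prob in ranked)
    return {tok: prob / norm for tok, prob in ranked}


logits = {"cat": 2.0, "dog": 1.0, "fish": 0.0, "rock": -2.0}
dist = filter_distribution(logits, temperature=1.0, top_k=3, top_p=0.8)

# Draw one token from whatever survived the filters.
token = random.choices(list(dist), weights=list(dist.values()))[0]
print(dist, token)
```

Lower values of top_k or top_p shrink the candidate set and make output more predictable, while a higher temperature spreads probability mass across more tokens. The sketch does not handle a temperature of 0.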

generate_stream

def generate_stream(text: Union[str, List[str]],
sampling_temperature: float = None,
sampling_topp: float = None,
sampling_topk: int = None,
repetition_penalty: float = None,
no_repeat_ngram_size: int = None,
max_new_tokens: int = None,
min_new_tokens: int = None,
regex_string: str = None,
json_schema: dict = None,
prompt_max_tokens: int = None,
consumer_group: str = "primary",
image_path: Optional[Path] = None) -> Iterator[Event]

Generates text, seeking a completion for the input prompt. Returns the output as a stream of server-sent events instead of buffering it.

Arguments:

  • text str | List[str] - Input prompt from which to generate
  • sampling_temperature float, optional - Sample with randomness. Higher temperatures are associated with more randomness.
  • sampling_topp float, optional - Sample from the smallest set of most probable tokens whose cumulative probability exceeds this value
  • sampling_topk int, optional - Sample predictions from the top K most probable candidates
  • repetition_penalty float, optional - Penalise the generation of tokens that have been generated before. Set to > 1 to penalize.
  • no_repeat_ngram_size int, optional - Prevent repetitions of ngrams of this size.
  • max_new_tokens int, optional - The maximum number of (new) tokens that the model will generate.
  • min_new_tokens int, optional - The minimum number of (new) tokens that the model will generate.
  • regex_string str, optional - The regex string which generations will adhere to as they decode.
  • json_schema dict, optional - The JSON Schema which generations will adhere to as they decode. Ignored if regex_string is set.
  • prompt_max_tokens int, optional - The maximum length (in tokens) for this prompt. Prompts longer than this value will be truncated.
  • consumer_group str, optional - The consumer group to which to send the request.
  • image_path Path, optional - Path to the image file to be used as input. Only available if the running model supports image-to-text generation, for example LLaVA models. Defaults to None.

Returns:

  • Iterator[sseclient.SSEClient.Event] - An iterator of server-sent events.
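Consuming the returned iterator might look like the sketch below. It assumes each event exposes a data attribute, as sseclient events do, and that the payload can be used directly as a text chunk. The real payload format is not described here and may need decoding first. The iter_chunks helper and the fake events are hypothetical.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def iter_chunks(events: Iterable) -> Iterator[str]:
    """Yield the data payload of each server-sent event, skipping empty ones."""
    for event in events:
        data = getattr(event, "data", "")
        if data:
            yield data


# Hypothetical events standing in for what generate_stream yields.
fake_events = [
    SimpleNamespace(data="Fish"),
    SimpleNamespace(data=" is"),
    SimpleNamespace(data=""),  # an empty event, which is skipped
    SimpleNamespace(data=" nutritious."),
]

text = "".join(iter_chunks(fake_events))
print(text)
```

With a real client, the same helper could be applied to the iterator returned by generate_stream, printing each chunk as it arrives instead of joining them at the end.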

tokenize

def tokenize(text: str, reader_id: str) -> List[str]

Tokenize a single text item.

The tokenize endpoint can be used to send a string to a model's tokenizer for tokenization. The result is a list of tokens. For example, if "my_reader" is the id of a model that uses a Llama tokenizer, the following code will tokenize the string "hello, world" using the Llama tokenizer:

    takeoff_client.tokenize("hello, world", reader_id="my_reader")
    # ['▁hello', ',', '▁world']

NOTE: The reader_id parameter is not the same as the consumer_group parameter used in other endpoints. Because tokenization is specific to a particular loaded model, we need to specify a unique id that identifies a particular reader. To find this ID for the models currently loaded into your Takeoff server, try the following:

    readers = takeoff_client.get_readers()
    for reader_group in readers.values():
        for reader in reader_group:
            print(reader["reader_id"])

Arguments:

  • text str - Text to tokenize.
  • reader_id str - The id of the reader to use.

Returns:

  • List[str] - Tokenized text.
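The reader lookup loop shown above can be wrapped in a small helper that keeps the consumer group alongside each id. This is a sketch under the assumption that get_readers returns a dict mapping each consumer group to a list of reader dicts with a "reader_id" key, which is the shape the loop implies. The helper name and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def reader_ids_by_group(readers: Dict[str, List[Dict[str, Any]]]) -> Dict[str, List[str]]:
    """Map each consumer group to the ids of the readers it contains."""
    return {
        group: [reader["reader_id"] for reader in group_readers]
        for group, group_readers in readers.items()
    }


# Hypothetical data in the shape assumed for the get_readers() response.
sample_readers = {
    "primary": [{"reader_id": "abc123"}, {"reader_id": "def456"}],
    "classify": [{"reader_id": "ghi789"}],
}

ids = reader_ids_by_group(sample_readers)
print(ids)
```

Any id in the result could then be passed as reader_id to tokenize, whereas the group names are what the consumer_group parameter of the other endpoints expects.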

detokenize

def detokenize(tokens: Union[List[str], List[int]],
reader_id: str,
skip_special_tokens: bool = True) -> str

Detokenizes a list of tokens (as strings) or token ids.

The detokenize endpoint can be used to send a list of tokens to a model's tokenizer for detokenization/decoding. The result is an output string. For example, if "my_reader" is the id of a model that uses a Llama tokenizer, the following code will detokenize a list of tokens, or a list of token ids, using the Llama tokenizer:

    takeoff_client.detokenize(
        ['▁Fish', '▁is', '▁very', '▁nut', 'rit', 'ious', '.'],
        reader_id="my_reader",
    )
    # "Fish is very nutritious."

    takeoff_client.detokenize(
        [1, 12030, 338, 1407, 18254, 768, 2738, 29889],
        reader_id="my_reader",
        skip_special_tokens=True,
    )
    # "Fish is very nutritious."

NOTE: The reader_id parameter is not the same as the consumer_group parameter used in other endpoints. Because (de)tokenization is specific to a particular loaded model, we need to specify a unique id that identifies a particular reader. To find this ID for the models currently loaded into your Takeoff server, try the following:

    readers = takeoff_client.get_readers()
    for reader_group in readers.values():
        for reader in reader_group:
            print(reader["reader_id"])

Arguments:

  • tokens List[str] | List[int] - Tokens or token ids to detokenize/decode. If providing token ids, special tokens can optionally be included.
  • reader_id str - The id of the reader to use.
  • skip_special_tokens bool, optional - Whether to omit special tokens from the decoded output. Defaults to True.

Returns:

  • str - Detokenized text.
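The "▁" prefix in the string tokens above is the marker SentencePiece-style tokenizers use for a preceding space. The sketch below is a rough local approximation of what decoding such tokens amounts to, useful for sanity-checking output. It is not what the server does, it ignores special tokens and byte-fallback tokens, and the helper name is hypothetical.

```python
from typing import List

SPACE_MARKER = "\u2581"  # the "▁" character used to mark a preceding space


def join_sentencepiece_tokens(tokens: List[str]) -> str:
    """Roughly approximate detokenization of SentencePiece-style string tokens."""
    # Concatenate the pieces, turn markers back into spaces, and drop the
    # space introduced by the marker on the first token.
    return "".join(tokens).replace(SPACE_MARKER, " ").lstrip(" ")


tokens = ["▁Fish", "▁is", "▁very", "▁nut", "rit", "ious", "."]
print(join_sentencepiece_tokens(tokens))
```

For anything beyond a quick check, the detokenize endpoint should be preferred, since only the model's own tokenizer knows how to handle special tokens and token ids.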

create_reader

def create_reader(reader_config: Dict[str, Any]) -> Dict[str, Any]

Create a new reader.

Arguments:

  • reader_config Dict[str, Any] - Dict containing all the reader configuration parameters.

delete_reader

def delete_reader(reader_id: str) -> None

Delete a reader, using its reader_id.

Arguments:

  • reader_id str - Reader id.
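create_reader and delete_reader pair naturally into a create, use, clean-up pattern. The sketch below wraps them in a context manager so the reader is deleted even if an error occurs. It relies on two loudly stated assumptions: that the dict returned by create_reader contains a "reader_id" key, and that the placeholder configuration is accepted. Neither is confirmed here, and FakeClient is a hypothetical stand-in for TakeoffClient.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator


@contextmanager
def temporary_reader(client, reader_config: Dict[str, Any]) -> Iterator[str]:
    """Create a reader, yield its id, and always delete it afterwards."""
    response = client.create_reader(reader_config)
    reader_id = response["reader_id"]  # assumed key in the response dict
    try:
        yield reader_id
    finally:
        client.delete_reader(reader_id)


class FakeClient:
    """Hypothetical stand-in for TakeoffClient, tracking active readers."""

    def __init__(self) -> None:
        self.active = set()

    def create_reader(self, reader_config: Dict[str, Any]) -> Dict[str, Any]:
        reader_id = f"reader-{len(self.active) + 1}"
        self.active.add(reader_id)
        return {"reader_id": reader_id, **reader_config}

    def delete_reader(self, reader_id: str) -> None:
        self.active.discard(reader_id)


client = FakeClient()
# "example_option" is a placeholder, not a real configuration parameter.
with temporary_reader(client, {"example_option": "value"}) as rid:
    active_inside = rid in client.active
active_after = rid in client.active
print(active_inside, active_after)
```

The try/finally block is the point of the design: a reader that was created for a one-off job does not linger on the server if the job raises.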

list_all_readers

def list_all_readers() -> Dict[str, Dict[str, Any]]

List all readers, ordering by consumer group.

Returns:

  • Dict[str, Dict[str, Any]] - List of reader ids.

get_reader_config

def get_reader_config(reader_id: str) -> Dict[str, Any]

Get the config.json that a reader is running.

Arguments:

  • reader_id str - Reader id.

Returns:

  • Dict[str, Any] - Reader configuration.