# takeoff_client
This module contains the TakeoffClient class, which is used to interact with the Takeoff server.
## TakeoffClient Objects

```python
class TakeoffClient()
```
#### __init__

```python
def __init__(base_url: str = "http://localhost",
             port: int = 3000,
             mgmt_port: int = None)
```
TakeoffClient is used to interact with the Takeoff server.
**Arguments**:

- `base_url` _str, optional_ - Base URL that the Takeoff server runs on. Defaults to "http://localhost".
- `port` _int, optional_ - Port that the main server runs on. Defaults to 3000.
- `mgmt_port` _int, optional_ - Port that the management API runs on. Usually `port + 1`. Defaults to None.
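For example, a client for a locally running Takeoff server might be constructed like this (a minimal sketch; the management port shown is an assumption, following the usual `port + 1` convention):

```python
from takeoff_client import TakeoffClient

# Connect to a local Takeoff server on the default ports; the mgmt_port
# value follows the usual port + 1 convention (an assumption here).
client = TakeoffClient(base_url="http://localhost", port=3000, mgmt_port=3001)
```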
#### get_readers

```python
def get_readers() -> dict
```
Get a list of information about all readers.
**Returns**:

- `dict` - Information about all readers.
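A minimal sketch, reusing the iteration pattern shown in the `tokenize` docs below:

```python
readers = client.get_readers()

# Each value is a list of reader info dicts; "reader_id" is one known key
# (see the tokenize docs below for the same pattern).
for reader_group in readers.values():
    for reader in reader_group:
        print(reader["reader_id"])
```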
#### embed

```python
def embed(text: Union[str, List[str]],
          consumer_group: str = "primary") -> dict
```
Embed a batch of text.
**Arguments**:

- `text` _str | List[str]_ - Text to embed.
- `consumer_group` _str, optional_ - Consumer group to use. Defaults to "primary".
**Returns**:

- `dict` - Embedding response.
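A minimal usage sketch; the exact shape of the embedding response is not documented here, so it is printed whole:

```python
# Embed a batch of two strings using the default consumer group.
response = client.embed(["first document", "second document"])
print(response)  # response shape depends on the embedding model
```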
#### classify

```python
def classify(text: Union[str, List[str], List[List[str]]],
             consumer_group: str = "classify") -> dict
```
Classify a batch of text.
Text that is passed in as a list of lists of strings will be concatenated on the innermost list, and the outermost list treated as a batch of concatenated strings.
Concatenation happens server-side, as it needs information from the model tokenizer.
**Arguments**:

- `text` _str | List[str] | List[List[str]]_ - Text to classify.
- `consumer_group` _str, optional_ - Consumer group to use. Defaults to "classify".
**Returns**:

- `dict` - Classification response.
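For instance, each inner list below is concatenated server-side into a single input, and the outer list is treated as a batch of two (a sketch; the response shape is not documented here):

```python
# Two batch items, each assembled server-side from a pair of strings.
batch = [
    ["premise one", "hypothesis one"],
    ["premise two", "hypothesis two"],
]
response = client.classify(batch)
print(response)
```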
#### generate

```python
def generate(text: Union[str, List[str]],
             sampling_temperature: float = None,
             sampling_topp: float = None,
             sampling_topk: int = None,
             repetition_penalty: float = None,
             no_repeat_ngram_size: int = None,
             max_new_tokens: int = None,
             min_new_tokens: int = None,
             regex_string: str = None,
             json_schema: Any = None,
             prompt_max_tokens: int = None,
             consumer_group: str = "primary",
             image_path: Optional[Path] = None) -> dict
```
Generates text, seeking a completion for the input prompt. Buffers output and returns at once.
**Arguments**:

- `text` _str | List[str]_ - Input prompt from which to generate.
- `sampling_temperature` _float, optional_ - Sample with randomness. Bigger temperatures are associated with more randomness.
- `sampling_topp` _float, optional_ - Sample from set of tokens whose cumulative probability exceeds this value.
- `sampling_topk` _int, optional_ - Sample predictions from the top K most probable candidates.
- `repetition_penalty` _float, optional_ - Penalise the generation of tokens that have been generated before. Set to > 1 to penalize.
- `no_repeat_ngram_size` _int, optional_ - Prevent repetitions of ngrams of this size.
- `max_new_tokens` _int, optional_ - The maximum number of (new) tokens that the model will generate.
- `min_new_tokens` _int, optional_ - The minimum number of (new) tokens that the model will generate.
- `regex_string` _str, optional_ - The regex string which generations will adhere to as they decode.
- `json_schema` _dict, optional_ - The JSON Schema which generations will adhere to as they decode. Ignored if `regex_string` is set.
- `prompt_max_tokens` _int, optional_ - The maximum length (in tokens) for this prompt. Prompts longer than this value will be truncated.
- `consumer_group` _str, optional_ - The consumer group to which to send the request.
- `image_path` _Path, optional_ - Path to the image file to be used as input. Defaults to None. Note: this is only available if the running model supports image-to-text generation, for example with LLaVA models.
**Returns**:

- `dict` - The response from Takeoff containing the generated text as a whole.
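A sketch of a buffered generation call; the sampling values are illustrative, and the response is printed whole since its keys are not documented here:

```python
response = client.generate(
    "List three uses for a paperclip:",
    sampling_temperature=0.7,  # higher values give more randomness
    sampling_topk=50,          # sample from the 50 most probable tokens
    max_new_tokens=128,
)
print(response)  # dict containing the generated text as a whole
```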
#### generate_stream

```python
def generate_stream(text: Union[str, List[str]],
                    sampling_temperature: float = None,
                    sampling_topp: float = None,
                    sampling_topk: int = None,
                    repetition_penalty: float = None,
                    no_repeat_ngram_size: int = None,
                    max_new_tokens: int = None,
                    min_new_tokens: int = None,
                    regex_string: str = None,
                    json_schema: dict = None,
                    prompt_max_tokens: int = None,
                    consumer_group: str = "primary",
                    image_path: Optional[Path] = None) -> Iterator[Event]
```
Generates text, seeking a completion for the input prompt.
Arguments:
-
text
str | List[str] - Input prompt from which to generate -
sampling_temperature
float, optional - Sample predictions from the top K most probable candidates -
sampling_topp
float, optional - Sample from set of tokens whose cumulative probability exceeds this value -
sampling_topk
int, optional - Sample with randomness. Bigger temperatures are associated with more randomness. -
repetition_penalty
float, optional - Penalise the generation of tokens that have been generated before. Set to > 1 to penalize. -
no_repeat_ngram_size
int, optional - Prevent repetitions of ngrams of this size. -
max_new_tokens
int, optional - The maximum number of (new) tokens that the model will generate. -
min_new_tokens
int, optional - The minimum number of (new) tokens that the model will generate. -
regex_string
str, optional - The regex string which generations will adhere to as they decode. -
json_schema
dict, optional - The JSON Schema which generations will adhere to as they decode. Ignored if regex_str is set. -
prompt_max_tokens
int, optional - The maximum length (in tokens) for this prompt. Prompts longer than this value will be truncated. -
consumer_group
str, optional - The consumer group to which to send the request. -
image_path
Path, optional - Path to the image file to be used as input. Defaults to None. -
Note
- This is only available if the running model supports image to text generation, for example with LlaVa models.
**Returns**:

- `Iterator[sseclient.SSEClient.Event]` - An iterator of server-sent events.
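A streaming sketch; each yielded item is an sseclient server-sent event, whose standard `data` attribute carries the payload:

```python
# Stream the completion as server-sent events instead of buffering it.
for event in client.generate_stream("Tell me a story:", max_new_tokens=128):
    print(event.data)  # sseclient events expose their payload via .data
```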
#### tokenize

```python
def tokenize(text: str, reader_id: str) -> List[str]
```
Tokenize a single text item.
The tokenize endpoint can be used to send a string to a model's tokenizer for tokenization. The result is a list of tokens. For example, if "my_reader" is the id of a model that uses a Llama tokenizer, the following code will tokenize the string "hello, world" using the Llama tokenizer:

```python
>>> takeoff_client.tokenize("hello, world", reader_id="my_reader")
['▁hello', ',', '▁world']
```
NOTE: The `reader_id` parameter is not the same as the `consumer_group` parameter used in other endpoints. Because tokenization is specific to a particular loaded model, we need to specify a unique id that identifies a particular reader. To find this id for the models currently loaded into your Takeoff server, try the following:
```python
readers = takeoff_client.get_readers()
for reader_group in readers.values():
    for reader in reader_group:
        print(reader["reader_id"])
```
**Arguments**:

- `text` _str_ - Text to tokenize.
- `reader_id` _str_ - The id of the reader to use.
**Returns**:

- `List[str]` - Tokenized text.
#### create_reader

```python
def create_reader(reader_config: Dict[str, Any]) -> Dict[str, Any]
```
Create a new reader.
**Arguments**:

- `reader_config` _Dict[str, Any]_ - Dict containing all the reader configuration parameters.
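A sketch; the configuration keys accepted by the server are not documented here, so the ones below are illustrative assumptions:

```python
# Hypothetical reader configuration; the key names are assumptions,
# not a documented schema.
reader_config = {
    "model_name": "TitanML/my-model",
    "device": "cuda",
    "consumer_group": "primary",
}
new_reader = client.create_reader(reader_config)
print(new_reader)
```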
#### delete_reader

```python
def delete_reader(reader_id: str) -> None
```

Delete a reader, using its reader_id.
**Arguments**:

- `reader_id` _str_ - Reader id.
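For example (the reader id here is a placeholder; real ids can be found via `get_readers` or `list_all_readers`):

```python
client.delete_reader("my_reader")  # "my_reader" is a placeholder id
```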
#### list_all_readers

```python
def list_all_readers() -> Dict[str, Dict[str, Any]]
```

List all readers, grouped by consumer group.
**Returns**:

- `Dict[str, Dict[str, Any]]` - Reader ids and their information, grouped by consumer group.
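A short sketch printing each consumer group and its readers:

```python
all_readers = client.list_all_readers()
for group, readers in all_readers.items():
    print(group, readers)
```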
#### get_reader_config

```python
def get_reader_config(reader_id: str) -> Dict[str, Any]
```
Get the config.json that a reader is running.
**Arguments**:

- `reader_id` _str_ - Reader id.
**Returns**:

- `Dict[str, Any]` - Reader configuration.
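For example (again with a placeholder reader id):

```python
config = client.get_reader_config("my_reader")  # placeholder id
print(config)  # the config.json the reader is running with
```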