Version: 0.21.x

Create reader

POST /reader

Takes a ReaderConfig payload and loads a new Python reader.

Request

Body (required)

    access_token (string, nullable)
    cold_cache_cpu_size (string, nullable)
    cold_cache_cuda_size (string, nullable)
    constrained_decoding_backend (string, nullable)
    consumer_group (string, required)
    cuda_graph_cache_capacity (int32, nullable)
    cuda_visible_devices (string, nullable)
    device (string, required)
    disable_cuda_graph (int32, nullable)
    internal_gateway_ip (string, nullable)
    lmfe_max_consecutive_whitespaces (int32, nullable)
    log_level (string, nullable)
    max_batch_size (int64, nullable)
    max_sequence_length (int32, nullable)
    model_name (string, required)
    nvlink_unavailable (int32, nullable)
    page_cache_size (string, nullable)
    prefill_chunk_size (int64, nullable)
    quant_type (string, nullable)
    quantize_cache_bits (int32, nullable)
    reader_log_as_json (string, nullable)
    ssd_cache_size (integer, nullable)
    tensor_parallel (int32, nullable)
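
As a rough illustration of the request body above, the following Python sketch builds (but does not send) a `POST /reader` request using only the standard library. The base URL, the field values, the `Content-Type: application/json` header, and the helper name `build_create_reader_request` are all assumptions made for this example; only the field names and the three required fields come from the listing above.

```python
import json
import urllib.request

# Required string fields, taken from the request body listing above.
REQUIRED_FIELDS = ("consumer_group", "device", "model_name")


def build_create_reader_request(base_url, config):
    """Build (but do not send) a POST /reader request from a config dict.

    Hypothetical helper for illustration; it is not part of any client library.
    """
    missing = [name for name in REQUIRED_FIELDS
               if not isinstance(config.get(name), str)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/reader",
        data=body,
        # Assumption: the endpoint accepts a JSON-encoded body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values; substitute the ones that match your deployment.
example_config = {
    "consumer_group": "primary",
    "device": "cuda",
    "model_name": "my-model",
    "max_batch_size": 8,
}
request = build_create_reader_request("http://localhost:3000", example_config)
# To actually send it: urllib.request.urlopen(request)
```

Nullable fields are simply left out of the dict here; whether the server treats an omitted field the same as an explicit `null` is not stated in this page, so check that against your deployment.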

Responses

Takes a JSON payload and loads a new model and/or backend.

Schema

    _reader_config (object, required)
        access_token (string, nullable)
        cold_cache_cpu_size (string, nullable)
        cold_cache_cuda_size (string, nullable)
        constrained_decoding_backend (string, nullable)
        consumer_group (string, required)
        cuda_graph_cache_capacity (int32, nullable)
        cuda_visible_devices (string, nullable)
        device (string, required)
        disable_cuda_graph (int32, nullable)
        internal_gateway_ip (string, nullable)
        lmfe_max_consecutive_whitespaces (int32, nullable)
        log_level (string, nullable)
        max_batch_size (int64, nullable)
        max_sequence_length (int32, nullable)
        model_name (string, required)
        nvlink_unavailable (int32, nullable)
        page_cache_size (string, nullable)
        prefill_chunk_size (int64, nullable)
        quant_type (string, nullable)
        quantize_cache_bits (int32, nullable)
        reader_log_as_json (string, nullable)
        ssd_cache_size (integer, nullable)
        tensor_parallel (int32, nullable)
    _reader_id (string, required)
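
A minimal sketch of reading the response, assuming the body is a JSON object with `_reader_config` and `_reader_id` as top-level keys, as the schema suggests. The helper name `parse_create_reader_response` and the sample values are hypothetical.

```python
import json


def parse_create_reader_response(raw):
    """Return (reader_id, reader_config) from a JSON response body.

    Hypothetical helper; it checks only the two required fields in the schema.
    """
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("response body is not a JSON object")
    reader_id = payload.get("_reader_id")
    reader_config = payload.get("_reader_config")
    if not isinstance(reader_id, str):
        raise ValueError("response is missing the required string field _reader_id")
    if not isinstance(reader_config, dict):
        raise ValueError("response is missing the required object field _reader_config")
    return reader_id, reader_config


# Made-up sample body for illustration; real values come from the server.
sample_body = json.dumps({
    "_reader_config": {
        "consumer_group": "primary",
        "device": "cuda",
        "model_name": "my-model",
    },
    "_reader_id": "reader-1",
})
reader_id, reader_config = parse_create_reader_response(sample_body)
```

Keeping the returned `_reader_id` is likely useful for any later call that needs to refer to this reader, though this page does not describe those calls.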