LangChain
The Takeoff API also integrates with LangChain, allowing you to run LLM inference or embed queries through the LangChain interface.
See also the official LangChain docs for the Titan Takeoff LLM and Embedding integrations.
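Embeddings are not covered in detail here, but as a minimal sketch, assuming an embedding model is already running on the Takeoff Server and using the TitanTakeoffEmbed wrapper from langchain_community, embedding a query looks roughly like this (the consumer group name is an illustrative placeholder):
from langchain_community.embeddings import TitanTakeoffEmbed

# Assumes an embedding reader is already running on the Takeoff Server;
# "embed" is a placeholder consumer group name for illustration
embed = TitanTakeoffEmbed()
vector = embed.embed_query("What is the capital of France?", consumer_group="embed")
print(len(vector))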
Inferencing your LLM through LangChain
Before making calls to your LLM, make sure the Takeoff Server is up and running. To access your LLM running on the Takeoff Server, import the TitanTakeoff LLM wrapper:
Note: The TitanTakeoffPro wrapper has been deprecated, but for backwards compatibility it remains available as an alias for TitanTakeoff.
from langchain_community.llms import TitanTakeoff

# Assumes Takeoff is running with its default settings (see the parameters below)
llm = TitanTakeoff()
output = llm.invoke("What is the weather in London in August?")
print(output)
No arguments are needed to initialize the llm object if you haven't overridden any of the default settings when launching Takeoff. If you have, you can pass the following parameters to the TitanTakeoff object (an example of overriding them follows the list):
- base_url (str, optional): The base URL where the Takeoff Inference Server is listening. Defaults to http://localhost.
- port (int, optional): The port the Takeoff Inference API is listening on. Defaults to 3000.
- mgmt_port (int, optional): The port the Takeoff Management API is listening on. Defaults to 3001.
- streaming (bool, optional): Whether to use the generate_stream endpoint rather than generate by default to stream responses. Defaults to False. In practice the output differs little from the non-streamed case, since the streamed response is buffered and returned in a similar way, but the run manager is applied per generated token.
- models (List[ReaderConfig], optional): Any readers you'd like to spin up on the server. Defaults to [].
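For example, if Takeoff was launched with non-default networking settings, a sketch of overriding them when constructing the wrapper might look like the following (the values shown are simply the documented defaults; substitute your own):
from langchain_community.llms import TitanTakeoff

# Substitute the host and ports you launched Takeoff with
llm = TitanTakeoff(
    base_url="http://localhost",  # where the Takeoff Inference Server is listening
    port=3000,                    # Takeoff Inference API port
    mgmt_port=3001,               # Takeoff Management API port
)
output = llm.invoke("What is the weather in London in August?")
print(output)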
Specifying Generation Parameters
You can also specify generation parameters when making a call to the LLM. The following example demonstrates how to do this:
llm = TitanTakeoff()
# A comprehensive list of parameters can be found at https://docs.titanml.co/docs/next/apis/Takeoff%20inference_REST_API/generate#request
output = llm.invoke(
    "What is the largest rainforest in the world?",
    consumer_group="primary",
    min_new_tokens=128,
    max_new_tokens=512,
    no_repeat_ngram_size=0,
    sampling_topk=1,
    sampling_topp=1.0,
    sampling_temperature=1.0,
    repetition_penalty=1.0,
    regex_string="",
    json_schema=None,
)
print(output)
Streaming
Streaming is also supported via the streaming flag:
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.callbacks.manager import CallbackManager

# Print each token to stdout as it is generated
llm = TitanTakeoff(
    streaming=True,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)
prompt = "What is the capital of France?"
output = llm.invoke(prompt)
print(output)
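Alternatively, LangChain's generic .stream() interface can be used to iterate over output chunks. This is a minimal sketch; whether chunks arrive token by token depends on the integration's streaming support:
from langchain_community.llms import TitanTakeoff

llm = TitanTakeoff(streaming=True)
# Iterate over output chunks as they are produced; if incremental streaming
# is not available, the full response is yielded as a single chunk
for chunk in llm.stream("What is the capital of France?"):
    print(chunk, end="", flush=True)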