Why you should use Structured Generation when inferencing LLMs
TitanML is introducing a new feature, Structured Generation, which is designed to give you more control over your models and revolutionise how you interact with and process data outputs. Our NLP Engineer, Josh, recently showcased this development in the video above and we're here to walk you through our findings and what it means for you and your projects.
What is Structured Generation?​
Structured generation is a technique focused on producing data in a specific, pre-defined format. Unlike traditional unstructured or semi-structured outputs, structured generation ensures data consistency and adherence to a predetermined schema or format, making it highly efficient for further processing and analysis.
This has been realised in two forms - Regex decoding, and JSON Decoding. Regex decoding is designed for shorter outputs or those requiring character-level precision in their format. JSON decoding is more suited to producing primitive types structured into objects following a set schema, with this schema defined using the JSON-Schema language.
Why Structured Generation?​
The main advantage of using structured generation lies in its ease of integration with your downstream tasks. By forcing the model to produce data in a consistent format, it reduces the need for additional processing or data cleaning. This ensures not only a streamlined workflow, but also compliance with the typed ontologies and the ability to inject further prompt information local to individual values being generated.
Regex decoding example: Formatting dates​
In the video, we illustrated the practicality of structured generation with a simple example: generating the birth date of Bob Marley. The language model can output this information in three different formats: unstructured, semi-structured, and structured. While unstructured data might be suitable for casual conversations, structured data, like dates in ISO 8601 format, are immediately usable in a database or for further computational processes as they adhere to a predefined YYYY-MM-DD schema.
How Does It Work?​
This functionality can be achieved through masking out next tokens which do not adhere to the requirements of the regex. This is achieved by converting the regex into a finite state machine, and traversing it in tandem with the generation of tokens. A key challenge arises here with tokenisation - language models typically operate on tokens, which usually do not align with individual characters. As such, the FSM must be traversed with multiple transitions per token. The masking out of non-compliant tokens is achieved by driving down their assigned value in the generated logits, as to make it impossible for it to be selected as the next token.
Advanced Structuring with JSON Schemas​
Another aspect of structured generation is the use of JSON schemas. These schemas define the structure of the output, ensuring that it adheres to a specific format. They offer a blueprint for the data structure, specifying what fields are expected and their data types. For example, if you need information about a city, you can define a schema with fields like country, population, and even list of city districts etc. The language model then generates data that fits this schema, making it incredibly useful for structured data queries. A further benefit here is the proximity of the key to the tokens being generated - the model can be seen to be further prompted by virtue of the adjacent key.
JSON DeRuLO​
In order to address the challenges of structured generation in JSON, we are proposing a set of decoding rules called JSON DeRuLO (Decoding Rules for Language Output):
- Information extraction or generation into JSON according to a specified Schema
- Guaranteed adherence to required types
- Allowance for extra prompting within keys (tokens detailing specifics of required information are immediately local to those being generated)
- Option to extract data solely as spans from an input text (as a bulletproof RAG implementation)
Conclusion​
Structured generation represents a significant leap forward in improving the usability and application of language models. Whether you're working on complex data analysis, building databases, or simply need precise information extraction, structured generation is set to be a game changer.
About TitanML​
TitanML enables machine learning teams to effortlessly and efficiently deploy large language models (LLMs). Their flagship product, Titan Takeoff Inference Server, is already supercharging the deployments of a number of ML teams.
Founded by Dr. James Dborin, Dr. Fergus Finn and Meryem Arik, and backed by key industry partners including AWS and Intel, TitanML is a team of dedicated deep learning engineers on a mission to supercharge the adoption of enterprise AI.