📄️ JSON Structured Generation
Introduction
📄️ Batching Strategies
Introduction
📄️ Speculative Decoding
Introduction
📄️ Doubleword Caching Engine
Speculative decoding is one of the most powerful techniques for speeding up large language models to come out of the AI literature. In this article I'll talk a bit about how speculative decoding works, how the standard model of speculative decoding is 'doing it wrong', and how we implement speculative decoding in our Doubleword engine.