Inference Optimization

Speculative decoding is one of the most powerful techniques to come out of the AI literature for speeding up large language model inference. In this article I'll explain how speculative decoding works, why the standard model of speculative decoding is 'doing it wrong', and how we implement it in our Doubleword engine.