TTODLer-FM

Quantization

Training Models in Low Precision: the Promise, the Limitations, and the Scaling Laws

Dan Alistarh

The last few years have seen an explosion of interest in AI efficiency. One of the holy grails of the area has been training and running inference on models end-to-end in low precision, for instance by leveraging the quantized matrix multiplication support on modern GPUs. In this talk, I will present some of our lab’s recent work on this topic, investigating low-precision training of LLMs. Specifically, I will cover a new state-of-the-art algorithm for quantized training called QuEST, discuss the limits of current approaches as characterized via scaling laws, and present fast kernel support for low-precision training.
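To make the idea of quantized matrix multiplication concrete, here is a minimal sketch of symmetric per-tensor int8 quantization with integer accumulation, the basic primitive that GPU low-precision matmul units accelerate. This is a generic illustration, not the QuEST algorithm or any specific GPU kernel; all function names and the error tolerance are assumptions for the example.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization: map floats to int8 via one scale.
    max_abs = np.abs(x).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)  # toy weight matrix
x = rng.standard_normal(8).astype(np.float32)       # toy activation vector

qW, sW = quantize_int8(W)
qx, sx = quantize_int8(x)

# Integer matmul accumulated in int32, then rescaled back to float --
# this is the pattern hardware int8 tensor cores implement efficiently.
y_quant = (qW.astype(np.int32) @ qx.astype(np.int32)).astype(np.float32) * (sW * sx)
y_fp32 = W @ x

err = float(np.abs(y_quant - y_fp32).max())
```

The key point is that the expensive inner product runs entirely in integer arithmetic, with a single floating-point rescale at the end; the quantization error `err` stays small relative to the output magnitude for well-scaled tensors.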
