TTODLer-FM

Speaker

Song Han

Song Han is an associate professor at MIT EECS. He earned his PhD from Stanford, where he pioneered efficient AI computing techniques such as "Deep Compression" (pruning and quantization) and the "Efficient Inference Engine," the first design to bring weight sparsity to modern AI chips and one of the top-5 most cited papers in the 50-year history of ISCA (1953-2023). His subsequent innovations, including TinyML and hardware-aware neural architecture search (Once-for-All Network), have advanced the deployment of AI models on resource-constrained devices.

His recent work on LLM quantization and acceleration (SmoothQuant, AWQ, StreamingLLM) has improved the efficiency of LLM inference and has been adopted by NVIDIA TensorRT-LLM. Song received best paper awards at ICLR'16, FPGA'17, and MLSys'24, as well as the NSF CAREER Award, MIT Technology Review's "35 Innovators Under 35," IEEE "AI's 10 to Watch," and the Sloan Research Fellowship. He co-founded DeePhi (now part of AMD) and OmniML (now part of NVIDIA), and developed the open lecture series EfficientML.ai to share advances in efficient ML research.

Overview