Towards Principled Design of SLM Agents for Edge Devices
This talk introduces a principled three-stage methodology for designing small language model (SLM) agents suitable for deployment on edge devices. The first stage designs a large language model (LLM) agent, which establishes the upper bound on accuracy achievable without hardware constraints. The second stage distills this LLM agent into an efficient SLM agent via multi-task distillation. The third stage optimizes the SLM agent to reach the desired trade-off between accuracy and inference efficiency through quantization, approximate inference, speculative decoding, and prompt distillation. A case study of deploying SLM agents in a commercial video game illustrates the key challenges and yields practical insights. The talk concludes with open research questions in SLM agent design. This work is a collaboration with KRAFTON and NVIDIA.