Below is the list of accepted papers, grouped into awards, orals, and posters:
Awards
- Best paper award: Kinetics: Rethinking Test-Time Scaling Laws (Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, et al.)
- Best poster award: Gatekeeper: Improving Model Cascades Through Confidence Tuning (Stephan Rabanser, Nathalie Rauschmayr, Achin Kulshrestha, Petra Poklukar, Wittawat Jitkrittum, Sean Augenstein, Congchao Wang, Federico Tombari)
- Honorable mention: LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning (Nurbek Tastan, Stefanos Laskaridis, Martin Takac, Karthik Nandakumar, Samuel Horváth)
Orals
- Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers (Joshua Barron, Devin White)
- Kinetics: Rethinking Test-Time Scaling Laws (Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, et al.)
- Preserve then Quantize: Dominant-Subspace Guided Low-Rank Reconstruction (Yoonjun Cho, Dongjae Jeon, Soeun Kim, Albert No)
- WhisperKit: On-device Real-time ASR with Billion-Scale Transformers (Berkin Durmus, Arda Okan, Eduardo Pacheco, et al.)
- Offloaded Reasoning: Efficient Inference for Large Language Models via Modular Reasoning and Refinement (Ishan Jindal, Jayant Taneja, Badrinath Chandana, et al.)
- Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search (Dongge Han, Menglin Xia, Daniel Madrigal, et al.)
Posters
- Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation for Federated Learning (Grigory Malinovsky, Umberto Michieli, Hasan Hammoud, Taha Ceritli, Hayder Elesedy, Mete Ozay, Peter Richtarik)
- Unlocking the Potential of Extremely Low-Bit Sparse Transformers through Adaptive Multi-bit Supermasks and Random Weights (Yasuyuki Okoshi, Hikari Otsuka, Junnosuke Suzuki, Daichi Fujiki, Masato Motomura)
- Addition is almost all you need: Compressing neural networks with double binary factorization (Vladimír Boža, Vladimír Macko)
- Capability Transfer from Large to Small Models with Synthetically-Generated Data (Lillian Sun, Emma Yang, Arif Dayi)
- TensorSLM: Energy-efficient Embedding Compression of Sub-billion Parameter Language Models on Low-end Devices (Mingxue Xu, Yao Lei Xu, Danilo Mandic)
- Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order LLM Fine-Tuning (Egor Petrov, Grigoriy Evseev, Aleksey Antonov, Andrey Veprikov, Pavel Plyusnin, Nikolay Bushkov, Stanislav Moiseev, Aleksandr Beznosikov)
- Higher Acceptance Rates for Speculative Decoding with Randomised Drafting (William Toner, Martin Asenov, Rajkarn Singh, Artjom Joosen)
- Zoop it! Efficient Zero-Order Optimization with Output Perturbation (Xixi Hu, Bo Liu, Qiang Liu, Xiaocong Du, Bhargav Bhushanam, Louis Feng, Chengyue Gong, Kaizhao Liang)
- MatMuls are Enough for Efficient and Performant Linear-Time Attention (Andrew Argatkiny, Ilya Makarov)
- Zeroth-Order Optimization is Secretly Single-Step Policy Optimization (Junbin Qiu, Zhengpeng Xie, Xiangda Yan, Yongjie Yang, Yao Shu)
- Spec-LLaVA: Accelerating Vision-Language Models with Dynamic Tree-Based Speculative Decoding (Mingxiao Huo, Jiayi Zhang, Hewei Wang, Jinfeng Xu, Zheyu Chen, Huilin Tai, Ian Chen)
- LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning (Nurbek Tastan, Stefanos Laskaridis, Martin Takac, Karthik Nandakumar, Samuel Horváth)
- FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training (Filipp Zmushko, Aleksandr Beznosikov, Martin Takac, Samuel Horváth)
- Towards understanding of orthogonalization in Muon (Valentyn Boreiko, Zhiqi Bu, Sheng Zha)
- FGFP: A Fractional Gaussian Filter and Pruning for Deep Neural Networks Compression (Kuan-Ting Tu, Po-Hsien Yu, Yu-Syuan Tseng, Shao-Yi Chien)
- Gatekeeper: Improving Model Cascades Through Confidence Tuning (Stephan Rabanser, Nathalie Rauschmayr, Achin Kulshrestha, Petra Poklukar, Wittawat Jitkrittum, Sean Augenstein, Congchao Wang, Federico Tombari)
- DiffusionBlocks: Continuous-Time Blockwise Training Through Score-Based Diffusion Models (Makoto Shing, Takuya Akiba)
- Compression of Large Language Models by Neuron Summary (Yancheng Wang, Dongfang Sun, Yingzhen Yang)
- Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models (Katrina Brown, Aneesh Muppidi, Rana Shahout)
- FAST: Federated Active Learning with Foundation Models for Communication-efficient Sampling and Training (Haoyuan Li, Mathias Funk, Jindong Wang, Aaqib Saeed)
- Overcoming label shift in targeted federated learning (Adam Breitholtz, Edvin Listo Zec, Fredrik Johansson)
- Token-Efficient RL for LLM Reasoning (Alan Lee, Harry Tong)
- Dynamic Guardian Models: Realtime Content Moderation With User-Defined Policies (Monte Hoover, Vatsal Baherwani, Neel Jain, Khalid Saifullah, Joseph Vincent, Chirag Jain, Melissa Rad, C. Bayan Bruss, Ashwinee Panda, Tom Goldstein)
- First Provable Guarantees for Practical Private FL: Beyond Restrictive Assumptions (Egor Shulgin, Grigory Malinovsky, Sarit Khirirat, Peter Richtarik)
- Lion Cub: Minimizing Communication Overhead in Distributed Lion (Satoki Ishikawa, Tal Ben-Nun, Brian Van Essen, Rio Yokota, Nikoli Dryden)
- SPAM: Stochastic Proximal Point Method with Momentum Variance Reduction for Non-convex Cross-Device Federated Learning (Avetik Karagulyan, Egor Shulgin, Abdurakhmon Sadiev, Peter Richtarik)