TTODLer-FM

Tiny Titans: The next wave of On-Device Learning for Foundational Models (TTODLer-FM)


The field of Machine Learning has recently witnessed unprecedented growth, providing “sparks of intelligence” across various fields, applications and modalities, and thus enabling enhanced understanding and generation capabilities. This intelligence has enabled applications ranging from protein folding to autonomous vehicle navigation, and from multimodal perception in embodied agents to intelligent assistants, that were previously unattainable.

A true enabler for this era of hyper-scale foundation models has been the advent of transformer-based architectures, along with the associated hardware acceleration breakthroughs that make such computation feasible. These models have scaled particularly successfully, offering emergent abilities, such as in-context learning and Chain-of-Thought reasoning, without their performance plateauing as more data is added. This places significant stress on the data-center resources required to keep up with the scaling laws of foundation models, not only computationally but also in terms of gathering new data. It also creates significant opportunities for expanding outside the data center and leveraging distributed and on-device resources, whether for training and personalisation or for inference at the edge.

Moreover, the emerging field of Small Language Models (SLMs) has been pushing the envelope of GenAI in the opposite direction, effectively compressing LLMs to consumer-friendly sizes. This not only provides a more sustainable alternative that can adjust the model size to the respective downstream task, but also paves the way for privacy-sensitive use-cases or applications that require reduced latency. However, doing so in a computationally-friendly and accuracy-preserving manner poses several challenges.

While data center accelerators have undoubtedly shouldered the bulk of the computational burden, consumer and edge devices have become increasingly capable, with powerful SoCs being integrated across the board. Neural Processing Units (NPUs) are gradually becoming the de facto accelerator for sustaining such workloads, from edge devices to smartphones and laptops. While they boost the computational efficiency of these emerging neural workloads, they still face various challenges, ranging from limited memory bandwidth and interconnect constraints to the thermal dissipation limits of the device. Moreover, designing such NPUs in a way that optimizes the available silicon area while maintaining computational flexibility and efficiency is a non-trivial task.

Therefore, we are at a crossroads where many use-cases require expanding from purely hosted solutions to on-device and distributed deployments, in order to access the plethora of data at the edge without breaching the privacy of users. This in turn demands a series of algorithmic, MLSys and privacy advances, pursued in a multi-disciplinary manner.

To this end, we look forward to welcoming contributions in the following research areas:

We hope that TTODLer-FM will serve as a forum for researchers across different disciplines to bring forward and discuss challenging topics, share new ideas and exchange experience in the deployment of such systems, both from a theoretical and experimental perspective.

The workshop is co-located with ICML’25 and will be held in the beautiful city of Vancouver.

The workshop will take place on the 18th or 19th of July 2025.

Deadline for paper submissions: 23rd of May 2025, AoE

Submission portal: openreview.net

Call for Papers