
📱 MobileTransformers: An On-Device LLM PEFT Framework for Fine-Tuning and Inference

MobileTransformers (or ORTransformersMobile) is a modular framework designed for fully on-device execution of large and small language models (LLMs/SLMs) on mobile and edge devices.
Built on top of ONNX Runtime, it leverages hardware-accelerated execution providers such as XNNPACK, NNAPI, and QNN for efficient inference and training on Android and similar platforms.
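
As a minimal sketch of how an execution provider is chosen with ONNX Runtime's Android (Kotlin/Java) API: the snippet below prefers NNAPI and falls back to the default CPU provider. The model path is a placeholder, and provider availability depends on how the ORT binary was built; this is an illustration, not the framework's actual code.

```kotlin
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtException
import ai.onnxruntime.OrtSession

// Minimal sketch: open an ONNX Runtime inference session with a
// hardware-accelerated execution provider. "modelPath" is a placeholder.
fun createSession(modelPath: String): OrtSession {
    val env = OrtEnvironment.getEnvironment()
    val options = OrtSession.SessionOptions()
    try {
        options.addNnapi() // prefer NNAPI where the device supports it
    } catch (e: OrtException) {
        // NNAPI not available in this build; ORT falls back to CPU.
    }
    return env.createSession(modelPath, options)
}
```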

Example of the MobileTransformers Android application running on a Google Pixel 6 (2021), with support for on-device LLM training and inference with retrieval-augmented generation.


Code Repository

The main codebase with the full implementation:

MobileTransformers main codebase

Research

For a comprehensive understanding of the research behind MobileTransformers, including detailed explanations of Multi-Adapter Rank Sharing (MARS), on-device training methodologies, and experimental results:

Master's Thesis - Parameter-Efficient Tuning of Large Language Models on Mobile Devices


📥 Documentation

Installation instructions, training and inference examples, and API documentation.


🚀 What is MobileTransformers?

A privacy-first framework that lets researchers and developers export, fine-tune, merge, and deploy transformer-based language models directly on Android devices. It eliminates the dependency on cloud services while keeping full control over the model in your pocket. This makes it a fit for privacy-preserving NLP applications, offline AI assistants, personalized chatbots, and edge-computing scenarios where data sovereignty and real-time responsiveness are crucial. Whether you're building the next generation of pocket AI or developing enterprise edge solutions, MobileTransformers provides the foundation for truly autonomous mobile intelligence.

Key Benefits:


📦 Repository Contents

This repository provides everything needed for on-device LLM deployment:


📱 Android Application: ORTransformer

The Android app is split into two main parts:

🔧 Key features include:


✅ Key Capabilities

| Feature | Description |
| --- | --- |
| ✅ Export custom PyTorch Hugging Face SLM/LLM models | Convert Hugging Face models with PEFT methods into training and ONNX inference models for on-device use |
| ✅ On-device fine-tuning/training loop | Perform parameter-efficient fine-tuning (PEFT) directly on mobile devices (see the training sketch below) |
| ✅ On-device generation loop with KV caching | Efficient text generation using cached key-value tensors for faster autoregressive inference (see the decoding sketch below) |
| ✅ Customizable training and generation | Flexible configuration to adapt training and generation to specific tasks and hardware |
| ✅ On-device weight exporting | Save trained or merged weights directly to the mobile filesystem |
| ✅ On-device weight merging | Merge base and PEFT weights on-device, with optional quantization for smaller size and faster inference |
| ✅ Direct inference from merged weights | Load merged weights into the inference graph for seamless on-device model execution |
| ✅ Retrieval-Augmented Generation (RAG) | Fully on-device vector database integration with ObjectBox for augmented generation (see the retrieval sketch below) |
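
To make the fine-tuning row concrete, here is a hedged sketch of a PEFT step loop written against ONNX Runtime's on-device training API (assuming the onnxruntime-training Android package). The artifact names are placeholders for whatever the export step produces, and the loop is deliberately simplified; it is not the framework's actual training code.

```kotlin
import ai.onnxruntime.OnnxTensor
import ai.onnxruntime.OrtEnvironment
import java.nio.file.Paths

// Hedged sketch of a parameter-efficient training loop with ONNX Runtime's
// on-device training API. Artifact paths are placeholders produced by the
// export step; batch tensors are assumed to be already tokenized.
fun finetune(batches: List<Map<String, OnnxTensor>>) {
    val env = OrtEnvironment.getEnvironment()
    val session = env.createTrainingSession(
        "checkpoint",           // initial trainable (adapter) parameter state
        "training_model.onnx",  // forward + loss + gradient graph
        "eval_model.onnx",      // loss-only graph for validation
        "optimizer_model.onnx"  // optimizer update graph (e.g. AdamW)
    )
    for (batch in batches) {
        session.lazyResetGrad()           // clear gradients from the last step
        session.trainStep(batch).use { }  // forward + backward; output holds the loss
        session.optimizerStep()           // apply gradients to the adapter weights
    }
    // Persist the adapted weights on the device filesystem.
    session.saveCheckpoint(Paths.get("checkpoint_trained"), false)
    session.close()
}
```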
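
Similarly, the decoding sketch below shows the idea behind the KV-cached generation loop: the full prompt is fed once, and every later step feeds only the newest token plus the cached key-value tensors. The I/O names ("input_ids", "logits", "present.*"/"past.*") are common conventions for exported decoder models, not guaranteed by this framework, and tensor cleanup is omitted for brevity.

```kotlin
import ai.onnxruntime.OnnxTensor
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession

// Hedged sketch of greedy decoding with a KV cache; names and shapes
// depend on how the model was exported.
fun generate(env: OrtEnvironment, session: OrtSession,
             prompt: LongArray, maxNewTokens: Int): List<Long> {
    val tokens = prompt.toMutableList()
    var feed = prompt                       // full prompt on the first step
    var past = mapOf<String, OnnxTensor>()  // empty cache on the first step
    repeat(maxNewTokens) {
        val inputs = HashMap(past)
        inputs["input_ids"] = OnnxTensor.createTensor(env, arrayOf(feed))
        val results = session.run(inputs)
        // Greedy pick: argmax over the vocab logits of the last position.
        val logits = results.get("logits").get() as OnnxTensor
        @Suppress("UNCHECKED_CAST")
        val last = (logits.value as Array<Array<FloatArray>>)[0].last()
        var next = 0
        for (i in last.indices) if (last[i] > last[next]) next = i
        tokens.add(next.toLong())
        feed = longArrayOf(next.toLong())   // only the new token afterwards
        // Feed the updated cache back in ("present.*" outputs -> "past.*" inputs).
        past = results.filter { it.key.startsWith("present") }
            .associate { it.key.replaceFirst("present", "past") to it.value as OnnxTensor }
    }
    return tokens
}
```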
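
Finally, the retrieval sketch: assuming ObjectBox 4.x's HNSW vector search, nearest-neighbour lookup over locally stored chunks might look as follows. The DocChunk entity, its field names, and the embedding dimensionality are illustrative assumptions; DocChunk_ is the metaclass that ObjectBox's annotation processor generates.

```kotlin
import io.objectbox.Box
import io.objectbox.annotation.Entity
import io.objectbox.annotation.HnswIndex
import io.objectbox.annotation.Id

// Hedged sketch: a locally stored document chunk with an HNSW-indexed
// embedding vector. Entity and dimensionality are illustrative.
@Entity
data class DocChunk(
    @Id var id: Long = 0,
    var text: String = "",
    @HnswIndex(dimensions = 384)
    var embedding: FloatArray? = null
)

// Retrieve the k chunks most similar to a query embedding; the hits can
// then be prepended to the prompt before generation.
fun retrieve(box: Box<DocChunk>, queryEmbedding: FloatArray, k: Int): List<String> =
    box.query(DocChunk_.embedding.nearestNeighbors(queryEmbedding, k))
        .build()
        .use { query -> query.findWithScores().map { it.get().text } }
```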

🔧 On-device example

Example of a model being adapted to a personalized smartphone automation dataset where users express intents and the model recommends appropriate automatic actions to perform on the device. This task-oriented dataset is specifically designed for on-device intelligence scenarios.

| 🧩 Base Model | ⚙️ On-device Fine-tuned Model |
| --- | --- |
| Screenshot: base on-device model | Screenshot: on-device trained LLM model |

This example shows how a base model can be fine-tuned and personalized entirely on-device, meaning no data ever leaves the device. During the process, adapters are trained locally, then merged and integrated into the base model on the mobile phone to produce the final fine-tuned version.
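
For intuition, assuming LoRA-style adapters (the precise adapter formulation, including MARS, is described in the thesis), the on-device merge folds the trained low-rank update into the frozen base weight, so subsequent inference costs exactly the same as running the base model:

```latex
% W_0: frozen base weight; B (d x r) and A (r x k): trained adapter
% factors with rank r << min(d, k); alpha/r: the LoRA scaling factor.
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A
```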


๐Ÿ› ๏ธ Built On


🎯 Why MobileTransformers?

Because everything stays on the phone: no data leaves the device, there is no cloud dependency, and models can be fine-tuned, personalized, and queried fully offline.

🔧 Extensibility and Future Work

MobileTransformers is designed as a flexible platform that can easily be extended to support more advanced on-device ML workflows.


Citation

MobileTransformers Framework

If you are using this framework for your own work, please cite:

@misc{mobiletransformers2025,
  author       = {Koreli\v{c}, Martin and Pejovi{\'c}, Veljko},
  title        = {MobileTransformers: An On-Device LLM PEFT Framework for Fine-Tuning and Inference},
  year         = {2025},
  howpublished = {\url{https://gitlab.fri.uni-lj.si/lrk/mobiletransformers}}
}

Master's Thesis

If you find the research behind MobileTransformers and MARS useful, please also cite the Master's Thesis:

@mastersthesis{Korelic_2025,
  title  = {Parameter-Efficient Tuning of Large Language Models on Mobile Devices},
  url    = {https://repozitorij.uni-lj.si/IzpisGradiva.php?lang=eng&id=175561},
  author = {Koreli\v{c}, Martin},
  year   = {2025}
}

Acknowledgements

This work was supported by the Slovenian Research Agency under grant no. N2-0393 (approXimation for adaptable diStributed artificial intelligence) and grant no. J2-3047 (Context-Aware On-Device Approximate Computing).