MobileTransformers (or ORTransformersMobile) is a modular framework designed for fully on-device execution of large and small language models (LLM / SLM) on mobile and edge devices.
Built on top of ONNX Runtime, it leverages hardware-accelerated execution providers such as XNNPACK, NNAPI, and QNN for efficient inference and training on Android and similar platforms.
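All inference and training in MobileTransformers runs through ONNX Runtime sessions. As orientation only, here is a minimal Kotlin sketch of the underlying API: creating an ONNX Runtime session that prefers the NNAPI execution provider (the model path is a placeholder; MobileTransformers wraps this setup internally):

```kotlin
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession

// Create an ONNX Runtime session that prefers the NNAPI execution provider
// on Android and falls back to the default CPU provider when NNAPI is
// unavailable. The model path is a placeholder for illustration.
val env = OrtEnvironment.getEnvironment()
val options = OrtSession.SessionOptions().apply {
    addNnapi() // hardware-accelerated execution on Android
}
val session = env.createSession("/path/to/model.onnx", options)
```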

Example of the MobileTransformers Android application running on a Google Pixel 6 (2021), supporting on-device LLM training and inference with retrieval-augmented generation.
The main codebase with the full implementation:
MobileTransformers main codebase
For a comprehensive understanding of the research behind MobileTransformers, including detailed explanations of Multi-Adapter Rank Sharing (MARS), on-device training methodologies, and experimental results:
Master's Thesis - Parameter-Efficient Tuning of Large Language Models on Mobile Devices
Installation instructions, training and inference examples, and API documentation.
A comprehensive, privacy-first framework that lets researchers and developers export, fine-tune, merge, and deploy transformer-based language models directly on Android devices. Eliminate the dependency on cloud services while keeping full control over the AI models in your pocket. Perfect for privacy-preserving NLP applications, offline AI assistants, personalized chatbots, and edge-computing scenarios where data sovereignty and real-time responsiveness are crucial. Whether you're building the next generation of pocket AI or developing enterprise edge solutions, MobileTransformers provides the foundation for truly autonomous mobile intelligence.
This comprehensive repository provides everything needed for on-device LLM deployment.
The Android app is split into two main parts:
📲 Kotlin UI Layer
A lightweight interface acting as a communication bridge, calling APIs from the backend on the mobile device.
⚙️ Backend: MobileTransformers
The core engine of the entire framework, implemented in Kotlin and C++. It can easily be re-used in other applications; pick and choose which features you need.
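To illustrate this pick-and-choose design, here is a hypothetical Kotlin facade of the kind a host app might expose over the backend. Every name below is invented for this sketch and is not the framework's actual API:

```kotlin
// Hypothetical facade, invented for this sketch: each capability maps to an
// independent backend module, so a host app can wire up only what it needs.
interface OnDeviceLlm {
    fun load(modelPath: String)                                  // inference graph
    fun trainStep(inputIds: LongArray, labels: LongArray): Float // one PEFT step, returns loss
    fun generate(prompt: String, maxNewTokens: Int): String      // KV-cached decoding
    fun mergeAndExport(outputPath: String)                       // fold adapter into base weights
}
```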
🔧 Key features include:
| Feature | Description |
|---|---|
| ✅ Export custom PyTorch Hugging Face SLM / LLM models | Convert Hugging Face models with PEFT methods into training and ONNX inference models for on-device use |
| ✅ On-device fine-tuning/training loop | Perform parameter-efficient fine-tuning (PEFT) directly on mobile devices |
| ✅ On-device generation loop with KV caching | Efficient text generation using cached key-value tensors for faster autoregressive inference (sketched below this table) |
| ✅ Customizable training and generation | Flexible configuration to adapt training and generation to specific tasks and hardware |
| ✅ On-device weight exporting | Save trained or merged weights directly on-device (mobile filesystem) |
| ✅ On-device weight merging | Merge base and PEFT weights on-device, with optional quantization for optimized size and speed |
| ✅ Direct inference from merged weights | Load merged weights into the inference graph for seamless on-device model execution |
| ✅ Retrieval-Augmented Generation (RAG) | Fully on-device vector database integration with ObjectBox for augmented generation (retrieval sketch further below) |
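To make the KV-cached generation loop concrete, below is a minimal greedy-decoding sketch against the plain ONNX Runtime Java/Kotlin API, not MobileTransformers' actual implementation. The tensor names ("input_ids", "logits", "present.*" / "past_key_values.*") follow common Hugging Face-style ONNX exports and are assumptions here; real exports typically need further inputs such as an attention mask, and tensor cleanup is abbreviated:

```kotlin
import ai.onnxruntime.OnnxTensor
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession
import java.nio.LongBuffer

// Greedy decoding with KV caching. After the prompt pass, every step feeds
// only the newest token together with the key/value tensors cached from the
// previous step, so attention over earlier positions is not recomputed.
fun generateGreedy(
    env: OrtEnvironment,
    session: OrtSession,
    promptIds: LongArray,
    maxNewTokens: Int,
): List<Long> {
    val tokens = promptIds.toMutableList()
    // First pass: the whole prompt at once.
    var feeds: Map<String, OnnxTensor> = mapOf(
        "input_ids" to OnnxTensor.createTensor(
            env, LongBuffer.wrap(promptIds), longArrayOf(1, promptIds.size.toLong())
        )
    )
    var previous: OrtSession.Result? = null
    repeat(maxNewTokens) {
        val result = session.run(feeds)
        // Arg-max over the vocabulary at the last sequence position.
        @Suppress("UNCHECKED_CAST")
        val logits = result.get("logits").get().value as Array<Array<FloatArray>> // [1, seq, vocab]
        val lastRow = logits[0].last()
        val next = lastRow.indices.maxBy { lastRow[it] }.toLong()
        tokens.add(next)
        // Next step: feed only the new token, plus the returned caches renamed
        // from "present.*" to the matching "past_key_values.*" inputs.
        feeds = buildMap {
            put(
                "input_ids",
                OnnxTensor.createTensor(env, LongBuffer.wrap(longArrayOf(next)), longArrayOf(1, 1))
            )
            for ((name, value) in result) {
                if (name.startsWith("present")) {
                    put(name.replace("present", "past_key_values"), value as OnnxTensor)
                }
            }
        }
        previous?.close() // caches from two steps ago are no longer inputs
        previous = result
    }
    previous?.close()
    return tokens
}
```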
Example of a model being adapted to a personalized smartphone automation dataset where users express intents and the model recommends appropriate automatic actions to perform on the device. This task-oriented dataset is specifically designed for on-device intelligence scenarios.
| 🧩 Base Model | ⚙️ On-device Fine-tuned model |
|---|---|
| ![]() | ![]() |
This example shows how a base model can be fine-tuned and personalized entirely on-device, meaning no data ever leaves the device. During the process, adapters are trained locally, then merged and integrated into the base model on the mobile phone to produce the final fine-tuned version.
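For LoRA-style low-rank adapters (the PEFT family that methods such as MARS build on, which is an assumption here), the merge step amounts to folding the adapter product back into the base weights: W' = W + (alpha / r) * B * A. A minimal Kotlin sketch over row-major flat arrays, purely to illustrate the arithmetic rather than the framework's actual merge code, which operates on ONNX weights:

```kotlin
// Merge a LoRA-style adapter into a base weight matrix:
//   W' = W + (alpha / r) * B * A
// where W is [out, in], B is [out, r], A is [r, in], all row-major.
fun mergeLora(
    base: FloatArray,  // W, length out * inDim
    a: FloatArray,     // A, length r * inDim
    b: FloatArray,     // B, length out * r
    out: Int, inDim: Int, r: Int, alpha: Float,
): FloatArray {
    val scale = alpha / r
    val merged = base.copyOf()
    for (i in 0 until out) {
        for (j in 0 until inDim) {
            var acc = 0f
            for (k in 0 until r) acc += b[i * r + k] * a[k * inDim + j]
            merged[i * inDim + j] += scale * acc
        }
    }
    return merged
}
```

After merging, the adapter incurs no extra inference cost: the result is a single dense weight matrix of the original shape, which is why it can be loaded straight into the inference graph.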
MobileTransformers is designed as a flexible platform, allowing easy extension for advanced on-device ML workflows, such as the built-in retrieval-augmented generation pipeline sketched below.
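The RAG feature pairs generation with a fully on-device vector store (see the feature table above). The retrieval half is sketched below, assuming ObjectBox 4.x's HNSW vector search API; the entity, its 384-dimensional embeddings, and `Chunk_` (ObjectBox's build-time generated metadata class) are illustrative choices, not the framework's actual schema:

```kotlin
import io.objectbox.Box
import io.objectbox.annotation.Entity
import io.objectbox.annotation.HnswIndex
import io.objectbox.annotation.Id

// A document chunk with its embedding, indexed for approximate
// nearest-neighbor (HNSW) search directly on the device.
@Entity
data class Chunk(
    @Id var id: Long = 0,
    var text: String = "",
    @HnswIndex(dimensions = 384) // must match the embedding model's output size
    var embedding: FloatArray = FloatArray(0),
)

// Retrieve the k chunks closest to the query embedding; their text can then
// be prepended to the prompt before generation.
fun retrieve(box: Box<Chunk>, queryEmbedding: FloatArray, k: Int): List<String> =
    box.query(Chunk_.embedding.nearestNeighbors(queryEmbedding, k))
        .build()
        .findWithScores()
        .map { it.get().text }
```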
If you are using this framework for your own work, please cite:
@misc{mobiletransformers2025,
  author = {Koreli\v{c}, Martin and Pejovi{\'c}, Veljko},
  title = {MobileTransformers: An On-Device LLM PEFT Framework for Fine-Tuning and Inference},
  year = {2025},
  howpublished = {\url{https://gitlab.fri.uni-lj.si/lrk/mobiletransformers}}
}
If you find the research behind MobileTransformers and MARS useful, please also cite the Master's Thesis:
@mastersthesis{Korelic_2025,
  author = {Koreli\v{c}, Martin},
  title = {Parameter-Efficient Tuning of Large Language Models on Mobile Devices},
  school = {University of Ljubljana},
  year = {2025},
  url = {https://repozitorij.uni-lj.si/IzpisGradiva.php?lang=eng&id=175561}
}
This work was supported by the Slovenian Research Agency grant no. N2-0393 "approXimation for adaptable diStributed artificial intelligence" and grant no. J2-3047 "Context-Aware On-Device Approximate Computing".