Below is the layout of the examples/mediatek directory, which includes the necessary files for the example applications:
examples/mediatek
├── aot_utils # Utils for AoT export
├── llm_utils # Utils for LLM models
├── preformatter_templates # Model specific prompt preformatter templates
├── prompts # Calibration Prompts
├── tokenizers_ # Model tokenizer scripts
├── oss_utils # Utils for oss models
├── eval_utils # Utils for eval oss models
├── model_export_scripts # Model specific export scripts
├── models # Model definitions
├── llm_models # LLM model definitions
├── weights # LLM model weights location (Offline) [Ensure that config.json, relevant tokenizer files and .bin or .safetensors weights file(s) are placed here]
├── executor_runner # Example C++ wrapper for the ExecuTorch runtime
├── pte # Generated .pte files location
├── shell_scripts # Shell scripts to quickrun model specific exports
├── CMakeLists.txt # CMake build configuration file for compiling examples
├── requirements.txt # MTK and other required packages
├── mtk_build_examples.sh # Script for building MediaTek backend and the examples
└── README.md # Documentation for the examples (this file)
- Follow the instructions in `backends/mediatek/README.md` to build the backend library `libneuron_backend.so`.
- Build the MediaTek model runner by executing the script:

  ```bash
  ./mtk_build_examples.sh
  ```

  This will generate the required runners in `executorch/cmake-android-out/examples/mediatek/`.
- Download Required Files
  - Download the model files from the official Hugging Face website, and move the files to the respective folder in `examples/mediatek/models/llm_models/weights/`, EXCEPT the `config.json` file.
    - The `config.json` file is already included in the model folders, and may contain some modifications required for the model exportation.
  - Include the calibration data (if any) under `aot_utils/llm_utils/prompts/`.
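The exact prompt-file format is not documented here; as a hypothetical sketch (the file name `my_calib.txt` and the one-prompt-per-line layout are assumptions), staging a plain-text calibration file could look like:

```shell
# Hypothetical sketch: stage a plain-text calibration file for the export step.
# One prompt per line; .txt matches the note in the export argument options.
PROMPTS_DIR="aot_utils/llm_utils/prompts"   # location taken from the text above
mkdir -p "${PROMPTS_DIR}"
cat > "${PROMPTS_DIR}/my_calib.txt" <<'EOF'
What is the capital of France?
Summarize the plot of Hamlet in one sentence.
EOF
wc -l < "${PROMPTS_DIR}/my_calib.txt"
```

The file name (without the directory) is what gets passed later as `<calibration_data_file>`.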
- Exporting Models to `.pte`
  - In the `examples/mediatek/` directory, run:

    ```bash
    source shell_scripts/export_<model_family>.sh <model_name> <num_chunks> <prompt_num_tokens> <cache_size> <calibration_data_file> <precision> <platform>
    ```
  - Defaults:
    - `model_name` = Depends on the model family. Check the respective `shell_scripts/export_<model_family>.sh` for info.
    - `num_chunks` = 4
    - `prompt_num_tokens` = 128
    - `cache_size` = 512
    - `calibration_data_file` = None
    - `precision` = A16W4
    - `platform` = DX4
  - Argument Explanations/Options:
    - `model_name`: View the list of available model names below.
    - `num_chunks`: Number of chunks to split the model into. Each chunk contains the same number of decoder layers. Typical values are 1, 2, and 4.
    - `prompt_num_tokens`: Number of tokens (> 1) consumed each forward pass for the prompt processing stage.
    - `cache_size`: Cache size.
    - `calibration_data_file`: Name of the calibration dataset (with extension) found inside the `aot_utils/llm_utils/prompts/` directory, e.g. `alpaca.txt`. If `"None"`, dummy data will be used to calibrate. Note: the export script example has only been tested with `.txt` files.
    - `precision`: Quantization precision for the model. Available options are `["A16W4", "A16W8", "A16W16", "A8W4", "A8W8"]`.
    - `platform`: The platform of the device. `DX4` for MediaTek Dimensity 9400 and `DX3` for MediaTek Dimensity 9300.
  - Available model names:
    - Llama: `llama3.2-3b`, `llama3.2-1b`, `llama3`, `llama2`
    - Qwen: `Qwen3-4B`, `Qwen3-1.7B`, `Qwen2-7B-Instruct`, `Qwen2.5-3B`, `Qwen2.5-0.5B-Instruct`, `Qwen2-1.5B-Instruct`
    - Gemma: `gemma2`, `gemma3`
    - Phi: `phi3.5`, `phi4`
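Putting the arguments together, a concrete invocation might look like the following. The model family `llama` and all argument values are illustrative assumptions (check the matching `shell_scripts/export_llama.sh` before running); this sketch only assembles and prints the command rather than executing it:

```shell
# Illustrative only: assemble the export command from explicit argument values.
# Model family/name and argument values are assumptions, not a tested recipe.
MODEL_FAMILY=llama
MODEL_NAME=llama3.2-1b
NUM_CHUNKS=4
PROMPT_NUM_TOKENS=128
CACHE_SIZE=512
CALIBRATION_DATA_FILE=alpaca.txt
PRECISION=A16W4
PLATFORM=DX4
CMD="source shell_scripts/export_${MODEL_FAMILY}.sh ${MODEL_NAME} ${NUM_CHUNKS} ${PROMPT_NUM_TOKENS} ${CACHE_SIZE} ${CALIBRATION_DATA_FILE} ${PRECISION} ${PLATFORM}"
echo "${CMD}"   # run the printed command from the examples/mediatek/ directory
```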
  - `.pte` files will be generated in `examples/mediatek/pte/`.
    - Users should expect `num_chunks` number of `.pte` files.
  - An embedding bin file will be generated in the weights folder where the `config.json` can be found: `examples/mediatek/models/llm_models/weights/<model_name>/embedding_<model_config_folder>_fp32.bin`
    - e.g. For `llama3-8B-instruct`, the embedding bin is generated in `examples/mediatek/models/llm_models/weights/llama3-8B-instruct/`.
  - The AoT flow will take around 30 minutes to 2.5 hours to complete (results will vary depending on device/hardware configuration and model size).
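Since the export produces one `.pte` file per chunk, a quick sanity check after the AoT flow is to count the files in the output folder. The snippet below simulates that check in a temporary directory (the chunk file names are made-up placeholders; only the count matters):

```shell
# Sketch of a post-export sanity check: the pte output folder should hold
# exactly num_chunks .pte files. Simulated here with a temp dir + dummy files.
NUM_CHUNKS=4
PTE_DIR="$(mktemp -d)"
for i in $(seq 1 "${NUM_CHUNKS}"); do
  touch "${PTE_DIR}/model_chunk_${i}.pte"   # placeholder names, not the real ones
done
FOUND="$(ls "${PTE_DIR}"/*.pte | wc -l)"
echo "found ${FOUND} .pte files, expected ${NUM_CHUNKS}"
```

In a real run, point the check at `examples/mediatek/pte/` instead of a temp dir.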
- Exporting Model to `.pte` (OSS models)
  - Run:

    ```bash
    bash shell_scripts/export_oss.sh <model_name>
    ```

  - Argument Options:
    - `model_name`: deeplabv3 / edsr / inceptionv3 / inceptionv4 / mobilenetv2 / mobilenetv3 / resnet18 / resnet50 / dcgan / wav2letter / vit_b_16 / mobilebert / emformer_rnnt / bert / distilbert
Transfer the directory containing the `.pte` model files, the `run_<model_name>_sample.sh` script, the `embedding_<model_config_folder>_fp32.bin` file, the tokenizer file, the `mtk_llama_executor_runner` binary, and the three `.so` files to your Android device using the following commands:
```bash
adb push mtk_llama_executor_runner <PHONE_PATH, e.g. /data/local/tmp>
adb push examples/mediatek/executor_runner/run_<model_name>_sample.sh <PHONE_PATH, e.g. /data/local/tmp>
adb push embedding_<model_config_folder>_fp32.bin <PHONE_PATH, e.g. /data/local/tmp>
adb push tokenizer.model <PHONE_PATH, e.g. /data/local/tmp>
adb push <PTE_DIR> <PHONE_PATH, e.g. /data/local/tmp>
```

Make sure to replace `<PTE_DIR>` with the actual name of your directory containing the `.pte` files, and replace `<PHONE_PATH>` with the desired destination on the device.
At this point your phone directory should have the following files:
- libneuron_backend.so
- libneuronusdk_adapter.mtk.so
- libneuron_buffer_allocator.so
- mtk_llama_executor_runner
- <PTE_DIR>
- tokenizer.json / tokenizer.model (for llama3) / tokenizer.bin (for phi3 and gemma2)
- embedding_<model_config_folder>_fp32.bin
- run_<model_name>_sample.sh
```bash
adb push mtk_oss_executor_runner <PHONE_PATH, e.g. /data/local/tmp>
adb push input_list.txt <PHONE_PATH, e.g. /data/local/tmp>
for i in input*bin; do adb push "$i" <PHONE_PATH, e.g. /data/local/tmp>; done;
```

Execute the model on your Android device by running:
```bash
adb shell
cd <PHONE_PATH>
sh run_<model_name>_sample.sh
```

Note: The `mtk_llama_executor_runner` is applicable to the models listed in `examples/mediatek/models/llm_models/weights/`.
Note: For non-LLM models, please run `adb shell "/data/local/tmp/mtk_executor_runner --model_path /data/local/tmp/<MODEL_NAME>.pte --iteration <ITER_TIMES>"`.
```bash
adb shell "/data/local/tmp/mtk_oss_executor_runner --model_path /data/local/tmp/<MODEL_NAME>.pte --input_list /data/local/tmp/input_list.txt --output_folder /data/local/tmp/output_<MODEL_NAME>"
adb pull "/data/local/tmp/output_<MODEL_NAME>" ./
```

Evaluate the pulled output with:

```bash
python3 eval_utils/eval_oss_result.py --eval_type <eval_type> --target_f <golden_folder> --output_f <prediction_folder>
```

For example:

```bash
python3 eval_utils/eval_oss_result.py --eval_type piq --target_f edsr --output_f output_edsr
```
- Argument Options:
  - `eval_type`: topk / piq / segmentation
  - `target_f`: folder containing the golden data files; file names are `golden_<data_idx>_0.bin`
  - `output_f`: folder containing the model output data files; file names are `output_<data_idx>_0.bin`
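The evaluator matches golden and output files by their index, following the naming above. A minimal sketch of that pairing logic, using temporary folders with empty placeholder files (the matching-by-index behavior is an assumption based on the file-name patterns):

```shell
# Sketch: pair golden_<idx>_0.bin with output_<idx>_0.bin by index.
# Folders and files are simulated placeholders, not real model outputs.
TARGET_F="$(mktemp -d)"
OUTPUT_F="$(mktemp -d)"
touch "${TARGET_F}/golden_0_0.bin" "${TARGET_F}/golden_1_0.bin"
touch "${OUTPUT_F}/output_0_0.bin" "${OUTPUT_F}/output_1_0.bin"
PAIRED=0
for g in "${TARGET_F}"/golden_*_0.bin; do
  idx="$(basename "${g}" | sed 's/^golden_\([0-9]*\)_0\.bin$/\1/')"  # extract <data_idx>
  if [ -f "${OUTPUT_F}/output_${idx}_0.bin" ]; then
    PAIRED=$((PAIRED + 1))
  fi
done
echo "paired ${PAIRED} golden/output file(s)"
```

If the two folders hold different index sets, the unmatched indices simply produce no pair, which is the first thing to check when evaluation results look incomplete.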