Example: Running MobileNet

The scripted example given here lets you compile and run a LiteRT (TensorFlow Lite) ML model on Coral NPU's scalar core. The model executed is MobileNet V1, a convolutional neural network (CNN) architecture designed for image classification, object detection, and other computer vision tasks.

The process uses reference kernels from LiteRT Micro, and intermediate layers of MobileNet are executed.

The model is run in the CoreMini AXI high-memory simulator, with real inputs and real outputs. The simulator output reports simulation metrics, notably the execution cycle count, for performance assessment.
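The simulator reports the total cycle count on its own. If you also want to measure specific regions from inside the program, a common approach on RISC-V cores is to read the standard mcycle CSR. The sketch below assumes the scalar core exposes this counter, which this page does not confirm; read_mcycle and the measured region are illustrative.

#include <cstdint>

// Reads the low 32 bits of the RISC-V machine cycle counter (mcycle CSR).
// Assumes the core implements the standard counters; not confirmed here.
static inline uint32_t read_mcycle() {
  uint32_t value;
  asm volatile("csrr %0, mcycle" : "=r"(value));
  return value;
}

void measure_region() {
  uint32_t start = read_mcycle();
  // ... hypothetical work to measure, e.g. one model layer ...
  uint32_t cycles = read_mcycle() - start;
  (void)cycles;  // e.g. store it where the host or simulator can read it
}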

Prerequisites

This example assumes you are working in the Google Coral NPU repository on GitHub.

Run MobileNet on the scalar core

To run the script, enter the following command:

bazel run -c opt tests/cocotb/tutorial/tfmicro:cocotb_run_mobilenet_v1

The script performs the following sub-tasks, as illustrated in the figure below:

  1. Compiles run_mobilenet.cc using coralnpu_v2_binary.
    1. coralnpu_v2_binary is a bazel rule that composes flags and settings to compile for the coralnpu_v2 platform.
    2. run_mobilenet.cc is a TFMicro inference program that uses reference kernels (a sketch of such a program appears after this list).
    3. The script then finds the example rule run_mobilenet_v1_025_partial_binary.
  2. Uses high-memory TCM (tightly-coupled memory) of Coral NPU to:
    1. Run high-memory programs such as ML model inferencing. See the high-memory TCM documentation for more information.
    2. Add a new data section .extdata.
    3. Memory buffers such as the tensor arena, inputs, and outputs can be stored in this data section, for example: uint8_t tensor_arena[kTensorArenaSize] __attribute__((section(".extdata"), aligned(16), used, retain)) (see the sketch after this list).
  3. Runs the cocotb test suite.
    1. When the TFLite inference program is ready, the cocotb_test_suite bazel rule is used to simulate the program with rvv_core_mini_highmem_axi_model.
    2. Run the script cocotb_run_mobilenet_v1.py:
      • Loads run_mobilenet_v1_025_partial_binary.elf.
      • Starts the program and executes it until it halts.
      • Reads the memory buffer inference_status in program memory.
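
The listing below is a minimal sketch of what a TFMicro inference program like run_mobilenet.cc can look like, combining the .extdata arena placement from step 2 with an inference_status flag that a host script can read back, as in step 3. It is not the actual contents of run_mobilenet.cc: the model symbol g_model_data, the arena size, the op list, and the status encoding are all illustrative assumptions.

#include <cstddef>
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Hypothetical flatbuffer symbol holding the MobileNet model data.
extern const unsigned char g_model_data[];

// Illustrative size; the real program sizes the arena to the model.
constexpr size_t kTensorArenaSize = 1024 * 1024;

// Large buffers go in the high-memory .extdata section (see step 2).
uint8_t tensor_arena[kTensorArenaSize]
    __attribute__((section(".extdata"), aligned(16), used, retain));

// Flag the cocotb script reads from program memory after the core halts
// (see step 3). The encoding 0/1/2 = pending/ok/error is an assumption.
volatile uint32_t inference_status __attribute__((used)) = 0;

int main() {
  const tflite::Model* model = tflite::GetModel(g_model_data);

  // Register only the ops the model needs; TFMicro supplies the
  // reference kernels that execute them.
  tflite::MicroMutableOpResolver<5> resolver;
  resolver.AddConv2D();
  resolver.AddDepthwiseConv2D();
  resolver.AddAveragePool2D();
  resolver.AddReshape();
  resolver.AddSoftmax();

  tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                       kTensorArenaSize);
  if (interpreter.AllocateTensors() != kTfLiteOk) {
    inference_status = 2;
    return 1;
  }

  // Input and output tensors are allocated inside the arena, so they
  // also live in .extdata.
  TfLiteTensor* input = interpreter.input(0);
  (void)input;  // ... copy the real input image into the tensor here ...

  if (interpreter.Invoke() != kTfLiteOk) {
    inference_status = 2;
    return 1;
  }

  inference_status = 1;  // Success; the core halts after main returns.
  return 0;
}

The used and retain attributes keep the arena and the status flag from being discarded by the compiler or by linker garbage collection, so a host script can locate them by symbol in program memory.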

Figure: Example flow of the script's sub-tasks.

cocotb is a coroutine-based, co-simulation test bench environment for verifying VHDL and SystemVerilog RTL using Python. cocotb is free, open source, and hosted on GitHub.