Compilers

MLIR and IREE compilers

Coral NPU will help foster an open-source software framework, including open-source design tools and IP libraries, to accelerate the development of scalable and power-efficient ML systems at the edge.

Machine learning (ML) code is usually compiled with an ML-domain-specific compiler built on MLIR (Multi-Level Intermediate Representation). MLIR is an open-source compilation infrastructure geared toward heterogeneous computing platforms. The name multi-level intermediate representation reflects the system’s ability to model computations at various abstraction levels and progressively lower them toward machine code. MLIR is part of the LLVM ecosystem.

IREE (Intermediate Representation Execution Environment) is an MLIR-based end-to-end compiler and runtime that lowers ML models to a unified IR. An IREE compiler performs a sequence of passes that lower a high-level representation to an IR expressed using dialects that can be customized to represent target-specific operations.

IREE compilers incorporate important components such as:

  • Linalg, TOSA, and LLVM-IR dialects
  • Generic optimization algorithms such as tiling
  • The LLVM compiler itself, which compiles the parts of a graph that target the CSS or the host CPU
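
As a small illustration of these components working together, the sketch below compiles a hand-written module in the TOSA dialect down to an IREE VM flatbuffer using the compiler's Python bindings. The iree.compiler.tools.compile_str entry point, the "tosa" input type, and the generic "llvm-cpu" backend name are assumptions based on current IREE releases rather than anything specific to Coral NPU.

```python
# Sketch: lower a tiny TOSA-dialect module with the IREE compiler's Python API.
# Assumes the iree-compiler pip package; names below follow current IREE
# releases and are not Coral NPU-specific.
from iree.compiler import tools as iree_tools

# A minimal module in the TOSA dialect: element-wise addition of two tensors.
TOSA_MLIR = """
func.func @simple_add(%a: tensor<1x4xf32>, %b: tensor<1x4xf32>) -> tensor<1x4xf32> {
  %0 = tosa.add %a, %b : (tensor<1x4xf32>, tensor<1x4xf32>) -> tensor<1x4xf32>
  return %0 : tensor<1x4xf32>
}
"""

# input_type selects the frontend pipeline (here TOSA); target_backends selects
# code generation (here the generic LLVM CPU backend as a stand-in target).
vmfb_bytes = iree_tools.compile_str(
    TOSA_MLIR,
    input_type="tosa",
    target_backends=["llvm-cpu"],
)

with open("simple_add.vmfb", "wb") as f:
    f.write(vmfb_bytes)
```

The resulting .vmfb file is a deployable module that the IREE runtime, or the iree-run-module tool shown later, can load and execute.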

Compilation workflow

IREE compilers require plugins for specific target hardware, for example a RISC-V processor such as Coral NPU. ML software developers can initially treat the compiler as a black box, but may eventually want to write customizations to optimize their ML kernels. See the IREE GitHub repo and the IREE developer site for more information about IREE.

The overall workflow to use an IREE compiler is as follows:

  • Convert your TensorFlow Lite, JAX, or PyTorch model to MLIR. For example, an IREE compiler may provide a command-line tool that converts a TFLite model to a binary MLIR file expressed in the TOSA (Tensor Operator Set Architecture) dialect.
  • For PyTorch models, convert to MLIR using the torch-mlir toolchain (see the torch-mlir sketch after this list). This involves exporting the PyTorch model and converting it to MLIR in a dialect such as TORCH, TOSA, LINALG_ON_TENSORS, or STABLEHLO.
  • Input the resulting file to the IREE compiler.
  • Specify the input type so the compiler knows what kind of intermediate representation (IR) it’s processing, such as the TOSA or Torch dialect, and can select the appropriate frontend pipeline.
  • To execute your model, specify the input structure for the IREE run module tool (iree-run-module). You must match the expected shape, data type, and order of the model’s inputs (model signature). Inputs can be provided directly as literals or loaded from files like .npz or .bin.
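
For the PyTorch path mentioned in the second step, a hedged sketch follows. The torch_mlir.compile entry point and the "tosa" output type shown here come from earlier torch-mlir releases; newer releases have moved to an FX-export-based API, so check the toolchain's documentation for the current entry point.

```python
# Sketch: export a PyTorch model and lower it to the TOSA dialect with
# torch-mlir. The torch_mlir.compile call reflects older torch-mlir releases
# and is illustrative only; newer releases use an FX-export-based API.
import torch
import torch_mlir

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

model = TinyModel().eval()
example_input = torch.ones(1, 4)

# output_type picks the target dialect; other values select the TORCH,
# LINALG_ON_TENSORS, or STABLEHLO representations mentioned above.
tosa_module = torch_mlir.compile(model, example_input, output_type="tosa")

# Serialize the MLIR so it can be fed to the IREE compiler.
with open("tiny_model.tosa.mlir", "w") as f:
    f.write(str(tosa_module))
```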
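
For the final execution step, the sketch below drives the iree-run-module tool from Python. The module, function name, and input shapes refer to the simple_add example compiled earlier and are assumptions about that specific module; the flag spellings follow current IREE releases.

```python
# Sketch: run a compiled IREE module with the iree-run-module tool.
# The function name and input signature below match the simple_add example
# compiled earlier; substitute your own model's signature.
import subprocess

subprocess.run(
    [
        "iree-run-module",
        "--module=simple_add.vmfb",   # compiled IREE VM flatbuffer
        "--device=local-task",        # generic CPU device for local testing
        "--function=simple_add",      # entry point exported by the module
        "--input=1x4xf32=1 2 3 4",    # literal inputs matching the signature;
        "--input=1x4xf32=5 6 7 8",    # file-based inputs (e.g. .npy) also work
    ],
    check=True,
)
```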

Starting from Python code (TensorFlow Lite, for example), the model is first converted to a TFLite flatbuffer, which is then transformed into a TOSA file (the MLIR dialect); finally, the result is compiled into a binary for the target device. More details on the IREE TFLite tools can be found on the IREE site.
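
A sketch of this pipeline, assuming the iree-import-tflite and iree-compile command-line tools are installed and on the PATH, is shown below; the flag spellings follow current IREE releases and may differ between versions.

```python
# Sketch: TFLite flatbuffer -> TOSA MLIR -> compiled IREE module.
# Assumes iree-import-tflite and iree-compile are installed and on PATH.
import subprocess

# 1. Import the TFLite flatbuffer into MLIR expressed in the TOSA dialect.
subprocess.run(
    ["iree-import-tflite", "model.tflite", "-o", "model.tosa.mlir"],
    check=True,
)

# 2. Compile the TOSA-level MLIR into a deployable binary for the chosen
#    backend; a Coral NPU target would come from a vendor-provided plugin.
subprocess.run(
    [
        "iree-compile",
        "model.tosa.mlir",
        "--iree-input-type=tosa",
        "--iree-hal-target-backends=llvm-cpu",
        "-o", "model.vmfb",
    ],
    check=True,
)
```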

The guiding principle of MLIR is to have explicit lowering stages in the compilation, where each stage (dialect) represents a more concrete program representation. This allows for appropriate optimizations at each level.
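
One way to see these stages in practice is to stop compilation at a named phase and inspect the intermediate IR. The sketch below assumes iree-compile's --compile-to option; the exact phase names can vary between IREE releases.

```python
# Sketch: dump the IR after the frontend "input" phase to inspect one of the
# explicit lowering stages. The --compile-to phase names are assumptions based
# on current IREE releases.
import subprocess

subprocess.run(
    [
        "iree-compile",
        "model.tosa.mlir",
        "--iree-hal-target-backends=llvm-cpu",
        "--compile-to=input",          # stop after the frontend pipeline
        "-o", "model.input-phase.mlir",
    ],
    check=True,
)
```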

For example, the Synaptics Torq NPU compiler (described below) adds two MLIR dialects: a low-level assembly language dialect and a higher-level abstract dialect for the matrix operations common in ML models.

Synaptics Torq NPU compiler

The Synaptics SL2610 is the first commercially available SoC that includes the Coral NPU core. Synaptics’ NPU IREE plugin enables the compilation of ML models for this SoC and provides simulation capabilities for the corresponding hardware.

In particular, the Torq NPU compiler is customized to take advantage of the SL2610’s matrix processing unit, which uses a custom instruction set. Note that this matrix engine is Synaptics-proprietary and is different from the Coral NPU matrix unit. For complete information, see the Synaptics developer site.