Coral NPU datasheet

Overview

Coral NPU is an open-source IP designed by Google Research and is freely available for integration into ultra-low-power System-on-Chips (SoCs) targeting wearable devices such as hearables, augmented reality (AR) glasses, and smartwatches. Coral NPU is a neural processing unit (NPU), a class of processor also known as an AI accelerator or deep-learning processor.

Coral NPU is a highly efficient hardware accelerator for ML inferencing. It lowers SoC energy consumption by reducing both the control overhead of processor (ALU) operations and the distance data must travel within the memory hierarchy.

Coral NPU's instruction set is domain-optimized for ML, featuring application-specific arithmetic operations and data paths built to feed the compute engines efficiently.

This specialized design contrasts with traditional CPUs. CPUs are highly flexible, offering general-purpose instruction sets, an accessible memory model, and uniform software APIs. This allows them to use standard compilers and tools, leveraging broad ecosystem investments. However, this general-purpose programmability makes CPUs inefficient for AI, with a much lower compute-per-Joule, as they lack domain specialization.

Older ML accelerators, on the other hand, lack this robustness. While often efficient, they are difficult to program, offer limited reuse of compute resources for general tasks, and provide poor support for execution-as-function-call. Coral NPU, in contrast, is designed to be both efficient and robust: an energy-efficient, C-programmable accelerator that resolves this tradeoff.

Coral NPU is based on the 32-bit RISC-V Instruction Set Architecture (ISA). The extensible industry-standard RISC-V ISA empowers developers to create optimized, customizable computing solutions, from custom processors to hardware/software co-designs. Extending the ISA with custom compute features makes Coral NPU the ideal hardware accelerator for running ML inferencing operations such as:

  • Image classification
  • Person detection
  • Pose estimation
  • Transformer-based neural networks

Coral NPU enables a software-focused approach to AI hardware, as well as a unified programming model across AI workloads running on CPUs, GPUs, and NPUs. RISC-V allows for domain-specific customization tailored to particular applications and workloads by enabling the selection of appropriate ratified extensions to the standard, such as vector and matrix operations. For the complete list, see the official RISC-V specifications.

This Coral NPU documentation provides:

  • Hardware reference and IP integration guidelines for commercial silicon designers and academia
  • Guidance for using the software toolchain to compile and run ML models on the hardware

The Coral NPU open-source code base is available on GitHub.

Coral NPU components

Coral NPU includes three distinct processor components that work together: matrix, vector (SIMD), and scalar. The terms SIMD (single-instruction, multiple data) and vector are used synonymously here.

Coral NPU components

Subcomponent     Description
Fetch / decode   Retrieves and interprets program instructions
Scoreboard       Manages instruction dependencies to prevent conflicts
ALU              Arithmetic logic unit
BRU              Branch unit: controls the program's flow (loops, jumps)
LSU              Load/store unit: moves data between registers and memory
CSR              Control/status registers
ITCM             Instruction tightly-coupled memory
DTCM             Data tightly-coupled memory
SIMD             Single-instruction, multiple data
MAC              Multiply-accumulate execution engine

Coral NPU features

Coral NPU offers the following top-level feature set:

  • RV32IMF_Zve32x RISC-V instruction set
  • 32-bit address space for applications and operating system kernels
  • Four-stage pipeline with in-order dispatch and out-of-order retire
  • Four-way scalar, two-way vector dispatch
  • 128-bit SIMD pipeline (256-bit planned for a future revision)
  • 8 KB ITCM memory (tightly-coupled memory for instructions)
  • 32 KB DTCM memory (tightly-coupled memory for data)
  • Both memories are single-cycle-latency SRAM, more efficient than cache memory
  • AXI4 bus interfaces, functioning as both master and slave, to interact with external memory and allow external CPUs to configure the Coral NPU (a host-side configuration sketch follows this list)
  • GNU Debugger (GDB) support
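
To illustrate the AXI4 slave path noted above, the following is a minimal host-side sketch of how an external CPU might load a program into the Coral NPU's ITCM and start execution. The base address, register offsets, and control-bit layout are hypothetical placeholders, not the actual Coral NPU register map; the real memory map is defined by the SoC integration and the IP integration documentation.

    /* Hypothetical host-side configuration sketch: CORAL_AXI_BASE,
     * CORAL_ITCM_OFFS, and CORAL_CTRL_OFFS are placeholder values,
     * not the real Coral NPU memory map. */
    #include <stdint.h>
    #include <stddef.h>

    #define CORAL_AXI_BASE   0x40000000u   /* assumed AXI4 slave window   */
    #define CORAL_ITCM_OFFS  0x00000000u   /* assumed offset of 8 KB ITCM */
    #define CORAL_CTRL_OFFS  0x00010000u   /* assumed control register    */

    static void coral_load_and_start(const uint8_t *image, size_t len)
    {
        volatile uint8_t  *itcm = (volatile uint8_t  *)(CORAL_AXI_BASE + CORAL_ITCM_OFFS);
        volatile uint32_t *ctrl = (volatile uint32_t *)(CORAL_AXI_BASE + CORAL_CTRL_OFFS);

        /* Copy the NPU program image into ITCM through the AXI4 slave port. */
        for (size_t i = 0; i < len; i++) {
            itcm[i] = image[i];
        }

        /* Release the core to start executing from ITCM (assumed bit 0). */
        *ctrl = 1u;
    }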

Coral NPU's RV32IMF_Zve32x specification includes the following extensions to the RISC-V 32-bit (RV32) base instruction set. For the complete list of ratified and in-progress extensions, see the RISC-V specifications; a partial list is given here for quick reference.

  • I – Base integer instruction set
  • M – Standard Extension for Integer Multiplication and Division
  • F – Standard Extension for Single-Precision Floating-Point
  • V – Standard Extension for Vector Operations; see the RISC-V specification PDF
  • Zve32x – Subset of V (vector operations) for embedded processors; refer to "Zve* Vector Extensions for Embedded Processors" in the RISC-V proposed specification
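
As a rough sketch of how these extensions surface to software: RISC-V GCC and Clang predefine architecture test macros such as __riscv, __riscv_xlen, __riscv_flen, and __riscv_vector when the corresponding extensions are enabled in the -march string (for example, something like -march=rv32imf_zve32x; the exact flags used by the Coral NPU toolchain are an assumption here).

    /* Compile-time feature checks; assumes a RISC-V GCC/Clang target such as
     * -march=rv32imf_zve32x (the exact flag is an assumption, not taken from
     * the Coral NPU toolchain documentation). */
    #include <stdio.h>

    int main(void)
    {
    #if defined(__riscv) && (__riscv_xlen == 32)
        printf("RV32 base ISA (32-bit address space)\n");
    #endif
    #if defined(__riscv_flen)
        printf("F extension: flen = %d bits\n", __riscv_flen);
    #endif
    #if defined(__riscv_vector)
        printf("Vector extension (Zve32x subset of V) enabled\n");
    #endif
        return 0;
    }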

Scalar core features

  • RV32IM frontend that drives the command queues of the ML and SIMD backends
  • Thirty-one 32-bit scalar registers (x1 – x31), with x0 hardwired to zero
  • Various control and status registers (CSRs) per RV32_Zicsr
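
As a small, generic illustration of the Zicsr programming model (standard RISC-V usage, not a Coral-specific API): CSRs are accessed with the csrr/csrw instructions, typically wrapped in inline assembly. The mhartid and mcycle registers below are standard machine-mode CSRs chosen purely as examples; which CSRs Coral NPU actually implements is defined by its own CSR list.

    /* Generic RV32 Zicsr accessors; mhartid and mcycle are standard
     * machine-mode CSRs used here only for illustration. */
    #include <stdint.h>

    static inline uint32_t read_mhartid(void)
    {
        uint32_t v;
        __asm__ volatile("csrr %0, mhartid" : "=r"(v));
        return v;
    }

    static inline uint32_t read_mcycle(void)
    {
        uint32_t v;
        __asm__ volatile("csrr %0, mcycle" : "=r"(v));
        return v;
    }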

Vector execution engine features

  • Supports data widths of 8, 16, and 32 bits
  • Thirty-two 256-bit vector registers (v0 – v31)
    • For example, a single 256-bit register can hold eight 32-bit (int32) elements
  • 8 x 8 x 32-bit accumulator (acc<8><8>)
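
As a sketch of how the vector engine can be targeted from C, the loop below performs an element-wise int32 addition using the standard RISC-V vector (RVV) intrinsics. It assumes a toolchain that provides <riscv_vector.h> with the current __riscv_-prefixed intrinsic names (older toolchains use unprefixed names) and a -march string that enables Zve32x; it is generic RVV code rather than a Coral-specific API.

    /* Element-wise int32 add with standard RVV intrinsics (generic RVV code,
     * not a Coral-specific API; assumes <riscv_vector.h> support). */
    #include <riscv_vector.h>
    #include <stddef.h>
    #include <stdint.h>

    void add_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
    {
        while (n > 0) {
            size_t vl = __riscv_vsetvl_e32m1(n);                /* elements processed this pass */
            vint32m1_t va = __riscv_vle32_v_i32m1(a, vl);       /* unit-stride load from a      */
            vint32m1_t vb = __riscv_vle32_v_i32m1(b, vl);       /* unit-stride load from b      */
            vint32m1_t vc = __riscv_vadd_vv_i32m1(va, vb, vl);  /* vector integer add           */
            __riscv_vse32_v_i32m1(out, vc, vl);                 /* unit-stride store            */
            a += vl; b += vl; out += vl; n -= vl;
        }
    }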

Matrix execution engine features

  • Quantized outer-product multiply-accumulate (MAC) engine
  • 4 x 8-bit multiplies reduced into 32-bit accumulators
  • 256 MACs per cycle
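
The arithmetic implied by these figures can be sketched as a behavioral C reference: an 8 x 8 tile of 32-bit accumulators in which each cell accumulates four int8 x int8 products per step, giving 8 x 8 x 4 = 256 MACs per cycle. This models only the math, under that interpretation of the outer-product shape; it is not the hardware programming interface.

    /* Behavioral reference of one outer-product MAC step (illustrative only):
     * each acc[i][j] accumulates four int8 x int8 products, so one step
     * performs 8 * 8 * 4 = 256 multiply-accumulates. */
    #include <stdint.h>

    void mac_step(int32_t acc[8][8], const int8_t a[8][4], const int8_t b[8][4])
    {
        for (int i = 0; i < 8; i++) {
            for (int j = 0; j < 8; j++) {
                for (int k = 0; k < 4; k++) {
                    acc[i][j] += (int32_t)a[i][k] * (int32_t)b[j][k];
                }
            }
        }
    }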

The custom vector and matrix compute features of Coral NPU will be migrated to future RISC-V standards as they become available.

A more detailed view of the Coral NPU architecture is shown below.

Coral NPU architecture