Introduction

What is Coral NPU?

Coral NPU is a machine learning (ML) accelerator core designed for energy-efficient AI at the edge. Based on the open hardware RISC-V ISA, it is available as validated open-source IP for commercial silicon integration.

Coral NPU’s open-source strategy aims to establish a standard architecture that accelerates the edge AI ecosystem, building on Google Research's earlier effort, Coral.ai. First released in 2023 as a component of the "Open Se Cura" research project, it is now a dedicated initiative to drive this vision forward.

Coral NPU inspiration

The Problem Coral NPU Solves

Coral NPU directly addresses the significant fragmentation in the edge AI device ecosystem. Developers currently face a steep learning curve and major programming complexity because programming models differ between separate general-purpose (CPU) and ML compute blocks. These ML blocks often rely on command buffers generated by specialized, proprietary compilers. This fragmented approach makes it difficult to combine the strengths of each compute unit and forces developers to manage multiple proprietary and opaque toolchains for bespoke architectures.

Coral NPU is built on the RISC-V ISA standard, extending the C programming environment with native tensor processing capabilities. It supports multiple machine learning frameworks, including JAX, PyTorch, and TensorFlow Lite (TFLite), through open-standards tooling such as Multi-Level Intermediate Representation (MLIR) from the LLVM compiler-infrastructure project.

This integration of native ML acceleration primitives with a general-purpose computing ISA delivers high ML performance without the usual system complexity, cost, and data movement associated with separate, proprietary CPU/NPU designs.

Key Differentiators & How it Works

Coral NPU's design is driven by several key principles:

  • ML-First Architecture: Coral NPU reverses the traditional processor design. Instead of starting with basic scalar computing, then adding vector (SIMD) and finally matrix capabilities, Coral NPU is built with matrix (ML) capabilities first, then integrates vector and scalar functions. This tight integration of scalar, vector, and matrix units in a single ISA optimizes the entire architecture for AI workloads from its foundation (see the Architecture overview for more details).

  • Dedicated ML Engine: At the center of the design is a quantized outer-product multiply-accumulate (MAC) engine, purpose-built for the fundamental calculations of neural networks. This specialized core multiplies 8-bit operands and accumulates them into 32-bit results with high efficiency (see the first sketch after this list).

  • Integrated Vector (SIMD) Core: The vector co-processor implements the RISC-V Vector Instruction Set (RVV) v1.0, using a 64 × 256-bit vector register file and a "strip-mining" mechanism where a single instruction triggers multiple operations across a strip of data, significantly boosting efficiency (see the second sketch after this list).

  • Simple, C-Programmable Scalar Core: A lightweight RISC-V (RV32IM) frontend acts as a simple controller, managing and feeding the powerful Matrix and Vector backend. This core is designed for a "run-to-completion" model, meaning it doesn't require a complex operating system or frequent interrupts, contributing to its ultra-low power consumption.

  • Efficient Memory Management: Coral NPU uses a single layer of small, fast cache (8KB for instructions, 16KB for data) to keep data close to the processing units, minimizing power and latency.

  • Unified Developer Experience: The platform is C-programmable and designed for easy integration with modern ML compilers and runtimes like TensorFlow Lite Micro (TFLM) and IREE. This allows a unified, MLIR-based toolchain to support models from major frameworks like TensorFlow, JAX, and PyTorch.
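
To make the int8-to-int32 dataflow concrete, here is a minimal C reference sketch of one outer-product MAC step. It illustrates the general technique, not the hardware's actual interface; the TILE size and function names are assumptions made for this example.

```c
#include <stdint.h>
#include <stdio.h>

#define TILE 4  /* illustrative tile size, not the actual hardware dimension */

/* One outer-product MAC step: acc[i][j] += a[i] * b[j].
 * 8-bit operands are widened and accumulated into 32-bit results,
 * mirroring the int8 -> int32 datapath described above. */
static void outer_product_mac(int32_t acc[TILE][TILE],
                              const int8_t a[TILE],
                              const int8_t b[TILE])
{
    for (int i = 0; i < TILE; i++)
        for (int j = 0; j < TILE; j++)
            acc[i][j] += (int32_t)a[i] * (int32_t)b[j];
}

int main(void)
{
    int32_t acc[TILE][TILE] = {0};
    const int8_t a[TILE] = {1, -2, 3, 4};
    const int8_t b[TILE] = {5, 6, -7, 8};

    /* A matrix multiply with shared dimension K is K successive
     * outer-product steps; two identical steps shown here. */
    outer_product_mac(acc, a, b);
    outer_product_mac(acc, a, b);

    printf("acc[0][0] = %d\n", (int)acc[0][0]);  /* prints 10 */
    return 0;
}
```

A full matrix multiply then reduces to one such step per element of the shared dimension, which is why the outer-product formulation maps naturally onto a MAC array.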

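The vector core's strip-mining pattern can likewise be sketched in portable C. The chunk-length computation below stands in for RVV's vsetvl instruction, which lets a single vector instruction cover a whole strip of an array regardless of its total length; VLEN_ELEMS is an illustrative strip width, not the core's actual configuration.

```c
#include <stddef.h>
#include <stdint.h>

#define VLEN_ELEMS 32  /* illustrative strip width, standing in for the hardware vector length */

/* Strip-mined elementwise add: each pass processes up to VLEN_ELEMS
 * elements, the way a single RVV instruction under vsetvl handles a
 * whole strip of the array at once, including the ragged tail. */
void vec_add_i8(int8_t *dst, const int8_t *a, const int8_t *b, size_t n)
{
    for (size_t i = 0; i < n; ) {
        /* vsetvl-style computation: a full strip, or whatever remains */
        size_t vl = (n - i < VLEN_ELEMS) ? (n - i) : VLEN_ELEMS;
        for (size_t j = 0; j < vl; j++)   /* one "instruction" worth of work */
            dst[i + j] = (int8_t)(a[i + j] + b[i + j]);
        i += vl;
    }
}
```
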
Performance & Efficiency Highlights

Coral NPU's design delivers a highly efficient balance of power, performance, and size, making it ideal for ambient applications and scalable to multicore setups.

  • Performance: Targets 512 GOP/s (giga operations per second): 256 MACs per cycle × 2 operations per MAC at the 1 GHz target clock.
  • Power Objective: Ultra-low consumption, targeting ~6 mW at 1 GHz in a 22 nm process.

Target Applications

Coral NPU is designed to enable ultra-low-power, always-on edge AI applications, particularly focused on ambient sensing systems. Its primary goal is to enable all-day AI experiences on wearable devices while minimizing battery usage.

Potential Use Cases

  • Contextual Awareness: Detecting user activity (e.g., walking, running), proximity, or environment (e.g., indoors/outdoors, on-the-go) to enable "do-not-disturb" modes or other context-aware features.

  • Audio Processing: Voice and speech detection, keyword spotting, live translation, transcription, and audio-based accessibility features.

  • Image Processing: Person and object detection, facial recognition, gesture recognition, and low-power visual search.

  • User Interaction: Enabling control via hand gestures, audio cues, or other sensor-driven inputs.

Ideal Device Categories

Coral NPU's combination of high efficiency and low power makes it ideal for a wide range of hardware:

  • Hearables and smart earbuds
  • Smart glasses and AR headsets
  • Smartwatches and fitness trackers
  • Smart home and ambient IoT devices
  • Mobile phones (for ultra-low-power co-processing)
  • Automotive and in-vehicle systems

Roadmap

The initial Coral NPU release includes our RISC-V compliant scalar core and vector execution unit. A future release will introduce our RISC-V matrix execution unit, which will provide a complete, end-to-end, open-source stack for a fully RISC-V compliant NPU. See the Roadmap section on the Architecture overview page for more details.