Platform roadmap

Our goal is to provide the industry with an NPU reference implementation, based on RISC-V standards, that is easy to adopt and integrate.

The initial Coral NPU release included our RISC-V-compliant scalar core and the vector execution unit. A future release will introduce the RISC-V matrix execution unit, completing an end-to-end, open-source stack for a fully RISC-V-compliant NPU. Below is a high-level roadmap of planned feature additions and enhancements to the platform.

Kelvin platform roadmap

Note: this roadmap is tentative and subject to change.

Milestone 2: Released Q4 2025
Milestone 3: Coming in 2026

Focus
  • Milestone 2: RISC-V vector (RVV) execution engine launch
  • Milestone 3: Premier reference accelerator for Gemma
Highlights

Milestone 2:
  • Adds always-on ambient sensing for low-power wearable devices.
  • Neural encoding of sensor data acts as a trigger for the high-power domain or a tethered device.
  • Scalar core achieves open-standards ISA compliance.
  • Introduces a new RISC-V vector execution unit, integrated with the scalar core for RVV specification-compliant SIMD.
  • MLIR compiler support from the LiteRT (formerly TensorFlow Lite) frontend.

Milestone 3:
  • Adds always-on ambient sensing for low-power wearable devices.
  • Neural encoding of sensor data acts as a trigger for the high-power domain or a tethered device.
  • Scalar core achieves open-standards ISA compliance.
  • Improves the RVV vector execution unit to support larger models and transformer architectures.
  • MLIR compiler support targeting the LiteRT and JAX frontends.
Specifications

Milestone 2:
  • RISC-V 32I ISA 4-stage CPU
  • RISC-V 32V ISA vector unit with VLEN = 128
  • Supported numerics: int8, int16
  • Area = approx. 0.2 mm² in TSMC 12 nm (N12)
  • Performance = 800 MHz clock × 16 MACs per cycle

Milestone 3:
  • RISC-V 32I ISA 4-stage CPU
  • RISC-V 32V ISA vector unit with VLEN = 128, 256, or 512
  • Supports larger memory capacity, up to 4 GB maximum
  • Supported numerics: int8, int16, FP32 (FP16, BF16)
  • Area = approx. 0.2 mm² in TSMC 12 nm (N12)
  • Performance = 800 MHz clock × 16 MACs per cycle
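
The performance and VLEN figures listed under Specifications can be turned into rough peak numbers. The following is a minimal sketch, assuming peak throughput is simply clock rate times MACs per cycle, and that one vector register holds VLEN divided by the element width (LMUL = 1 in RVV terms). The function names are illustrative and are not part of any Coral NPU API. Counting one MAC as two operations is a common convention, not something the roadmap states.

```python
# Back-of-the-envelope figures derived from the listed specifications.
# Function names are hypothetical helpers, not a Coral NPU API.

CLOCK_HZ = 800_000_000   # 800 MHz clock, from the specifications
MACS_PER_CYCLE = 16      # MACs per cycle, from the specifications


def peak_macs_per_second(clock_hz: int, macs_per_cycle: int) -> int:
    """Theoretical peak MAC rate; real workloads will achieve less."""
    return clock_hz * macs_per_cycle


def elements_per_register(vlen_bits: int, element_bits: int) -> int:
    """Elements that fit in one vector register (assumes LMUL = 1)."""
    return vlen_bits // element_bits


peak = peak_macs_per_second(CLOCK_HZ, MACS_PER_CYCLE)
print(f"Peak MAC rate: {peak / 1e9:.1f} GMAC/s")
# Assumption: 1 MAC = 2 operations (one multiply, one add).
print(f"Peak op rate:  {2 * peak / 1e9:.1f} GOPS")

# Register capacity for each listed VLEN and integer element width.
for vlen in (128, 256, 512):
    for name, bits in (("int8", 8), ("int16", 16)):
        lanes = elements_per_register(vlen, bits)
        print(f"VLEN={vlen:3d}, {name:5s}: {lanes} elements per register")
```

These are upper bounds only; sustained throughput depends on memory bandwidth, instruction mix, and compiler scheduling.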
Reference architectures

CNNs:

  • Lyra (audio)
  • MobileNet (image)
Target model size
  • Milestone 2: Less than 1 MB
  • Milestone 3: Less than 1 GB
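
To give a feel for what these size targets mean, the sketch below estimates how many weights fit in a given budget for each supported numeric type. It assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and counts only weight storage, ignoring activations, code, and model metadata. The helper name and the byte-width table are illustrative assumptions.

```python
# Rough upper bound on parameter count for a given model-size budget.
# Assumption: decimal units and weights-only storage.

BYTES_PER_PARAM = {
    "int8": 1,
    "int16": 2,
    "fp16": 2,
    "bf16": 2,
    "fp32": 4,
}


def max_parameters(budget_bytes: int, dtype: str) -> int:
    """Largest number of weights of `dtype` that fit in `budget_bytes`."""
    return budget_bytes // BYTES_PER_PARAM[dtype]


ONE_MB = 10**6  # target model size listed for Milestone 2
ONE_GB = 10**9  # target model size listed for Milestone 3

for label, budget in (("1 MB", ONE_MB), ("1 GB", ONE_GB)):
    for dtype in ("int8", "int16", "fp32"):
        count = max_parameters(budget, dtype)
        print(f"{label} budget, {dtype:5s}: up to {count:,} parameters")
```

In practice the usable budget is smaller, since the runtime, activations, and any lookup tables share the same memory.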
Example use cases

Milestone 2:
  • Is someone talking
  • Keyword detection
  • Person or object detection

Milestone 3:
  • Is someone talking
  • Keyword detection
  • Person or object detection
  • Who is the person
  • What is the object
  • LLM transcription/translation
  • Token encoding for tethered device