Integration guide

If you are an SoC hardware designer, follow the guidance here to generate the SystemVerilog for Coral NPU and to integrate it into your system design. The Coral NPU core will be an AXI4/TileLink peripheral in the system.

AXI

A scalar-only Coral NPU configuration is provided that can integrate with an AXI-based system. The SystemVerilog can be generated with this build command:

bazel build //hdl/chisel/src/coralnpu:core_mini_axi_cc_library_emit_verilog

You can build the RISC-V vector (RV32IMF_Zve32x) version of Coral NPU with this command:

bazel build //hdl/chisel/src/coralnpu:rvv_core_mini_axi_cc_library_emit_verilog

Module interfaces

AXI bus

The interfaces to Coral NPU are as follows:

| Signal Bundle | Description |
| --- | --- |
| clk | The clock of the AXI bus / Coral NPU core. |
| reset | The active-low reset signal for the AXI bus / Coral NPU core. |
| s_axi | An AXI4 slave interface that can be used to write TCMs or touch Coral NPU CSRs. |
| m_axi | An AXI4 master interface used by Coral NPU to read/write to memories/CSRs. |
| irqn | Active-low interrupt to the Coral NPU core. Can be triggered by peripherals or another host processor. |
| wfi | Active-high signal from the Coral NPU core, indicating that the core is waiting for an interrupt. While this is active, Coral NPU is clock-gated. |
| debug | Debug interface to monitor Coral NPU instruction execution. This interface is typically only used for simulation. |
| s_log | Debug interface to handle the SLOG instruction. This interface is typically only used for simulation. |
| halted | Output indicating whether the core is running. Can be ignored. |
| fault | Output indicating whether the core hit a fault. These signals should be connected to a system control CPU interrupt line or status register for notification when Coral NPU faults or is halted. |

AXI master signals

AR / AW channel

| Signal | Behavior |
| --- | --- |
| addr | Address Coral NPU wishes to read/write |
| prot | Always 2 (unprivileged, non-secure, data) |
| id | Always 0 |
| len | (Count of beats in the burst) - 1 |
| size | Bytes-per-beat (1, 2, or 4) |
| burst | Always 1 (INCR) |
| lock | Always 0 (normal access) |
| cache | Always 0 (Device non-bufferable) |
| qos | Always 0 |
| region | Always 0 |
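
The len and size fields together determine how much data a burst moves. The following is a minimal sketch, assuming size is given in bytes per beat as in the table above; on the wire, AXI encodes the size field as log2 of the byte count. The helper names are hypothetical and are not part of the Coral NPU sources.

```c
#include <stdint.h>

/* Hypothetical helper: number of beats in a burst.
 * The AXI len field holds (beats - 1). */
static inline uint32_t axi_burst_beats(uint8_t len) {
    return (uint32_t)len + 1u;
}

/* Hypothetical helper: total bytes moved by a burst. */
static inline uint32_t axi_burst_bytes(uint8_t len, uint32_t bytes_per_beat) {
    return axi_burst_beats(len) * bytes_per_beat;
}

/* Hypothetical helper: wire encoding of the size field,
 * i.e. log2(bytes_per_beat) for a power-of-two byte count. */
static inline uint8_t axi_size_encode(uint32_t bytes_per_beat) {
    uint8_t enc = 0;
    while ((1u << enc) < bytes_per_beat) {
        ++enc;
    }
    return enc;
}
```

For example, a burst with len = 3 and 4 bytes per beat transfers 16 bytes in four beats.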

R channel

| Signal | Behavior |
| --- | --- |
| data | Response data from the slave |
| id | Ignored, but should be 0 as Coral NPU only emits txns with an id of 0 |
| resp | Response code |
| last | Whether the beat is the last in the burst |

W channel

| Signal | Behavior |
| --- | --- |
| data | Data Coral NPU wishes to write |
| last | Whether the beat is the last in the burst |
| strb | Which bytes in the data are valid |
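
To illustrate how strb relates to the address and access width, here is a small sketch of the usual AXI byte-lane calculation. It assumes the access does not cross the data bus width; the bus width in bytes is passed in as a parameter because this guide does not state it, and the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: write-strobe mask for an access of `nbytes`
 * starting at `addr` on a data bus that is `bus_bytes` wide.
 * Assumes nbytes < 32 and that the access fits inside one bus word. */
static inline uint32_t axi_wstrb(uint32_t addr, uint32_t nbytes,
                                 uint32_t bus_bytes) {
    uint32_t lane = addr % bus_bytes;     /* first active byte lane */
    uint32_t mask = (1u << nbytes) - 1u;  /* nbytes consecutive ones */
    return mask << lane;
}
```

Under these assumptions, a 2-byte write to address 0x1002 on a 4-byte bus sets lanes 2 and 3, giving a mask of 0xC.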

B channel

| Signal | Behavior |
| --- | --- |
| id | Ignored, but should be 0 as Coral NPU only emits txns with an id of 0 (an RTL assertion exists for this) |
| resp | Response code |

AXI slave signals

AR / AW channel

| Signal | Behavior |
| --- | --- |
| addr | Address the master wishes to read / write to |
| prot | Ignored |
| id | Transaction ID, should be reflected in the response beats |
| len | (Count of beats in the burst) - 1 |
| size | Bytes-per-beat (1, 2, 4, 8, or 16) |
| burst | 0, 1, or 2 (FIXED, INCR, WRAP) |
| lock | Ignored |
| cache | Ignored |
| qos | Ignored |
| region | Ignored |
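
Because the slave port accepts all three burst types, it can help to see which address each beat targets. The sketch below follows the address rules of the AXI4 specification; it is not taken from the Coral NPU RTL, and the names are hypothetical. It assumes bytes_per_beat is a power of two and, for WRAP bursts, that the start address is aligned to the beat size.

```c
#include <stdint.h>

enum axi_burst { AXI_BURST_FIXED = 0, AXI_BURST_INCR = 1, AXI_BURST_WRAP = 2 };

/* Hypothetical helper: address of beat `n` (0-based) within a burst of
 * `beats` beats that starts at `start`. */
static uint32_t axi_beat_addr(uint32_t start, uint32_t bytes_per_beat,
                              uint32_t beats, enum axi_burst burst,
                              uint32_t n) {
    /* FIXED bursts repeat the start address; beat 0 always uses it. */
    if (burst == AXI_BURST_FIXED || n == 0) {
        return start;
    }
    /* Later beats are aligned to the beat size. */
    uint32_t aligned = start - (start % bytes_per_beat);
    uint32_t addr = aligned + n * bytes_per_beat;
    if (burst == AXI_BURST_WRAP) {
        /* WRAP bursts stay inside a window of (beat size * beat count). */
        uint32_t span = bytes_per_beat * beats;
        uint32_t lower = start - (start % span);
        if (addr >= lower + span) {
            addr -= span;
        }
    }
    return addr;
}
```

For a 4-beat WRAP burst of 4-byte beats starting at 0x1008, the beats land at 0x1008, 0x100C, 0x1000, and 0x1004.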

R channel

| Signal | Behavior |
| --- | --- |
| data | Response data from Coral NPU |
| id | Transaction ID, should match with the id field from AR |
| resp | Response code (0/OKAY or 2/SLVERR) |
| last | Whether the beat is the last in the burst |

W channel

| Signal | Behavior |
| --- | --- |
| data | Data the master wishes to write to Coral NPU |
| last | Whether the beat is the last in the burst |
| strb | Which bytes in the data are valid |

B channel

| Signal | Behavior |
| --- | --- |
| id | Transaction ID, should match with the id field from AW |
| resp | Response code (0/OKAY or 2/SLVERR) |

Debug signals

| Signal | Behavior |
| --- | --- |
| en | 4-bit value, indicating which fetch lanes are active |
| addr | 32-bit values, containing the PC for each fetch lane |
| inst | 32-bit values, containing the instruction for each fetch lane |
| cycles | Cycle counter |
| dbus | Information about internal LSU transactions |
| -> valid | Whether the transaction is valid |
| -> bits | addr: The 32-bit address for the transaction; write: If the transaction is a write; wdata: 128-bit write data for the transaction |
| dispatch | Information about instructions which are dispatched for execution |
| -> fire | If an instruction was dispatched in the slot, this cycle |
| -> addr | The 32-bit address of the instruction |
| -> inst | The 32-bit value of the instruction |
| regfile | Information about writes to the integer register file |
| -> writeAddr | Register addresses to which a future write is expected |
| ->-> valid | If an instruction was dispatched in this lane, which will write the regfile |
| ->-> bits | The 5-bit register address to which the write is expected |
| -> writeData | For each port in the register file, information about writes |
| ->-> valid | If a write occurred on this port, this cycle |
| ->-> bits_addr | The 5-bit register address to which the write occurred |
| ->-> bits_data | The 32-bit value which was written to the register |
| float | Information about writes to the floating-point register file |
| -> writeAddr | Register addresses to which a future write is expected |
| ->-> valid | If an instruction was dispatched to floating point on this cycle |
| ->-> bits | The address of the register to which a write is expected |
| -> writeData | For each port in the register file, information about writes |
| ->-> valid | If a write occurred on this port, this cycle |
| ->-> bits_addr | The 5-bit register address to which the write occurred |
| ->-> bits_data | The 32-bit value which was written to the register |

Coral NPU memory map

Memory accesses to the Coral NPU are defined as follows:

| Region | Range | Size | Alignment | Description |
| --- | --- | --- | --- | --- |
| ITCM | 0x0000 - 0x1FFF | 8 kB | 4 bytes | ITCM storage for code executed by Coral NPU. |
| DTCM | 0x10000 - 0x17FFF | 32 kB | 1 byte | DTCM storage for data used by Coral NPU. |
| CSR | 0x30000 - TBD | TBD | 4 bytes | CSR interface used to query/control Coral NPU. |
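
The memory map can be captured as constants for host-side software. The sketch below is illustrative only: the macro, enum, and function names are hypothetical, the offsets are relative to wherever Coral NPU sits in the system address space, and because the end of the CSR range is still TBD, every offset at or above 0x30000 is treated as CSR space purely as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets and sizes from the memory map table (names are hypothetical). */
#define CORALNPU_ITCM_OFFSET 0x00000u
#define CORALNPU_ITCM_SIZE   0x02000u /* 8 kB */
#define CORALNPU_DTCM_OFFSET 0x10000u
#define CORALNPU_DTCM_SIZE   0x08000u /* 32 kB */
#define CORALNPU_CSR_OFFSET  0x30000u /* end of range is TBD */

enum coralnpu_region {
    CORALNPU_REGION_NONE,
    CORALNPU_REGION_ITCM,
    CORALNPU_REGION_DTCM,
    CORALNPU_REGION_CSR
};

/* Map an offset within Coral NPU's address space to a region. */
static enum coralnpu_region coralnpu_classify(uint32_t offset) {
    if (offset < CORALNPU_ITCM_OFFSET + CORALNPU_ITCM_SIZE) {
        return CORALNPU_REGION_ITCM;
    }
    if (offset >= CORALNPU_DTCM_OFFSET &&
        offset < CORALNPU_DTCM_OFFSET + CORALNPU_DTCM_SIZE) {
        return CORALNPU_REGION_DTCM;
    }
    if (offset >= CORALNPU_CSR_OFFSET) {
        return CORALNPU_REGION_CSR; /* assumption: upper bound unknown */
    }
    return CORALNPU_REGION_NONE;
}

/* Check the alignment column: ITCM and CSR need 4-byte alignment,
 * DTCM accepts any byte address. */
static bool coralnpu_access_aligned(uint32_t offset) {
    switch (coralnpu_classify(offset)) {
    case CORALNPU_REGION_ITCM:
    case CORALNPU_REGION_CSR:
        return (offset % 4u) == 0;
    case CORALNPU_REGION_DTCM:
        return true;
    default:
        return false;
    }
}
```

A host driver could use such a check to reject unmapped or misaligned accesses before they reach the bus.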

Reset considerations

Coral NPU uses a synchronous reset strategy. For the reset to take effect, the clock must run for at least one cycle with reset active before the clock is gated, whether through the internal clock gate (via CSR) or externally.

Booting Coral NPU

Note that in these examples, Coral NPU is located in the overall system memory map at 0x7000 0000.

1) The instruction memory of Coral NPU must be initialized:

Sample code

volatile uint8_t* coralnpu_itcm = (uint8_t*)0x00000000L;
for (int i = 0; i < coralnpu_binary_len; ++i) {
   coralnpu_itcm[i] = coralnpu_binary[i];
}

or

Sample code

// memcpy takes a non-volatile void*, so the pointer is not declared volatile here.
uint8_t* coralnpu_itcm = (uint8_t*)0x00000000L;
memcpy(coralnpu_itcm, coralnpu_binary, coralnpu_binary_len);

If something like a DMA engine is present in your system, that is probably a better option for initializing the ITCM.
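
When no DMA engine is available, copying in 32-bit words may suit the ITCM better than byte writes, since the memory map lists a 4-byte alignment for that region. The following is a sketch under that assumption; the function name is hypothetical, and the caller is responsible for making sure the image, rounded up to a multiple of 4 bytes, fits in the 8 kB ITCM.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `len` bytes of `binary` into the ITCM using
 * only aligned 32-bit stores. A trailing partial word is zero-padded. */
static void coralnpu_load_itcm(volatile uint32_t *itcm,
                               const uint8_t *binary, size_t len) {
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        uint32_t word;
        memcpy(&word, binary + i, 4); /* safe even if binary is unaligned */
        itcm[i / 4] = word;
    }
    if (i < len) {
        uint32_t word = 0;
        memcpy(&word, binary + i, len - i);
        itcm[i / 4] = word;
    }
}
```

The itcm argument would be the ITCM address used in the samples above, cast to a volatile uint32_t pointer.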

2) Program the start PC value. If your program is linked such that the starting address is 0, you may skip this.

Sample code

volatile uint32_t* coralnpu_pc_csr = (uint32_t*)0x00030004L;
*coralnpu_pc_csr = start_addr;

3) Release clock gate

Sample code

volatile uint32_t* coralnpu_reset_csr = (uint32_t*)0x00030000L;
*coralnpu_reset_csr = 1;

After this, wait at least one cycle so that Coral NPU's reset can take effect. This is also a good time to configure anything connected to Coral NPU's fault or halted outputs, such as an interrupt.

4) Release reset

Sample code

volatile uint32_t* coralnpu_reset_csr = (uint32_t*)0x00030000L;
*coralnpu_reset_csr = 0;

At this point, Coral NPU will begin executing at the PC value programmed in step 2.
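
Steps 1 through 4 can be collected into a single routine. The sketch below reuses the CSR offsets from the samples above and takes the base address as a parameter so that it works wherever Coral NPU is mapped. The function names and the wait_one_cycle callback are hypothetical; how a host waits for one Coral NPU clock cycle is platform-specific, so that part is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets as used in the samples above (macro names are hypothetical). */
#define CORALNPU_ITCM_OFFSET      0x00000u
#define CORALNPU_RESET_CSR_OFFSET 0x30000u
#define CORALNPU_PC_CSR_OFFSET    0x30004u

/* Write a 32-bit value at `offset` from the Coral NPU base address. */
static inline void coralnpu_write32(uintptr_t base, uint32_t offset,
                                    uint32_t value) {
    *(volatile uint32_t *)(base + offset) = value;
}

/* Hypothetical boot helper following steps 1 through 4.
 * `wait_one_cycle` may be NULL if the caller handles the delay itself. */
static void coralnpu_boot(uintptr_t base, const uint8_t *binary, size_t len,
                          uint32_t start_addr, void (*wait_one_cycle)(void)) {
    /* Step 1: initialize the instruction memory. */
    volatile uint8_t *itcm = (volatile uint8_t *)(base + CORALNPU_ITCM_OFFSET);
    for (size_t i = 0; i < len; ++i) {
        itcm[i] = binary[i];
    }
    /* Step 2: program the start PC. */
    coralnpu_write32(base, CORALNPU_PC_CSR_OFFSET, start_addr);
    /* Step 3: release the clock gate. */
    coralnpu_write32(base, CORALNPU_RESET_CSR_OFFSET, 1u);
    if (wait_one_cycle != NULL) {
        wait_one_cycle(); /* let the reset take effect */
    }
    /* Step 4: release reset; execution starts at start_addr. */
    coralnpu_write32(base, CORALNPU_RESET_CSR_OFFSET, 0u);
}
```

Passing the base address explicitly also makes the routine easy to exercise against a plain buffer in a host-side unit test.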

5) Monitor for io_halted. The status of Coral NPU's execution can be checked by reading the status CSR:

Sample code

volatile uint32_t* coralnpu_status_csr = (uint32_t*)0x00030008L;
uint32_t status = *coralnpu_status_csr;
bool halted = status & 1;
bool fault = status & 2;
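
If no interrupt is wired to the halted or fault outputs, the host can poll the status CSR instead. This is a minimal sketch that uses the bit positions from the sample above (bit 0 for halted, bit 1 for fault); the names, the result enum, and the poll limit are hypothetical.

```c
#include <stdint.h>

#define CORALNPU_STATUS_HALTED (1u << 0)
#define CORALNPU_STATUS_FAULT  (1u << 1)

enum coralnpu_result { CORALNPU_OK, CORALNPU_FAULT, CORALNPU_TIMEOUT };

/* Hypothetical helper: poll the status CSR until the core halts or
 * faults, giving up after `max_polls` reads. A fault takes priority
 * over a plain halt. */
static enum coralnpu_result
coralnpu_wait_halted(const volatile uint32_t *status_csr, uint32_t max_polls) {
    for (uint32_t i = 0; i < max_polls; ++i) {
        uint32_t status = *status_csr;
        if (status & CORALNPU_STATUS_FAULT) {
            return CORALNPU_FAULT;
        }
        if (status & CORALNPU_STATUS_HALTED) {
            return CORALNPU_OK;
        }
    }
    return CORALNPU_TIMEOUT;
}
```

A real system would likely replace the bare poll count with a timer-based timeout.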