Instruction decode and dispatch in Coral NPU

Coral NPU is a typical pipelined processor that performs instruction fetch, decode, dispatch, execute, and write-back.

Instruction decode/dispatch

Coral NPU is a 32-bit superscalar core that, by default, attempts to dispatch up to four instructions per cycle. Execution is in-order, with no speculation. The execution stage can take multiple clock cycles, depending on which execution unit is involved (ALU, multiply/divide, vector, or matrix).

Instruction dispatch to execution engines

When decoding instructions, Coral NPU identifies each operation and its read and write dependencies. At the scalar level, vector operations are coarsely defined. Dispatch determines whether an operation can be forwarded to an execution unit:

  • Since different instructions have different latencies, Coral NPU must determine when each instruction can be dispatched.
  • Coral NPU will not speculate across branches.
  • Scoreboarding is used to track dependencies in code.

Coral NPU's instruction dispatch unit controls when the decoded instructions are dispatched to the execution pipelines (queues). Coral NPU uses the following rules for dispatching instructions:

  • In-order: Coral NPU is an in-order processor. If the instruction at address n cannot be dispatched, the instruction at n+4 is not considered for dispatch.
  • Hazard handling: Coral NPU uses scoreboarding to track dependencies across instructions, which prevents read-after-write (RAW) and write-after-write (WAW) data hazards. All execution units read their operands from the register file the cycle after the instruction is dispatched, which prevents write-after-read (WAR) hazards.
  • Execution unit constraints: A limited number of execution units are available to service instructions. While there are enough ALUs and BRUs to service each of the four lanes, the scalar core contains only one multiplier, so at most one multiply instruction can be dispatched per cycle. Similarly, a non-pipelined execution unit such as the divider may exert backpressure while it is busy, preventing an instruction from being dispatched to it.
  • Control flow: Conservatively, Coral NPU will not dispatch past the following jump instructions: jal, jalr, ebreak, ecall, mret, wfi.
  • Special instructions: Instructions that can affect core state beyond the PC (program counter) and register file can execute only from the first slot. They are also typically treated as jump control flow instructions, so no other instructions are dispatched in the same cycle. These instructions are: csrrw, csrrs, csrrc, ebreak, ecall, mret, fence, fence.i, wfi.
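
The dispatch rules above can be sketched as a small software model. This is a minimal illustration, not the hardware implementation: the `Inst` record, the `dispatch` function, the mnemonic sets for the multiplier and divider, and the `pending_writes` set standing in for the scoreboard are all assumptions made for the example. WAR handling (the delayed operand read) is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LANES = 4  # default dispatch width from the text

# Instructions the text says dispatch will not go past.
CONTROL_FLOW = {"jal", "jalr", "ebreak", "ecall", "mret", "wfi"}
# Instructions the text restricts to the first slot.
SPECIAL = {"csrrw", "csrrs", "csrrc", "ebreak", "ecall", "mret",
           "fence", "fence.i", "wfi"}
# Assumed mnemonics routed to the single multiplier and the divider.
MUL_OPS = {"mul", "mulh", "mulhsu", "mulhu"}
DIV_OPS = {"div", "divu", "rem", "remu"}


@dataclass
class Inst:  # hypothetical record, not a Coral NPU type
    op: str
    rd: Optional[int] = None
    srcs: Tuple[int, ...] = ()


def dispatch(window, pending_writes, divider_busy=False):
    """Return the leading instructions of `window` that dispatch this cycle."""
    pending = set(pending_writes) - {0}  # x0 is never pending
    issued = []
    mul_busy = False
    div_busy = divider_busy
    for slot, inst in enumerate(window[:LANES]):
        # Every `break` also blocks all younger instructions (in-order rule).
        if inst.op in SPECIAL and slot != 0:
            break  # special instructions only execute from the first slot
        if any(r in pending for r in inst.srcs):
            break  # RAW hazard: a source register has a pending write
        if inst.rd in pending:
            break  # WAW hazard: the destination has a pending write
        if inst.op in MUL_OPS and mul_busy:
            break  # only one multiplier in the scalar core
        if inst.op in DIV_OPS and div_busy:
            break  # divider backpressure
        issued.append(inst)
        mul_busy |= inst.op in MUL_OPS
        div_busy |= inst.op in DIV_OPS
        if inst.rd:
            pending.add(inst.rd)
        if inst.op in CONTROL_FLOW or inst.op in SPECIAL:
            break  # nothing is dispatched past these instructions
    return issued


window = [Inst("ld", 1, (0,)), Inst("addi", 2, (0,)),
          Inst("bgeu", None, (1, 2)), Inst("jal", 1)]
print([i.op for i in dispatch(window, set())])  # ['ld', 'addi']
```

In this model the branch stalls because x1 and x2 still have pending writes, and the jump behind it is not considered because dispatch is in order. Whether the hardware stalls the branch for the same reason is an assumption of the sketch.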

The following table shows examples of encoded instructions along with the corresponding assembly language and whether Coral NPU dispatches each one:

Encoded instruction (opcode)   Assembly language instruction   Dispatch?
0x32c03083                     LD x1, 812(x0)                  Yes
0x32900113                     ADDI x2, x0, 809                Yes
0xf420f0e3                     BGEU x1, x2, -192               No
0x7d0000ef                     JAL x1, 2000                    No
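
To show how the encoded words in the table map to their assembly forms, here is a small sketch that extracts the register fields and immediates using the standard RISC-V I-type, B-type, and J-type layouts. The function names are hypothetical and are not part of any Coral NPU tooling.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = 1 << (bits - 1)
    return (value ^ mask) - mask


def decode_fields(word):
    """Extract the fixed-position fields of a 32-bit RISC-V instruction."""
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "rs2": (word >> 20) & 0x1F,
    }


def imm_i(word):
    """I-type immediate: bits 31:20 (loads, ADDI)."""
    return sign_extend(word >> 20, 12)


def imm_b(word):
    """B-type immediate: imm[12|10:5] in bits 31:25, imm[4:1|11] in bits 11:7."""
    imm = (((word >> 31) & 0x1) << 12) | (((word >> 7) & 0x1) << 11) \
        | (((word >> 25) & 0x3F) << 5) | (((word >> 8) & 0xF) << 1)
    return sign_extend(imm, 13)


def imm_j(word):
    """J-type immediate: imm[20|10:1|11|19:12] in bits 31:12."""
    imm = (((word >> 31) & 0x1) << 20) | (((word >> 12) & 0xFF) << 12) \
        | (((word >> 20) & 0x1) << 11) | (((word >> 21) & 0x3FF) << 1)
    return sign_extend(imm, 21)
```

For example, `imm_i(0x32c03083)` gives 812 and `imm_b(0xf420f0e3)` gives -192, matching the offsets in the table, while `decode_fields(0xf420f0e3)` reports rs1 = 1 and rs2 = 2.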

Scalar register file

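The scalar register file is multiported, with eight read ports and six write ports. With four instructions dispatched per cycle, the ports are allocated as follows:

  • Two reads and one write for each dispatched instruction, for a total of eight read ports and four write ports (reads are delayed one cycle)
  • One write port for memory operations
  • One write port for multiply or divide computations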

The register file has special bus (link) read ports, for jumps or memory operations. Coral NPU maintains a scoreboard in the register file:

  • An extra interface signals when a dispatched operation has a pending write.
  • On a write, the corresponding register's bit in the scoreboard is cleared.

Write forwarding is enabled.
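
The scoreboard behavior described above can be sketched as a bit vector with one pending-write bit per register. This is a minimal model under stated assumptions: the `Scoreboard` class and its method names are hypothetical, 32 scalar registers with x0 hardwired to zero are assumed from the RISC-V base ISA, and write forwarding is not modeled.

```python
class Scoreboard:
    """One pending-write bit per scalar register (hypothetical model)."""

    def __init__(self):
        self.bits = 0

    def mark_pending(self, rd):
        """Called at dispatch for an operation that will write rd."""
        if rd != 0:  # x0 is hardwired to zero, so it is never pending
            self.bits |= 1 << rd

    def on_write(self, rd):
        """Called when a result is written; clears the register's bit."""
        self.bits &= ~(1 << rd)

    def is_pending(self, reg):
        return bool((self.bits >> reg) & 1)

    def blocks(self, rd, srcs):
        """True if an instruction would hit a RAW (srcs) or WAW (rd) conflict."""
        regs = list(srcs) + ([rd] if rd is not None else [])
        return any(self.is_pending(r) for r in regs)
```

A dispatcher built on this model would call `blocks` before issuing an instruction and `mark_pending` once it issues; the matching `on_write` arrives when the execution unit writes its result back.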

Execution units

Coral NPU's execution units include four scalar ALUs, one multiplier, one divider, the RISC-V RVV vector engine (including vector ALUs, multiplier, divider, MAC), and the matrix engine.

Each execution unit typically has a command interface as well as register read and register write interfaces. "Valid" interfaces are used for execution units that execute in fixed durations, whereas "decoupled" interfaces are used for units with variable clock cycle counts, namely multiply/divide.
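
The difference between the two interface styles can be illustrated with a small cycle-level sketch. It assumes that "decoupled" follows the common ready/valid handshake convention, where a command is accepted only in a cycle in which the sender asserts valid and the unit asserts ready; a "valid" interface has no ready signal, so the unit must always accept. The `DecoupledUnit` class and its latency parameter are hypothetical.

```python
class DecoupledUnit:
    """Variable-latency unit that accepts a command only when it is ready."""

    def __init__(self):
        self.remaining = 0  # cycles left on the command in flight

    @property
    def ready(self):
        return self.remaining == 0

    def cycle(self, valid, latency=0):
        """Advance one clock; return True if a command was accepted."""
        fired = valid and self.ready
        if fired:
            self.remaining = latency  # unit is busy for `latency` cycles
        elif self.remaining > 0:
            self.remaining -= 1       # backpressure: the sender must retry
        return fired


divider = DecoupledUnit()
accepted = [divider.cycle(valid=True, latency=3) for _ in range(5)]
print(accepted)  # [True, False, False, False, True]
```

While the unit is busy, its deasserted ready signal is what lets it exert backpressure on dispatch; a fixed-duration unit does not need this, because the dispatcher already knows when it will be free.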

See Vector engine micro-architecture for more details about the Coral NPU vector execution unit.