RiscV Integrated Decoder

The objectives of this tutorial are:

  • Learn how the generated ISA and binary decoders fit together.
  • Write the necessary C++ code to create a full instruction decoder for RiscV RV32I that combines the ISA and binary decoders.

Understand the instruction decoder

Given an instruction address, the instruction decoder is responsible for reading the instruction word from memory and returning a fully initialized instance of the Instruction class that represents that instruction.

The top level decoder implements the generic::DecoderInterface shown below:

// This is the simulator's interface to the instruction decoder.
class DecoderInterface {
 public:
  // Return a decoded instruction for the given address. If there are errors
  // in the instruction decoding, the decoder should still produce an
  // instruction that can be executed, but its semantic action function should
  // set an error condition in the simulation when executed.
  virtual Instruction *DecodeInstruction(uint64_t address) = 0;
  virtual ~DecoderInterface() = default;
};

As you can see, there is only one method that has to be implemented: virtual Instruction *DecodeInstruction(uint64_t address);

Now let's look at what is provided and what is needed by the generated code.

First, consider the top level class RiscV32IInstructionSet in the file riscv32i_decoder.h, which was generated at the end of the tutorial on the ISA decoder. To see the contents anew, navigate to the solution directory of that tutorial and rebuild all.

$ cd riscv_isa_decoder/solution
$ bazel build :all
...<snip>...

Now change your directory back to the repository root, then let's take a look at the sources that were generated. For that, change directory to bazel-out/k8-fastbuild/bin/riscv_isa_decoder (assuming you are on an x86 host - for other hosts, the k8-fastbuild will be another string).

$ cd ../..
$ cd bazel-out/k8-fastbuild/bin/riscv_isa_decoder

You will see the four source files that contain the generated C++ code listed:

  • riscv32i_decoder.h
  • riscv32i_decoder.cc
  • riscv32i_enums.h
  • riscv32i_enums.cc

Open up the first file riscv32i_decoder.h. There are three classes that we need to take a look at:

  • RiscV32IEncodingBase
  • RiscV32IInstructionSetFactory
  • RiscV32IInstructionSet

Note the naming of the classes. All the classes are named based on the Pascal-case version of the name given in the "isa" declaration in that file: isa RiscV32I { ... }

Let's start with the RiscV32IInstructionSet class. It is shown below:

class RiscV32IInstructionSet {
 public:
  RiscV32IInstructionSet(ArchState *arch_state,
                         RiscV32IInstructionSetFactory *factory);
  Instruction *Decode(uint64 address, RiscV32IEncodingBase *encoding);

 private:
  std::unique_ptr<Riscv32Slot> riscv32_decoder_;
  ArchState *arch_state_;
};

There are no virtual methods in this class, so this is a stand-alone class, but notice two things. First, the constructor takes a pointer to an instance of the RiscV32IInstructionSetFactory class. This is a class that the generated decoder uses to create an instance of the Riscv32Slot class, which is used to decode all the instructions defined for the slot riscv32 as defined in the riscv32i.isa file. Second, the Decode method takes an additional parameter of type pointer to RiscV32IEncodingBase. This is a class that will provide the interface between the ISA decoder generated in the first tutorial and the binary decoder generated in the second tutorial.

The class RiscV32IInstructionSetFactory is an abstract class from which we have to derive our own implementation for the full decoder. In most cases this class is trivial: just provide a method for calling the constructor for each slot class defined in our .isa file. In our case, it's very simple as there is only a single such class: Riscv32Slot (Pascal-case of the name riscv32 concatenated with Slot). The method is not generated for you as there are some advanced use cases where there might be utility in deriving a subclass from the slot, and calling its constructor instead.
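The factory idea described above can be sketched in isolation. This is a minimal illustration only: SketchState, SketchSlot, SketchSlotFactory, PlainFactory, TracingSlot and TracingFactory are hypothetical stand-ins invented for this example, not the generated classes. It shows both the trivial case (construct the slot directly) and the advanced case (construct a subclass of the slot instead).

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for the state class and the generated slot class.
struct SketchState {};

class SketchSlot {
 public:
  explicit SketchSlot(SketchState *state) : state_(state) {}
  virtual ~SketchSlot() = default;
  virtual std::string Name() const { return "plain slot"; }
  SketchState *state() const { return state_; }

 private:
  SketchState *state_;
};

// Abstract factory, similar in shape to a generated factory class.
class SketchSlotFactory {
 public:
  virtual ~SketchSlotFactory() = default;
  virtual std::unique_ptr<SketchSlot> CreateSlot(SketchState *state) = 0;
};

// The common, trivial case: call the slot constructor directly.
class PlainFactory : public SketchSlotFactory {
 public:
  std::unique_ptr<SketchSlot> CreateSlot(SketchState *state) override {
    return std::make_unique<SketchSlot>(state);
  }
};

// The advanced case: a subclass of the slot, created by its own factory.
class TracingSlot : public SketchSlot {
 public:
  using SketchSlot::SketchSlot;
  std::string Name() const override { return "tracing slot"; }
};

class TracingFactory : public SketchSlotFactory {
 public:
  std::unique_ptr<SketchSlot> CreateSlot(SketchState *state) override {
    return std::make_unique<TracingSlot>(state);
  }
};

// Usage: the caller only sees the abstract factory and the base slot type.
inline void FactoryExample() {
  SketchState state;
  TracingFactory factory;
  std::unique_ptr<SketchSlot> slot = factory.CreateSlot(&state);
  assert(slot->Name() == "tracing slot");
}
```

Because the code that owns the slot only calls the factory method, swapping PlainFactory for TracingFactory changes which slot class is used without touching that code.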

We will go through the final class RiscV32IEncodingBase later in this tutorial, as that is the subject of another exercise.


Define top level instruction decoder

Define the factory class

If you rebuilt the project for the first tutorial, make sure you change back to the riscv_full_decoder directory.

Open up the file riscv32_decoder.h. All the necessary include files have already been added and the namespaces have been set up.

After the comment marked //Exercise 1 - step 1 define the class RiscV32IsaFactory inheriting from RiscV32IInstructionSetFactory.

class RiscV32IsaFactory : public RiscV32IInstructionSetFactory {};

Next, define the override for CreateRiscv32Slot. Since we don't use any derived classes of Riscv32Slot, we simply allocate a new instance using std::make_unique.

std::unique_ptr<Riscv32Slot> CreateRiscv32Slot(ArchState *state) override {
  return std::make_unique<Riscv32Slot>(state);
}

If you need help (or want to check your work), the full answer is here.

Define the decoder class

Constructors, destructor, and method declarations

Next it's time to define the decoder class. In the same file as above, go to the declaration of RiscV32Decoder. Expand the declaration into a class definition where RiscV32Decoder inherits from generic::DecoderInterface.

class RiscV32Decoder : public generic::DecoderInterface {
 public:
};

Next, before we write the constructor, let's take a quick look at the code generated in our second tutorial on the binary decoder. In addition to all the Extract functions, there is the function DecodeRiscVInst32:

OpcodeEnum DecodeRiscVInst32(uint32_t inst_word);

This function takes the instruction word that needs to be decoded, and returns the opcode that matches that instruction. On the other hand, the DecoderInterface class that RiscV32Decoder implements only passes in an address. Thus, the RiscV32Decoder class has to be able to access memory to read the instruction word to pass to DecodeRiscVInst32(). In this project the way to access memory is through a simple memory interface defined in .../mpact/sim/util/memory aptly named util::MemoryInterface, seen below:

  // Load data from address into the DataBuffer, then schedule the Instruction
  // inst (if not nullptr) to be executed (using the function delay line) with
  // context. The size of the data access is based on size of the data buffer.
  virtual void Load(uint64_t address, DataBuffer *db, Instruction *inst,
                    ReferenceCount *context) = 0;
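To make the "address in, instruction word out" step concrete, here is a minimal sketch of fetching a 32-bit word from a flat byte array. SketchMemory and FetchInstructionWord are hypothetical names for this example and do not reproduce the util::MemoryInterface API. The only fact relied on is that RISC-V instruction words are stored in little-endian byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical flat memory; not the util::MemoryInterface class.
class SketchMemory {
 public:
  explicit SketchMemory(std::vector<uint8_t> bytes)
      : bytes_(std::move(bytes)) {}

  // Copies `size` bytes starting at `address` into `buffer`. The access size
  // is chosen by the caller's buffer. Returns false if out of range.
  bool Load(uint64_t address, void *buffer, std::size_t size) const {
    if (address > bytes_.size() || size > bytes_.size() - address) {
      return false;
    }
    std::memcpy(buffer, bytes_.data() + address, size);
    return true;
  }

 private:
  std::vector<uint8_t> bytes_;
};

// Fetches one 32-bit instruction word, assembled in little-endian order.
inline bool FetchInstructionWord(const SketchMemory &memory, uint64_t address,
                                 uint32_t *word) {
  uint8_t raw[4];
  if (!memory.Load(address, raw, sizeof(raw))) return false;
  *word = static_cast<uint32_t>(raw[0]) |
          (static_cast<uint32_t>(raw[1]) << 8) |
          (static_cast<uint32_t>(raw[2]) << 16) |
          (static_cast<uint32_t>(raw[3]) << 24);
  return true;
}

// Usage: the bytes 93 00 50 00 form the word 0x00500093 (addi x1, x0, 5).
inline void FetchExample() {
  std::vector<uint8_t> bytes = {0x93, 0x00, 0x50, 0x00};
  SketchMemory memory(bytes);
  uint32_t word = 0;
  bool ok = FetchInstructionWord(memory, 0, &word);
  assert(ok);
  assert(word == 0x00500093);
  (void)ok;
}
```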

In addition we need to be able to pass a state class instance to the constructors of the other decoder classes. The appropriate state class is riscv::RiscVState class, which derives from generic::ArchState, with added functionality for RiscV. This means we must declare the constructor so that it can take a pointer to the state and the memory:

RiscV32Decoder(riscv::RiscVState *state, util::MemoryInterface *memory);

Delete the default constructor and override the destructor:

RiscV32Decoder() = delete;
~RiscV32Decoder() override;

Next declare the DecodeInstruction method we need to override from generic::DecoderInterface.

generic::Instruction *DecodeInstruction(uint64_t address) override;

If you need help (or want to check your work), the full answer is here.


Data Member Definitions

The RiscV32Decoder class will need private data members to store the constructor parameters and a pointer to the factory class.

 private:
  riscv::RiscVState *state_;
  util::MemoryInterface *memory_;

It also needs a pointer to the encoding class that is derived from RiscV32IEncodingBase. Let's call that RiscV32IEncoding (we will implement this in exercise 2). Additionally, it needs a pointer to an instance of RiscV32IInstructionSet, so add:

  RiscV32IsaFactory *riscv_isa_factory_;
  RiscV32IEncoding *riscv_encoding_;
  RiscV32IInstructionSet *riscv_isa_;

Finally, we need to define a data member for use with our memory interface:

  generic::DataBuffer *inst_db_;

If you need help (or want to check your work), the full answer is here.

Define the Decoder Class Methods

Next, it's time to implement the constructor, destructor, and the DecodeInstruction method. Open up the file riscv32_decoder.cc. The empty methods are already in the file as well as namespace declarations and a couple of using declarations.

Constructor Definition

The constructor only needs to initialize the data members. First initialize the state_ and memory_:

RiscV32Decoder::RiscV32Decoder(riscv::RiscVState *state,
                               util::MemoryInterface *memory)
    : state_(state), memory_(memory) {

Next allocate instances of each of the decoder related classes, passing in the appropriate parameters.

  // Allocate the isa factory class, the top level isa decoder instance, and
  // the encoding parser.
  riscv_isa_factory_ = new RiscV32IsaFactory();
  riscv_isa_ = new RiscV32IInstructionSet(state, riscv_isa_factory_);
  riscv_encoding_ = new RiscV32IEncoding(state);

Finally, allocate the DataBuffer instance. It is allocated using a factory accessible through the state_ member. We allocate a data buffer sized to store a single uint32_t, as that is the size of the instruction word.

  inst_db_ = state_->db_factory()->Allocate<uint32_t>(1);

Destructor Definition

The destructor is simple: just free the objects we allocated in the constructor, with one twist. The data buffer instance is reference counted, so instead of calling delete on that pointer, we call DecRef() on the object:

RiscV32Decoder::~RiscV32Decoder() {
  inst_db_->DecRef();
  delete riscv_isa_;
  delete riscv_isa_factory_;
  delete riscv_encoding_;
}
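The reason for calling DecRef() rather than delete can be illustrated with a small reference-counted object. SketchBuffer below is a hypothetical class written for this example and is not the generic::DataBuffer implementation. In this sketch the object deletes itself when the count reaches zero, while a real implementation may instead hand the object back to its factory for reuse.

```cpp
#include <cassert>

// Hypothetical reference-counted buffer; not the generic::DataBuffer class.
class SketchBuffer {
 public:
  // Creates a buffer with one reference, held by the caller. `live_count`
  // lets the example observe when the buffer is actually destroyed.
  static SketchBuffer *Create(int *live_count) {
    return new SketchBuffer(live_count);
  }

  void IncRef() { ++ref_count_; }

  // Releases one reference. The object goes away only when the last
  // reference is released.
  void DecRef() {
    if (--ref_count_ == 0) delete this;
  }

 private:
  explicit SketchBuffer(int *live_count) : live_count_(live_count) {
    ++*live_count_;
  }
  // Private destructor: callers cannot use delete directly.
  ~SketchBuffer() { --*live_count_; }

  int ref_count_ = 1;
  int *live_count_;
};

// Usage: a second holder keeps the buffer alive after the first releases it.
inline void RefCountExample() {
  int live = 0;
  SketchBuffer *buffer = SketchBuffer::Create(&live);
  buffer->IncRef();  // A second holder takes a reference.
  buffer->DecRef();  // The first holder is done; the buffer survives.
  assert(live == 1);
  buffer->DecRef();  // The last reference is released.
  assert(live == 0);
}
```

Calling delete on a shared object would pull it out from under any other holder, which is why the owner only gives up its own reference.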

Method definition

In our case, the implementation of the DecodeInstruction method is pretty simple. We will assume that the address is properly aligned and no additional error checking is required.

First, the instruction word has to be fetched from memory using the memory interface and the DataBuffer instance.

  memory_->Load(address, inst_db_, nullptr, nullptr);
  uint32_t iword = inst_db_->Get<uint32_t>(0);

Next, we call into the RiscV32IEncoding instance to parse the instruction word, which has to be done before calling the ISA decoder itself. Recall that the ISA decoder calls into the RiscV32IEncoding instance directly to obtain the opcode and operands specified by the instruction word. We haven't implemented that class yet, but let's use void ParseInstruction(uint32_t) as that method.

  riscv_encoding_->ParseInstruction(iword);

Finally we call the ISA decoder, passing in the address and the Encoding class.

  auto *instruction = riscv_isa_->Decode(address, riscv_encoding_);
  return instruction;
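The three steps (fetch, parse, decode) can be seen working together in a self-contained sketch. All of the types here (SketchInstruction, SketchEncoding, SketchDecoder) are hypothetical stand-ins and none of them are MPACT-Sim classes. The encoding stand-in only classifies the 7-bit major opcode field, using the RISC-V values 0x13 (OP-IMM) and 0x63 (BRANCH).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins; none of these are MPACT-Sim types.
struct SketchInstruction {
  uint64_t address;
  std::string kind;
};

// Plays the role of the encoding object: it remembers the parsed word.
class SketchEncoding {
 public:
  void ParseInstruction(uint32_t word) { word_ = word; }

  // Classifies the instruction by its 7-bit major opcode field only.
  std::string Kind() const {
    switch (word_ & 0x7f) {
      case 0x13: return "op-imm";
      case 0x63: return "branch";
      default: return "unknown";
    }
  }

 private:
  uint32_t word_ = 0;
};

class SketchDecoder {
 public:
  explicit SketchDecoder(std::map<uint64_t, uint32_t> memory)
      : memory_(std::move(memory)) {}

  std::unique_ptr<SketchInstruction> DecodeInstruction(uint64_t address) {
    // Step 1: fetch the instruction word (0 if the address is unmapped).
    auto it = memory_.find(address);
    uint32_t word = (it == memory_.end()) ? 0 : it->second;
    // Step 2: parse the word so the encoding object reflects it.
    encoding_.ParseInstruction(word);
    // Step 3: build the instruction by querying the encoding object.
    auto inst = std::make_unique<SketchInstruction>();
    inst->address = address;
    inst->kind = encoding_.Kind();
    return inst;
  }

 private:
  std::map<uint64_t, uint32_t> memory_;
  SketchEncoding encoding_;
};

// Usage: 0x00500093 is addi x1, x0, 5, which has major opcode 0x13.
inline void PipelineExample() {
  std::map<uint64_t, uint32_t> memory = {{0x1000, 0x00500093}};
  SketchDecoder decoder(memory);
  assert(decoder.DecodeInstruction(0x1000)->kind == "op-imm");
}
```

The ordering matters: the encoding object must see the new word before anything asks it for an opcode or operand, otherwise it would answer for the previous instruction.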

If you need help (or want to check your work), the full answer is here.


The encoding class

The encoding class implements an interface that is used by the decoder class to obtain the instruction opcode, its source and destination operands, and resource operands. These objects all depend on information from the binary format decoder, such as the opcode, values of specific fields in the instruction word etc. This is separated from the decoder class to keep it encoding agnostic and enable support for multiple different encoding schemes in the future.

The RiscV32IEncodingBase is an abstract class. The set of methods we have to implement in our derived class is shown below.

class RiscV32IEncodingBase {
 public:
  virtual ~RiscV32IEncodingBase() = default;

  virtual OpcodeEnum GetOpcode(SlotEnum slot, int entry) = 0;

  virtual ResourceOperandInterface *
              GetSimpleResourceOperand(SlotEnum slot, int entry, OpcodeEnum opcode,
                                       SimpleResourceVector &resource_vec, int end) = 0;

  virtual ResourceOperandInterface *
              GetComplexResourceOperand(SlotEnum slot, int entry, OpcodeEnum opcode,
                                        ComplexResourceEnum resource_op,
                                        int begin, int end) = 0;

  virtual PredicateOperandInterface *
              GetPredicate(SlotEnum slot, int entry, OpcodeEnum opcode,
                           PredOpEnum pred_op) = 0;

  virtual SourceOperandInterface *
              GetSource(SlotEnum slot, int entry, OpcodeEnum opcode,
                        SourceOpEnum source_op, int source_no) = 0;

  virtual DestinationOperandInterface *
              GetDestination(SlotEnum slot, int entry, OpcodeEnum opcode,
                             DestOpEnum dest_op, int dest_no, int latency) = 0;

  virtual int GetLatency(SlotEnum slot, int entry, OpcodeEnum opcode,
                         DestOpEnum dest_op, int dest_no) = 0;
};

At first glance it looks a bit complicated, particularly with the number of parameters, but for a simple architecture like RiscV we actually ignore most of the parameters, as their values will be implied.

Let's go through each of the methods in turn.

OpcodeEnum GetOpcode(SlotEnum slot, int entry);

The GetOpcode method returns the OpcodeEnum member for the current instruction, identifying the instruction opcode. The OpcodeEnum class is defined in the generated isa decoder file riscv32i_enums.h. The method takes two parameters, both of which can be ignored for our purposes. The first of these is the slot type (an enum class also defined in riscv32i_enums.h), which, since RiscV only has a single slot, has only one possible value: SlotEnum::kRiscv32. The second is the instance number of the slot (in case there are multiple instances of the slot, which may occur in some VLIW architectures).

ResourceOperandInterface *
    GetSimpleResourceOperand(SlotEnum slot, int entry, OpcodeEnum opcode,
                                     SimpleResourceVector &resource_vec, int end);

ResourceOperandInterface *
    GetComplexResourceOperand(SlotEnum slot, int entry, OpcodeEnum opcode,
                                      ComplexResourceEnum resource_op,
                                      int begin, int end);

The next two methods are used for modeling hardware resources in the processor in order to improve cycle accuracy. For our tutorial exercises, we will not use these, so in the implementation, they will be stubbed out, returning nullptr.

PredicateOperandInterface *
            GetPredicate(SlotEnum slot, int entry, OpcodeEnum opcode,
                         PredOpEnum pred_op);

SourceOperandInterface *
            GetSource(SlotEnum slot, int entry, OpcodeEnum opcode,
                      SourceOpEnum source_op, int source_no);

DestinationOperandInterface *
            GetDestination(SlotEnum slot, int entry, OpcodeEnum opcode,
                           DestOpEnum dest_op, int dest_no, int latency);

These three methods return pointers to operand objects that are used within the instruction semantic functions to access the value of any instruction predicate operand, each of the instruction source operands, and write new values to the instruction destination operands. Since RiscV does not use instruction predicates, that method need only return nullptr.

The pattern of parameters is similar across these functions. First, just like GetOpcode the slot and the entry are passed in. Then the opcode for the instruction for which the operand has to be created. This is only used if the different opcodes need to return different operand objects for the same operand types, which is not the case for this RiscV simulator.

Next is the predicate, source, or destination operand enumeration entry, which identifies the operand that has to be created. These come from the three OpEnums in riscv32i_enums.h, as seen below:

  enum class PredOpEnum {
    kNone = 0,
    kPastMaxValue = 1,
  };

  enum class SourceOpEnum {
    kNone = 0,
    kBimm12 = 1,
    kCsr = 2,
    kImm12 = 3,
    kJimm20 = 4,
    kRs1 = 5,
    kRs2 = 6,
    kSimm12 = 7,
    kUimm20 = 8,
    kUimm5 = 9,
    kPastMaxValue = 10,
  };

  enum class DestOpEnum {
    kNone = 0,
    kCsr = 1,
    kNextPc = 2,
    kRd = 3,
    kPastMaxValue = 4,
  };

If you look back at the riscv32i.isa file, you'll note that these correspond to the sets of source and destination operand names used in the declaration of each instruction. Using different operand names for operands that represent different bitfields and operand types makes writing the encoding class easier: the enum member uniquely determines the exact operand type to return, so it is not necessary to consider the values of the slot, entry, or opcode parameters.

Finally, for source and destination operands, the ordinal position of the operand is passed in (again, we can ignore this), and for the destination operand, the latency (in cycles) that elapses between the time the instruction is issued, and the destination result is available to subsequent instructions. In our simulator, this latency will be 0, meaning that the instruction writes the result out immediately to the register.

int GetLatency(SlotEnum slot, int entry, OpcodeEnum opcode,
                         DestOpEnum dest_op, int dest_no);

The final function is used to get the latency of a particular destination operand if it has been specified as * in the .isa file. This is uncommon, and is not used for this RiscV simulator, so our implementation of this function will just return 0.


Define the encoding class

Header file (.h)

Methods

Open up the file riscv32i_encoding.h. All the necessary include files have already been added and the namespaces have been set up. All code addition is done following the comment // Exercise 2.

Let's start by defining a class RiscV32IEncoding that inherits from the generated interface.

class RiscV32IEncoding : public RiscV32IEncodingBase {
 public:

};

Next, the constructor should take a pointer to the state instance, in this case a pointer to riscv::RiscVState. The default destructor should be used.

explicit RiscV32IEncoding(riscv::RiscVState *state);
~RiscV32IEncoding() override = default;

Before we add in all the interface methods, let's add in the method called by RiscV32Decoder to parse the instruction:

void ParseInstruction(uint32_t inst_word);

Next, let's add in those methods that have trivial overrides while dropping the names of the parameters that are not used:

// Trivial overrides.
ResourceOperandInterface *GetSimpleResourceOperand(SlotEnum, int, OpcodeEnum,
                                                   SimpleResourceVector &,
                                                   int) override {
  return nullptr;
}

ResourceOperandInterface *GetComplexResourceOperand(SlotEnum, int, OpcodeEnum,
                                                    ComplexResourceEnum ,
                                                    int, int) override {
  return nullptr;
}

PredicateOperandInterface *GetPredicate(SlotEnum, int, OpcodeEnum,
                                        PredOpEnum) override {
  return nullptr;
}

int GetLatency(SlotEnum, int, OpcodeEnum, DestOpEnum, int) override { return 0; }

Finally add in the remaining method overrides of the public interface but with the implementations deferred to the .cc file.


OpcodeEnum GetOpcode(SlotEnum, int) override;

SourceOperandInterface *GetSource(SlotEnum , int, OpcodeEnum,
                                  SourceOpEnum source_op, int) override;

DestinationOperandInterface *GetDestination(SlotEnum, int, OpcodeEnum,
                                            DestOpEnum dest_op, int,
                                            int latency) override;

In order to simplify the implementation of each of the operand getter methods we will create two arrays of callables (function objects) indexed by the numeric value of the SourceOpEnum and DestOpEnum members respectively. This way the bodies of these two methods are reduced to calling the function object for the enum value that is passed in and returning its return value.

To organize the initialization of these two arrays we define two private methods that will be called from the constructor as follows:

 private:
  void InitializeSourceOperandGetters();
  void InitializeDestinationOperandGetters();
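The enum-indexed table of callables can be sketched on its own. This example uses std::function rather than absl::AnyInvocable so that it needs only the standard library, and SketchOpEnum and SketchGetters are hypothetical names. The getters return strings instead of operand objects purely to keep the sketch short.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical enum, shaped like the generated operand enums: dense values
// starting at 0, with kPastMaxValue giving the table size.
enum class SketchOpEnum { kNone = 0, kRd = 1, kNextPc = 2, kPastMaxValue = 3 };

class SketchGetters {
 public:
  SketchGetters() {
    // One entry per enum member, each initialized with a lambda.
    getters_[static_cast<int>(SketchOpEnum::kNone)] = [](int) {
      return std::string("none");
    };
    getters_[static_cast<int>(SketchOpEnum::kRd)] = [](int latency) {
      return "rd@" + std::to_string(latency);
    };
    getters_[static_cast<int>(SketchOpEnum::kNextPc)] = [](int latency) {
      return "pc@" + std::to_string(latency);
    };
  }

  // The public getter is a single table lookup followed by a call.
  std::string Get(SketchOpEnum op, int latency) const {
    return getters_[static_cast<int>(op)](latency);
  }

 private:
  std::function<std::string(int)>
      getters_[static_cast<int>(SketchOpEnum::kPastMaxValue)];
};

// Usage: the enum value alone selects the behavior.
inline void GetterTableExample() {
  SketchGetters getters;
  assert(getters.Get(SketchOpEnum::kRd, 0) == "rd@0");
}
```

Compared with a switch statement in each getter, the table keeps the per-operand logic in one initialization routine and makes the lookup itself uniform.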

Data members

The data members required are as follows:

  • state_ to hold the riscv::RiscVState * value.
  • inst_word_ of type uint32_t which holds the value of the current instruction word.
  • opcode_ to hold the opcode of the current instruction that is updated by the ParseInstruction method. This has type OpcodeEnum.
  • source_op_getters_ an array to store the callables used to obtain source operand objects. The type of the array elements is absl::AnyInvocable<SourceOperandInterface *()>.
  • dest_op_getters_ an array to store the callables used to obtain destination operand objects. The type of the array elements is absl::AnyInvocable<DestinationOperandInterface *(int)>.
  • xreg_alias_ an array of RiscV integer register ABI names, e.g., "zero" and "ra" instead of "x0" and "x1".

  riscv::RiscVState *state_;
  uint32_t inst_word_;
  OpcodeEnum opcode_;

  absl::AnyInvocable<SourceOperandInterface *()>
      source_op_getters_[static_cast<int>(SourceOpEnum::kPastMaxValue)];
  absl::AnyInvocable<DestinationOperandInterface *(int)>
      dest_op_getters_[static_cast<int>(DestOpEnum::kPastMaxValue)];

  const std::string xreg_alias_[32] = {
      "zero", "ra", "sp", "gp", "tp",  "t0",  "t1", "t2", "s0", "s1", "a0",
      "a1",   "a2", "a3", "a4", "a5",  "a6",  "a7", "s2", "s3", "s4", "s5",
      "s6",   "s7", "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6"};

If you need help (or want to check your work), the full answer is here.

Source file (.cc).

Open up the file riscv32i_encoding.cc. All the necessary include files have already been added and the namespaces have been set up. All code addition is done following the comment // Exercise 2.

Helper functions

We will start by writing a couple of helper functions that we use to create source and destination register operands. These will be templated on the register type and will call into the RiscVState object to get a handle to the register object, and then call an operand factory method in the register object.

Let's start with the destination operand helpers:

template <typename RegType>
inline DestinationOperandInterface *GetRegisterDestinationOp(
    RiscVState *state, const std::string &name, int latency) {
  auto *reg = state->GetRegister<RegType>(name).first;
  return reg->CreateDestinationOperand(latency);
}

template <typename RegType>
inline DestinationOperandInterface *GetRegisterDestinationOp(
    RiscVState *state, const std::string &name, int latency,
    const std::string &op_name) {
  auto *reg = state->GetRegister<RegType>(name).first;
  return reg->CreateDestinationOperand(latency, op_name);
}

As you can see, there are two helper functions. The second takes an additional parameter op_name that allows the operand to have a different name, or string representation, than the underlying register.

Similarly for the source operand helpers:

template <typename RegType>
inline SourceOperandInterface *GetRegisterSourceOp(RiscVState *state,
                                                   const std::string &reg_name) {
  auto *reg = state->GetRegister<RegType>(reg_name).first;
  auto *op = reg->CreateSourceOperand();
  return op;
}

template <typename RegType>
inline SourceOperandInterface *GetRegisterSourceOp(RiscVState *state,
                                                   const std::string &reg_name,
                                                   const std::string &op_name) {
  auto *reg = state->GetRegister<RegType>(reg_name).first;
  auto *op = reg->CreateSourceOperand(op_name);
  return op;
}

Constructor and interface functions

The constructor and the interface functions are very simple. The constructor just calls the two initialize methods to initialize the callables arrays for the operand getters.

RiscV32IEncoding::RiscV32IEncoding(RiscVState *state) : state_(state) {
  InitializeSourceOperandGetters();
  InitializeDestinationOperandGetters();
}

ParseInstruction stores the instruction word and then the opcode that it obtains from calling into the binary decoder generated code.

// Parse the instruction word to determine the opcode.
void RiscV32IEncoding::ParseInstruction(uint32_t inst_word) {
  inst_word_ = inst_word;
  opcode_ = mpact::sim::codelab::DecodeRiscVInst32(inst_word_);
}

Lastly, each operand getter looks up the getter function in the array using the destination/source operand enum value, calls it, and returns its result.


DestinationOperandInterface *RiscV32IEncoding::GetDestination(
    SlotEnum, int, OpcodeEnum, DestOpEnum dest_op, int, int latency) {
  return dest_op_getters_[static_cast<int>(dest_op)](latency);
}

SourceOperandInterface *RiscV32IEncoding::GetSource(SlotEnum, int, OpcodeEnum,
                                                    SourceOpEnum source_op, int) {
  return source_op_getters_[static_cast<int>(source_op)]();
}

Array initialization methods

As you may have guessed, most of the work is in initializing the getter arrays, but don't worry, it's done using an easy, repeating pattern. Let's start with InitializeDestinationOperandGetters(), since there are only a couple of destination operands.

Recall the generated DestOpEnum class from riscv32i_enums.h:

  enum class DestOpEnum {
    kNone = 0,
    kCsr = 1,
    kNextPc = 2,
    kRd = 3,
    kPastMaxValue = 4,
  };

For dest_op_getters_ we need to initialize 4 entries, one each for kNone, kCsr, kNextPc and kRd. For convenience, each entry is initialized with a lambda, though you could use any other form of callable as well. The signature of the lambda is DestinationOperandInterface *(int latency).

Up to now we haven't talked much about the different kinds of destination operands that are defined in MPACT-Sim. For this exercise we will only use two types: generic::RegisterDestinationOperand defined in register.h, and generic::DevNullOperand defined in devnull_operand.h. The details of these operands aren't really important right now, except that the former is used to write to registers, and the latter ignores all writes.

The first entry for kNone is trivial - just return a nullptr and optionally log an error.

void RiscV32IEncoding::InitializeDestinationOperandGetters() {
  // Destination operand getters.
  dest_op_getters_[static_cast<int>(DestOpEnum::kNone)] = [](int) {
    return nullptr;
  };

Next is kCsr. Here we are going to cheat a little. The "hello world" program doesn't rely on any actual CSR update, but there is some boilerplate code that executes CSR instructions. The solution is to just dummy this up by using a regular register named "CSR" and channel all such writes to it.

  dest_op_getters_[static_cast<int>(DestOpEnum::kCsr)] = [this](int latency) {
    return GetRegisterDestinationOp<RV32Register>(state_, "CSR", latency);
  };

Next is kNextPc, which refers to the "pc" register. It is used as the target for all branch and jump instructions. The name is defined in RiscVState as kPcName.

  dest_op_getters_[static_cast<int>(DestOpEnum::kNextPc)] = [this](int latency) {
    return GetRegisterDestinationOp<RV32Register>(state_, RiscVState::kPcName, latency);
  };

Finally there is the kRd destination operand. In riscv32i.isa the operand rd is only used to refer to the integer register encoded in the "rd" field of the instruction word, so there is no ambiguity as to which register it refers to. There is only one complication. Register x0 (ABI name zero) is hardwired to 0, so for that register we use the DevNullOperand.

So in this getter we first extract the value in the rd field using the Extract method generated from the .bin_fmt file. If the value is 0, we return a "DevNull" operand, otherwise we return the correct register operand, taking care to use the appropriate register alias as the operand name.

  dest_op_getters_[static_cast<int>(DestOpEnum::kRd)] =
      [this](int latency) -> DestinationOperandInterface * {
    // First extract register number from rd field.
    int num = inst32_format::ExtractRd(inst_word_);
    // For register x0, return the DevNull operand.
    if (num == 0) return new DevNullOperand<uint32_t>(state_, {1});
    // Return the proper register operand, using the ABI alias as its name.
    return GetRegisterDestinationOp<RV32Register>(
        state_, absl::StrCat(RiscVState::kXregPrefix, num), latency,
        xreg_alias_[num]);
  };
}
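The treatment of x0 as a write-discarding destination can be sketched without any MPACT-Sim types. SketchDestination, RegisterDestination, DiscardDestination, ExtractRdField and MakeRdOperand are hypothetical names for this example. The one fact taken from the RISC-V base encoding is that the rd field occupies bits [11:7] of a 32-bit instruction word.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-ins for destination operand classes.
class SketchDestination {
 public:
  virtual ~SketchDestination() = default;
  virtual void Write(uint32_t value) = 0;
};

// Writes go through to the backing register.
class RegisterDestination : public SketchDestination {
 public:
  explicit RegisterDestination(uint32_t *reg) : reg_(reg) {}
  void Write(uint32_t value) override { *reg_ = value; }

 private:
  uint32_t *reg_;
};

// Writes are silently ignored, as needed for x0.
class DiscardDestination : public SketchDestination {
 public:
  void Write(uint32_t) override {}
};

// rd occupies bits [11:7] of a 32-bit RISC-V instruction word.
constexpr uint32_t ExtractRdField(uint32_t word) { return (word >> 7) & 0x1f; }

// Picks the operand type from the register number encoded in the word.
inline std::unique_ptr<SketchDestination> MakeRdOperand(uint32_t word,
                                                        uint32_t (&regs)[32]) {
  const uint32_t num = ExtractRdField(word);
  if (num == 0) return std::make_unique<DiscardDestination>();
  return std::make_unique<RegisterDestination>(&regs[num]);
}

// Usage: 0x00500093 is addi x1, x0, 5 and 0x00500013 is addi x0, x0, 5.
inline void RdExample() {
  uint32_t regs[32] = {};
  MakeRdOperand(0x00500093, regs)->Write(5);
  MakeRdOperand(0x00500013, regs)->Write(5);
  assert(regs[1] == 5);
  assert(regs[0] == 0);
}
```

Choosing the operand type at decode time means the semantic function can write its result unconditionally, with no special case for x0.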

Now onto the InitializeSourceOperandGetters() method, where the pattern is much the same, but the details differ slightly.

First let's take a look at the SourceOpEnum that was generated from riscv32i.isa in the first tutorial:

  enum class SourceOpEnum {
    kNone = 0,
    kBimm12 = 1,
    kCsr = 2,
    kImm12 = 3,
    kJimm20 = 4,
    kRs1 = 5,
    kRs2 = 6,
    kSimm12 = 7,
    kUimm20 = 8,
    kUimm5 = 9,
    kPastMaxValue = 10,
  };

Examining the members, in addition to kNone, they fall into two groups. One group is the immediate operands: kBimm12, kImm12, kJimm20, kSimm12, kUimm20, and kUimm5. The other group is the register operands: kCsr, kRs1, and kRs2.

The kNone operand is handled just like for destination operands - return a nullptr.

void RiscV32IEncoding::InitializeSourceOperandGetters() {
  // Source operand getters.
  source_op_getters_[static_cast<int>(SourceOpEnum::kNone)] = [] () {
    return nullptr;
  };

Next, let's work on the register operands. We will handle the kCsr similar to how we handled the corresponding destination operands - just call the helper function using "CSR" as the register name.

  // Register operands.
  source_op_getters_[static_cast<int>(SourceOpEnum::kCsr)] = [this]() {
    return GetRegisterSourceOp<RV32Register>(state_, "CSR");
  };

Operands kRs1 and kRs2 are handled equivalently to kRd, except that while we didn't want to update x0 (or zero), we want to make sure that we always read 0 from that operand. For that we will use the generic::IntLiteralOperand<> class defined in literal_operand.h. This operand is used to store a literal value (as opposed to a simulated immediate value). Otherwise the pattern is the same. First extract the rs1/rs2 value from the instruction word. If it is zero, return the literal operand with a 0 template parameter. Otherwise return a regular register source operand using the helper function, with the ABI alias as the operand name.

  source_op_getters_[static_cast<int>(SourceOpEnum::kRs1)] =
      [this]() -> SourceOperandInterface * {
    int num = inst32_format::ExtractRs1(inst_word_);
    if (num == 0) return new IntLiteralOperand<0>({1}, xreg_alias_[0]);
    return GetRegisterSourceOp<RV32Register>(
        state_, absl::StrCat(RiscVState::kXregPrefix, num), xreg_alias_[num]);
  };
  source_op_getters_[static_cast<int>(SourceOpEnum::kRs2)] =
      [this]() -> SourceOperandInterface * {
    int num = inst32_format::ExtractRs2(inst_word_);
    if (num == 0) return new IntLiteralOperand<0>({1}, xreg_alias_[0]);
    return GetRegisterSourceOp<RV32Register>(
        state_, absl::StrCat(RiscVState::kXregPrefix, num), xreg_alias_[num]);
  };
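The source side of the x0 rule can be sketched the same way. SketchSource, RegisterSource, LiteralZeroSource and MakeSourceOperand are hypothetical names for this example and are not the MPACT-Sim operand classes. The field positions are those of the RISC-V base encoding: rs1 in bits [19:15] and rs2 in bits [24:20].

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-ins for source operand classes.
class SketchSource {
 public:
  virtual ~SketchSource() = default;
  virtual uint32_t Read() const = 0;
};

// Reads the current value of the backing register.
class RegisterSource : public SketchSource {
 public:
  explicit RegisterSource(const uint32_t *reg) : reg_(reg) {}
  uint32_t Read() const override { return *reg_; }

 private:
  const uint32_t *reg_;
};

// Always reads as 0, regardless of any backing storage.
class LiteralZeroSource : public SketchSource {
 public:
  uint32_t Read() const override { return 0; }
};

// rs1 occupies bits [19:15] and rs2 occupies bits [24:20].
constexpr uint32_t ExtractRs1Field(uint32_t word) {
  return (word >> 15) & 0x1f;
}
constexpr uint32_t ExtractRs2Field(uint32_t word) {
  return (word >> 20) & 0x1f;
}

// Picks the operand type from the register number.
inline std::unique_ptr<SketchSource> MakeSourceOperand(
    uint32_t num, const uint32_t (&regs)[32]) {
  if (num == 0) return std::make_unique<LiteralZeroSource>();
  return std::make_unique<RegisterSource>(&regs[num]);
}

// Usage: 0x002081B3 is add x3, x1, x2.
inline void SourceExample() {
  uint32_t regs[32] = {};
  regs[1] = 10;
  regs[2] = 20;
  const uint32_t word = 0x002081B3;
  assert(MakeSourceOperand(ExtractRs1Field(word), regs)->Read() == 10);
  assert(MakeSourceOperand(ExtractRs2Field(word), regs)->Read() == 20);
}
```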

Finally we handle the different immediate operands. Immediate values are stored in instances of the class generic::ImmediateOperand<> defined in immediate_operand.h. The only difference between the different getters for the immediate operands is which Extractor function is used, and whether the storage type is signed or unsigned, according to the bitfield.

  // Immediates.
  source_op_getters_[static_cast<int>(SourceOpEnum::kBimm12)] = [this]() {
    return new ImmediateOperand<int32_t>(
        inst32_format::ExtractBImm(inst_word_));
  };
  source_op_getters_[static_cast<int>(SourceOpEnum::kImm12)] = [this]() {
    return new ImmediateOperand<int32_t>(
        inst32_format::ExtractImm12(inst_word_));
  };
  source_op_getters_[static_cast<int>(SourceOpEnum::kUimm5)] = [this]() {
    return new ImmediateOperand<uint32_t>(
        inst32_format::ExtractUimm5(inst_word_));
  };
  source_op_getters_[static_cast<int>(SourceOpEnum::kJimm20)] = [this]() {
    return new ImmediateOperand<int32_t>(
        inst32_format::ExtractJImm(inst_word_));
  };
  source_op_getters_[static_cast<int>(SourceOpEnum::kSimm12)] = [this]() {
    return new ImmediateOperand<int32_t>(
        inst32_format::ExtractSImm(inst_word_));
  };
  source_op_getters_[static_cast<int>(SourceOpEnum::kUimm20)] = [this]() {
    return new ImmediateOperand<uint32_t>(
        inst32_format::ExtractUimm32(inst_word_));
  };
}
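To show why the storage type (signed or unsigned) matters for each immediate, here is a hand-written sketch of three immediate extractions from the RISC-V base encoding. The function names are hypothetical and are not the generated Extract functions. Whether the generated functions sign-extend on their own depends on the .bin_fmt definitions, so this sketch does the sign extension explicitly.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extends the low `bits` bits of `value` to a 32-bit signed integer.
constexpr int32_t SignExtend(uint32_t value, int bits) {
  const uint32_t sign_bit = 1u << (bits - 1);
  return static_cast<int32_t>((value ^ sign_bit) - sign_bit);
}

// I-type: imm[11:0] is held in bits [31:20]; the value is signed.
constexpr int32_t ExtractITypeImm(uint32_t word) {
  return SignExtend(word >> 20, 12);
}

// B-type: imm[12] is bit 31, imm[10:5] is bits [30:25], imm[4:1] is bits
// [11:8] and imm[11] is bit 7. Bit 0 of the offset is always 0.
constexpr int32_t ExtractBTypeImm(uint32_t word) {
  const uint32_t imm = (((word >> 31) & 0x1) << 12) |
                       (((word >> 7) & 0x1) << 11) |
                       (((word >> 25) & 0x3f) << 5) |
                       (((word >> 8) & 0xf) << 1);
  return SignExtend(imm, 13);
}

// Shift amount (a 5-bit unsigned immediate) is held in bits [24:20].
constexpr uint32_t ExtractShiftAmount(uint32_t word) {
  return (word >> 20) & 0x1f;
}

// Usage with hand-assembled instruction words.
inline void ImmediateExample() {
  assert(ExtractITypeImm(0x00500093) == 5);     // addi x1, x0, 5
  assert(ExtractITypeImm(0xFFF00093) == -1);    // addi x1, x0, -1
  assert(ExtractBTypeImm(0x00000463) == 8);     // beq x0, x0, +8
  assert(ExtractBTypeImm(0xFE000EE3) == -4);    // beq x0, x0, -4
  assert(ExtractShiftAmount(0x00309093) == 3);  // slli x1, x1, 3
}
```

The two addi examples differ only in the top 12 bits of the word, yet one must read as 5 and the other as -1, which is why the signed immediates are stored as int32_t while the shift amount is stored as uint32_t.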

If you need help (or want to check your work), the full answer is here.

This concludes this tutorial. We hope it has been useful.