There is an RTL code generator tool for generating custom-defined instructions;
with it we can define instructions with specific functions to meet a user's requirements.
Each instruction has its own input operands and output operands.
For example, I can define several instructions as follows:
instruction_A's input operands are op0_64, op1_32, op2_16, op3_8;
instruction_B's input operands are op0_32, op1_16, op2_8;
// instruction_A has four input operands.
// An operand such as op5_16 has the name "op5" and a bit width of 16 bits.
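
The naming convention can be illustrated with a small Python sketch. The function name `parse_operand` is hypothetical and not part of the tool; the sketch only assumes that the bit width follows the last underscore in the operand token.

```python
def parse_operand(token):
    """Split a token such as 'op5_16' into (name, bit width).

    Assumption: the width is the decimal number after the last underscore.
    """
    name, _, width = token.rpartition("_")
    if not name or not width.isdigit():
        raise ValueError(f"malformed operand: {token!r}")
    return name, int(width)


print(parse_operand("op5_16"))  # ('op5', 16)
```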
The tool parses all the definitions and then generates the Verilog for a coprocessor engine.
In the coprocessor, the decode stage and the execute stage are two adjacent pipeline stages.
The decode stage must store all decoded input operand data in the execute-stage register on the next cycle;
the execute stage then uses these operands to perform the corresponding operation (the user's function itself does not matter here).
We assume that each stage holds only one instruction at a time.
We don't know in advance how many instructions the user will define or how many operands each instruction has.
How should we design the algorithm that allocates this pipeline register, taking area and timing into account?
For example:
If every instruction's operands get their own register space, insn_A uses 120 bits (64+32+16+8) and insn_B uses 56 bits (32+16+8), so we need 176 bits of D flip-flops in total.
Obviously, this wastes a lot of register space, because the two instructions can never occupy the stage at the same time.
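
The arithmetic in this example can be checked with a short Python sketch. The dictionary layout and the function names are illustrative assumptions, not the tool's data model. The second figure is only a lower bound on the register width if the instructions shared one register bank; it follows from the assumption that a single instruction occupies the stage at a time, and it says nothing about how operands should be mapped onto the shared bits.

```python
# Operand widths copied from the example definitions.
# The dictionary layout is a hypothetical representation.
INSNS = {
    "instruction_A": {"op0": 64, "op1": 32, "op2": 16, "op3": 8},
    "instruction_B": {"op0": 32, "op1": 16, "op2": 8},
}


def dedicated_bits(insns):
    """Flip-flops needed when every instruction gets its own register space."""
    return sum(sum(ops.values()) for ops in insns.values())


def shared_lower_bound_bits(insns):
    """Fewest flip-flops possible when all instructions share one register."""
    return max(sum(ops.values()) for ops in insns.values())


print(dedicated_bits(INSNS))           # 176
print(shared_lower_bound_bits(INSNS))  # 120
```

The shared figure ignores the multiplexers that would be needed in front of the register, which is where the timing cost of sharing shows up.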