Non-Pipelined Version
The non-pipelined version is the simplest version of PicoCPU. The architecture of this implementation is shown in Fig. 1.
System Components
DataPath unit
The datapath unit (DPU) includes an arithmetic logic unit (ALU), one accumulator (ACC), one general-purpose register (Register B), and two multiplexers, along with the flags (see Fig. 2). The DPU command is formed as follows:
ALU Multiplexer
The ALU multiplexer chooses its input according to Table 1.
No. | Command | Output |
---|---|---|
1 | 00 | MemDATA |
2 | 01 | ControlDATA |
3 | 10 | B |
4 | 11 | 1 |
Register File
The schematic of the register file is shown in Fig. 3. The input multiplexer chooses its input according to Table 2. Each register has a small 2-to-1 multiplexer that selects between the incoming value and the register's old value. The control for the 2-to-1 multiplexers is a one-hot code taken from the first 8 bits of the operand (used only when writing to the register file). The control signals for the output registers are binary encoded and are the first three bits of the operand (for any operation involving a register). R0 is the only register that the user can load directly from memory or from the operand, because loading any other register this way would need both a source and a destination address, which the instruction format cannot support. The other registers can be loaded only via ACC.
No. | Command | Output |
---|---|---|
1 | 00 | (others=>'0') |
2 | 01 | ControlDATA |
3 | 10 | ALUResult |
4 | 11 | MemDATA |
No. | Command | Selected register |
---|---|---|
1 | 00000001 | R0 |
2 | 00000010 | R1 |
3 | 00000100 | R2 |
4 | 00001000 | R3 |
5 | 00010000 | R4 |
6 | 00100000 | R5 |
7 | 01000000 | R6 |
8 | 10000000 | R7 |
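The write and read selection described above can be illustrated with a small behavioural model. This is only a Python sketch, not the project's VHDL: the names `input_mux` and `RegisterFile` are hypothetical, and it assumes that the "first three bits" of the operand are its three least significant bits.

```python
BITWIDTH = 8
MASK = (1 << BITWIDTH) - 1


def input_mux(select, control_data, alu_result, mem_data):
    """Input multiplexer (Table 2): pick the value offered to the registers."""
    sources = {
        0b00: 0,             # (others=>'0')
        0b01: control_data,  # ControlDATA
        0b10: alu_result,    # ALUResult
        0b11: mem_data,      # MemDATA
    }
    return sources[select] & MASK


class RegisterFile:
    """Eight registers R0..R7 with one-hot write and binary read select."""

    def __init__(self):
        self.regs = [0] * 8

    def write(self, one_hot, value):
        # Each register's 2-to-1 mux takes the new value when its one-hot
        # bit is set; otherwise it keeps its old value.
        for index in range(8):
            if (one_hot >> index) & 1:
                self.regs[index] = value & MASK

    def read(self, operand):
        # Assumption: the read select is the low three bits of the operand.
        return self.regs[operand & 0b111]
```

For example, `rf.write(0b00000100, input_mux(0b10, 0, 0x5A, 0))` stores the ALU result `0x5A` in R2 and leaves the other registers unchanged.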
ALU
The ALU covers the following operations:
No. | Command | Operation | Description |
---|---|---|---|
1 | 0000 | A + B | Addition |
2 | 0001 | A - B | Subtraction |
3 | 0010 | A | Bypass A |
4 | 0011 | B | Bypass B |
5 | 0100 | A AND B | Bitwise AND |
6 | 0101 | A OR B | Bitwise OR |
7 | 0110 | A XOR B | Bitwise XOR |
8 | 0111 | '0' & A(BITWIDTH-1 DOWNTO 1) | Logical Shift Right |
9 | 1000 | A(BITWIDTH-2 DOWNTO 0) & '0' | Logical Shift Left |
10 | 1001 | NOT(A) + 1 | Negation |
11 | 1010 | A(BITWIDTH-1) & A(BITWIDTH-1 DOWNTO 1) | Arithmetic Shift Right |
12 | 1011 | A(BITWIDTH-1) & A(BITWIDTH-3 DOWNTO 0) & A(0) | Arithmetic Shift Left |
13 | 1100 | NOT(A) | Flip |
14 | 1101 | 0 | Clear A |
15 | 1110 | Cflag & A(BITWIDTH-1 DOWNTO 1) | Rotate Right Through Carry |
16 | 1111 | A(BITWIDTH-2 DOWNTO 0) & Cflag | Rotate Left Through Carry |
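To make the bit-slice expressions in the table easier to check by hand, the following Python sketch evaluates each command on 8-bit values. It is a behavioural model written for illustration only; the function name `alu` and its signature are assumptions, and it returns just the result word, without the flags.

```python
BITWIDTH = 8
MASK = (1 << BITWIDTH) - 1
TOP = BITWIDTH - 1  # index of the most significant bit


def alu(command, a, b, cflag=0):
    """Return the BITWIDTH-bit result of one ALU command from the table."""
    a &= MASK
    b &= MASK
    msb = (a >> TOP) & 1
    operations = {
        0b0000: a + b,                     # Addition
        0b0001: a - b,                     # Subtraction
        0b0010: a,                         # Bypass A
        0b0011: b,                         # Bypass B
        0b0100: a & b,                     # Bitwise AND
        0b0101: a | b,                     # Bitwise OR
        0b0110: a ^ b,                     # Bitwise XOR
        0b0111: a >> 1,                    # Logical shift right
        0b1000: a << 1,                    # Logical shift left
        0b1001: ~a + 1,                    # Negation (two's complement)
        0b1010: (msb << TOP) | (a >> 1),   # Arithmetic shift right
        # Arithmetic shift left exactly as the table gives it:
        # A(MSB) & A(BITWIDTH-3 downto 0) & A(0)
        0b1011: (msb << TOP) | ((a & (MASK >> 2)) << 1) | (a & 1),
        0b1100: ~a,                        # Flip
        0b1101: 0,                         # Clear A
        0b1110: (cflag << TOP) | (a >> 1), # Rotate right through carry
        0b1111: (a << 1) | cflag,          # Rotate left through carry
    }
    return operations[command] & MASK
```

For instance, `alu(0b1010, 0b10010110, 0)` keeps the sign bit and yields `0b11001011`.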
For addition and subtraction, a ripple-carry adder is built from a chain of full adders.
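A ripple-carry chain can be sketched in a few lines of Python. The helper names `full_adder`, `ripple_add`, and `ripple_sub` are hypothetical. Subtraction is modelled here as adding the inverted operand with a carry-in of 1, which is the usual two's-complement approach but is an assumption about this design.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    total = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return total, cout


def ripple_add(a, b, cin=0, width=8):
    """Chain `width` full adders; the carry ripples from bit 0 upward."""
    result = 0
    carry = cin
    for i in range(width):
        bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= bit << i
    return result, carry


def ripple_sub(a, b, width=8):
    """Assumed scheme: A - B computed as A + NOT(B) + 1 on the same chain."""
    mask = (1 << width) - 1
    return ripple_add(a, ~b & mask, cin=1, width=width)
```

With 8-bit operands, `ripple_add(200, 100)` returns `(44, 1)`: the sum wraps around and the final carry-out is 1.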
Flags
No. | Command | FlagToClear |
---|---|---|
1 | 001 | Clear Z |
2 | 010 | Clear OV |
3 | 100 | Clear C |
The DPU has the following flags:
- Zero Flag (Z): set if the result of the operation is zero.
- Overflow Flag (OV): set if an overflow happens in a signed operation (for example, in an 8-bit addition of 82 + 91 the expected answer is 173, but the two's-complement result is interpreted as -83). The overflow flag can be realized in the following way:
- Carry Flag (C): set if an unsigned addition or subtraction results in a carry.
- Equal Flag (EQ): set if the ACC value is equal to the operand.
To clear flags, the SetFlag commands are used in the DPU command (see Table 5).
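The flag behaviour for an 8-bit addition can be sketched in Python as follows. The overflow test used here (both operands differ in sign from the result) is one common way to detect signed overflow and is an assumption, not necessarily how the DPU realizes it. The names `add_with_flags` and `clear_flags` are hypothetical, and combining several bits in one clear command is also an assumption, since Table 5 lists only single-bit commands.

```python
BITWIDTH = 8
MASK = (1 << BITWIDTH) - 1
SIGN = 1 << (BITWIDTH - 1)


def add_with_flags(a, b):
    """Add two BITWIDTH-bit values and derive the Z, C and OV flags."""
    a &= MASK
    b &= MASK
    total = a + b
    result = total & MASK
    flags = {
        "Z": int(result == 0),
        "C": int(total > MASK),
        # Signed overflow: both operands have a sign different from the result.
        "OV": int(((a ^ result) & (b ^ result) & SIGN) != 0),
    }
    return result, flags


def clear_flags(flags, flag_to_clear):
    """Apply a FlagToClear command: 001 clears Z, 010 clears OV, 100 clears C."""
    cleared = dict(flags)
    for bit, name in ((0b001, "Z"), (0b010, "OV"), (0b100, "C")):
        if flag_to_clear & bit:
            cleared[name] = 0
    return cleared
```

For the 82 + 91 example, `add_with_flags(82, 91)` returns 173 with OV set and both Z and C clear; `clear_flags(flags, 0b010)` then resets OV.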
Instruction Memory (ROM)
Instruction memory is a read-only memory that the user fills at the beginning.
Data Memory
Data memory is made of blocks of 1024 registers. If the user wants a larger memory, more blocks must be added. Writing to data memory takes one clock cycle, but reading can be done instantly (or in a relatively shorter time), so we can assume that if an address is issued in one clock cycle, the data is available in the same clock cycle. A stack sits at the top of data memory, and its size is not restricted. A behavioural VHDL description of one instance of data memory is shown in the code below.
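Independently of the VHDL description, the read and write timing can be illustrated with a short Python model. This is a sketch, not the project's code: the class name `DataMemory` and its methods are hypothetical, and the stack is left out because its exact addressing is not specified here.

```python
class DataMemory:
    """One 1024-word block: reads are immediate, writes land on the clock."""

    SIZE = 1024

    def __init__(self, width=8):
        self.mask = (1 << width) - 1
        self.cells = [0] * self.SIZE
        self._pending = None  # write waiting for the next clock edge

    def read(self, address):
        # Combinational read: data is available in the same cycle.
        return self.cells[address]

    def write(self, address, data):
        # Registered write: only scheduled here, committed in clock().
        self._pending = (address, data & self.mask)

    def clock(self):
        # Rising clock edge: commit the scheduled write, if any.
        if self._pending is not None:
            address, data = self._pending
            self.cells[address] = data
            self._pending = None
```

After `mem.write(16, 0xAB)`, a read of address 16 still returns the old value until `mem.clock()` is called.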
Control unit
Control unit has four states:
- Fetch: fetches the instruction from instruction memory and loads it into the Instruction Register (IR). The DPU is IDLE. No read from data memory.
- Decode: decodes the information in the IR. The DPU is IDLE. No read from data memory.
- Execute: if execution on the DPU is needed, the proper control signals are provided; otherwise the DPU stays IDLE. A read from data memory is performed if needed.
- WriteBack: if data needs to be written to memory, it happens in this state. All changes to the Program Counter (PC) happen here, so all conditional and unconditional branches are decided in this state. If the instruction is HALT, the PC is frozen.
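The four-state cycle above can be sketched as a small state machine in Python. This is an illustrative model only: the `step` function and its parameters are hypothetical, and it assumes that the PC advances by one for non-branching instructions and that the state sequence keeps cycling after a HALT while the PC stays frozen.

```python
from enum import Enum


class State(Enum):
    FETCH = 0
    DECODE = 1
    EXECUTE = 2
    WRITEBACK = 3


NEXT_STATE = {
    State.FETCH: State.DECODE,
    State.DECODE: State.EXECUTE,
    State.EXECUTE: State.WRITEBACK,
    State.WRITEBACK: State.FETCH,
}


def step(state, pc, halt=False, branch_taken=False, target=0):
    """Advance one clock cycle and return (next_state, next_pc)."""
    next_pc = pc
    if state is State.WRITEBACK:
        # The PC changes only in WriteBack.
        if halt:
            next_pc = pc          # HALT freezes the PC
        elif branch_taken:
            next_pc = target      # conditional or unconditional branch
        else:
            next_pc = pc + 1      # assumption: sequential increment by one
    return NEXT_STATE[state], next_pc
```

Each instruction therefore takes four calls to `step`, and the PC is updated only on the last of them.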
VHDL complete versions
- 8-bit Version (17 Dec 2014)
Functional Testing
The following machine code program was written to test the functionality of the instructions. The test program does not cover all cases but is intended to run through all the instructions. (At the moment not all instructions are covered; 12% are missing.)
Load_R0_Dir "00011000"
OR_A_R "00000000"
IncA
Sub_A_R "00000000"
NOP
JmpC "00001000"
NOP
NOP
RRC
RLC
NOP
ClearC
Store_A_Mem "00010000"
PUSH
SavePC
PUSH
Jump "00010101"
POP
ShiftArithL
DecA
HALT
Load_A_Mem "00010000"
And_A_R "00000000"
JmpZ "00011001"
NOP
ClearZ
Add_A_Mem "00010000"
Sub_A_Mem "00010000"
Add_A_R "00000000"
Sub_A_Dir "00001100"
FlipA
XOR_A_R "00000000"
NegA
ShiftArithR
ShiftA_L
ShiftA_R
ClearACC
POP
Add_A_Dir "00000011"
LoadPC
HALT
Synthesizing and implementation on FPGA
One part of this project is to synthesize the 8-bit version of the CPU on an FPGA board. The "Nexys 3" board from Digilent was chosen for the implementation. The clock source is selectable via a switch on the board: the clock is either the 100 MHz on-board oscillator or a signal generated from a push button (for debugging). The accumulator value is displayed on the seven-segment display, and the flag values are displayed on the LEDs.
You need the following files, along with the CPU VHDL files, for synthesis and implementation on the FPGA:
- Top-level Entity: TopLevel.vhd
- Debouncing circuit: Debouncer.vhd
- Seven-segment Decoder: SevenSegment.vhd
Important note: Make sure you set the "FPGA Start-Up Clock" to "JTAG Clock".
The User Constraint File (UCF)
The following is the user constraints file for the project (for Nexys-3 boards):
NET clk LOC = V10;
NET rst LOC = C4;
NET ClkBttn LOC = D9;
NET ClockSrc LOC = T10;
NET FlagOutput<0> LOC = U16;
NET FlagOutput<1> LOC = V16;
NET FlagOutput<2> LOC = U15;
NET FlagOutput<3> LOC = V15;
NET SevenSeg<6> LOC = T17;
NET SevenSeg<5> LOC = T18;
NET SevenSeg<4> LOC = U17;
NET SevenSeg<3> LOC = U18;
NET SevenSeg<2> LOC = M14;
NET SevenSeg<1> LOC = N14;
NET SevenSeg<0> LOC = L14;
NET AN<3> LOC = P17;
NET AN<2> LOC = P18;
NET AN<1> LOC = N15;
NET AN<0> LOC = N16;