PicoAssembler user manual and syntax reference
Using the program
The standard command line to run the program is:
java -jar picoAssembler -a <path to assembler file> -i <path to instruction set file> -o <path to output file>
- Assembler file is the file containing the assembler code to be assembled.
- Instruction set file is the file that maps instructions to their opcodes (defaults to intructions.conf).
- Output file is the file where the machine code is saved (defaults to machinecode.bin).
Only the assembler file has to be specified; the other two have default values that are used when they are not given on the command line. The -no flag makes the program display the output on the screen instead of writing it to a file.
For more info on different command line options, use the --help flag.
Instruction set file syntax
The instruction set file is a text file that maps the higher level instruction names to the opcodes. It is used for configuring the assembler. The instructions are defined as follows:
instruction_name : opcode : is_operand_signed
- instruction_name: The mnemonic name of the instruction, like ADD, SUB, etc.
- opcode: The opcode that the mnemonic maps to.
- is_operand_signed: Specifies whether the operand of this instruction is signed (1) or not (0). This depends on the CPU architecture the assembler is used with, but in general the operands of instructions that manipulate data (addition, subtraction, etc.) should be treated as signed. If the operand represents, for example, a memory address or a program counter value, it should be treated as unsigned.
Every instruction has to be defined on a new line.
Comments start with the # symbol.
Example:
ADD : 00000 : 1
SUB : 00001 : 1
JUMP : 00010 : 0
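The file format above can be read with a few lines of code. The following is a minimal Python sketch, not the picoAssembler's own parser (which is written in Java). The function name parse_instruction_set is hypothetical, and the sketch assumes that opcodes are written as binary digits, as in the example, and that a # comment may also follow a definition on the same line.

```python
def parse_instruction_set(text):
    """Parse 'name : opcode : is_operand_signed' lines into a dict.

    Returns {mnemonic: (opcode_bits, operand_is_signed)}.
    Assumption: opcodes are strings of binary digits, as in the manual's example.
    """
    instructions = {}
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        # Everything after '#' is a comment in the instruction set file.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(":")]
        if len(fields) != 3:
            raise ValueError(f"line {line_number}: expected 3 fields, got {len(fields)}")
        name, opcode, signed = fields
        if not opcode or set(opcode) - {"0", "1"}:
            raise ValueError(f"line {line_number}: opcode must be binary digits")
        if signed not in ("0", "1"):
            raise ValueError(f"line {line_number}: is_operand_signed must be 0 or 1")
        instructions[name] = (opcode, signed == "1")
    return instructions


example = """\
# arithmetic and branching
ADD : 00000 : 1
SUB : 00001 : 1
JUMP : 00010 : 0
"""
table = parse_instruction_set(example)
print(table["JUMP"])  # ('00010', False)
```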
Leading zeros are added automatically, so you do not have to include them in your assembly code. Negative numbers are also supported and are automatically encoded in two's complement:
Decimal: -5 -> 11111011
Hexadecimal: -0x5 -> 11111011
Binary: -0b101 -> 11111011
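The conversion shown above is ordinary two's-complement encoding. The Python sketch below reproduces it under stated assumptions: the function name encode_operand is hypothetical, the 8-bit width is inferred from the examples, and the literal parsing is Python's (int with base 0), which may accept or reject forms that picoAssembler treats differently. The sketch also ignores the per-instruction is_operand_signed setting and accepts any value that fits in 8 bits either way.

```python
def encode_operand(text, width=8):
    """Return the operand as a zero-padded binary string of `width` bits.

    Negative values are encoded in two's complement.
    """
    # Base 0 makes int() honour the 0x / 0b prefixes and a leading minus sign.
    value = int(text, 0)
    if not -(1 << (width - 1)) <= value < (1 << width):
        raise ValueError(f"{text} does not fit in {width} bits")
    # Masking with 2**width - 1 maps a negative value to its two's complement.
    return format(value & ((1 << width) - 1), f"0{width}b")


for literal in ("5", "0x5", "0b101", "-5", "-0x5", "-0b101"):
    print(literal, "->", encode_operand(literal))
# The three positive forms print 00000101, the three negative forms 11111011.
```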
Assembler syntax
Operations
Each operation consists of an opcode and an operand, separated by whitespace:
OPCODE <operand>
Specifying the operand is optional; if it is omitted, the assembler automatically uses 0x00 as the operand. The operand can be written in decimal, hexadecimal or binary format:
Decimal: 5
Hexadecimal: 0x5
Binary: 0b101
Comments
Comments start with the ' ; ' symbol:
; An awesome example comment
Multi-line comments are not supported.
Labels
GOTO style labels are also supported.
Declaration
To declare a label, put it on a separate line and end it with the ' : ' symbol:
AnExampleLabel:
Usage
To branch to a label, use it as the operand of your branching operation. Note that when a label is used as an operand, the ' : ' symbol has to be at the beginning of the label:
GOTO :AnExampleLabel
This gets replaced by the program memory address of the operation that follows the label.
Example: The following program first loads the number 5 into the accumulator, then decrements the accumulator by one, and then checks the zero flag (the BRNZ instruction). If the flag is not set (the value in the accumulator is not zero), the code branches to the address or label given in the operand, which here is the address defined by the Decrement: label. If the zero flag is set (the value is zero), no branch is taken and the CPU halts. The comments show the program memory address of every instruction.
LOADN 5 ;0x0
Decrement:
DEC ;0x1
BRNZ :Decrement ;0x2
HALT ;0x3
This gets resolved by the assembler into the following:
LOADN 5 ;0x0
DEC ;0x1
BRNZ 0x1 ;0x2
HALT ;0x3
A label can be declared only once in the code (it cannot be re-declared), but there is no limit on how many times it can be used for branching. Branching to a label before it is declared (that is, from earlier in the code) is allowed.
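The label rules above (declare once, use any number of times, forward references allowed) are what a two-pass resolver provides. The following Python sketch illustrates the idea only; it is not taken from the picoAssembler source, the function name resolve_labels is hypothetical, and it ignores constants and instruction-set lookup.

```python
def resolve_labels(lines):
    """Replace ':Label' operands with the address of the operation after the label."""
    labels = {}
    operations = []
    # Pass 1: record the address each label points to and drop the label lines.
    for raw_line in lines:
        line = raw_line.split(";", 1)[0].strip()  # strip ';' comments
        if not line:
            continue
        if line.endswith(":"):
            name = line[:-1]
            if name in labels:
                raise ValueError(f"label {name!r} declared twice")
            # The label's address is that of the next operation to be stored.
            labels[name] = len(operations)
        else:
            operations.append(line.split())
    # Pass 2: all labels are known now, so forward references resolve too.
    resolved = []
    for tokens in operations:
        if len(tokens) > 1 and tokens[1].startswith(":"):
            name = tokens[1][1:]
            if name not in labels:
                raise ValueError(f"unknown label {name!r}")
            tokens = [tokens[0], hex(labels[name])]
        resolved.append(" ".join(tokens))
    return resolved


program = [
    "LOADN 5 ;0x0",
    "Decrement:",
    "DEC ;0x1",
    "BRNZ :Decrement ;0x2",
    "HALT ;0x3",
]
print(resolve_labels(program))  # ['LOADN 5', 'DEC', 'BRNZ 0x1', 'HALT']
```

Because addresses are collected before any operand is replaced, a branch such as GOTO :End placed before the End: line resolves the same way as a backward branch.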
Constants
The picoAssembler also supports constants. You can think of a constant as an alias for a data memory address, which lets you give the address a meaningful name. For example, you can access the memory address for your temporary data as $tmp instead of 0b10101101. Constant names begin with the ' $ ' symbol.
Declaration
The line for declaring a constant is the following:
$TestConst 52
Usage
To use a constant, pass it to an instruction as an operand:
LOADV $TestConst
It is important to understand that constants do not work as variables: a constant does not give you the value stored at the memory address it points to. For that, you have to use the CPU's native instructions for loading and storing data. Constants work by reference, similar to pointers in C: they return the memory address they were assigned. They are basically just aliases for memory addresses.
Also, because the picoAssembler does not analyze the program's control flow, constants, like labels, can be declared only once. Afterwards, the value at the address the constant points to can be changed using the loading / storing instructions of the CPU, but the address itself cannot. So although a constant can be used anywhere in the code (including before it is declared, as declarations are handled by the pre-processor), you will probably want to declare it at the beginning of the code. Let's call it a good coding practice for picoAssembler.
Example:
#Testing memory operations
$NumMem 0x4
$Index 0x0
LOADN 5 ;Load number 5 to AC
STORE $NumMem ;Store 5 to $NumMem
LOADN 0 ;Zero the AC
LOADV 0x4 ;Load the value of 0x4 ($NumMem) (5)
LOADN 4 ;Load 4 to AC
STORE 0x0 ;Store 4 in 0x0 ($Index)
LOADN 0 ;Zero the AC
LOADV 0x0 ;Load memory address from 0x0 ($Index) to AC
LOADA ;Load data from address stored on AC (5)
HALT ;Halt the CPU
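Since a constant is only an alias for an address, resolving it amounts to text substitution before assembly. The Python sketch below shows one way a pre-processing step of this kind could work. It is an illustration under assumptions, not the picoAssembler's actual pre-processor: the function name resolve_constants is hypothetical, and lines starting with # are not handled.

```python
def resolve_constants(lines):
    """Replace '$Name' operands with the address given in the '$Name value' declaration."""
    constants = {}
    body = []
    # Pass 1: collect declarations and remove them from the program.
    for raw_line in lines:
        tokens = raw_line.split(";", 1)[0].split()  # strip ';' comments
        if not tokens:
            continue
        if tokens[0].startswith("$"):
            if len(tokens) != 2:
                raise ValueError(f"malformed declaration: {raw_line!r}")
            name, address = tokens
            if name in constants:
                raise ValueError(f"constant {name} declared twice")
            constants[name] = address
        else:
            body.append(tokens)
    # Pass 2: substitute the address text; no memory value is ever read here.
    resolved = []
    for tokens in body:
        if len(tokens) > 1 and tokens[1].startswith("$"):
            if tokens[1] not in constants:
                raise ValueError(f"unknown constant {tokens[1]}")
            tokens = [tokens[0], constants[tokens[1]]]
        resolved.append(" ".join(tokens))
    return resolved


# The constant is used before its declaration, which the two passes allow.
print(resolve_constants(["LOADV $TestConst", "$TestConst 52", "HALT"]))
# ['LOADV 52', 'HALT']
```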
Output format
The assembler outputs the machine code as 16-bit binary vectors. Each line represents one memory address in the program memory, first line = 0x0, second = 0x1 etc. All the vectors are in the following format:
(the unused part is represented as zeroes)
The size of the opcode can vary, depending on the CPU architecture. Shorter opcodes are also possible. For example, for 5-bit opcodes there would be 3 unused bits followed by 5 bits of opcode.
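As an illustration of the padding described above, the Python sketch below builds one 16-bit output line. The layout it uses is an assumption inferred from this manual (an 8-bit opcode field padded with leading zeros, followed by an 8-bit operand); the actual field order and widths should be checked against the picoAssembler output. The function name encode_word is hypothetical.

```python
def encode_word(opcode_bits, operand=0, opcode_field=8, operand_field=8):
    """Build one output line: zero-padded opcode field followed by the operand.

    Assumption: the opcode field comes first and both fields are 8 bits wide.
    """
    if not opcode_bits or set(opcode_bits) - {"0", "1"}:
        raise ValueError("opcode must be a non-empty string of binary digits")
    if len(opcode_bits) > opcode_field:
        raise ValueError("opcode is wider than the opcode field")
    if not -(1 << (operand_field - 1)) <= operand < (1 << operand_field):
        raise ValueError("operand does not fit in the operand field")
    # Unused high bits of the opcode field are filled with zeroes.
    padded_opcode = opcode_bits.rjust(opcode_field, "0")
    # Masking encodes negative operands in two's complement.
    operand_bits = format(operand & ((1 << operand_field) - 1), f"0{operand_field}b")
    return padded_opcode + operand_bits


print(encode_word("00001", -5))  # 0000000111111011
print(encode_word("00010", 1))   # 0000001000000001
print(encode_word("00000"))      # operand defaults to 0: 0000000000000000
```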
Download
PicoASM can be downloaded from its Git repository.
Note that, at this time, the repository is shared with the picoCPU and picoLang projects. PicoASM resides in the picoASM folder.
PicoASM is written in Java and uses ANTLR 4.5.