PicoAssembler user manual and syntax reference

From ATI public wiki

Using the program

The standard command line to run the program is:

java -jar picoAssembler -a <path to assembler file> -i <path to instruction set file> -o <path to output file>
  • Assembler file: the file containing the assembly code to be assembled.
  • Instruction set file: the file that maps instructions to their opcodes (defaults to intructions.conf).
  • Output file: the file where the machine code is saved (defaults to machinecode.bin).

Only the assembler file has to be specified. The other two have default values that are used when they are not given on the command line. It is also possible to make the program skip writing the output to a file and only display it on the screen by using the -no flag.

For more info on different command line options, use the --help flag.
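The defaulting rules above can be illustrated with a small Java sketch (Java being the language PicoASM is written in). The class CliOptions and its fields are hypothetical; this is not picoAssembler's actual argument parser. It only shows -a being required while -i and -o fall back to the defaults given in this manual, and -no switching off file output. The --help flag is not modelled.

```java
// Hypothetical sketch of the documented option defaults; NOT the real
// picoAssembler argument parser. --help is intentionally not handled.
class CliOptions {
    String assemblerFile;                          // -a (required)
    String instructionFile = "intructions.conf";   // -i (default as given in this manual)
    String outputFile = "machinecode.bin";         // -o (default as given in this manual)
    boolean writeOutput = true;                    // cleared by -no

    static CliOptions parse(String[] args) {
        CliOptions opts = new CliOptions();
        for (int i = 0; i < args.length; i++) {
            String flag = args[i];
            if (flag.equals("-no")) {
                opts.writeOutput = false;          // only print the machine code
                continue;
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + flag);
            }
            String value = args[++i];
            switch (flag) {
                case "-a": opts.assemblerFile = value; break;
                case "-i": opts.instructionFile = value; break;
                case "-o": opts.outputFile = value; break;
                default: throw new IllegalArgumentException("Unknown option: " + flag);
            }
        }
        if (opts.assemblerFile == null) {
            throw new IllegalArgumentException("An assembler file must be given with -a");
        }
        return opts;
    }
}
```

For example, parsing just "-a prog.asm" leaves the instruction set and output files at their defaults.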

Instruction set file syntax

The instruction set file is a text file that maps instruction mnemonics to their opcodes. It is used to configure the assembler. Each instruction is defined as follows:

instruction_name : opcode : is_operand_signed
  • instruction_name: the mnemonic name of the instruction, such as ADD, SUB, etc.
  • opcode: the opcode that the mnemonic maps to.
  • is_operand_signed: specifies whether the operand of this instruction is signed (1) or unsigned (0). This depends on the CPU architecture the assembler is used with, but in general, instructions that manipulate data, such as addition and subtraction, should be treated as signed. On the other hand, if the operand represents, for example, a memory address or a program counter value, it should be treated as unsigned.

Every instruction has to be defined on a new line.

Comments start with the # symbol.

Example:

ADD  : 00000 : 1
SUB  : 00001 : 1
JUMP : 00010 : 0
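The file format above is simple enough to read line by line. The following Java sketch is hypothetical (the class InstructionSet and its Entry type are illustrative names, not PicoASM's code). It strips # comments, skips blank lines, splits on the colons and keeps the opcode as the binary string written in the file.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for "instruction_name : opcode : is_operand_signed"
// lines; not taken from the picoAssembler source.
class InstructionSet {
    // One entry; the opcode is kept as the binary string from the file.
    static final class Entry {
        final String opcode;
        final boolean signedOperand;

        Entry(String opcode, boolean signedOperand) {
            this.opcode = opcode;
            this.signedOperand = signedOperand;
        }
    }

    static Map<String, Entry> parse(List<String> lines) {
        Map<String, Entry> table = new LinkedHashMap<>();
        for (String raw : lines) {
            int hash = raw.indexOf('#');                 // '#' starts a comment
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue;                                // blank or comment-only line
            }
            String[] parts = line.split("\\s*:\\s*");    // colons with optional spaces
            if (parts.length != 3 || !(parts[2].equals("0") || parts[2].equals("1"))) {
                throw new IllegalArgumentException("Malformed line: " + raw);
            }
            table.put(parts[0], new Entry(parts[1], parts[2].equals("1")));
        }
        return table;
    }
}
```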

Leading zeros are added automatically, so you do not have to include them in your assembly code. Negative numbers are also supported and are automatically encoded in two's complement:

Decimal:        -5      ->  11111011
Hexadecimal:    -0x5    ->  11111011
Binary:         -0b101  ->  11111011
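The two's complement encoding in these examples can be reproduced with a short Java sketch. It assumes an 8-bit operand field, matching the 8-bit results above. The class name OperandEncoder and the range checks (-128 to 127 for signed operands, 0 to 255 for unsigned ones, driven by the is_operand_signed setting) are illustrative assumptions, not documented picoAssembler behaviour.

```java
// Hypothetical sketch: encodes an operand as an 8-bit binary string.
// ASSUMPTION: 8-bit operand field and illustrative range checks; the real
// assembler's handling of out-of-range values may differ.
class OperandEncoder {
    static final int BITS = 8;

    static String encode(int value, boolean signed) {
        int min = signed ? -(1 << (BITS - 1)) : 0;                   // -128 or 0
        int max = signed ? (1 << (BITS - 1)) - 1 : (1 << BITS) - 1;  // 127 or 255
        if (value < min || value > max) {
            throw new IllegalArgumentException(value + " does not fit in the operand field");
        }
        int field = value & ((1 << BITS) - 1);   // keeps the low 8 bits: two's complement
        String bin = Integer.toBinaryString(field);
        return "0".repeat(BITS - bin.length()) + bin;  // add the leading zeros
    }
}
```

With this sketch, encode(-5, true) yields 11111011 and encode(5, false) yields 00000101.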

Assembler syntax

Operations

Each operation consists of an opcode mnemonic and an operand, separated by whitespace:

OPCODE <operand>

Specifying the operand is optional. If it is omitted, the assembler automatically uses 0x00 as the operand. The operand can be written in decimal, hexadecimal or binary format:

 Decimal:      5
 Hexadecimal:  0x5
 Binary:       0b101
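Reading the three notations, plus the optional minus sign, can be sketched in Java as follows. The class OperandLiteral is hypothetical, and only the lowercase 0x and 0b prefixes shown in this manual are accepted. An empty operand yields 0, mirroring the 0x00 default.

```java
// Hypothetical sketch: parses an operand written in decimal (5),
// hexadecimal (0x5) or binary (0b101), with an optional leading minus sign.
// An omitted operand yields 0, mirroring the documented 0x00 default.
class OperandLiteral {
    static int parse(String text) {
        if (text == null || text.isBlank()) {
            return 0;                                       // operand omitted
        }
        String s = text.trim();
        boolean negative = s.startsWith("-");
        if (negative) {
            s = s.substring(1);
        }
        int value;
        if (s.startsWith("0x")) {
            value = Integer.parseInt(s.substring(2), 16);   // hexadecimal
        } else if (s.startsWith("0b")) {
            value = Integer.parseInt(s.substring(2), 2);    // binary
        } else {
            value = Integer.parseInt(s, 10);                // decimal
        }
        return negative ? -value : value;
    }
}
```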

Comments

Comments start with the ' ; ' symbol and continue to the end of the line:

; An awesome example comment

Multi-line comments are not supported.

Labels

GOTO-style labels are also supported.

Declaration

To declare a label, put it on a separate line and end it with the ' : ' symbol:

AnExampleLabel:

Usage

To branch to a label, pass it as the operand of your branching operation. Note that when a label is used as an operand, the ' : ' symbol has to be at the beginning of the label:

GOTO  :AnExampleLabel

The assembler replaces it with the program memory address of the operation that follows the label.

Example: The following program first loads the number 5 into the accumulator and then decrements the accumulator by one. The BRNZ instruction then checks the zero flag: if the flag is not set (the value in the accumulator is not zero), execution branches to the address or label given as the operand, which in this case is the address defined by the Decrement: label. If the zero flag is set (the value is zero), no branch is taken and the CPU halts. The comments show the program memory address of each instruction.

LOADN 5             ;0x0

Decrement:  
DEC                 ;0x1
BRNZ  :Decrement    ;0x2
HALT                ;0x3

This gets resolved by the assembler into the following:

LOADN 5             ;0x0
DEC                 ;0x1
BRNZ  0x1           ;0x2
HALT                ;0x3

A label can be declared only once in the code (it cannot be redeclared), but there is no limit on how many times it can be used as a branch target. It is also allowed to branch to a label before it is declared (from earlier in the code).
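Because a label may be used before it is declared, one common way to resolve labels is a two-pass scheme: first record the address of every label, then substitute the operands. The Java sketch below is hypothetical (LabelResolver is an illustrative name, not PicoASM's implementation). It assumes, as in the example above, that ; starts a comment and that label lines occupy no program memory. It handles labels only and writes resolved addresses in hexadecimal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-pass label resolver; not taken from the picoAssembler source.
class LabelResolver {
    static List<String> resolve(List<String> source) {
        // Strip ';' comments and drop empty lines.
        List<String> lines = new ArrayList<>();
        for (String raw : source) {
            int semi = raw.indexOf(';');
            String line = (semi >= 0 ? raw.substring(0, semi) : raw).trim();
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }

        // Pass 1: a label takes the address of the next operation;
        // label lines themselves use no program memory.
        Map<String, Integer> labels = new HashMap<>();
        int address = 0;
        for (String line : lines) {
            if (line.endsWith(":")) {
                String name = line.substring(0, line.length() - 1);
                if (labels.putIfAbsent(name, address) != null) {
                    throw new IllegalArgumentException("Label redeclared: " + name);
                }
            } else {
                address++;
            }
        }

        // Pass 2: replace ":Name" operands. Forward references work
        // because every label is known after pass 1.
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (line.endsWith(":")) {
                continue;
            }
            String[] parts = line.split("\\s+", 2);
            if (parts.length == 2 && parts[1].startsWith(":")) {
                String name = parts[1].substring(1);
                Integer target = labels.get(name);
                if (target == null) {
                    throw new IllegalArgumentException("Unknown label: " + name);
                }
                out.add(parts[0] + " 0x" + Integer.toHexString(target));
            } else {
                out.add(line);
            }
        }
        return out;
    }
}
```

Applied to the Decrement example, this sketch produces LOADN 5, DEC, BRNZ 0x1 and HALT, matching the resolved listing apart from whitespace.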

Constants

The picoAssembler also supports constants. You can think of a constant as an alias for a data memory address, which lets you refer to that address by a meaningful name. For example, you can access your memory address for temporary data as $tmp instead of 0b10101101. Constant names begin with the ' $ ' symbol.

Declaration

A constant is declared on its own line, with the constant name followed by the address it stands for:

$TestConst 52

Usage

To use a constant, pass it to an instruction as its operand:

LOADV $TestConst

It is important to understand that constants do not work as variables: a constant does not give you the value stored at the memory address it points to. To read or write that value, you have to use the CPU's native load and store instructions. Constants work by reference, similar to pointers in C: they evaluate to the memory address they are assigned, so they are essentially aliases for memory addresses.

Because the assembler substitutes constants before the program runs and does not follow the program's control flow, constants, like labels, can be declared only once. Afterwards, the value stored at the address a constant points to can be changed with the CPU's load and store instructions, but the address itself cannot. Although a constant can be used anywhere in the code (even before it is declared, since declarations are handled by the pre-processor), it is good practice in picoAssembler to declare all constants at the beginning of the code.
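The pre-processing step described above can be sketched in Java as two passes: collect every $Name declaration, then replace $Name operands with the declared address text. The class ConstantPreprocessor is hypothetical and not PicoASM's implementation. It assumes ; comments, one declaration per line, and that a constant only appears as an instruction's operand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-processor sketch (not picoAssembler's code): replaces
// "$Name" operands with the declared address, never with the value stored there.
class ConstantPreprocessor {
    static List<String> expand(List<String> source) {
        Map<String, String> constants = new HashMap<>();
        List<String> code = new ArrayList<>();
        // Pass 1: strip ';' comments and collect every declaration first,
        // so a constant can be used before the line that declares it.
        for (String raw : source) {
            int semi = raw.indexOf(';');
            String line = (semi >= 0 ? raw.substring(0, semi) : raw).trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("$")) {
                String[] parts = line.split("\\s+");
                if (parts.length != 2) {
                    throw new IllegalArgumentException("Bad declaration: " + raw);
                }
                if (constants.putIfAbsent(parts[0], parts[1]) != null) {
                    throw new IllegalArgumentException("Constant redeclared: " + parts[0]);
                }
            } else {
                code.add(line);
            }
        }
        // Pass 2: substitute operands that name a constant.
        List<String> out = new ArrayList<>();
        for (String line : code) {
            String[] parts = line.split("\\s+", 2);
            if (parts.length == 2 && parts[1].startsWith("$")) {
                String address = constants.get(parts[1]);
                if (address == null) {
                    throw new IllegalArgumentException("Unknown constant: " + parts[1]);
                }
                out.add(parts[0] + " " + address);
            } else {
                out.add(line);
            }
        }
        return out;
    }
}
```

Here LOADV $TestConst becomes LOADV 52 even when the declaration comes later in the file, and a second declaration of the same name is rejected.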

Example:

; Testing memory operations
$NumMem 0x4
$Index 0x0

LOADN    5          ;Load number 5 into AC
STORE    $NumMem    ;Store 5 at $NumMem (0x4)

LOADN  0     ;Zero the AC
LOADV  0x4   ;Load the value stored at 0x4 ($NumMem) into AC (5)
LOADN  4     ;Load 4 into AC

STORE  0x0   ;Store 4 at 0x0 ($Index)
LOADN  0     ;Zero the AC

LOADV  0x0   ;Load the address stored at 0x0 ($Index) into AC (4)
LOADA        ;Load data from the address held in AC (5)
HALT         ;Halt the CPU
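The difference between loading an address and loading the value stored there is easiest to see by tracing the example. The Java model below is a hypothetical sketch: the class AccumulatorModel is illustrative, and the semantics of LOADN, STORE, LOADV, LOADA and HALT are inferred from the comments in the example rather than from the picoCPU documentation. It expects the program after constants have been replaced by their addresses.

```java
import java.util.List;

// Hypothetical accumulator model. The instruction semantics are inferred from
// the example's comments, not taken from the picoCPU documentation.
class AccumulatorModel {
    static int run(List<String> program, int memorySize) {
        int[] mem = new int[memorySize];
        int ac = 0;                                      // accumulator
        for (String line : program) {
            String[] parts = line.trim().split("\\s+");
            // Integer.decode accepts decimal and 0x-prefixed hex operands.
            int operand = parts.length > 1 ? Integer.decode(parts[1]) : 0;
            switch (parts[0]) {
                case "LOADN": ac = operand; break;       // AC = immediate number
                case "STORE": mem[operand] = ac; break;  // memory[address] = AC
                case "LOADV": ac = mem[operand]; break;  // AC = memory[address]
                case "LOADA": ac = mem[ac]; break;       // AC = memory[AC] (indirect)
                case "HALT": return ac;
                default: throw new IllegalArgumentException("Unsupported: " + parts[0]);
            }
        }
        return ac;
    }
}
```

Tracing the example this way ends with 5 in the accumulator, as the final comment states.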

Output format

The assembler outputs the machine code as 16-bit binary vectors. Each line represents one address in program memory: the first line is 0x0, the second is 0x1, and so on. All vectors have the following format:

[Figure: Instructionformat.png, layout of the 16-bit instruction word (unused bits, opcode, operand)]

(The unused bits are filled with zeroes.)

The size of the opcode can vary depending on the CPU architecture, and shorter opcodes are also possible. For example, with 5-bit opcodes there are 3 unused bits followed by the 5 bits of the opcode.
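Putting one output line together can be sketched in Java. This assumes, based on the 5-bit example and the 8-bit operand encodings earlier on this page, that the upper 8 bits hold the zero-padded opcode and the lower 8 bits hold the operand. That layout is inferred rather than confirmed, and the class WordFormatter is hypothetical. The operand is assumed to have been range-checked already.

```java
// Hypothetical sketch of one output line. ASSUMPTION: the upper 8 bits hold
// the opcode (zero-padded on the left) and the lower 8 bits hold the operand.
class WordFormatter {
    static String format(String opcodeBits, int operand) {
        if (opcodeBits.isEmpty() || opcodeBits.length() > 8) {
            throw new IllegalArgumentException("Opcode must be 1 to 8 bits: " + opcodeBits);
        }
        int opcode = Integer.parseInt(opcodeBits, 2);
        // Unused high bits of the opcode field stay zero; the operand is
        // masked to 8 bits, so negative values appear in two's complement.
        int word = (opcode << 8) | (operand & 0xFF);
        String bin = Integer.toBinaryString(word);
        return "0".repeat(16 - bin.length()) + bin;
    }
}
```

Under this assumption, ADD (00000) with operand -5 becomes 0000000011111011, and JUMP (00010) with operand 3 becomes 0000001000000011.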

Download

PicoASM can be downloaded from its git repository.

Note that, at the time of writing, PicoASM shares its repository with the picoCPU and picoLang projects. PicoASM resides in the picoASM folder.

PicoASM is written in Java and ANTLR 4.5.