Part 6 :
Addition
add r1 , r2, r3 |
add performs an integer addition of the two source operands (r1 + r2) and puts the result in the destination operand (r3).
size : | 8 | 6 | 6 | 6 | 6 |
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_ADD | Flags | Reg 3 | Reg 2 | Reg 1 |
Flags | Syntax | Values | Function |
8-9 | .q, .d or .b postfix | Defines the size parameter | |
10 | s- prefix | 1 if set | Defines if the operation is SIMD |
11 | (none yet) | 0 | Reserved |
12 | -s postfix | 1 if set | Saturation flag |
13 | -c postfix | 1 if set | Carry flag ( 2r2w ) |
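The field layout above can be sketched as a small packing routine. This is only an illustration under stated assumptions: the helper name `encode_rrr` is hypothetical, any opcode value passed to it is a placeholder (the manual does not list numeric opcodes here), and bit 0 is assumed to be the least significant bit of the 32-bit word, which the manual does not state.

```c
#include <stdint.h>

/* Hypothetical helper: packs the five fields of a three-register
   instruction into one 32-bit word.
   ASSUMPTION: bit 0 is the least significant bit of the word. */
static uint32_t encode_rrr(uint32_t opcode, uint32_t flags,
                           uint32_t reg3, uint32_t reg2, uint32_t reg1)
{
    return  (opcode & 0xFFu)           /* bits 0-7   : opcode */
          | ((flags & 0x3Fu) << 8)     /* bits 8-13  : flags  */
          | ((reg3  & 0x3Fu) << 14)    /* bits 14-19 : Reg 3  */
          | ((reg2  & 0x3Fu) << 20)    /* bits 20-25 : Reg 2  */
          | ((reg1  & 0x3Fu) << 26);   /* bits 26-31 : Reg 1  */
}
```

Each register field is 6 bits wide, so the masks keep a register number in the range 0 to 63.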
Examples :
Scalar :
R1 contains 0xF8 (we only consider the lower byte in the registers)
R2 contains 0x0F
add.b r1,r2,r3 : r3 = 0x07 (default behaviour)
adds.b r1,r2,r3 : r3 = 0xFF (saturation)
addc.b r1,r2,r3 : r3 = 0x07, r4= 0x01 (carry)
SIMD :
R1 contains 0x000000F800000001 (in a 64-bit system)
R2 contains 0x0000000F00000002
sadd.b r1,r2,r3 : r3 = 0x0000000700000003 (default behaviour)
sadds.b r1,r2,r3 : r3 = 0x000000FF00000003 (saturation)
saddc.b r1,r2,r3 : r3 = 0x0000000700000003 , r4= 0x0000000100000000 (carry)
Execution Unit : Add/Sub Unit
Latency : 1 cycle for 8-bit data, 2 cycles for 16-bit to 64-bit data
Throughput : 1 operation per cycle per ASU.
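The byte-sized examples above can be reproduced with a short C model. It is a sketch, not the hardware definition: the function names are hypothetical, saturation is assumed to be unsigned (which is what the 0xF8 + 0x0F = 0xFF example suggests), and the SIMD form is modelled as eight independent byte lanes of a 64-bit register.

```c
#include <stdint.h>

/* add.b : wrap-around 8-bit addition (default behaviour). */
static uint8_t add_b(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* adds.b : ASSUMED unsigned saturation, clamps to 0xFF on overflow. */
static uint8_t adds_b(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return sum > 0xFFu ? 0xFFu : (uint8_t)sum;
}

/* addc.b : returns the wrapped sum and stores the carry (0 or 1),
   which the manual places in a second destination register. */
static uint8_t addc_b(uint8_t a, uint8_t b, uint8_t *carry)
{
    unsigned sum = (unsigned)a + b;
    *carry = (uint8_t)(sum >> 8);
    return (uint8_t)sum;
}

/* sadd.b : eight independent byte lanes, no carry crosses a lane. */
static uint64_t sadd_b(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t x = (uint8_t)(a >> (8 * lane));
        uint8_t y = (uint8_t)(b >> (8 * lane));
        r |= (uint64_t)add_b(x, y) << (8 * lane);
    }
    return r;
}
```

With the operands of the examples, `add_b(0xF8, 0x0F)` yields 0x07 and `sadd_b(0x000000F800000001, 0x0000000F00000002)` yields 0x0000000700000003.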
Subtraction
sub r1 , r2, r3 |
sub performs an integer subtraction of the two source operands (r1 - r2) and puts the result in the destination operand (r3).
size : | 8 | 6 | 6 | 6 | 6 |
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_SUB | Flags | Reg 3 | Reg 2 | Reg 1 |
Flags | Syntax | Values | Function |
8-9 | .q, .d or .b postfix | Defines the size parameter | |
10 | s- prefix | 1 if set | Defines if the operation is SIMD |
11 | (none yet) | 0 | Reserved |
12 | -f postfix | 1 if set | Floor flag |
13 | -b postfix | 1 if set | Borrow flag ( 2r2w ) |
Examples :
Scalar :
R1 contains 0x05 (we only consider the lower byte in the registers)
R2 contains 0x07
sub.b r1,r2,r3 : r3 = 0xFE (default behaviour)
subf.b r1,r2,r3 : r3 = 0x00 (floor)
subb.b r1,r2,r3 : r3 = 0xFE, r4= 0xFF (borrow)
SIMD :
R1 contains 0x0000000500000003 (in a 64-bit system)
R2 contains 0x0000000700000001
ssub.b r1,r2,r3 : r3 = 0x000000FE00000002 (default behaviour)
ssubf.b r1,r2,r3 : r3 = 0x0000000000000002 (floor)
ssubb.b r1,r2,r3 : r3 = 0x000000FE00000002, r4= 0x000000FF00000000 (borrow)
Execution Unit : Add/Sub Unit
Latency : 1 cycle for 8-bit data, 2 cycles for 16-bit to 64-bit data
Throughput : 1 operation per cycle per ASU.
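The scalar byte examples and the floor variant of the SIMD example can be modelled in C as follows. This is a sketch with hypothetical function names; the borrow value 0xFF is taken from the subb.b example above, and the SIMD form is assumed to work on eight independent byte lanes.

```c
#include <stdint.h>

/* sub.b : wrap-around 8-bit subtraction (default behaviour). */
static uint8_t sub_b(uint8_t a, uint8_t b)
{
    return (uint8_t)(a - b);
}

/* subf.b : floor flag, the result is clamped to 0 instead of wrapping. */
static uint8_t subf_b(uint8_t a, uint8_t b)
{
    return a < b ? 0 : (uint8_t)(a - b);
}

/* subb.b : returns the wrapped difference; the borrow is reported as
   0xFF (as in the example above) when a < b, else 0x00. */
static uint8_t subb_b(uint8_t a, uint8_t b, uint8_t *borrow)
{
    *borrow = a < b ? 0xFF : 0x00;
    return (uint8_t)(a - b);
}

/* ssubf.b : floor subtraction on eight independent byte lanes. */
static uint64_t ssubf_b(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t x = (uint8_t)(a >> (8 * lane));
        uint8_t y = (uint8_t)(b >> (8 * lane));
        r |= (uint64_t)subf_b(x, y) << (8 * lane);
    }
    return r;
}
```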
Multiplication
mul r1, r2, r3 |
mul performs an integer multiplication of the two source operands (r1 x r2) and puts the result in the destination operand (r3). The size defined by the size flags corresponds to the size of the source operands.
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_MUL | FLAGS | Reg1 | Reg2 | Reg3 |
Flags | Values | Function |
8-9 | [.q, .d or .b postfix] | Defines the size parameter |
10 | [s- prefix ] | Defines if the operation is SIMD |
11 | Reserved | |
12 | [-s postfix] | Sign extension flag |
13 | [-h postfix] | High part flag ( 2r2w ) |
Examples : [to be added once all the errors have been corrected]
Performance (FC0 only) :
Execution Unit : Integer Multiply Unit
Latency : currently unknown, depends on the size of the operands
Throughput : currently unknown, probably 1 operation per cycle per IMU.
Division
div r1, r2, r3 |
div performs an integer division of the two source operands (r1 / r2) and puts the result in destination operand (r3). The size defined by the size flags corresponds to the size of the source operands.
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_DIV | FLAGS | Reg1 | Reg2 | Reg3 |
Flags | Values | Function |
8-9 | [.q, .d or .b postfix] | Defines the size parameter |
10 | [s- prefix ] | Defines if the operation is SIMD |
11 | Reserved | |
12 | [-s postfix] | Sign extension flag |
13 | [-m postfix] | Modulo flag ( 2r2w ) |
Examples : [to be added once all the errors have been corrected]
Performance (FC0 only) :
Execution Unit : Integer Divide Unit
Latency : currently unknown, depends on the size of the operands
Throughput : currently unknown, probably equal to the latency (not pipelined).
Addition Immediate
addi r1, i8, r2 |
Computes r2 = r1 + i8.
This instruction is similar to the "add" instruction but it takes one of the source operands from the opcode and sign-extends it. It has less room for the options and flags, so the usage of the reserved bit is still being discussed.
bits : | 0 7 | 8 11 | 12 19 | 20 25 | 26 31 |
function : | OP_ADDI | FLAGS | imm8 | Reg1 | Reg2 |
Flags | Values | Function |
8-9 | [.q, .d or .b postfix] | Defines the size parameter |
10 | [s- prefix ] | Defines if the operation is SIMD |
11 | Reserved |
R1 contains 0x00F80F00F045FF82
addi.b 0x87,r2,r3 : r3 = 0x00F80F00F045FF09
addi.d 0x87,r2,r3 : r3 = 0x00F80F00F0450009
saddi.b 0x87,r2,r3 : r3 = 0x877F968777CC8609
saddi.d 0x87,r2,r3 : r3 = 0x017F0F87F0CC0009
Performance (FC0 only) :
Execution Unit : Add/Sub Unit
Latency : 1 cycle for 8-bit data, 2 cycles for 16-bit to 64-bit data
Throughput : 1 operation per cycle per ASU.
Subtraction Immediate
subi r1, i8, r2 |
Computes r2 = r1 - i8.
This instruction is similar to the "sub" instruction but it takes one of the source operands from the opcode and sign-extends it. It has less room for the options and flags, so the usage of the reserved bit is still being discussed.
bits : | 0 7 | 8 11 | 12 19 | 20 25 | 26 31 |
function : | OP_SUBI | FLAGS | imm8 | Reg1 | Reg2 |
Flags | Values | Function |
8-9 | [.q, .d or .b postfix] | Defines the size parameter |
10 | [s- prefix ] | Defines if the operation is SIMD |
11 | Reserved |
Examples : [to be added once all the errors have been corrected]
Performance (FC0 only) :
Execution Unit : Add/Sub Unit
Latency : 1 cycle for 8-bit data, 2 cycles for 16-bit to 64-bit data
Throughput : 1 operation per cycle per ASU.
Multiplication Immediate
muli r1, i8, r2 |
Computes r2 = r1 x i8.
This instruction is similar to the "mul" instruction but it takes one of the source operands from the opcode and sign-extends it. It has less room for the options and flags, so the usage of the reserved bit is still being discussed.
bits : | 0 7 | 8 11 | 12 19 | 20 25 | 26 31 |
function : | OP_MULI | FLAGS | imm8 | Reg1 | Reg2 |
Flags | Values | Function |
8-9 | [.q, .d or .b postfix] | Defines the size parameter |
10 | [s- prefix ] | Defines if the operation is SIMD |
11 | Reserved |
Examples : [to be added once all the errors have been corrected]
Performance (FC0 only) :
Execution Unit : Integer Multiply Unit
Latency : currently unknown, depends on the size of the operands
Throughput : currently unknown, probably 1 operation per cycle per IMU.
Division Immediate
divi r1, i8, r2 |
Computes r2 = r1 / i8. This instruction is similar to "div" but the second operand is the sign-extended value of imm8.
bits : | 0 7 | 8 11 | 12 19 | 20 25 | 26 31 |
function : | OP_DIVI | FLAGS | imm8 | Reg1 | Reg2 |
Flags | Values | Function |
8-9 | [.q, .d or .b postfix] | Defines the size parameter |
10 | [s- prefix ] | Defines if the operation is SIMD |
11 | Reserved |
Examples : [to be added once all the errors have been corrected]
Performance (FC0 only) :
Execution Unit : Integer Divide Unit
Latency : currently unknown, depends on the size of the operands
Throughput : currently unknown, probably equal to the latency (not pipelined).
Modulo
mod r1, r2, r3 |
Computes r3 = r1 mod r2.
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_MOD | FLAGS | Reg1 | Reg2 | Reg3 |
Flags | Values | Function |
8-9 | [.q, .d or .b postfix] | Defines the size parameter |
10 | [s- prefix ] | Defines if the operation is SIMD |
Modulo Immediate
modi r1, i8, r2 |
Computes r2 = r1 mod i8.
bits : | 0 7 | 8 11 | 12 19 | 20 25 | 26 31 |
function : | OP_MODI | FLAGS | imm8 | Reg1 | Reg2 |
Flags | Values | Function |
8-9 | [.q, .d or .b postfix] | Defines the size parameter |
10 | [s- prefix ] | Defines if the operation is SIMD |
Shift
shift r1, r2, r3 |
Computes r3 = r1 << r2 or r3 = r1 >> r2
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_SHIFT | FLAGS | Reg1 | Reg2 | Reg3 |
Flags | Values | Function |
8-9 | [qdb] | Defines the size parameter |
10 | [lr] | Defines the direction: left (cleared) or right (set) |
11 | [a] | Defines if the operand is signed |
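A minimal C model of the shift on a full 64-bit register may help to read the flags. The function name `shift64` is hypothetical, the `[a]` flag is interpreted here as an arithmetic (sign-propagating) right shift, and the count is assumed to be taken modulo 64; the manual does not specify what happens for larger counts.

```c
#include <stdint.h>
#include <stdbool.h>

/* `right` mirrors flag bit 10, `arith` mirrors flag bit 11.
   ASSUMPTION: only the low 6 bits of the count are used. */
static uint64_t shift64(uint64_t value, unsigned count, bool right, bool arith)
{
    count &= 63;
    if (!right)
        return value << count;

    uint64_t r = value >> count;                 /* logical right shift */
    if (arith && count != 0 && (value >> 63))    /* replicate the sign bit */
        r |= ~UINT64_C(0) << (64 - count);
    return r;
}
```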
Rotation
rot r1, r2, r3 |
Computes r3 = r1 <- r2 or r3 = r1 -> r2
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_ROT | FLAGS | Reg1 | Reg2 | Reg3 |
Flags | Values | Function |
8-9 | [qdb] | Defines the size parameter |
10 | [lr] | Defines the direction: left (cleared) or right (set) |
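For comparison with the shift, a rotation on a 64-bit register can be sketched in C as below. The name `rot64` is hypothetical and the count is assumed to be taken modulo 64.

```c
#include <stdint.h>
#include <stdbool.h>

/* `right` mirrors flag bit 10 (cleared = left, set = right).
   ASSUMPTION: only the low 6 bits of the count are used. */
static uint64_t rot64(uint64_t value, unsigned count, bool right)
{
    count &= 63;
    if (count == 0)
        return value;          /* avoids an undefined shift by 64 */
    return right ? (value >> count) | (value << (64 - count))
                 : (value << count) | (value >> (64 - count));
}
```

Unlike a shift, no bit is lost: the bits leaving one end of the register re-enter at the other end.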
Shift Immediate
shifti r1, imm6, r2 |
Computes r2 = r1 << imm6 or r2 = r1 >> imm6
imm6 is unsigned.
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_SHIFTI | FLAGS | imm6 | Reg1 | Reg2 |
Flags | Values | Function |
8-9 | [qdb] | Defines the size parameter |
10 | [lr] | Defines the direction: left (cleared) or right (set) |
11 | [a] | Defines if the operand is signed |
Rotate Immediate
roti r1, imm6, r2 |
Computes r2 = r1 <- imm6 or r2 = r1 -> imm6
imm6 is unsigned.
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_ROTI | FLAGS | imm6 | Reg1 | Reg2 |
Flags | Values | Function |
8-9 | [qdb] | Defines the size parameter |
10 | [lr] | Defines the direction: left (cleared) or right (set) |
Single Bit Operation
bitop r1, imm6, r2 |
Applies a logical operation to r1 and a mask in which only bit imm6 is set, and puts the result in r2.
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_BITOP | FLAGS | imm6 | Reg1 | Reg2 |
Flags | Values | Function |
8-9 | [scxt] | Defines the operation |
bclear is an alias for bitop.c (bit clear, andn).
bchange is an alias for bitop.x (bit change, xor).
btest is an alias for bitop.t (bit test, and).
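The three aliases above translate directly into C mask operations. In this sketch the type and function names are hypothetical, and the `.s` variant is assumed to be a bit set (OR), since the manual lists no alias for it.

```c
#include <stdint.h>

typedef enum { BIT_SET, BIT_CLEAR, BIT_CHANGE, BIT_TEST } bitop_kind;

/* Applies the selected operation between r1 and a one-bit mask. */
static uint64_t bitop(uint64_t r1, unsigned imm6, bitop_kind op)
{
    uint64_t mask = UINT64_C(1) << (imm6 & 63);
    switch (op) {
    case BIT_SET:    return r1 |  mask;   /* .s : ASSUMED to be OR   */
    case BIT_CLEAR:  return r1 & ~mask;   /* .c : bit clear, andn    */
    case BIT_CHANGE: return r1 ^  mask;   /* .x : bit change, xor    */
    case BIT_TEST:   return r1 &  mask;   /* .t : bit test, and      */
    }
    return r1;
}
```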
Bitwise Logic
logic r1, r2, r3 |
Computes r3 = f(r1,r2) where f is a logic function whose truth table is defined in the flags.
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_LOGIC | FLAGS | Reg1 | Reg2 | Reg3 |
Flags | Values | Function |
8-9 | [qdb] | Defines the size parameter |
10 | [01] | Defines the value of f(0,0) |
11 | [01] | Defines the value of f(1,0) |
12 | [01] | Defines the value of f(0,1) |
13 | [01] | Defines the value of f(1,1) |
and is an alias for logic.0001 .
xor is an alias for logic.0110 .
not is an alias for logic.1010 .
nor is an alias for logic.1000 .
nand is an alias for logic.1110 .
Remark: XOR should be used to compare two numbers for equality.
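The truth-table encoding can be checked with a small C model. The function name `logic64` is hypothetical; the table is passed as a four-character string in the order f(0,0), f(1,0), f(0,1), f(1,1), and the first argument of f is assumed to be the bit taken from r1.

```c
#include <stdint.h>

/* Evaluates the 4-entry truth table on all 64 bit positions at once.
   table[0..3] = f(0,0), f(1,0), f(0,1), f(1,1), each '0' or '1'.
   ASSUMPTION: the first argument of f comes from r1, the second from r2. */
static uint64_t logic64(const char table[4], uint64_t r1, uint64_t r2)
{
    uint64_t r = 0;
    if (table[0] == '1') r |= ~r1 & ~r2;   /* positions where (r1,r2) = (0,0) */
    if (table[1] == '1') r |=  r1 & ~r2;   /* positions where (r1,r2) = (1,0) */
    if (table[2] == '1') r |= ~r1 &  r2;   /* positions where (r1,r2) = (0,1) */
    if (table[3] == '1') r |=  r1 &  r2;   /* positions where (r1,r2) = (1,1) */
    return r;
}
```

Under these assumptions the aliases listed above behave as expected: "0001" gives r1 AND r2, "0110" gives r1 XOR r2, and "1010" gives NOT r1. The XOR result is zero exactly when the two operands are equal.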
mix r1, r2, r3 |
Mix r1 and r2 into r3.
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_MIX | FLAGS | Reg1 | Reg2 | Reg3 |
Flags | Values | Function |
8 | [hl] | Defines which part of the words should be mixed (high, low) |
expand r1, r2 |
Expand r1 into r2.
bits : | 0 7 | 8 19 | 20 25 | 26 31 |
function : | OP_EXPAND | EMPTY | Reg1 | Reg2 |
Flags | Values | Function |
8 | [hl] | Defines in which part of the word the result should be put (high, low) |
There are different levels of implementation of floating point operations.
Level | Instructions implemented |
Level 0 | No FP |
Level 1 | fadd, fsub, fmul, int2f/f2int, finv_app, sqrt_inv_app |
Level 2 | fadd, fsub, fmul, int2f/f2int, finv, sqrt |
Level 3 | fadd, fsub, fmul, int2f/f2int, div, finv, sqrt, sqrt_inv |
Floating Point Addition
fadd r1, r2, r3 |
fadd performs a floating addition of the two source operands (r1 + r2) and puts the result in destination operand (r3). The operation should be IEEE-754 compliant.
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_FADD | FLAGS | Reg1 | Reg2 | Reg3 |
Flags | Values | Function |
8-9 | [f??] | Defines the size parameter |
10 | [s] | Defines if the operation is SIMD |
11 | [x] | Defines if IEEE compliance isn't required |
Floating Point Subtraction
fsub r1, r2, r3 |
fsub performs a floating-point subtraction of the two source operands (r1 - r2) and puts the result in the destination operand (r3). The operation should be IEEE-754 compliant.
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_FSUB | FLAGS | Reg1 | Reg2 | Reg3 |
Flags | Values | Function |
8-9 | [f??] | Defines the size parameter |
10 | [s] | Defines if the operation is SIMD |
11 | [x] | Defines if IEEE compliance isn't required |
Floating Point Multiplication
fmul[f] r1, r2, r3 |
fmul performs a floating-point multiplication of the two source operands (r1 x r2) and puts the result in the destination operand (r3). The operation should be IEEE-754 compliant.
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_FMUL | FLAGS | Reg1 | Reg2 | Reg3 |
Flags | Values | Function |
8-9 | [f??] | Defines the size parameter |
10 | [s] | Defines if the operation is SIMD |
11 | [x] | Defines if IEEE compliance isn't required |
Floating Point Division
fdiv r1, r2, r3 |
fdiv performs a floating division of the two source operands (r1 / r2) and puts the result in destination operand (r3). The operation should be IEEE-754 compliant.
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_FDIV | FLAGS | Reg1 | Reg2 | Reg3 |
Flags | Values | Function |
8-9 | [f??] | Defines the size parameter |
10 | [s] | Defines if the operation is SIMD |
11 | [x] | Defines if IEEE compliance isn't required |
Integer to Floating Point and Floating Point to Integer
int2f r1, r2 |
f2int r1, r2 |
"int2f" converts the integer in register r1 into a floating-point number and puts it in register r2.
"f2int" converts the floating-point number in register r1 into an integer and puts it in register r2.
bits : | 0 7 | 8 19 | 20 25 | 26 31 |
function : | OP_FCONV | EMPTY | Reg1 | Reg2 |
Flags | Values | Function |
8-9 | [f??] | Defines the size parameter |
10 | | Direction flag |
11 | [s] | Defines if the operation is SIMD (*) |
12 | [x] | Defines if IEEE compliance isn't required (*) |
13-15 | | Rounding mode (see table below) |
Value | Rounding mode |
000 | Nearest (default) |
001 | Towards 0 |
010 | Away from 0 |
011 | Towards -infinity |
100 | Towards +infinity |
Floating Point Inverse
finv r1, r2 |
Computes r2 = 1/r1
bits : | 0 7 | 8 19 | 20 25 | 26 31 |
function : | OP_FINV | EMPTY | Reg1 | Reg2 |
Flags | Values | Function |
8-9 | [f??] | Defines the size parameter |
10 | [s] | Defines if the operation is SIMD |
11 | [x] | Defines if IEEE compliance isn't required |
Floating Point Square Root
fsqrt r1, r2 |
Computes r2 = sqrt(r1)
bits : | 0 7 | 8 19 | 20 25 | 26 31 |
function : | OP_FSQRT | EMPTY | Reg1 | Reg2 |
Flags | Values | Function |
8-9 | [f??] | Defines the size parameter |
10 | [s] | Defines if the operation is SIMD |
11 | [x] | Defines if IEEE compliance isn't required |
Floating Point Inverse Square Root
finvsqrt r1, r2 |
Computes r2 = 1/sqrt(r1)
bits : | 0 7 | 8 19 | 20 25 | 26 31 |
function : | OP_FINVSQRT | EMPTY | Reg1 | Reg2 |
Flags | Values | Function |
8-9 | [f??] | Defines the size parameter |
10 | [s] | Defines if the operation is SIMD |
11 | [x] | Defines if IEEE compliance isn't required |
Reverses the bits of r1, shifts the result right by r2 bits, and puts it in r3.
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_BITREV | FLAGS | Reg1 | Reg2 | Reg3 |
Reverses the bits of r1, shifts the result right by imm6 bits, and puts it in r2.
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_BITREVI | FLAGS | imm6 | Reg1 | Reg2 |
Reverses the bytes of r1 (changing the endianness) and stores the result in r2.
bits : | 0 7 | 8 19 | 20 25 | 26 31 |
function : | OP_BYTEREV | EMPTY | Reg1 | Reg2 |
Flags | Values | Function |
8-9 | [qdb] | Defines the size parameter |
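The bit and byte reversals described above can be sketched in C on a full 64-bit register. The function names are hypothetical, the size flag is ignored (64-bit operands only), and the shift count is assumed to be taken modulo 64.

```c
#include <stdint.h>

/* Reverses the 64 bits of v, then shifts the result right. */
static uint64_t bitrev64(uint64_t v, unsigned shift)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        r = (r << 1) | (v & 1);   /* lowest bit of v becomes the next bit of r */
        v >>= 1;
    }
    return r >> (shift & 63);
}

/* Reverses the 8 bytes of v (endianness swap). */
static uint64_t byterev64(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xFF);
        v >>= 8;
    }
    return r;
}
```

The combined shift is what makes the instruction convenient for reversing only the low n bits of an index: `bitrev64(3, 61)` reverses the 3-bit value 011 into 110.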
Load
load [r1 + r2 * size], r3 |
r3 = [r1 + r2 * size]
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_LOAD | FLAGS | Reg1 | Reg2 | Reg3 |
Flags | Values | Function |
8-9 | [qdb] | Defines the size parameter |
10 | [e] | Defines the endianness: little-endian if cleared (default), big-endian if set |
11-13 | RESERVED |
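The addressing mode and the endianness flag can be illustrated with a C model that reads from a byte array. This is a sketch: `load_mem` is a hypothetical name, memory is modelled as a plain array indexed from 0, and `size` is given directly in bytes rather than through the size flag letters.

```c
#include <stdint.h>
#include <stdbool.h>

/* Reads `size` bytes at offset r1 + r2 * size and assembles them into
   a value. `big_endian` mirrors flag bit 10 (cleared = little-endian). */
static uint64_t load_mem(const uint8_t *mem, uint64_t r1, uint64_t r2,
                         unsigned size, bool big_endian)
{
    const uint8_t *p = mem + r1 + r2 * size;
    uint64_t v = 0;
    for (unsigned i = 0; i < size; i++) {
        /* Big-endian: first byte is the most significant one.
           Little-endian: last byte is the most significant one. */
        unsigned idx = big_endian ? i : size - 1 - i;
        v = (v << 8) | p[idx];
    }
    return v;
}
```

The index register r2 is scaled by the operand size, so r2 counts elements rather than bytes.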
Store
store r1, [r2 + r3 * size] |
[r2 + r3 * size] = r1
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_STORE | FLAGS | Reg1 | Reg2 | Reg3 |
Flags | Values | Function |
8-9 | [qdb] | Defines the size parameter |
10 | [e] | Defines the endianness: little-endian if cleared (default), big-endian if set |
11-13 | RESERVED |
Move
mov [r1,] r2, r3 |
if (r1) r3 = r2
bits : | 0 7 | 8 13 | 14 19 | 20 25 | 26 31 |
function : | OP_MOV | FLAGS | Reg1 | Reg2 | Reg3 |
Flags | Values | Function |
8-9 | [qdb] | Defines the size parameter |
10-11 | [sz] | Defines how the high part of the destination register is filled (see table below) |
Flag | Values | Function |
(default) | 00 | High part remains unchanged |
z | 01 | Zero extend |
s | 10 | Sign extend |
? | 11 | Reserved |
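The conditional move and its three extension modes can be modelled in C as follows. The names are hypothetical, the operand width is passed directly in bits instead of through the size flag letters, and the enum values follow the table above.

```c
#include <stdint.h>

typedef enum { EXT_KEEP = 0, EXT_ZERO = 1, EXT_SIGN = 2 } ext_mode;

/* Returns the new value of the destination register.
   cond = r1, src = r2, dst = previous content of r3.
   size_bits is 8, 16, 32 or 64. */
static uint64_t mov(uint64_t cond, uint64_t src, uint64_t dst,
                    unsigned size_bits, ext_mode ext)
{
    if (cond == 0)
        return dst;                       /* condition false: no move */
    if (size_bits >= 64)
        return src;                       /* whole register is copied */

    uint64_t low_mask = (UINT64_C(1) << size_bits) - 1;
    uint64_t low = src & low_mask;

    switch (ext) {
    case EXT_ZERO:
        return low;                       /* high part cleared */
    case EXT_SIGN:                        /* high part copies the sign bit */
        return ((low >> (size_bits - 1)) & 1) ? (low | ~low_mask) : low;
    default:
        return (dst & ~low_mask) | low;   /* high part unchanged */
    }
}
```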
Load Constant
loadcons imm16, r1 |
Loads the imm16 constant into the register r1 at the specified location (shifts of 16 bits). The rest of the register remains unmodified.
Flags | Values | Function |
8-9 | [123] | Defines the shift parameter |
bits : | 0 7 | 8 9 | 10 25 | 26 31 |
function : | OP_LOADCONS | EMPTY | imm16 | Reg1 |
Load Constant with Sign Extension
loadconsx imm16, r1 |
Loads the imm16 constant into the register r1 at the specified location (shifts of 16 bits). The higher part of the register is assigned the value of the most significant bit of the constant. The lower part of the register remains unmodified.
Flags | Values | Function |
8-9 | [123] | Defines the shift parameter |
bits : | 0 7 | 8 9 | 10 25 | 26 31 |
function : | OP_LOADCONSX | EMPTY | imm16 | Reg1 |
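The difference between the two constant loads can be sketched in C. The function names mirror the mnemonics but the code is only a model: the shift parameter is assumed to select one of the four 16-bit slices of a 64-bit register (0 being the lowest slice).

```c
#include <stdint.h>

/* loadcons : replaces one 16-bit slice, the rest is unmodified. */
static uint64_t loadcons(uint64_t reg, uint16_t imm16, unsigned shift)
{
    unsigned pos = 16 * (shift & 3);
    return (reg & ~(UINT64_C(0xFFFF) << pos)) | ((uint64_t)imm16 << pos);
}

/* loadconsx : replaces one 16-bit slice and fills everything above it
   with the most significant bit of the constant; the part below the
   slice is unmodified. */
static uint64_t loadconsx(uint64_t reg, uint16_t imm16, unsigned shift)
{
    unsigned pos = 16 * (shift & 3);
    uint64_t low = pos ? (reg & ((UINT64_C(1) << pos) - 1)) : 0;
    uint64_t ext = imm16;
    if (imm16 & 0x8000u)
        ext |= ~UINT64_C(0xFFFF);         /* sign-extend to 64 bits */
    return (ext << pos) | low;
}
```

A 64-bit constant can therefore be built with one loadconsx for its highest significant slice followed by loadcons for each lower slice.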
/* LOADCONST.C by WHYGEE, 14 September 1999.
   To be included in a compiler or an assembler, after some interface
   fixing : it currently outputs to stderr, it will output to a file
   the same way. */

#include <stdlib.h>
#include <stdio.h>

#define MAXSIZE (sizeof(long long int)) /* should be ideally 8 */

void emit_constant(unsigned long long int c, unsigned char reg)
{
  unsigned short int data[MAXSIZE>>1];
  signed long long int t,u;
  signed int s=0;

  if (reg==0) {
    fprintf(stderr,"\n Error : can't write to register 0 \n");
    exit(-1); /* should be performed by an error routine that does this cleanly */
  }

  if (c==0) {
    fprintf(stderr,"mov r%d,r0\n",reg);
  }
  else if (c==-1) {
    fprintf(stderr,"logic.1111 r%d,r0,r0\n",reg);
  }
  else if ((c>65535)&((c & -c)==c)) {
    /* a power of two, but the latency of bitset is higher */
    do {
      s++;
      c>>=1;
    } while (c!=0); /* find the LSB */
    if (s>63) {
      fprintf(stderr,"loadconsts r%d,0x%04X\n",reg,s);
      fprintf(stderr,"bitset r%d,r0,r%d\n",reg,reg);
    }
    else {
      fprintf(stderr,"bitset r%d,r0,%d\n",reg,s);
    }
  }
  else { /* any kind of number */
    u=c;
    do {
      t=u;
      data[s]=t & 0xFFFF;
      u=t>>16;
      s++;
    } while ((t!=u) & (s<MAXSIZE>>1));
    s--;
    /* handle the case where the MSB of the highest data is not the sign */
    if ((data[s]^data[s-1])& 0x8000) {
      fprintf(stderr,"loadconsts.%d r%d,0x%04X\n", s,reg,data[s]);
      s--;
      fprintf(stderr,"loadconst.%d r%d,0x%04X\n", s,reg,data[s]);
      s--;
    }
    else {
      s--;
      fprintf(stderr,"loadconsts.%d r%d,0x%04X\n", s,reg,data[s]);
      s--;
    }
    while (s>=0) {
      fprintf(stderr,"loadconst.%d r%d,0x%04X\n", s,reg,data[s]);
      s--;
    }
  }
}
Cache Memory Management
Prefetches or flushes a data block to or from a memory level.
cachemm r1, r2 |
Flags | Values | Function |
8-9 | [qdb] | Defines the size parameter |
10 | [fp] | Prefetch/Flush |
11 | [l] | Lock. This flag means that the data are static and will be used a lot |
12-14 | [0-7] | Memory level (see table below) |
D | 000 | data L1 cache |
I | 001 | instructions L1 cache |
C | 010 | onchip unified cache |
011 | [unused] | |
U | 100 | offchip unified cache |
L | 101 | local memory |
G | 110 | global memory |
V | 111 | virtual memory (hard disk) |
bits : | 0 7 | 8 19 | 20 25 | 26 31 |
function : | OP_CACHEMM | EMPTY | Reg1 | Reg2 |
Example : "flushg ra,rb" flushes rb bytes starting at address ra from every memory level up to global memory. Any cache (L1, L2, local...) containing data that belong to the block is written back to main memory and the corresponding cache space is freed (available for future use). This should be executed every time the programmer knows that the block of data will not be used for a while; the cache level is a hint for performance.
"preftchu ra,rb" copies the data block at address ra and size rb that is present in lower memory levels (virtual, global, local) to the unified offchip memory (at least).
forms : rr or ri (size could be immediate)
These instructions are very important for memory management, and should be used when performing SMC (for memory coherency).
Load Immediate
loadi [r1 + imm9 * size], r2 |
r2 = [r1 + imm9 * size]
bits : | 0 7 | 8 10 | 11 19 | 20 25 | 26 31 |
function : | OP_LOADI | FLAGS | imm9 | Reg1 | Reg2 |
Flags | Values | Function |
8-9 | [qdb] | Defines the size parameter |
10 | [e] | Defines the endianness: little-endian if cleared (default), big-endian if set |
Store Immediate
storei r1, [r2 + imm9 * size] |
[r2 + imm9 * size] = r1
bits : | 0 7 | 8 10 | 11 19 | 20 25 | 26 31 |
function : | OP_STOREI | FLAGS | imm9 | Reg1 | Reg2 |
Flags | Values | Function |
8-9 | [qdb] | Defines the size parameter |
10 | [e] | Defines the endianness: little-endian if cleared (default), big-endian if set |
Get and Put internal register.
R/W | Description |
R | Number of cycles |
R | Number of cycles (countdown) |
R | Number of instructions executed |
R | Number of Page Faults |
R | Number of traps/interrupts |
R | Number of FPU traps |
R | Number of Cache hit/misses |
R | Number of correct/incorrect branch predictions |
R | Number of pipeline bubbles |
R | Number of TLB hits/misses |
R/W | Description |
RW | Old Program Counter |
RW | Old Machine Status Word |
RW | Exception Vector |
RW | Temporary |
R | Exception Reason |
R | Exception Number/Type |
R/W | Description |
R | Processor ID |
Get Internal Register
get IR[r1], r2 |
Get internal register at index r1 and put its content in register r2. The whole register gets dumped. There is no size flag.
bits : | 0 7 | 8 19 | 20 25 | 26 31 |
function : | OP_GET | EMPTY | Reg1 | Reg2 |
Put Internal Register
put r1, IR[r2] |
Puts the contents of r1 into the internal register at index r2. The whole register is written. There is no size flag.
bits : | 0 7 | 8 19 | 20 25 | 26 31 |
function : | OP_PUT | EMPTY | Reg1 | Reg2 |
Get Internal Register Immediate
geti IR[imm16], r1 |
Get internal register at index imm16 and put its content in register r1. The whole register gets dumped. There is no size flag.
bits : | 0 7 | 8 9 | 10 25 | 26 31 |
function : | OP_GETI | EMPTY | imm16 | Reg1 |
Put Internal Register Immediate
puti r1, IR[imm16] |
Puts the contents of register r1 into the internal register at index imm16. The whole register is written. There is no size flag.
bits : | 0 7 | 8 9 | 10 25 | 26 31 |
function : | OP_PUTI | EMPTY | imm16 | Reg1 |
Absolute Jump.
jmpa [r1,] r2 |
If r1 contains a non-zero value, jumps to the address held in r2.
Flags | Values | Function |
8 | [n] | Negates the condition |
9-10 | [lm] | Test the MSB or the LSB |
bits : | 0 7 | 8 19 | 20 25 | 26 31 |
function : | OP_JMPA | EMPTY | Reg1 | Reg2 |
Load Address
loadaddr imm18, r1 |
Stores PC+imm18 into r1.
The result is a 64 bit address. imm18 is a signed value.
bits : | 0 7 | 8 25 | 26 31 |
function : | OP_LOADADDR | imm18 | Reg1 |
Loop Entry
loopentry r1 |
Stores PC+4 into r1.
bits : | 0 7 | 8 25 | 26 31 |
function : | OP_LOOPENTRY | EMPTY | Reg1 |
Loop
loop r1, r2 |
Performs two parallel things :
This overlapping of the operations allows greater parallelism and lower latency : we can loop fast without compromising security.
bits : | 0 7 | 8 19 | 20 25 | 26 31 |
function : | OP_LOOP | EMPTY | Reg1 | Reg2 |
Absolute Jump Immediate.
jmpi [r1,] r2, imm12 |
If r1 contains a non-zero value, jumps to the address r2+4*imm12.
bits : | 0 7 | 8 19 | 20 25 | 26 31 |
function : | OP_JMPI | imm12 | Reg1 | Reg2 |
Relative Jump Immediate.
jmpr [r1,] imm18 |
imm18 is a signed value. Warning! All code is aligned on a 32-bit boundary, so the imm18 value is shifted left by 2 bits.
bits : | 0 7 | 8 25 | 26 31 |
function : | OP_JMPR | imm18 | Reg1 |
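The offset computation of the relative jump can be sketched in C. The function name is hypothetical, and the offset is assumed to be added to the address of the jump instruction itself; the manual does not say whether the base is this address or that of the following instruction.

```c
#include <stdint.h>

/* Computes the jump target from the raw 18-bit immediate field.
   ASSUMPTION: the base address is the PC of the jump instruction. */
static uint64_t jmpr_target(uint64_t pc, uint32_t imm18)
{
    int64_t offset = (int64_t)(imm18 & 0x3FFFFu);
    if (offset & 0x20000)          /* bit 17 set: negative value */
        offset -= 0x40000;         /* sign-extend from 18 bits   */
    return pc + (uint64_t)(offset * 4);   /* instructions are 4 bytes wide */
}
```

With an 18-bit signed field scaled by 4, the reachable range is roughly 512 KB backwards or forwards.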
syscall [r1,] imm18 |
trap [r1,] imm18 |
syscall and trap are two names for the same instruction.
[FIX ME] But what does it do ?
The argument is ignored by the hardware and may be used to encode information for system software. To retrieve the argument system software must load the instruction word from memory.
bits : | 0 7 | 8 25 | 26 31 |
function : | OP_SYSTRAP | imm18 | Reg1 |
halt [r1,] [imm18] |
Halts until an External Exception occurs
bits : | 0 7 | 8 25 | 26 31 |
function : | OP_HALT | imm18 | Reg1 |
rfe [r1,] [imm18] |
Return From Exception...
[FIX ME] Little short...
bits : | 0 7 | 8 25 | 26 31 |
function : | OP_RFE | imm18 | Reg1 |
Type 1 | Software exception |
Type 2 | External exception (interrupt) |
Type 3 | Privilege Violation |
Type 4 | Memory Error |
Type 5 | Syscall |
[FIXME] How many different types do we need? Each one corresponds to one pointer in the exception vector (except for the hardware, which takes all the rest).
Exception pointer vector has 64 entries. [FIXME] Confirm that.
SR_OPC | Old Program Counter |
SR_OMSW | Old Machine Status Word |
SR_EV | Exception Vector |
SR_TMP | Temporary |
SR_ER | Exception Reason |
SR_ENT | Exception Number/Type |
1 | External Interrupt |
2 | Illegal Opcode |
3 | Malformed Instruction |
4 | Privilege Violation |
5 | Integer divide by ZERO |
6 | FP divide by ZERO |
7 | FP INF-INF |
8 | FP INF/INF |
9 | FP ZERO/ZERO |
10 | FP ZERO*INF |
11 | FP SQRT(NEG) |
12 | Memory Exception |
Upon the occurrence of an exception, the processor performs the following:
Upon a call to the RFE instruction: