SMOKE-16 Architecture
-------------------------------------------
$Id: opcodes.txt,v 1.18 2001/07/26 07:15:46 bsittler Exp $

This document contains a brief description of the SMOKE-16
architecture. None of this is set in stone yet. --BCWS
-----------------------------------------------------------------

OVERVIEW OF THE SMOKE-16 ARCHITECTURE

The SMOKE-16 is a big-endian 16-bit architecture, capable of
addressing up to 128k 8-bit bytes of memory from user mode (or up to
64k when data and address space are unified.) The supervisor mode
design hasn't been finalized yet, so I won't tease you with much of it
here. There is currently no SMOKE-16 support for floating-point
acceleration, but such support may be added someday, probably using a
variant of the lds/sts instructions.

The user mode is emulated by the 'emu' program, which reflects a few
of your operating system's basic features through SMOKE-16 interrupt
calls (listed in 'smoke16/syscall.h'.) In general, the interrupt calls
have UNIX system call semantics. See the emulator sources and the C
library sources for details.

The assembler syntax is heavily influenced by GNU as (gas) and the GNU
assembler preprocessor (gasp.) In the future, I expect to use gasp as
a preprocessor for all my SMOKE-16 assembly, but for now I use cpp.

REGISTERS

The SMOKE-16 has 16 general purpose registers and 4 system registers
(S-REGS) which are visible to the programmer. Additional mnemonics for
a register are listed after the primary mnemonic. All registers are 16
bits wide.

- SYSTEM REGISTERS (%sX)

  %s0/%pc: program counter, contains address of the current instruction
  %s1/%ps: processor status, contains several fields:
    - 0x8000: supervisor mode (K) (not yet...)
    - 0x7f00: [mask] priority of current hardware interrupt (not yet...)
    - 0x00c0: [mask] comparison flags:
        0x0080: "less than"/"carry"/"borrow" flag (C)
        0x0040: "equal to"/"zero" flag (Z)
    - 0x003f: [mask] reserved for future use
  %s2/%pt: page table base (not yet...)
  %s3/%tt: trap table base (not yet...)

- GENERAL REGISTERS (%X/%rX)

  %0/%r0: always 0x0000, writing to it has no effect
  %1/%r1/%t0: temporary, saved
  %2/%r2/%t1: temporary, saved
  %3/%r3/%t2: temporary, saved
  %4/%r4/%t3: temporary, saved
  %5/%r5/%t4: temporary, saved
  %6/%r6/%t5: temporary, saved
  %7/%r7/%t6: temporary, non-saved (reserved for assembler use)
  %8/%r8/%a0: argument 0/return value 0, non-saved
  %9/%r9/%a1: argument 1/return value 1, non-saved
  %10/%r10/%a2: argument 2/return value 2, non-saved
  %11/%r11/%a3: argument 3/return value 3, non-saved
  %12/%r12/%fp: frame pointer (points to caller's frame pointer)
  %13/%r13/%sp: stack pointer (points to highest unused word on stack)
  %14/%r14/%ra: return address
  %15/%r15/%gp: global pointer (use discouraged)

- By convention, return values of 4 words or less are returned in
  %a0..%a3, and all other values are returned by having the caller
  pass the address of a temporary structure on the stack as an
  "invisible" first argument. *This address is returned by the
  callee.*

- By convention, arguments of more than 4 words are passed by
  reference to temporary stack space set aside by the caller.
  Arguments past the fourth word are passed on the stack in the
  caller's frame, with the first argument at the lowest address.

- Saved means a function should return the register to its original
  state before returning. Non-saved means a function should place the
  value on the stack before calling a sub-function, and restore it
  from the stack after the sub-function returns.

INSTRUCTION SET

The SMOKE-16 has a fixed-length instruction word of 16 bits. This is
somewhat limiting (SMOKE-16 has only 32 actual instructions,) but it's
not an insurmountable obstacle. Many instructions commonly used on
other processors are aliased to one or more actual instructions with
additional arguments. Some instructions have more than one name, and
the different names can be used interchangeably.
Actual instructions are marked with a double asterisk ('**');
aliased instructions are marked with a double dash ('--').
See the end of this file for additional notes.

MOV/LD/ST/LDS/STS/LDI - load/store word

-- mov , %rdest
-- ldi , %rdest
   => sethi %hi(), %rdest
      movb %lo(), %rdest

-- mov %rsrc, %rdest
-- st %rsrc, %rdest
   => or %0, %rsrc, %rdest

-- mov [], %rdest
-- ld [], %rdest
   => sethi %hi(), %t6
      movb %lo(), %t6
      mov [%t6 + %0], %rdest

** mov [%raddr + <offset4>], %rdest
-- ld [%raddr + <offset4>], %rdest
   Encoding: 0x6000 | (rdest << 8) | ((offset4 & 0x1e) << 3) | raddr
   Load the word starting at <offset4> + %raddr into %rdest.

-- mov [%raddr], %rdest
   => mov [%raddr + 0], %rdest

-- mov %rsrc, []
-- st %rsrc, []
   => sethi %hi(), %t6
      movb %lo(), %t6
      mov %rsrc, [%t6 + %0]

** mov %rsrc, [%raddr + <offset4>]
-- st %rsrc, [%raddr + <offset4>]
   Encoding: 0xd000 | (rsrc << 8) | ((offset4 & 0x1e) << 3) | raddr
   Store %rsrc in the word starting at <offset4> + %raddr.

-- mov %rsrc, [%raddr]
   => mov %rsrc, [%raddr + 0]

** mov %ssrc, %rdest
-- lds %ssrc, %rdest
-- ld %ssrc, %rdest
   Encoding: 0xf400 | (rdest << 4) | ssrc
   [supervisor mode only]
   Copy the value in special register %ssrc into %rdest.

** mov %rsrc, %sdest
-- sts %rsrc, %sdest
-- st %rsrc, %sdest
   Encoding: 0xf500 | (rsrc << 4) | sdest
   [supervisor mode only]
   Copy the value in %rsrc into special register %sdest.

SETHI - set high byte, clear low byte

** sethi <symexpr8>, %rdest
   Encoding: 0x7000 | (rdest << 8) | symexpr8
   Loads <symexpr8> into the upper half of %rdest. Clears the lower
   half of %rdest.

MOVB/LDB/STB/LDBI - load/store byte

-- movb [], %rdest
-- ldb [], %rdest
   => sethi %hi(), %t6
      movb %lo(), %t6
      movb [%t6], %rdest

** movb [%raddr], %rdest
-- ldb [%raddr], %rdest
   Encoding: 0xf200 | (rdest << 4) | raddr
   Load the byte at address %raddr into the low half of %rdest.

-- movb %rsrc, []
-- stb %rsrc, []
   => sethi %hi(), %t6
      movb %lo(), %t6
      movb %rsrc, [%t6]

** movb %rsrc, [%raddr]
-- stb %rsrc, [%raddr]
   Encoding: 0xf300 | (rsrc << 4) | raddr
   Store the low half of %rsrc at the address %raddr.

** movb <symexpr8>, %rdest
-- ldbi <symexpr8>, %rdest
   Encoding: 0xe000 | (rdest << 8) | symexpr8
   Load <symexpr8> into the lower half of %rdest. Does not affect the
   upper half of %rdest.

ADD - add to words

** add %rsrc, %rarg, %rdest
   Encoding: 0x0000 | (rdest << 8) | (rsrc << 4) | rarg
   Place the sum of the values in %rsrc and %rarg in %rdest.
   Affects the C and Z flags.

-- add %rarg, %rdest
   => add %rdest, %rarg, %rdest

** add <immed4>, %rdest
-- addi <immed4>, %rdest
   Encoding: 0xfe00 | (rdest << 4) | immed4
   Increment %rdest by <immed4>. Affects the C and Z flags.

SUB/SUBI - subtract from words

** sub %rsrc, %rarg, %rdest
   Encoding: 0x1000 | (rdest << 8) | (rsrc << 4) | rarg
   Place the difference of the values in %rsrc and %rarg in %rdest.
   Affects the C and Z flags.

-- sub %rarg, %rdest
   => sub %rdest, %rarg, %rdest

** sub <immed4>, %rdest
-- subi <immed4>, %rdest
   Encoding: 0xfd00 | (rdest << 4) | immed4
   Decrement %rdest by <immed4>. Affects the C and Z flags.

CMP/CMPL - compare word

** cmp %rsrc1, %rsrc2
   Encoding: 0xf100 | (rsrc1 << 4) | rsrc2
   Compare %rsrc1 and %rsrc2. Affects the C and Z flags. Treats
   operands as signed.

-- cmpl %rsrc1, %rsrc2
   => sub %rsrc1, %rsrc2, %0

AND - bitwise and

** and %rsrc, %rarg, %rdest
   Encoding: 0x4000 | (rdest << 8) | (rsrc << 4) | rarg
   Place the bitwise AND of the values in %rsrc and %rarg in %rdest.

-- and %rarg, %rdest
   => and %rdest, %rarg, %rdest

OR - bitwise inclusive or

** or %rsrc, %rarg, %rdest
   Encoding: 0x5000 | (rdest << 8) | (rsrc << 4) | rarg
   Place the bitwise OR of the values in %rsrc and %rarg in %rdest.

-- or %rarg, %rdest
   => or %rdest, %rarg, %rdest

XOR - bitwise exclusive or

** xor %rsrc, %rarg, %rdest
   Encoding: 0xb000 | (rdest << 8) | (rsrc << 4) | rarg
   Place the bitwise XOR of the values in %rsrc and %rarg in %rdest.

-- xor %rarg, %rdest
   => xor %rdest, %rarg, %rdest

NOT - bitwise complement

** not %rsrc, %rdest
   Encoding: 0xf000 | (rdest << 4) | rsrc
   Place the bitwise complement of the value in %rsrc in %rdest.

-- not %rdest
   => not %rdest, %rdest

ROL/ROR - rotate bits in word

** rol %rsrc, <immed4>, %rdest
   Encoding: 0x2000 | (rdest << 8) | (rsrc << 4) | immed4
   Place %rsrc rotated left by <immed4> bits in %rdest.
   Affects the C and Z flags.

-- rol <immed4>, %rdest
   => rol %rdest, <immed4>, %rdest

** ror %rsrc, <immed4>, %rdest
   Encoding: 0x3000 | (rdest << 8) | (rsrc << 4) | immed4
   Place %rsrc rotated right by <immed4> bits in %rdest.
   Affects the C and Z flags.

-- ror <immed4>, %rdest
   => ror %rdest, <immed4>, %rdest

SL/SLA/SLL/SR/SRA/SRL - shift bits in word

** sl %rsrc, <immed4>, %rdest
-- sla %rsrc, <immed4>, %rdest
-- sll %rsrc, <immed4>, %rdest
   Encoding: 0x8000 | (rdest << 8) | (rsrc << 4) | immed4
   Place %rsrc in %rdest, and shift it left by <immed4> bits.
   Affects the C and Z flags.

-- sl <immed4>, %rdest
-- sll <immed4>, %rdest
-- sla <immed4>, %rdest
   => sl %rdest, <immed4>, %rdest

** sr %rsrc, <immed4>, %rdest
-- sra %rsrc, <immed4>, %rdest
   Encoding: 0x9000 | (rdest << 8) | (rsrc << 4) | immed4
   Place %rsrc in %rdest, and shift it right by <immed4> bits,
   performing sign-extension of negative values.
   Affects the C and Z flags.

-- sr <immed4>, %rdest
-- sra <immed4>, %rdest
   => sr %rdest, <immed4>, %rdest

** srl %rsrc, <immed4>, %rdest
   Encoding: 0xa000 | (rdest << 8) | (rsrc << 4) | immed4
   Place %rsrc in %rdest, and shift it right by <immed4> bits,
   treating values as unsigned. Affects the C and Z flags.

-- srl <immed4>, %rdest
   => srl %rdest, <immed4>, %rdest

BL/BC/BGE/BNC/BE/BZ/BNE/BNZ/BG/B - conditional and unconditional short branch

** bl <disp8>
-- bc <disp8>
   Encoding: 0xf600 | (disp8 >> 1)
   Branch to <disp8> if the C flag is set.

** be <disp8>
-- bz <disp8>
   Encoding: 0xf700 | (disp8 >> 1)
   Branch to <disp8> if the Z flag is set.

** bne <disp8>
-- bnz <disp8>
   Encoding: 0xf800 | (disp8 >> 1)
   Branch to <disp8> if the Z flag is not set.

** bge <disp8>
-- bnc <disp8>
   Encoding: 0xf900 | (disp8 >> 1)
   Branch to <disp8> if the C flag is not set.

** bg <disp8>
   Encoding: 0xfa00 | (disp8 >> 1)
   Branch to <disp8> if neither the Z flag nor the C flag is set.

** b <disp8>
   Encoding: 0xfb00 | (disp8 >> 1)
   Branch to <disp8> unconditionally.
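
As a rough illustration of the short-branch encodings above, the
following Python sketch packs a branch opcode and a displacement into
one instruction word. It assumes the displacement is a signed, even
byte count measured from the instruction following the branch (as the
notes at the end of this file describe), and that the halved value is
stored as an 8-bit two's-complement field in the low byte. The masking
with 0xff, the name encode_branch and the BRANCH_OPCODES table are
assumptions of this sketch; they are not taken from the assembler
sources.

```python
# Hypothetical helper; opcode bases are copied from the encodings above.
BRANCH_OPCODES = {
    "bl": 0xF600, "bc": 0xF600,
    "be": 0xF700, "bz": 0xF700,
    "bne": 0xF800, "bnz": 0xF800,
    "bge": 0xF900, "bnc": 0xF900,
    "bg": 0xFA00,
    "b": 0xFB00,
}


def encode_branch(mnemonic, disp8):
    """Return the 16-bit instruction word for a short branch.

    disp8 is the signed byte displacement from the instruction
    following the branch. It must be even and lie in -256..254.
    """
    if disp8 % 2 != 0:
        raise ValueError("displacement must be even")
    if not -256 <= disp8 <= 254:
        raise ValueError("displacement out of range for a short branch")
    # Assumption: the halved displacement is kept as an 8-bit
    # two's-complement field in the low byte of the instruction.
    return BRANCH_OPCODES[mnemonic] | ((disp8 >> 1) & 0xFF)
```

Under these assumptions, encode_branch("be", 4) gives 0xf702, a branch
that skips the two instructions after it when the Z flag is set, and
anything outside -256..254 has to go through one of the long-jump
forms instead.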

JAL/J/CALL/RET - long jump, procedure call/return

** jal %raddr + %roff, %rret
   Encoding: 0xc000 | (rret << 8) | (roff << 4) | raddr
   Jump to the address %raddr + %roff, placing the address of the
   instruction which would have been executed next in %rret.

-- jal + %roff, %rret
   => sethi %hi(), %t6
      movb %lo(), %t6
      jal %t6 + %roff, %rret

-- jal , %rret
   => sethi %hi(), %t6
      movb %lo(), %t6
      jal %t6 + %0, %rret

-- jal %raddr, %rret
   => jal %raddr + %0, %rret

-- j + %roff
   => sethi %hi(), %t6
      movb %lo(), %t6
      jal %t6 + %roff, %0

-- j
   => sethi %hi(), %t6
      movb %lo(), %t6
      jal %t6 + %0, %0

-- j %raddr + %roff
   => jal %raddr + %roff, %0

-- j %raddr
   => jal %raddr + %0, %0

-- call + %roff
   => sethi %hi(), %t6
      movb %lo(), %t6
      jal %t6 + %roff, %ra

-- call
   => sethi %hi(), %t6
      movb %lo(), %t6
      jal %t6 + %0, %ra

-- call %raddr + %roff
   => jal %raddr + %roff, %ra

-- call %raddr
   => jal %raddr + %0, %ra

-- ret
   => jal %ra + %0, %0

INT/IRET - interrupt call/return

** int <immed7>
   Encoding: 0xfc00 | immed7
   Store the processor state to the kernel stack, and execute the
   handler for interrupt <immed7>. (Doesn't actually do this in emu or
   emu--, but user code can't tell the difference anyhow.) See
   syscall.h for some useful (i.e., implemented) interrupts.

** iret
   Encoding: 0xffff
   [supervisor mode only]
   Restore the processor state from the kernel stack.

NOP - no operation

** nop
   Encoding: 0xfff0
   No effect.

PUSH/POP - stack pseudo-instructions

-- push
   => sub 2, %sp

-- push %rsrc
   => sub 2, %sp
      mov %rsrc, [%sp + 2]

-- pop
   => add 2, %sp

-- pop %rdest
   => mov [%sp + 2], %rdest
      add 2, %sp

== NOTES ==============================================================

- is an arithmetic expression involving constants, the usual C
  arithmetic operators, and the %hi()/%hi16() and %lo()/%lo16()
  constructs.
  - %hi(x) is equivalent to x >> 8
  - %hi16(x) is equivalent to x >> 16
  - %lo(x) is equivalent to x & 0xff
  - %lo16(x) is equivalent to x & 0xffff
  - constants can be:
    - decimal digits, with optional sign prefix
    - 0 followed by octal digits, with optional sign prefix
    - 0x followed by hexadecimal digits, with optional sign prefix
    - a C character constant (i.e. 'c', '\n', '\0377', '\0x1e', etc.)
- is one of the following:
  -
  - .
  - +
  - -
- is a doubled 4-bit
- is one of the following:
  - (which is interpreted as an absolute address)
- has a limited range since it uses a signed, doubled 8-bit
  displacement from the instruction following the branch
- is an X-bit
- is %hi(), %lo(), or
=======================================================================
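
To make the %hi()/%lo() definitions in the notes concrete, here is a
small Python sketch of how a 16-bit constant load (the ldi alias)
splits into its sethi/movb pair, using the SETHI and immediate MOVB
encodings given earlier in this file. The function names hi, lo and
encode_ldi are hypothetical, and truncating the value to 16 bits
before splitting it is an assumption of this sketch rather than
documented assembler behaviour.

```python
def hi(x):
    """%hi(x) as defined in the notes: x >> 8."""
    return x >> 8


def lo(x):
    """%lo(x) as defined in the notes: x & 0xff."""
    return x & 0xFF


def encode_ldi(value, rdest):
    """Expand 'ldi value, %rdest' into [sethi word, movb word]."""
    if not 0 <= rdest <= 15:
        raise ValueError("rdest must be a register number 0..15")
    # Assumption: the constant is truncated to 16 bits first, so that
    # %hi() of it fits the 8-bit immediate field of sethi.
    value &= 0xFFFF
    sethi = 0x7000 | (rdest << 8) | hi(value)  # sets upper half, clears lower
    movb = 0xE000 | (rdest << 8) | lo(value)   # fills in the lower half
    return [sethi, movb]
```

For instance, encode_ldi(0x1234, 1) yields the words 0x7112 and
0xe134. The order matters: sethi clears the lower half of %rdest, so
it has to run before the movb that fills that half in.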