SMOKE-16 'a.out' and Related Formats
------------------------------------

$Id: a_out.txt,v 1.8 2000/11/09 19:55:12 bsittler Exp $

This document contains a brief description of the SMOKE-16 binary
file formats.
                                                           --BCWS

-----------------------------------------------------------------
NOTE
=================================================================
Revisions 19980411 and earlier of toolset v1 incorrectly stamped
their object files as being written by "v0" of the toolset. Later
revisions of toolset v1 can use these "v0" files, but write v1
files, which have a different header format and default load
address and are therefore unusable by the older toolset
revisions. The '-A v0' option to `as' and `ld' writes files
usable by the older toolset revisions.
=================================================================

Note: The '.h' files mentioned below are in the 'include/smoke16'
directory.

OVERVIEW OF SMOKE-16 BINARY FILE FORMATS

The SMOKE-16 toolset uses a variant of the 'a.out' file format
for object files (OMAGIC), pure executables (NMAGIC), split I&D
executables (JMAGIC), and object archives (LMAGIC).

You may wish to refer to 'a_out.h' when reading this document, as
it contains more detailed information on the file format. In
particular, note that this is a portable file format
specification, and these structures are converted into a more
host-specific format when loaded into memory by the fget16*
functions (listed in 'libObj.h'). In the host-specific format,
alignment, padding, endianness and word size may differ from the
portable format. The fput16* functions convert back into the
portable format.

The portable format stores numbers with the most significant byte
first. Strings (see the STRING TABLE description, below) are
stored in an ASCII subset described in 'portable.txt' in the
'doc' directory.

A SMOKE-16 'a.out' file contains the following sections: header,
text section (or archive directory), data section (or member
objects), uninitialized data section (BSS), text and data
relocation sections, and symbol table.

HEADER

'struct a_out16_exec'

    +--------+--------+
  0 |     a_info      |  1   a_dynamic:1 a_toolversion:7 a_machtype:8
    +--------+--------+
  2 |     a_magic     |  3
    +--------+--------+
  4 |     a_text      |  5
    +        +        +
  6 |                 |  7
    +--------+--------+
  8 |     a_data      |  9
    +        +        +
 10 |                 | 11
    +--------+--------+
 12 |      a_bss      | 13
    +--------+--------+
 14 |     a_syms      | 15
    +--------+--------+
 16 |     a_entry     | 17
    +--------+--------+
 18 |    a_trsize     | 19
    +--------+--------+
 20 |    a_drsize     | 21
    +--------+--------+

a_toolversion   Description                 Symbol ('a_out.h')
-----------------------------------------------------------------
0               "v0" (see NOTE above)
1               v1 (current)                SMOKE16_TOOLVERSION
-----------------------------------------------------------------

a_machtype      Description                 Symbol ('a_out.h')
-----------------------------------------------------------------
120             SMOKE-16                    M_SMOKE16
-----------------------------------------------------------------

a_magic         Description                 Symbol ('a_out.h')
-----------------------------------------------------------------
0407            object or impure executable SMOKE16_OMAGIC
0410            pure executable             SMOKE16_NMAGIC
0411            split I&D executable        SMOKE16_JMAGIC
0440            object archive              SMOKE16_LMAGIC
-----------------------------------------------------------------

This is a 22-byte structure 'struct a_out16_exec'. It starts with
a two-byte information field 'a_info', which identifies the file
format version ('a_toolversion'; must be 0 or 1 as of this
writing), the machine type ('a_machtype'; must be M_SMOKE16 as of
this writing), and whether the file uses dynamic linking
('a_dynamic'; must be zero as of this writing). Next is a
two-byte magic number 'a_magic', which identifies the file as an
object file (OMAGIC), pure executable (NMAGIC), split I&D
executable (JMAGIC), or object archive (LMAGIC).
Together, these first four bytes of the header provide a
reasonably distinctive identification usable by the `file'
command; see the provided 'magic' file for more information.

The next four bytes of the header contain the length of the text
section in bytes ('a_text'). Immediately following this are four
more bytes which give the size of the initialized data section in
bytes ('a_data'). After that, there are two bytes giving the
length in bytes ('a_bss') of the uninitialized data section, or
BSS ("Block Started by Symbol", so called for historical
reasons). This section is not recorded in the object file, since
it is uninitialized. Following this are two bytes which give the
length of the symbol table in bytes ('a_syms'), and two more
bytes which give the absolute address of the program entry point
('a_entry'). The '-E' option can override this address for `as',
`ld' and `emu'. The '-e' option for `ld' can specify an entry
symbol (the default is '__entry'). Finally, the header contains
two bytes giving the length of the text relocations in bytes
('a_trsize') and two bytes giving the length of the data
relocations in bytes ('a_drsize').

TEXT SECTION

In normal object files (OMAGIC), pure executables (NMAGIC), and
split I&D executables (JMAGIC), the text section contains the
program code. The text section is loaded into the program's
instruction address space starting at address 0x0400. (In split
I&D executables (JMAGIC), the instruction address space is
separate from the data address space.) The '-Ttext' option can
override this address for `as', `ld' and `emu'.

Constant data is usually placed in the data section, but may be
placed in the text section instead (see the DATA SECTION
description, below).

DATA SECTION

In normal object files (OMAGIC), pure executables (NMAGIC), and
split I&D executables (JMAGIC), the data section contains any
initialized data the program may require.
In normal object files (OMAGIC) and pure executables (NMAGIC),
the data section is loaded into the program's data address space
immediately following the text section. In split I&D executables
(JMAGIC), the data section is loaded into the program's data
address space starting at address 0x0400. The '-Tdata' option can
override this address for `as', `ld' and `emu'.

By default, constant data (the assembler ".rdata" section) is
written to the object file in the data section. This data can
instead be placed in the text section, but that will cause
problems when generating split I&D executables (JMAGIC). The '-R'
option for `as' places constant data in the text section.

ARCHIVE DIRECTORY [LMAGIC]

'struct a_out16_dirent'

    +--------+--------+
  0 |     d_strx      |  1
    +--------+--------+
  2 |     d_magic     |  3
    +--------+--------+
  4 |     d_value     |  5
    +        +        +
  6 |                 |  7
    +--------+--------+
  8 |     d_size      |  9
    +        +        +
 10 |                 | 11
    +--------+--------+
 12 |     d_mtime     | 13
    +        +        +
 14 |                 | 15
    +--------+--------+

In object archives (LMAGIC), the text section serves as a
directory of the archive. The archive directory consists of a
sequence of 16-byte directory entry structures
'struct a_out16_dirent' with the following format: a two-byte
offset into the string table for the member's filename 'd_strx'
(see the STRING TABLE description, below), a two-byte copy of the
member's magic number 'd_magic' (see the HEADER description,
above), a four-byte offset to the member in the data section
'd_value', a four-byte length of the member 'd_size', and a
four-byte modification time 'd_mtime'.

MEMBER OBJECTS [LMAGIC]

In object archives (LMAGIC), the data section holds the member
object files.

UNINITIALIZED DATA SECTION (BSS)

The uninitialized data section is not stored in the object file.
It is loaded into the program's address space immediately
following the data section. The '-Tbss' option can override this
address for `ld' and `emu'.

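The header layout and the default load addresses described above
can be sketched in a few lines of Python. This is only an
illustration, not the toolset's own code (which uses the fget16*
functions). It assumes a v1 header, assumes that the 'a_info'
bitfields are packed from the most significant bit downwards in
the order shown in the diagram, and ignores the '-T' overrides
and any alignment padding. The names 'parse_header' and
'default_layout' are hypothetical.

```python
import struct

# Portable v1 header: nine big-endian fields, 22 bytes in total.
HEADER_FMT = ">HHIIHHHHH"
HEADER_SIZE = struct.calcsize(HEADER_FMT)

SMOKE16_OMAGIC = 0o407
SMOKE16_NMAGIC = 0o410
SMOKE16_JMAGIC = 0o411
SMOKE16_LMAGIC = 0o440
DEFAULT_BASE = 0x0400  # default load address given in this document


def parse_header(raw):
    """Unpack the first 22 bytes of a file into a dict of header fields."""
    names = ("a_info", "a_magic", "a_text", "a_data", "a_bss",
             "a_syms", "a_entry", "a_trsize", "a_drsize")
    hdr = dict(zip(names, struct.unpack(HEADER_FMT, raw[:HEADER_SIZE])))
    # ASSUMPTION: a_dynamic is the top bit, a_toolversion the next
    # seven bits, and a_machtype the low byte of a_info.
    hdr["a_dynamic"] = hdr["a_info"] >> 15
    hdr["a_toolversion"] = (hdr["a_info"] >> 8) & 0x7F
    hdr["a_machtype"] = hdr["a_info"] & 0xFF
    return hdr


def default_layout(hdr):
    """Default load addresses for an executable (not an archive)."""
    text = DEFAULT_BASE
    if hdr["a_magic"] == SMOKE16_JMAGIC:
        data = DEFAULT_BASE          # separate data address space
    else:
        data = text + hdr["a_text"]  # data immediately follows text
    bss = data + hdr["a_data"]       # BSS immediately follows data
    return {"text": text, "data": data, "bss": bss}
```

For a pure executable with 0x100 bytes of text and 0x20 bytes of
data, this sketch places the data section at 0x0500 and the BSS
at 0x0520. The result is not meaningful for object archives
(LMAGIC), where the text and data sections hold the directory and
the member objects instead.
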
TEXT AND DATA RELOCATION SECTIONS

'struct a_out16_reloc'

    +--------+--------+
  0 |   r_addr_high   |  1
    +--------+--------+
  2 |   r_addr_low    |  3
    +--------+--------+
  4 |     r_index     |  5
    +--------+--------+
  6 |     r_info      |  7   r_extern:1 r_high:1 r_low:1 ... r_type:2
    +--------+--------+
  8 |     r_value     |  9
    +--------+--------+

r_type  Description                 Symbol ('a_out.h')
-----------------------------------------------------------------
0       absolute address            SMOKE16_RELOC_ABSOLUTE
1       %pc-relative displacement   SMOKE16_RELOC_DISP8
-----------------------------------------------------------------

The relocation sections hold lists of relocations, or fixups,
which need to be made to the relevant sections before they can be
used. Each relocation is described in a 10-byte structure
'struct a_out16_reloc'. The first two bytes ('r_addr_high') hold
the address, within the relevant section, of the high byte of the
address to be relocated. The next two bytes ('r_addr_low') hold
the address of the low byte of the address to be relocated. The
next two bytes ('r_index') hold the symbol ordinal for the symbol
(in the case of symbol-relative relocations) or the section (see
'n_type' in the SYMBOL TABLE description, below) that the
relocation is relative to. The next two bytes ('r_info') hold
various flags relating to the relocation: whether it is
symbol-relative or not ('r_extern'), whether each of the high and
low bytes is to be relocated ('r_high'/'r_low'), and the type of
relocation to perform once the address is known ('r_type'). The
final two bytes ('r_value') hold the offset to be added to the
relocated address.

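As a rough illustration of the relocation record layout described
above, the following Python sketch splits a relocation section
into 10-byte records. It is not taken from the toolset. The bit
positions used for 'r_info' are an assumption: the diagram elides
part of the field, so the sketch guesses that the three flags
occupy the top three bits in the order listed and that 'r_type'
occupies the low two bits. Check 'a_out.h' before relying on
them. The name 'parse_relocs' is hypothetical.

```python
import struct

# One portable relocation record: five big-endian 16-bit fields.
RELOC_FMT = ">HHHHH"
RELOC_SIZE = struct.calcsize(RELOC_FMT)

SMOKE16_RELOC_ABSOLUTE = 0
SMOKE16_RELOC_DISP8 = 1


def parse_relocs(section):
    """Decode a text or data relocation section into a list of dicts."""
    if len(section) % RELOC_SIZE:
        raise ValueError("relocation section is not a multiple of 10 bytes")
    relocs = []
    for high, low, index, info, value in struct.iter_unpack(RELOC_FMT, section):
        relocs.append({
            "r_addr_high": high,
            "r_addr_low": low,
            "r_index": index,
            # ASSUMPTION: flags are packed from bit 15 downwards and
            # r_type sits in bits 0-1; the middle bits are not decoded.
            "r_extern": bool(info & 0x8000),
            "r_high": bool(info & 0x4000),
            "r_low": bool(info & 0x2000),
            "r_type": info & 0x0003,
            "r_value": value,
        })
    return relocs
```

The number of records in a section is 'a_trsize' or 'a_drsize'
divided by 10.
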
SYMBOL TABLE

'struct a_out16_nlist'

    +--------+--------+
  0 |     n_strx      |  1
    +--------+--------+
  2 | n_type |n_other |  3
    +--------+--------+
  4 |     n_desc      |  5
    +--------+--------+
  6 |     n_value     |  7
    +--------+--------+

n_type  Description                 Symbol ('a_out.h')
-----------------------------------------------------------------
0x01    external (flag)             SMOKE16_N_EXT
0x1e    basic types (mask)          SMOKE16_N_TYPE
0x00    undefined/common            SMOKE16_N_UNDF
0x02    absolute                    SMOKE16_N_ABS
0x04    text section                SMOKE16_N_TEXT
0x06    data section                SMOKE16_N_DATA
0x08    bss                         SMOKE16_N_BSS
0x0c    alignment                   SMOKE16_N_ALIGN
0xe0    debugging types (mask)      SMOKE16_N_STAB
-----------------------------------------------------------------

The symbol table contains a list of symbol descriptions in 8-byte
structures 'struct a_out16_nlist'. The first two bytes ('n_strx')
hold the offset into the string table (see the STRING TABLE
description, below) of the symbol's name, or zero if the symbol
is anonymous. The following byte ('n_type') contains the type of
the symbol. The type may be internal or external, and may
indicate that the symbol is to be used for debugging purposes.
The basic types are undefined/common, absolute, text-segment
relative, data-segment relative, bss-relative, and alignment. The
following byte ('n_other') contains the stab "other" field for
debugging symbols. The next two bytes ('n_desc') contain the stab
"desc" field. [stab is a debugging information format; it's not
yet properly supported by the SMOKE-16 toolset.] The final two
bytes ('n_value') contain the value or segment offset of the
symbol. For common symbols (SMOKE16_N_UNDF with non-zero
'n_value'), this is the size of the common symbol in bytes. For
alignment symbols, this is the alignment shift in bits. The
supported alignment symbols are "@t" (text section alignment),
"@d" (data section alignment), and "@b" (bss alignment).

In object archives (LMAGIC), the symbol table contains copies of
all symbols exported by the member objects.
In this case, the symbol values ('n_value') contain the ordinals
of the defining members in the directory.

STRING TABLE

    +--------+--------+
  0 |        n        |  1
    +--------+--------+
  2 |        |
...
n-1 |        |
    +--------+

The first two bytes of the string table, if present, contain 'n',
the size of the entire string table in bytes. The remainder of
the string table holds the names of symbols (indexed by the
'n_strx' field in the symbol table). In object archives (LMAGIC),
the string table is also used to hold the names of the member
objects (indexed by the 'd_strx' field in the archive directory).
Strings are referred to by their byte offset from the start of
the string table, so the lowest valid string index is 2. A string
index of 0 refers to an empty string. All strings (with the
possible exception of the last one) are terminated with a null
byte.

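The string indexing rules above, together with the symbol table
layout, can be sketched in Python as follows. This is a minimal
illustration rather than the toolset's implementation. The
function names 'get_string' and 'parse_symbols' are hypothetical,
and the sketch assumes the caller has already located the symbol
table and string table bytes within the file.

```python
import struct

# One portable symbol record: n_strx, n_type, n_other, n_desc, n_value.
NLIST_FMT = ">HBBHH"
NLIST_SIZE = struct.calcsize(NLIST_FMT)

SMOKE16_N_EXT = 0x01
SMOKE16_N_TYPE = 0x1E
SMOKE16_N_STAB = 0xE0


def get_string(strtab, index):
    """Return the string stored at byte offset 'index' of the string table."""
    if index == 0:
        return ""  # index 0 always means the empty string
    size = int.from_bytes(strtab[:2], "big")  # 'n', the whole table size
    if index < 2 or index >= size:
        raise ValueError("string index out of range")
    end = strtab.find(b"\0", index, size)
    if end == -1:
        end = size  # the last string may lack a terminating null
    return strtab[index:end].decode("ascii")


def parse_symbols(symtab, strtab):
    """Decode a symbol table into a list of dicts with resolved names."""
    symbols = []
    for strx, n_type, n_other, n_desc, n_value in struct.iter_unpack(NLIST_FMT, symtab):
        symbols.append({
            "name": get_string(strtab, strx),
            "external": bool(n_type & SMOKE16_N_EXT),
            "basic_type": n_type & SMOKE16_N_TYPE,
            "stab_bits": n_type & SMOKE16_N_STAB,
            "n_other": n_other,
            "n_desc": n_desc,
            "n_value": n_value,
        })
    return symbols
```

The same 'get_string' lookup would apply to the 'd_strx' field of
an archive directory entry, since member names share the string
table.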