Binary AST Format
The .axb binary format encodes AXON programs as pre-parsed AST nodes, eliminating lexing and parsing entirely for maximum compilation speed.
ℹ️ Implementation Status
This format is planned for Phase 2 and is not yet implemented. The specification below describes the target design.
Why Binary?
The binary AST format provides significant advantages over text-based S-expressions for AI-to-compiler communication.
No Parsing Errors
Pre-validated AST nodes eliminate all syntax errors: mismatched parens, bad tokens, encoding issues.
Faster Compilation
Skip lexing and parsing entirely. The compiler loads a pre-built AST directly into memory.
Compact Representation
Binary encoding is projected to be 40–60% smaller than equivalent text, reducing token output for AI models.
Deterministic Output
No whitespace ambiguity, no formatting variation. Identical ASTs produce identical binary files.
File Format
| Property | Value |
|---|---|
| Extension | .axb |
| Byte Order | Little-endian |
| MIME Type | application/x-axon-binary |
| Magic Bytes | AXNB (0x41 0x58 0x4E 0x42) |
Header Layout
The file header is a fixed 32-byte structure at offset 0.
| Offset | Size | Field | Description |
|---|---|---|---|
| 0x00 | 4 bytes | Magic | AXNB, file identification |
| 0x04 | 2 bytes | Version | Format version (currently 0x0001) |
| 0x06 | 2 bytes | Flags | Reserved flags (must be 0) |
| 0x08 | 4 bytes | Type Table Offset | Byte offset to type table |
| 0x0C | 4 bytes | Type Table Count | Number of type entries |
| 0x10 | 4 bytes | String Table Offset | Byte offset to string table |
| 0x14 | 4 bytes | String Table Size | Total string table bytes |
| 0x18 | 4 bytes | AST Table Offset | Byte offset to AST node table |
| 0x1C | 4 bytes | Entry Point | AST node index of main function |
    41 58 4E 42    ;; magic: "AXNB"
    01 00          ;; version: 1
    00 00          ;; flags: 0
    20 00 00 00    ;; type_table_offset: 32
    0A 00 00 00    ;; type_table_count: 10
    84 00 00 00    ;; string_table_offset: 132
    40 00 00 00    ;; string_table_size: 64
    C4 00 00 00    ;; ast_table_offset: 196
    03 00 00 00    ;; entry_point: node #3
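Since the format is not yet implemented, no reference reader exists. The following is a minimal Python sketch of how the 32-byte header could be unpacked, assuming the layout in the table above with no padding between fields. The names `AxbHeader`, `parse_header`, and `EXAMPLE_HEADER` are hypothetical and not part of the specification.

```python
import struct
from typing import NamedTuple

# Layout taken from the header table: little-endian, no padding.
# 4s = magic, H = u16 (version, flags), I = u32 (offsets, counts, entry point).
HEADER_FORMAT = "<4sHHIIIIII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 32 bytes


class AxbHeader(NamedTuple):  # hypothetical name, not defined by the spec
    magic: bytes
    version: int
    flags: int
    type_table_offset: int
    type_table_count: int
    string_table_offset: int
    string_table_size: int
    ast_table_offset: int
    entry_point: int


def parse_header(data: bytes) -> AxbHeader:
    """Unpack the fixed header at offset 0 and apply basic sanity checks."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file is shorter than the 32-byte header")
    header = AxbHeader(*struct.unpack_from(HEADER_FORMAT, data, 0))
    if header.magic != b"AXNB":
        raise ValueError("bad magic bytes")
    if header.flags != 0:
        raise ValueError("reserved flags must be 0")
    return header


# The example header bytes shown in this section.
EXAMPLE_HEADER = bytes.fromhex(
    "41584E42" "0100" "0000"
    "20000000" "0A000000"
    "84000000" "40000000"
    "C4000000" "03000000"
)

if __name__ == "__main__":
    print(parse_header(EXAMPLE_HEADER))
```

Rejecting non-zero flags follows the "must be 0" rule in the table; a real reader might instead choose to ignore unknown flags, which the specification does not yet settle.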
Type Encoding
Each type is encoded as a 1-byte opcode followed by type-specific payload.
| Opcode | Type | Payload | Description |
|---|---|---|---|
| 0x01 | i8 | none | Signed 8-bit integer |
| 0x02 | i16 | none | Signed 16-bit integer |
| 0x03 | i32 | none | Signed 32-bit integer |
| 0x04 | i64 | none | Signed 64-bit integer |
| 0x05 | u8 | none | Unsigned 8-bit integer |
| 0x06 | u16 | none | Unsigned 16-bit integer |
| 0x07 | u32 | none | Unsigned 32-bit integer |
| 0x08 | u64 | none | Unsigned 64-bit integer |
| 0x09 | f32 | none | 32-bit float (IEEE 754) |
| 0x0A | f64 | none | 64-bit float (IEEE 754) |
| 0x0B | bool | none | Boolean (1 byte) |
| 0x0C | void | none | Void / unit type |
| 0x10 | ptr | type_idx (u16) | Pointer to type at index |
| 0x11 | array | type_idx (u16) + count (u32) | Fixed-size array |
| 0x12 | struct | name_idx (u16) + field_count (u16) + fields | Named struct type |
| 0x13 | enum | name_idx (u16) + backing (u8) + variant_count (u16) + variants | Nominal enum type |
| 0x14 | fnptr | param_count (u16) + param_types + ret_type (u16) | Function pointer type |
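To make the opcode-plus-payload scheme concrete, here is a rough Python sketch that serializes a few type entries. It assumes little-endian payloads (per the File Format table) and, for `fnptr`, that each entry in `param_types` is a u16 type-table index, which the table does not state. All function names are hypothetical.

```python
import struct
from typing import Sequence

# Opcodes copied from the type table above.
PRIMITIVE_OPCODES = {
    "i8": 0x01, "i16": 0x02, "i32": 0x03, "i64": 0x04,
    "u8": 0x05, "u16": 0x06, "u32": 0x07, "u64": 0x08,
    "f32": 0x09, "f64": 0x0A, "bool": 0x0B, "void": 0x0C,
}
OP_PTR = 0x10
OP_ARRAY = 0x11
OP_FNPTR = 0x14


def encode_primitive(name: str) -> bytes:
    """Primitive types are a bare opcode with no payload."""
    return bytes([PRIMITIVE_OPCODES[name]])


def encode_ptr(type_idx: int) -> bytes:
    """ptr: opcode + type_idx (u16)."""
    return struct.pack("<BH", OP_PTR, type_idx)


def encode_array(type_idx: int, count: int) -> bytes:
    """array: opcode + type_idx (u16) + count (u32)."""
    return struct.pack("<BHI", OP_ARRAY, type_idx, count)


def encode_fnptr(param_types: Sequence[int], ret_type: int) -> bytes:
    """fnptr: opcode + param_count (u16) + param_types + ret_type (u16).

    Assumption: every parameter type is stored as a u16 type-table index.
    """
    out = struct.pack("<BH", OP_FNPTR, len(param_types))
    for idx in param_types:
        out += struct.pack("<H", idx)
    out += struct.pack("<H", ret_type)
    return out


if __name__ == "__main__":
    print(encode_primitive("i32").hex())   # 03
    print(encode_ptr(2).hex())             # 100200
    print(encode_array(2, 16).hex())       # 11020010000000
    print(encode_fnptr([2, 2], 2).hex())   # 140200020002000200
```

The `struct` and `enum` opcodes are left out because the table does not specify how individual fields and variants are laid out.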
Node Encoding
AST nodes are encoded as a 1-byte opcode followed by node-specific payload. All child references are indices into the node table.
| Opcode | Node | Payload | Description |
|---|---|---|---|
| 0x20 | Module | child_count (u16) + child_indices | Top-level module node |
| 0x21 | Fn | name_idx + param_count + ret_type + body_idx | Function definition |
| 0x22 | Extern | name_idx + param_count + ret_type | External function declaration |
| 0x23 | Let | name_idx + type_idx + init_idx | Variable declaration |
| 0x24 | Set | name_idx + value_idx | Variable mutation |
| 0x25 | Const | name_idx + type_idx + value_idx | Constant declaration |
| 0x28 | Block | child_count (u16) + child_indices | Block expression |
| 0x29 | If | cond_idx + then_idx + else_idx | Conditional expression |
| 0x2A | While | cond_idx + body_idx | While loop |
| 0x2B | Match | scrutinee_idx + arm_count + arms | Pattern match expression |
| 0x30 | Call | name_idx + arg_count + arg_indices | Direct function call |
| 0x31 | CallIndirect | target_idx + arg_count + arg_indices | Indirect call via fnptr |
| 0x32 | Cast | target_type + value_idx | Type cast expression |
| 0x38 | IntLit | type_idx + value (i64) | Integer literal |
| 0x39 | FloatLit | type_idx + value (f64) | Float literal |
| 0x3A | BoolLit | value (u8) | Boolean literal |
| 0x3B | StringLit | string_idx (u16) | String literal |
| 0x3C | EnumLit | type_idx + variant_idx | Enum variant literal |
| 0x3D | VarRef | de_bruijn_idx (u16) | Variable reference |
| 0x40 | BinOp | op (u8) + lhs_idx + rhs_idx | Binary operation (add, sub, etc.) |
| 0x41 | StructLit | type_idx + field_count + value_indices | Struct literal constructor |
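As an illustration of the node table, the Python sketch below serializes a handful of leaf and container nodes. Several payload fields in the table carry no explicit width; this sketch assumes every index without a stated width (for example `type_idx` in IntLit and the entries of `child_indices`) is a little-endian u16. That width is an assumption, not something the specification confirms, and the function names are hypothetical.

```python
import struct
from typing import Sequence

# Opcodes copied from the node table above.
OP_BLOCK = 0x28
OP_INT_LIT = 0x38
OP_BOOL_LIT = 0x3A
OP_STRING_LIT = 0x3B
OP_VAR_REF = 0x3D


def encode_bool_lit(value: bool) -> bytes:
    """BoolLit: opcode + value (u8)."""
    return struct.pack("<BB", OP_BOOL_LIT, 1 if value else 0)


def encode_string_lit(string_idx: int) -> bytes:
    """StringLit: opcode + string_idx (u16)."""
    return struct.pack("<BH", OP_STRING_LIT, string_idx)


def encode_var_ref(de_bruijn_idx: int) -> bytes:
    """VarRef: opcode + de_bruijn_idx (u16)."""
    return struct.pack("<BH", OP_VAR_REF, de_bruijn_idx)


def encode_int_lit(type_idx: int, value: int) -> bytes:
    """IntLit: opcode + type_idx (assumed u16) + value (i64)."""
    return struct.pack("<BHq", OP_INT_LIT, type_idx, value)


def encode_block(child_indices: Sequence[int]) -> bytes:
    """Block: opcode + child_count (u16) + child indices (assumed u16 each)."""
    out = struct.pack("<BH", OP_BLOCK, len(child_indices))
    for idx in child_indices:
        out += struct.pack("<H", idx)
    return out


if __name__ == "__main__":
    print(encode_var_ref(1).hex())      # 3d0100
    print(encode_int_lit(2, 42).hex())  # 3802002a00000000000000
    print(encode_block([4, 5]).hex())   # 28020004000500
```

Children are referenced by node-table index rather than nested inline, which matches the statement above that all child references are indices into the node table.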
De Bruijn Indices
In the binary format, variable references use De Bruijn indices instead of string names. A De Bruijn index counts how many binders (let bindings and fn parameters) must be skipped, starting from the innermost one, to reach the referenced binding; index 0 is the nearest binder. This eliminates the need for name resolution entirely.
    ;; Text format (with names):
    (fn add ((x i32) (y i32)) i32
      (add x y))

    ;; Binary format (with De Bruijn indices):
    ;;   x → index 1 (skip 1 binder to reach x)
    ;;   y → index 0 (nearest binder)
    ;; Fn[name=0, params=2, ret=i32,
    ;;    body=BinOp[ADD, VarRef[1], VarRef[0]]]
De Bruijn indices make alpha-equivalence trivial: two programs are equivalent if their binary encodings match, regardless of the variable names used in the original text.
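The conversion from names to indices can be sketched in a few lines of Python. This assumes a front end that keeps a list of in-scope binders in declaration order, with the innermost binder last; the function name `de_bruijn_index` and the scope representation are hypothetical and only illustrate the counting rule.

```python
from typing import List


def de_bruijn_index(scope: List[str], name: str) -> int:
    """Return how many binders to skip, counting from the innermost one.

    `scope` lists binders in declaration order, so the last element is the
    nearest binder and maps to index 0.
    """
    for skipped, bound in enumerate(reversed(scope)):
        if bound == name:
            return skipped
    raise NameError(f"unbound variable: {name}")


if __name__ == "__main__":
    # Parameters of (fn add ((x i32) (y i32)) ...), in declaration order.
    scope = ["x", "y"]
    print(de_bruijn_index(scope, "x"))  # 1
    print(de_bruijn_index(scope, "y"))  # 0
```

Searching from the innermost binder outward also handles shadowing: if a name is bound twice, the nearest binding wins.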
Size Comparison
The binary format is projected to be roughly 58–60% smaller than text S-expressions for the sample programs below.
| Program | Text (.axs) | Binary (.axb) | Reduction |
|---|---|---|---|
| Hello World | 128 bytes | 52 bytes | 59% |
| Fibonacci | 312 bytes | 124 bytes | 60% |
| Enum Match | 486 bytes | 198 bytes | 59% |
| Function Pointers | 624 bytes | 256 bytes | 59% |
| Struct Operations | 892 bytes | 372 bytes | 58% |
Sizes are estimated projections. Actual sizes will be confirmed during Phase 2 implementation.
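The Reduction column follows from the two size columns as (text − binary) / text. A short Python check of that arithmetic, using the projected figures from the table (the names `reduction_percent` and `PROJECTED_SIZES` are hypothetical):

```python
def reduction_percent(text_bytes: int, binary_bytes: int) -> float:
    """Size reduction of the binary form relative to the text form, in percent."""
    return 100.0 * (text_bytes - binary_bytes) / text_bytes


# Projected sizes from the table above: (text bytes, binary bytes).
PROJECTED_SIZES = {
    "Hello World": (128, 52),
    "Fibonacci": (312, 124),
    "Enum Match": (486, 198),
    "Function Pointers": (624, 256),
    "Struct Operations": (892, 372),
}

if __name__ == "__main__":
    for program, (text_bytes, binary_bytes) in PROJECTED_SIZES.items():
        print(f"{program}: {reduction_percent(text_bytes, binary_bytes):.1f}%")
```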