Binary AST Format

The .axb binary format encodes AXON programs as pre-parsed AST nodes, eliminating lexing and parsing entirely for maximum compilation speed.

â„šī¸ Implementation Status

This format is planned for Phase 2 and is not yet implemented. The specification below describes the target design.

đŸŽ¯ Why Binary?

The binary AST format provides significant advantages over text-based S-expressions for AI-to-compiler communication.

đŸšĢ

No Parsing Errors

Pre-validated AST nodes eliminate all syntax errors — mismatched parens, bad tokens, encoding issues.

⚡

Faster Compilation

Skip lexing and parsing entirely. The compiler loads a pre-built AST directly into memory.

đŸ“Ļ

Compact Representation

Binary encoding is 40–60% smaller than equivalent text, reducing token output for AI models.

🔒

Deterministic Output

No whitespace ambiguity, no formatting variation. Identical ASTs produce identical binary files.

📄 File Format

PropertyValue
Extension.axb
Byte OrderLittle-endian
MIME Typeapplication/x-axon-binary
Magic BytesAXNB (0x41 0x58 0x4E 0x42)

📐 Header Layout

The file header is a fixed 32-byte structure at offset 0.

OffsetSizeFieldDescription
0x004 bytesMagicAXNB — file identification
0x042 bytesVersionFormat version (currently 0x0001)
0x062 bytesFlagsReserved flags (must be 0)
0x084 bytesType Table OffsetByte offset to type table
0x0C4 bytesType Table CountNumber of type entries
0x104 bytesString Table OffsetByte offset to string table
0x144 bytesString Table SizeTotal string table bytes
0x184 bytesAST Table OffsetByte offset to AST node table
0x1C4 bytesEntry PointAST node index of main function
header (hex dump)
41 58 4E 42  ;; magic: "AXNB"
01 00        ;; version: 1
00 00        ;; flags: 0
20 00 00 00  ;; type_table_offset: 32
0A 00 00 00  ;; type_table_count: 10
84 00 00 00  ;; string_table_offset: 132
40 00 00 00  ;; string_table_size: 64
C4 00 00 00  ;; ast_table_offset: 196
03 00 00 00  ;; entry_point: node #3

đŸ—ī¸ Type Encoding

Each type is encoded as a 1-byte opcode followed by type-specific payload.

OpcodeTypePayloadDescription
0x01i8—Signed 8-bit integer
0x02i16—Signed 16-bit integer
0x03i32—Signed 32-bit integer
0x04i64—Signed 64-bit integer
0x05u8—Unsigned 8-bit integer
0x06u16—Unsigned 16-bit integer
0x07u32—Unsigned 32-bit integer
0x08u64—Unsigned 64-bit integer
0x09f32—32-bit float (IEEE 754)
0x0Af64—64-bit float (IEEE 754)
0x0Bbool—Boolean (1 byte)
0x0Cvoid—Void / unit type
0x10ptrtype_idx (u16)Pointer to type at index
0x11arraytype_idx (u16) + count (u32)Fixed-size array
0x12structname_idx (u16) + field_count (u16) + fieldsNamed struct type
0x13enumname_idx (u16) + backing (u8) + variant_count (u16) + variantsNominal enum type
0x14fnptrparam_count (u16) + param_types + ret_type (u16)Function pointer type

đŸŒŗ Node Encoding

AST nodes are encoded as a 1-byte opcode followed by node-specific payload. All child references are indices into the node table.

OpcodeNodePayloadDescription
0x20Modulechild_count (u16) + child_indicesTop-level module node
0x21Fnname_idx + param_count + ret_type + body_idxFunction definition
0x22Externname_idx + param_count + ret_typeExternal function declaration
0x23Letname_idx + type_idx + init_idxVariable declaration
0x24Setname_idx + value_idxVariable mutation
0x25Constname_idx + type_idx + value_idxConstant declaration
0x28Blockchild_count (u16) + child_indicesBlock expression
0x29Ifcond_idx + then_idx + else_idxConditional expression
0x2AWhilecond_idx + body_idxWhile loop
0x2BMatchscrutinee_idx + arm_count + armsPattern match expression
0x30Callname_idx + arg_count + arg_indicesDirect function call
0x31CallIndirecttarget_idx + arg_count + arg_indicesIndirect call via fnptr
0x32Casttarget_type + value_idxType cast expression
0x38IntLittype_idx + value (i64)Integer literal
0x39FloatLittype_idx + value (f64)Float literal
0x3ABoolLitvalue (u8)Boolean literal
0x3BStringLitstring_idx (u16)String literal
0x3CEnumLittype_idx + variant_idxEnum variant literal
0x3DVarRefde_bruijn_idx (u16)Variable reference
0x40BinOpop (u8) + lhs_idx + rhs_idxBinary operation (add, sub, etc.)
0x41StructLittype_idx + field_count + value_indicesStruct literal constructor

đŸ”ĸ De Bruijn Indices

In the binary format, variable references use De Bruijn indices instead of string names. A De Bruijn index is a number that counts how many binders (let/fn parameters) to skip to reach the referenced binding. This eliminates the need for name resolution entirely.

De Bruijn Index Example
;; Text format (with names):
(fn add ((x i32) (y i32)) i32
  (add x y))

;; Binary format (with De Bruijn indices):
;;   x → index 1 (skip 1 binder to reach x)
;;   y → index 0 (nearest binder)
;;   Fn[name=0, params=2, ret=i32,
;;      body=BinOp[ADD, VarRef[1], VarRef[0]]]

De Bruijn indices make alpha-equivalence trivial — two programs are identical if their binary encodings match, regardless of the variable names used in the original text.

📊 Size Comparison

The binary format achieves significant size reductions compared to text S-expressions.

ProgramText (.axs)Binary (.axb)Reduction
Hello World128 bytes52 bytes59%
Fibonacci312 bytes124 bytes60%
Enum Match486 bytes198 bytes59%
Function Pointers624 bytes256 bytes59%
Struct Operations892 bytes372 bytes58%

Sizes are estimated projections. Actual sizes will be confirmed during Phase 2 implementation.