IL & text format
This page covers the .sbbe text format, including its lexical conventions, literals, keywords, and source mapping. It describes surface syntax only: for instruction semantics, see the Instruction Set page; for type names, see the Types page.
A .sbbe file is a single translation unit: a flat list of top-level forms (source
directives, globals, externs, functions, and constant-pool data declarations) in declaration order.
Minimal example
file "test.c"
func $add(i32, i32) -> i32 {
entry:
ldl 0
ldl 1
add.s i32
ret
}
Whitespace and comments
Spaces, tabs, carriage returns, and newlines are insignificant between tokens. Line comments
begin with // and extend to end of line. There are no block comments.
ldi 42 // a trailing comment is fine
// a whole-line comment too
Identifiers
Names always appear with a leading $ sigil: $x, $add, $_counter. The sigil is part
of the reference syntax, not the stored name. An identifier body is one or more characters
drawn from [A-Za-z0-9_.]. The grammar imposes no maximum length, but the parser
currently truncates names at 63 characters.
Locals, globals, functions, and externs share the same $name syntax; scope is resolved at the
use site. Inside a function body, ld $x checks locals first, then globals, so a local shadows any global of the same name.
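To make the identifier rules concrete, here is a small C sketch of how a lexer might read a $name token: it accepts the [A-Za-z0-9_.] class and keeps at most 63 body characters. This is a hypothetical illustration, not the actual parser; the function lex_ident, the SBBE_MAX_IDENT constant, and the choice to keep consuming (but drop) characters beyond the limit are assumptions.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical limit mirroring the documented 63-character truncation. */
#define SBBE_MAX_IDENT 63

/* True if c is in the identifier class [A-Za-z0-9_.] (C locale assumed). */
static int is_ident_char(char c) {
    return isalnum((unsigned char)c) || c == '_' || c == '.';
}

/* Lexes "$name" at src. Copies at most SBBE_MAX_IDENT body characters
 * (without the sigil) into out, NUL-terminates them, and returns the number
 * of source characters consumed, sigil included. Returns 0 if src does not
 * start with '$' followed by at least one identifier character. */
size_t lex_ident(const char *src, char out[SBBE_MAX_IDENT + 1]) {
    if (src[0] != '$' || !is_ident_char(src[1]))
        return 0;
    size_t i = 1, n = 0;
    while (is_ident_char(src[i])) {
        if (n < SBBE_MAX_IDENT)
            out[n++] = src[i];  /* characters past the limit are consumed but dropped */
        i++;
    }
    out[n] = '\0';
    return i;
}
```

Because the sigil is not stored, "$add" and a later reference to "$add" both yield the body "add", which is what scope lookup compares.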
Block labels
Labels are bare words (no $) followed by :, appearing at the start of a line:
entry:
loop_header:
done:
Label bodies use the same character class as identifiers. Labels are function-scoped and resolved after the body is parsed, so forward references are allowed.
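Because labels are resolved only after the body is parsed, a two-pass scheme is enough to support forward references. The following C sketch assumes a hypothetical LabelTable that records labels in block order (pass one) and maps jump targets to block indices once the body is complete (pass two). Rejecting duplicate labels is an assumption; this page does not say how duplicates are handled.

```c
#include <string.h>

/* Hypothetical fixed capacity for illustration. */
#define MAX_BLOCKS 64

/* labels[i] is the label of block i in source order. The strings are
 * borrowed from the caller, not copied. */
typedef struct {
    const char *labels[MAX_BLOCKS];
    int count;
} LabelTable;

/* Pass 1: record a label when its block is parsed. Returns the block index,
 * or -1 for a duplicate (an assumed rule) or a full table. */
int label_define(LabelTable *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->labels[i], name) == 0)
            return -1;
    if (t->count == MAX_BLOCKS)
        return -1;
    t->labels[t->count] = name;
    return t->count++;
}

/* Pass 2: after the whole body is parsed, map a jmp/jmp.if target to a block
 * index. Every label is known by now, so forward references resolve. */
int label_resolve(const LabelTable *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->labels[i], name) == 0)
            return i;
    return -1;  /* undefined label */
}
```

A jmp.if done that appears in entry would be stored by name during parsing and passed to label_resolve only after done: has been recorded.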
Integer literals
Integers are decimal or hexadecimal. Hex uses a 0x or 0X prefix. A leading - negates.
There are no digit separators, no binary or octal prefixes, and no unsigned suffix.
ldi 42
ldi -1
ldc i32 0xDEADBEEF
ldc i64 0x7FFFFFFFFFFFFFFF
An integer token followed immediately by ., e, or E is rejected as an integer and
re-parsed as a float.
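The integer rules, including the hand-off to float parsing, can be sketched in C as follows. The names lex_int and IntLex are hypothetical, overflow checking is omitted, and treating e and E inside a hex literal as hex digits (so 0xDEADBEEF stays an integer) is an assumption about how the real lexer scopes the float check.

```c
#include <ctype.h>
#include <stdint.h>

typedef enum { INT_OK, INT_NOT_INT, INT_IS_FLOAT } IntLex;

/* Parses an optional '-', then either decimal digits or a 0x/0X hex literal.
 * On INT_OK, stores the value in *out and the first unconsumed character in
 * *end. A token immediately followed by '.', 'e', or 'E' yields INT_IS_FLOAT
 * so the caller can re-parse it as a float. Overflow wraps silently. */
IntLex lex_int(const char *s, int64_t *out, const char **end) {
    const char *p = s;
    int neg = 0;
    if (*p == '-') { neg = 1; p++; }
    unsigned base = 10;
    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
        base = 16;
        p += 2;
        if (!isxdigit((unsigned char)*p)) return INT_NOT_INT;
    } else if (!isdigit((unsigned char)*p)) {
        return INT_NOT_INT;
    }
    uint64_t v = 0;
    for (;;) {
        unsigned d;
        if (isdigit((unsigned char)*p))
            d = (unsigned)(*p - '0');
        else if (base == 16 && isxdigit((unsigned char)*p))
            d = (unsigned)(tolower((unsigned char)*p) - 'a') + 10;  /* hex letters, including e/E */
        else
            break;
        v = v * base + d;
        p++;
    }
    if (*p == '.' || *p == 'e' || *p == 'E') return INT_IS_FLOAT;
    /* Negation is done in uint64_t; the cast assumes two's complement. */
    *out = neg ? (int64_t)(0 - v) : (int64_t)v;
    *end = p;
    return INT_OK;
}
```

A caller that receives INT_IS_FLOAT would rewind to the start of the token and hand it to the float path.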
Float literals
Float literals are anything accepted by strtod; the usual form is a decimal significand with an
optional fraction and an optional exponent. The parser does not distinguish between f32 and f64
lexically: the instruction’s type operand selects the target width, and the value is
rounded at encode time.
ldc f64 3.14
ldc f32 -0.5
ldc f64 1e-9
ldc f64 6.022e23
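A minimal sketch of the float path, assuming the lexer parses every float literal once with strtod into a double and defers the width decision to encoding. lex_float, encode_f32, and encode_f64 are hypothetical helper names; the cast to float is one plausible way to perform the round-to-f32 step.

```c
#include <stdlib.h>

/* Parses one float literal with strtod. On return, *end points just past the
 * literal; *end == s means no float was recognized. */
double lex_float(const char *s, const char **end) {
    char *e;
    double d = strtod(s, &e);
    *end = e;
    return d;
}

/* At encode time the type operand picks the width: an f32 operand is rounded
 * from the parsed double, while an f64 operand is stored unchanged. */
float encode_f32(double d) { return (float)d; }
double encode_f64(double d) { return d; }
```

Parsing once at double precision and rounding later means ldc f32 3.14 and ldc f64 3.14 share a lexer path and differ only in the encoded value.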
String literals
Strings appear in two places: the file directive and ptr initializers (ldc ptr, ptr globals, and data slots).
They are enclosed in double quotes. The recognized escapes are \n, \t, \r, \\,
\", and \0; a backslash followed by any other character yields that character unchanged
(for example, \q becomes q). String constants are automatically null-terminated when stored as data.
ldc ptr "hello\n"
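The escape table above maps directly onto a small decoder. This C sketch is hypothetical (decode_string is not a documented API), assumes the caller supplies a large enough output buffer, and reports an unterminated literal by returning 0.

```c
#include <stddef.h>

/* Decodes a string literal that starts at the opening quote in src. Writes the
 * bytes plus a terminating NUL into out and returns the number of bytes stored
 * (NUL included), or 0 if the closing quote is missing. */
size_t decode_string(const char *src, char *out) {
    if (*src != '"') return 0;
    size_t n = 0;
    for (const char *p = src + 1; *p; p++) {
        if (*p == '"') {
            out[n++] = '\0';         /* automatic null terminator */
            return n;
        }
        if (*p != '\\') { out[n++] = *p; continue; }
        p++;                          /* move to the escaped character */
        switch (*p) {
        case 'n': out[n++] = '\n'; break;
        case 't': out[n++] = '\t'; break;
        case 'r': out[n++] = '\r'; break;
        case '0': out[n++] = '\0'; break;
        case '\0': return 0;          /* backslash at end of input */
        default:  out[n++] = *p;      /* covers \\ and \" and any other \c */
        }
    }
    return 0;                         /* unterminated literal */
}
```

An embedded \0 is stored like any other byte, so the returned length, not strlen, is the size to copy into the data segment.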
Byte arrays
Byte arrays are an alternative ptr initializer: a comma-separated list of integer literals
in square brackets. Each element is truncated to 8 bits. No null terminator is appended.
ldc ptr [0x48, 0x69, 0x00]
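Truncation to 8 bits is ordinary modular conversion. In this hypothetical sketch, store_byte_array receives already-parsed elements and keeps the low byte of each; unlike the string path, it appends no terminator.

```c
#include <stddef.h>
#include <stdint.h>

/* Stores each parsed element's low 8 bits. Conversion to uint8_t is defined
 * as the value modulo 256, so 0x1FF and -1 both become 0xFF. No NUL is added;
 * the stored length equals the element count. */
size_t store_byte_array(const int64_t *elems, size_t count, uint8_t *out) {
    for (size_t i = 0; i < count; i++)
        out[i] = (uint8_t)elems[i];
    return count;
}
```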
Top-level forms
The top level accepts exactly these forms, each introduced by a keyword:
| Form | Purpose |
|---|---|
| file "path" | Sets the source file tag for subsequent declarations |
| var $name T = v | Mutable global with initializer |
| const $name T = v | Immutable global with initializer |
| extern var $name T | Imported mutable global, no initializer |
| extern const $name T | Imported immutable global, no initializer |
| extern func ... | Imported function declaration, no body |
| func $name(...) ... | Function declaration with body |
| data $idx T = v | Writes a typed literal into constant-pool slot $idx |
Order matters only for the file directive (which is sticky for everything following it).
Globals and functions may reference each other in any order; the parser resolves cross-
references after the full unit is parsed.
The file directive
Multiple source files can be interleaved in a single translation unit. A file directive
tags every subsequent declaration with the given path for diagnostics and debugging:
file "src/math.c"
func $add(i32, i32) -> i32 { ... }
file "src/io.c"
func $puts(ptr) -> i32 { ... }
The directive has no effect on linkage or symbol visibility (it’s purely metadata).
Global containers
Globals are named, typed storage that lives for the lifetime of the program. A var global
is mutable and may be written from any function; a const global is read-only once
initialized. Both are declared with a single type and a literal initializer joined by
=. The initializer is a literal only: no instructions, no references to other globals.
For ptr globals the literal may be a string or byte array, which the assembler lowers
into a data segment and replaces with the resulting offset.
var $counter i32 = 0 // mutable global, zero-initialized
const $max i32 = 100 // immutable global
const $banner ptr = "hello\n" // ptr initialized from a string literal
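The lowering of a ptr initializer described above can be sketched as appending the decoded bytes to a data segment and using the resulting offset as the global's value. DataSegment, its 4096-byte capacity, and data_append are hypothetical, and alignment and deduplication of identical literals are not modeled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical data segment with a fixed capacity. */
typedef struct {
    uint8_t bytes[4096];
    uint32_t size;
} DataSegment;

/* Appends a decoded initializer (string bytes including the NUL, or raw
 * byte-array bytes) and returns the offset that replaces the literal in the
 * global's initializer. Returns UINT32_MAX if the segment would overflow. */
uint32_t data_append(DataSegment *seg, const void *bytes, uint32_t len) {
    if (len > sizeof seg->bytes - seg->size)
        return UINT32_MAX;
    uint32_t off = seg->size;
    memcpy(seg->bytes + off, bytes, len);
    seg->size += len;
    return off;
}
```

Under this model, const $banner ptr = "hello\n" would occupy 7 bytes (six characters plus the terminator), and $banner would hold the offset returned for them.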
Functions
A function declaration pairs a typed signature with a body. The signature names the function
with $name, lists its parameter types in order, and optionally declares a return type with -> T.
The body is delimited by { ... } and contains var declarations and labeled blocks of
instructions, which may be freely interleaved. Execution begins at the first block in source
order regardless of its label, though entry is the conventional choice.
func $name(param-types) -> return-type {
var $x i32 // named local
var i32 // unnamed local, referenced by index
entry:
ret
}
Parameters are a comma-separated list of types (no names in the signature). The
-> return-type clause is omitted when the function returns nothing. local is accepted
as a synonym for var inside function bodies.
Locals are indexed after parameters: with two parameters and two declared locals, the
indices run 0 (param 0), 1 (param 1), 2 (first local), 3 (second local). Named
locals are addressable both by $name and by index; unnamed locals only by index.
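The indexing rule above can be expressed as a lookup that offsets each declared local by the parameter count. This C sketch is hypothetical: local_slot and its name table are assumptions, with NULL standing in for an unnamed local that can only be reached by index.

```c
#include <stddef.h>
#include <string.h>

/* Parameters occupy indices 0..param_count-1; declared locals (var or local)
 * follow in declaration order. local_names[i] names the i-th declared local,
 * or is NULL for an unnamed local. Returns the index of a named local, or -1
 * if no local has that name (parameters are unnamed in the signature). */
long local_slot(size_t param_count, const char *const *local_names,
                size_t local_count, const char *name) {
    for (size_t i = 0; i < local_count; i++)
        if (local_names[i] != NULL && strcmp(local_names[i], name) == 0)
            return (long)(param_count + i);
    return -1;
}
```

For a function taking (i32, i32) that declares var $x i32 followed by var i32, local_slot returns 2 for x, matching ldl 2; the unnamed local at index 3 has no name to look up.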
Only extern functions may omit the body. Every other function must have one, even if it consists of a single block containing only ret.
Externs
An extern declares a global or function that is defined in another translation unit or supplied
by the runtime. Extern globals carry a type but no initializer; extern functions carry a
signature but no body.
extern var $errno i32 // imported from another unit / runtime
extern func $puts(ptr) -> i32
extern func $exit(i32)
Source mapping
Any instruction, global, or function signature may carry an @ line:column suffix. The
parser and printer round-trip these positions unchanged:
var $x i32 = 0 @ 3:1
func $add(i32, i32) -> i32 @ 10:1 {
entry:
ldl 0 @ 11:3
ldl 1 @ 11:12
add.s i32 @ 11:7
ret @ 12:3
}
Both numbers are 1-based. A missing suffix means “no source location” and encodes as zeros.
The file directive supplies the file component; the @ suffix supplies the line and
column within that file.
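Parsing the suffix amounts to reading two decimal numbers around a colon. In this hypothetical C sketch, parse_srcloc fills a SrcLoc with {0, 0} when no suffix is present, mirroring the encoding of a missing location; treating a malformed suffix the same way is a simplification, since a real parser would more likely report an error.

```c
#include <ctype.h>

typedef struct { unsigned line, column; } SrcLoc;

/* Parses an optional "@ line:column" suffix at s (leading blanks allowed).
 * Returns 1 and fills *loc on success; otherwise leaves *loc as {0, 0},
 * which stands for "no source location", and returns 0. */
int parse_srcloc(const char *s, SrcLoc *loc) {
    loc->line = loc->column = 0;
    while (*s == ' ' || *s == '\t') s++;
    if (*s != '@') return 0;
    s++;
    while (*s == ' ' || *s == '\t') s++;
    if (!isdigit((unsigned char)*s)) return 0;
    unsigned line = 0, col = 0;
    while (isdigit((unsigned char)*s)) line = line * 10 + (unsigned)(*s++ - '0');
    if (*s++ != ':' || !isdigit((unsigned char)*s)) return 0;
    while (isdigit((unsigned char)*s)) col = col * 10 + (unsigned)(*s++ - '0');
    loc->line = line;
    loc->column = col;
    return 1;
}
```

The file component is not part of the suffix; a caller would pair the result with whatever path the most recent file directive set.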
Instruction syntax
An instruction occupies one line. It begins with a mnemonic, followed by operands separated
by whitespace, and may end with an @ line:column suffix. Everything from the first // to the
end of the line is a comment and is ignored.
Operands are drawn from a small vocabulary:
- Type name: i8, i16, i32, i64, f32, f64, ptr, v128, void
- Lane descriptor: i8x16, i16x8, i32x4, i64x2, f32x4, f64x2
- Integer or float literal: as described above
- Identifier: $name for a function, local, or global reference
- Bare word: a block label, memory ordering, or flag keyword
- key=value pair: for flags like align=4
Operand order, count, and types are determined by the mnemonic. The parser is strict: extra or missing operands are a parse error.
Mnemonic conventions
Mnemonics follow a few consistent patterns:
- Typed instructions take a trailing type operand: add.s i32, fmul f64, ext.u i8 i32.
- Signed and unsigned variants are spelled with a .s or .u suffix on the mnemonic: div.s, shr.u, lt.s.
- Float operations are prefixed with f: fadd, fsqrt, fle.
- Vector operations are prefixed with v: vadd, vshuf, vldm.
- Atomic operations are prefixed with a: ald, ast, armw.add, acas.
- A memory-access width narrower than the pushed/popped type is spelled on the mnemonic,
not as a separate operand: ldm.s8 i32, stm16.
Alignment hint
Memory instructions accept an optional align=N flag, where N is a power of two:
ldm i32
ldm i32 align=4
stm i64 align=8
vldm align=16
When omitted, the backend may assume natural alignment for the access width.
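The power-of-two requirement is the usual single-bit test. The helper names below are hypothetical, and effective_align simply encodes the rule stated above that an omitted flag lets the backend assume natural alignment for the access width.

```c
#include <stdint.h>

/* An align=N value is valid when N is a nonzero power of two:
 * such a value has exactly one bit set, so n & (n - 1) clears it to 0. */
int align_is_valid(uint64_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* Alignment the backend may assume: the explicit flag if present,
 * otherwise the natural alignment (the access width in bytes). */
uint64_t effective_align(int has_flag, uint64_t flag_value, uint64_t access_bytes) {
    return has_flag ? flag_value : access_bytes;
}
```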
Memory orderings
Atomic instructions end with a memory-ordering keyword (bare word, no punctuation):
ald i32 seq_cst
ast i32 release
armw.add i32 acq_rel
fence relaxed
The accepted orderings are relaxed, acquire, release, acq_rel, and seq_cst.
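Recognizing the ordering keyword is a table lookup over the five accepted spellings. The MemOrder enum and its values are hypothetical (the encoding is not documented here), and the sketch assumes keywords are case-sensitive.

```c
#include <string.h>

/* Hypothetical in-memory representation of the ordering operand. */
typedef enum {
    ORD_RELAXED, ORD_ACQUIRE, ORD_RELEASE, ORD_ACQ_REL, ORD_SEQ_CST, ORD_INVALID
} MemOrder;

/* Maps the bare-word operand of an atomic instruction to a MemOrder. */
MemOrder parse_ordering(const char *word) {
    static const struct { const char *name; MemOrder order; } table[] = {
        { "relaxed", ORD_RELAXED }, { "acquire", ORD_ACQUIRE },
        { "release", ORD_RELEASE }, { "acq_rel", ORD_ACQ_REL },
        { "seq_cst", ORD_SEQ_CST },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(word, table[i].name) == 0)
            return table[i].order;
    return ORD_INVALID;  /* not one of the five accepted keywords */
}
```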
Control-flow targets
jmp and jmp.if take a block label (bare word, no $):
jmp loop_header
jmp.if done
jmpt takes a branch-table index. Branch-table data is not yet exposed in the text format.
Constant loading
ldc is the general typed-constant form:
ldc i32 1000000
ldc f64 3.14
ldc ptr "hello\n"
ldc ptr [0x00, 0x01, 0x02]
The assembler automatically rewrites ldc to ldi whenever the value is an integer that
fits in a signed 24-bit immediate (-8388608 through 8388607), so ldc i32 42 encodes as ldi 42.
Use ldi directly only when the value is guaranteed to fit that range; otherwise prefer ldc and
let the assembler choose.
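The rewrite condition is a range check against the signed 24-bit immediate. This is a minimal sketch with a hypothetical ldc_fits_ldi predicate; it covers integer values only, since the rule above applies only to integers.

```c
#include <stdint.h>

/* Signed 24-bit immediate range: -2^23 through 2^23 - 1. */
#define IMM24_MIN (-(INT64_C(1) << 23))
#define IMM24_MAX ((INT64_C(1) << 23) - 1)

/* True when an integer ldc operand can be emitted as ldi instead. */
int ldc_fits_ldi(int64_t value) {
    return value >= IMM24_MIN && value <= IMM24_MAX;
}
```

By this rule, even the ldc i32 1000000 example above would be emitted as ldi, because 1000000 is below 8388607.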
Variable access
ld $x // load local or global by name
str $x // store local or global by name
tee $x // store-without-pop (locals only)
ldl 0 // load local by numeric index
strl 2 // store local by numeric index
ld and str resolve $name against locals first, then globals. tee is restricted to
locals; using it with a global is a parse error.
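The lookup order and the tee restriction can be sketched together. resolve_name, RefKind, and tee_is_valid are hypothetical names, and the sketch assumes the function's named locals and the unit's globals are available as plain name arrays (NULL marking an unnamed local).

```c
#include <stddef.h>
#include <string.h>

typedef enum { REF_NONE, REF_LOCAL, REF_GLOBAL } RefKind;

/* Resolves $name for ld/str/tee: named locals are checked first, then
 * globals, so a local shadows a global with the same name. On a match,
 * *index receives the position within the table that matched. */
static RefKind resolve_name(const char *name,
                            const char *const *locals, size_t nlocals,
                            const char *const *globals, size_t nglobals,
                            size_t *index) {
    for (size_t i = 0; i < nlocals; i++)
        if (locals[i] != NULL && strcmp(locals[i], name) == 0) {
            *index = i;
            return REF_LOCAL;
        }
    for (size_t i = 0; i < nglobals; i++)
        if (strcmp(globals[i], name) == 0) {
            *index = i;
            return REF_GLOBAL;
        }
    return REF_NONE;
}

/* tee accepts only locals; a global or unknown name is rejected. */
int tee_is_valid(RefKind kind) { return kind == REF_LOCAL; }
```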
Constant pool and data declarations
The data keyword writes directly into a slot of the translation unit’s constant pool:
data $0 i32 = 42
data $1 f64 = 3.14
data $2 ptr = "hello"
The $N in a data declaration is a decimal index, not a symbolic name. Slots referenced
but not declared default to zero-initialized constants of the appropriate type. Most
hand-written IR never needs data; it exists primarily to let the printer round-trip constant
tables produced by other tools.
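Default zero-initialization of undeclared slots can be modeled by tracking which slots were written. ConstPool, its 16-slot size, and the 8-byte raw storage per slot are hypothetical simplifications; ptr slots are omitted.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical constant pool: raw 8-byte storage plus a "declared" flag. */
#define POOL_SLOTS 16

typedef struct {
    uint64_t bits[POOL_SLOTS];
    unsigned char declared[POOL_SLOTS];
} ConstPool;

/* data $idx i64 = v: store the integer's bit pattern in slot idx. */
int pool_declare_i64(ConstPool *p, unsigned idx, int64_t v) {
    if (idx >= POOL_SLOTS) return -1;
    memcpy(&p->bits[idx], &v, sizeof v);
    p->declared[idx] = 1;
    return 0;
}

/* data $idx f64 = v: store the double's bit pattern (8-byte double assumed). */
int pool_declare_f64(ConstPool *p, unsigned idx, double v) {
    if (idx >= POOL_SLOTS) return -1;
    memcpy(&p->bits[idx], &v, sizeof v);
    p->declared[idx] = 1;
    return 0;
}

/* An undeclared slot reads as all-zero bits: 0 for integers, +0.0 for floats. */
uint64_t pool_read_bits(const ConstPool *p, unsigned idx) {
    return (idx < POOL_SLOTS && p->declared[idx]) ? p->bits[idx] : 0;
}
```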