Unions and Tagged Variants
How to lower unions, tagged unions, and variant types to SBBE IR.
Many languages have types that can hold one of several alternatives: C unions, Rust enums, TypeScript discriminated unions, ML variants. This guide shows how a compiler lowers these to SBBE IR.
What we’re lowering
enum Shape {
    case circle(radius: Double)
    case rect(width: Double, height: Double)
}
func area(s: Shape) -> Double {
    match s {
    case .circle(let r):
        return 3.14159 * r * r
    case .rect(let w, let h):
        return w * h
    }
}
A Shape can be either a circle (one f64 field) or a rect (two f64 fields). The compiler must choose a memory layout that can hold any variant and a way to distinguish which variant is active at runtime. The two standard approaches are untagged unions (the programmer is responsible for tracking the active variant) and tagged unions (a discriminant field identifies the variant automatically).
Untagged unions (C-style)
A C union overlays multiple fields at the same memory address. The size of the union is the size of its largest member. In SBBE, this is trivial: all fields start at offset 0, and the frontend picks the correct ldm/stm type based on which field is being accessed.
Pseudo-code
union Value {
    var asInt: Int32    // offset 0, size 4
    var asFloat: Float  // offset 0, size 4
}
// total size: 4 bytes
Step-by-step lowering
- Layout: Both fields start at offset 0 and are 4 bytes. The union is 4 bytes total.
- Accessing asInt: Load from offset 0 with ldm i32.
- Accessing asFloat: Load from the same offset 0 with ldm f32. The raw bits in memory are reinterpreted as a float.
- No tag: There is no discriminant. The programmer (or the frontend’s type system) must track which interpretation is valid. Reading asFloat after writing asInt is a type-pun and produces the IEEE 754 interpretation of whatever integer bit pattern was stored.
- Alternative: The cast instruction can reinterpret a value already on the stack without going through memory, which avoids the load/store round-trip.
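For readers who think in C, the steps above can be sketched roughly as follows. This is an illustration only, not part of SBBE: the names Value, as_int, as_float, and int_bits_as_float are assumptions made for this sketch, and it assumes float is a 4-byte IEEE 754 binary32.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical C counterpart of the pseudo-code union. */
union Value {
    int32_t as_int;    /* offset 0, size 4 */
    float   as_float;  /* offset 0, size 4 */
};

/* Both members overlap, so the union is as large as its largest member.
   Assumes a 4-byte float, which holds on mainstream targets. */
static_assert(sizeof(union Value) == 4, "union is 4 bytes");

/* Store through one member, load through the other: the analogue of
   stm i32 followed by ldm f32 on the same pointer. */
static float int_bits_as_float(int32_t bits) {
    union Value v;
    v.as_int = bits;
    return v.as_float;  /* type-pun through a union, permitted in C99 and later */
}
```

Calling int_bits_as_float(0x3F800000) yields 1.0f, because that bit pattern is the binary32 encoding of 1.0.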
SBBE lowering
Reading the int interpretation:
ld $val // ptr to union
ldm i32 // read as i32
Reading the float interpretation of the same memory:
ld $val // same ptr
ldm f32 // read as f32
Using cast to reinterpret without a memory round-trip:
ldi 0x3F800000 // IEEE 754 for 1.0f
cast i32 f32 // reinterpret bits as f32, now 1.0 is on the stack
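The distinction that matters here is between reinterpreting bits and converting a value. A rough C sketch of the same idea is shown below; bits_to_f32 and f32_to_bits are hypothetical helper names, and the sketch assumes a 4-byte IEEE 754 float. A fixed-size memcpy is the portable way to express a bit reinterpretation in C, and compilers typically reduce it to a register move.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of an integer as a float without changing them. */
static float bits_to_f32(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* The reverse direction, useful for inspecting a float's bit pattern. */
static uint32_t f32_to_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

By contrast, a value conversion such as (float)bits applied to 0x3F800000 would produce a number near 1.07e9 rather than 1.0, because it converts the integer's numeric value instead of reusing its bits.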
Tagged unions (discriminated variants)
Most modern languages use tagged unions where a discriminant field identifies the active variant. The standard lowering is a tag word followed by the payload.
Pseudo-code
enum Shape {
    case circle(radius: Double)               // tag 0, payload: 8 bytes
    case rect(width: Double, height: Double)  // tag 1, payload: 16 bytes
}
Step-by-step lowering
- Layout decision: The tag is an i32 at offset 0. The payload starts at offset 8 (not offset 4) because f64 values require 8-byte alignment. This means there are 4 bytes of padding between the tag and the payload.
- Total size: The largest variant is rect with 16 bytes of payload. Total: 4 (tag) + 4 (padding) + 16 (payload) = 24 bytes.
- Constructing a variant: Allocate 24 bytes, write the tag, then write the payload fields at their offsets.
- Dispatching: Load the tag with ldm i32, then branch. For two variants, eqz + jmp.if is sufficient. For many variants, jmpt (branch table) is more efficient.
- Why put the tag first? Loading the tag is the first thing every dispatch does. Putting it at offset 0 means dispatch is always ldm i32 from the base pointer with no offset computation.
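The layout arithmetic above can be checked with a small C sketch. The helper names (align_up, payload_offset, tagged_union_size) and the struct Shape mirror are assumptions for illustration. The static assertions hold on typical 64-bit ABIs where double is 8 bytes with 8-byte alignment; other targets may lay the struct out differently.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t, offsetof */
#include <stdint.h>

/* Round offset up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t offset, size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

/* Payload offset: the tag size rounded up to the payload's alignment. */
static size_t payload_offset(size_t tag_size, size_t payload_align) {
    return align_up(tag_size, payload_align);
}

/* Total size: payload offset plus the largest payload, rounded up so that
   arrays of the type keep every element aligned. Assumes the payload's
   alignment is at least the tag's alignment. */
static size_t tagged_union_size(size_t tag_size, size_t payload_align,
                                size_t max_payload) {
    return align_up(payload_offset(tag_size, payload_align) + max_payload,
                    payload_align);
}

/* Hypothetical C mirror of the Shape layout: i32 tag, then the payload. */
struct Shape {
    int32_t tag;                                /* 0 = circle, 1 = rect */
    union {
        struct { double radius; } circle;       /* 8 bytes  */
        struct { double width, height; } rect;  /* 16 bytes */
    } payload;
};

static_assert(offsetof(struct Shape, payload) == 8, "payload at offset 8");
static_assert(sizeof(struct Shape) == 24, "4 tag + 4 padding + 16 payload");
```

With a 4-byte tag, 8-byte payload alignment, and a 16-byte largest payload, tagged_union_size(4, 8, 16) evaluates to 24, matching the total computed in the list.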
Constructing a Circle
func $make_circle(f64) -> ptr {
    var $shape ptr
entry:
    ldi 24
    salloc
    str $shape
    // Write tag = 0 (Circle)
    ld $shape
    ldi 0
    stm i32
    // Write radius at offset 8
    ld $shape
    ldi 8
    add.s i64
    ldl 0 // radius parameter
    stm f64
    ld $shape
    ret
}
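The same sequence of stores can be mimicked in C by writing into a byte buffer at explicit offsets. This is a sketch under assumptions: the function and constant names are hypothetical, and the 24-byte buffer is supplied by the caller, which is a choice made for this example rather than something the IR listing prescribes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SHAPE_SIZE = 24, TAG_OFFSET = 0, PAYLOAD_OFFSET = 8 };
enum { TAG_CIRCLE = 0, TAG_RECT = 1 };

/* Write a circle into a caller-provided SHAPE_SIZE-byte buffer. */
static void write_circle(unsigned char *shape, double radius) {
    int32_t tag = TAG_CIRCLE;
    memcpy(shape + TAG_OFFSET, &tag, sizeof tag);            /* like stm i32 at offset 0 */
    memcpy(shape + PAYLOAD_OFFSET, &radius, sizeof radius);  /* like stm f64 at offset 8 */
}

/* Write a rect: width at offset 8, height at offset 16. */
static void write_rect(unsigned char *shape, double width, double height) {
    int32_t tag = TAG_RECT;
    memcpy(shape + TAG_OFFSET, &tag, sizeof tag);
    memcpy(shape + PAYLOAD_OFFSET, &width, sizeof width);
    memcpy(shape + PAYLOAD_OFFSET + 8, &height, sizeof height);
}

/* Read the tag back from offset 0. */
static int32_t read_tag(const unsigned char *shape) {
    int32_t tag;
    memcpy(&tag, shape + TAG_OFFSET, sizeof tag);
    return tag;
}

/* Read an f64 field at the given byte offset. */
static double read_f64(const unsigned char *shape, size_t offset) {
    double value;
    memcpy(&value, shape + offset, sizeof value);
    return value;
}
```

Writing a rect over a buffer that previously held a circle simply overwrites the tag and reuses the payload bytes, which is the overlay behavior the layout is designed for.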
Dispatching on the tag
The area function loads the tag and branches to the correct handler:
func $area(ptr) -> f64 {
entry:
    ldl 0 // shape ptr
    ldm i32 // load tag
    // Check if Circle (tag == 0)
    eqz i32
    jmp.if circle
    jmp rect
circle:
    ldl 0
    ldi 8
    add.s i64
    ldm f64 // radius
    dup
    fmul f64 // radius * radius
    ldc f64 3.14159265358979
    fmul f64 // pi * r^2
    ret
rect:
    ldl 0
    ldi 8
    add.s i64
    ldm f64 // width
    ldl 0
    ldi 16
    add.s i64
    ldm f64 // height
    fmul f64 // width * height
    ret
}
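To compare the two dispatch strategies mentioned earlier, here is a hedged C sketch. area_branch tests the tag against zero, much like eqz followed by jmp.if, while area_table indexes a table of handlers by tag, which is only loosely analogous to a branch table such as jmpt. All names are hypothetical, and the table version assumes the tag is always in range.

```c
#include <stdint.h>

enum ShapeTag { TAG_CIRCLE = 0, TAG_RECT = 1, TAG_COUNT };

struct Shape {
    int32_t tag;
    union {
        struct { double radius; } circle;
        struct { double width, height; } rect;
    } payload;
};

static double area_circle(const struct Shape *s) {
    double r = s->payload.circle.radius;
    return 3.14159265358979 * (r * r);
}

static double area_rect(const struct Shape *s) {
    return s->payload.rect.width * s->payload.rect.height;
}

/* Two-way dispatch: tag 0 is circle, anything else is treated as rect. */
static double area_branch(const struct Shape *s) {
    return s->tag == TAG_CIRCLE ? area_circle(s) : area_rect(s);
}

/* Table dispatch: one handler per tag, selected by indexing. */
typedef double (*AreaFn)(const struct Shape *);
static const AreaFn AREA_HANDLERS[TAG_COUNT] = { area_circle, area_rect };

static double area_table(const struct Shape *s) {
    return AREA_HANDLERS[s->tag](s);  /* assumes 0 <= tag < TAG_COUNT */
}
```

With two variants the branch form is the simpler choice. The table form costs the same single indexed lookup regardless of how many variants exist, which is why it becomes attractive as the variant count grows.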
Option/Maybe types
Pseudo-code
func unwrapOr<T>(opt: Optional<T>, default: T) -> T {
    match opt {
    case .some(let value): return value
    case .none: return default
    }
}
Step-by-step lowering
An Option<T> (written Optional<T> in the pseudo-code above) is a tagged union with two variants: None (tag 0, no payload) and Some(T) (tag 1, payload follows). For pointer-sized types, a common optimization is to use a null pointer as None, avoiding the tag entirely:
- Null optimization: If T is a pointer type and the language guarantees non-null pointers for valid values, then ptr == 0 means None and any nonzero pointer means Some(ptr). This saves 4+ bytes per optional and avoids the tag load.
- Checking: eqz i32 tests for null. If null, jump to the default path.
SBBE lowering
// Option<ptr> — null = None, non-null = Some
func $unwrap_or(ptr, ptr) -> ptr {
entry:
    ldl 0 // the option value (ptr)
    eqz i32 // is it null?
    jmp.if use_default
    jmp use_value
use_value:
    ldl 0
    ret
use_default:
    ldl 1 // default value
    ret
}
For non-pointer types, use the tag + payload layout described above.
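The two encodings can be put side by side in a short C sketch. The names unwrap_or_ptr, OptionI32, and unwrap_or_i32 are hypothetical and exist only for this illustration. The pointer version relies on the assumption stated above, that a valid value is never null.

```c
#include <stddef.h>
#include <stdint.h>

/* Option<ptr>: NULL encodes None, any other pointer encodes Some(ptr). */
static const int *unwrap_or_ptr(const int *opt, const int *fallback) {
    return opt != NULL ? opt : fallback;
}

/* Option<i32>: every bit pattern of the payload is a valid value, so there
   is no spare pattern for None and an explicit tag is needed. */
struct OptionI32 {
    int32_t tag;    /* 0 = None, 1 = Some */
    int32_t value;  /* meaningful only when tag == 1 */
};

static int32_t unwrap_or_i32(struct OptionI32 opt, int32_t fallback) {
    return opt.tag == 1 ? opt.value : fallback;
}
```

The pointer form occupies exactly one pointer, while the tagged form here doubles the footprint of a bare i32. That difference is the space saving the null optimization is after.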