Many languages have types that can hold one of several alternatives: C union, Rust enum, TypeScript discriminated unions, ML variants. This guide shows how a compiler lowers these to SBBE IR.

What we’re lowering

enum Shape {
    case circle(radius: Double)
    case rect(width: Double, height: Double)
}

func area(s: Shape) -> Double {
    match s {
    case .circle(let r):
        return 3.14159 * r * r
    case .rect(let w, let h):
        return w * h
    }
}

A Shape can be either a circle (one f64 field) or a rect (two f64 fields). The compiler must choose a memory layout that can hold any variant and a way to distinguish which variant is active at runtime. The two standard approaches are untagged unions (the programmer is responsible for tracking the active variant) and tagged unions (a discriminant field identifies the variant automatically).

Untagged unions (C-style)

A C union overlays multiple fields at the same memory address. The size of the union is the size of its largest member. In SBBE, this is trivial: all fields start at offset 0, and the frontend picks the correct ldm/stm type based on which field is being accessed.

Pseudo-code

union Value {
    var asInt: Int32    // offset 0, size 4
    var asFloat: Float  // offset 0, size 4
}
// total size: 4 bytes

Step-by-step lowering

  1. Layout: Both fields start at offset 0 and are 4 bytes. The union is 4 bytes total.
  2. Accessing asInt: Load from offset 0 with ldm i32.
  3. Accessing asFloat: Load from the same offset 0 with ldm f32. The raw bits in memory are reinterpreted as a float.
  4. No tag: There is no discriminant. The programmer (or the frontend’s type system) must track which interpretation is valid. Reading asFloat after writing asInt is a type-pun and produces the IEEE 754 interpretation of whatever integer bit pattern was stored.
  5. Alternative: The cast instruction can reinterpret a value already on the stack without going through memory, which avoids the load/store round-trip.

SBBE lowering

Reading the int interpretation:

ld $val            // ptr to union
ldm i32            // read as i32

Reading the float interpretation of the same memory:

ld $val            // same ptr
ldm f32            // read as f32

Using cast to reinterpret without a memory round-trip:

ldi 0x3F800000     // IEEE 754 for 1.0f
cast i32 f32       // reinterpret bits as f32, now 1.0 is on the stack

Tagged unions (discriminated variants)

Most modern languages use tagged unions where a discriminant field identifies the active variant. The standard lowering is a tag word followed by the payload.

Pseudo-code

enum Shape {
    case circle(radius: Double)        // tag 0, payload: 8 bytes
    case rect(width: Double, height: Double)  // tag 1, payload: 16 bytes
}

Step-by-step lowering

  1. Layout decision: The tag is an i32 at offset 0. The payload starts at offset 8 (not offset 4) because f64 values require 8-byte alignment. This means there are 4 bytes of padding between the tag and the payload.
  2. Total size: The largest variant is rect with 16 bytes of payload. Total: 4 (tag) + 4 (padding) + 16 (payload) = 24 bytes.
  3. Constructing a variant: Allocate 24 bytes, write the tag, then write the payload fields at their offsets.
  4. Dispatching: Load the tag with ldm i32, then branch. For two variants, eqz + jmp.if is sufficient. For many variants, jmpt (branch table) is more efficient.
  5. Why put the tag first? Loading the tag is the first thing every dispatch does. Putting it at offset 0 means dispatch is always ldm i32 from the base pointer with no offset computation.

Constructing a Circle

func $make_circle(f64) -> ptr {
    var $shape ptr

entry:
    ldi 24
    salloc
    str $shape

    // Write tag = 0 (Circle)
    ld $shape
    ldi 0
    stm i32

    // Write radius at offset 8
    ld $shape
    ldi 8
    add.s i64
    ldl 0              // radius parameter
    stm f64

    ld $shape
    ret
}

Dispatching on the tag

The area function loads the tag and branches to the correct handler:

func $area(ptr) -> f64 {
entry:
    ldl 0              // shape ptr
    ldm i32            // load tag

    // Check if Circle (tag == 0)
    eqz i32
    jmp.if circle
    jmp rect

circle:
    ldl 0
    ldi 8
    add.s i64
    ldm f64            // radius
    dup
    fmul f64           // radius * radius
    ldc f64 3.14159265358979
    fmul f64           // pi * r^2
    ret

rect:
    ldl 0
    ldi 8
    add.s i64
    ldm f64            // width
    ldl 0
    ldi 16
    add.s i64
    ldm f64            // height
    fmul f64           // width * height
    ret
}

Option/Maybe types

Pseudo-code

func unwrapOr<T>(opt: Optional<T>, default: T) -> T {
    match opt {
    case .some(let value): return value
    case .none:            return default
    }
}

Step-by-step lowering

An Option<T> is a tagged union with two variants: None (tag 0, no payload) and Some(T) (tag 1, payload follows). For pointer-sized types, a common optimization is to use a null pointer as None, avoiding the tag entirely:

  1. Null optimization: If T is a pointer type and the language guarantees non-null pointers for valid values, then ptr == 0 means None and any nonzero pointer means Some(ptr). This saves 4+ bytes per optional and avoids the tag load.
  2. Checking: eqz i32 tests for null. If null, jump to the default path.

SBBE lowering

// Option<ptr> — null = None, non-null = Some
func $unwrap_or(ptr, ptr) -> ptr {
entry:
    ldl 0              // the option value (ptr)
    eqz i32            // is it null?
    jmp.if use_default
    jmp use_value

use_value:
    ldl 0
    ret

use_default:
    ldl 1              // default value
    ret
}

For non-pointer types, use the tag + payload layout described above.