Overview — SBBE

What is SBBE?

SBBE is a stack-based compiler backend designed for embedding in language compilers, toolchains, and game engines. Its IR mirrors much of WebAssembly’s stack-based design, which makes it straightforward to target from a frontend. The goal is a portable backend whose native code generators reach 70-90% of the performance of LLVM, with a fraction of the complexity and development time.

Features

Multiple value types: integers, floats, and vectors
Parser and printer for the .sbbe text format
SIMD types and instructions
Easy integration with existing toolchains and compilers (C99)
Extremely fast code generation (compact, cache-friendly IR)
Virtual machine for executing SBBE code directly
Debugging support and integration with GDB / LLDB
Lowering to other portable targets (C99, WebAssembly, LLVM v21)
Optimization passes for performance and code size
Cross-platform and architecture agnostic: x86, ARM, and RISC-V natively
GPU targets through SPIR-V and other backends (instruction subset)

Supported backends

SBBE ships multiple native backends that offer moderate runtime performance and fast code generation. They let SBBE be used directly in compilers and toolchains without relying on external toolchains or dependencies.

When top performance is required, SBBE can be used as a portable IR that lowers to LLVM IR, allowing you to take advantage of LLVM’s powerful optimization and code generation capabilities for a wide range of targets. In addition to LLVM, you can also lower to C99 and WebAssembly.

Finally, SBBE can be used as a portable IR for GPU programming by lowering to SPIR-V or other GPU-specific backends.

Why stack-based over SSA?

Most compiler backends (LLVM, Cranelift, GCC) use Static Single Assignment (SSA) as their intermediate representation. SSA names every intermediate value explicitly and uses phi nodes to merge values at control flow join points. This is powerful for analysis but introduces significant complexity for both the frontend and the backend.

A frontend targeting SBBE never has to name temporaries, manage register-like virtual variables, or insert phi nodes. Code generation follows evaluation order directly.
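As a rough illustration of "code generation follows evaluation order", the sketch below walks a toy expression tree in post-order and emits stack instructions. The node classes and the mnemonics (const, local.get, add, mul) are hypothetical and borrowed from WebAssembly-style naming; they are not taken from SBBE’s actual instruction set or API.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical expression tree for a toy frontend (not SBBE's API).
@dataclass
class Const:
    value: int

@dataclass
class Local:
    index: int

@dataclass
class BinOp:
    op: str          # assumed mnemonics: "add", "sub", "mul"
    lhs: "Expr"
    rhs: "Expr"

Expr = Union[Const, Local, BinOp]

def emit(expr: Expr, out: list) -> None:
    """Post-order walk: emit both operands, then the operator.

    No temporaries are named; each result simply stays on the stack."""
    if isinstance(expr, Const):
        out.append(("const", expr.value))
    elif isinstance(expr, Local):
        out.append(("local.get", expr.index))
    elif isinstance(expr, BinOp):
        emit(expr.lhs, out)
        emit(expr.rhs, out)
        out.append((expr.op, None))
    else:
        raise TypeError(f"unknown node: {expr!r}")

# (x + 2) * y, with x in local 0 and y in local 1
code = []
emit(BinOp("mul", BinOp("add", Local(0), Const(2)), Local(1)), code)
for name, arg in code:
    print(name if arg is None else f"{name} {arg}")
```

The emitted sequence is local.get 0, const 2, add, local.get 1, mul, which is the order in which the expression would be evaluated by hand.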

Compact representation

Each instruction is 12 bytes (96 bits): an 8-bit opcode, a 24-bit argument, and 64 bits of source location. There are no operand lists, no variable-length phi nodes, and no named virtual registers. This makes the IR cache-friendly and fast to iterate over.
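To make the 12-byte figure concrete, here is a minimal sketch of packing and unpacking one instruction. The field order, the little-endian byte order, and the choice to put the opcode in the low 8 bits of a 32-bit word are assumptions made for illustration; the text above only fixes the field widths.

```python
import struct

# Assumed layout: one little-endian 32-bit word (opcode in the low 8 bits,
# argument in the high 24 bits) followed by a 64-bit source location.
# "<" selects standard sizes with no padding, so the size is 4 + 8 = 12.
INSTR = struct.Struct("<IQ")

def encode(opcode: int, arg: int, loc: int) -> bytes:
    if not 0 <= opcode < (1 << 8):
        raise ValueError("opcode must fit in 8 bits")
    if not 0 <= arg < (1 << 24):
        raise ValueError("argument must fit in 24 bits")
    return INSTR.pack(opcode | (arg << 8), loc)

def decode(raw: bytes) -> tuple:
    word, loc = INSTR.unpack(raw)
    return word & 0xFF, word >> 8, loc

raw = encode(0x2A, 1000, 0xDEADBEEF)
print(len(raw))      # 12
print(decode(raw))   # (42, 1000, 3735928559)
```

Because every instruction has the same size, a function body can be stored as a flat array and indexed directly, which is where the cache-friendliness comes from.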

Implicit SSA

A stack-based IR is not a step backward from SSA. Every push creates a new unique value. Every pop consumes it exactly once. There is no mutation of stack slots; values flow forward through the program in strict definition-use order. This is single assignment by construction.

Where explicit SSA becomes necessary (for example in global value numbering, register allocation, or loop-invariant code motion), SBBE derives it internally from the stack.
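One common way to recover explicit value names from stack code is to simulate the operand stack with names instead of runtime values. The sketch below does this for straight-line code only and uses hypothetical mnemonics; it is not SBBE’s internal algorithm, and it leaves out control-flow joins, which would need phi nodes or block parameters.

```python
def stack_to_ssa(code):
    """Turn straight-line stack code into three-address form.

    Every push gets a fresh name, and every pop reads the name that is
    currently on top of the simulated stack."""
    stack, ssa = [], []
    counter = 0

    def fresh():
        nonlocal counter
        name = f"v{counter}"
        counter += 1
        return name

    for op, arg in code:
        if op in ("const", "local.get"):      # push one value
            dst = fresh()
            ssa.append(f"{dst} = {op} {arg}")
        elif op in ("add", "sub", "mul"):     # pop two, push one
            rhs = stack.pop()
            lhs = stack.pop()
            dst = fresh()
            ssa.append(f"{dst} = {op} {lhs}, {rhs}")
        else:
            raise ValueError(f"unsupported opcode: {op}")
        stack.append(dst)
    return ssa

# (x + 2) * y with x in local 0 and y in local 1
program = [("local.get", 0), ("const", 2), ("add", None),
           ("local.get", 1), ("mul", None)]
for line in stack_to_ssa(program):
    print(line)
```

Each name is assigned exactly once, which is the single-assignment property described above made explicit.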

Optimizations

SBBE performs optimization passes on its stack-based IR before lowering to native code. Each pass can be enabled individually via the compiler configuration or through optimization profiles (-O1, -O2, -O3).

Evaluates operations on known constants at compile time (constant folding)
Removes unreachable blocks and dead instructions (dead code elimination)
Matches common patterns and simplifies short instruction sequences and idioms (peephole optimization)
Simplifies control flow through jump threading, branch-on-constant, and redundant jump removal (control-flow simplification)
Eliminates syntactically identical redundant computations within a basic block (local common sub-expression elimination)
Inlines small or hot functions based on heuristics (call count, instruction count, etc.)
Inlines functions annotated with inline regardless of heuristics
Replaces expensive operations with cheaper equivalents (strength reduction)
Eliminates redundant local variable loads and stores (copy propagation)
Moves instructions that compute the same value on every loop iteration outside the loop (loop-invariant code motion)
Improves recursive function performance by transforming eligible calls into jumps (tail-call optimization)
Assigns value numbers to computations so that expressions producing the same result share a number, enabling redundancy elimination even when the expressions aren’t syntactically identical (global value numbering)
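As an illustration of how such passes can operate directly on stack code, here is a minimal constant-folding sketch for a single basic block. The mnemonics are hypothetical, and the arithmetic uses Python’s unbounded integers; a real pass would wrap results to the operand type’s width. It is a sketch of the technique, not SBBE’s implementation.

```python
import operator

# Assumed foldable opcodes for this sketch.
FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def fold_constants(code):
    """Fold binary operations whose two operands are known constants.

    Works on straight-line code: if the last two emitted instructions are
    both const, they are exactly the top two stack values."""
    out = []
    for op, arg in code:
        if (op in FOLDABLE and len(out) >= 2
                and out[-1][0] == "const" and out[-2][0] == "const"):
            rhs = out.pop()[1]
            lhs = out.pop()[1]
            out.append(("const", FOLDABLE[op](lhs, rhs)))
        else:
            out.append((op, arg))
    return out

# ((2 + 3) * 4) + x, with x in local 0
block = [("const", 2), ("const", 3), ("add", None),
         ("const", 4), ("mul", None),
         ("local.get", 0), ("add", None)]
print(fold_constants(block))
```

Folded results are pushed back as const instructions, so a chain such as (2 + 3) * 4 collapses in a single left-to-right scan, while the final add is left alone because one operand is only known at run time.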