Overview
What SBBE is, why it's stack-based, and what it can do.
What is SBBE?
SBBE is a stack-based backend designed for embedding within language compilers, toolchains, or game engines. Its IR format mirrors much of WebAssembly’s own stack-based design, making it easy to target from a frontend. The goal is a portable backend whose native code generators reach 70-90% of the performance of LLVM-generated code, with a fraction of the complexity and development time.
Features
- Multiple value types: integers, floats, and vectors
- Parser and printer for the .sbbe text format
- SIMD types and instructions
- Easy integration with existing toolchains and compilers (C99)
- Extremely fast code generation (compact, cache-friendly IR)
- Virtual machine for executing SBBE code directly
- Debugging support and integration with GDB / LLDB
- Lowering to other portable targets (C99, WebAssembly, LLVM v21)
- Optimization passes for performance and code size
- Cross-platform and architecture agnostic: x86, ARM, and RISC-V natively
- GPU targets through SPIR-V and other backends (instruction subset)
Supported backends
- Virtual machine interpreter
- Portable C99 output with #line directives for source-level debugging
- LLVM IR (version 21)
- WebAssembly
- x86_64
- ARM64
- RISC-V 64
- SPIR-V (GPU)
We support multiple native backends with moderate performance and fast code generation speed. These backends are designed to allow SBBE to be used directly in compilers and toolchains without relying on external toolchains and dependencies.
When top performance is required, SBBE can be used as a portable IR that lowers to LLVM IR, allowing you to take advantage of LLVM’s powerful optimization and code generation capabilities for a wide range of targets. In addition to LLVM, you can also lower to C99 and WebAssembly.
Finally, SBBE can be used as a portable IR for GPU programming by lowering to SPIR-V or other GPU-specific backends.
Why stack-based over SSA?
Most compiler backends (LLVM, Cranelift, GCC) use Static Single Assignment (SSA) as their intermediate representation. SSA names every intermediate value explicitly and uses phi nodes to merge values at control flow join points. This is powerful for analysis but introduces significant complexity for both the frontend and the backend.
A frontend targeting SBBE never has to name temporaries, manage register-like virtual variables, or insert phi nodes. Code generation follows evaluation order directly.
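To make this concrete, here is a minimal sketch of a frontend emitting stack code with a post-order walk over an expression tree. The node classes and the instruction mnemonics (const, local.get, add, mul) are hypothetical, WebAssembly-style names chosen for illustration; they are not SBBE's actual opcodes or API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Const:
    value: int


@dataclass
class Local:
    index: int


@dataclass
class BinOp:
    op: str  # hypothetical mnemonic: "add", "sub", or "mul"
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Local, BinOp]


def emit(expr, out):
    """Append stack instructions for expr to out, in evaluation order."""
    if isinstance(expr, Const):
        out.append(("const", expr.value))
    elif isinstance(expr, Local):
        out.append(("local.get", expr.index))
    elif isinstance(expr, BinOp):
        # Operands first, operator last: no temporaries are ever named.
        emit(expr.left, out)
        emit(expr.right, out)
        out.append((expr.op, None))
    else:
        raise TypeError(f"unknown node: {expr!r}")
    return out


# (a + 2) * b, assuming a and b live in local slots 0 and 1
code = emit(BinOp("mul", BinOp("add", Local(0), Const(2)), Local(1)), [])
for instruction in code:
    print(instruction)
```

The emitter needs no symbol table for intermediates and no phi placement; the recursion order alone determines the instruction order.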
Compact representation
Each instruction is 12 bytes: an 8-bit opcode, a 24-bit argument, and 64 bits of source location. No operand lists, no variable-length phi nodes, no named virtual registers. This makes the IR cache-friendly and fast to iterate.
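The size arithmetic is 8 + 24 + 64 = 96 bits, or 12 bytes. The sketch below packs and unpacks one such instruction. The field order, the little-endian byte order, and the placement of the opcode in the low 8 bits of the first word are assumptions made for illustration; the real in-memory layout may differ.

```python
import struct

# Assumed layout: one 32-bit word (opcode in the low 8 bits, argument in the
# high 24 bits) followed by a 64-bit source location, little-endian, unpadded.
INSTRUCTION_FORMAT = "<IQ"


def encode(opcode, arg, srcloc):
    """Pack one instruction into 12 bytes."""
    if not 0 <= opcode < (1 << 8):
        raise ValueError("opcode must fit in 8 bits")
    if not 0 <= arg < (1 << 24):
        raise ValueError("argument must fit in 24 bits")
    if not 0 <= srcloc < (1 << 64):
        raise ValueError("source location must fit in 64 bits")
    return struct.pack(INSTRUCTION_FORMAT, opcode | (arg << 8), srcloc)


def decode(raw):
    """Unpack 12 bytes into (opcode, arg, srcloc)."""
    word, srcloc = struct.unpack(INSTRUCTION_FORMAT, raw)
    return word & 0xFF, word >> 8, srcloc


raw = encode(opcode=3, arg=70000, srcloc=42)
print(len(raw), decode(raw))
```

Because every instruction has the same fixed size, a function body can be stored as a flat array and scanned linearly without following pointers.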
Implicit SSA
A stack-based IR is not a step backward from SSA. Every push creates a new unique value. Every pop consumes it exactly once. There is no mutation of stack slots; values flow forward through the program in strict definition-use order. This is single assignment by construction.
Where explicit SSA becomes necessary (for passes such as global value numbering, register allocation, or loop-invariant code motion), SBBE derives it internally from the stack.
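One common way to recover explicit SSA from stack code is to simulate the stack symbolically, giving each push a fresh name. The sketch below does this for straight-line code only and uses the same hypothetical mnemonics as elsewhere in this overview; it is an illustration of the idea, not a description of how SBBE implements the derivation. Control-flow joins would additionally need phi nodes or block parameters, which this sketch does not cover.

```python
def stack_to_ssa(code):
    """Convert straight-line stack code to a list of (name, op, operands)."""
    stack = []  # names of the values currently on the stack
    ssa = []    # one entry per defined value, in definition order

    def define(op, *operands):
        name = f"v{len(ssa)}"  # every push gets a fresh, unique name
        ssa.append((name, op, operands))
        stack.append(name)

    for op, arg in code:
        if op in ("const", "local.get"):
            define(op, arg)
        elif op in ("add", "sub", "mul"):
            right = stack.pop()  # each pop consumes a value exactly once
            left = stack.pop()
            define(op, left, right)
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return ssa


# (a + 2) * b in hypothetical stack form
stack_code = [
    ("local.get", 0),
    ("const", 2),
    ("add", None),
    ("local.get", 1),
    ("mul", None),
]
ssa_form = stack_to_ssa(stack_code)
for name, op, operands in ssa_form:
    print(name, "=", op, *operands)
```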
Optimizations
SBBE performs optimization passes on its stack-based IR before lowering to native code. Each pass can be enabled individually via the compiler configuration or through optimization profiles (-O1, -O2, -O3).
- Evaluates operations on known constants at compile time (constant folding)
- Removes unreachable blocks and dead instructions (dead code elimination)
- Matches common patterns and simplifies short instruction sequences and idioms (peephole optimization)
- Simplifies control flow through jump threading, resolving branches on constant conditions, and removing redundant jumps (control-flow simplification)
- Eliminates syntactically identical redundant computations within a basic block (local common sub-expression elimination)
- Inlines small or hot functions based on heuristics (call count, instruction count, etc.)
- Inlines functions annotated with inline regardless of heuristics
- Replaces expensive operations with cheaper equivalents (strength reduction)
- Eliminates redundant local variable loads and stores (copy propagation)
- Moves instructions that compute the same value on every loop iteration outside the loop (loop-invariant code motion)
- Improves recursive function performance by transforming eligible calls into jumps (tail-call optimization)
- Assigns value numbers to computations so that expressions producing the same result share a number, enabling redundancy elimination even when the expressions aren’t syntactically identical (global value numbering)
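As an illustration of how a pass can operate directly on stack code, here is a minimal sketch of constant folding over a straight-line instruction sequence. The mnemonics are hypothetical, the sketch assumes no branch targets fall inside the sequence, and it uses Python's unbounded integers, so it ignores the fixed-width wraparound a real implementation would have to model.

```python
import operator

# Hypothetical binary opcodes and the arithmetic they perform.
FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def fold_constants(code):
    """Replace `const a, const b, op` with a single `const (a op b)`."""
    out = []
    for op, arg in code:
        both_constant = (
            len(out) >= 2 and out[-1][0] == "const" and out[-2][0] == "const"
        )
        if op in FOLDABLE and both_constant:
            right = out.pop()[1]
            left = out.pop()[1]
            # The folded result is itself a constant, so chains fold fully.
            out.append(("const", FOLDABLE[op](left, right)))
        else:
            out.append((op, arg))
    return out


# (2 + 3) * x becomes 5 * x
folded = fold_constants([
    ("const", 2),
    ("const", 3),
    ("add", None),
    ("local.get", 0),
    ("mul", None),
])
print(folded)
```

Because the folded constant is pushed back onto the output, a later operator can fold it again in the same pass, so nested constant expressions collapse in a single left-to-right scan.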