Isolation Model

Every submission to Rustbox executes in a fresh, disposable sandbox. The sandbox is constructed from 8 independent kernel-level isolation layers. Escape requires simultaneously defeating all eight.

This is not container-based isolation. Rustbox composes Linux kernel primitives directly, with the construction order enforced at compile time by the Rust type system. Skip a step and the code does not compile.

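Figure: the eight layers, outermost first, wrapped around the submitted code. Escape requires defeating all eight simultaneously.

  • Layer 8, NO_NEW_PRIVS: can't regain privileges
  • Layer 7, Credential drop: UID 60000+, not root
  • Layer 6, Capabilities zeroed: no privilege escalation
  • Layer 5, Seccomp-BPF: 51 syscalls blocked
  • Layer 4, Cgroups v2: memory + CPU + PIDs
  • Layer 3, Network NS: no sockets, no DNS
  • Layer 2, Mount NS + chroot: isolated filesystem
  • Layer 1, PID NS: can't see host
  • [ your code ]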

The 8 layers

| Layer | Kernel primitive | What it prevents |
|---|---|---|
| 1. PID namespace | CLONE_NEWPID | Seeing or signalling host processes |
| 2. Mount namespace | CLONE_NEWNS + chroot | Accessing the host filesystem |
| 3. Network namespace | CLONE_NEWNET | Network access (no sockets, no DNS in Judge mode) |
| 4. Cgroups v2 | cgroup controllers | Memory bombs, fork bombs, CPU hogging |
| 5. Seccomp-BPF | BPF syscall filter | Dangerous syscalls: ptrace, mount, bpf, io_uring |
| 6. Capabilities | Bounding + ambient sets | Privilege escalation (all 5 capability sets zeroed) |
| 7. Credential drop | setresuid/setresgid | Running as root (drops to an unprivileged UID) |
| 8. NO_NEW_PRIVS | prctl(PR_SET_NO_NEW_PRIVS) | Regaining privileges via setuid binaries |

Namespaces (layers 1-3)

Namespaces give the sandbox its own view of the world. Inside the sandbox, the process sees itself as PID 1, finds an empty network stack, and has only a minimal filesystem.

We use PID, IPC, UTS, mount, and network namespaces (IPC and UTS are not counted as separate layers). We deliberately do not use user namespaces: they have a long history of privilege-escalation CVEs, and they are unnecessary when the platform already holds the capabilities needed to set up isolation directly.
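A minimal sketch of the namespace flag set described above, using the flag values from `<linux/sched.h>`. The helper `sandbox_clone_flags` is hypothetical, not Rustbox's code; a real implementation would pass these bits to `clone3(2)` or `unshare(2)` (for example through the `libc` crate) rather than only computing them. The assertion encodes the decision never to request a user namespace.

```rust
// Namespace flag values from <linux/sched.h>. clone3(2) takes them in a
// 64-bit `flags` field, so u64 is used here.
const CLONE_NEWNS: u64 = 0x0002_0000; // mount namespace
const CLONE_NEWUTS: u64 = 0x0400_0000; // hostname and domain name
const CLONE_NEWIPC: u64 = 0x0800_0000; // System V IPC and POSIX message queues
const CLONE_NEWUSER: u64 = 0x1000_0000; // user namespace (deliberately unused)
const CLONE_NEWPID: u64 = 0x2000_0000; // PID namespace
const CLONE_NEWNET: u64 = 0x4000_0000; // network namespace

/// Hypothetical helper: the flags for the sandbox child process.
/// PID, IPC, UTS, mount, and network namespaces; never a user namespace.
fn sandbox_clone_flags() -> u64 {
    let flags = CLONE_NEWPID | CLONE_NEWIPC | CLONE_NEWUTS | CLONE_NEWNS | CLONE_NEWNET;
    // Isolation is set up with real capabilities on the host,
    // not by entering a user namespace.
    assert_eq!(flags & CLONE_NEWUSER, 0, "user namespaces must not be requested");
    flags
}
```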

Cgroups v2 (layer 4)

Cgroups are the resource enforcer. They are the only Linux mechanism that can kill a process for exceeding its physical memory usage, as opposed to its virtual address space (which is all RLIMIT_AS can cap). Every sandbox gets its own cgroup with hard limits on memory, PIDs, and CPU time.
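A minimal sketch of how per-sandbox limits could map onto cgroup v2 interface files, assuming a cgroup directory has already been created under the cgroup2 mount with the memory, pids, and cpu controllers enabled. `Limits`, `cgroup_writes`, and `apply_limits` are hypothetical names. Note that `cpu.max` caps CPU bandwidth (quota per period); a total CPU-time budget would need a separate check, for example against `usage_usec` in `cpu.stat`, which this sketch leaves out.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical per-submission limits.
struct Limits {
    memory_bytes: u64,  // hard memory cap
    max_pids: u32,      // cap on live tasks (stops fork bombs)
    cpu_quota_us: u64,  // CPU time allowed per period
    cpu_period_us: u64, // length of the bandwidth period
}

/// Map each limit onto the cgroup v2 file that enforces it.
fn cgroup_writes(l: &Limits) -> Vec<(&'static str, String)> {
    vec![
        ("memory.max", l.memory_bytes.to_string()),
        ("pids.max", l.max_pids.to_string()),
        // cpu.max takes "$QUOTA $PERIOD" in microseconds.
        ("cpu.max", format!("{} {}", l.cpu_quota_us, l.cpu_period_us)),
    ]
}

/// Write every limit into an existing cgroup directory.
fn apply_limits(cgroup_dir: &Path, l: &Limits) -> io::Result<()> {
    for (file, value) in cgroup_writes(l) {
        fs::write(cgroup_dir.join(file), value)?;
    }
    Ok(())
}
```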

When a submission exceeds its memory limit, the kernel's OOM killer terminates it, and Rustbox records a Memory Limit Exceeded (MLE) verdict from the cgroup's OOM notification rather than guessing from the exit code.
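The OOM evidence lives in the cgroup's `memory.events` file, whose `oom_kill` counter records how many processes in the cgroup the OOM killer has killed (the kernel also raises a file-modified event when the file changes, so a supervisor can wait on it). A minimal sketch of turning that counter into a verdict; the `Verdict` variants and function names are hypothetical, and a real judge would check output correctness separately.

```rust
/// Hypothetical verdicts, reduced to what this sketch can decide.
#[derive(Debug, PartialEq)]
enum Verdict {
    MemoryLimitExceeded,
    RuntimeError,
    CleanExit,
}

/// Read the `oom_kill` counter from the text of a cgroup v2 `memory.events` file.
/// Each line is "<key> <count>"; a missing or malformed entry counts as 0.
fn oom_kills(memory_events: &str) -> u64 {
    memory_events
        .lines()
        .filter_map(|line| line.split_once(' '))
        .find(|(key, _)| *key == "oom_kill")
        .and_then(|(_, n)| n.trim().parse::<u64>().ok())
        .unwrap_or(0)
}

/// Classify from kernel evidence first and the exit status second.
fn classify(memory_events: &str, exited_cleanly: bool) -> Verdict {
    if oom_kills(memory_events) > 0 {
        Verdict::MemoryLimitExceeded
    } else if exited_cleanly {
        Verdict::CleanExit
    } else {
        Verdict::RuntimeError
    }
}
```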

Seccomp-BPF (layer 5)

A BPF program loaded into the kernel intercepts every syscall. Rustbox blocks 51 syscalls across categories including io_uring (a history of kernel local privilege escalation bugs), ptrace (cross-process inspection), bpf (eBPF program loading), mount/pivot_root (filesystem manipulation), and namespace escape primitives.

Three response modes are used: ENOSYS for probe syscalls that runtimes handle gracefully, EPERM for diagnostic syscalls, and KILL for exploit-class syscalls.
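A minimal sketch of how the three response modes translate into the 32-bit values a seccomp BPF program returns, using the `SECCOMP_RET_*` constants from `<linux/seccomp.h>` and the errno values used on x86-64 and arm64. The syscall-to-tier mapping in `action_for` is illustrative only (the `syslog` entry in particular is a hypothetical EPERM example, not taken from Rustbox's rule table), and matching on names is a simplification: a real filter compares syscall numbers and checks the architecture field first.

```rust
// Filter return values from <linux/seccomp.h>.
const SECCOMP_RET_KILL_PROCESS: u32 = 0x8000_0000;
const SECCOMP_RET_ERRNO: u32 = 0x0005_0000; // low 16 bits carry the errno
const SECCOMP_RET_ALLOW: u32 = 0x7fff_0000;

// errno values on x86-64 and arm64.
const EPERM: u32 = 1;
const ENOSYS: u32 = 38;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Action {
    Allow,
    Enosys, // "not implemented": runtimes fall back gracefully
    Eperm,  // "not permitted": visible in diagnostics
    Kill,   // exploit-class: terminate the whole process
}

impl Action {
    /// Encode the action as the value the BPF program returns to the kernel.
    fn ret_value(self) -> u32 {
        match self {
            Action::Allow => SECCOMP_RET_ALLOW,
            Action::Enosys => SECCOMP_RET_ERRNO | ENOSYS,
            Action::Eperm => SECCOMP_RET_ERRNO | EPERM,
            Action::Kill => SECCOMP_RET_KILL_PROCESS,
        }
    }
}

/// Illustrative tiering only; not Rustbox's actual rule table.
fn action_for(syscall: &str) -> Action {
    match syscall {
        "io_uring_setup" | "process_vm_readv" => Action::Enosys,
        "syslog" => Action::Eperm, // hypothetical diagnostic example
        "ptrace" | "bpf" | "mount" | "pivot_root" => Action::Kill,
        _ => Action::Allow,
    }
}
```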

See Seccomp Filtering for the full rule table.

Privilege stripping (layers 6-8)

After the sandbox environment is constructed, all privileges are stripped in three ordered steps:

  1. Drop bounding + ambient capabilities - controls what the process can gain
  2. Drop to unprivileged UID/GID - gives up root
  3. Zero remaining capability sets + set NO_NEW_PRIVS - makes privilege loss permanent
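Why the order matters can be shown with a toy model of the five capability sets. Two kernel rules drive it: `PR_CAPBSET_DROP` needs `CAP_SETPCAP` in the effective set, so step 1 must run while still root; and per capabilities(7), when a UID change leaves none of the real, effective, or saved UIDs at 0 (and at least one was 0 before), the kernel clears the permitted, effective, and ambient sets but not the inheritable set, which is why step 3 still has work to do. `CapSets` and `Creds` are hypothetical, and the model tracks a single UID instead of real/effective/saved.

```rust
/// Capability number of CAP_SETPCAP in <linux/capability.h>.
const CAP_SETPCAP: u32 = 8;

/// The five capability sets of a thread, modelled as 64-bit masks.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct CapSets {
    effective: u64,
    permitted: u64,
    inheritable: u64,
    bounding: u64,
    ambient: u64,
}

/// Toy model of a process's credentials (one UID instead of three).
struct Creds {
    uid: u32,
    caps: CapSets,
    no_new_privs: bool,
}

impl Creds {
    /// Step 1: models PR_CAPBSET_DROP for every capability plus clearing
    /// the ambient set. Dropping from the bounding set needs CAP_SETPCAP.
    fn drop_bounding_and_ambient(&mut self) -> Result<(), &'static str> {
        if (self.caps.effective & (1u64 << CAP_SETPCAP)) == 0 {
            return Err("PR_CAPBSET_DROP requires CAP_SETPCAP");
        }
        self.caps.bounding = 0;
        self.caps.ambient = 0;
        Ok(())
    }

    /// Step 2: models setresuid(2) from root to a nonzero UID. The kernel
    /// clears permitted, effective, and ambient; inheritable is untouched.
    fn setresuid(&mut self, uid: u32) {
        if self.uid == 0 && uid != 0 {
            self.caps.permitted = 0;
            self.caps.effective = 0;
            self.caps.ambient = 0;
        }
        self.uid = uid;
    }

    /// Step 3: models capset(2) with empty sets, then PR_SET_NO_NEW_PRIVS.
    fn zero_remaining_and_lock(&mut self) {
        self.caps.effective = 0;
        self.caps.permitted = 0;
        self.caps.inheritable = 0;
        self.no_new_privs = true;
    }
}
```

Running the steps in the documented order ends with every set at zero; calling `setresuid` first makes step 1 fail, because `CAP_SETPCAP` is already gone.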

The ordering is enforced at compile time. Calling the execution function before completing all privilege-stripping steps is a type error, not a runtime check.
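The usual Rust technique for this kind of guarantee is the typestate pattern: each stage is a distinct type, and `exec` exists only on the final one. A minimal sketch, assuming nothing about Rustbox's real types; `Sandbox`, the stage markers, and the method names are hypothetical, and the actual syscalls are left as comments.

```rust
use std::marker::PhantomData;

// Zero-sized marker types, one per stage (hypothetical names).
struct Built;
struct BoundingDropped;
struct CredsDropped;
struct Locked;

/// A sandbox whose construction stage is tracked in its type.
struct Sandbox<Stage> {
    uid: u32,
    steps: Vec<&'static str>, // trace of completed steps, for demonstration
    _stage: PhantomData<Stage>,
}

impl<S> Sandbox<S> {
    // Record a step and move to the next stage. In a real crate this would
    // stay private to the module while the stage methods below are `pub`.
    fn advance<Next>(mut self, step: &'static str) -> Sandbox<Next> {
        self.steps.push(step);
        Sandbox { uid: self.uid, steps: self.steps, _stage: PhantomData }
    }
}

impl Sandbox<Built> {
    fn new() -> Self {
        Sandbox { uid: 0, steps: Vec::new(), _stage: PhantomData }
    }
    /// Step 1: a real implementation would call prctl() here.
    fn drop_bounding_and_ambient(self) -> Sandbox<BoundingDropped> {
        self.advance("bounding + ambient dropped")
    }
}

impl Sandbox<BoundingDropped> {
    /// Step 2: a real implementation would call setresgid(), then setresuid().
    fn drop_credentials(mut self, uid: u32) -> Sandbox<CredsDropped> {
        self.uid = uid;
        self.advance("uid/gid dropped")
    }
}

impl Sandbox<CredsDropped> {
    /// Step 3: capset() with empty sets, then PR_SET_NO_NEW_PRIVS.
    fn lock(self) -> Sandbox<Locked> {
        self.advance("caps zeroed + NO_NEW_PRIVS")
    }
}

impl Sandbox<Locked> {
    /// The only way to run code: `exec` is defined solely on `Sandbox<Locked>`.
    fn exec(self) -> (u32, Vec<&'static str>) {
        (self.uid, self.steps)
    }
}
```

Skipping a step, as in `Sandbox::new().exec()`, does not compile because `exec` is not defined for `Sandbox<Built>`. Because each transition consumes `self`, an earlier stage cannot be reused after advancing.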

Fresh sandbox per execution

Every submission gets a new sandbox built from scratch. There is no reuse, no warm pooling of execution environments, no shared state between submissions. When execution completes, the sandbox is torn down and all resources are reclaimed.
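Rust's ownership model fits this well: if each execution owns its resources and releases them in `Drop`, teardown happens on every exit path, including early returns and unwinding panics. A minimal sketch limited to a per-execution scratch directory; `SandboxDir` is hypothetical, and a real sandbox would also own its cgroup, mounts, and child process.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

// Counter that keeps every sandbox directory name unique within this process.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical owner of one execution's scratch root.
struct SandboxDir {
    root: PathBuf,
}

impl SandboxDir {
    /// Build a brand-new, empty root for a single execution.
    fn create(base: &Path) -> io::Result<Self> {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let root = base.join(format!("sbx-{}-{}", std::process::id(), id));
        // create_dir fails if the path already exists, so nothing is reused.
        fs::create_dir(&root)?;
        Ok(SandboxDir { root })
    }

    fn path(&self) -> &Path {
        &self.root
    }
}

impl Drop for SandboxDir {
    fn drop(&mut self) {
        // Best-effort teardown when the owner goes out of scope.
        let _ = fs::remove_dir_all(&self.root);
    }
}
```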

Adversarial testing

The platform is tested against 147 adversarial scenarios across all 8 supported languages:

  • Fork bombs and memory bombs
  • Chroot escape attempts
  • Seccomp bypass attempts
  • Privilege escalation via setuid, capabilities, and ptrace
  • Network escape attempts
  • Symlink and hardlink attacks
  • Signal-based attacks

Result: 0 escapes across 22 attack vectors x 8 languages.

Why trust Rustbox cloud

  • The isolation model is not a wrapper around containers. It uses the same kernel primitives that containers are built on, composed directly in a fixed order that is enforced at compile time.
  • Every verdict is backed by kernel evidence, not heuristics. When we say "Memory Limit Exceeded," it is because the cgroup OOM killer fired, not because the exit code looked suspicious.
  • The adversarial test suite runs on every release. Regressions in isolation are caught before deployment.
  • Seccomp rules are tuned per action rather than applied as a blanket allow/deny. Runtime probes (io_uring, process_vm_readv) receive ENOSYS, so runtimes fall back gracefully instead of being killed.