Isolation Model

Every submission to Rustbox executes in a fresh, disposable sandbox. The sandbox is constructed from 8 independent kernel-level isolation layers. Escape requires simultaneously defeating all eight.

This is not container-based isolation. Rustbox composes Linux kernel primitives directly, with the construction order enforced at compile time by the Rust type system. Skip a step and the code does not compile.

Layer 8: NO_NEW_PRIVS (can't regain privileges)

Layer 7: Credential Drop (UID 60000+, not root)

Layer 6: Capabilities Zeroed (no privilege escalation)

Layer 5: Seccomp-BPF (51 syscalls blocked)

Layer 4: Cgroups v2 (memory + CPU + PIDs)

Layer 3: Network NS (no sockets, no DNS)

Layer 2: Mount NS + Chroot (isolated filesystem)

Layer 1: PID NS (can't see host)

[ your code ]

Escape requires defeating all eight simultaneously

The 8 layers

Layer	Kernel primitive	What it prevents
1. PID namespace	`CLONE_NEWPID`	Seeing or signalling host processes
2. Mount namespace	`CLONE_NEWNS` + chroot	Accessing host filesystem
3. Network namespace	`CLONE_NEWNET`	Network access (no sockets, no DNS in Judge mode)
4. Cgroups v2	cgroup controllers	Memory bombs, fork bombs, CPU hogging
5. Seccomp-BPF	BPF syscall filter	Dangerous syscalls: ptrace, mount, bpf, io_uring
6. Capabilities	Bounding + ambient sets	All 5 capability sets zeroed - no privilege escalation
7. Credential drop	`setresuid`/`setresgid`	Running as root - drops to unprivileged UID
8. NO_NEW_PRIVS	`prctl(PR_SET_NO_NEW_PRIVS)`	Regaining privileges via setuid binaries

Namespaces (layers 1-3)

Namespaces give the sandbox its own view of the world. The sandboxed process sees PID 1 as itself, an empty network stack, and a minimal filesystem.

We use PID, IPC, UTS, mount, and network namespaces. We deliberately do not use user namespaces - they have a long history of privilege escalation CVEs and are unnecessary when the platform has the required capabilities to set up isolation directly.
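As a rough sketch of what that namespace selection looks like as a flag mask: the constants below are the standard values from the Linux `<linux/sched.h>` header, while the helper names are hypothetical and not taken from the Rustbox source.

```rust
// Namespace flag values from the Linux UAPI header <linux/sched.h>.
const CLONE_NEWNS: u64 = 0x0002_0000; // mount namespace
const CLONE_NEWUTS: u64 = 0x0400_0000; // hostname and domain name
const CLONE_NEWIPC: u64 = 0x0800_0000; // System V IPC, POSIX message queues
const CLONE_NEWUSER: u64 = 0x1000_0000; // user namespace (deliberately unused)
const CLONE_NEWPID: u64 = 0x2000_0000; // PID namespace
const CLONE_NEWNET: u64 = 0x4000_0000; // network namespace

/// Hypothetical helper: the mask a launcher could pass to clone(2) or
/// unshare(2) to create the five namespaces listed above.
pub fn sandbox_namespace_flags() -> u64 {
    CLONE_NEWPID | CLONE_NEWIPC | CLONE_NEWUTS | CLONE_NEWNS | CLONE_NEWNET
}

/// Returns true if the mask would also create a user namespace.
pub fn uses_user_namespace(flags: u64) -> bool {
    (flags & CLONE_NEWUSER) != 0
}
```

A check such as `uses_user_namespace(sandbox_namespace_flags())` returning `false` is one cheap way to keep the "no user namespaces" decision from regressing.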

Cgroups v2 (layer 4)

The resource enforcer. Cgroups are the only Linux mechanism that can kill a process for exceeding resident memory usage (as opposed to virtual memory). Every sandbox gets its own cgroup with hard limits on memory, PIDs, and CPU time.
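To make the limits concrete, here is a minimal sketch of the values a launcher might write into a sandbox's cgroup directory. `memory.max`, `pids.max`, and `cpu.max` are real cgroup v2 control files; the `Limits` struct, its field names, and the choice of exactly these three files are assumptions for illustration. Note that `cpu.max` caps CPU bandwidth (quota per period), so a total CPU-time limit would need additional accounting that this sketch does not show.

```rust
/// Hypothetical per-sandbox limits; the field names are illustrative only.
pub struct Limits {
    pub memory_bytes: u64,
    pub max_pids: u32,
    pub cpu_quota_us: u64,
    pub cpu_period_us: u64,
}

/// Returns (control file, value) pairs to write inside the sandbox's
/// cgroup v2 directory. Nothing is written here; the caller does the I/O.
pub fn cgroup_writes(l: &Limits) -> Vec<(&'static str, String)> {
    vec![
        // Hard cap on memory charged to the cgroup, in bytes.
        ("memory.max", l.memory_bytes.to_string()),
        // Maximum number of live tasks; this is what stops fork bombs.
        ("pids.max", l.max_pids.to_string()),
        // "<quota> <period>" in microseconds.
        ("cpu.max", format!("{} {}", l.cpu_quota_us, l.cpu_period_us)),
    ]
}
```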

When a submission exceeds its memory limit, the kernel's OOM killer terminates it and Rustbox records the event as an MLE verdict backed by the cgroup OOM notification - not an exit code guess.
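One way to derive such a verdict from kernel evidence is to read the `oom_kill` counter in the cgroup's `memory.events` file. The sketch below assumes that approach; how Rustbox actually consumes the notification is not described here, and the function and type names are hypothetical.

```rust
/// Extracts the `oom_kill` counter from the text of a cgroup v2
/// `memory.events` file (one "key value" pair per line).
pub fn oom_kill_count(memory_events: &str) -> Option<u64> {
    memory_events.lines().find_map(|line| {
        let mut parts = line.split_whitespace();
        match (parts.next(), parts.next()) {
            // Exact key match, so "oom" and "oom_group_kill" are ignored.
            (Some("oom_kill"), Some(n)) => n.parse().ok(),
            _ => None,
        }
    })
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    MemoryLimitExceeded,
    Other,
}

/// Reports MLE only when the kernel recorded at least one OOM kill.
pub fn classify(memory_events: &str) -> Verdict {
    match oom_kill_count(memory_events) {
        Some(n) if n > 0 => Verdict::MemoryLimitExceeded,
        _ => Verdict::Other,
    }
}
```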

Seccomp-BPF (layer 5)

A BPF program loaded into the kernel intercepts every syscall. Rustbox blocks 51 syscalls across categories including io_uring (a history of kernel local privilege escalation bugs), ptrace (cross-process inspection), bpf (eBPF loading), mount/pivot_root (filesystem manipulation), and namespace escape primitives.

Three response modes are used: ENOSYS for probe syscalls that runtimes handle gracefully, EPERM for diagnostic syscalls, and KILL for exploit-class syscalls.

See Seccomp Filtering for the full rule table.
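The three response modes can be sketched as seccomp filter return values. The `SECCOMP_RET_*` constants come from the Linux `<linux/seccomp.h>` header and the errno numbers are the x86-64 (asm-generic) values. The `SyscallClass` enum is hypothetical, and mapping KILL to `SECCOMP_RET_KILL_PROCESS` rather than the per-thread variant is an assumption.

```rust
// Action values from the Linux UAPI header <linux/seccomp.h>.
const SECCOMP_RET_KILL_PROCESS: u32 = 0x8000_0000;
const SECCOMP_RET_ERRNO: u32 = 0x0005_0000;
const SECCOMP_RET_DATA: u32 = 0x0000_ffff; // low 16 bits carry the errno

// errno numbers on x86-64 and other asm-generic architectures.
const EPERM: u32 = 1;
const ENOSYS: u32 = 38;

/// The three categories named above; the enum itself is illustrative.
#[derive(Clone, Copy)]
pub enum SyscallClass {
    RuntimeProbe,
    Diagnostic,
    ExploitClass,
}

/// Value a BPF filter would return for a blocked syscall of this class.
pub fn filter_return(class: SyscallClass) -> u32 {
    match class {
        // The syscall "does not exist", so runtimes fall back gracefully.
        SyscallClass::RuntimeProbe => SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA),
        // The syscall fails with a permission error; the process continues.
        SyscallClass::Diagnostic => SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA),
        // Assumption: the whole process is killed, not just the thread.
        SyscallClass::ExploitClass => SECCOMP_RET_KILL_PROCESS,
    }
}
```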

Privilege stripping (layers 6-8)

After the sandbox environment is constructed, all privileges are stripped in three ordered steps:

1. Drop bounding + ambient capabilities - controls what the process can gain
2. Drop to unprivileged UID/GID - exits root
3. Zero remaining capability sets + set NO_NEW_PRIVS - makes privilege loss permanent

The ordering is enforced at compile time. Calling the execution function before completing all privilege-stripping steps is a type error, not a runtime check.
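Compile-time ordering of this kind is commonly built with the typestate pattern. The sketch below shows the general technique for the three steps above; every type and method name is hypothetical, and each step only records its name instead of making the real system calls.

```rust
use std::marker::PhantomData;

// Marker types, one per stage of privilege stripping (hypothetical names).
pub struct Privileged;
pub struct BoundingDropped;
pub struct Unprivileged;
pub struct Locked;

pub struct Sandbox<State> {
    steps: Vec<&'static str>,
    _state: PhantomData<State>,
}

impl<S> Sandbox<S> {
    // Consumes the current stage and returns the next one.
    fn advance<T>(mut self, step: &'static str) -> Sandbox<T> {
        self.steps.push(step);
        Sandbox { steps: self.steps, _state: PhantomData }
    }
}

impl Sandbox<Privileged> {
    pub fn new() -> Self {
        Sandbox { steps: Vec::new(), _state: PhantomData }
    }
    pub fn drop_bounding_and_ambient(self) -> Sandbox<BoundingDropped> {
        self.advance("drop bounding + ambient capabilities")
    }
}

impl Sandbox<BoundingDropped> {
    pub fn drop_credentials(self) -> Sandbox<Unprivileged> {
        self.advance("drop to unprivileged UID/GID")
    }
}

impl Sandbox<Unprivileged> {
    pub fn zero_caps_and_set_no_new_privs(self) -> Sandbox<Locked> {
        self.advance("zero remaining capability sets + set NO_NEW_PRIVS")
    }
}

impl Sandbox<Locked> {
    // Only a fully locked sandbox exposes `exec`.
    // Placeholder: returns the recorded steps instead of running code.
    pub fn exec(self) -> Vec<&'static str> {
        self.steps
    }
}
```

With this shape, `Sandbox::<Privileged>::new().exec()` is rejected by the compiler because `exec` exists only on `Sandbox<Locked>`, and each step consumes `self`, so an earlier stage cannot be reused after it has advanced.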

Fresh sandbox per execution

Every submission gets a new sandbox built from scratch. There is no reuse, no warm pooling of execution environments, no shared state between submissions. When execution completes, the sandbox is torn down and all resources are reclaimed.
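The create-then-tear-down lifecycle maps naturally onto Rust's `Drop` trait. The following is only a sketch of that idea for the filesystem portion: the `Workspace` type, the directory naming scheme, and the use of a plain directory are assumptions, and a real teardown would also have to remove the cgroup and release mounts.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical per-submission workspace: created fresh, removed on drop.
pub struct Workspace {
    root: PathBuf,
}

impl Workspace {
    pub fn create(base: &Path, submission_id: u64) -> io::Result<Self> {
        let root = base.join(format!("sandbox-{}", submission_id));
        // create_dir (not create_dir_all) fails if the directory already
        // exists, so a leftover environment is never silently reused.
        fs::create_dir(&root)?;
        Ok(Workspace { root })
    }

    pub fn path(&self) -> &Path {
        &self.root
    }
}

impl Drop for Workspace {
    fn drop(&mut self) {
        // Best-effort cleanup when the workspace goes out of scope.
        let _ = fs::remove_dir_all(&self.root);
    }
}
```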

Adversarial testing

The platform is tested against 147 adversarial scenarios across all 8 supported languages:

Fork bombs and memory bombs
Chroot escape attempts
Seccomp bypass attempts
Privilege escalation via setuid, capabilities, and ptrace
Network escape attempts
Symlink and hardlink attacks
Signal-based attacks

Result: 0 escapes across 22 attack vectors x 8 languages.

Why trust Rustbox cloud

The isolation model is not a wrapper around containers. It uses the same kernel primitives that container runtimes rely on, composed directly in a specific order with compile-time enforcement.
Every verdict is backed by kernel evidence, not heuristics. When we say "Memory Limit Exceeded," it is because the cgroup OOM killer fired, not because the exit code looked suspicious.
The adversarial test suite runs on every release. Regressions in isolation are caught before deployment.
Seccomp rules are tuned per-action, not a blanket allow/deny. Runtime probes (io_uring, process_vm_readv) get graceful fallbacks instead of process termination.