The Sandbox2 design builds on well-known and established technologies, a policy framework, and two processes: the Sandbox Executor and the Sandboxee.
The following sections cover the technologies that build the foundation layer for Sandbox2.
Linux namespaces provide operating-system-level virtualization: multiple userspaces run seemingly independently of each other while sharing a single kernel instance. Sandbox2 uses the following kinds of namespaces:
- Network (unless explicitly disabled by calling
- Mount (using a custom view of the filesystem tree)
- User (unless explicitly disabled by calling
Read more about Linux namespaces on Wikipedia or on the related man page.
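To make the idea of namespace membership concrete, here is a minimal sketch (not part of Sandbox2) that reads the kernel's identifier for one of the calling process's namespaces from /proc/self/ns. The helper name NamespaceId is hypothetical, and the sketch assumes /proc is mounted. Two processes that report the same identifier share that namespace; a sandboxed process placed in new namespaces would report different ones.

```cpp
#include <unistd.h>

#include <cassert>
#include <string>

// Hypothetical helper for illustration only.
// Returns the kernel's identifier for one namespace of the calling process
// (a string of the form "<kind>:[<inode>]"), or an empty string on failure.
// `kind` is a file name under /proc/self/ns, such as "net", "mnt", or "user".
std::string NamespaceId(const std::string& kind) {
  // Precondition: `kind` is a plain file name, not a path.
  assert(kind.find('/') == std::string::npos);
  const std::string link = "/proc/self/ns/" + kind;
  char buf[64];  // Namespace identifiers are short.
  const ssize_t n = readlink(link.c_str(), buf, sizeof(buf));
  if (n < 0) return "";
  return std::string(buf, static_cast<size_t>(n));
}
```

Comparing NamespaceId("net") in a parent and in a child is a quick way to check whether the child really runs in a separate network namespace.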
Sandbox2 allows exchanging arbitrary data between the Sandbox Executor and the untrusted Sandboxee. It supports Type-Length-Value (TLV) messages, passing file descriptors, and credential exchange through tokens and handles.
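The TLV idea can be sketched in a few lines. The wire layout below (4-byte tag, 4-byte length, then the value, all in host byte order) and every name in it (TlvMessage, EncodeTlv, SendTlv, RecvTlv, RoundTrip, kMaxValueSize) are assumptions made for this illustration; this is not Sandbox2's actual message format or API.

```cpp
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical layout for this sketch: tag (4 bytes), length (4 bytes), value.
struct TlvMessage {
  uint32_t tag;
  std::string value;
};

constexpr uint32_t kMaxValueSize = 1u << 20;  // Arbitrary cap for the sketch.

std::vector<uint8_t> EncodeTlv(const TlvMessage& msg) {
  assert(msg.value.size() <= kMaxValueSize);
  const uint32_t length = static_cast<uint32_t>(msg.value.size());
  std::vector<uint8_t> out(sizeof(msg.tag) + sizeof(length) + length);
  std::memcpy(out.data(), &msg.tag, sizeof(msg.tag));
  std::memcpy(out.data() + sizeof(msg.tag), &length, sizeof(length));
  std::memcpy(out.data() + sizeof(msg.tag) + sizeof(length),
              msg.value.data(), length);
  return out;
}

// Writes exactly `n` bytes to `fd`; returns false on error.
bool WriteAll(int fd, const void* buf, size_t n) {
  const auto* p = static_cast<const uint8_t*>(buf);
  while (n > 0) {
    const ssize_t w = write(fd, p, n);
    if (w <= 0) return false;
    p += w;
    n -= static_cast<size_t>(w);
  }
  return true;
}

// Reads exactly `n` bytes from `fd`; returns false on EOF or error.
bool ReadAll(int fd, void* buf, size_t n) {
  auto* p = static_cast<uint8_t*>(buf);
  while (n > 0) {
    const ssize_t r = read(fd, p, n);
    if (r <= 0) return false;
    p += r;
    n -= static_cast<size_t>(r);
  }
  return true;
}

bool SendTlv(int fd, const TlvMessage& msg) {
  const std::vector<uint8_t> bytes = EncodeTlv(msg);
  return WriteAll(fd, bytes.data(), bytes.size());
}

bool RecvTlv(int fd, TlvMessage* msg) {
  uint32_t length = 0;
  if (!ReadAll(fd, &msg->tag, sizeof(msg->tag))) return false;
  if (!ReadAll(fd, &length, sizeof(length))) return false;
  // Never trust a length field from an untrusted peer without a bound.
  if (length > kMaxValueSize) return false;
  msg->value.resize(length);
  return ReadAll(fd, &msg->value[0], length);
}

// Sends one small message through a Unix socket pair and reads it back.
// Only suitable for messages that fit in the socket buffer, because the
// sender and receiver run on the same thread here.
bool RoundTrip(const TlvMessage& in, TlvMessage* out) {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return false;
  const bool ok = SendTlv(fds[0], in) && RecvTlv(fds[1], out);
  close(fds[0]);
  close(fds[1]);
  return ok;
}
```

The length check in RecvTlv reflects the trust boundary described above: the executor should treat every field sent by the untrusted Sandboxee as attacker-controlled.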
Sandbox2 relies on seccomp-bpf, which is an extension to Secure Computing Mode (seccomp) that allows using Berkeley Packet Filter (BPF) rules to filter syscalls.
seccomp is a Linux kernel facility that, in its original strict mode, restricts a process to four system calls: exit, sigreturn, read, and write. If a process attempts to execute another syscall, it will be terminated. The seccomp-bpf extension allows more flexibility than seccomp. Instead of allowing a fixed set of syscalls, seccomp-bpf runs a BPF program on the syscall data and, depending on the program's return value, can execute the syscall, skip the syscall and return a dummy value, terminate the process, generate a signal, or notify the tracer.
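The following is a minimal sketch of a raw seccomp-bpf filter, written directly against the Linux kernel interface rather than Sandbox2. It forks a child, installs a four-instruction BPF program that makes getppid fail with EPERM (the "skip the syscall and return a dummy value" case) and allows everything else, then checks the result. The function name RunChildWithFilter and the choice of getppid are arbitrary for this illustration. The sketch omits the architecture check on seccomp_data.arch that a production filter needs.

```cpp
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>

// Returns 0 if the filtered child saw getppid fail with EPERM, a positive
// child exit code if something else happened, or -1 if fork/wait failed.
int RunChildWithFilter() {
  const pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    sock_filter filter[] = {
        // Load the syscall number from the seccomp_data structure.
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, nr)),
        // If it is getppid, run the next instruction; otherwise skip it.
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
        // Skip the syscall and make it return -EPERM.
        BPF_STMT(BPF_RET | BPF_K,
                 SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        // Execute every other syscall normally.
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    sock_fprog prog = {};
    prog.len = sizeof(filter) / sizeof(filter[0]);
    prog.filter = filter;

    // Required so that an unprivileged process may install a filter.
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) _exit(2);
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0) _exit(3);

    errno = 0;
    const long r = syscall(SYS_getppid);
    _exit((r == -1 && errno == EPERM) ? 0 : 1);
  }
  assert(pid > 0);  // Only the parent reaches this point.
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Replacing the SECCOMP_RET_ERRNO return with SECCOMP_RET_KILL, SECCOMP_RET_TRAP, or SECCOMP_RET_TRACE selects the terminate, signal, and notify-the-tracer outcomes respectively.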
The ptrace (process trace) syscall provides functionality that allows the tracer process to observe and control the execution of the tracee process. The tracer process has full control over the tracee once attached. Read more about ptrace on Wikipedia or on the related man page.
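As a rough illustration of the tracer/tracee relationship (using the plain ptrace syscall, not Sandbox2's monitor), the sketch below forks a child that asks to be traced and then stops itself. The parent observes the stop, resumes the child, and collects its exit code. TraceResult and TraceChild are hypothetical names, and the exit code 7 is arbitrary.

```cpp
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <csignal>

struct TraceResult {
  bool saw_stop;  // The tracer observed the tracee stopping on SIGSTOP.
  int exit_code;  // The tracee's exit code, or -1 if it did not exit normally.
};

TraceResult TraceChild() {
  TraceResult result = {false, -1};
  const pid_t pid = fork();
  if (pid < 0) return result;
  if (pid == 0) {
    // Tracee: ask to be traced by the parent, then stop so that the
    // tracer gets a chance to take control.
    if (ptrace(PTRACE_TRACEME, 0, nullptr, nullptr) != 0) _exit(2);
    raise(SIGSTOP);
    _exit(7);
  }
  assert(pid > 0);  // Only the tracer (parent) reaches this point.

  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return result;
  result.saw_stop = WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP;
  if (result.saw_stop) {
    // The tracee is frozen here; the tracer decides when it may continue.
    // Passing 0 as the last argument resumes it without delivering SIGSTOP.
    ptrace(PTRACE_CONT, pid, nullptr, nullptr);
    if (waitpid(pid, &status, 0) != pid) return result;
  }
  if (WIFEXITED(status)) result.exit_code = WEXITSTATUS(status);
  return result;
}
```

Between the two waitpid calls the tracee cannot run, which is the window in which a tracer can inspect or modify its state.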
The Sandbox Policy is the most crucial part of a sandbox, as it specifies which actions the Sandboxee can and cannot execute. There are two parts to a sandbox policy:
- Syscall policy
- Namespace setup
Default Syscall Policy
The default policy blocks syscalls that are always dangerous and takes precedence over the user-supplied extended policy.
Extended Syscall Policy
The extended syscall policy can be created using our PolicyBuilder class. This class defines a number of convenience rules (e.g. AllowOpen) which can be used to improve the readability of your policy.
If you want to further restrict syscalls or require more complex rules, you can specify raw BPF macros with AddPolicyOnSyscalls. The crc4 example uses this mechanism to restrict syscall arguments.
In general, the tighter the Sandbox Policy, the better, because the exploitation of any vulnerability present within the code will be confined by the policy. If you're able to specify exactly which syscalls and arguments are required for the normal operation of the program, then any attacker exploiting a code execution vulnerability is restricted to the same limits.
A very tight Sandbox Policy could deny all syscalls except reads and writes on the standard input and output file descriptors. Inside this sandbox, a program could take input, process it, and return the output. However, if the process attempted to make any other syscall, it would be terminated due to a policy violation. Hence, if the process is compromised (code execution by a malicious user), it cannot do anything more nefarious than produce bad output (which the executor and others still need to handle correctly).
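The decision logic described in this section can be modelled in plain C++. The sketch below is a toy model only, not Sandbox2's implementation: ToyPolicy, Verdict, and MakeTightPolicy are hypothetical names, and ptrace is used merely as a stand-in for a syscall that a default policy might always block. It shows two properties from the text: the default rules win over user-supplied rules, and a tight policy permits read only on file descriptor 0 and write only on file descriptor 1.

```cpp
#include <sys/syscall.h>

#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <utility>

enum class Verdict { kAllow, kViolation };

// Toy model of a two-layer syscall policy (illustration only).
class ToyPolicy {
 public:
  // Predicate over the first syscall argument (the file descriptor for
  // read and write).
  using ArgCheck = std::function<bool(uint64_t first_arg)>;

  // Default layer: syscalls that are rejected regardless of user rules.
  void BlockAlways(long nr) { always_blocked_.insert(nr); }

  // Extended layer: user-supplied allow rule with an argument check.
  void Allow(long nr, ArgCheck check) {
    assert(check != nullptr);
    allowed_[nr] = std::move(check);
  }

  Verdict Evaluate(long nr, uint64_t first_arg) const {
    // The default layer is consulted first, so it takes precedence.
    if (always_blocked_.count(nr) > 0) return Verdict::kViolation;
    const auto it = allowed_.find(nr);
    // Anything not explicitly allowed is a policy violation.
    if (it == allowed_.end()) return Verdict::kViolation;
    return it->second(first_arg) ? Verdict::kAllow : Verdict::kViolation;
  }

 private:
  std::set<long> always_blocked_;
  std::map<long, ArgCheck> allowed_;
};

// Builds the very tight policy from the text: read on stdin, write on stdout.
ToyPolicy MakeTightPolicy() {
  ToyPolicy policy;
  policy.BlockAlways(SYS_ptrace);  // Hypothetical "always dangerous" entry.
  policy.Allow(SYS_read, [](uint64_t fd) { return fd == 0; });
  policy.Allow(SYS_write, [](uint64_t fd) { return fd == 1; });
  return policy;
}
```

In this model, a later call such as Allow(SYS_ptrace, ...) has no effect on the verdict, mirroring the statement that the default policy takes precedence over the extended policy.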
The PolicyBuilder object is also used to set up a Sandboxee's individual view of the filesystem. Single files (AddFileAt), whole directories (AddDirectoryAt), as well as temporary storage (AddTmpfs) can be mapped into the Sandboxee's environment. Additionally, AddLibrariesForBinary can be used to automatically map all the libraries needed by the specified dynamically linked executable.
Any Sandbox2 policy can be disabled by specifying one of the following command-line flags. These flags are intended for testing purposes (e.g. while refining the Extended Syscall Policy).
The Sandbox Executor is a process that is not sandboxed itself. It's the ptrace tracer process that attaches to the Sandboxee (ptrace tracee process). The Sandbox Executor also sets up and runs a Monitor instance which tracks the Sandboxee and provides status information.
Sandbox2 allows three execution modes: Stand-alone, Sandbox2 Forkserver, and Custom Forkserver. If you use a forkserver, the Sandboxee is created as a child process of the Sandbox Executor. These modes are explained in detail here.
The Sandboxee is the process that runs in the restricted, sandboxed environment defined by the Sandbox Policy. The Sandbox Executor sends the policy to the Sandboxee via IPC, and the Sandboxee then applies it. Any violation of the policy results in the termination of the process, unless configured otherwise (see Sandbox Policy).