Sandbox2 Explained

The Sandbox2 design builds on well-known and established technologies, a policy framework, and two processes: the Sandbox Executor and the Sandboxee.

Technologies Involved

The following sections cover the technologies that form the foundation of Sandbox2.

Linux Namespaces

Linux namespaces provide operating-system-level virtualization: multiple userspaces run seemingly independently of each other while sharing a single kernel instance. Sandbox2 uses the following kinds of namespaces:

  • IPC
  • Network (unless explicitly disabled by calling PolicyBuilder::AllowUnrestrictedNetworking())
  • Mount (using a custom view of the filesystem tree)
  • PID
  • User (unless explicitly disabled by calling PolicyBuilder::AllowUnsafeKeepCapabilities())
  • UTS

Read more about Linux namespaces on Wikipedia or on the related man page.
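As a rough illustration of how the namespace list above maps onto the kernel interface, the following sketch computes the `clone(2)`/`unshare(2)` flags for those namespace kinds. The `NamespaceOptions` struct and `NamespaceCloneFlags` function are hypothetical and only mirror the two `PolicyBuilder` switches named above; they are not Sandbox2 code.

```cpp
#include <sched.h>  // CLONE_NEW* constants (g++ defines _GNU_SOURCE by default)

// Hypothetical options mirroring the two PolicyBuilder calls listed above.
// This struct is illustrative only and is not part of the Sandbox2 API.
struct NamespaceOptions {
  bool allow_unrestricted_networking = false;   // ~ AllowUnrestrictedNetworking()
  bool allow_unsafe_keep_capabilities = false;  // ~ AllowUnsafeKeepCapabilities()
};

// Returns the flags that clone(2) or unshare(2) would take to give a process
// fresh IPC, mount, PID and UTS namespaces, plus network and user namespaces
// unless the corresponding option disables them.
int NamespaceCloneFlags(const NamespaceOptions& opts) {
  int flags = CLONE_NEWIPC | CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWUTS;
  if (!opts.allow_unrestricted_networking) flags |= CLONE_NEWNET;
  if (!opts.allow_unsafe_keep_capabilities) flags |= CLONE_NEWUSER;
  return flags;
}
```

Creating most of these namespaces requires `CAP_SYS_ADMIN`; when `CLONE_NEWUSER` is combined with the other flags, the kernel creates the user namespace first, which is what lets an unprivileged process create the rest (on kernels where unprivileged user namespaces are enabled).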

IPC

Sandbox2 allows exchanging arbitrary data between the Sandbox Executor and the untrusted Sandboxee. It supports Type-Length-Value (TLV) messages, passing file descriptors, and credential exchange through tokens and handles.
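To make the TLV idea concrete, here is a minimal encoder and decoder. The framing (a 4-byte tag, an 8-byte length, host byte order) and the `Tlv`, `AppendTlv` and `ReadTlv` names are assumptions for this sketch, not Sandbox2's actual wire format or API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// One TLV message. Assumed framing: 4-byte tag, 8-byte length, then the value,
// all in host byte order (both peers run on the same machine).
struct Tlv {
  uint32_t tag;
  std::string value;
};

constexpr size_t kHeaderSize = sizeof(uint32_t) + sizeof(uint64_t);

// Appends the encoded message to `buf`.
void AppendTlv(std::vector<uint8_t>& buf, const Tlv& msg) {
  const uint64_t len = msg.value.size();
  uint8_t header[kHeaderSize];
  std::memcpy(header, &msg.tag, sizeof(msg.tag));
  std::memcpy(header + sizeof(msg.tag), &len, sizeof(len));
  buf.insert(buf.end(), header, header + kHeaderSize);
  buf.insert(buf.end(), msg.value.begin(), msg.value.end());
}

// Decodes the message starting at `offset` and advances `offset` past it.
// Returns std::nullopt (leaving `offset` unchanged) if the data is truncated.
std::optional<Tlv> ReadTlv(const std::vector<uint8_t>& buf, size_t& offset) {
  if (offset > buf.size() || buf.size() - offset < kHeaderSize) return std::nullopt;
  Tlv msg;
  uint64_t len = 0;
  std::memcpy(&msg.tag, buf.data() + offset, sizeof(msg.tag));
  std::memcpy(&len, buf.data() + offset + sizeof(msg.tag), sizeof(len));
  if (buf.size() - offset - kHeaderSize < len) return std::nullopt;
  const char* value = reinterpret_cast<const char*>(buf.data() + offset + kHeaderSize);
  msg.value.assign(value, static_cast<size_t>(len));
  offset += kHeaderSize + static_cast<size_t>(len);
  return msg;
}
```

The explicit length lets the receiver find message boundaries in a byte stream without scanning the value, so values may contain arbitrary bytes.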

Seccomp-BPF

Sandbox2 relies on seccomp-bpf, which is an extension to Secure Computing Mode (seccomp) that allows using Berkeley Packet Filter (BPF) rules to filter syscalls.

seccomp is a Linux kernel facility that, in its original (strict) mode, restricts a process to the exit, sigreturn, read, and write syscalls. If the process attempts any other syscall, it is terminated.

The seccomp-bpf extension is more flexible. Instead of allowing a fixed set of syscalls, seccomp-bpf runs a BPF program on the syscall data (such as the syscall number and its arguments) and, depending on the program's return value, it can execute the syscall, skip the syscall and return a dummy value, terminate the process, generate a signal, or notify the tracer.
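The following sketch shows what a seccomp-bpf program looks like at the kernel level, written with the raw `BPF_STMT`/`BPF_JUMP` macros from `<linux/filter.h>`. It returns three of the actions described above: execute the syscall, skip it with an `EPERM` error, and kill the process. The choice of syscalls and the `BuildExampleFilter` name are illustrative assumptions; this is not how Sandbox2 generates its filters.

```cpp
#include <linux/audit.h>    // AUDIT_ARCH_* constants
#include <linux/filter.h>   // sock_filter, BPF_STMT, BPF_JUMP
#include <linux/seccomp.h>  // seccomp_data, SECCOMP_RET_*
#include <sys/syscall.h>    // __NR_* syscall numbers
#include <cerrno>
#include <cstddef>          // offsetof
#include <vector>

#if defined(__x86_64__)
constexpr unsigned kArch = AUDIT_ARCH_X86_64;
#elif defined(__aarch64__)
constexpr unsigned kArch = AUDIT_ARCH_AARCH64;
#else
#error "This sketch only handles x86-64 and AArch64."
#endif

// Allows read, write, exit_group and rt_sigreturn (needed to return from
// signal handlers); makes getpid fail with EPERM without running it; kills
// the process for anything else or for a foreign architecture.
// On x86-64 a production filter would also reject x32 syscall numbers
// (bit 0x40000000 set); this sketch omits that check.
std::vector<sock_filter> BuildExampleFilter() {
  return {
      /* 0 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, arch)),
      /* 1 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, kArch, 1, 0),
      /* 2 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
      /* 3 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, nr)),
      // Jump offsets are relative to the next instruction.
      /* 4 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_read, 4, 0),
      /* 5 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 3, 0),
      /* 6 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit_group, 2, 0),
      /* 7 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_rt_sigreturn, 1, 0),
      /* 8 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getpid, 1, 2),
      /* 9 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
      /* 10 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
      /* 11 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
  };
}
```

Installing such a program requires `prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)` (or `CAP_SYS_ADMIN`) followed by `prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)`, where `prog` is a `sock_fprog` pointing at the instructions. The architecture check comes first because syscall numbers differ between ABIs.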

Ptrace

The ptrace (process trace) syscall provides functionality that allows the tracer process to observe and control the execution of the tracee process. The tracer process has full control over the tracee once attached. Read more about ptrace on Wikipedia or on the related man page.
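A minimal tracer/tracee pair, assuming a Linux system where a process may trace its own child (for example, Yama's `ptrace_scope` is not 3 and no outer seccomp profile blocks `ptrace`). The text above describes the executor attaching to the Sandboxee; this sketch uses `PTRACE_TRACEME` instead only because it is the shortest way to set up tracing. `TraceChildOnce` is a hypothetical helper, not part of Sandbox2.

```cpp
#include <csignal>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a tracee that stops itself and then exits with `exit_code`. The
// calling process acts as the tracer. Returns the tracee's exit status, or -1
// if tracing did not behave as expected (for example, ptrace is forbidden).
int TraceChildOnce(int exit_code) {
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    // Tracee: ask to be traced by the parent, then stop at a known point.
    if (ptrace(PTRACE_TRACEME, 0, nullptr, nullptr) == -1) _exit(127);
    raise(SIGSTOP);
    _exit(exit_code);
  }
  int status = 0;
  // Tracer: the tracee's signal-delivery stop is reported through waitpid().
  if (waitpid(pid, &status, 0) != pid) return -1;
  if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGSTOP) {
    if (WIFSTOPPED(status)) {  // unexpected stop: kill and reap the tracee
      kill(pid, SIGKILL);
      waitpid(pid, &status, 0);
    }
    return -1;
  }
  // The tracee stays stopped until the tracer resumes it. A null `data`
  // argument means no signal is delivered, so the SIGSTOP is suppressed.
  if (ptrace(PTRACE_CONT, pid, nullptr, nullptr) == -1) {
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return -1;
  }
  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
  return WEXITSTATUS(status);
}
```

A tracer that needs to observe every syscall would resume with `PTRACE_SYSCALL` instead of `PTRACE_CONT`, which stops the tracee again at each syscall entry and exit.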

Sandbox Policy

The Sandbox Policy is the most crucial part of a sandbox, as it specifies which actions the Sandboxee can and cannot perform. A sandbox policy has two parts:

  • Syscall policy
  • Namespace setup

Default Syscall Policy

The default policy blocks syscalls that are always dangerous and takes precedence over the user-supplied extended policy.
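The precedence rule can be modeled in a few lines. The `PolicyModel` type, the string-based syscall names, and the sample blocked entry used with it are purely illustrative assumptions; the real policy is compiled into seccomp-bpf and its default contents are not listed here. The model also assumes that anything the extended policy does not explicitly allow is denied.

```cpp
#include <set>
#include <string>

enum class Verdict { kAllow, kDeny };

// Hypothetical two-layer policy model (not Sandbox2's implementation).
struct PolicyModel {
  std::set<std::string> default_blocked;   // built-in, always-dangerous syscalls
  std::set<std::string> extended_allowed;  // user-supplied extended policy

  Verdict Evaluate(const std::string& syscall) const {
    // The default policy is consulted first, so the extended policy cannot
    // re-enable a syscall that the default policy blocks.
    if (default_blocked.count(syscall) != 0) return Verdict::kDeny;
    if (extended_allowed.count(syscall) != 0) return Verdict::kAllow;
    return Verdict::kDeny;  // assumed: deny anything not explicitly allowed
  }
};
```

For example, with a hypothetical default blocklist of `{"ptrace"}`, an extended policy that lists `ptrace` still gets `kDeny` for it.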

Extended Syscall Policy

The extended syscall policy can be created using our PolicyBuilder class. This class defines a number of convenience rules (e.g. AllowStaticStartup, AllowDynamicStartup, AllowOpen) which can be used to improve the readability of your policy.

If you want to further restrict syscalls or require more complex rules, you can specify raw BPF macros with AddPolicyOnSyscall and AddPolicyOnSyscalls. The crc4 example makes use of this mechanism to restrict arguments for the read, write, and close syscalls.
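At the BPF level, an argument restriction adds a comparison against `seccomp_data.args` after the syscall-number check. The sketch below hand-writes such a rule instead of using `AddPolicyOnSyscall`: `write` is allowed only on file descriptors 1 and 2, other descriptors get `EPERM`, and all other syscalls are allowed. The helper name `CheckWriteRestriction`, the chosen descriptors, and the allow-everything-else fallback are assumptions that keep the demonstration small (a real policy would deny by default). It assumes a little-endian x86-64 or AArch64 Linux host that permits installing seccomp filters.

```cpp
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>

#if defined(__x86_64__)
constexpr unsigned kArch = AUDIT_ARCH_X86_64;
#elif defined(__aarch64__)
constexpr unsigned kArch = AUDIT_ARCH_AARCH64;
#else
#error "This sketch only handles x86-64 and AArch64."
#endif

// On little-endian hosts the low 32 bits of args[0] (the fd) start at its offset.
static_assert(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__, "little-endian only");
constexpr unsigned kArg0Low = offsetof(seccomp_data, args);

// Forks a child that installs the filter and probes write(). Returns 0 if a
// write to a pipe fd was refused with EPERM while a write to fd 1 (redirected
// to the same pipe) succeeded, and a non-zero value otherwise.
int CheckWriteRestriction() {
  int fds[2];
  if (pipe(fds) == -1) return -1;
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    sock_filter filter[] = {
        /* 0 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, arch)),
        /* 1 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, kArch, 1, 0),
        /* 2 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* 3 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, nr)),
        /* 4 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 0, 4),  // not write: allow
        /* 5 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, kArg0Low),           // load the fd
        /* 6 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 1, 2, 0),           // fd 1: allow
        /* 7 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 2, 1, 0),           // fd 2: allow
        /* 8 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        /* 9 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    sock_fprog prog = {static_cast<unsigned short>(sizeof(filter) / sizeof(filter[0])), filter};
    dup2(fds[1], STDOUT_FILENO);  // fd 1 now also refers to the pipe
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0) {
      _exit(2);
    }
    bool other_fd_denied = write(fds[1], "x", 1) == -1 && errno == EPERM;
    bool stdout_allowed = write(STDOUT_FILENO, "y", 1) == 1;
    _exit(other_fd_denied && stdout_allowed ? 0 : 1);
  }
  close(fds[1]);
  int status = 0;
  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
    close(fds[0]);
    return -1;
  }
  char buf[2] = {};
  ssize_t n = read(fds[0], buf, sizeof(buf));
  close(fds[0]);
  if (n != 1 || buf[0] != 'y') return -1;  // only the permitted write arrives
  return WEXITSTATUS(status);
}
```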

In general, the tighter the Sandbox Policy, the better, because the exploitation of any vulnerability present in the code is confined by the policy. If you're able to specify exactly which syscalls and arguments are required for the normal operation of the program, then any attacker exploiting a code execution vulnerability is also restricted to the same limits.

A really tight Sandbox Policy could deny all syscalls except reads and writes on the standard input and output file descriptors. Inside this sandbox, a program could take input, process it, and return the output. However, if the process attempted any other syscall, it would be terminated due to a policy violation. Hence, if the process is compromised (code execution by a malicious user), it cannot do anything more nefarious than produce bad output (which the executor and others still need to handle correctly).
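A hand-written seccomp-bpf version of that very tight policy might look like the sketch below: `read` only on fd 0, `write` only on fd 1, `exit_group` so the process can finish, and `SECCOMP_RET_KILL_PROCESS` for everything else. `RunTightSandbox` is a hypothetical helper; to demonstrate a violation, the child deliberately calls `getpid` after producing its output. It assumes a little-endian x86-64 or AArch64 Linux host that permits seccomp filters.

```cpp
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <csignal>
#include <cstddef>
#include <cstring>

#if defined(__x86_64__)
constexpr unsigned kArch = AUDIT_ARCH_X86_64;
#elif defined(__aarch64__)
constexpr unsigned kArch = AUDIT_ARCH_AARCH64;
#else
#error "This sketch only handles x86-64 and AArch64."
#endif
static_assert(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__, "little-endian only");

// Forks a child, confines it, and returns the signal that terminated it
// (SIGSYS is expected), or -1 if the child did not behave as described.
int RunTightSandbox() {
  int out[2];
  if (pipe(out) == -1) return -1;
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    constexpr unsigned kFd = offsetof(seccomp_data, args);  // low half of args[0]
    sock_filter filter[] = {
        /* 0 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, arch)),
        /* 1 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, kArch, 1, 0),
        /* 2 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* 3 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, nr)),
        /* 4 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit_group, 6, 0),
        /* 5 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_read, 1, 0),
        /* 6 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 2, 5),
        /* 7 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, kFd),        // read: fd must be 0
        /* 8 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 2, 3),
        /* 9 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, kFd),        // write: fd must be 1
        /* 10 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 1, 0, 1),
        /* 11 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        /* 12 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    };
    sock_fprog prog = {static_cast<unsigned short>(sizeof(filter) / sizeof(filter[0])), filter};
    dup2(out[1], STDOUT_FILENO);  // let the parent capture the child's output
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0) {
      _exit(2);
    }
    if (write(STDOUT_FILENO, "ok", 2) != 2) _exit(1);  // permitted by the policy
    syscall(SYS_getpid);  // not in the policy: the kernel kills the process
    _exit(0);             // not reached
  }
  close(out[1]);
  int status = 0;
  pid_t waited = waitpid(pid, &status, 0);
  char buf[8] = {};
  ssize_t n = read(out[0], buf, sizeof(buf));
  close(out[0]);
  if (waited != pid || !WIFSIGNALED(status)) return -1;
  if (n != 2 || std::memcmp(buf, "ok", 2) != 0) return -1;
  return WTERMSIG(status);
}
```

The output written before the violation still reaches the parent, which is exactly why the executor must treat that output as untrusted.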

Namespace Setup

The PolicyBuilder object is also used to set up a Sandboxee's individual view of the filesystem. Single files (AddFile / AddFileAt), whole directories (AddDirectory / AddDirectoryAt), as well as temporary storage (AddTmpfs) can be mapped into the Sandboxee's environment. Additionally, AddLibrariesForBinary can be used to automatically map all the libraries needed by the specified dynamically linked executable.
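The effect of these mappings can be pictured as a small mount table consulted for each path the Sandboxee opens. The model below is a hypothetical illustration (the `MountEntry`/`Resolve` names and the longest-prefix lookup are assumptions, not Sandbox2 internals): a file mapping exposes exactly one path, a directory or tmpfs mapping exposes everything beneath it, and in this model unmapped paths resolve to nothing. Inside paths are assumed to be absolute and without a trailing slash, so the root directory itself is not modeled.

```cpp
#include <optional>
#include <string>
#include <vector>

enum class Kind { kFile, kDirectory, kTmpfs };

// One mapping from a path inside the sandbox to its backing storage.
struct MountEntry {
  std::string inside;   // path as seen by the Sandboxee
  std::string outside;  // host path (unused for kTmpfs)
  Kind kind;
};

// Returns the host path backing `path`, "tmpfs:" + `path` for paths on a
// tmpfs mount, or std::nullopt if the path is not visible in the sandbox.
std::optional<std::string> Resolve(const std::vector<MountEntry>& table,
                                   const std::string& path) {
  const MountEntry* best = nullptr;
  for (const MountEntry& e : table) {
    const bool exact = path == e.inside;
    // Directories and tmpfs mounts also cover paths below them, but only at a
    // '/' boundary, so "/data" does not match "/database".
    const bool below = e.kind != Kind::kFile && path.size() > e.inside.size() &&
                       path.compare(0, e.inside.size(), e.inside) == 0 &&
                       path[e.inside.size()] == '/';
    if ((exact || below) && (best == nullptr || e.inside.size() > best->inside.size())) {
      best = &e;  // the most specific (longest) matching mapping wins
    }
  }
  if (best == nullptr) return std::nullopt;
  if (best->kind == Kind::kTmpfs) return "tmpfs:" + path;
  return best->outside + path.substr(best->inside.size());
}
```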

Command-Line Flags

Any Sandbox2 policy can be disabled by specifying one of the following command-line flags. These flags are intended for testing purposes (e.g. while refining the Extended Syscall Policy).

  • --sandbox2_danger_danger_permit_all
  • --sandbox2_danger_danger_permit_all_and_log

Sandbox Executor

The Sandbox Executor is a process that is not sandboxed itself. It's the ptrace tracer process that attaches to the Sandboxee (ptrace tracee process). The Sandbox Executor also sets up and runs a Monitor instance which tracks the Sandboxee and provides status information.
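Stripped of sandboxing and tracing, the status-reporting part of monitoring reduces to waiting on the child and classifying how it ended. The sketch below illustrates only that idea; `RunAndMonitor`, `Outcome` and `Status` are invented names, and a real monitor, which also traces the Sandboxee, can report more than this.

```cpp
#include <cerrno>
#include <csignal>  // signal numbers reported in Status::code
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum class Outcome { kExited, kSignaled, kError };

// How the monitored child ended: exit code, terminating signal, or errno.
struct Status {
  Outcome outcome;
  int code;
};

// Runs `child_main` in a forked child, waits for it, and classifies the result.
Status RunAndMonitor(int (*child_main)()) {
  pid_t pid = fork();
  if (pid < 0) return {Outcome::kError, errno};
  if (pid == 0) _exit(child_main());
  int status = 0;
  while (waitpid(pid, &status, 0) == -1) {
    if (errno != EINTR) return {Outcome::kError, errno};
  }
  if (WIFEXITED(status)) return {Outcome::kExited, WEXITSTATUS(status)};
  if (WIFSIGNALED(status)) return {Outcome::kSignaled, WTERMSIG(status)};
  return {Outcome::kError, 0};
}
```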

Sandbox2 allows three execution modes: Stand-alone, Sandbox2 Forkserver, and Custom Forkserver. If you use a forkserver, the Sandboxee is created as a child process of the Forkserver. These modes are explained in detail here.

Sandboxee

The Sandboxee is the process which runs in the restricted, sandboxed environment which was defined by the Sandbox Policy. The Sandbox Executor sends the policy to the Sandboxee via IPC. The Sandboxee then applies the policy. Any violation of the policy will result in the termination of the process, unless configured otherwise (see Sandbox Policy).
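The hand-off described above can be sketched with a `socketpair`: the executor side serializes a seccomp-bpf program, and the Sandboxee side reads it, installs it, acknowledges, and is then killed when it makes a syscall the policy forbids. The serialization (a 16-bit instruction count followed by raw `sock_filter` structs), the single forbidden syscall (`getpid`), and the helper names are assumptions for illustration; Sandbox2's own IPC and policy format are not shown here. It assumes an x86-64 or AArch64 Linux host that allows seccomp filters.

```cpp
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <csignal>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__x86_64__)
constexpr unsigned kArch = AUDIT_ARCH_X86_64;
#elif defined(__aarch64__)
constexpr unsigned kArch = AUDIT_ARCH_AARCH64;
#else
#error "This sketch only handles x86-64 and AArch64."
#endif

// Reads exactly `len` bytes; fails on error or if the peer closes early.
bool ReadAll(int fd, void* data, size_t len) {
  auto* p = static_cast<char*>(data);
  while (len > 0) {
    ssize_t n = read(fd, p, len);
    if (n < 0 && errno == EINTR) continue;
    if (n <= 0) return false;
    p += n;
    len -= static_cast<size_t>(n);
  }
  return true;
}

// Sandboxee side: receive the policy, apply it, acknowledge, then misbehave.
[[noreturn]] void SandboxeeMain(int sock) {
  uint16_t count = 0;
  if (!ReadAll(sock, &count, sizeof(count)) || count == 0) _exit(2);
  std::vector<sock_filter> filter(count);
  if (!ReadAll(sock, filter.data(), count * sizeof(sock_filter))) _exit(2);
  sock_fprog prog = {count, filter.data()};
  if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
      prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0) {
    _exit(3);
  }
  char ack = 'A';
  if (write(sock, &ack, 1) != 1) _exit(4);  // write is still permitted
  syscall(SYS_getpid);                      // forbidden: the kernel kills us here
  _exit(0);                                 // not reached
}

// Executor side: send the policy, wait for the acknowledgement, and return
// the signal that terminated the Sandboxee (-1 on any unexpected outcome).
int RunPolicyHandOff() {
  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) return -1;
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    close(sv[0]);
    SandboxeeMain(sv[1]);
  }
  close(sv[1]);
  // Policy: kill the process on getpid, allow everything else.
  const sock_filter policy[] = {
      BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, arch)),
      BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, kArch, 1, 0),
      BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
      BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, nr)),
      BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getpid, 0, 1),
      BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
      BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
  };
  const uint16_t count = sizeof(policy) / sizeof(policy[0]);
  // MSG_NOSIGNAL avoids SIGPIPE in the executor if the child exits early.
  bool sent = send(sv[0], &count, sizeof(count), MSG_NOSIGNAL) ==
                  static_cast<ssize_t>(sizeof(count)) &&
              send(sv[0], policy, sizeof(policy), MSG_NOSIGNAL) ==
                  static_cast<ssize_t>(sizeof(policy));
  char ack = 0;
  bool acked = sent && ReadAll(sv[0], &ack, 1) && ack == 'A';
  close(sv[0]);
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return -1;
  if (!acked || !WIFSIGNALED(status)) return -1;
  return WTERMSIG(status);
}
```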