The Sandbox2 design builds on well-known and established technologies, a policy framework, and two processes: the Sandbox Executor and the Sandboxee.
Technologies Involved
The following sections cover the technologies that build the foundation layer for Sandbox2.
Linux Namespaces
The Linux namespaces are an attempt to provide operating-system-level virtualization. While multiple userspaces run seemingly independently of each other, they share a single kernel instance. Sandbox2 uses the following kinds of namespaces:
- IPC
- Network (unless explicitly disabled by calling PolicyBuilder::AllowUnrestrictedNetworking())
- Mount (using a custom view of the filesystem tree)
- PID
- User
- UTS
Read more about Linux namespaces on Wikipedia or on the related man page.
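To see which namespaces a process currently belongs to, you can read the symbolic links under /proc/self/ns. The following is a minimal sketch, not part of Sandbox2; the helper name NamespaceId is hypothetical, and it assumes /proc is mounted.

```cpp
#include <unistd.h>

#include <cassert>
#include <string>

// Hypothetical helper (not a Sandbox2 API).
// Returns the identity of the calling process's namespace of the given kind,
// for example "uts:[4026531838]", or an empty string if the link cannot be
// read. `kind` names an entry under /proc/self/ns: ipc, net, mnt, pid, user,
// or uts.
std::string NamespaceId(const std::string& kind) {
  assert(!kind.empty());
  const std::string path = "/proc/self/ns/" + kind;
  char buf[128];
  // readlink() does not append a terminating NUL; it returns the byte count.
  const ssize_t n = readlink(path.c_str(), buf, sizeof(buf));
  if (n < 0) return "";
  return std::string(buf, static_cast<size_t>(n));
}
```

Two processes are in the same namespace of a given kind when these strings match. A process that has been moved into a new namespace (for example with unshare(CLONE_NEWUTS)) reports a different inode number than its parent.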
IPC
Sandbox2 allows exchanging arbitrary data between the Sandbox Executor and the untrusted Sandboxee. It supports Type-Length-Value (TLV) messages, passing file descriptors, and credential exchange through tokens and handles.
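The TLV idea can be sketched over any stream file descriptor, such as one end of a socketpair. The wire format below (4-byte tag, 4-byte length, payload, all in host byte order) and the names TlvMessage, SendTlv, and RecvTlv are assumptions made for illustration; Sandbox2's actual format and API may differ.

```cpp
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical message type: a tag that tells the receiver how to interpret
// the payload, plus the payload bytes.
struct TlvMessage {
  uint32_t tag;
  std::string value;
};

// Writes exactly `len` bytes, looping over short writes. Any error (including
// EINTR) is treated as failure to keep the sketch small.
bool WriteAll(int fd, const void* buf, size_t len) {
  const char* p = static_cast<const char*>(buf);
  while (len > 0) {
    const ssize_t n = write(fd, p, len);
    if (n <= 0) return false;
    p += n;
    len -= static_cast<size_t>(n);
  }
  return true;
}

// Reads exactly `len` bytes, looping over short reads. EOF counts as failure.
bool ReadAll(int fd, void* buf, size_t len) {
  char* p = static_cast<char*>(buf);
  while (len > 0) {
    const ssize_t n = read(fd, p, len);
    if (n <= 0) return false;
    p += n;
    len -= static_cast<size_t>(n);
  }
  return true;
}

// Sends the header (tag, length) followed by the payload.
bool SendTlv(int fd, const TlvMessage& msg) {
  if (msg.value.size() > UINT32_MAX) return false;
  const uint32_t header[2] = {msg.tag,
                              static_cast<uint32_t>(msg.value.size())};
  return WriteAll(fd, header, sizeof(header)) &&
         WriteAll(fd, msg.value.data(), msg.value.size());
}

// Receives one message. The length comes from the peer, which may be the
// untrusted side, so it is bounded before any memory is allocated.
bool RecvTlv(int fd, TlvMessage* out, uint32_t max_len = 1u << 20) {
  assert(out != nullptr);
  uint32_t header[2];
  if (!ReadAll(fd, header, sizeof(header))) return false;
  if (header[1] > max_len) return false;
  out->tag = header[0];
  out->value.assign(header[1], '\0');
  return ReadAll(fd, &out->value[0], out->value.size());
}
```

The length bound in RecvTlv matters because the executor must never trust sizes reported by the Sandboxee.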
Seccomp-BPF
Sandbox2 relies on seccomp-bpf, which is an extension to Secure Computing Mode (seccomp) that allows using Berkeley Packet Filter (BPF) rules to filter syscalls.
seccomp is a Linux kernel facility that restricts a process's system calls to only allow exit, sigreturn, read, and write. If a process attempts to execute another syscall, it will be terminated. The seccomp-bpf extension allows more flexibility than seccomp. Instead of allowing a fixed set of syscalls, seccomp-bpf runs a BPF program on the syscall data and depending on the program's return value, it can execute the syscall, skip the syscall and return a dummy value, terminate the process, generate a signal, or notify the tracer.
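To make the "BPF program runs on the syscall data and returns an action" step concrete, here is a minimal sketch that builds an allowlist filter from the kernel's classic-BPF macros and then interprets it in user space. The function names are hypothetical, and Evaluate only understands the three instruction kinds this filter uses. It stands in for the kernel's evaluator purely for illustration. A real filter should also check seccomp_data.arch before trusting the syscall number; that check is omitted here.

```cpp
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint32_t kNrOffset = offsetof(seccomp_data, nr);

// Builds: load the syscall number; for each allowed number, "if equal, return
// ALLOW"; otherwise fall through to a final KILL.
std::vector<sock_filter> BuildAllowlistFilter(const std::vector<int>& allowed) {
  std::vector<sock_filter> prog;
  prog.push_back(BPF_STMT(BPF_LD | BPF_W | BPF_ABS, kNrOffset));
  for (int nr : allowed) {
    // Match: continue with the next instruction (ALLOW).
    // No match: skip one instruction, landing on the next comparison.
    prog.push_back(
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, static_cast<uint32_t>(nr), 0, 1));
    prog.push_back(BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW));
  }
  prog.push_back(BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL));
  return prog;
}

// User-space interpreter for the instruction subset used above. Returns the
// seccomp action the program selects for the given syscall record.
uint32_t Evaluate(const std::vector<sock_filter>& prog,
                  const seccomp_data& data) {
  uint32_t acc = 0;  // The BPF accumulator register.
  for (size_t pc = 0; pc < prog.size(); ++pc) {
    const sock_filter& insn = prog[pc];
    switch (insn.code) {
      case BPF_LD | BPF_W | BPF_ABS:  // acc = 32-bit word at offset k
        assert(insn.k + sizeof(acc) <= sizeof(data));
        std::memcpy(&acc, reinterpret_cast<const char*>(&data) + insn.k,
                    sizeof(acc));
        break;
      case BPF_JMP | BPF_JEQ | BPF_K:  // relative jump, taken after ++pc
        pc += (acc == insn.k) ? insn.jt : insn.jf;
        break;
      case BPF_RET | BPF_K:  // the action for this syscall
        return insn.k;
      default:  // unsupported instruction: fail closed
        return SECCOMP_RET_KILL;
    }
  }
  return SECCOMP_RET_KILL;
}
```

With the filter built from {SYS_read, SYS_write}, a record whose nr is SYS_write evaluates to SECCOMP_RET_ALLOW, while SYS_openat falls through to SECCOMP_RET_KILL. Other return values, such as SECCOMP_RET_ERRNO, SECCOMP_RET_TRAP, and SECCOMP_RET_TRACE, correspond to the remaining actions listed above.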
Ptrace
The ptrace (process trace) syscall provides functionality that allows the tracer process to observe and control the execution of the tracee process. The tracer process has full control over the tracee once attached. Read more about ptrace on Wikipedia or on the related man page.
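The tracer/tracee relationship can be sketched with a parent that traces its own child. This is a minimal illustration using the plain ptrace API, not Sandbox2's monitor code; the function name TraceChildOnce is hypothetical, and the sketch assumes the environment permits ptrace.

```cpp
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>

// Forks a tracee that stops itself, lets the tracer observe the stop, then
// resumes it. Returns the tracee's exit code, or -1 on any failure.
int TraceChildOnce(int child_exit_code) {
  assert(child_exit_code >= 0 && child_exit_code <= 255);
  const pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    // Tracee: ask to be traced by the parent, then stop so that the parent
    // gains control before anything else runs.
    if (ptrace(PTRACE_TRACEME, 0, nullptr, nullptr) != 0) _exit(127);
    raise(SIGSTOP);
    _exit(child_exit_code);
  }
  // Tracer: the stop is reported through waitpid().
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return -1;
  if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGSTOP) return -1;
  // While the tracee is stopped, the tracer could inspect registers or
  // memory. Here it just resumes the tracee without delivering the signal.
  if (ptrace(PTRACE_CONT, pid, nullptr, nullptr) != 0) {
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return -1;
  }
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling TraceChildOnce(42) is expected to return 42 when tracing is permitted. The tracee cannot make progress past its stop until the tracer decides to continue it, which is the control the paragraph above describes.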
Sandbox Policy
The Sandbox Policy is the most crucial part of a sandbox, as it specifies which actions the Sandboxee can and cannot execute. A sandbox policy has two parts:
- Syscall policy
- Namespace setup
Default Syscall Policy
The default policy blocks syscalls that are always dangerous and takes precedence over the user-supplied extended policy.
Extended Syscall Policy
The extended syscall policy can be created using our PolicyBuilder class. This class defines a number of convenience rules (e.g. AllowStaticStartup, AllowDynamicStartup, AllowOpen) which can be used to improve the readability of your policy.
If you want to further restrict syscalls or require more complex rules, you can specify raw BPF macros with AddPolicyOnSyscall and AddPolicyOnSyscalls. The crc4 example makes use of this mechanism to restrict arguments for the read, write, and close syscalls.
In general, the tighter the Sandbox Policy, the better, because the exploitation of any vulnerability present within the code will be confined by the policy. If you're able to specify exactly which syscalls and arguments are required for the normal operation of the program, then any attacker exploiting a code execution vulnerability is also restricted to the same limits.
A really tight Sandbox Policy could deny all syscalls except reads and writes on the standard input and output file descriptors. Inside this sandbox, a program could take input, process it, and return the output. However, if the process attempted any other syscall, it would be terminated due to a policy violation. Hence, if the process is compromised (code execution by a malicious user), it cannot do anything more nefarious than produce bad output (which the executor and other consumers still need to handle correctly).
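A policy of this shape can be sketched directly with seccomp-bpf, without the PolicyBuilder class. The sketch below is an illustration under stated assumptions, not what Sandbox2 generates: the names AllowOnlyOnFd, BuildTightFilter, and RunSandboxedChild are hypothetical, the argument check assumes a little-endian machine, exit_group is additionally allowed so the process can terminate, and the seccomp_data.arch check that a real filter needs is omitted.

```cpp
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Outcome { kInstallFailed, kKilledBySigsys, kOther };

constexpr uint32_t kNrOffset = offsetof(seccomp_data, nr);
// Low 32 bits of the first syscall argument (little-endian assumption).
constexpr uint32_t kArg0Offset = offsetof(seccomp_data, args);

// Appends "if the syscall is `nr`: ALLOW when its first argument equals `fd`,
// else KILL". Expects the syscall number in the accumulator; on a mismatch
// the jump skips the whole block, so the accumulator still holds the number.
void AllowOnlyOnFd(std::vector<sock_filter>& prog, long nr, uint32_t fd) {
  prog.push_back(
      BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, static_cast<uint32_t>(nr), 0, 4));
  prog.push_back(BPF_STMT(BPF_LD | BPF_W | BPF_ABS, kArg0Offset));
  prog.push_back(BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, fd, 0, 1));
  prog.push_back(BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW));
  prog.push_back(BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL));
}

std::vector<sock_filter> BuildTightFilter() {
  std::vector<sock_filter> prog;
  prog.push_back(BPF_STMT(BPF_LD | BPF_W | BPF_ABS, kNrOffset));
  AllowOnlyOnFd(prog, SYS_read, STDIN_FILENO);
  AllowOnlyOnFd(prog, SYS_write, STDOUT_FILENO);
  // The process must still be able to terminate cleanly.
  prog.push_back(BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit_group, 0, 1));
  prog.push_back(BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW));
  prog.push_back(BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL));
  assert(prog.size() <= BPF_MAXINSNS);
  return prog;
}

// Forks a child, installs the filter in it, and reports what happened when
// the child attempted a syscall outside the policy.
Outcome RunSandboxedChild() {
  const pid_t pid = fork();
  if (pid < 0) return Outcome::kOther;
  if (pid == 0) {
    std::vector<sock_filter> prog = BuildTightFilter();
    sock_fprog fprog = {};
    fprog.len = static_cast<unsigned short>(prog.size());
    fprog.filter = prog.data();
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog) != 0) {
      _exit(2);  // seccomp filters are unavailable in this environment
    }
    // Permitted: a write on stdout (zero bytes, so nothing is printed).
    const ssize_t ignored = write(STDOUT_FILENO, "", 0);
    (void)ignored;
    syscall(SYS_getpid);  // Not in the policy: the kernel kills the child.
    _exit(3);             // Reached only if the filter did not block it.
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return Outcome::kOther;
  if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSYS) {
    return Outcome::kKilledBySigsys;
  }
  if (WIFEXITED(status) && WEXITSTATUS(status) == 2) {
    return Outcome::kInstallFailed;
  }
  return Outcome::kOther;
}
```

Where seccomp filters are available, RunSandboxedChild() is expected to return Outcome::kKilledBySigsys: the write on stdout passes, and the first syscall outside the policy terminates the child. Only the low 32 bits of the first argument are compared, which is sufficient here because the kernel treats the file descriptor argument of read and write as a 32-bit value.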
Namespace Setup
The PolicyBuilder object is also used to set up a Sandboxee's individual view of the filesystem. Single files (AddFile / AddFileAt), whole directories (AddDirectory / AddDirectoryAt), as well as temporary storage (AddTmpfs) can be mapped into the Sandboxee's environment. Additionally, AddLibrariesForBinary can be used to automatically map all the libraries needed by the specified dynamically linked executable.
Command-Line Flags
Any Sandbox2 policy can be disabled by specifying one of the following command-line flags. These flags are intended for testing purposes (e.g. while refining the Extended Syscall Policy).
--sandbox2_danger_danger_permit_all
--sandbox2_danger_danger_permit_all_and_log
Sandbox Executor
The Sandbox Executor is a process that is not sandboxed itself. It's the ptrace tracer process that attaches to the Sandboxee (ptrace tracee process). The Sandbox Executor also sets up and runs a Monitor instance which tracks the Sandboxee and provides status information.
Sandbox2 allows three execution modes: Stand-alone, Sandbox2 Forkserver, and Custom Forkserver. If you use a forkserver, the Sandboxee is created as a child process of the Sandbox Executor. These modes are explained in detail here.
Sandboxee
The Sandboxee is the process that runs in the restricted, sandboxed environment defined by the Sandbox Policy. The Sandbox Executor sends the policy to the Sandboxee via IPC. The Sandboxee then applies the policy. Any violation of the policy will result in the termination of the process, unless configured otherwise (see Sandbox Policy).