2. Create a Sandbox Policy

Once you have an executor, you will likely want to define a Sandbox Policy for the Sandboxee. Otherwise, the Sandboxee is only protected by the Default Syscall Policy.

The objective of the Sandbox Policy is to restrict the syscalls the Sandboxee can make and the arguments it can pass to them, as well as the files it can access. You need a detailed understanding of the syscalls required by the code you plan to sandbox. One way to observe them is to run the code under strace, the Linux command-line syscall tracer.
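As a starting point for that list, the names of the syscalls seen in a trace can be extracted mechanically. The following is a minimal sketch, not part of Sandbox2: the helper names LooksLikeSyscallName() and CollectSyscallNames() are hypothetical, and the parser assumes strace's usual line layout of `name(args) = result`, optionally preceded by a prefix such as `[pid 42]`.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// Returns true if `s` looks like a syscall name: non-empty, made of lowercase
// letters, digits and underscores, and not starting with a digit.
bool LooksLikeSyscallName(const std::string& s) {
  if (s.empty() || std::isdigit(static_cast<unsigned char>(s[0]))) return false;
  for (unsigned char c : s) {
    if (!(std::islower(c) || std::isdigit(c) || c == '_')) return false;
  }
  return true;
}

// Collects the distinct syscall names that appear in strace output.
std::set<std::string> CollectSyscallNames(const std::string& strace_output) {
  std::set<std::string> names;
  std::istringstream lines(strace_output);
  std::string line;
  while (std::getline(lines, line)) {
    const std::string::size_type paren = line.find('(');
    if (paren == std::string::npos) continue;  // e.g. "+++ exited with 0 +++"
    // Skip prefixes such as "[pid 42] " or a timestamp: keep only the last
    // whitespace-separated token in front of the opening parenthesis.
    const std::string::size_type space = line.find_last_of(" \t", paren);
    const std::string::size_type start =
        (space == std::string::npos) ? 0 : space + 1;
    assert(start <= paren);
    const std::string name = line.substr(start, paren - start);
    if (LooksLikeSyscallName(name)) names.insert(name);
  }
  return names;
}
```

The resulting set is only as complete as the traced workload: code paths that the run did not exercise will not show up in it.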

Once you have the list of syscalls, you can use the PolicyBuilder to define the policy. PolicyBuilder comes with many convenience and helper functions that cover common operations. The following list is only a small excerpt of the available functions:

  • Allowlist any syscall for process startup:
    • AllowStaticStartup();
    • AllowDynamicStartup();
  • Allowlist any open/read/write* syscalls:
    • AllowOpen();
    • AllowRead();
    • AllowWrite();
  • Allowlist any exit/access/stat related syscalls:
    • AllowExit();
    • AllowStat();
    • AllowAccess();
  • Allowlist any sleep/time related syscalls:
    • AllowTime();
    • AllowSleep();

These convenience functions allowlist every relevant syscall. The advantage is that the same policy works across architectures on which certain syscalls are unavailable (e.g. ARM64 has no open syscall), at the cost of a minor security risk: more syscalls may be enabled than strictly necessary. For example, AllowOpen() lets the Sandboxee call any open-related syscall. To allowlist a single specific syscall, use AllowSyscall(); to allowlist several at once, use AllowSyscalls().
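The architecture point can be illustrated without Sandbox2 itself. The sketch below is not the library's implementation; the function name OpenRelatedSyscalls() and the choice of which syscalls count as "open-related" are assumptions made for illustration. It builds the set of open-style syscall numbers that exist on the architecture being compiled for, which is the kind of conditional selection that lets one policy source work on several architectures.

```cpp
#include <sys/syscall.h>  // __NR_* constants for the target architecture

#include <cassert>
#include <set>

// Returns the numbers of the open-style syscalls that exist on the
// architecture this code is compiled for.
std::set<int> OpenRelatedSyscalls() {
  std::set<int> nrs;
#ifdef __NR_open
  nrs.insert(__NR_open);  // Not defined on ARM64.
#endif
#ifdef __NR_openat
  nrs.insert(__NR_openat);
#endif
#ifdef __NR_creat
  nrs.insert(__NR_creat);
#endif
  // Every architecture is expected to offer at least one way to open a file.
  assert(!nrs.empty());
  return nrs;
}
```

A policy that called AllowSyscall(__NR_open) directly would not even compile on an architecture where that constant is undefined, which is why the broader helpers are convenient despite allowing slightly more than a hand-picked list.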

So far the policy only checks the syscall identifier. To strengthen it further and allow a syscall only with particular arguments, use AddPolicyOnSyscall() or AddPolicyOnSyscalls(). In addition to the syscall ID, these functions take a raw seccomp-bpf filter written with the bpf helper macros from the Linux kernel. See the kernel documentation for more information about BPF. If you find yourself writing repetitive BPF code that you think should have a usability wrapper, feel free to file a feature request.
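To give a feel for what an argument check looks like at the BPF level, here is a minimal sketch written with the kernel's generic BPF_STMT/BPF_JUMP macros from <linux/filter.h> rather than the bpf helper macros mentioned above. It is not what those helpers expand to. The filter allows a call only if its first argument equals 1, and the tiny RunFilter() interpreter (hypothetical, covering just the three opcodes used) exists only so the logic can be checked without installing a filter. The offsets assume a little-endian machine, and a complete filter would also check the architecture and syscall number.

```cpp
#include <linux/filter.h>   // sock_filter, BPF_STMT, BPF_JUMP
#include <linux/seccomp.h>  // seccomp_data, SECCOMP_RET_*

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Byte offsets of the two 32-bit halves of args[0] (little-endian assumed).
constexpr uint32_t kArg0Lo = offsetof(seccomp_data, args);
constexpr uint32_t kArg0Hi = kArg0Lo + sizeof(uint32_t);

// Filter body: allow only if the 64-bit first argument equals 1.
std::vector<sock_filter> AllowOnlyFirstArgOne() {
  return {
      BPF_STMT(BPF_LD | BPF_W | BPF_ABS, kArg0Hi),   // A = high half
      BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 0, 3),  // A != 0: jump to kill
      BPF_STMT(BPF_LD | BPF_W | BPF_ABS, kArg0Lo),   // A = low half
      BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 1, 0, 1),  // A != 1: jump to kill
      BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
      BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
  };
}

// Toy interpreter for the three opcodes used above; returns the filter's
// verdict for the given syscall data.
uint32_t RunFilter(const std::vector<sock_filter>& prog,
                   const seccomp_data& data) {
  uint32_t acc = 0;
  for (std::size_t pc = 0; pc < prog.size(); ++pc) {
    const sock_filter& insn = prog[pc];
    switch (insn.code) {
      case BPF_LD | BPF_W | BPF_ABS:
        assert(insn.k + sizeof(acc) <= sizeof(data));
        std::memcpy(&acc, reinterpret_cast<const char*>(&data) + insn.k,
                    sizeof(acc));
        break;
      case BPF_JMP | BPF_JEQ | BPF_K:
        pc += (acc == insn.k) ? insn.jt : insn.jf;  // Relative jump.
        break;
      case BPF_RET | BPF_K:
        return insn.k;
      default:
        return SECCOMP_RET_KILL;  // Opcode not covered by this sketch.
    }
  }
  return SECCOMP_RET_KILL;  // Fell off the end of the program.
}
```

Checking both halves matters on 64-bit systems: a value such as 0x100000001 has a low half of 1 and would pass a check that only looked at the low 32 bits.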

Apart from syscall-related functions, the PolicyBuilder also provides a number of filesystem-related functions such as AddFile() and AddDirectory(), which bind-mount a file or directory into the sandbox. The AddTmpfs() helper adds temporary file storage within the sandbox.

A particularly useful function is AddLibrariesForBinary(), which adds the libraries and linker required by a binary.

Unfortunately, coming up with the syscalls to allowlist still involves some manual work. Create a policy with the syscalls you know your binary needs and run it with a common workload. If a violation is triggered, allowlist the syscall and repeat the process. If you run into a violation that you think might be risky to allowlist, and the program handles errors gracefully, you can make the syscall return an error instead with BlockSyscallWithErrno().

#include <sys/mman.h>  // PROT_READ, PROT_WRITE
#include <syscall.h>   // __NR_* syscall numbers

#include <cerrno>      // ENOENT
#include <memory>      // std::unique_ptr

#include "sandboxed_api/sandbox2/policy.h"
#include "sandboxed_api/sandbox2/policybuilder.h"
#include "sandboxed_api/sandbox2/util/bpf_helper.h"

std::unique_ptr<sandbox2::Policy> CreatePolicy() {
  return sandbox2::PolicyBuilder()
    .AllowSyscall(__NR_read)  // See also AllowRead()
    .AllowTime()              // Allow time, gettimeofday and clock_gettime
    .AddPolicyOnSyscall(__NR_write, {
        ARG(0),        // fd is the first argument of write (argument #0)
        JEQ(1, ALLOW), // allow write only on fd 1
        KILL,          // kill if not fd 1
    })
    .AddPolicyOnSyscall(__NR_mprotect, {
        ARG_32(2), // prot is a 32-bit wide argument, so it's OK to use the
                   // *_32 macros here
        JNE32(PROT_READ | PROT_WRITE, KILL), // prot must be exactly
                                             // PROT_READ | PROT_WRITE,
                                             // otherwise kill the process
        ARG(1), // len is a 64-bit argument
        JNE(0x1000, KILL),  // len must be 0x1000 (a single 4 KiB page),
                            // otherwise kill the process
        ALLOW,              // Allow the syscall to proceed if prot and
                            // len match
    })
    // Do not kill on openat(); make it fail with ENOENT ("not found") instead.
    .BlockSyscallWithErrno(__NR_openat, ENOENT)
    .BuildOrDie();
}
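Blocking a syscall with an errno only helps if the sandboxed code treats that error as a normal outcome. The following sketch shows what "handles errors gracefully" can look like on the Sandboxee's side. ReadFileOr() is a hypothetical helper, not part of Sandbox2, and the example assumes that the C library's open() is implemented on top of the openat syscall, as is typical for current glibc, so that a policy like the one above would make this call fail with ENOENT.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <string>
#include <system_error>

// Returns the contents of `path`, or `fallback` if the file does not exist.
// Any other failure is reported as an exception.
std::string ReadFileOr(const char* path, const std::string& fallback) {
  assert(path != nullptr);
  const int fd = open(path, O_RDONLY | O_CLOEXEC);
  if (fd < 0) {
    // A missing file is an expected case: continue with the default value.
    if (errno == ENOENT) return fallback;
    throw std::system_error(errno, std::generic_category(), path);
  }
  std::string contents;
  char buf[4096];
  ssize_t n;
  while ((n = read(fd, buf, sizeof(buf))) > 0) {
    contents.append(buf, static_cast<size_t>(n));
  }
  const int read_errno = errno;  // Save before close() can overwrite it.
  close(fd);
  if (n < 0) throw std::system_error(read_errno, std::generic_category(), path);
  return contents;
}
```

Code that instead aborts or loops on a failed open would not be a good candidate for BlockSyscallWithErrno(); for such code the syscall either has to be allowlisted or the dependency on the file removed.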