go-sandbox

Version: v0.10.3
Published: Apr 16, 2024 License: MIT

The original goal was to replicate uoj-judger/run_program in Go using libseccomp. The project has since grown to also use newer technologies, including Linux namespaces and cgroups.

The ideas of a rootfs and interval CPU usage checking come from syzoj/judge-v3, and the pooled pre-forked container comes from vijos/jd4.

If you are looking for a sandbox implementation exposed through a REST / gRPC API, please check go-judge.

Notice: Only works on Linux, since ptrace, unshare, and cgroup are available only on Linux.

Build & Install

  • install latest go compiler from golang/download
  • install libseccomp library: (for Ubuntu) apt install libseccomp-dev
  • build & install: go install github.com/criyle/go-sandbox/...

Technologies

libseccomp + ptrace (improved UOJ sandbox)
  1. Restricted computing resource by POSIX rlimit: Time & Memory (Stack) & Output
  2. Restricted syscall access (by libseccomp & ptrace)
  3. Restricted file access (read & write & access & exec). Evaluated by UOJ FileSet

Improvements:

  1. Precise resource limits (s -> ms, mb -> kb)
  2. More architectures (arm32, arm64)
  3. Allow multiple traced programs in different threads
  4. Allow pipes as input / output files

Default file access syscall check:

  • check file read / write: open, openat
  • check file read: readlink, readlinkat
  • check file write: unlink, unlinkat, chmod, rename
  • check file access: stat, lstat, access, faccessat
  • check file exec: execve, execveat

Linux namespace + cgroup
  1. Unshare & bind mount rootfs based on hostfs (eliminated ptrace)
  2. Use Linux Control Groups to limit & acct CPU & memory (eliminated wait4.rusage)
  3. Container tech with execveat memfd, sethostname, setdomainname

Design

Result Status
  • Normal (no error)
  • Program Error
    • Resource Limit Exceeded
      • Time
      • Memory
      • Output
    • Unauthorized Access
      • Disallowed Syscall
    • Runtime Error
      • Signalled
        • SIGXCPU / SIGKILL are treated as TimeLimitExceeded by rlimit or caller kill
        • SIGXFSZ is treated as OutputLimitExceeded by rlimit
        • SIGSYS is treated as Disallowed Syscall by seccomp
        • Potential runtime errors include SIGSEGV (segmentation fault)
      • Nonzero Exit Status
  • Program Runner Error
Result Structure
type Result struct {
    Status            // result status
    ExitStatus int    // exit status (signal number if signalled)
    Error      string // potential detailed error message (for program runner error)

    Time   time.Duration // used user CPU time  (underlying type int64 in ns)
    Memory Size          // used user memory    (underlying type uint64 in bytes)
    // metrics for the program runner
    SetUpTime   time.Duration
    RunningTime time.Duration
}
Runner Interface

A configured runner runs the program. The context is used for cancellation (for example, when the caller enforces the time limit) and must not be nil.

type Runner interface {
    Run(context.Context) <-chan runner.Result
}
Pre-forked Container Protocol
  1. Pre-fork container to run programs inside
  2. Unix socket to pass fd inside / outside

Container / Host Communication Protocol (single thread):

  • ping (alive check):
    • reply: pong
  • conf (set configuration):
    • reply: pong
  • open (open files in given mode inside container):
    • send: []OpenCmd
    • reply: "success", file fds / "error"
  • delete (unlink file / rmdir dir inside container):
    • send: path
    • reply: "finished" / "error"
  • reset (clean up container for later use (clear workdir / tmp)):
    • send:
    • reply: "success"
  • execve (execute file inside container):
    • send: argv, env, rLimits, fds
    • reply:
      • success: "success", pid
      • failed: "failed"
    • send (success): "init_finished" (as cmd)
      • reply: "finished" / send: "kill" (as cmd)
      • send: "kill" (as cmd) / reply: "finished"
    • reply:

Any socket-related error causes the container to exit (together with all processes inside the container).
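
The request / reply shape of the simpler commands can be sketched as a single-threaded loop that decodes one command, answers it, and stops on the first connection error. This is only an illustration under stated assumptions: the Cmd and Reply types, the gob encoding, and the in-memory net.Pipe connection are all choices made for this example, and file descriptor passing over the Unix socket is left out entirely.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"net"
)

// Cmd and Reply are hypothetical message types for this sketch.
type Cmd struct{ Name, Path string }
type Reply struct{ Msg string }

// handle answers the commands that need no file descriptors.
func handle(c Cmd) Reply {
	switch c.Name {
	case "ping", "conf":
		return Reply{Msg: "pong"}
	case "reset":
		return Reply{Msg: "success"}
	default:
		return Reply{Msg: "error"}
	}
}

// serve is the container-side loop: any connection error ends it.
func serve(conn net.Conn) error {
	dec, enc := gob.NewDecoder(conn), gob.NewEncoder(conn)
	for {
		var c Cmd
		if err := dec.Decode(&c); err != nil {
			return err
		}
		if err := enc.Encode(handle(c)); err != nil {
			return err
		}
	}
}

func main() {
	host, container := net.Pipe() // in-memory stand-in for the Unix socket
	go serve(container)
	enc, dec := gob.NewEncoder(host), gob.NewDecoder(host)
	for _, name := range []string{"ping", "reset"} {
		if err := enc.Encode(Cmd{Name: name}); err != nil {
			panic(err)
		}
		var r Reply
		if err := dec.Decode(&r); err != nil {
			panic(err)
		}
		fmt.Println(name, "->", r.Msg)
	}
	host.Close() // serve returns with an error and the loop stops
}
```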

Pre-forked Container Environment

The restricted container environment is accessed through an RPC interface defined by the protocol above.

Provides:

  • File access
    • Open: create / access files
    • Delete: remove file
  • Management
    • Ping: alive check
    • Reset: remove temporary files
    • Destroy: destroy the container environment
  • Run program
    • Execve: execute program with given parameters
type Environment interface {
    Ping() error
    Open([]OpenCmd) ([]*os.File, error)
    Delete(p string) error
    Reset() error
    Execve(context.Context, ExecveParam) <-chan runner.Result
    Destroy() error
}

Packages (/pkg)

  • seccomp: provides seccomp type definition
    • libseccomp: provides utility functions that wrap libseccomp
  • forkexec: fork-exec that performs mount, unshare, ptrace, seccomp, and capset before exec
  • memfd: reads a regular file and creates a sealed memfd for its contents
  • unixsocket: send / recv oob msg from a unix socket
  • cgroup: creates cgroup directories and collects resource usage / limits
  • mount: provides utility functions that wrap the mount syscall
  • rlimit: provides utility functions for the rlimit syscall
  • pipe: provides wrapper to collect all written content through pipe

Packages

  • cmd/runprog/config: defines arch- and language-specific trace conditions for the ptrace runner, taken from UOJ
  • container: creates pre-forked container to run programs inside
  • runner: interface to run program
    • ptrace: wrapper to call forkexec and ptracer
      • filehandler: an example implementation of UOJ file set
    • unshare: wrapper to call forkexec and unshared namespaces
  • ptracer: ptrace tracer that provides a syscall trap filter context

Executable

  • runprog: safely run program by unshare / ptrace / pre-forked containers

Configurations

  • config/config.go: all configs toward running specs (similar to UOJ)

Kernel Versions

  • 5.19: memory.peak in cgroup v2
  • 4.15: cgroup v2
  • 4.14: SECCOMP_RET_KILL_PROCESS
  • 4.6: CLONE_NEWCGROUP
  • 3.19: execveat()
  • 3.17: seccomp, memfd_create
  • 3.10: CentOS 7
  • 3.8: CLONE_NEWUSER without CAP_SYS_ADMIN, CAP_SETUID, CAP_SETGID
  • 3.5: prctl(PR_SET_NO_NEW_PRIVS)
  • 2.6.36: prlimit64
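
A program can use this table to decide which features to rely on at run time. The sketch below compares a kernel release string against a few of the minimum versions listed above; parseKernel, atLeast and the features slice are hypothetical names, and the release string is passed in directly rather than read from the running system.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseKernel extracts major and minor from a release string such as
// "5.15.0-91-generic".
func parseKernel(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	digits := parts[1]
	// Drop any non-numeric suffix, e.g. "1-rc3" becomes "1".
	if i := strings.IndexFunc(digits, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		digits = digits[:i]
	}
	if minor, err = strconv.Atoi(digits); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// atLeast reports whether major.minor is at least wantMajor.wantMinor.
func atLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

// features repeats a few rows of the table above.
var features = []struct {
	major, minor int
	name         string
}{
	{5, 19, "memory.peak in cgroup v2"},
	{4, 15, "cgroup v2"},
	{4, 14, "SECCOMP_RET_KILL_PROCESS"},
	{3, 19, "execveat()"},
}

func main() {
	major, minor, err := parseKernel("5.15.0-91-generic")
	if err != nil {
		panic(err)
	}
	for _, f := range features {
		fmt.Printf("%-26s %v\n", f.name, atLeast(major, minor, f.major, f.minor))
	}
}
```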

Benchmarks

ForkExec
$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/pkg/forkexec
BenchmarkSimpleFork-4              	   12409	    996096 ns/op
BenchmarkUnsharePid-4              	   10000	   1065168 ns/op
BenchmarkUnshareUser-4             	   10000	   1061770 ns/op
BenchmarkUnshareUts-4              	   10000	   1056558 ns/op
BenchmarkUnshareCgroup-4           	   10000	   1049446 ns/op
BenchmarkUnshareIpc-4              	     709	  16114052 ns/op
BenchmarkUnshareMount-4            	     745	  16207754 ns/op
BenchmarkUnshareNet-4              	    3643	   3492924 ns/op
BenchmarkFastUnshareMountPivot-4   	     612	  20967318 ns/op
BenchmarkUnshareAll-4              	     837	  14047995 ns/op
BenchmarkUnshareMountPivot-4       	     488	  24198331 ns/op
PASS
ok  	github.com/criyle/go-sandbox/pkg/forkexec	147.186s
Container
$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/container
BenchmarkContainer-4   	    5907	   2062070 ns/op
PASS
ok  	github.com/criyle/go-sandbox/container	21.763s
Cgroup
$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/pkg/cgroup
BenchmarkCgroup-4   	   50283	    245094 ns/op
PASS
ok  	github.com/criyle/go-sandbox/pkg/cgroup	14.744s
Socket

Blocking:

$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/pkg/unixsocket
cpu: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
BenchmarkBaseline-8             12170148              1048 ns/op
BenchmarkGoroutine-8             2658846              4910 ns/op
BenchmarkChannel-8               8454133              1431 ns/op
BenchmarkChannelBuffed-8         8767264              1357 ns/op
BenchmarkChannelBuffed4-8        9670935              1230 ns/op
BenchmarkEmptyGoroutine-8       34927512               342.8 ns/op
PASS
ok      github.com/criyle/go-sandbox/pkg/unixsocket     83.669s

Non-block:

$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/pkg/unixsocket
cpu: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
BenchmarkBaseline-8             11609772              1001 ns/op
BenchmarkGoroutine-8             2470767              4788 ns/op
BenchmarkChannel-8               8488646              1427 ns/op
BenchmarkChannelBuffed-8         8876050              1345 ns/op
BenchmarkChannelBuffed4-8        9813187              1212 ns/op
BenchmarkEmptyGoroutine-8       34852828               342.2 ns/op
PASS
ok      github.com/criyle/go-sandbox/pkg/unixsocket     81.679s

Directories

Path: Synopsis

  • cmd/runprog: Command runprog executes a program in a defined restricted environment: seccomp-ptraced, namespaced, or containerized.
  • container: Package container provides a pre-forked container environment to run programs in isolated Linux namespaces.
  • pkg/cgroup: Package cgroup provides a builder to create cgroups under the systemd-defined mount path (i.e., /sys/fs/cgroup), including v1 and v2 implementations.
  • pkg/forkexec: Package forkexec provides an interface to run a subprocess with a seccomp filter and rlimit, either containerized or ptraced.
  • pkg/memfd: Package memfd provides an interface to Linux memfd to create and seal a memory file.
  • pkg/mount: Package mount provides general data structures for mount and mount namespace (multiple mounts) definitions.
  • pkg/pipe: Package pipe provides a wrapper to create a pipe and collect at most max bytes from the reader side.
  • pkg/rlimit: Package rlimit provides data structures for resource limits set by the setrlimit syscall on Linux.
  • pkg/seccomp: Package seccomp provides a generated filter format for seccomp filters.
  • pkg/seccomp/libseccomp: Package libseccomp provides a wrapper for "github.com/seccomp/libseccomp-golang".
  • pkg/unixsocket: Package unixsocket provides a wrapper for Linux unix sockets to send and recv oob messages, including fds and user credentials.
  • ptracer: Package ptracer provides a platform-independent ptrace polling loop interface to trace program syscalls on Linux.
  • runner: Package runner provides a common interface for program runners together with common types including Result, Limit, Size and Status.
  • runner/unshare: Package unshare implements a runner that uses the Linux unshare syscall, mount namespaces, and rlimit to restrict program access.
