vmyard sandbox scheduler · ERP•AI Labs

A high-throughput scheduler for sandboxes.

vmyard places and bin-packs microVM and container sandboxes onto a fleet of hosts, with the dataplane behind pluggable drivers for Firecracker, Cloud Hypervisor, and Docker. The design target is thousands of placements per second from a single scheduler node.

[Figure: four hosts, host-01 through host-04, labeled 94%, 91%, 58%, and 70%.]
Hosts are tracks; sandboxes are cars. The scheduler packs for density and keeps headroom on each host for sandboxes about to wake (red).

Why not Kubernetes

Kubernetes schedules services that start once and run for months. A sandbox is created in a burst next to a thousand siblings, runs for a few seconds, goes to sleep, and is expected back fast when its owner returns. Pushed through pod machinery, every one of those transitions pays for API round-trips, scheduling passes, kubelet reconciliation, and network setup that were priced for a workload changing state a few times a day.

Waiting for Kubernetes to grow into this is a bet against its own architecture: a control plane built on slowly converging desired state cannot become the fast path for millions of short-lived, stateful workloads. That gap is what vmyard is for.

Lifecycle

Phase    What happens
create   Boot a fresh sandbox from an image. The cold path.
place    Pick a host. Bin-pack for density while leaving headroom for wakes.
run      Code executes. The scheduler stays out of the way.
sleep    Snapshot and release CPU and memory. Identity and state are kept.
wake     Restore from snapshot. The latency this project is judged on.
destroy  Reclaim everything.
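
The phases above can be read as verbs acting on a small state machine. The sketch below is one possible encoding, not vmyard's own: the state names, the `TRANSITIONS` table, and the `apply` helper are hypothetical, and the ordering (create, then place, then run, with sleep and wake alternating) is an assumption taken from the order of the table.

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names; the source only lists the verbs.
    NEW = "new"
    CREATED = "created"
    PLACED = "placed"
    RUNNING = "running"
    ASLEEP = "asleep"
    DESTROYED = "destroyed"


# verb -> (states the verb may be applied from, resulting state)
# Assumption: destroy is legal from any live state.
TRANSITIONS = {
    "create": ({State.NEW}, State.CREATED),
    "place": ({State.CREATED}, State.PLACED),
    "run": ({State.PLACED}, State.RUNNING),
    "sleep": ({State.RUNNING}, State.ASLEEP),
    "wake": ({State.ASLEEP}, State.RUNNING),
    "destroy": (
        {State.CREATED, State.PLACED, State.RUNNING, State.ASLEEP},
        State.DESTROYED,
    ),
}


def apply(state: State, verb: str) -> State:
    """Return the state after applying a lifecycle verb, or raise if illegal."""
    allowed_from, result = TRANSITIONS[verb]
    if state not in allowed_from:
        raise ValueError(f"cannot {verb} a sandbox that is {state.value}")
    return result


def replay(verbs) -> State:
    """Apply a sequence of verbs to a brand-new sandbox."""
    state = State.NEW
    for verb in verbs:
        state = apply(state, verb)
    return state
```

A sandbox that sleeps and wakes repeatedly stays inside the sleep/wake loop, which is why that pair of transitions is the one worth optimizing.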

Drivers

Drivers own the dataplane. The scheduling core sees capacity, constraints, and lifecycle verbs; it never talks to a runtime directly. The shape follows the task-driver model proven in HashiCorp Nomad.

Driver            Isolation  Status
firecracker       microVM    first target
cloud hypervisor  microVM    planned
docker            container  planned, for development and CI
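
To make the core/driver split concrete, here is a minimal sketch of what a driver contract could look like. It is illustrative only: the `Driver` protocol, its method names, and `RecordingDriver` are hypothetical and are not vmyard's published interface. The point it shows is that the core calls lifecycle verbs and never reaches into a runtime.

```python
from typing import List, Protocol, Tuple


class Driver(Protocol):
    """Hypothetical contract: lifecycle verbs the core may call on any driver."""

    name: str

    def create(self, sandbox_id: str, image: str) -> None: ...
    def sleep(self, sandbox_id: str) -> None: ...
    def wake(self, sandbox_id: str) -> None: ...
    def destroy(self, sandbox_id: str) -> None: ...


class RecordingDriver:
    """In-memory stand-in that records calls instead of touching a runtime."""

    name = "recording"

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def create(self, sandbox_id: str, image: str) -> None:
        self.calls.append(("create", sandbox_id))

    def sleep(self, sandbox_id: str) -> None:
        self.calls.append(("sleep", sandbox_id))

    def wake(self, sandbox_id: str) -> None:
        self.calls.append(("wake", sandbox_id))

    def destroy(self, sandbox_id: str) -> None:
        self.calls.append(("destroy", sandbox_id))


def full_cycle(driver: Driver, sandbox_id: str, image: str) -> None:
    """Core-side code: depends only on the contract, not on a runtime."""
    driver.create(sandbox_id, image)
    driver.sleep(sandbox_id)
    driver.wake(sandbox_id)
    driver.destroy(sandbox_id)
```

Under this shape, a Firecracker or Docker driver would implement the same four methods, and core code such as `full_cycle` would not change.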

Benchmarks

Every number is reproducible from the repository with one command. Each benchmark runs from a fixed seed and reports latency percentiles next to throughput. Measured 2026-06-12; machines noted per table.
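
As an illustration of that reporting style, the sketch below generates a workload from a fixed seed and reports throughput alongside p50 and p99 latency. It is not the project's benchmark harness: `make_workload`, `percentile`, and `bench` are hypothetical names, the nearest-rank percentile method is an assumption, and Python's timer overhead is far too large to reproduce nanosecond-scale results.

```python
import random
from time import perf_counter_ns


def make_workload(seed: int, n: int) -> list:
    """Deterministic request sizes (cores per sandbox) from a fixed seed."""
    rng = random.Random(seed)
    return [rng.randint(1, 8) for _ in range(n)]


def percentile(sorted_samples: list, p: int) -> int:
    """Nearest-rank percentile of an ascending list; p is an integer 1..100."""
    n = len(sorted_samples)
    rank = max(1, -(-p * n // 100))  # ceil(p * n / 100) in integer math
    return sorted_samples[rank - 1]


def bench(fn, workload: list) -> dict:
    """Time fn on every item; report throughput and latency percentiles."""
    samples = []
    start = perf_counter_ns()
    for item in workload:
        t0 = perf_counter_ns()
        fn(item)
        samples.append(perf_counter_ns() - t0)
    elapsed = max(perf_counter_ns() - start, 1)  # guard against a zero interval
    samples.sort()
    return {
        "ops_per_s": len(workload) * 1e9 / elapsed,
        "p50_ns": percentile(samples, 50),
        "p99_ns": percentile(samples, 99),
    }


if __name__ == "__main__":
    # Stand-in for a placement decision; the seed makes the input repeatable.
    print(bench(lambda cores: cores * 2, make_workload(seed=42, n=100_000)))
```

Fixing the seed makes the input identical across runs; the measured latencies still vary with the machine, which is why each table names its hardware.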

Placement decisions — single thread, Apple M5 laptop, fleet of 32-core/128 GiB hosts

Fleet         Sustained churn       Decision p50  Decision p99  Wake locality
100 hosts     7,433k placements/s   208 ns        334 ns        98.1% local
1,000 hosts   912k placements/s     1.7 µs        2.3 µs        98.2% local
10,000 hosts  88k placements/s      17.6 µs       22.0 µs       98.3% local

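Placement is a deliberate linear scan, and the table above is its published cost: decision latency grows roughly tenfold with each tenfold increase in fleet size (208 ns, 1.7 µs, 17.6 µs at p50). Wake locality holds at ~98% at every fleet size because it depends on the per-host wake reserve, not on scale.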
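
A minimal sketch of a linear-scan placer with a per-host wake reserve follows. It is an illustration under stated assumptions, not vmyard's algorithm: the `Host` fields, the best-fit rule, and the `place` and `wake` functions are hypothetical, and only one resource (cores) is tracked, where a real fleet would also track memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    cpus: int              # total cores on the host
    used: int = 0          # cores currently allocated
    wake_reserve: int = 0  # cores held back for sandboxes expected to wake

    def free_for(self, waking: bool) -> int:
        """Cores available; only wakes may dip into the reserve."""
        free = self.cpus - self.used
        return free if waking else free - self.wake_reserve


def place(hosts: List[Host], need: int, waking: bool = False) -> Optional[Host]:
    """Best-fit by linear scan: pick the host left with the least slack."""
    best, best_left = None, None
    for host in hosts:  # one pass over the fleet, so cost grows with its size
        left = host.free_for(waking) - need
        if left < 0:
            continue
        if best is None or left < best_left:
            best, best_left = host, left
    if best is not None:
        best.used += need
    return best


def wake(hosts: List[Host], home: Host, need: int) -> Optional[Host]:
    """Prefer the host the sandbox slept on; fall back to a scan if it is full."""
    if home.free_for(waking=True) >= need:
        home.used += need
        return home  # local wake
    return place(hosts, need, waking=True)
```

In this sketch a wake succeeds locally whenever the home host's reserve has not been consumed, regardless of how many other hosts exist, which matches the claim that locality depends on the reserve and not on fleet size.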

Time to runnable — through the Docker dev driver, real containers, CI runner

Mode                   Spawn p50  Spawn p95  Wake p50  Wake p95
serial, 30 containers  126 ms     147 ms     18.1 ms   19.7 ms
storm, 60 × 8 workers  785 ms     1.08 s     113 ms    156 ms

The scheduler contributes nanoseconds (42 ns p50 against a no-op driver); the rest is Docker's container setup, which serializes under the 8-way storm. Docker wake is pause/unpause, not snapshot restore — that distinction is the reason the numbers are labeled. Real restore benchmarks begin with the Firecracker driver.

Status

In development.

The scheduler core, simulator, driver contract, and a Docker development driver exist and are benchmarked in CI against real containers on every push. There is no installable release yet. Source opens with the first runnable release.