Spike S12 · Getting CubeSandbox running on WSL2
Status: ✅ Passed (2026-05-20). All 4 core services are healthy, but the sandbox lifecycle was not exercised end to end (template creation is deferred to follow-up).
Environment
- Host: douglas-wsl (Windows 11 22H2+ · WSL 2.7.3 · kernel 6.6.114-microsoft)
- Resources: Intel x86_64 · 24 vCPU · 31 GiB RAM · 941 GiB free
- CubeSandbox: v0.2.2 (pulled at release-46424e7)
- Tailscale: mirrored mode (WSL eth1 shares the Windows host's 100.116.83.96 interface)
- Connectivity to the main API host: Tailscale 100.116.83.96 ↔ 100.64.3.43 · 6.7 ms RTT
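Why this spike is recorded
Installing CubeSandbox on WSL2 hit 8 pitfalls (including compatibility traps that neither Microsoft's nor Tencent's official documentation spells out). They are written down here for reuse when CubeSandboxAdapter runs for real and in the customer on-premises deployment SOP.
Full retrospective of the 8 pitfalls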
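Pitfall 1 · XFS is required (/data/cubelet)
CubeSandbox uses XFS reflink + project quota for copy-on-write sandbox images and quota enforcement. The first step of install.sh checks the filesystem that /data/cubelet lives on. The default WSL root filesystem is ext4.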
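The filesystem requirement can be checked before install.sh runs. The sketch below is not part of CubeSandbox: `mount_fstype` and `cubelet_on_xfs` are hypothetical helpers that read /proc/mounts (or a file in the same format) and report the filesystem type of an exact mount point. It assumes /data/cubelet is itself the mount point, as in the loopback setup described in this note.

```shell
# Hypothetical helper (not shipped with CubeSandbox): print the filesystem
# type of an exact mount point, reading /proc/mounts by default.
# Returns non-zero when the path is not a mount point.
mount_fstype() {
  target=$1
  mounts=${2:-/proc/mounts}
  # Field 2 is the mount point, field 3 the fs type. Keep the last match,
  # because a later mount shadows an earlier one on the same path.
  awk -v t="$target" '$2 == t { fs = $3 } END { if (fs == "") exit 1; print fs }' "$mounts"
}

# Hypothetical wrapper: succeed only if /data/cubelet is mounted as XFS.
cubelet_on_xfs() {
  [ "$(mount_fstype /data/cubelet "${1:-/proc/mounts}")" = "xfs" ]
}
```

A possible use is `cubelet_on_xfs || echo "/data/cubelet is not on XFS" >&2` ahead of the install step. The check only looks at the mount type; it does not confirm that reflink or project quota are enabled.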
Fix: a loopback XFS image
truncate -s 100G /cubelet-data.img
mkfs.xfs -m reflink=1,crc=1 -L cubelet -f /cubelet-data.img
mkdir -p /data/cubelet
mount -o loop,pquota /cubelet-data.img /data/cubelet
echo "/cubelet-data.img /data/cubelet xfs loop,pquota 0 0" >> /etc/fstab
Pitfall 2 · /sys/fs/bpf is not mounted
network-agent pins its eBPF maps under /sys/fs/bpf/ at startup. WSL does not mount bpffs by default.
FATAL: ebpf.NewCollectionWithOptions:
pin map to /sys/fs/bpf/ifindex_to_mvmmeta:
/sys/fs/bpf/ifindex_to_mvmmeta is not on a bpf filesystem
Fix:
mount -t bpf bpffs /sys/fs/bpf
echo "bpffs /sys/fs/bpf bpf defaults 0 0" >> /etc/fstab
Pitfall 3 · cube-proxy on 443 collides with Tailscale Funnel
Tailscale on the Windows side holds host port 443 (its serve/funnel feature). cube-proxy binds host port 443 by default, and docker silently swallows the port mapping.
Fix: switch to 4443
curl -sL .../online-install.sh | sudo env CUBE_PROXY_HOST_PORT=4443 bash
Pitfall 4 · Hyper-V firewall blocks inbound by default
In WSL2 mirrored mode, traffic to a 0.0.0.0 listener is forwarded through the Windows host network stack, and the Hyper-V firewall's DefaultInboundAction=Block drops all inbound traffic. Docker container ports go through docker-proxy and are unaffected, but processes running directly on the WSL host (cube-api/cubelet/cubemaster/network-agent) are affected.
Fix (admin PowerShell on the Windows side):
New-NetFirewallHyperVRule -Name "CubeSandbox" `
  -DisplayName "CubeSandbox Cube Services" `
  -Direction Inbound `
  -VMCreatorId '{40E0AC32-46A5-438A-A0B2-2B479E8F2E90}' `
  -Protocol TCP `
  -LocalPorts 3000,8089,9966,9998,9999,4443,19090
⚠️ Even after the rule is added, host loopback 127.0.0.1 is still unreachable (mirrored mode behavior, see Pitfall 5). The rule mainly opens access from other network segments over LAN/Tailscale.
Pitfall 5 · 127.0.0.1 is unreachable in WSL2 mirrored mode
Observed: inside WSL, curl http://127.0.0.1:3000/health (cube-api listens on 0.0.0.0:3000) fails at the TCP handshake, while curl http://10.255.255.254:3000/health (the global lo IP inside WSL) returns 200 OK.
Mirrored mode maps WSL's 127.0.0.1 to the Windows host loopback, so inbound traffic passes through the Hyper-V firewall.
Fix: point all liveness probes and health checks at 10.255.255.254 (the global lo IP inside WSL):
CUBE_API_HEALTH_ADDR=10.255.255.254:3000 \
NETWORK_AGENT_HEALTH_ADDR=10.255.255.254:19090 \
CUBEMASTER_ADDR=10.255.255.254:8089 \
/usr/local/services/cubetoolbox/scripts/one-click/quickcheck.sh
Pitfall 6 · network-agent hardcodes 127.0.0.1:19090 for health
network-agent defaults to --health-listen=127.0.0.1:19090. Because of Pitfall 5, that address is unreachable from outside.
Fix: start it with --health-listen 0.0.0.0:19090:
/usr/local/services/cubetoolbox/network-agent/bin/network-agent \
  --cubelet-config /usr/local/services/cubetoolbox/Cubelet/config/config.toml \
  --state-dir /usr/local/services/cubetoolbox/network-agent/state \
  --health-listen 0.0.0.0:19090
⚠️ Initialization takes about 40 seconds (it creates 500 tap interfaces); wait for the listener to come up.
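Because the listener takes roughly 40 seconds to appear, a script that starts network-agent and probes it immediately will see a false failure. One way to handle that is to poll the port first. The sketch below is an illustration only: `wait_for_port` is a hypothetical helper, it relies on bash's /dev/tcp and on timeout(1) from coreutils, and the attempt count is an assumption to tune.

```shell
# Hypothetical helper: poll a TCP port until it accepts a connection or the
# attempts run out. Each attempt is bounded to 1 second, followed by a
# 1 second pause, so the total wait is at most about 2 x attempts seconds.
wait_for_port() {
  host=$1
  port=$2
  attempts=${3:-60}
  tried=0
  while [ "$tried" -lt "$attempts" ]; do
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT.
    # timeout(1) guards against addresses that hang instead of refusing.
    if timeout 1 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
      return 0
    fi
    sleep 1
    tried=$((tried + 1))
  done
  return 1
}
```

For example, `wait_for_port 10.255.255.254 19090 60 && echo "network-agent listener is up"` uses the global lo IP rather than 127.0.0.1, for the reason given in Pitfall 5.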
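Pitfall 7 · cubelet hardcodes 127.0.0.1:8089 to register with cubemaster
The cubelet binary hardcodes POST http://127.0.0.1:8089/internal/meta/nodes/register for node registration. Pitfall 5 makes registration fail every time, leaving the node with HEALTHY=false.
Fix: redirect with an iptables OUTPUT DNAT rule:
iptables -t nat -A OUTPUT -d 127.0.0.1 -p tcp --dport 8089 \
  -j DNAT --to-destination 10.255.255.254:8089
Optional: apply the same treatment to ports 3306 (mysql) and 6379 (redis), so that CLI tools such as cubemastercli do not fail when they connect to mysql directly.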
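Since `iptables -A` appends unconditionally, re-running the command adds a duplicate rule each time. A minimal sketch of an idempotent variant follows, using iptables' `-C` (check) option. `ensure_dnat` and the `IPTABLES` override are hypothetical names introduced here for illustration; the override exists only so the logic can be dry-run without root.

```shell
# Hypothetical helper: add the OUTPUT DNAT rule for one port only if an
# identical rule is not already present.
# IPTABLES may be set to another command for a dry run (assumption).
ensure_dnat() {
  port=$1
  target=${2:-10.255.255.254}
  ipt=${IPTABLES:-iptables}
  # -C exits non-zero when no identical rule exists in the chain.
  if "$ipt" -t nat -C OUTPUT -d 127.0.0.1 -p tcp --dport "$port" \
       -j DNAT --to-destination "$target:$port" 2>/dev/null; then
    echo "exists: $port"
  else
    "$ipt" -t nat -A OUTPUT -d 127.0.0.1 -p tcp --dport "$port" \
      -j DNAT --to-destination "$target:$port" && echo "added: $port"
  fi
}
```

Run as root, `for p in 8089 3306 6379; do ensure_dnat "$p"; done` would cover the required port and the two optional ones. The rules still do not survive a WSL restart unless they are persisted separately.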
Pitfall 8 · 5 occurrences of 127.0.0.1 in cubemaster conf.yaml
cubemaster panics at startup: dial tcp 127.0.0.1:3306: i/o timeout. The configuration lives in /usr/local/services/cubetoolbox/CubeMaster/conf.yaml:
OssDBConfig.addr: 127.0.0.1:3306
InstanceDBConfig.addr: 127.0.0.1:3306
RedisConf.nodes: 127.0.0.1:6379
RedisReadConf.nodes: 127.0.0.1:6379
RedisWriteConf.nodes: 127.0.0.1:6379
Fix:
sed -i.bak 's/127\.0\.0\.1:3306/10.255.255.254:3306/g; s/127\.0\.0\.1:6379/10.255.255.254:6379/g' \
  /usr/local/services/cubetoolbox/CubeMaster/conf.yaml
HttpPort: 8089 is left alone (it is cubemaster's own listen port; the sed patterns do not match it, which is the desired outcome).
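To confirm the patch took effect before cubemaster is restarted, the substitution can be wrapped with a check. The sketch below applies the same two sed substitutions to a file passed as an argument and then verifies that no mysql/redis loopback address remains. `patch_cubemaster_conf` is a hypothetical helper name, not something CubeSandbox provides.

```shell
# Hypothetical helper: rewrite the mysql (3306) and redis (6379) loopback
# addresses in the given config file, keeping a .bak copy, then verify.
patch_cubemaster_conf() {
  conf=$1
  sed -i.bak \
    -e 's/127\.0\.0\.1:3306/10.255.255.254:3306/g' \
    -e 's/127\.0\.0\.1:6379/10.255.255.254:6379/g' \
    "$conf" || return 1
  # Fail if any mysql/redis loopback address survived the patch.
  if grep -Eq '127\.0\.0\.1:(3306|6379)' "$conf"; then
    return 1
  fi
  return 0
}
```

Calling `patch_cubemaster_conf /usr/local/services/cubetoolbox/CubeMaster/conf.yaml` and then diffing against the `.bak` copy shows exactly which lines changed.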
Full install + healthcheck SOP (summary)
Run in order (all as root):
# === Preparation ===
# 1. XFS loopback
modprobe xfs
truncate -s 100G /cubelet-data.img
mkfs.xfs -m reflink=1,crc=1 -L cubelet -f /cubelet-data.img
mkdir -p /data/cubelet
mount -o loop,pquota /cubelet-data.img /data/cubelet
grep -q /cubelet-data.img /etc/fstab || \
  echo "/cubelet-data.img /data/cubelet xfs loop,pquota 0 0" >> /etc/fstab
# 2. bpffs
mount -t bpf bpffs /sys/fs/bpf
grep -q "bpffs /sys/fs/bpf" /etc/fstab || \
  echo "bpffs /sys/fs/bpf bpf defaults 0 0" >> /etc/fstab
# === Windows side (admin PowerShell) ===
# 3. Hyper-V firewall: allow the cube ports
# (run in an admin PowerShell, see Pitfall 4)
# === Install ===
# 4. install with CUBE_PROXY_HOST_PORT=4443
curl -sL https://github.com/tencentcloud/CubeSandbox/raw/master/deploy/one-click/online-install.sh \
  | sudo env CUBE_PROXY_HOST_PORT=4443 bash
# The tail of the install reports "cube-api did not become ready"; this is expected, continue with the manual fixes
# === Post-install patches ===
# 5. iptables DNAT 127.0.0.1:8089 → 10.255.255.254:8089 (plus 3306 and 6379)
iptables -t nat -A OUTPUT -d 127.0.0.1 -p tcp --dport 8089 \
  -j DNAT --to-destination 10.255.255.254:8089
iptables -t nat -A OUTPUT -d 127.0.0.1 -p tcp --dport 3306 \
  -j DNAT --to-destination 10.255.255.254:3306
iptables -t nat -A OUTPUT -d 127.0.0.1 -p tcp --dport 6379 \
  -j DNAT --to-destination 10.255.255.254:6379
# 6. cubemaster conf.yaml patch
sed -i.bak 's/127\.0\.0\.1:3306/10.255.255.254:3306/g; s/127\.0\.0\.1:6379/10.255.255.254:6379/g' \
  /usr/local/services/cubetoolbox/CubeMaster/conf.yaml
# 7. Restart all cube core services (network-agent gets the --health-listen flag)
pkill -9 -f "/usr/local/services/cubetoolbox/"
sleep 2
nohup /usr/local/services/cubetoolbox/network-agent/bin/network-agent \
  --cubelet-config /usr/local/services/cubetoolbox/Cubelet/config/config.toml \
  --state-dir /usr/local/services/cubetoolbox/network-agent/state \
  --health-listen 0.0.0.0:19090 \
  >> /var/log/cube-sandbox-one-click/network-agent.log 2>&1 &
setsid bash -c "export CUBE_MASTER_CONFIG_PATH=/usr/local/services/cubetoolbox/CubeMaster/conf.yaml; \
  /usr/local/services/cubetoolbox/CubeMaster/bin/cubemaster \
  > /var/log/cube-sandbox-one-click/cubemaster.log 2>&1" < /dev/null &
# Let up.sh start cube-api / cubelet
env CUBE_API_HEALTH_ADDR=10.255.255.254:3000 \
  NETWORK_AGENT_HEALTH_ADDR=10.255.255.254:19090 \
  CUBEMASTER_ADDR=10.255.255.254:8089 \
  bash /usr/local/services/cubetoolbox/scripts/one-click/up.sh
Verification
# quickcheck (sandbank end to end, protocol layer)
env CUBE_API_HEALTH_ADDR=10.255.255.254:3000 \
  NETWORK_AGENT_HEALTH_ADDR=10.255.255.254:19090 \
  CUBEMASTER_ADDR=10.255.255.254:8089 \
  /usr/local/services/cubetoolbox/scripts/one-click/quickcheck.sh
# Expected output:
# [quickcheck] 1/5 check network-agent healthz ✓
# [quickcheck] 2/5 check network-agent readyz ✓
# [quickcheck] 3/5 check cubemaster /notify/health ✓
# [quickcheck] 4/5 check cube-api /health ✓
# [quickcheck] 5/5 check essential sockets and config ✓
# [quickcheck] OK
Left for follow-up
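- ✗ Not done: cubemastercli tpl create-from-image to actually create the first template (needs an OCI image pull + ext4 rootfs build, ~5-10 min)
- ✗ Not done: one real E2BAdapter call to cube-api on port 3000 (needs the e2b SDK installed locally, or curl to simulate POST /sandboxes)
- ✗ Not verified: long-running task stability (30 min × 10 runs)
- ✗ Not verified: whether egress traffic from inside a sandbox goes through Tailscale eth1 (mirrored mode behavior)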
Pitfall 9 (newly found, afternoon of 2026-05-20) · cube-api cannot reach cubemaster through its default URL in mirrored mode
Found while running CubeSandboxAdapter for real:
GET /health across Tailscale returns 200 in 21 ms, but GET /sandboxes hits the 8 s timeout without returning. Digging into the cube-api source (CubeAPI/src/handlers/sandboxes.rs + CubeAPI/src/services/sandboxes.rs):
// services/sandboxes.rs
pub async fn list(&self, ...) -> AppResult<Vec<ListedSandbox>> {
    let resp = self.cubemaster.list_sandboxes(&req).await?; // gRPC call, no timeout
    ...
}
At install time cube-api is started with the default --cubemaster-url http://127.0.0.1:8089. In WSL2 mirrored mode, 127.0.0.1 maps to the host loopback and goes through the Hyper-V firewall (Pitfall 5), so the cubemaster gRPC connection never succeeds and the handler awaits forever without returning. /health does not call cubemaster, which is why it returns quickly.
The New-NetFirewallHyperVRule -Name CubeSandbox -LocalPorts 3000,8089,...,4443 rule added an inbound Allow. Access from inside WSL works, but inbound traffic arriving at the Windows host through the Tailscale interface is still blocked.
Possible causes:
- The rule's Profiles: Any does not cover the actual profile of the Tailscale interface (Tailscale usually registers as Public)
- The interaction between mirrored mode's inbound routing and the Hyper-V firewall's multi-layer filtering
Fix: start cube-api with an explicit --cubemaster-url http://10.255.255.254:8089 flag (or the CUBE_MASTER_ADDR env var) to bypass the default http://127.0.0.1:8089:
# Restart cube-api using the global lo IP inside WSL
sudo pkill -9 -f cubetoolbox/CubeAPI/bin/cube-api
sudo setsid bash -c '
  export LOG_DIR=/data/log/CubeAPI
  export CUBE_API_BIND=0.0.0.0:3000
  /usr/local/services/cubetoolbox/CubeAPI/bin/cube-api \
    --cubemaster-url http://10.255.255.254:8089 \
    > /var/log/cube-sandbox-one-click/cube-api.log 2>&1
' < /dev/null &
Verification:
$ curl http://100.116.83.96:3000/sandboxes
HTTP 200 t=3.4s [] ✓ (the first gRPC connect took 3.4s; subsequent calls take on the order of a second)
Impact on CubeSandboxAdapter deployment: customer on-premises deployments of cube-api must set the cubemaster URL explicitly and cannot rely on the default 127.0.0.1. Recommendation: add this step to the "customer on-premises deployment SOP".
Actual CubeSandbox API routes (calibrated during implementation)
Traced back from CubeAPI/src/routes.rs: every path that E2BProtocolAdapter assumes is correct:
GET    /sandboxes → list_sandboxes
POST   /sandboxes → create_sandbox
GET    /sandboxes/:id → get_sandbox
DELETE /sandboxes/:id → kill_sandbox
GET    /v2/sandboxes → list_sandboxes_v2
POST   /sandboxes/:id/timeout
POST   /sandboxes/:id/refreshes
POST   /sandboxes/:id/pause
POST   /sandboxes/:id/resume
POST   /sandboxes/:id/connect
POST   /sandboxes/:id/snapshots
GET    /templates → list_templates
POST   /templates → create_template
In addition, prefixed /cubeapi/v1/... routes mirror all of the paths above, plus management endpoints such as cluster/nodes/config/store. CubeSandbox is backward compatible with the E2B protocol: the paths have the same names and the payloads the same shape, so it can share the E2BProtocolAdapter base class.
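Inputs to the CubeSandboxAdapter implementation
What this spike decided:
- CubeSandboxAdapter's apiUrl should point to **http://<wsl-tailscale-ip>:3000** (the cube-api endpoint), not to cube-proxy on 4443
- When the main API reaches it across network segments from the Fly host, it goes through Tailscale 100.116.83.96:3000, verified at 6.7 ms RTT
- The hardcoded 127.0.0.1 problems inside the WSL host are unrelated to the adapter implementation (the adapter only calls HTTP endpoints)
- The adapter's capability set is the same as E2B's: ['exec.stream','terminal','sleep','snapshot','port.expose']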
References
- CubeSandbox: https://github.com/TencentCloud/CubeSandbox
- WSL2 mirrored mode docs: https://learn.microsoft.com/en-us/windows/wsl/networking#mirrored-mode-networking
- Hyper-V firewall: https://learn.microsoft.com/en-us/windows/security/operating-system-security/network-security/windows-firewall/hyper-v-firewall
- This change: openspec/changes/add-sandbank-fork-aio-e2b/
- Upstream issue #15 (network-agent panic / WSL compatibility): https://github.com/TencentCloud/CubeSandbox/issues/15