Agents

The agent scaffold is the harness that drives the model through each task. eval validates three scaffolds end-to-end. To switch scaffolds, edit the four agent fields in config.yaml together: name, version, runtime_image, and runtime_host_path.

Supported agents

`name`	Protocol	Version	Runtime image
`custom-claude-code`	Anthropic	`2.1.118`	`docker.io/jierun/c-cc-2.1.118:v0.1`
`custom-openhands-sdk`	OpenAI	`1.14.0`	`docker.io/jierun/c-oh-sdk-1.14.0:v0.5`
`custom-opencode`	OpenAI-compatible	`1.14.22`	`docker.io/yjiangcm/c-oc-1.14.22:v0.2`

max_turns is reused as max_iterations when the agent is openhands-sdk (the two have the same semantics). The per-job LiteLLM proxy bridges the OpenAI and Anthropic formats, so any of these agents works against the same upstream model.

The runtime bind-mount

runtime_host_path must point at a directory that already contains the extracted agent runtime tree from runtime_image. start.sh bind-mounts it into each task container.
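A pre-flight check can catch a missing or empty runtime directory before any task container starts. The following is a minimal sketch, not part of the harness: check_runtime_dir is a hypothetical helper name, and it only tests that the path exists and has at least one entry, not that the tree inside is a valid runtime.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (an assumption, not shipped with the harness).
# Succeeds only if the given path is a directory with at least one entry.
check_runtime_dir() {
  local dir="$1"
  if [ -z "$dir" ]; then
    echo "runtime_host_path is empty" >&2
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  # ls -A includes dotfiles but omits . and ..
  if [ -z "$(ls -A "$dir")" ]; then
    echo "directory is empty: $dir" >&2
    return 1
  fi
  return 0
}

# Usage: check_runtime_dir "artifacts/runtime/<runtime_host_path basename>" || exit 1
```

Running a check like this before start.sh turns a silent fallback to the in-container install into an immediate, readable failure.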

Pre-extract the runtime

If runtime_host_path is empty, the agent falls back to an in-container install (curl https://claude.ai/install.sh for claude-code, pip for openhands-sdk), which 403s or times out on isolated networks. Bind-mount is also preferred over image-mount, which trips an overlayfs filename-too-long bug for some task images.

Pre-extract once per runtime_image (idempotent — re-run when you bump the image). Set SUBPATH to claude-code or oh-sdk to match the agent:

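# Fill in the two placeholders from config.yaml before running.
RUNTIME_IMAGE=<runtime_image from config>
SUBPATH=oh-sdk            # or: claude-code
HOST_DIR=artifacts/runtime/<runtime_host_path basename>
docker pull "$RUNTIME_IMAGE"
# Create a stopped container so its filesystem can be copied out.
CID=$(docker create "$RUNTIME_IMAGE")
# Remove any previous extraction; docker cp creates $HOST_DIR itself.
rm -rf "$HOST_DIR" && mkdir -p "$(dirname "$HOST_DIR")"
docker cp "$CID:/opt/custom-agent-runtime/$SUBPATH" "$HOST_DIR"
docker rm "$CID"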
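If the extraction is scripted, SUBPATH can be derived from the agent name instead of being set by hand. The sketch below assumes the names from the Supported agents table map onto the two subpaths mentioned above. subpath_for_agent is a hypothetical helper, and since no subpath is stated here for custom-opencode, the function rejects it rather than guessing.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not part of the harness): map an agent
# name to the SUBPATH used in the extraction commands.
subpath_for_agent() {
  case "$1" in
    custom-claude-code)   echo "claude-code" ;;
    custom-openhands-sdk) echo "oh-sdk" ;;
    *)
      # No subpath is documented for other agents, so fail loudly.
      echo "no known SUBPATH for agent: $1" >&2
      return 1
      ;;
  esac
}

# Usage: SUBPATH=$(subpath_for_agent custom-openhands-sdk) || exit 1
```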

Agent vs benchmark compatibility

The three agents above are validated against the curated SWE / terminal benchmarks. Math/MCQ/QA benchmarks and benchmarks that ship their own agent runtime generally do not work with these coding agents — see Select Benchmark.
