LiteLLM Proxy
The per-job model proxy in front of your upstream API
eval never points the agent directly at your upstream model API. Instead, each job starts its own LiteLLM proxy, which normalizes the endpoint, exposes both OpenAI- and Anthropic-compatible protocols, and attaches the trajectory logger.
Why a proxy
- Protocol bridging — different agent scaffolds expect OpenAI or Anthropic message formats. The proxy serves both from one upstream model.
- Trajectory capture — the proxy's logger writes each request/response to
litellm-trajectory.jsonl, which is what makes a run reproducible and analyzable. - Backend transparency — because the agent only ever talks to the proxy, the same setup works whether the upstream is a remote API or a local vLLM checkpoint.
- Isolation — each job gets a fresh, per-job config so concurrent or sequential jobs don't share state.
Configuration
The upstream model is declared in config.yaml under runtime_info.input.llm_api. The block ships two interchangeable recipes, with only one uncommented at a time:
# MODE A — remote API (shared GLM-5 endpoint)
llm_api:
api_key: dummy-cf
api_base_url: "http://llm10.jierungogogo.com/v1"
model: "openai/GLM-5-FP8"
protocols: [openai_compatible, anthropic_compatible]
served_via: per_job_litellm_proxy
input_cost_per_token: 0.0000021
output_cost_per_token: 0.0000084# MODE B — local vLLM checkpoint
llm_api:
api_key: dummy-key # must match vLLM --api-key
api_base_url: "http://<GPU_NODE_IP>:8000/v1"
model: "openai/Qwen3-8B" # openai/<vLLM --served-model-name>
protocols: [openai_compatible, anthropic_compatible]
served_via: per_job_litellm_proxy
input_cost_per_token: 0.0
output_cost_per_token: 0.0The proxy itself is configured under runtime_info.input.litellm_proxy:
litellm_proxy:
config_template: scripts/serve_llm/litellm_config.example.yaml
port: 4101
master_key: dummy-key-cfdummy keys are intentional
The dummy-* API keys are intentional for Cloudflare-gated production endpoints — auth happens at the gateway, not via the key. For a local vLLM checkpoint, api_key must match vLLM's --api-key.
Lifecycle
scripts/start.sh handles the proxy automatically:
- Renders a per-job config from the template into
artifacts/litellm/<job>/litellm_config_eval.yaml. - Starts the proxy on
litellm_proxy.port, passing the config to Harbor's LiteLLM serve script viaLITELLM_CONFIG. - Runs the Harbor job against the proxy.
One proxy only
The model-serving layer for a local checkpoint is vLLM only. start.sh already runs the per-job LiteLLM proxy (with the trajectory logger, sticky routing, and Anthropic-format support); do not start a second LiteLLM next to vLLM — it would collide on the port and bypass that logging. When a job's inference is finished but the proxy is still up, stop the process started for this job — and only that one.
set -u workaround
Harbor's serve_litellm.sh dereferences LITELLM_STICKY_ROUTING_ALIASES without a default under set -u. The block defaults it to an empty string via environment.extra.LITELLM_STICKY_ROUTING_ALIASES in config.yaml, so leave that key in place.