LiteLLM Proxy

eval never points the agent directly at your upstream model API. Instead, each job starts its own LiteLLM proxy, which normalizes the endpoint, exposes both OpenAI- and Anthropic-compatible protocols, and attaches the trajectory logger.

Why a proxy

Protocol bridging — different agent scaffolds expect OpenAI or Anthropic message formats. The proxy serves both from one upstream model.
Trajectory capture — the proxy's logger writes each request/response to litellm-trajectory.jsonl, which is what makes a run reproducible and analyzable.
Backend transparency — because the agent only ever talks to the proxy, the same setup works whether the upstream is a remote API or a local vLLM checkpoint.
Isolation — each job gets a fresh, per-job config so concurrent or sequential jobs don't share state.

Configuration

The upstream model is declared in config.yaml under runtime_info.input.llm_api. The block ships two interchangeable recipes, with only one uncommented at a time:

# MODE A — remote API (shared GLM-5 endpoint)
llm_api:
  api_key: dummy-cf
  api_base_url: "http://llm10.jierungogogo.com/v1"
  model: "openai/GLM-5-FP8"
  protocols: [openai_compatible, anthropic_compatible]
  served_via: per_job_litellm_proxy
  input_cost_per_token: 0.0000021
  output_cost_per_token: 0.0000084

# MODE B — local vLLM checkpoint
llm_api:
  api_key: dummy-key                       # must match vLLM --api-key
  api_base_url: "http://<GPU_NODE_IP>:8000/v1"
  model: "openai/Qwen3-8B"                 # openai/<vLLM --served-model-name>
  protocols: [openai_compatible, anthropic_compatible]
  served_via: per_job_litellm_proxy
  input_cost_per_token: 0.0
  output_cost_per_token: 0.0

The proxy itself is configured under runtime_info.input.litellm_proxy:

litellm_proxy:
  config_template: scripts/serve_llm/litellm_config.example.yaml
  port: 4101
  master_key: dummy-key-cf

dummy keys are intentional

The dummy-* API keys are intentional for Cloudflare-gated production endpoints — auth happens at the gateway, not via the key. For a local vLLM checkpoint, api_key must match vLLM's --api-key.

Lifecycle

scripts/start.sh handles the proxy automatically:

Renders a per-job config from the template into artifacts/litellm/<job>/litellm_config_eval.yaml.
Starts the proxy on litellm_proxy.port, passing the config to Harbor's LiteLLM serve script via LITELLM_CONFIG.
Runs the Harbor job against the proxy.

One proxy only

The model-serving layer for a local checkpoint is vLLM only. start.sh already runs the per-job LiteLLM proxy (with the trajectory logger, sticky routing, and Anthropic-format support); do not start a second LiteLLM next to vLLM — it would collide on the port and bypass that logging. When a job's inference is finished but the proxy is still up, stop the process started for this job — and only that one.

set -u workaround

Harbor's serve_litellm.sh dereferences LITELLM_STICKY_ROUTING_ALIASES without a default under set -u. The block defaults it to an empty string via environment.extra.LITELLM_STICKY_ROUTING_ALIASES in config.yaml, so leave that key in place.

LiteLLM Proxy

Why a proxy

Configuration

Lifecycle

On this page