Results & Artifacts
Where trajectories, evaluation results, analysis, and archives live
Everything a job produces lands under this block's artifacts/ directory. This page maps out what is written and where.
Per-job outputs
artifacts/jobs/<job>/
├── results.json # aggregate job stats (rewards, errors)
├── config.yaml # snapshot of the active config at launch
├── analysis/ # post-eval job analysis (see below)
└── <task>/
    ├── agent/litellm-trajectory.jsonl # one raw trajectory per task
    └── evaluation/ # per-task verdict, scoring, test logs
litellm-trajectory.jsonl is the replayable log of one trial, written by the LiteLLM logger.
evaluation/ holds the per-task grading output: verdict, scoring, and test logs that determine the reward.
results.json aggregates rewards and error stats across all trials.
This directory is the block's eval_results_dir output (declared in config.yaml → runtime_info.output). eval is a terminal block, so there is no downstream consumer.
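As a rough illustration of the per-job layout, the sketch below walks one job directory and loads each task's trajectory file. It relies only on the paths shown in the tree and on the file being JSON Lines (one JSON object per line). The function name iter_trajectories is hypothetical, and nothing is assumed about the fields inside each record.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_trajectories(job_dir: Path) -> Iterator[tuple[str, list[dict]]]:
    """Yield (task_name, records) for every task that has a trajectory file.

    Hypothetical helper: it depends only on the documented path
    <job>/<task>/agent/litellm-trajectory.jsonl and makes no assumption
    about the schema of the individual records.
    """
    # analysis/ has no agent/ subdirectory, so the pattern skips it.
    for traj in sorted(job_dir.glob("*/agent/litellm-trajectory.jsonl")):
        task = traj.parent.parent.name
        with traj.open(encoding="utf-8") as fh:
            # Ignore blank lines; every other line is parsed as one JSON value.
            records = [json.loads(line) for line in fh if line.strip()]
        yield task, records
```

Calling iter_trajectories(Path("artifacts/jobs") / "<job>") with a real job name yields one (task, records) pair per task, which is enough to count records per task or to feed a replay tool.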
Analysis outputs
When job analysis runs (automatically after each eval, or manually via scripts/analyze_job.sh), it writes the files the dashboard reads:
artifacts/jobs/<job>/analysis/
├── report_failed.json / report_resolved.json # primary/axis failure & resolve distributions
├── report_task_analysis.json # task difficulty tiers, domain/bug-type breakdowns
├── traj_analysis/score_comparison.json # resolved vs unresolved metrics
├── instance_analysis/{summary,correlations}.json
├── instances.jsonl
└── analysis_config.yaml # self-contained config snapshot
See Job Analysis for the pipeline and its gold-dataset dependency.
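A quick way to confirm that analysis has finished for a job is to check for the files listed in the tree. The sketch below is a minimal example under one assumption: that every file in the listing is written on every analysis run, which this page does not guarantee. The names EXPECTED_ANALYSIS_FILES and missing_analysis_files are hypothetical.

```python
from pathlib import Path

# Relative paths copied from the analysis/ tree on this page.
# Assumption: all of them are produced on every analysis run.
EXPECTED_ANALYSIS_FILES = (
    "report_failed.json",
    "report_resolved.json",
    "report_task_analysis.json",
    "traj_analysis/score_comparison.json",
    "instance_analysis/summary.json",
    "instance_analysis/correlations.json",
    "instances.jsonl",
    "analysis_config.yaml",
)


def missing_analysis_files(job_dir: Path) -> list[str]:
    """Return the expected analysis outputs that are absent under job_dir/analysis."""
    analysis_dir = job_dir / "analysis"
    return [
        rel for rel in EXPECTED_ANALYSIS_FILES
        if not (analysis_dir / rel).is_file()
    ]
```

An empty result suggests the dashboard has everything it reads. A non-empty one points at the outputs to investigate before rerunning scripts/analyze_job.sh.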
Run archives
After each run, a snapshot is archived under:
artifacts/archives/run_NNN/
├── metadata.yaml # run id, timestamps, results, repo commit ids, copy of inputs
├── config.yaml # config as it was at run time
├── scripts/ # copy of executed scripts
├── session.log # session record
└── monitor.md # monitor output
Each run also appends one entry to artifacts/index.yaml. Use artifacts/index.yaml for archived run history and config.yaml → status for the current operational snapshot (see Status).
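To locate the most recent archive without parsing index.yaml, one option is to sort the run_NNN directories by their numeric suffix. This sketch assumes that NNN is a decimal run number that increases with each run, which this page implies but does not state. The helper latest_archive is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: archive directories are named run_<decimal number>.
_RUN_DIR = re.compile(r"run_(\d+)")


def latest_archive(archives_dir: Path) -> Optional[Path]:
    """Return the run_NNN directory with the highest number, or None if there is none."""
    if not archives_dir.is_dir():
        return None
    runs = []
    for child in archives_dir.iterdir():
        match = _RUN_DIR.fullmatch(child.name)
        if match and child.is_dir():
            # Compare numerically so that run_010 sorts after run_2.
            runs.append((int(match.group(1)), child))
    if not runs:
        return None
    return max(runs, key=lambda item: item[0])[1]
```

For example, latest_archive(Path("artifacts/archives")) returns the newest snapshot directory, whose metadata.yaml and config.yaml can then be inspected. artifacts/index.yaml remains the place to look for run history.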
Cleaning up
scripts/clean.sh removes gitignored runtime outputs (jobs/, litellm/, logs/). Pass --repos to also drop repos/. It does not touch archived runs.