Eval Files

Evaluation files define the test cases, targets, and evaluators for an evaluation run. AgentV supports two formats: YAML and JSONL.

YAML is the primary format. A single file contains metadata, execution config, and tests:

```yaml
description: Math problem solving evaluation
execution:
  target: default
assert:
  - name: correctness
    type: llm_judge
    prompt: ./judges/correctness.md
tests:
  - id: addition
    criteria: Correctly calculates 15 + 27 = 42
    input: What is 15 + 27?
    expected_output: "42"
```
| Field | Description |
| --- | --- |
| `description` | Human-readable description of the evaluation |
| `dataset` | Optional dataset identifier |
| `execution` | Default execution config (for example, `target`) |
| `workspace` | Suite-level workspace config (lifecycle hooks, template) |
| `tests` | Array of individual tests, or a string path to an external file |
| `assert` | Suite-level evaluators appended to each test unless the test sets `execution.skip_defaults: true` |

You can add structured metadata to your eval file using these optional top-level fields. Metadata is parsed only when the `name` field is present:

| Field | Description |
| --- | --- |
| `name` | Machine-readable identifier (lowercase, hyphens, max 64 chars). Triggers metadata parsing. |
| `description` | Human-readable description (max 1024 chars) |
| `version` | Eval version string (e.g., `"1.0"`) |
| `author` | Author or team identifier |
| `tags` | Array of string tags for categorization |
| `license` | License identifier (e.g., `MIT`, `Apache-2.0`) |
| `requires` | Dependency constraints (e.g., `agentv: ">=0.30.0"`) |
```yaml
name: export-screening
description: Evaluates export control screening accuracy
version: "1.0"
author: acme-compliance
tags: [compliance, agents]
license: Apache-2.0
requires:
  agentv: ">=0.30.0"
tests:
  - id: denied-party
    criteria: Identifies denied parties correctly
    input: Screen "Acme Corp" against denied parties list
```

The `assert` field is the canonical way to define suite-level evaluators. Suite-level assertions are appended to every test's evaluators unless a test sets `execution.skip_defaults: true`.

```yaml
description: API response validation
assert:
  - type: is_json
    required: true
  - type: contains
    value: "status"
tests:
  - id: health-check
    criteria: Returns health status
    input: Check API health
```
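A test can opt out of these suite-level assertions. A minimal sketch, assuming per-test `execution.skip_defaults` behaves as described above (the `raw-text` test is illustrative):

```yaml
tests:
  - id: health-check
    criteria: Returns health status
    input: Check API health
  - id: raw-text
    criteria: Returns plain text
    input: Fetch the plaintext status page
    execution:
      skip_defaults: true  # suite-level assert entries are not appended to this test
```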

`assert` supports all evaluator types, including deterministic assertion types (`contains`, `regex`, `is_json`, `equals`) and rubrics. See Tests for per-test `assert` usage.

Instead of inlining tests in the same file, you can point tests to an external YAML or JSONL file. This is the inverse of the sidecar pattern — the metadata file references the test data:

```yaml
name: my-eval
description: My evaluation suite
execution:
  target: default
tests: ./cases.yaml
```

The path is resolved relative to the eval file’s directory. The external file should contain a YAML array of test objects or a JSONL file with one test per line.
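For example, a hypothetical `./cases.yaml` holding a bare YAML array of test objects (the test content is illustrative, reusing the shape shown earlier):

```yaml
- id: addition
  criteria: Correctly calculates 15 + 27 = 42
  input: What is 15 + 27?
  expected_output: "42"
- id: subtraction
  criteria: Correctly calculates 50 - 8 = 42
  input: What is 50 - 8?
  expected_output: "42"
```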

For large-scale evaluations, AgentV supports JSONL (JSON Lines) format. Each line is a single test:

```jsonl
{"id": "test-1", "criteria": "Calculates correctly", "input": "What is 2+2?"}
{"id": "test-2", "criteria": "Provides explanation", "input": "Explain variables"}
```

An optional YAML sidecar file provides metadata and execution config. Place it alongside the JSONL file with the same base name:

`dataset.jsonl` + `dataset.eval.yaml`:

```yaml
description: Math evaluation dataset
dataset: math-tests
execution:
  target: azure_base
assert:
  - name: correctness
    type: llm_judge
    prompt: ./judges/correctness.md
```
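The pairing convention can be sketched as a small helper. This is an illustration of the naming rule (same directory, same base name, `.eval.yaml` suffix), not AgentV's actual implementation:

```python
from pathlib import Path

def sidecar_for(jsonl_path: str) -> Path:
    """Derive the expected YAML sidecar path for a JSONL dataset.

    Follows the convention described above: dataset.jsonl -> dataset.eval.yaml.
    """
    p = Path(jsonl_path)
    # Replace the .jsonl suffix with .eval.yaml, keeping the base name.
    return p.with_name(p.stem + ".eval.yaml")

print(sidecar_for("evals/dataset.jsonl"))  # evals/dataset.eval.yaml
```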
JSONL has practical advantages for large datasets:

- Streaming-friendly: process tests line by line
- Git-friendly: diffs show individual case changes
- Programmatic generation: easy to create from scripts
- Industry standard: compatible with DeepEval, LangWatch, and Hugging Face datasets
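Programmatic generation can be sketched with the standard library alone. The generator and test content below are hypothetical; the keys match the JSONL test format shown above:

```python
import json

def to_jsonl(tests):
    """Serialize a list of test dicts as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(t) for t in tests)

# Build a small multiplication suite; id/criteria/input/expected_output
# follow the test fields used elsewhere in this page.
tests = [
    {
        "id": f"mult-{n}",
        "criteria": f"Calculates {n} x 7 correctly",
        "input": f"What is {n} * 7?",
        "expected_output": str(n * 7),
    }
    for n in range(1, 4)
]

print(to_jsonl(tests))
```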

Use the convert command to switch between YAML and JSONL:

```sh
agentv convert evals/dataset.eval.yaml --format jsonl
agentv convert evals/dataset.jsonl --format yaml
```