Eval Files

Evaluation files define the test cases, targets, and evaluators for an evaluation run. AgentV supports two formats: YAML and JSONL.

YAML is the primary format. A single file contains metadata, execution config, and tests:

```yaml
description: Math problem solving evaluation
execution:
  target: default
assert:
  - name: correctness
    type: llm_judge
    prompt: ./judges/correctness.md
tests:
  - id: addition
    criteria: Correctly calculates 15 + 27 = 42
    input: What is 15 + 27?
    expected_output: "42"
```
| Field | Description |
| --- | --- |
| `description` | Human-readable description of the evaluation |
| `dataset` | Optional dataset identifier |
| `execution` | Default execution config (for example, `target`) |
| `workspace` | Suite-level workspace config (lifecycle hooks, template) |
| `tests` | Array of individual tests, or a string path to an external file |
| `assert` | Suite-level evaluators appended to each test unless the test sets `execution.skip_defaults: true` |

You can add structured metadata to your eval file using these optional top-level fields. Metadata is parsed only when the `name` field is present:

| Field | Description |
| --- | --- |
| `name` | Machine-readable identifier (lowercase, hyphens, max 64 chars). Triggers metadata parsing. |
| `description` | Human-readable description (max 1024 chars) |
| `version` | Eval version string (e.g., `"1.0"`) |
| `author` | Author or team identifier |
| `tags` | Array of string tags for categorization |
| `license` | License identifier (e.g., `MIT`, `Apache-2.0`) |
| `requires` | Dependency constraints (e.g., `agentv: ">=0.30.0"`) |
```yaml
name: export-screening
description: Evaluates export control screening accuracy
version: "1.0"
author: acme-compliance
tags: [compliance, agents]
license: Apache-2.0
requires:
  agentv: ">=0.30.0"
tests:
  - id: denied-party
    criteria: Identifies denied parties correctly
    input: Screen "Acme Corp" against denied parties list
```

The `assert` field is the canonical way to define suite-level evaluators. Suite-level assertions are appended to every test's evaluators unless a test sets `execution.skip_defaults: true`.

```yaml
description: API response validation
assert:
  - type: is_json
    required: true
  - type: contains
    value: "status"
tests:
  - id: health-check
    criteria: Returns health status
    input: Check API health
```
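A test can opt out of these suite-level assertions. A minimal sketch, assuming per-test `execution.skip_defaults` behaves as described above (the `raw-text` test is illustrative):

```yaml
tests:
  - id: health-check
    criteria: Returns health status
    input: Check API health
  - id: raw-text
    criteria: Returns plain text
    input: Fetch the plaintext status page
    execution:
      skip_defaults: true  # suite-level assert entries are not appended to this test
```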

`assert` supports all evaluator types, including deterministic assertion types (`contains`, `regex`, `is_json`, `equals`) and rubrics. See Tests for per-test `assert` usage.

Instead of inlining tests in the same file, you can point tests to an external YAML or JSONL file. This is the inverse of the sidecar pattern — the metadata file references the test data:

```yaml
name: my-eval
description: My evaluation suite
execution:
  target: default
tests: ./cases.yaml
```

The path is resolved relative to the eval file’s directory. The external file should contain a YAML array of test objects or a JSONL file with one test per line.
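For example, a hypothetical `./cases.yaml` holding a bare YAML array of test objects (the test content is illustrative, reusing the shape shown earlier):

```yaml
- id: addition
  criteria: Correctly calculates 15 + 27 = 42
  input: What is 15 + 27?
  expected_output: "42"
- id: subtraction
  criteria: Correctly calculates 50 - 8 = 42
  input: What is 50 - 8?
  expected_output: "42"
```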

For large-scale evaluations, AgentV supports JSONL (JSON Lines) format. Each line is a single test:

```jsonl
{"id": "test-1", "criteria": "Calculates correctly", "input": "What is 2+2?"}
{"id": "test-2", "criteria": "Provides explanation", "input": "Explain variables"}
```

An optional YAML sidecar file provides metadata and execution config. Place it alongside the JSONL file with the same base name:

`dataset.jsonl` + `dataset.eval.yaml`:

```yaml
description: Math evaluation dataset
dataset: math-tests
execution:
  target: azure_base
assert:
  - name: correctness
    type: llm_judge
    prompt: ./judges/correctness.md
```
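The pairing convention can be sketched as a small helper. This is an illustration of the naming rule (same directory, same base name, `.eval.yaml` suffix), not AgentV's actual implementation:

```python
from pathlib import Path

def sidecar_for(jsonl_path: str) -> Path:
    """Derive the expected YAML sidecar path for a JSONL dataset.

    Follows the convention described above: dataset.jsonl -> dataset.eval.yaml.
    """
    p = Path(jsonl_path)
    # Replace the .jsonl suffix with .eval.yaml, keeping the base name.
    return p.with_name(p.stem + ".eval.yaml")

print(sidecar_for("evals/dataset.jsonl"))  # evals/dataset.eval.yaml
```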
JSONL has practical advantages for large datasets:

- Streaming-friendly: process tests line by line
- Git-friendly: diffs show individual case changes
- Programmatic generation: easy to create from scripts
- Industry standard: compatible with DeepEval, LangWatch, and Hugging Face datasets
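Programmatic generation can be sketched with the standard library alone. The generator and test content below are hypothetical; the keys match the JSONL test format shown above:

```python
import json

def to_jsonl(tests):
    """Serialize a list of test dicts as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(t) for t in tests)

# Build a small multiplication suite; id/criteria/input/expected_output
# follow the test fields used elsewhere in this page.
tests = [
    {
        "id": f"mult-{n}",
        "criteria": f"Calculates {n} x 7 correctly",
        "input": f"What is {n} * 7?",
        "expected_output": str(n * 7),
    }
    for n in range(1, 4)
]

print(to_jsonl(tests))
```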

Use the convert command to switch between YAML and JSONL:

```sh
agentv convert evals/dataset.eval.yaml --format jsonl
agentv convert evals/dataset.jsonl --format yaml
```