eval

Run scenario-based behavioral evals. pipecat eval run tests scenarios against an already-running agent; pipecat eval suite spawns the agents listed in a manifest and runs their scenarios concurrently. Both exit 0 when everything passes and 1 otherwise. The same commands are also available as python -m pipecat.evals. See the Pipecat Evals guide for concepts, the scenario format, and manifests.
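Because both commands report their outcome through the exit status, a CI step can gate on it directly. Below is a minimal Python sketch that relies only on the documented exit codes. The helper name `evals_passed` is hypothetical and not part of Pipecat.

```python
import subprocess


def evals_passed(cmd: list[str]) -> bool:
    """Run an eval command and report whether every scenario passed.

    Relies only on the documented exit codes: 0 when everything passes,
    1 otherwise. Any other nonzero status (e.g. a crash) also counts as failure.
    """
    # check=False: a failing eval is an expected outcome, not an exception.
    completed = subprocess.run(cmd, check=False)
    return completed.returncode == 0


# Hypothetical CI usage (requires the pipecat CLI on PATH):
#   ok = evals_passed(["pipecat", "eval", "suite", "manifest.yaml"])
```

For example, `evals_passed(["pipecat", "eval", "run", "scenarios/capital_question.yaml"])` returns True only if that scenario passed against the running agent.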

eval run

Run one or more scenarios against an already-running agent (started with -t eval). Usage:

pipecat eval run [OPTIONS] SCENARIOS...

Arguments:

path

required

One or more scenario YAML files.

Options:

string

default:"ws://localhost:7860"

WebSocket URL of the agent’s eval transport.

flag

Print a line for each turn and expectation as it resolves.

flag

Record each scenario’s conversation audio (audio-mode scenarios).

string

default:"recordings"

Directory for --audio recordings: <record-dir>/<scenario>.wav.

string

Directory for cached synthesized user audio. Defaults to <user-cache-dir>/pipecat/tts.

flag

Disable the user-audio cache: re-synthesize every turn (no reads or writes).

integer

default:"60"

Default per-expectation timeout in seconds, for expectations without their own within_ms.
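The precedence between an expectation's own `within_ms` and this command-line default can be sketched as below. This illustrates the rule as described and is not Pipecat's code. It assumes, as the name suggests, that `within_ms` is in milliseconds, and it represents a missing `within_ms` as `None`. The function name is hypothetical.

```python
from typing import Optional

DEFAULT_TIMEOUT_S = 60  # documented default for the per-expectation timeout


def effective_timeout_s(within_ms: Optional[int], default_s: float = DEFAULT_TIMEOUT_S) -> float:
    """Seconds to wait for one expectation.

    An expectation's own within_ms wins; otherwise fall back to the
    command-line default, which is given in seconds.
    """
    if within_ms is not None:
        return within_ms / 1000  # assumed milliseconds -> seconds
    return float(default_s)
```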

string

default:"."

Directory for each scenario’s logs: <logs-dir>/<scenario>.eval.log (plus .debug.log under --debug).

flag

Also save <scenario>.debug.log with the harness’s full per-pipeline logs.
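The per-scenario files described above can be enumerated with a small sketch. It assumes the caller passes the same scenario name that appears in the file names. The function name and parameter names are hypothetical. The defaults mirror the documented `.` logs directory and `recordings` record directory.

```python
from pathlib import Path


def scenario_artifacts(scenario: str, logs_dir: str = ".", record_dir: str = "recordings",
                       debug: bool = False, audio: bool = False) -> list[Path]:
    """Files one `eval run` scenario should produce under the documented layout."""
    paths = [Path(logs_dir) / f"{scenario}.eval.log"]  # always written
    if debug:
        paths.append(Path(logs_dir) / f"{scenario}.debug.log")  # only under --debug
    if audio:
        paths.append(Path(record_dir) / f"{scenario}.wav")  # only under --audio
    return paths
```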

flag

Cancel the agent’s pipeline after the run, which makes the agent exit. By default the agent is left running so it can serve more scenarios.

eval suite

Spawn the agents in a manifest and run their scenarios concurrently. Every setting except the manifest’s `suite:` list can be set in the manifest or overridden on the command line, and the command line wins. Usage:
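The "command line wins" rule amounts to a simple merge, sketched below under stated assumptions. Settings are treated as plain dictionaries, and a command-line value of `None` means "option not given". The keys used in the test are illustrative only. The function name is hypothetical, and this is not how Pipecat necessarily implements the merge.

```python
from typing import Any, Mapping, Optional


def resolve_settings(manifest: Mapping[str, Any], cli: Mapping[str, Optional[Any]]) -> dict[str, Any]:
    """Merge manifest settings with command-line overrides; the command line wins.

    A CLI value of None means the option was not given, so the manifest value stays.
    The suite list is not overridable and is always taken from the manifest.
    """
    merged = dict(manifest)
    for key, value in cli.items():
        if key == "suite":
            continue  # the suite: list can only come from the manifest
        if value is not None:
            merged[key] = value
    return merged
```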

pipecat eval suite [OPTIONS] MANIFEST_PATH

Arguments:

path

required

Manifest YAML listing agents and their scenarios.

Options:

string

Only run bots whose path contains this substring.

string

Only run this scenario name.
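The two filters combine as a substring test on the bot path and an exact match on the scenario name. The sketch below represents each run as a hypothetical `(bot_path, scenario_name)` pair. The shape and function name are assumptions, not Pipecat's internals.

```python
from typing import Iterable, Optional


def select_runs(runs: Iterable[tuple[str, str]], bot_substring: Optional[str] = None,
                scenario_name: Optional[str] = None) -> list[tuple[str, str]]:
    """Keep (bot_path, scenario_name) pairs that pass both optional filters."""
    selected = []
    for bot_path, scenario in runs:
        if bot_substring is not None and bot_substring not in bot_path:
            continue  # bot path must contain the substring
        if scenario_name is not None and scenario != scenario_name:
            continue  # scenario name must match exactly
        selected.append((bot_path, scenario))
    return selected
```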

string

Name of the run’s subdirectory under runs_dir. Defaults to a timestamp.

path

Output base, overriding the manifest’s runs_dir. A <name>/ subdirectory with logs/ and recordings/ is created under it. Defaults to eval-runs.
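The resulting directory layout can be sketched as follows. The timestamp format used when no name is given is a hypothetical stand-in, because the real default format is not specified here. The function name is also hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def run_dirs(runs_dir: str = "eval-runs", name: Optional[str] = None,
             now: Optional[datetime] = None) -> dict[str, Path]:
    """Directories one suite run writes to: <runs_dir>/<name>/{logs,recordings}."""
    if name is None:
        # Hypothetical timestamp format; the real default name may differ.
        name = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    base = Path(runs_dir) / name
    return {"base": base, "logs": base / "logs", "recordings": base / "recordings"}
```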

path

Override the manifest’s bots_dir (bot paths are relative to it).

path

Override the manifest’s scenarios_dir.

integer

Override the manifest’s concurrency (how many runs execute at once).
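A concurrency limit like this is commonly enforced with a semaphore. The sketch below uses `asyncio.Semaphore` to cap how many jobs are in flight. It illustrates the technique and is not a description of Pipecat's scheduler. The function name is hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")


async def run_bounded(jobs: Iterable[Callable[[], Awaitable[T]]], concurrency: int) -> list[T]:
    """Run every job with at most `concurrency` in flight; results keep job order."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # waits while `concurrency` jobs are already running
            return await job()

    # gather preserves the order of its arguments in the returned list.
    return list(await asyncio.gather(*(guarded(job) for job in jobs)))
```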

integer

Override the manifest’s base_port (default 7900). Each run gets port `base_port + index`.
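The port arithmetic is simple enough to check by hand. This sketch assumes the run index starts at 0, which is not stated explicitly. The function name is hypothetical.

```python
DEFAULT_BASE_PORT = 7900  # documented default for base_port


def run_ports(num_runs: int, base_port: int = DEFAULT_BASE_PORT) -> list[int]:
    """Port for each run: base_port + index (index assumed to start at 0)."""
    return [base_port + index for index in range(num_runs)]
```

Because each run gets its own port, the whole range from base_port up to base_port plus the number of runs must be free on the host.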

string

Override the manifest’s cache_dir for cached synthesized user audio.

flag

Disable the user-audio cache: re-synthesize every turn (no reads or writes).

integer

default:"60"

Default per-expectation timeout in seconds, for expectations without their own within_ms.

string

Override the manifest’s spawn template. Default: "{python} {bot} -t eval --port {port}".
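The default template looks like a Python `str.format` string. Under that assumption, expanding it for one agent could look like the sketch below. Two details are also assumptions: that `{bot}` expands to the bot path joined with bots_dir, and that the result is split into an argument list with `shlex`. The function name is hypothetical.

```python
import shlex
import sys
from pathlib import Path

DEFAULT_SPAWN = "{python} {bot} -t eval --port {port}"  # documented default template


def spawn_argv(bot: str, port: int, bots_dir: str = ".", template: str = DEFAULT_SPAWN,
               python: str = sys.executable) -> list[str]:
    """Expand the spawn template for one agent and split it into argv."""
    bot_path = Path(bots_dir) / bot  # bot paths are relative to bots_dir
    # Quote substituted values so paths with spaces survive shlex.split.
    command = template.format(python=shlex.quote(python), bot=shlex.quote(str(bot_path)), port=port)
    return shlex.split(command)
```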

string

Override the Python interpreter used to spawn each agent.

flag

Record conversation audio.

flag

Also save <run>.debug.log with the harness’s full per-pipeline logs.

Examples

# Run one scenario against a running agent
pipecat eval run scenarios/capital_question.yaml

# Run a batch of scenarios, verbosely
pipecat eval run scenarios/*.yaml -v

# Run a full suite
pipecat eval suite manifest.yaml

# Only the support agent, 8 runs at a time, named output dir
pipecat eval suite manifest.yaml -p support -c 8 -n nightly
