Run scenario-based behavioral evals. The pipecat eval run command tests scenarios against an already-running agent; the pipecat eval suite command spawns the agents listed in a manifest and runs their scenarios concurrently. Both exit 0 when everything passes and 1 otherwise. The same commands are also available as python -m pipecat.evals. See the Pipecat Evals guide for concepts, the scenario format, and manifests.
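
Because both commands report the outcome through their exit status, a wrapper script can gate on it. The following is a minimal sketch, not part of Pipecat: the helper name evals_passed is hypothetical, and the usage comment assumes pipecat is on PATH and that manifest.yaml exists in the working directory.

```python
import subprocess
import sys


def evals_passed(command: list[str]) -> bool:
    """Run an eval command and report whether it exited with status 0."""
    # check=False: a non-zero exit is an expected outcome here, not an exception.
    result = subprocess.run(command, check=False)
    return result.returncode == 0


# Hypothetical usage in a CI step (requires an installed pipecat CLI):
#   ok = evals_passed(["pipecat", "eval", "suite", "manifest.yaml"])
#   sys.exit(0 if ok else 1)
```

Any non-zero status is treated as a failure, which covers the documented exit code 1 as well as the case where the command itself crashes.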

eval run

Run one or more scenarios against an already-running agent (started with -t eval). Usage:
pipecat eval run [OPTIONS] SCENARIOS...
Arguments:
SCENARIOS...
path
required
One or more scenario YAML files.
Options:
--bot-url
string
default:"ws://localhost:7860"
WebSocket URL of the agent’s eval transport.
--verbose / -v
flag
Print a line for each turn and expectation as it resolves.
--audio / -a
flag
Record each scenario’s conversation audio (audio-mode scenarios).
--record-dir
string
default:"recordings"
Directory for --audio recordings: <record-dir>/<scenario>.wav.
--cache-dir
string
Directory for cached synthesized user audio. Defaults to <user-cache-dir>/pipecat/tts.
--no-cache
flag
Disable the user-audio cache: re-synthesize every turn (no reads or writes).
--timeout / -t
integer
default:"60"
Default per-expectation timeout in seconds, for expectations without their own within_ms.
--logs-dir
string
default:"."
Directory for each scenario’s logs: <logs-dir>/<scenario>.eval.log (plus .debug.log under --debug).
--debug / -d
flag
Also save <scenario>.debug.log with the harness’s full per-pipeline logs.
--stop-bot
flag
Cancel the agent’s pipeline (exit it) after the run. By default the agent is left running so it can serve more scenarios.
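
The file locations described for --logs-dir, --debug, --audio, and --record-dir can be sketched as a small helper. This is an illustration only, not Pipecat's code: the function name expected_outputs is hypothetical, and it assumes that <scenario> is the scenario file's name without its extension, which this reference does not state.

```python
from pathlib import Path


def expected_outputs(scenario_file: str, logs_dir: str = ".",
                     record_dir: str = "recordings",
                     audio: bool = False, debug: bool = False) -> list[Path]:
    """List the files one `eval run` scenario is documented to produce."""
    # Assumption: <scenario> is the YAML file name without its extension.
    scenario = Path(scenario_file).stem
    outputs = [Path(logs_dir) / f"{scenario}.eval.log"]
    if debug:  # --debug adds a second log next to the first
        outputs.append(Path(logs_dir) / f"{scenario}.debug.log")
    if audio:  # --audio writes one WAV per scenario under --record-dir
        outputs.append(Path(record_dir) / f"{scenario}.wav")
    return outputs


for path in expected_outputs("scenarios/capital_question.yaml",
                             audio=True, debug=True):
    print(path.as_posix())
```

With the documented defaults, the logs land in the current directory and the recording under recordings/.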

eval suite

Spawn the agents in a manifest and run their scenarios concurrently. Everything except the suite: list can be set in the manifest or overridden on the command line (the command line wins). Usage:
pipecat eval suite [OPTIONS] MANIFEST_PATH
Arguments:
MANIFEST_PATH
path
required
Manifest YAML listing agents and their scenarios.
Options:
--pattern / -p
string
Only run bots whose path contains this substring.
--scenario / -s
string
Only run this scenario name.
--name / -n
string
Name of the run subdirectory under runs_dir. Defaults to a timestamp.
--runs-dir
path
Output base, overriding the manifest’s runs_dir. A <name>/ subdirectory with logs/ and recordings/ is created under it. Defaults to eval-runs.
--bots-dir
path
Override the manifest’s bots_dir (bot paths are relative to it).
--scenarios-dir
path
Override the manifest’s scenarios_dir.
--concurrency / -c
integer
Override the manifest’s concurrency (how many runs execute at once).
--base-port
integer
Override the manifest’s base_port (default 7900). Each run gets `base_port + index`.
--cache-dir
string
Override the manifest’s cache_dir for cached synthesized user audio.
--no-cache
flag
Disable the user-audio cache: re-synthesize every turn (no reads or writes).
--timeout / -t
integer
default:"60"
Default per-expectation timeout in seconds, for expectations without their own within_ms.
--spawn
string
Override the manifest’s spawn template. Default: "{python} {bot} -t eval --port {port}".
--python
string
Override the Python interpreter used to spawn each agent.
--audio / -a
flag
Record conversation audio.
--debug / -d
flag
Also save <run>.debug.log with the harness’s full per-pipeline logs.
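
The override rule (the command line wins over the manifest), the per-run port, and the spawn template can be pictured together. The sketch below is not the harness's implementation. It assumes the template is expanded with Python's str.format, that run indexes start at 0, and that an option left unset on the command line is represented as None. The function names, the "python" settings key, and its placeholder default are hypothetical.

```python
import shlex

DEFAULT_SPAWN = "{python} {bot} -t eval --port {port}"


def resolve(manifest: dict, cli: dict) -> dict:
    """Merge settings: a CLI value, when given, overrides the manifest value."""
    # "python": "python" is a placeholder; the real default interpreter is not documented here.
    defaults = {"base_port": 7900, "spawn": DEFAULT_SPAWN, "python": "python"}
    given = {key: value for key, value in cli.items() if value is not None}
    return {**defaults, **manifest, **given}


def spawn_command(settings: dict, bot: str, index: int) -> list[str]:
    """Expand the spawn template for one run, using port base_port + index."""
    port = settings["base_port"] + index  # assumption: index starts at 0
    line = settings["spawn"].format(python=settings["python"], bot=bot, port=port)
    return shlex.split(line)


# Manifest sets base_port; the command line overrides only the interpreter.
settings = resolve({"base_port": 8000}, {"base_port": None, "python": "python3"})
print(spawn_command(settings, "bots/support.py", 2))
# ['python3', 'bots/support.py', '-t', 'eval', '--port', '8002']
```

Giving each run its own port is what allows several agents to be spawned side by side when concurrency is greater than 1.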

Examples

# Run one scenario against a running agent
pipecat eval run scenarios/capital_question.yaml

# Run a batch of scenarios, verbosely
pipecat eval run scenarios/*.yaml -v

# Run a full suite
pipecat eval suite manifest.yaml

# Only the support agent, 8 runs at a time, named output dir
pipecat eval suite manifest.yaml -p support -c 8 -n nightly
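
The last example names its run nightly. Under the layout described for --runs-dir and --name, the output directories could be derived as in this sketch. It assumes the manifest does not set runs_dir, so the documented default eval-runs applies. The function name suite_output_dirs and the timestamp format are illustrative assumptions, since the real timestamp format is not documented here.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def suite_output_dirs(runs_dir: str = "eval-runs",
                      name: Optional[str] = None) -> dict:
    """Return the logs/ and recordings/ directories for one suite run."""
    if name is None:
        # Assumption: illustrative format only; the actual one may differ.
        name = datetime.now().strftime("%Y%m%d-%H%M%S")
    run_root = Path(runs_dir) / name
    return {"logs": run_root / "logs", "recordings": run_root / "recordings"}


dirs = suite_output_dirs(name="nightly")
print(dirs["logs"].as_posix())        # eval-runs/nightly/logs
print(dirs["recordings"].as_posix())  # eval-runs/nightly/recordings
```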