`pipecat eval run` tests scenarios against an already-running agent; `pipecat eval suite` spawns the agents listed in a manifest and runs their scenarios concurrently. Both exit 0 when everything passes and 1 otherwise.
The same commands are also available as `python -m pipecat.evals`.
See the Pipecat Evals guide for concepts, the scenario format, and manifests.
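Because both commands report success through their exit status, a CI script can gate on it without parsing any output. The following is a minimal sketch, not part of Pipecat: `evals_passed` is a hypothetical helper, and the caller supplies the full command line, since the exact arguments depend on your setup.

```python
import subprocess
from typing import Sequence


def evals_passed(command: Sequence[str]) -> bool:
    """Run an eval command and report whether it exited with status 0.

    `command` is the full argv, for example one starting with
    ["pipecat", "eval", "suite", ...]. This helper is hypothetical and only
    relies on the documented contract: 0 means everything passed.
    """
    completed = subprocess.run(list(command), check=False)
    return completed.returncode == 0
```

A wrapper script would typically call `sys.exit(0 if evals_passed(argv) else 1)` so the CI job fails whenever any scenario fails.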
eval run
Run one or more scenarios against an already-running agent (started with `-t eval`).
Usage:
One or more scenario YAML files.
WebSocket URL of the agent’s eval transport.
Print a line for each turn and expectation as it resolves.
Record each scenario’s conversation audio (audio-mode scenarios).
Directory for `--audio` recordings: `<record-dir>/<scenario>.wav`.
Directory for cached synthesized user audio. Defaults to `<user-cache-dir>/pipecat/tts`.
Disable the user-audio cache: re-synthesize every turn (no reads or writes).
Default per-expectation timeout in seconds, for expectations without their own `within_ms`.
Directory for each scenario’s logs: `<logs-dir>/<scenario>.eval.log` (plus `.debug.log` under `--debug`).
Also save `<scenario>.debug.log` with the harness’s full per-pipeline logs.
Cancel the agent’s pipeline (exit it) after the run. By default the agent is left running so it can serve more scenarios.
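The file naming described above can be sketched as a small path helper, which may be handy when a script needs to collect artifacts after a run. This is an illustration only: `scenario_artifacts` is a hypothetical function, and it assumes that `<scenario>` in the file names is the scenario's name.

```python
from pathlib import Path
from typing import Dict


def scenario_artifacts(
    scenario: str, logs_dir: Path, record_dir: Path, debug: bool = False
) -> Dict[str, Path]:
    """Return the expected artifact paths for one scenario.

    Assumption: `scenario` is the name used in place of <scenario> in the
    documented patterns. Nothing is created or checked on disk.
    """
    paths = {
        "log": logs_dir / f"{scenario}.eval.log",
        "recording": record_dir / f"{scenario}.wav",
    }
    if debug:
        # The debug log is only written when --debug is set.
        paths["debug_log"] = logs_dir / f"{scenario}.debug.log"
    return paths
```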
eval suite
Spawn the agents in a manifest and run their scenarios concurrently. Everything except the `suite:` list can be set in the manifest or overridden on the command line (the command line wins).
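The precedence rule (command line over manifest) can be pictured as a simple merge. The sketch below is not Pipecat's implementation; `effective_settings` is a hypothetical helper, and it assumes an option left unset on the command line is represented as `None`.

```python
from typing import Any, Dict, Mapping


def effective_settings(
    manifest: Mapping[str, Any], cli: Mapping[str, Any]
) -> Dict[str, Any]:
    """Merge manifest settings with command-line overrides.

    The `suite` list is excluded because it can only come from the manifest.
    Command-line values win, but only when they were actually given
    (assumed here to mean "not None").
    """
    merged = {key: value for key, value in manifest.items() if key != "suite"}
    merged.update({key: value for key, value in cli.items() if value is not None})
    return merged
```

For example, a manifest with `concurrency: 2` combined with a command-line concurrency of 4 yields 4, while a manifest `base_port` stays in effect if no port override is passed.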
Usage:
Manifest YAML listing agents and their scenarios.
Only run bots whose path contains this substring.
Only run this scenario name.
Run subdirectory name under `runs_dir`. Defaults to a timestamp.
Output base, overriding the manifest’s `runs_dir`. A `<name>/` subdirectory with `logs/` and `recordings/` is created under it. Defaults to `eval-runs`.
Override the manifest’s `bots_dir` (bot paths are relative to it).
Override the manifest’s `scenarios_dir`.
Override the manifest’s `concurrency` (how many runs execute at once).
Override the manifest’s `base_port` (default 7900). Each run gets `base_port + index`.
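To see which ports a suite might occupy, the allocation can be sketched as follows. This assumes each run's port is the base port plus the run's index and that indices start at 0; the starting index is not stated here, so treat the exact numbers as illustrative. `run_ports` is a hypothetical helper.

```python
from typing import List


def run_ports(run_count: int, base_port: int = 7900) -> List[int]:
    """List the port for each run, assuming port = base_port + index.

    Assumption: indices are 0-based, so the first run uses base_port itself.
    """
    return [base_port + index for index in range(run_count)]
```

Under these assumptions, a suite with three runs and the default base port would need ports 7900 through 7902 to be free.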
Override the manifest’s `cache_dir` for cached synthesized user audio.
Disable the user-audio cache: re-synthesize every turn (no reads or writes).
Default per-expectation timeout in seconds, for expectations without their own `within_ms`.
Override the manifest’s spawn template. Default: `"{python} {bot} -t eval --port {port}"`.
Override the Python interpreter used to spawn each agent.
Record conversation audio.
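As an illustration of how the default spawn template `{python} {bot} -t eval --port {port}` could expand into a command, here is a minimal sketch. It assumes the placeholders are filled in `str.format` style, which is a guess about the mechanism rather than a description of Pipecat's code; `spawn_argv` and the example bot path are hypothetical.

```python
import shlex
import sys
from typing import List

# Default template as documented for the suite command.
DEFAULT_TEMPLATE = "{python} {bot} -t eval --port {port}"


def spawn_argv(
    bot: str,
    port: int,
    python: str = sys.executable,
    template: str = DEFAULT_TEMPLATE,
) -> List[str]:
    """Expand a spawn template into an argv list.

    Values are shell-quoted before substitution so that paths containing
    spaces survive the split back into separate arguments.
    """
    command = template.format(
        python=shlex.quote(python), bot=shlex.quote(bot), port=port
    )
    return shlex.split(command)
```

With a hypothetical bot at `bots/demo.py`, interpreter `python3`, and port 7900, this produces the argv `python3 bots/demo.py -t eval --port 7900`.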
Also save `<run>.debug.log` with the harness’s full per-pipeline logs.