pipecat eval run tests scenarios against an agent you started yourself. A suite goes one step further: you list agents and scenarios in a manifest, and pipecat eval suite spawns each agent with its eval transport on its own port, runs its scenarios, tears it down, and aggregates the results, with several runs in progress at a time.
Suites are the right tool when you have more than one agent, more than a handful of scenarios, or want a single command for CI. Pipecat’s own release evals are a manifest with 100+ example agents plus this command.
The manifest
manifest.yaml
Paths (bots_dir, scenarios_dir, runs_dir, the bot: entries) resolve relative to the manifest file, so a manifest is portable: check it into your repo and run it from anywhere.
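To illustrate what "relative to the manifest file" means in practice, here is a minimal sketch of that resolution rule. It is not Pipecat's implementation: the function name resolve_from_manifest and the example paths are hypothetical, and the handling of absolute entries is simply pathlib's behavior.

```python
from pathlib import Path


def resolve_from_manifest(manifest_path: str, entry: str) -> Path:
    """Resolve a manifest entry against the manifest's own directory."""
    # The base is the directory that holds the manifest, not the
    # directory the command happens to be run from.
    base = Path(manifest_path).parent
    # pathlib returns an absolute entry unchanged when joining.
    return base / entry


# Hypothetical layout: the manifest lives in /repo/evals.
print(resolve_from_manifest("/repo/evals/manifest.yaml", "bots"))
```

Because the base is the manifest's directory rather than the current working directory, the same entry resolves to the same location wherever the command is invoked.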
Scenarios are reusable across agents. One greeting scenario can cover every agent in the suite.
An optional runner_body: points at a JSON file passed to the agent as --runner-body. It supplies session data the agent would normally receive in a /start request body (for example, a vision agent’s image path).
Running a suite
The exit code is 0 only if every run passes.
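Since a suite exits with 0 only when every run passes, a wrapper script can rely on the exit status alone. The sketch below shows one way to do that. The helper suite_passed is hypothetical, and passing the manifest path as a positional argument is an assumption, so check the command's help output for the exact invocation.

```python
import subprocess
import sys

# Assumption: the manifest path is given as a positional argument.
SUITE_COMMAND = ["pipecat", "eval", "suite", "manifest.yaml"]


def suite_passed(command: list[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    completed = subprocess.run(command)
    # Any non-zero status is treated as a failed suite.
    return completed.returncode == 0
```

A wrapper could then end with sys.exit(0 if suite_passed(SUITE_COMMAND) else 1), which propagates the result to whatever invoked it.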
Useful flags:
suite: list can live in the manifest or be passed on the command line (the command line wins), so a manifest can be as minimal as a suite: list.
Run output
Each invocation writes to <runs_dir>/<name>/ (a timestamp when -n is omitted):
The .eval.log file is the decision trace: a timestamped record of every event the harness saw, what it matched, what the judge said, and why an assertion failed. The agent’s own log sits next to it.
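The run directory naming described above (<runs_dir>/<name>/, with a timestamp standing in when -n is omitted) can be sketched as follows. This is an illustration only: the function run_directory is hypothetical, and the timestamp format is an assumption that may not match what the command actually writes.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def run_directory(runs_dir: str, name: Optional[str] = None) -> Path:
    """Return <runs_dir>/<name>, using a timestamp when no name is given."""
    if name is None:
        # Assumption: the real timestamp format may differ.
        name = datetime.now().strftime("%Y%m%d-%H%M%S")
    return Path(runs_dir) / name


print(run_directory("runs", "nightly"))
```

Giving a run an explicit name makes its output location predictable, which helps when a later step needs to collect the logs.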
Testing one agent with many scenarios
If you just want to run a batch of scenarios against an agent you already have running, you don’t need a manifest. pipecat eval run accepts multiple scenario files and shares the suite’s dashboard and tally:
Pass --stop-bot to shut the agent down when the batch finishes.
Suites in CI
The exit code makes suites CI-ready with no extra glue.
Next steps
Using the Library
Orchestrate suites programmatically with EvalManifest and EvalSuite.
Agent Self-Improvement
Let an AI coding assistant run your suite and iterate until it’s green.