Catch regressions before production
Run the same suite on every deploy and track quality over time. A prompt change that breaks booking shows up in a test, not a support ticket.
A high-frequency safety net
LLM Chat is the fastest and cheapest connection type, at 1 credit per round-trip. Use it as a high-frequency sanity check: run LLM Chat tests on every deploy to catch regressions in prompt logic, tool calls, and response content before spending credits on phone or browser tests. Then reserve the higher-fidelity phone and browser tests for the surfaces that matter most.
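The staging described above can be sketched as a small gate: run the cheap chat stage first and move on to the costlier stages only if it passes. This is a minimal sketch, not Testzilla's API. `Stage`, `run_staged`, and `chat_credits` are hypothetical names, each stage's `run` callable stands in for however you trigger a suite, and the only cost figure used is the 1 credit per LLM Chat round-trip stated in the text.

```python
from dataclasses import dataclass
from typing import Callable, List

# From the text: LLM Chat costs 1 credit per round-trip.
# Costs for other channels are not stated, so they are not modeled here.
CHAT_CREDITS_PER_ROUND_TRIP = 1


def chat_credits(round_trips_per_scenario: List[int]) -> int:
    """Estimate credits for an LLM Chat run: one credit per round-trip."""
    return sum(round_trips_per_scenario) * CHAT_CREDITS_PER_ROUND_TRIP


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # hypothetical: returns True when the stage's suite passes


def run_staged(stages: List[Stage]) -> List[str]:
    """Run stages in order, stopping at the first failure.

    Returns the names of the stages that were actually executed.
    """
    executed = []
    for stage in stages:
        executed.append(stage.name)
        if not stage.run():
            break  # skip the more expensive stages that follow
    return executed
```

Under these assumptions, a suite of three scenarios with 4, 6, and 5 round-trips would cost 15 credits per chat run, and a failing chat stage would prevent the phone and browser stages from spending anything.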
Track quality over time
- Run the same suite every deploy. Keep one set of scenarios and re-run them so quality is measured the same way each time.
- Compare across channels. Use a Folder and the Suite Runner to run the identical suite across chat, phone, widget, and voice and see which surface a change broke.
- Read the Report. After a run, the channel Report summarizes results, and — when you provide the agent’s system prompt — suggests concrete prompt improvements drawn from the real transcripts (always as suggestions, never auto-applied).
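Because the suite stays the same between runs, results from two deploys can be compared channel by channel. The sketch below illustrates that comparison only. The result shape (a mapping from channel name to pass rate) and the function name `find_regressions` are assumptions for illustration, not a format Testzilla is known to export.

```python
from typing import Dict, List

# Hypothetical result shape: channel name -> fraction of scenarios passed (0.0 to 1.0).
PassRates = Dict[str, float]


def find_regressions(
    baseline: PassRates, current: PassRates, tolerance: float = 0.0
) -> List[str]:
    """Return channels whose pass rate dropped by more than `tolerance`.

    Channels missing from either run are ignored, since there is nothing to compare.
    """
    regressed = []
    for channel in sorted(baseline.keys() & current.keys()):
        if baseline[channel] - current[channel] > tolerance:
            regressed.append(channel)
    return regressed
```

A nonzero `tolerance` may be useful if individual scenarios are flaky, at the price of hiding small real drops.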
Wire it into CI
Once you are testing in the app, automate it: drive Testzilla via the API in CI, or via MCP from an AI agent like Claude Code or Cursor, so the suite runs without anyone clicking a button.
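A CI job that drives a test run typically starts the run, polls until it finishes, and turns the outcome into an exit code. The sketch below shows only that polling-and-gating pattern. It makes no real API calls: `get_status` is a hypothetical callable you would implement against the actual API, and the status strings `"passed"`, `"failed"`, and `"running"` are assumed for illustration.

```python
import time
from typing import Callable


def wait_for_run(
    get_status: Callable[[], str],  # hypothetical: wraps your call to the real API
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the run reaches a terminal status or the timeout elapses.

    Returns "passed", "failed", or "timeout". The status values are assumed.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in ("passed", "failed"):
            return status
        if clock() >= deadline:
            return "timeout"
        sleep(poll_s)


def exit_code(status: str) -> int:
    """Map a run status to a process exit code: 0 only when the run passed."""
    return 0 if status == "passed" else 1
```

In a CI script, calling `sys.exit(exit_code(wait_for_run(get_status)))` would fail the job on a failed or timed-out run. Treating a timeout as a failure is a design choice, made here so that a stuck run cannot silently pass the gate.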
Related
- Reports: Channel reports and agent prompt improvements after a run.
- Integrate via API: Automate projects, channels, tests, and runs over REST in CI.