Catch regressions before production

Run the same suite on every deploy and track quality over time. A prompt change that breaks booking shows up in a test, not a support ticket.

A high-frequency safety net

LLM Chat is the fastest and cheapest connection type, billed per round-trip. Use it as a high-frequency sanity check: run LLM Chat tests on every deploy to catch regressions in prompt logic, tool calls, and response content before spending credits on phone or browser tests. Then reserve the higher-fidelity phone and browser tests for the surfaces that matter most.
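The ordering above (cheap checks first, expensive ones only when those pass) can be sketched as a two-stage gate. This is a minimal illustration, not Testzilla's API: `cheap` and `expensive` are hypothetical callables standing in for however you trigger each suite, and the `(scenario name, passed)` result shape is an assumption.

```python
from typing import Callable, List, Tuple

# Each result is (scenario name, passed?). This shape is an assumption made
# for illustration, not Testzilla's actual result format.
Result = Tuple[str, bool]
SuiteRunner = Callable[[], List[Result]]


def failed(results: List[Result]) -> List[str]:
    """Return the names of scenarios that did not pass."""
    return [name for name, passed in results if not passed]


def run_tiered(cheap: SuiteRunner, expensive: SuiteRunner) -> Tuple[str, List[str]]:
    """Run the cheap suite first; run the expensive one only if it is clean.

    Returns the stage that decided the outcome and any failing scenarios.
    """
    chat_failures = failed(cheap())
    if chat_failures:
        # Stop early: no phone or browser credits are spent on a broken build.
        return "chat", chat_failures
    return "high_fidelity", failed(expensive())
```

With this shape, a deploy that breaks a chat scenario never reaches the phone or browser stage, which is where the cost saving comes from.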

Track quality over time

Run the same suite on every deploy. Keep one set of scenarios and re-run them so quality is measured the same way each time.
Compare across channels. Use a Folder and the Suite Runner to run the identical suite across chat, phone, widget, and voice and see which surface a change broke.
Read the Report. After a run, the channel Report summarizes results and, when you provide the agent’s system prompt, suggests concrete prompt improvements drawn from the real transcripts (always as suggestions, never auto-applied).
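If you export results from each run, comparing two runs of the same suite is a small diff. The sketch below assumes a hypothetical export shape (channel, then scenario, then a pass/fail flag); it is not Testzilla's report format, only an illustration of what "measured the same way each time" buys you.

```python
from typing import Dict, List, Tuple

# channel -> scenario name -> passed? (hypothetical export shape)
RunResults = Dict[str, Dict[str, bool]]


def pass_rate(scenarios: Dict[str, bool]) -> float:
    """Fraction of scenarios that passed; 0.0 for an empty channel."""
    return sum(scenarios.values()) / len(scenarios) if scenarios else 0.0


def regressions(previous: RunResults, current: RunResults) -> List[Tuple[str, str]]:
    """Return (channel, scenario) pairs that passed before and fail now."""
    broken = []
    for channel, scenarios in current.items():
        before = previous.get(channel, {})
        for name, passed in scenarios.items():
            # Only count scenarios with a passing baseline, so new or
            # already-failing scenarios are not reported as regressions.
            if before.get(name) is True and not passed:
                broken.append((channel, name))
    return sorted(broken)
```

Because the scenario names are identical across channels, the output points directly at the surface a change broke, for example `("phone", "booking")`.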

Wire it into CI

Once you are testing in the app, automate it: drive Testzilla via the API in CI, or via MCP from an AI agent like Claude Code or Cursor, so the suite runs without anyone clicking a button.
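A CI step for this usually does two things: start a run, then fail the job if anything failed. The sketch below shows that shape only. The base URL, the `/suites/{id}/runs` path, the request body, the `failed` field, and the environment variable names are all placeholders, not the documented Testzilla API, and it assumes the call returns a finished summary (a real API may require polling). Check the API reference for the actual endpoints.

```python
import json
import os
import urllib.request

# Everything about the endpoint here is a placeholder: the path, the payload,
# and the response fields are assumptions, not the documented Testzilla API.


def build_run_request(base_url: str, suite_id: str, token: str) -> urllib.request.Request:
    """Build a POST request that would start a suite run."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/suites/{suite_id}/runs",
        data=json.dumps({"trigger": "ci"}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def exit_code(summary: dict) -> int:
    """Map a run summary to a process exit code.

    Fails closed: anything other than an explicit zero failure count
    (including a missing field) returns 1, which fails the CI job.
    """
    return 0 if summary.get("failed") == 0 else 1


def main() -> int:
    """Start a run and return the exit code for the CI job."""
    request = build_run_request(
        os.environ["TESTZILLA_BASE_URL"],  # hypothetical variable names
        os.environ["TESTZILLA_SUITE_ID"],
        os.environ["TESTZILLA_TOKEN"],
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        summary = json.load(response)
    return exit_code(summary)
```

In a pipeline you would call `sys.exit(main())` from the deploy job so that a failing scenario blocks the release.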

Related

Reports: Channel reports and agent prompt improvements after a run.

Integrate via API: Automate projects, channels, tests, and runs over REST in CI.