Catch regressions before production

Run the same suite on every deploy and track quality over time. A prompt change that breaks booking shows up in a test, not a support ticket.

LLM Chat is the fastest and cheapest connection type, at 1 credit per round-trip. Use it as a high-frequency sanity check: run LLM Chat tests on every deploy to catch regressions in prompt logic, tool calls, and response content before spending credits on phone or browser tests. Reserve the higher-fidelity phone and browser tests for the surfaces that matter most.
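As a rough budgeting aid, the per-deploy cost of an LLM Chat suite follows directly from the 1-credit-per-round-trip figure. The sketch below is illustrative arithmetic only: the function name and the example suite size (20 scenarios of 6 round-trips each) are assumptions, not Testzilla values.

```python
def llm_chat_credits(scenarios: int, round_trips_per_scenario: int) -> int:
    """Estimate credits for one LLM Chat suite run at 1 credit per round-trip."""
    credits_per_round_trip = 1  # figure stated in the text for LLM Chat
    return scenarios * round_trips_per_scenario * credits_per_round_trip


# Hypothetical suite: 20 scenarios, 6 round-trips each.
per_deploy = llm_chat_credits(20, 6)
print(per_deploy)  # 120
```

Multiplying the result by the number of deploys per week gives a ballpark figure for the always-on sanity check.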

  • Run the same suite every deploy. Keep one set of scenarios and re-run them so quality is measured the same way each time.
  • Compare across channels. Use a Folder and the Suite Runner to run the identical suite across chat, phone, widget, and voice and see which surface a change broke.
  • Read the Report. After a run, the channel Report summarizes results. If you provide the agent’s system prompt, it also proposes concrete prompt improvements drawn from the real transcripts; these are suggestions only and are never auto-applied.
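The idea of re-running one suite and comparing across channels can be sketched as a diff between a baseline run and the current run. This is a minimal illustration, not Testzilla's data model: the `Results` shape (channel, then scenario name, then pass/fail), the function name, and the scenario names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical result shape: channel -> scenario name -> passed?
Results = Dict[str, Dict[str, bool]]


def find_regressions(baseline: Results, current: Results) -> Dict[str, List[str]]:
    """Return, per channel, the scenarios that passed in baseline but fail now."""
    regressions: Dict[str, List[str]] = {}
    for channel, scenarios in current.items():
        before = baseline.get(channel, {})
        broken = sorted(
            name
            for name, passed in scenarios.items()
            if not passed and before.get(name, False)
        )
        if broken:
            regressions[channel] = broken
    return regressions


# Made-up example data: the same two scenarios run on two channels.
baseline = {
    "chat": {"book_appointment": True, "cancel_booking": True},
    "phone": {"book_appointment": True, "cancel_booking": True},
}
current = {
    "chat": {"book_appointment": True, "cancel_booking": True},
    "phone": {"book_appointment": False, "cancel_booking": True},
}
print(find_regressions(baseline, current))  # {'phone': ['book_appointment']}
```

Because the scenarios are identical on every channel, a failure that appears on only one channel points at that surface rather than at the prompt itself.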

Once you are testing in the app, automate it: drive Testzilla via the API in CI, or via MCP from an AI agent like Claude Code or Cursor, so the suite runs without anyone clicking a button.
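For the CI case, the piece that usually needs writing is the gate that turns suite results into a pass or fail for the pipeline. The sketch below assumes the results have already been fetched and reduced to a scenario-to-pass/fail mapping; it deliberately makes no Testzilla API calls, and the `gate` function, its threshold parameter, and the scenario names are hypothetical.

```python
from typing import Dict


def gate(results: Dict[str, bool], min_pass_rate: float = 1.0) -> int:
    """Return a process exit code: 0 if the pass rate meets the threshold, else 1."""
    if not results:
        return 1  # treat an empty run as a failure rather than a silent pass
    pass_rate = sum(results.values()) / len(results)
    return 0 if pass_rate >= min_pass_rate else 1


# Made-up results: one of two scenarios failed, so the default gate fails the build.
exit_code = gate({"book_appointment": True, "cancel_booking": False})
print(exit_code)  # 1
```

In a CI job, passing the returned value to `sys.exit` would make a failed suite stop the deploy.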