Creating Tests
A Test is a single scenario the AI tester runs against a channel. You describe, in plain language, what the tester should do and what counts as a pass — for example “book an appointment for Saturday” and “the agent must never give medical advice” — and Testzilla drives a real conversation against your agent and grades the result.
Tests sit at the third layer of the hierarchy: Project -> Channel -> Test -> Run -> Result. Every test belongs to a channel, and the channel’s connection type decides how the test reaches your agent (a phone call, a chat widget, a WebRTC voice session, or the LLM directly).
Create a test
-
Open a channel
In a project, open the channel you want to test against and click + Test (or New Test). The test will run against that channel’s connection type. (Or describe the scenario to Tessie and let it write the test for you.)
-
What should the tester do?
In plain language, describe what the tester should do — this is the scenario that drives the tester’s side of the conversation, turn by turn. Be specific about the goal and any constraints, for example:
“Caller asks to book a haircut for Saturday afternoon. The agent must confirm the day and time, and must not ask for a card number.”
In a hurry? Click Use a template to drop in a starter script you can edit. (The field’s technical name is Caller Script.)
-
What counts as passing?
State what makes the run a pass. The grader checks the finished transcript against these criteria and returns a verdict, a score, and an analysis. Keep criteria concrete and checkable (“confirms a date and time”, “never asks for payment details”) rather than vague. (Technical name: Pass Criteria.)
-
Advanced options (optional)
The common path stops at the two questions above. Expand Advanced options to tune the rest:
- Who speaks first? — whether the tester or the agent opens the conversation (Conversation Starter).
- Max call length — a cap so a test cannot run forever (Max Duration (seconds)).
- Prompt tokens: use {{date}}, {{phone_number}}, {{channel_prompt}}, {{project_prompt}}, and the others to inject live values or pull in shared instructions from the channel and project instead of duplicating them.
-
Save
The test appears under its channel, ready to run.
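To make the fields and prompt tokens above concrete, here is a minimal Python sketch of a test draft and of token substitution. It is an illustration only, not Testzilla's implementation: the `ScenarioDraft` class, the `render_tokens` helper, the default values, and the choice to leave unknown tokens untouched are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Matches tokens written as {{name}}, e.g. {{date}} or {{channel_prompt}}.
TOKEN_PATTERN = re.compile(r"\{\{(\w+)\}\}")


@dataclass
class ScenarioDraft:
    """Hypothetical container mirroring the fields described above."""
    caller_script: str                   # "What should the tester do?"
    pass_criteria: str                   # "What counts as passing?"
    conversation_starter: str = "tester" # assumed default, not documented
    max_duration_seconds: int = 300      # assumed default, not documented


def render_tokens(text: str, values: dict[str, str]) -> str:
    """Replace each {{name}} with values[name]; leave unknown tokens as-is."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return values.get(name, match.group(0))
    return TOKEN_PATTERN.sub(substitute, text)


draft = ScenarioDraft(
    caller_script="Book a haircut for {{date}}. {{channel_prompt}}",
    pass_criteria="Confirms a date and time; never asks for payment details.",
)
rendered = render_tokens(
    draft.caller_script,
    {"date": "2025-01-04", "channel_prompt": "Be polite."},
)
print(rendered)
```

The point of the sketch is the design choice the tokens enable: shared instructions live once on the channel or project and are pulled into each test at run time, so editing them in one place updates every test that references them.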
Run a test and read the result
Pick the test and click Run. Choose how many iterations to run, and optionally schedule it for later; track in-progress runs on the Queue page.
When a run finishes, open the result for:
- a verdict banner — PASSED, FAILED, or UNCERTAIN — at the top of the page,
- a transcript of the conversation,
- a score,
- a two-column analysis of why it passed or failed, and
- the data collected during the run.
When a run fails, the result also shows a Suggested fix callout directly under the verdict banner — the single most actionable next step, pulled out of the analysis so you don’t have to dig for it.
Every section has a Copy button. Web Chat runs add an enriched per-turn transcript and a screenshot gallery to the result.
Verdicts: PASSED, FAILED, UNCERTAIN
The grader returns one of three verdicts, shown as a badge:
- PASSED — the transcript met your pass criteria.
- FAILED — it did not.
- UNCERTAIN — the grader could not decide confidently either way.
For scoring, PASSED counts as 100 and FAILED as 0; UNCERTAIN is treated as not passing.
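As a worked example of the scoring rule, the sketch below averages verdicts across several iterations of one test. It assumes that UNCERTAIN contributes 0 points, the same as FAILED; the text only says UNCERTAIN is "treated as not passing", so that mapping is a guess. The `Verdict` enum and both functions are hypothetical names for illustration.

```python
from enum import Enum


class Verdict(Enum):
    PASSED = "PASSED"
    FAILED = "FAILED"
    UNCERTAIN = "UNCERTAIN"


def verdict_points(verdict: Verdict) -> int:
    """PASSED counts as 100; FAILED as 0; UNCERTAIN assumed to count as 0."""
    return 100 if verdict is Verdict.PASSED else 0


def average_score(verdicts: list[Verdict]) -> float:
    """Mean points over all iterations of a test."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    return sum(verdict_points(v) for v in verdicts) / len(verdicts)


iterations = [Verdict.PASSED, Verdict.PASSED, Verdict.FAILED, Verdict.UNCERTAIN]
print(average_score(iterations))  # 200 points over 4 runs
```

Under this assumption, an UNCERTAIN run lowers the average exactly as a failure would, so a test that often returns UNCERTAIN is worth tightening its pass criteria for.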
Reuse tests across channels
To run the same set of tests against several channels at once, and benchmark how each connection type holds up, put the tests in a Folder channel and use the suite runner. Each run is attributed to the channel it actually ran against, so the comparison stays meaningful.
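Because each run is attributed to the channel it ran against, a per-channel comparison reduces to grouping results by channel. The sketch below shows that grouping on hand-written sample data; the `(channel, verdict)` tuple shape, the channel names, and the `pass_rate_by_channel` helper are assumptions for illustration, not the suite runner's actual output format.

```python
from collections import defaultdict


def pass_rate_by_channel(results: list[tuple[str, str]]) -> dict[str, float]:
    """Percentage of PASSED runs per channel; other verdicts count as not passing."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for channel, verdict in results:
        totals[channel] += 1
        if verdict == "PASSED":
            passes[channel] += 1
    return {channel: 100 * passes[channel] / totals[channel] for channel in totals}


# Illustrative sample data only, not real results.
sample_results = [
    ("Phone", "PASSED"),
    ("Phone", "FAILED"),
    ("Web Chat", "PASSED"),
    ("Web Chat", "PASSED"),
    ("Web Chat", "PASSED"),
    ("Web Chat", "UNCERTAIN"),
]
rates = pass_rate_by_channel(sample_results)
print(rates)
```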
Next steps
- Generate a Report to analyse a channel’s or project’s results and get proposed fixes.
- Automate runs from your own tooling: Integrate via API for CI, or Integrate via MCP to drive Testzilla from an AI agent.