Failed-execution triage
A failed execution is information, not a verdict. This page walks through how to find out why a step failed and what to change so the next run is green, without throwing away the cache, rewriting the test from scratch, or contacting support unless you have to.
The triage loop in one picture
┌────────────────────────────────────────────────────────┐
│ Open the failing step's popover                        │
└────────────────────────────────────────────────────────┘
  ↓
┌────────────────────────────────────────────────────────┐
│ Read Result + Explanation                              │
└────────────────────────────────────────────────────────┘
  ↓
┌────────────────────────────────────────────────────────┐
│ Was the AI confused, or was the page actually broken?  │
└────────────────────────────────────────────────────────┘
  ↓                             ↓
  AI confused                   Page broken
  ───────────                   ───────────
  • Warnings tab                • Verify in the app
  • Clear cache for the step    • File a bug
  • Rewrite the description     • Mark known error
  • Get step recommendation
  ↓
┌────────────────────────────────────────────────────────┐
│ Re-run: full, failed-only, or from-step (recording)    │
└────────────────────────────────────────────────────────┘
The rest of this page is just an expansion of that loop.
Step 1 — read the failure signal
Click the failing step card (in either the step timeline or the left panel) to open the step detail popover, then read the three signal fields:
| Field | What it tells you |
|---|---|
| Result | The AI's verdict on the step (Passed or Failed), together with a one-line conclusion such as "Expected button to be visible, but it was not". |
| Explanation for result | The AI's reasoning. This is where it tells you why it picked the verdict — which element it was looking for, what it observed, what it compared. |
| Suggestion for better description | An optional AI-generated rewrite of the step. Often the AI is hinting that the step description was ambiguous. |
The combination of Result and Explanation almost always points at one of three root causes:
- The step description was ambiguous — the AI picked the wrong element, asserted the wrong thing, or interpreted "click the cart" as "click the icon" instead of "click the cart button".
- The selector cache is stale — the AI re-used a cached XPath that no longer points at the intended element (typically after a UI change).
- The application is genuinely broken — the element really isn't there, the workflow really doesn't work.
The next sections cover what to do in each case.
Step 2 — check the Warnings tab
If the step popover has a Warnings tab, open it. The runner adds warnings only when it detects something the AI itself was uncertain about. Common warning types:
| Warning | What it means | What to do |
|---|---|---|
| Multiple elements found | The element description matched more than one element on the page; the runner picked the highest-rated one. | Make the description more specific (mention container, label, position). Use the thumbs-up/down feedback on the right element. |
| DOM unstable | The page was still re-rendering when the action ran. | Add a wait, or check whether auto-waiting is being defeated by an animation or polling. |
| Cache semantic mismatch | The cached XPath and the freshly-extracted XPath disagree about what the right element is. | Open the cache-info dialog and review or delete the stale entry. See Step 3. |
| Fallback model used | The primary AI model failed (rate limit, error), so the configured fallback ran instead. | Usually nothing; just be aware of it. If the fallback is used repeatedly, your primary model likely has capacity issues. |
| Element low quality | The picked element had a low quality score. | Rewrite the description, or open the cache dialog to see whether a better candidate is cached. |
The Warnings tab also includes a feedback widget: thumbs-up if the agent picked the right element, thumbs-down otherwise. This both helps the AI learn over time and gives you a chance to confirm whether the issue is the selection or the action.
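If you track warnings across several runs (for example in a personal triage script or notebook), the warning table above can be encoded as a lookup. This is an illustrative sketch only: the warning labels are copied from the table, but the names `WARNING_ACTIONS` and `triage_warnings`, and the idea of feeding in a list of label strings, are hypothetical. This page does not describe any export format for the Warnings tab.

```python
# Hypothetical lookup: warning labels as shown in the table on this page,
# mapped to the recommended next action for each.
WARNING_ACTIONS: dict[str, str] = {
    "Multiple elements found": "Make the description more specific (container, label, position).",
    "DOM unstable": "Add a wait, or check for animations or polling that defeat auto-waiting.",
    "Cache semantic mismatch": "Review or delete the stale entry in the cache-info dialog.",
    "Fallback model used": "Usually nothing; watch for repeated fallbacks.",
    "Element low quality": "Rewrite the description, or check the cache dialog for a better candidate.",
}


def triage_warnings(labels: list[str]) -> list[str]:
    """Return one action per distinct warning label, in first-seen order.

    Labels that are not in the table are reported instead of being dropped.
    """
    actions: list[str] = []
    seen: set[str] = set()
    for label in labels:
        if label in seen:
            continue  # the same warning on repeated runs needs only one action
        seen.add(label)
        action = WARNING_ACTIONS.get(label)
        actions.append(action if action else f"Unknown warning {label!r}: read it in the Warnings tab.")
    return actions
```

For example, `triage_warnings(["DOM unstable", "Multiple elements found", "DOM unstable"])` yields two actions, one per distinct warning.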
Step 3 — decide: clear cache, fix description, or fix selector?
This is the most common decision when triaging a failure. Use the table below.
| Symptom | Most likely cause | Recommended action |
|---|---|---|
| Same step that used to pass now fails after a UI change. | Stale cache entry. | Clear cache for the step. Then rerun. Most stale-cache issues self-correct after one cache clear. |
| Step has always been flaky — sometimes picks the right element, sometimes not. | Description is too generic. | Rewrite the description to mention more context (label, container section, position). Don't bother clearing the cache — the AI was guessing each time. |
| AI says "I couldn't find the element" but you can see it in the screenshot. | Element description doesn't match what's labelled in the DOM (e.g. you wrote "Save" but the visible label is "Save changes"). | Rewrite the description to match the actual visible text or aria-label. |
| Cache dialog shows multiple XPath candidates, and the wrong one has the highest score. | Two similar elements on the page. | Open the cache-info dialog, delete the wrong XPath candidate, then add disambiguating context to the description. |
| The element is in an iframe / shadow DOM / virtualised list. | Structural selector limitation: the element is hard to reach, or not yet rendered. | Fix the selector: either pin a more specific XPath via the step's advanced settings, or restructure the test to scroll the virtualised list before acting. |
| The AI's reasoning shows it understood the wrong intent ("I think the user wants to click the X"). | Step description is misleading. | Rewrite with action verbs that fit the supported actions. |
Rule of thumb: clear cache only when the step used to work and just stopped. If the step never reliably worked, the problem is in the description, not the cache.
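The rule of thumb, together with the selector row of the table, can be written as a small decision function. This sketches the reasoning only and is not a product feature: `StepHistory`, its fields, `Fix`, and `recommend_fix` are hypothetical names, and you fill in the fields yourself from the run history and the screenshot.

```python
from dataclasses import dataclass
from enum import Enum


class Fix(Enum):
    CLEAR_CACHE = "Clear cache for the step, then rerun"
    REWRITE_DESCRIPTION = "Rewrite the step description with more context"
    FIX_SELECTOR = "Fix the selector: scroll first, or pin a more specific XPath"


@dataclass
class StepHistory:
    # Hypothetical field: True if the step passed reliably before it started failing.
    used_to_pass_reliably: bool
    # Hypothetical field: True if the element is in an iframe, shadow DOM, or virtualised list.
    hard_to_reach_element: bool = False


def recommend_fix(step: StepHistory) -> Fix:
    # Follows the table's selector row for structurally hard-to-reach elements.
    if step.hard_to_reach_element:
        return Fix.FIX_SELECTOR
    # Used to work and just stopped: the cached selector is the prime suspect.
    if step.used_to_pass_reliably:
        return Fix.CLEAR_CACHE
    # Never reliably worked: the AI was guessing, so clearing the cache will not help.
    return Fix.REWRITE_DESCRIPTION
```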
Step 4 — use Get step recommendation when you're stuck
If you cannot tell from the Result + Explanation what the right phrasing should be, the step popover has a Get step recommendation option (wrench icon) which:
- Opens an interactive mini-run paused on the failing step.
- Asks you to click on the element you actually meant.
- Returns up to four alternative step descriptions worded against that element.
You then paste the wording that best fits your audience back into the step. Recommendation mode is suggestive only — it does not change anything by itself.
See Recording → Recommendation mode for the full flow.
Step 5 — check the Log for context
The step popover shows the AI's conclusion. The Log shows the path the AI took to that conclusion: which element candidates it considered, which cache entry it consulted, which model returned what, how long each sub-action took.
Open the Log, filter to the failing step, and read the few lines just before the error. The most useful patterns:
- "Warning: picked candidate with quality 0.4x" → the AI was unsure. Look at the description.
- "Cache hit (semantic match)" followed by a failure → the cache entry was wrong. Clear it.
- "Multiple elements found" → add more context to the description.
- "Element not actionable" → the element was found but couldn't be clicked (covered by an overlay, disabled, or off-screen). Look at the screenshot.
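When a failing step has a long log, the four patterns above can be scanned for mechanically. The sketch below assumes you have copied the step's log lines into a list of strings. The exact log wording, the 0.5 quality cut-off, and the use of the words "failed" or "error" to spot the failing line are all assumptions made for illustration; adjust them to what your Log actually prints. `log_hints` is a hypothetical name.

```python
import re

# Assumed cut-off: the page only says a quality around 0.4 means "unsure".
LOW_QUALITY_THRESHOLD = 0.5
QUALITY_RE = re.compile(r"picked candidate with quality\s+([0-9]*\.?[0-9]+)")
# Assumed marker for the failing line that follows a cache hit.
FAILURE_RE = re.compile(r"\b(failed|error)\b", re.IGNORECASE)


def log_hints(lines: list[str]) -> list[str]:
    """Map one step's log lines to triage hints, in first-seen order."""
    hints: list[str] = []
    cache_hit_seen = False

    def add(hint: str) -> None:
        if hint not in hints:
            hints.append(hint)

    for line in lines:
        match = QUALITY_RE.search(line)
        if match and float(match.group(1)) < LOW_QUALITY_THRESHOLD:
            add("AI was unsure: look at the description.")
        if "Cache hit (semantic match)" in line:
            cache_hit_seen = True
        elif cache_hit_seen and FAILURE_RE.search(line):
            # A failure after a semantic cache hit points at a wrong cache entry.
            add("Cache entry was wrong: clear it.")
        if "Multiple elements found" in line:
            add("Add more context to the description.")
        if "Element not actionable" in line:
            add("Element found but not clickable: check the screenshot.")
    return hints
```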
Step 6 — when the application is actually broken
If after Steps 1-5 you conclude the failure is a real defect in the application (not in the test):
- Take a verbose snapshot for the bug report. Open the Log and click Collect Support Information with at least AI communication and HTML content selected; this starts a verbose rerun. Once that rerun finishes, click Download Support Information.
- Mark the step as a known error if applicable — the step popover offers a Fix known error button when the failure matches a registered known-error pattern. Use this to keep the test's overall status meaningful while the application bug is being fixed.
Step 7 — re-run
Once you've made a change (cleared cache, rewrote the description, fixed a selector) re-run:
- For a single test: Rerun (or Ctrl+p).
- For a test set with mostly green tests: Rerun failed is the cheap option.
- To re-execute from the step you just changed without going through everything before it: open the step popover and use Start recording from here — this gives you an interactive session that fast-forwards to that step.
See Re-running an execution for a quick comparison of the four rerun options.
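The choice between the rerun options listed above, plus the "Don't rerun the whole test set when only one test failed" rule further down, can be summarised as a small function. It covers only the three options named on this page. The "mostly green" cut-off (fewer than half the tests failed) and the names `Rerun` and `choose_rerun` are assumptions for illustration.

```python
from enum import Enum


class Rerun(Enum):
    RERUN = "Rerun"
    RERUN_FAILED = "Rerun failed"
    RECORD_FROM_STEP = "Start recording from here"


def choose_rerun(total_tests: int, failed_tests: int, resume_at_changed_step: bool = False) -> Rerun:
    """Pick a rerun option for a run with at least one failed test (hypothetical helper)."""
    if not 1 <= failed_tests <= total_tests:
        raise ValueError("expected 1 <= failed_tests <= total_tests")
    # Fast-forward to the step you just edited instead of replaying everything before it.
    if resume_at_changed_step:
        return Rerun.RECORD_FROM_STEP
    # A single test: a plain rerun is already the cheapest option.
    if total_tests == 1:
        return Rerun.RERUN
    # One failure, or a mostly green set (assumed: fewer than half failed): keep passed results.
    if failed_tests == 1 or failed_tests * 2 < total_tests:
        return Rerun.RERUN_FAILED
    return Rerun.RERUN
```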
What not to do
- Don't tighten an XPath by hand as a first response. XPaths in advanced step settings reduce the AI's flexibility. Use them only after the description, the cache, and the recommendation flow have all been exhausted.
- Don't clear the cache on a step that is failing for a non-cache reason. It wastes the next run's time re-doing AI element identification for no benefit.
- Don't rerun the whole test set when only one test failed. Use Rerun failed — it preserves the already-passed results.
- Don't change the test definition while a verbose rerun is in flight. The collected bundle would record an inconsistent definition. Wait until the verbose rerun is finished, then edit.
See also
- Reading the Log — the log viewer + verbose-rerun flow.
- Recording → Recommendation mode — get alternative step descriptions for an element.
- Caching — how the cache works and when entries are reused.
- Prompting — how to write clear, unambiguous step descriptions.
- Test-Case Overview — finding the test definition to edit.