Failed-execution triage

A failed execution is information, not a verdict. This page walks through how to find out why a step failed and what to change so the next run is green — without throwing away cache, rewriting the test from scratch, or asking support if you don't have to.

The triage loop in one picture

┌──────────────────────────────────────────────────────────┐
│ Open the failing step's popover                          │
└──────────────────────────────────────────────────────────┘
                             ↓
┌──────────────────────────────────────────────────────────┐
│ Read Result + Explanation                                │
└──────────────────────────────────────────────────────────┘
                             ↓
              ┌──────────────────────────────┐
              │ Was the AI confused, or was  │
              │ the page actually broken?    │
              └──────────────────────────────┘
                   ↓                    ↓
  AI confused                     Page broken
  ──────────────                  ──────────
  • Warnings tab                  • Verify in the app
  • Clear cache for the step      • File a bug
  • Rewrite the description       • Mark known error
  • Get step recommendation
                             ↓
┌──────────────────────────────────────────────────────────┐
│ Re-run: full, failed-only, or from-step (recording)      │
└──────────────────────────────────────────────────────────┘

The rest of this page is just an expansion of that loop.

Step 1 — read the failure signal

Click the failing step card (in either the step timeline or the left panel) to open the step detail popover, then read the three signal fields:

| Field | What it tells you |
| --- | --- |
| Result | The AI's verdict on the step: Passed or Failed, with a one-line conclusion ("Expected button to be visible, but it was not"). |
| Explanation for result | The AI's reasoning. This is where it tells you why it picked the verdict: which element it was looking for, what it observed, what it compared. |
| Suggestion for better description | An optional AI-generated rewrite of the step. Often the AI is hinting that the step description was ambiguous. |

The combination of Result and Explanation almost always points at one of three root causes:

  1. The step description was ambiguous — the AI picked the wrong element, asserted the wrong thing, or interpreted "click the cart" as "click the icon" instead of "click the cart button".
  2. The selector cache is stale — the AI re-used a cached XPath that no longer points at the intended element (typically after a UI change).
  3. The application is genuinely broken — the element really isn't there, the workflow really doesn't work.

The next sections cover what to do in each case.

Step 2 — check the Warnings tab

If the step popover has a Warnings tab, open it. The runner adds warnings only when it detects something the AI itself was uncertain about. Common warning types:

| Warning | What it means | What to do |
| --- | --- | --- |
| Multiple elements found | The element description matched more than one element on the page; the runner picked the highest-rated one. | Make the description more specific (mention container, label, position). Use the thumbs-up/down feedback to confirm whether the right element was picked. |
| DOM unstable | The page was still re-rendering when the action ran. | Add a wait, or check whether auto-waiting is being defeated by an animation or polling. |
| Cache semantic mismatch | The cached XPath and the freshly extracted XPath disagree about what the right element is. | Open the cache-info dialog and review or delete the stale entry. See Step 3. |
| Fallback model used | The primary AI model failed (rate limit, error); the configured fallback ran instead. | Usually nothing; just be aware. Repeat failures mean your primary model has capacity issues. |
| Element low quality | The picked element had a low quality score. | Rewrite the description, or open the cache dialog to see whether a better candidate is cached. |

The Warnings tab also includes a feedback widget: thumbs-up if the agent picked the right element, thumbs-down otherwise. This both helps the AI learn over time and gives you a chance to confirm whether the issue is the selection or the action.
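
As a memory aid, the warning table can be written as a simple lookup. This is only a sketch in Python: `WARNING_ACTIONS` and `next_action` are hypothetical names, not part of the product, and the warning strings are copied from the table on the assumption that they appear verbatim in the Warnings tab.

```python
# Hypothetical helper: maps a warning name (as listed in the table above)
# to the suggested first action. This is not a product API.
WARNING_ACTIONS = {
    "Multiple elements found": "Make the description more specific (container, label, position).",
    "DOM unstable": "Add a wait, or check whether an animation or polling defeats auto-waiting.",
    "Cache semantic mismatch": "Open the cache-info dialog and review or delete the stale entry.",
    "Fallback model used": "Usually nothing; repeated occurrences suggest primary-model capacity issues.",
    "Element low quality": "Rewrite the description, or check the cache dialog for a better candidate.",
}


def next_action(warning: str) -> str:
    """Return the suggested action for a warning, ignoring case and outer whitespace."""
    normalized = warning.strip().lower()
    for name, action in WARNING_ACTIONS.items():
        if name.lower() == normalized:
            return action
    # Anything not in the table falls back to the Step 1 signal fields.
    return "Unrecognised warning: go back to Result and Explanation."
```

A team could keep a table like this next to its triage notes so that everyone responds to the same warning in the same way.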

Step 3 — decide: clear cache, fix description, or fix selector?

This is the most common decision when triaging a failure. Use the table below.

| Symptom | Most likely cause | Recommended action |
| --- | --- | --- |
| Same step that used to pass now fails after a UI change. | Stale cache entry. | Clear cache for the step. Then rerun. Most stale-cache issues self-correct after one cache clear. |
| Step has always been flaky: sometimes picks the right element, sometimes not. | Description is too generic. | Rewrite the description to mention more context (label, container section, position). Don't bother clearing the cache; the AI was guessing each time. |
| AI says "I couldn't find the element" but you can see it in the screenshot. | Element description doesn't match what's labelled in the DOM (e.g. you wrote "Save" but the visible label is "Save changes"). | Rewrite the description to match the actual visible text or aria-label. |
| Cache dialog shows multiple XPath candidates, and the wrong one has the highest score. | Two similar elements on the page. | Open the cache-info dialog, delete the wrong XPath candidate, then add disambiguating context to the description. |
| The element is in an iframe / shadow DOM / virtualised list. | Selector fundamentals. | Fix the selector: either pin a more specific XPath via the step's advanced settings, or restructure the test to scroll the virtualised list before acting. |
| The AI's reasoning shows it understood the wrong intent ("I think the user wants to click the X"). | Step description is misleading. | Rewrite with action verbs that fit the supported actions. |

Rule of thumb: clear cache only when the step used to work and just stopped. If the step never reliably worked, the problem is in the description, not the cache.
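
The rule of thumb can be sketched as a small decision function. This is a minimal illustration in Python, not product behaviour: the `Action` enum, the `first_action` function, and both parameters are hypothetical, and checking the wrong-candidate case first is an assumption made for the example.

```python
from enum import Enum


class Action(Enum):
    CLEAR_CACHE = "Clear cache for the step, then rerun"
    REWRITE_DESCRIPTION = "Rewrite the step description"
    DELETE_CANDIDATE = "Delete the wrong XPath candidate, then add disambiguating context"


def first_action(passed_reliably_before: bool,
                 wrong_candidate_top_scored: bool = False) -> Action:
    """Pick the first thing to try for a failing step.

    passed_reliably_before: the step used to work and has just stopped.
    wrong_candidate_top_scored: the cache dialog ranks the wrong XPath highest.
    """
    if wrong_candidate_top_scored:
        # Two similar elements: remove the bad candidate and disambiguate.
        return Action.DELETE_CANDIDATE
    if passed_reliably_before:
        # Used to work, just stopped: a stale cache entry is the likely cause.
        return Action.CLEAR_CACHE
    # Never reliably worked: the description is the problem, not the cache.
    return Action.REWRITE_DESCRIPTION
```

For example, `first_action(passed_reliably_before=False)` returns `Action.REWRITE_DESCRIPTION`, which mirrors the advice not to clear the cache for a step that has always been flaky.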

Step 4 — use Get step recommendation when you're stuck

If you cannot tell from the Result + Explanation what the right phrasing should be, the step popover has a Get step recommendation option (wrench icon) which:

  • Opens an interactive mini-run paused on the failing step.
  • Asks you to click on the element you actually meant.
  • Returns up to four alternative step descriptions worded against that element.

You then paste the wording that best fits your audience back into the step. Recommendation mode is suggestive only — it does not change anything by itself.

See Recording → Recommendation mode for the full flow.

Step 5 — check the Log for context

The step popover shows the AI's conclusion. The Log shows the path the AI took to that conclusion: which element candidates it considered, which cache entry it consulted, which model returned what, how long each sub-action took.

Open the Log, filter to the failing step, and read the few lines just before the error. The most useful patterns:

  • Warning: picked candidate with quality 0.4x → the AI was unsure. Look at the description.
  • Cache hit (semantic match) followed by failure → the cache entry was wrong. Clear it.
  • Multiple elements found → add more context to the description.
  • Element not actionable → the element was found but couldn't be clicked (covered by overlay, disabled, off-screen). Look at the screenshot.
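
If you export or copy the Log as text, the patterns above can be scanned for mechanically. The sketch below is in Python and rests on assumptions: the exact log wording, the idea that the failure line contains the word "error", and the names `HINTS` and `triage_hints` are all illustrative, so adjust the patterns to what your Log actually prints.

```python
import re

# Assumed log phrases (taken from the list above) paired with the suggested reading.
HINTS = [
    (re.compile(r"picked candidate with quality", re.I),
     "AI was unsure: look at the description"),
    (re.compile(r"cache hit \(semantic match\)", re.I),
     "Cache entry was probably wrong: clear it"),
    (re.compile(r"multiple elements found", re.I),
     "Add more context to the description"),
    (re.compile(r"element not actionable", re.I),
     "Element found but not clickable: look at the screenshot"),
]


def triage_hints(log_lines: list[str], context: int = 5) -> list[str]:
    """Return hints for the `context` lines just before the first error line.

    If no line contains "error", the last `context` lines are scanned instead.
    """
    error_at = next(
        (i for i, line in enumerate(log_lines) if "error" in line.lower()),
        len(log_lines),
    )
    window = log_lines[max(0, error_at - context):error_at]
    hints = []
    for line in window:
        for pattern, hint in HINTS:
            if pattern.search(line):
                hints.append(hint)
    return hints
```

Only the lines immediately before the failure are examined, which matches the advice to read the few lines just before the error rather than the whole Log.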

Step 6 — when the application is actually broken

If after Steps 1-5 you conclude the failure is a real defect in the application (not in the test):

  • Take a verbose snapshot for the bug report. Open the Log, click Collect Support Information with at least AI communication and HTML content selected, and once the verbose rerun finishes, click Download Support Information.
  • Mark the step as a known error if applicable — the step popover offers a Fix known error button when the failure matches a registered known-error pattern. Use this to keep the test's overall status meaningful while the application bug is being fixed.

Step 7 — re-run

Once you've made a change (cleared the cache, rewritten the description, or fixed a selector), re-run:

  • For a single test: Rerun (or Ctrl+p).
  • For a test set with mostly green tests: Rerun failed is the cheap option.
  • To re-execute from the step you just changed without going through everything before it: open the step popover and use Start recording from here — this gives you an interactive session that fast-forwards to that step.

See Re-running an execution for a quick comparison of the four rerun options.

What not to do

  • Don't tighten an XPath by hand as a first response. XPaths in advanced step settings reduce the AI's flexibility. Use them only after the description, the cache, and the recommendation flow have all been exhausted.
  • Don't clear the cache on a step that is failing for a non-cache reason. It wastes the next run's time re-doing AI element identification for no benefit.
  • Don't rerun the whole test set when only one test failed. Use Rerun failed — it preserves the already-passed results.
  • Don't change the test definition while a verbose rerun is in flight. The collected bundle would record an inconsistent definition. Wait until the verbose rerun is finished, then edit.

See also