An AI tester drives your site through its most important user journeys — checkout, add-to-cart, search, signup — after every deploy or on a schedule. Plain-English scenarios. No selector maintenance. Real headless mobile browser. Results in 60 seconds.
No card. 14-day free trial. Currently production-proven on a live e-commerce Magento storefront.
Your team won't write Playwright tests. They'd run them once, then maintain them never. Your customers find the bugs first. These are the regressions UX Tester catches.
A plugin update or a CSS refactor broke the add-to-basket flow on mobile. No alerts fired. Conversion dropped 14% before anyone noticed.
A JavaScript error in the cart code blanks the displayed total on a specific product variant. Your dev catches it three days later, when a customer emails support.
The dev shipped from desktop Chrome and never checked the iPhone viewport. The mobile menu overlay covers the entire homepage. Two-thirds of your traffic is mobile.
No selectors. No Playwright code. No test maintenance. You write scenarios the way you'd describe them to a new hire.
Drop in your production URL (and a staging URL if you have one). We auto-detect your stack — Shopify, Magento, WooCommerce, Webflow, custom — and suggest 4-5 starter scenarios.
*"Add a product to the basket. Verify the total shows a sensible price greater than £0. No console errors."* The AI handles selectors, waits, lazy-load — and adapts when your UI changes.
POST to your webhook URL from GitHub Actions / GitLab CI / Envoyer / whatever ships your code, or set scheduled runs (hourly, daily, custom) for synthetic monitoring with no CI at all.
    name: Add to basket + total
    start: /
    goal:
      Configure a plate with any sample text (e.g. "TEST123"). Add it to the
      basket. Verify the basket reflects the addition. Open the basket if it
      didn't open automatically. Verify the plate is listed and the total is a
      sensible £ amount > £0, not £0.00, NaN, or "Loading...".
    fail_if:
      - add-to-basket button missing
      - basket empty after add
      - total is £0 or NaN
      - JS console error during add
There are four real ways small teams catch frontend regressions today. Here are the trade-offs, stated honestly.
| | Playwright / Cypress | Manual QA | Synthetic monitoring (Pingdom) | UX Tester |
|---|---|---|---|---|
| Test maintenance | Breaks on every UI change | Slow, expensive, inconsistent | N/A — only checks the URL | AI adapts to UI churn |
| Writing a test | Dev writes selectors + waits | Tester memorises checklist | URL + status code | Plain English, 60 seconds |
| After every deploy? | If CI runs them | No — too slow | No — wrong shape | Yes — CI webhook or schedule |
| Verdicts include | Pass/fail + stack trace | Tester's notes | Up/down | Verdict + reasoning + screenshots + console errors |
| Cost per check | Dev time × maintenance forever | £30-80/hr × scenarios × frequency | £0.001 | ~£0.20-1 / scenario, capped per run |
UX Tester doesn't replace Playwright if your team is happily writing and maintaining Playwright tests. It replaces the *intention* to write them — the gap where small teams know they should test on every deploy, never get around to writing the scripts, and ship broken code to customers.
All tiers include AI scenarios, screenshot evidence, per-scenario reasoning, and the cost cap. Each tier includes a number of sites and a monthly scenario quota; extra sites and scenarios are billed at the rates below.
1 site · 150 scenario runs/month · email-only support.
5 sites · 600 scenario runs/month · Slack + webhook integrations.
20 sites · 2,000 scenario runs/month · white-label reports.
Free trial: 14 days, 10 scenario runs, no card required. All paid tiers include LLM costs: we run on Claude Haiku 4.5 by default, with an optional per-scenario upgrade to Claude Sonnet for hard cases. Enterprise customers can plug in their own Anthropic key, and we drop the per-scenario charge to zero. You set a hard monthly cost ceiling per account; we suspend runs and email you before any open-ended bill accrues.
UX Tester started as an internal deploy gate for a real Magento e-commerce storefront — one that processes orders for actual customers. Multiple deploys per week, mobile-first, hard gate on FAIL. Four weeks of internal data before any of this was offered externally.
The runner, the prompt-injection mitigations, the cost ceiling, the heartbeat/reaper pattern — they all exist because the internal version made the same mistakes you'd make and fixed them in production.
Your scenarios are plain English. If you cancel, you take them with you — nothing's locked to our SaaS. No code to migrate, no test framework rewrite.
Claude doesn't write your test. It runs it, observes the page, and judges whether what it sees matches your stated goal. Verdicts cite specific observations — generic "looks good" PASS is rejected by design.
Three hard caps stop runaway tests: per-scenario (20 turns, 40k output tokens), per-run (your cost ceiling, default $5), per-tenant (monthly ceiling you set). Tester errors default to WARN, not FAIL, so your deploy gate isn't coupled to Anthropic's uptime.
Playwright and Cypress are selector-based: your dev writes click('.add-to-cart-btn') and the test breaks the moment the class name changes. With UX Tester, you write scenarios in plain English ("add a product to the basket, verify the total updates") and an AI drives the browser. UI churn doesn't break the test; only an actual user-facing regression does. We don't replace Playwright if your team is happily maintaining it; we replace the gap where teams know they should test and don't.
Yes — POST to your unique webhook URL with an API token to trigger a run from any CI (GitHub Actions, GitLab CI, CircleCI, Envoyer, Buddy). The run starts immediately and reports back via outbound webhook (Slack/Discord/Teams compatible) when it completes. You can also configure scheduled runs (hourly, daily, custom cron) for synthetic monitoring without a CI.
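As a rough idea of what the CI step looks like, here is a minimal Python sketch that builds the trigger request. The bearer-token `Authorization` header, the JSON body with a `target` field, and the function names are assumptions made for illustration, not the documented API; check your dashboard for the exact request shape.

```python
import json
import urllib.request


def build_trigger_request(webhook_url: str, api_token: str,
                          target: str = "production") -> urllib.request.Request:
    """Build (but do not send) the POST that asks for a run.

    Assumption: the token travels as a bearer token and the body names
    the environment to test. Both are illustrative, not confirmed.
    """
    if target not in ("production", "staging"):
        raise ValueError(f"unknown target: {target!r}")
    body = json.dumps({"target": target}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


def trigger_run(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

In a pipeline you would call `trigger_run(build_trigger_request(url, token))` right after the deploy step, reading the URL and token from CI secrets.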
In plain English. "Open the homepage. Find the main call-to-action. Click it. Verify the next page loads without JavaScript console errors." When you add a site, we auto-detect your stack (Shopify, Magento, WooCommerce, Webflow, custom) and suggest 4-5 starter scenarios. Edit them, duplicate them, write your own from scratch. There's a "preview this scenario only" button so you can iterate cheaply before saving.
Three hard caps. Per-scenario: 20 turns and 40,000 output tokens. Per-run: a configurable cost ceiling (default $5). Per-tenant: a monthly hard ceiling; runs are suspended and you're emailed before any open-ended bill accrues. Tester-side failures (Anthropic API down, Chromium crash, network) default to a WARN verdict, never FAIL, so a failure in our infrastructure never blocks your deploy.
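To make the three caps concrete, here is a small Python sketch of how such checks could be layered. The numbers come from the limits above; the `Usage` type, the function names, and the order of the checks are hypothetical and not a description of the actual runner.

```python
from dataclasses import dataclass
from typing import Optional

MAX_TURNS = 20               # per-scenario turn limit
MAX_OUTPUT_TOKENS = 40_000   # per-scenario output-token limit


@dataclass
class Usage:
    turns: int            # turns taken in the current scenario
    output_tokens: int    # output tokens used in the current scenario
    run_cost: float       # spend so far in the current run
    month_cost: float     # tenant spend so far this month


def cap_exceeded(usage: Usage, *, month_ceiling: float,
                 run_ceiling: float = 5.0) -> Optional[str]:
    """Return which cap has been hit, or None if the run may continue."""
    if usage.turns >= MAX_TURNS or usage.output_tokens >= MAX_OUTPUT_TOKENS:
        return "scenario"
    if usage.run_cost >= run_ceiling:
        return "run"
    if usage.month_cost >= month_ceiling:
        return "tenant"
    return None


def verdict_for_tester_error(error: Exception) -> str:
    """Tester-side failures (API down, browser crash) map to WARN, never FAIL."""
    return "WARN"
```

The point of the last function is the decoupling: an exception inside the tester produces a verdict that a deploy gate can treat as non-blocking.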
Yes. Each site has a production URL and an optional UAT/staging URL. CI triggers can target either. The pattern most teams use: run on staging before the production deploy fires (gate), then a smoke run on production after deploy lands (verification). Both result-sets live in the same dashboard, so you can see if a pre-deploy PASS on UAT turned into a post-deploy FAIL on prod — which usually means an env-specific issue.
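The gate-then-verify pattern can be sketched in a few lines of Python. This assumes verdicts arrive as the strings `PASS`, `WARN`, and `FAIL` keyed by scenario name, and that only `FAIL` should block a deploy; the function names are made up for illustration.

```python
def gate_exit_code(verdicts: list[str]) -> int:
    """Exit code for a CI gate: 1 blocks the deploy, 0 lets it proceed.

    Only FAIL blocks; WARN passes so tester-side problems don't stop a deploy.
    """
    return 1 if any(v == "FAIL" for v in verdicts) else 0


def env_specific_regressions(staging: dict[str, str],
                             production: dict[str, str]) -> list[str]:
    """Scenarios that passed on staging but failed on production."""
    return sorted(
        name for name, verdict in production.items()
        if verdict == "FAIL" and staging.get(name) == "PASS"
    )
```

A staging job would end with `sys.exit(gate_exit_code(...))`, while the post-deploy smoke run would feed both result sets to `env_specific_regressions` to flag likely environment-specific issues.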
Often, yes — strict bot protection (Cloudflare "Under Attack", DataDome, Akamai Bot Manager) will block any headless browser, including ours. The mitigation: we provide a signed test header you allowlist at your WAF or origin so production traffic stays protected from real bots while our tester gets through. It's a one-off rule, and we'll write the Cloudflare API call for you on signup if you give us a token.
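For teams who want to check the header at their origin rather than at the WAF, here is one way a signed header could be verified in Python. Everything specific here is an assumption for illustration: the header name, the `t=...,v1=...` format, HMAC-SHA256 over timestamp and host, and the five-minute freshness window. The real header format is whatever you receive at signup.

```python
import hashlib
import hmac

HEADER_NAME = "X-UX-Tester-Signature"  # hypothetical header name


def _digest(secret: bytes, timestamp: int, host: str) -> str:
    """HMAC-SHA256 over "<timestamp>.<host>", hex encoded."""
    message = f"{timestamp}.{host}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_header(secret: bytes, timestamp: int, host: str) -> str:
    """Build the header value the tester would send."""
    return f"t={timestamp},v1={_digest(secret, timestamp, host)}"


def verify_header(secret: bytes, header_value: str, host: str,
                  now: int, max_age: int = 300) -> bool:
    """Accept only a fresh header whose signature matches the shared secret."""
    try:
        parts = dict(item.split("=", 1) for item in header_value.split(","))
        timestamp = int(parts["t"])
        received = parts["v1"]
    except (KeyError, ValueError):
        return False  # malformed header
    if abs(now - timestamp) > max_age:
        return False  # too old, or clock skew too large
    expected = _digest(secret, timestamp, host)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(received.encode("utf-8"), expected.encode("utf-8"))
```

The timestamp keeps a captured header from being replayed indefinitely, which is the main advantage over allowlisting a fixed user-agent string.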
Tell us about your shop and we'll get you in. The first fifteen agencies through the door get hands-on setup — we configure your scenarios, wire your CI, and run the first month with you for free.
No card. No commitment. We'll talk through what you're trying to test and tell you honestly whether this fits.