For agencies, small SaaS teams, and e-commerce shops without QA

The buy button broke last night.
Find out before your customer does.

An AI tester drives your site through its most important user journeys — checkout, add-to-cart, search, signup — after every deploy or on a schedule. Plain-English scenarios. No selector maintenance. Real headless mobile browser. Results in 60 seconds.

No card. 14-day free trial. Already proven in production on a live Magento e-commerce storefront.

acme-store.com · post-deploy smoke
Triggered by Envoyer · 4 scenarios · iPhone 12 viewport · 47s
1 FAIL

PASS · Homepage smoke
Primary CTA visible, no console errors, hero image rendered.

PASS · Plate Builder loads
Configurator UI present, sample input "TEST123" rendered, price £24.99 displayed.

FAIL · Add to basket + total
Add-to-basket button clicked successfully but basket total displays as £0.00 instead of the configured plate price. Console error: TypeError: undefined is not a function (cart.js:142).

WARN · Reach checkout
Could not reach checkout (basket regression upstream prevented this scenario from completing). Re-run after the fix.

The deploys that ship broken every week.

Your team won't write Playwright tests. If they did, they'd run them once and never maintain them. So your customers find the bugs first. These are the regressions UX Tester catches.

"Add to basket" was working yesterday.

A plugin update or a CSS refactor broke the add-to-basket flow on mobile. No alerts fired. Conversion dropped 14% before anyone noticed.

Total displays as £0.00.

A JavaScript error in the cart code blanks the displayed total on a specific product variant. Your dev catches it three days later, when a customer emails support.

"Did anyone test this on mobile?"

The dev shipped from desktop Chrome and never checked the iPhone viewport. The mobile menu overlay covers the entire homepage. Two-thirds of your traffic is mobile.

How it works

Plain English in. Verdict out. After every deploy, or on a schedule.

No selectors. No Playwright code. No test maintenance. You write scenarios the way you'd describe them to a new hire.

1. Add a site.

Drop in your production URL (and a staging URL if you have one). We auto-detect your stack — Shopify, Magento, WooCommerce, Webflow, custom — and suggest 4-5 starter scenarios.

2. Write goals in English.

*"Add a product to the basket. Verify the total shows a sensible price greater than £0. No console errors."* The AI handles selectors, waits, lazy-load — and adapts when your UI changes.

3. Trigger from CI, or schedule.

POST to your webhook URL from GitHub Actions / GitLab CI / Envoyer / whatever ships your code, or set scheduled runs (hourly, daily, custom) for synthetic monitoring with no CI at all.

Sample scenario · written by an actual operator
name: Add to basket + total
start: /
goal: Configure a plate with any sample text (e.g. "TEST123"). Add it to
      the basket. Verify the basket reflects the addition. Open the basket
      if it didn't open automatically. Verify the plate is listed and the
      total is a sensible £ amount > £0 — not £0.00, NaN, or "Loading...".
fail_if:
  - add-to-basket button missing
  - basket empty after add
  - total is £0 or NaN
  - JS console error during add
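
To make the "sensible £ amount" rule in this scenario concrete, here is a minimal Python sketch of the check it describes. It is an illustration only: UX Tester judges the page with an AI rather than a regex, and `is_sensible_total` is a hypothetical helper, not part of the product.

```python
import math
import re

# Hypothetical helper: the scenario's "sensible total" rule written out as code.
# Matches a pound sign followed by a number such as 24.99 or 1,024.50.
_AMOUNT = re.compile(r"£\s*(\d[\d,]*(?:\.\d{1,2})?)")


def is_sensible_total(text: str) -> bool:
    """Return True if `text` contains a £ amount strictly greater than zero."""
    match = _AMOUNT.search(text)
    if match is None:
        # Covers "NaN", "Loading...", empty strings, and anything without a £ figure.
        return False
    amount = float(match.group(1).replace(",", ""))
    return math.isfinite(amount) and amount > 0
```

A total of "£24.99" passes this check; "£0.00", "£NaN" and "Loading..." do not.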

vs. the alternatives

Why this, not Playwright / manual QA / synthetic monitoring.

There are four real ways small teams catch frontend regressions today. Here is an honest look at the trade-offs.

| | Playwright / Cypress | Manual QA | Synthetic monitoring (Pingdom) | UX Tester |
| --- | --- | --- | --- | --- |
| Test maintenance | Breaks on every UI change | Slow, expensive, inconsistent | N/A (only checks the URL) | AI adapts to UI churn |
| Writing a test | Dev writes selectors + waits | Tester memorises checklist | URL + status code | Plain English, 60 seconds |
| After every deploy? | If CI runs them | No, too slow | No, wrong shape | Yes, via CI webhook or schedule |
| Verdicts include | Pass/fail + stack trace | Tester's notes | Up/down | Verdict + reasoning + screenshots + console errors |
| Cost per check | Dev time × maintenance forever | £30-80/hr × scenarios × frequency | £0.001 | ~£0.20-1 / scenario, capped per run |

UX Tester doesn't replace Playwright if your team is happily writing and maintaining Playwright tests. It replaces the *intention* to write them — the gap where small teams know they should test on every deploy, never get around to writing the scripts, and ship broken code to customers.

Pricing

Priced per site. Scales with what you actually monitor.

All tiers include AI scenarios, screenshot evidence, per-scenario reasoning, and the cost cap. Each tier includes a number of sites and a monthly scenario quota; extra sites and scenarios are billed at the rates below.

Solo
£29 /month

1 site · 150 scenario runs/month · email-only support.

  • Plain-English scenarios, no selectors
  • CI-webhook trigger + scheduled runs
  • Email alerts on FAIL
  • 30-day screenshot retention
  • Extra scenarios: £0.50 each
Start free trial
Pro · most popular
£99 /month

5 sites · 600 scenario runs/month · Slack + webhook integrations.

  • Everything in Solo
  • Outbound webhooks (Slack/Discord/Teams)
  • Per-tenant API tokens
  • 90-day screenshot retention
  • Priority email support
  • Extra sites £15/mo · extra scenarios £0.30 each
Start free trial
Agency
£299 /month

20 sites · 2,000 scenario runs/month · white-label reports.

  • Everything in Pro
  • White-label reports for your clients
  • Sub-account customers
  • 1-business-day SLA
  • Extra sites £10/mo · extra scenarios £0.20 each
Talk to us

Free trial: 14 days, 10 scenario runs, no card required. All paid tiers include LLM costs: we run on Claude Haiku 4.5 by default, with an optional per-scenario upgrade to Claude Sonnet for hard cases. Enterprise customers can plug in their own Anthropic key and we drop the per-scenario charge to zero. You set a hard monthly cost ceiling per account; we suspend runs and email you before any open-ended bill accrues.
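
As a worked example of the tier arithmetic above, here is a small Python sketch. It assumes "extra scenarios" means scenario runs beyond the monthly quota and that overages are simply added to the base price; the names `Tier`, `TIERS` and `monthly_bill_pence` are illustrative, and the sketch ignores the monthly cost ceiling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    base_pence: int                  # monthly price
    sites: int                       # included sites
    runs: int                        # included scenario runs per month
    extra_site_pence: Optional[int]  # None: the tier lists no extra-site rate
    extra_run_pence: int             # price per run beyond the quota


# Figures copied from the pricing tiers above, held in pence to avoid float rounding.
TIERS = {
    "solo": Tier(2900, 1, 150, None, 50),
    "pro": Tier(9900, 5, 600, 1500, 30),
    "agency": Tier(29900, 20, 2000, 1000, 20),
}


def monthly_bill_pence(tier_name: str, sites: int, runs: int) -> int:
    """Base price plus overages (assumption: overages are purely additive)."""
    tier = TIERS[tier_name]
    extra_sites = max(0, sites - tier.sites)
    extra_runs = max(0, runs - tier.runs)
    if extra_sites and tier.extra_site_pence is None:
        raise ValueError(f"{tier_name} tier lists no rate for extra sites")
    site_cost = extra_sites * (tier.extra_site_pence or 0)
    return tier.base_pence + site_cost + extra_runs * tier.extra_run_pence
```

Under these assumptions, a Pro account with 7 sites and 700 runs in a month comes to £99 + 2 × £15 + 100 × £0.30 = £159.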

Why trust this

Built by a team that ships e-commerce daily, on the same tool.

UX Tester started as an internal deploy gate for a real Magento e-commerce storefront — one that processes orders for actual customers. Multiple deploys per week, mobile-first, hard gate on FAIL. Four weeks of internal data before any of this was offered externally.

The runner, the prompt-injection mitigations, the cost ceiling, the heartbeat/reaper pattern — they all exist because the internal version made the same mistakes you'd make and fixed them in production.

No selector lock-in

Your scenarios are plain English. If you cancel, you take them with you — nothing's locked to our SaaS. No code to migrate, no test framework rewrite.

AI-judged, not AI-prompted

Claude doesn't write your test. It runs it, observes the page, and judges whether what it sees matches your stated goal. Verdicts cite specific observations — generic "looks good" PASS is rejected by design.

Cost is hard-capped

Three hard caps stop runaway tests: per-scenario (20 turns, 40,000 output tokens), per-run (your cost ceiling, default $5), per-tenant (monthly ceiling you set). Tester errors default to WARN, not FAIL, so your deploy gate isn't coupled to Anthropic's uptime.

FAQ

Common questions from technical buyers.

How is this different from Playwright or Cypress?

Playwright and Cypress are selector-based: your dev writes click('.add-to-cart-btn') and the test breaks the moment the class name changes. With UX Tester, you write scenarios in plain English ("add a product to the basket, verify the total updates") and an AI drives the browser. UI churn doesn't break the test; only an actual user-facing regression does. We don't replace Playwright if your team is happily maintaining it. We replace the gap where teams know they should test and don't.

Does it work with my CI?

Yes — POST to your unique webhook URL with an API token to trigger a run from any CI (GitHub Actions, GitLab CI, CircleCI, Envoyer, Buddy). The run starts immediately and reports back via outbound webhook (Slack/Discord/Teams compatible) when it completes. You can also configure scheduled runs (hourly, daily, custom cron) for synthetic monitoring without a CI.
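
For a rough idea of what that trigger looks like from a deploy script, here is a Python sketch using only the standard library. The bearer-token header, the JSON body with a `target` field, and the function names are assumptions for illustration; use the request format shown in your dashboard.

```python
import json
import urllib.request


def build_trigger_request(webhook_url: str, api_token: str,
                          target: str = "production") -> urllib.request.Request:
    """Build the POST that asks for a run. Header and body shape are assumed."""
    payload = json.dumps({"target": target}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def trigger_run(webhook_url: str, api_token: str, target: str = "production",
                timeout: float = 10.0) -> int:
    """Send the trigger and return the HTTP status code."""
    request = build_trigger_request(webhook_url, api_token, target)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

In CI you would call `trigger_run(...)` as the last step of the deploy job, with the webhook URL and token read from your CI's secret store.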

How do I write a scenario?

In plain English. "Open the homepage. Find the main call-to-action. Click it. Verify the next page loads without JavaScript console errors." When you add a site, we auto-detect your stack (Shopify, Magento, WooCommerce, Webflow, custom) and suggest 4-5 starter scenarios. Edit them, duplicate them, write your own from scratch. There's a "preview this scenario only" button so you can iterate cheaply before saving.

What stops a runaway test from costing me a fortune?

Three hard caps. Per-scenario: 20 turns and 40,000 output tokens. Per-run: a configurable cost ceiling (default $5). Per-tenant: a monthly hard ceiling; runs are suspended and you're emailed before any open-ended bill accrues. Tester-side failures (Anthropic API down, Chromium crash, network) default to a WARN verdict, never FAIL, so a failure in our infrastructure never blocks your deploy.
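
The caps above can be sketched in a few lines of Python. This is an illustration of the idea, not the production runner: `ScenarioBudget`, `may_start_scenario` and `verdict_for_tester_error` are hypothetical names, and only the numeric limits come from the answer above.

```python
from dataclasses import dataclass

MAX_TURNS = 20              # per-scenario turn cap
MAX_OUTPUT_TOKENS = 40_000  # per-scenario output-token cap


@dataclass
class ScenarioBudget:
    """Tracks one scenario's usage against the per-scenario caps."""
    turns: int = 0
    output_tokens: int = 0

    def record_turn(self, output_tokens: int) -> None:
        self.turns += 1
        self.output_tokens += output_tokens

    def exhausted(self) -> bool:
        return self.turns >= MAX_TURNS or self.output_tokens >= MAX_OUTPUT_TOKENS


def may_start_scenario(run_spend: float, run_ceiling: float,
                       month_spend: float, month_ceiling: float) -> bool:
    """Per-run and per-tenant caps: refuse new work once either ceiling is reached."""
    return run_spend < run_ceiling and month_spend < month_ceiling


def verdict_for_tester_error(error: Exception) -> str:
    """Tester-side failures (API down, browser crash) map to WARN, never FAIL."""
    return "WARN"
```

The design point is that each check is evaluated before more money is spent, so a stuck scenario stops at a known bound instead of running until someone notices.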

Can I run it on staging before production?

Yes. Each site has a production URL and an optional UAT/staging URL. CI triggers can target either. The pattern most teams use: run on staging before the production deploy fires (gate), then a smoke run on production after the deploy lands (verification). Both result sets live in the same dashboard, so you can see if a pre-deploy PASS on UAT turned into a post-deploy FAIL on prod, which usually means an env-specific issue.
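
The gate-then-verify pattern boils down to two small decisions, sketched here in Python. The function names and the name-to-verdict mapping are assumptions for illustration; the rule that only FAIL blocks a deploy follows from WARN being reserved for inconclusive or tester-side results.

```python
from typing import Iterable, List, Mapping


def should_block_deploy(verdicts: Iterable[str]) -> bool:
    """Gate rule: any FAIL blocks the deploy; PASS and WARN let it through."""
    return any(verdict == "FAIL" for verdict in verdicts)


def env_specific_regressions(staging: Mapping[str, str],
                             production: Mapping[str, str]) -> List[str]:
    """Scenarios that passed on staging but failed on production."""
    return sorted(
        name for name, verdict in production.items()
        if verdict == "FAIL" and staging.get(name) == "PASS"
    )
```

Applied to the sample run at the top of this page (PASS, PASS, FAIL, WARN), `should_block_deploy` returns True because of the single FAIL.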

Will Cloudflare or bot detection block the tester?

Often, yes — strict bot protection (Cloudflare "Under Attack", DataDome, Akamai Bot Manager) will block any headless browser, including ours. The mitigation: we provide a signed test header you allowlist at your WAF or origin so production traffic stays protected from real bots while our tester gets through. It's a one-off rule, and we'll write the Cloudflare API call for you on signup if you give us a token.
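
If you allowlist at your origin rather than at the WAF, the check might look something like the Python sketch below. Everything about the scheme here is an assumption made for illustration (the header names, HMAC-SHA256 over a Unix timestamp, the five-minute freshness window); the actual header format is whatever you are given at signup.

```python
import hashlib
import hmac
from typing import Mapping


def sign(secret: bytes, timestamp: str) -> str:
    """Hex HMAC-SHA256 of the timestamp (assumed signing scheme)."""
    return hmac.new(secret, timestamp.encode("utf-8"), hashlib.sha256).hexdigest()


def is_tester_request(headers: Mapping[str, str], secret: bytes,
                      now: int, max_age_s: int = 300) -> bool:
    """Accept only requests carrying a fresh, correctly signed test header."""
    timestamp = headers.get("X-Test-Timestamp", "")  # hypothetical header name
    signature = headers.get("X-Test-Signature", "")  # hypothetical header name
    if not (timestamp.isascii() and timestamp.isdigit()):
        return False
    if abs(now - int(timestamp)) > max_age_s:
        return False  # stale or replayed header
    expected = sign(secret, timestamp)
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Requests that fail this check simply fall through to your normal bot protection, so real bots gain nothing from the rule.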

Early access

Currently in private beta. First fifteen agencies are concierge-onboarded.

Tell us about your shop and we'll get you in. The first fifteen agencies through the door get hands-on setup — we configure your scenarios, wire your CI, and run the first month with you for free.

We reply within one business day.

No card. No commitment. We'll talk through what you're trying to test and tell you honestly whether this fits.