playwright-spec-for-AI-Agent

AI-assisted live staging QA for web apps that already have Playwright specs.


This tool reads Playwright *.spec.ts files, extracts @qa-scenario intent, turns it into structured QA scenarios, and uses Hermes Agent to validate the same intent against a staging page.

Hermes Agent is used here as the adapter layer for multiple agents. The CLI keeps the project-specific QA flow stable, while Hermes handles agent execution, browser access, model calls, and judgment/review steps through its agent adapters.

spec → abstract-ai → judge → review → slack (optional)

Why this exists
What it does
Command flow
Recommended workflow
Quick start
Configuration
Annotations
Output files
Prerequisites
Limits

Why this exists

Deterministic Playwright tests are best kept in CI. They are fast, stable, repeatable, and good at checking mocked UI states.

But mocked E2E tests do not fully answer whether a staging or production-like page still behaves correctly with real data, real routing, real copy, real DOM state, and real deployment conditions.

Production is non-deterministic. Data changes, feature flags drift, copy changes, backend latency varies, third-party services fail, and the visible UI can differ between runs. This library exists because live QA needs judgment and evidence, not only deterministic assertions.

playwright-spec-for-AI-Agent bridges that gap:

- keep Playwright specs as the source of QA intent
- avoid running brittle Playwright flows directly against staging
- let an AI agent inspect the live page and judge whether the user-facing scenario still holds
- escalate ambiguous results as manual_review

It is a live staging/production-like judgment layer, not a replacement for normal tests.

What it does

The CLI:

1. Parses Playwright specs with @qa-scenario comments.
2. Writes structured QA scenario artifacts.
3. Asks Hermes Agent to abstract the scenario into a live QA plan.
4. Uses Hermes Agent to log into staging and inspect the target page.
5. Produces pass, fail, manual_review, or skip.
6. Optionally posts fail/manual-review results to Slack.
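
The verdict values and the optional Slack step above can be sketched as a small type and filter. This is only an illustration: `ScenarioVerdict`, its fields, and `selectForSlack` are hypothetical names, and the real judgment JSON may be shaped differently.

```typescript
// Hypothetical shapes for illustration; not the library's actual schema.
type Verdict = "pass" | "fail" | "manual_review" | "skip";

interface ScenarioVerdict {
  scenario: string; // scenario id or intent label
  verdict: Verdict;
}

// Keep only the verdicts that the optional Slack step reports.
function selectForSlack(results: ScenarioVerdict[]): ScenarioVerdict[] {
  return results.filter(
    (r) => r.verdict === "fail" || r.verdict === "manual_review",
  );
}

const sampleVerdicts: ScenarioVerdict[] = [
  { scenario: "ACTIVE", verdict: "pass" },
  { scenario: "CANCEL_SUBSCRIPTION", verdict: "manual_review" },
  { scenario: "INACTIVE", verdict: "fail" },
];

const flaggedScenarios = selectForSlack(sampleVerdicts).map((r) => r.scenario);
```

Passing and skipped scenarios are dropped, so only results that need human attention would reach the channel.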

Command flow

spec -> abstract-ai -> judge -> review -> slack (optional)

Command	Purpose
`spec`	Parse `*.spec.ts` files into QA scenario JSON.
`abstract-ai`	Ask Hermes to write a live Given/When/Then plan.
`judge`	Ask Hermes to inspect staging and judge the scenario.
`review`	Ask Hermes to review judgment quality without browsing.
`slack`	Send fail/manual-review verdicts to Slack.
`nightly`	Run the full pipeline.
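
As a rough mental model, the step order that `nightly` walks through can be written down as data. The sketch below assumes `nightly` runs the commands in the order of the command flow and adds `slack` only when requested (for example with `--with-slack`); `PipelineStep` and `nightlySteps` are hypothetical names, not part of the CLI.

```typescript
type PipelineStep = "spec" | "abstract-ai" | "judge" | "review" | "slack";

// Order taken from the command flow; `slack` is appended only on request.
function nightlySteps(withSlack: boolean): PipelineStep[] {
  const steps: PipelineStep[] = ["spec", "abstract-ai", "judge", "review"];
  if (withSlack) steps.push("slack");
  return steps;
}

const defaultRun = nightlySteps(false);
const slackRun = nightlySteps(true);
```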

Recommended workflow

Stage	Check	Purpose
PR	Mocked Playwright E2E	Stable UI regression checks.
PR / Nightly	API contract tests	Prevent mock/API drift.
Nightly	`playwright-spec-for-ai-agent nightly`	AI-assisted live staging judgment.
Release	Selected live scenarios	Human or AI-assisted smoke validation.

Mocked E2E = stable UI-state verification
API contract tests = real API schema guardrail
playwright-spec-for-AI-Agent = live staging QA judgment

Quick start

Run from your app repo root, where Playwright specs live.

AI agent install prompt:

Install and wire up playwright-spec-for-ai-agent in this app repo.

Reference README:
https://www.npmjs.com/package/playwright-spec-for-ai-agent

Use the README above as the source of truth for install steps, config shape, annotations, live policies, and safety rules.

Tasks:
1. Check where Playwright specs live and identify one page to start with.
2. Add playwright-spec-for-ai-agent as a dev dependency:
   npm install -D playwright-spec-for-ai-agent
3. Add package scripts for qa:spec, qa:judge, qa:slack, and qa:nightly.
4. Create playwright-spec-for-ai-agent.config.mjs with this app's specDir, staging.baseUrl, and per-page pageUrl or targetPath.
5. Add STAGING_QA_EMAIL and STAGING_QA_PASSWORD to the env setup or document them for CI secrets when the page requires login (base URL can live in config instead of STAGING_QA_BASE_URL).
6. Add file-level @qa-page and @qa-scenario annotations at the top of the first spec file.
7. Add @qa-live-policy above each test, and @qa-fixture above each test that uses a fixture file.
8. Run qa:spec for the selected page and show me the generated QA artifact paths.

Do not run destructive staging flows. Use safe-interaction-no-confirm, subscription-mutation, auth-mock, or skip when needed.

npx playwright-spec-for-ai-agent spec --page=dashboard

Install for team usage:

npm install -D playwright-spec-for-ai-agent

Add scripts:

{
  "scripts": {
    "qa:spec": "playwright-spec-for-ai-agent spec",
    "qa:judge": "playwright-spec-for-ai-agent judge",
    "qa:slack": "playwright-spec-for-ai-agent slack",
    "qa:nightly": "playwright-spec-for-ai-agent nightly"
  }
}

Pass CLI flags after --:

npm run qa:spec -- --page=billing

Run a page pipeline:

npx playwright-spec-for-ai-agent spec --page=pricing
npx playwright-spec-for-ai-agent abstract-ai --page=pricing
npx playwright-spec-for-ai-agent judge --page=pricing
npx playwright-spec-for-ai-agent review --page=pricing

When pages.pricing.targetPath (or pageUrl) is set in config, judge does not need --target-path. Override with --target-path=/custom when needed.

Run nightly with Slack:

npx playwright-spec-for-ai-agent nightly --page=pricing --with-slack --non-interactive

Configuration

Copy the example config into your app repo:

cp node_modules/playwright-spec-for-ai-agent/playwright-spec-for-ai-agent.config.example.mjs playwright-spec-for-ai-agent.config.mjs

Example:

export default {
  paths: {
    specDir: "tests/e2e/{page}",
  },
  staging: {
    baseUrl: "https://staging.your-app.com",
    loginPath: "/login",
    authRequired: true,
    expectedSubscriptionStatus: "INACTIVE",
    expectedPlan: "BASIC",
    accountNotes: "QA account on staging — do not mutate billing",
  },
  pages: {
    dashboard: {
      pageUrl: "https://staging.your-app.com/dashboard",
      expectedSubscriptionStatus: "ACTIVE",
    },
    billing: {
      targetPath: "/settings/billing",
    },
    pricing: {
      targetPath: "/pricing",
      authRequired: false,
    },
  },
};

Field	Purpose
`paths.specDir`	Where Playwright specs for `{page}` live.
`staging.baseUrl`	Staging origin used by `judge` (unless env/CLI overrides).
`staging.loginPath`	Login path relative to `baseUrl`.
`staging.authRequired`	Set `false` when pages can be judged without logging in.
`staging.expectedSubscriptionStatus`	Default `@qa-scenario` expectation for Hermes.
`pages.{page}.pageUrl`	Full URL Hermes opens for that page (highest priority after CLI).
`pages.{page}.targetPath`	Path joined with `staging.baseUrl` when `pageUrl` is not set.
`pages.{page}.authRequired`	Per-page override for public or no-login pages.
`pages.{page}.expectedSubscriptionStatus`	Per-page override of staging account expectations.

Legacy targetPaths.{page} still works; prefer pages.{page}.targetPath or pageUrl for new configs.

Output artifacts default to src/page/{page}/__QA__/. Override with paths.outputDir, pages.{page}.outputDir, or --output-dir=.

Set staging credentials with environment variables:

STAGING_QA_EMAIL=your-staging-email
STAGING_QA_PASSWORD=your-staging-password

STAGING_QA_BASE_URL and --base-url= still override staging.baseUrl when set.
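
The target URL rules described in this section can be sketched as one resolver function. This is a minimal illustration, not the CLI's implementation: `ResolveInput`, `joinUrl`, and `resolveTargetUrl` are hypothetical names, and the ordering between `--base-url=` and `STAGING_QA_BASE_URL`, and between `pages.{page}.targetPath` and the legacy `targetPaths.{page}`, is an assumption.

```typescript
// Hypothetical shapes that mirror the documented config fields.
interface PageConfig {
  pageUrl?: string;
  targetPath?: string;
}

interface ResolveInput {
  page: string;
  stagingBaseUrl: string; // staging.baseUrl
  pages?: Record<string, PageConfig>; // pages.{page}
  legacyTargetPaths?: Record<string, string>; // targetPaths.{page}
  envBaseUrl?: string; // STAGING_QA_BASE_URL
  cliBaseUrl?: string; // --base-url=
  cliTargetPath?: string; // --target-path=
}

// Join an origin and a path with exactly one slash between them.
function joinUrl(baseUrl: string, path: string): string {
  const base = baseUrl.replace(/\/+$/, "");
  const rest = path.charAt(0) === "/" ? path : `/${path}`;
  return `${base}${rest}`;
}

function resolveTargetUrl(input: ResolveInput): string {
  // Assumption: the CLI flag beats the env var, and both beat the config value.
  const baseUrl = input.cliBaseUrl ?? input.envBaseUrl ?? input.stagingBaseUrl;
  const pageConfig = input.pages?.[input.page];

  if (input.cliTargetPath) return joinUrl(baseUrl, input.cliTargetPath);
  if (pageConfig?.pageUrl) return pageConfig.pageUrl; // full URL, used as-is

  // Assumption: the newer pages.{page}.targetPath wins over the legacy map.
  const path = pageConfig?.targetPath ?? input.legacyTargetPaths?.[input.page];
  if (!path) throw new Error(`No target configured for page "${input.page}"`);
  return joinUrl(baseUrl, path);
}

const pricingUrl = resolveTargetUrl({
  page: "pricing",
  stagingBaseUrl: "https://staging.your-app.com/",
  pages: { pricing: { targetPath: "/pricing" } },
});
```

With the example config, `pricingUrl` resolves to `https://staging.your-app.com/pricing`.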

Prefer environment variables or CI secrets over --password=..., because shell history can leak CLI flags.

Login fields

When authRequired is true (the default), Hermes receives loginPath, STAGING_QA_EMAIL, and STAGING_QA_PASSWORD, then logs in through the browser before judging the target page. Make the login form easy to identify with normal accessible markup plus stable QA attributes:

<label for="qa-login-email">Email</label>
<input
  id="qa-login-email"
  data-qa="login-email"
  name="email"
  type="email"
  autocomplete="username"
/>

<label for="qa-login-password">Password</label>
<input
  id="qa-login-password"
  data-qa="login-password"
  name="password"
  type="password"
  autocomplete="current-password"
/>

<button type="submit">Log in</button>

The important elements are the email input (type="email", name="email", autocomplete="username", optional data-qa="login-email"), the password input (type="password", name="password", autocomplete="current-password", optional data-qa="login-password"), and a visible submit button. The library does not require a hard-coded selector, but these attributes make agent login much more reliable.

If the target page does not require login, set authRequired to false and do not send staging credentials:

export default {
  staging: {
    baseUrl: "https://staging.your-app.com",
    authRequired: false,
  },
  pages: {
    pricing: {
      targetPath: "/pricing",
      authRequired: false,
    },
  },
};

You can also pass it for a single run:

npx playwright-spec-for-ai-agent judge --page=pricing --auth-required=false

Or in CI:

STAGING_QA_AUTH_REQUIRED=false npx playwright-spec-for-ai-agent judge --page=pricing --non-interactive
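
The ways of setting `authRequired` shown above can be pictured as a fallback chain. In the sketch below, the default of `true` and the per-page override come from this README; the relative order of the CLI flag and the environment variable, and the rule that any value other than `false` counts as `true`, are assumptions. `AuthInput`, `parseFlag`, and `resolveAuthRequired` are hypothetical names.

```typescript
interface AuthInput {
  cliAuthRequired?: string; // value passed to --auth-required=
  envAuthRequired?: string; // STAGING_QA_AUTH_REQUIRED
  pageAuthRequired?: boolean; // pages.{page}.authRequired
  stagingAuthRequired?: boolean; // staging.authRequired
}

// Assumption: only the literal string "false" turns the flag off.
function parseFlag(value: string | undefined): boolean | undefined {
  if (value === undefined) return undefined;
  return value.trim().toLowerCase() !== "false";
}

function resolveAuthRequired(input: AuthInput): boolean {
  return (
    parseFlag(input.cliAuthRequired) ??
    parseFlag(input.envAuthRequired) ??
    input.pageAuthRequired ??
    input.stagingAuthRequired ??
    true // documented default
  );
}

// Public pricing page inside an otherwise login-protected staging app.
const pricingNeedsLogin = resolveAuthRequired({
  stagingAuthRequired: true,
  pageAuthRequired: false,
});
```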

Interactive `judge`

When stdin is a TTY and CI is not set, judge prompts for credentials and target confirmation before browsing.

If config already defines the target URL, the prompt shows that URL. Answer Y to proceed, or n to enter a different full URL or path (for example /ko or https://staging.your-app.com/ko).

Use --non-interactive, --yes, or -y to skip prompts (required for CI and nightly).
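
The prompt conditions above reduce to a small predicate. This is a sketch under stated assumptions: `PromptInput` and `shouldPrompt` are hypothetical names, and "CI is set" is read here as "the `CI` environment variable is non-empty".

```typescript
interface PromptInput {
  stdinIsTty: boolean; // for example process.stdin.isTTY in Node.js
  ciEnv: string | undefined; // value of the CI environment variable
  args: string[]; // raw CLI arguments
}

const SKIP_PROMPT_FLAGS = ["--non-interactive", "--yes", "-y"];

// Prompt only on an interactive terminal, outside CI, with no skip flag.
function shouldPrompt(input: PromptInput): boolean {
  const skipRequested = input.args.some(
    (arg) => SKIP_PROMPT_FLAGS.indexOf(arg) !== -1,
  );
  const ciIsSet = input.ciEnv !== undefined && input.ciEnv !== "";
  return input.stdinIsTty && !ciIsSet && !skipRequested;
}

const localRun = shouldPrompt({ stdinIsTty: true, ciEnv: undefined, args: ["judge", "--page=pricing"] });
const ciRun = shouldPrompt({ stdinIsTty: false, ciEnv: "true", args: ["judge", "--non-interactive"] });
```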

Annotations

Add annotations directly inside Playwright spec files.

Supported annotations:

Annotation	Scope	Purpose
`// @qa-page: billing`	File	Optional page id override.
`// @qa-scenario: ACTIVE`	File	Required QA scenario id or intent label.
`// @qa-live-skip: true`	File	Skip this scenario in live QA.
`// @qa-always-run: true`	File	Keep the scenario eligible even when default filtering would skip it.
`// @qa-fixture: avatar=tests/fixtures/qa-avatar.png`	`test`	Name a fixture file used by this specific test. Add it above each test that needs the fixture.
`// @qa-live-policy: readonly`	`test` or `describe`	Tell live QA how safely this test can be judged. Add it above each test; use `describe` only when every child test shares the same policy.

File-level annotations should be placed at the top of the spec file, before imports:

// @qa-page: billing
// @qa-scenario: ACTIVE
// @qa-live-skip: true
// @qa-always-run: true

import { expect, test } from "@playwright/test";
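
To illustrate why file-level annotations sit before the imports, here is one way such comments could be read from spec source text. It is a simplified sketch, not the CLI's actual parser; `parseFileAnnotations` is a hypothetical name, and stopping at the first `import` line is an assumption.

```typescript
// Collect `// @qa-*: value` comments that appear before the first import.
function parseFileAnnotations(source: string): Record<string, string> {
  const annotations: Record<string, string> = {};
  for (const rawLine of source.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.indexOf("import ") === 0) break; // file-level block ends here
    const match = /^\/\/\s*@(qa-[a-z-]+):\s*(.+)$/.exec(line);
    if (match) annotations[match[1]] = match[2].trim();
  }
  return annotations;
}

const specSource = [
  "// @qa-page: billing",
  "// @qa-scenario: ACTIVE",
  "",
  'import { expect, test } from "@playwright/test";',
  "// @qa-live-policy: readonly",
].join("\n");

const fileAnnotations = parseFileAnnotations(specSource);
```

In this sketch the `@qa-live-policy` line is ignored because it comes after the import and belongs to a test, not to the file.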

Supported @qa-live-policy values:

Value	Meaning
`readonly`	Safe read-only verification.
`safe-interaction`	Safe interaction is allowed.
`safe-interaction-no-confirm`	Interaction is allowed, but final confirmation/destructive submit should not be clicked.
`mock-judgment`	Original test depends on mocked API state; Hermes should judge whether live UI reasonably matches intent.
`subscription-mutation`	Subscription or billing mutation; block live execution and treat as unsafe for staging automation.
`auth-mock`	Auth is mocked in the Playwright test; block direct live execution.
`skip`	Explicitly skip live QA for this test.
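
The policy values in the table can be modeled as a union type, which lets a type checker reject misspelled policies. The sketch below is illustrative only: `LivePolicy`, `BLOCKED_POLICIES`, and `isLiveExecutionBlocked` are hypothetical names, and grouping `skip` with the two blocking policies is an interpretation of the table.

```typescript
type LivePolicy =
  | "readonly"
  | "safe-interaction"
  | "safe-interaction-no-confirm"
  | "mock-judgment"
  | "subscription-mutation"
  | "auth-mock"
  | "skip";

// Policies the table describes as blocked or skipped for live runs.
const BLOCKED_POLICIES: LivePolicy[] = [
  "subscription-mutation",
  "auth-mock",
  "skip",
];

function isLiveExecutionBlocked(policy: LivePolicy): boolean {
  return BLOCKED_POLICIES.indexOf(policy) !== -1;
}

const readonlyBlocked = isLiveExecutionBlocked("readonly");
const mutationBlocked = isLiveExecutionBlocked("subscription-mutation");
```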

Minimal annotation:

// @qa-scenario: A billing user can see the inactive subscription state

Test-level annotations:

// Put these above each test.
// @qa-live-policy: readonly
// @qa-fixture: avatar=tests/fixtures/qa-avatar.png

Example Playwright spec:

// @qa-page: billing
// @qa-scenario: A billing user can see the inactive subscription state

import { expect, test } from "@playwright/test";

// @qa-live-policy: readonly
// @qa-fixture: avatar=tests/fixtures/qa-avatar.png
test("shows inactive billing state", async ({ page }) => {
  await page.goto("/settings/billing");

  await expect(page.getByRole("heading", { name: "Billing" })).toBeVisible();
  await expect(page.getByText("Inactive subscription")).toBeVisible();
  await expect(page.getByRole("button", { name: "Upgrade" })).toBeVisible();
});

Example with a live policy:

// @qa-page: billing
// @qa-scenario: CANCEL_SUBSCRIPTION

import { expect, test } from "@playwright/test";

test.describe("Billing cancellation", () => {
  // @qa-live-policy: safe-interaction-no-confirm
  test("opens the cancellation confirmation dialog", async ({ page }) => {
    await page.goto("/settings/billing");
    await page.getByRole("button", { name: "Cancel subscription" }).click();

    await expect(page.getByRole("dialog")).toBeVisible();
    await expect(
      page.getByRole("button", { name: "Confirm cancellation" }),
    ).toBeVisible();
  });
});

The CLI reads the spec as source material. It does not run Playwright against staging.
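
Reading a spec as source material could look roughly like the sketch below, which pairs each `test(...)` call with the `@qa-live-policy` and `@qa-fixture` comments directly above it. This is an assumption-heavy illustration: `TestAnnotations` and `parseTestAnnotations` are hypothetical names, the line-based matching is deliberately naive, and `describe`-level policies are not handled.

```typescript
interface TestAnnotations {
  title: string;
  livePolicy?: string;
  fixtures: Record<string, string>; // fixture name -> file path
}

function parseTestAnnotations(source: string): TestAnnotations[] {
  const results: TestAnnotations[] = [];
  let pendingPolicy: string | undefined;
  let pendingFixtures: Record<string, string> = {};

  for (const rawLine of source.split(/\r?\n/)) {
    const line = rawLine.trim();
    const policy = /^\/\/\s*@qa-live-policy:\s*(\S+)/.exec(line);
    const fixture = /^\/\/\s*@qa-fixture:\s*([^=\s]+)=(\S+)/.exec(line);
    const testCall = /^test\(\s*(["'])(.*?)\1/.exec(line);

    if (policy) {
      pendingPolicy = policy[1];
    } else if (fixture) {
      pendingFixtures[fixture[1]] = fixture[2];
    } else if (testCall) {
      // Attach the annotations collected since the previous test.
      results.push({
        title: testCall[2],
        livePolicy: pendingPolicy,
        fixtures: pendingFixtures,
      });
      pendingPolicy = undefined;
      pendingFixtures = {};
    }
  }
  return results;
}

const billingSpec = [
  "// @qa-live-policy: readonly",
  "// @qa-fixture: avatar=tests/fixtures/qa-avatar.png",
  'test("shows inactive billing state", async ({ page }) => {',
  "});",
].join("\n");

const parsedTests = parseTestAnnotations(billingSpec);
```

Nothing here launches a browser; the spec is treated purely as text.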

Output files

Default paths:

src/page/{page}/__tests__/*.spec.ts
src/page/{page}/__QA__/

Common output files:

{page}-qa-spec.json
{page}-qa-spec-abstracted.json
{page}-qa-spec-live.json
{page}-qa-spec-live.md
{page}-qa-judge-plan.md
{page}-hermes-judgment.json
{page}-hermes-judgment.md
{page}-hermes-raw-output.txt
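
For scripts that collect these artifacts (for example as CI uploads), the names above can be expanded per page. This is a small sketch: `OUTPUT_SUFFIXES` and `listOutputFiles` are hypothetical helpers, and the default directory simply mirrors the documented `src/page/{page}/__QA__/` path.

```typescript
// File name suffixes taken from the list of common output files.
const OUTPUT_SUFFIXES = [
  "qa-spec.json",
  "qa-spec-abstracted.json",
  "qa-spec-live.json",
  "qa-spec-live.md",
  "qa-judge-plan.md",
  "hermes-judgment.json",
  "hermes-judgment.md",
  "hermes-raw-output.txt",
];

// Expand `{page}-<suffix>` inside the output directory for one page.
function listOutputFiles(
  page: string,
  outputDir: string = `src/page/${page}/__QA__`,
): string[] {
  return OUTPUT_SUFFIXES.map((suffix) => `${outputDir}/${page}-${suffix}`);
}

const billingOutputs = listOutputFiles("billing");
```

Pass a second argument when `paths.outputDir`, `pages.{page}.outputDir`, or `--output-dir=` points somewhere else.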

Prerequisites

- Node.js 20+
- Playwright specs in the app repo
- Hermes Agent installed and configured
- Staging credentials for judge and nightly when authRequired is not false

Install Hermes Agent:

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh \
  | bash -s -- --skip-setup --skip-browser

Hermes needs an inference model in ~/.hermes/config.yaml or HERMES_INFERENCE_MODEL.

Limits

This tool is not:

- a replacement for deterministic Playwright CI
- a replacement for API contract tests
- a deterministic production test runner
- safe for destructive flows unless explicitly marked as skipped or safe
- guaranteed to catch backend regressions that do not affect visible UI, DOM, screenshots, navigation, or user-facing copy
- a full QA engineer replacement

It automates the first-pass live staging judgment layer and escalates uncertain cases.
