jie-worldstatelabs/ui-eval-gate (public): UI-change workflow with mandatory Playwright-based visual evaluation. The agent translates fuzzy goals into machine-verifiable assertions, scores 8 dimensions from real screenshots, and loops until the evaluation passes.
Paste `/stagent:start --flow=cloud://jie-worldstatelabs/ui-eval-gate <task_description>` into Claude Code and replace `<task_description>` with your task.
briefing · inline · interruptible · transitions: approved → executing
Runtime config (canonical): workflow.json → stages.briefing
Purpose: translate the user's fuzzy UI goal into a machine-verifiable contract — repo context, human acceptance list, derived machine assertions, secrets path, signals, thresholds — so downstream stages can implement and evaluate against ground truth instead of vibes.
Output artifact: write to the absolute path provided in your I/O context.
Valid results this stage writes: pending (briefing in progress, awaiting user approval), approved (user has explicitly confirmed and pre-flight passes).
HARD GATE: write `result: approved` only after both gates clear (explicit user approval and a fully passing pre-flight).
This is an interruptible stage — the stop hook allows natural pauses for Q&A.
You are the main agent driving the briefing dialogue with the user. Read state.md for the current epoch. Immediately write the artifact at the path shown in your I/O context with result: pending so the stop hook knows the stage is in progress. Then run the three-phase dialogue below, iterating until the user approves and pre-flight passes; finally rewrite the artifact with result: approved.
The user stays in natural language. You do the translation.
Before asking anything, scan the project to ground every later question in real code. Use Glob, Grep, and Read only — do not run the app yet.
Extract:
- framework: `package.json`, top-level configs.
- design_system: `@radix-ui`, `@mui`, shadcn, `tailwind.config.*`, `components/ui/`.
- component inventory: `components/`, `app/`, `pages/`, `src/`; list the 5–10 most-edited or most-imported components.
- design tokens: `tailwind.config.*` (colors, spacing, fontSize, radius, shadow), CSS custom properties in `globals.css` / `index.css`, theme files.

Compress what you found into 4–6 lines you'll cite back when proposing options. Do not dump file lists at the user.
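For illustration only, the design-system part of this scan roughly amounts to the grep below. It is a sketch, not part of the stage contract: the stage itself should use the Glob, Grep, and Read tools, `detect_design_system` is a hypothetical helper name, and the pattern is a text heuristic rather than a JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: print design-system hints found next to a package.json.
# Heuristic: matches quoted dependency names; it does not parse JSON.
detect_design_system() {
  pkg="${1:-package.json}"
  dir="$(dirname "$pkg")"
  # -o prints only the matched name; tr strips the quotes; sort -u dedupes.
  grep -oE '"(@radix-ui/[^"]+|@mui/[^"]+|tailwindcss)"' "$pkg" | tr -d '"' | sort -u
  # shadcn/ui copies components into the repo, so look for its usual folder.
  if [ -d "$dir/components/ui" ] || [ -d "$dir/src/components/ui" ]; then
    echo "shadcn-style components/ui/"
  fi
  # A Tailwind config file usually holds the design tokens worth citing.
  for cfg in "$dir"/tailwind.config.*; do
    if [ -e "$cfg" ]; then
      echo "tokens: $(basename "$cfg")"
    fi
  done
  return 0
}
```

Run from the repo root as `detect_design_system package.json`, then fold the output into the 4–6 line summary instead of pasting it at the user.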
Use the Discover findings to design forced-choice questions on real stack-level options. Avoid abstract design talk.
Tactics:
- "… rounded-2xl (b) drop the font weight from font-semibold to font-medium (c) lower the primary color's saturation (d) double the section padding. Which of these are close?"
- "The hero in app/(marketing)/page.tsx, or the entire home page?"

Internally, and without dumping the table at the user, translate fuzzy descriptors into measurable axes using mappings like:
| Fuzzy descriptor | Measurable axes |
|---|---|
| Modern / premium | radius ↑, shadow softer, font-weight ↓, saturation ↓, whitespace ↑ |
| Compact / high-density | padding ↓, line-height ↓, font-size ↓ small step |
| Professional / business | serif headline OR neutral sans, saturation ↓, whitespace ↑ |
| Lively / friendly | accent saturation ↑, radius ↑, illustration / emoji-tier accents |
| Minimalist | palette → 2–3 hues, remove decorative borders, max-width content column |
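Translate the user's selections and concrete numbers (e.g. "16px corner radius", "main heading 32 → 28px") into a `machine_assertions[]` list whose entries use the shape shown in the artifact template: `id`, `human_ref`, `verify_method`, `verify_args`, `expected`, `viewport`.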
verify_method is one of:
- `browser_evaluate`: runs JS in the page and returns a value the evaluator compares to `expected`.
- `vision`: a qualitative check the evaluator performs by looking at the screenshot it took (only when no DOM-measurable proxy exists, e.g. "logo placement feels balanced"). Use sparingly: every vision assertion is a partial escape hatch.

Re-scan the repo for secret usage:
- `process.env.*` references (in TS/JS) and equivalents.
- `.env.example`, `.env.local.example`, `env.d.ts`.

Build the `secrets_required` list: key names only, plus why each is needed and a source hint. Never write a value.
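A rough shell equivalent of this scan, under stated assumptions: sources are TS/JS files under one root, only dotted `process.env.NAME` access is matched (bracket access and other languages' equivalents are not), and both function names are hypothetical. Only key names are ever printed.

```shell
#!/bin/sh
# Hypothetical helper: env var names referenced as process.env.NAME in TS/JS.
list_env_keys() {
  root="${1:-.}"
  # Options go before -e so non-GNU greps parse them too.
  grep -rhoE --include='*.ts' --include='*.tsx' --include='*.js' --include='*.jsx' \
    --exclude-dir=node_modules -e 'process\.env\.[A-Za-z_][A-Za-z0-9_]*' "$root" \
    | sed 's/^process\.env\.//' | sort -u
}

# Hypothetical helper: key names (never values) declared in example env files.
list_example_keys() {
  root="${1:-.}"
  for f in "$root/.env.example" "$root/.env.local.example"; do
    if [ -f "$f" ]; then
      # Keep only the name before "=", dropping any example value.
      sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$f"
    fi
  done | sort -u
}
```

The union of both lists is a starting point for `secrets_required`; confirm with the user which keys the evaluator actually needs.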
Then pick a path with the user. Default: ~/.config/stagent/secrets/<suffix>.env (the workflow suffix is ui-eval-gate; a user with multiple projects may prefer a project-scoped path). Confirm the absolute path, with `~` expanded.
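A sketch of resolving the default path to an absolute one and, for the template fill mode, seeding `KEY=` placeholders. Assumptions: a POSIX shell; `secrets_path` and `seed_secrets_template` are hypothetical helpers; an existing file is never overwritten and no value is ever written.

```shell
#!/bin/sh
# Hypothetical helper: default secrets path. $HOME is absolute, so the result
# is too (unlike a literal "~", which only the shell expands).
# Optional second argument overrides the base directory (e.g. for testing).
secrets_path() {
  suffix="${1:-ui-eval-gate}"
  base="${2:-$HOME}"
  printf '%s/.config/stagent/secrets/%s.env\n' "$base" "$suffix"
}

# Hypothetical helper: write one "KEY=" line per required key, owner-only.
seed_secrets_template() {
  file="$1"; shift
  if [ -e "$file" ]; then
    echo "exists, not overwriting: $file" >&2
    return 0
  fi
  # umask 077 in a subshell: new dirs get 0700 and the file gets 0600.
  (
    umask 077
    mkdir -p "$(dirname "$file")" || exit 1
    for key in "$@"; do
      printf '%s=\n' "$key"
    done > "$file"
  )
}
```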
Pick a fill mode with the user:
- The user provides the values in chat: write them to the secrets file (still NEVER echo them in the artifact body).
- Template: write a placeholder template (`KEY=`) to the secrets file; the user fills it in outside the chat, then types `done` here so you re-read the file and confirm each key is present (not its value).

Set `secrets_status`:
- `provided`: every required key is present in the file (you may check with `grep -qE '^KEY=.+'`, which prints nothing, but DO NOT print the value).
- `not_required`: there are no required secrets for this evaluation.
- `template_pending` is NOT allowed to leave briefing; block on the user.

Inspect the login surface and pick `login.type`:
- `none`: `target_url` is publicly reachable.
- `simple_form`: a username/password form on a login page; record `{login_url, username_selector, password_selector, submit_selector, post_login_url_pattern}` and read the username/password from the secrets file at evaluator runtime.
- `storage_state`: OAuth / 2FA / captcha detected → instruct the user to do a one-time manual login in a fresh browser, export `storage_state.json` (Playwright recipe: `npx playwright codegen --save-storage=...`), and record its absolute path in the secrets file. Do NOT introduce LLM-driven browsers (Browser Use etc.); keep the evaluator deterministic.

Pre-flight (must pass before `approved`): run these checks and report the results inline. On any failure, fix it or document it, and keep `result: pending`.
- Playwright MCP: confirm `mcp__playwright__browser_navigate`, `browser_resize`, and `browser_take_screenshot` are loaded in this session. If not, instruct the user to enable the Playwright MCP plugin and re-confirm before approving.
- target_url reachable: `curl -sS -o /dev/null -w "%{http_code}\n" "<target_url>"` must print a 2xx/3xx code. If it errors, ask the user whether to record `dev_server_start_command` for the evaluator to start, or block until the user starts the server manually.
- secrets file: `test -f "$SECRETS_FILE" && test -r "$SECRETS_FILE"` must succeed. If `secrets_required` is non-empty, `grep -c '^[A-Z_][A-Z0-9_]*=.\+$' "$SECRETS_FILE"` must be ≥ the number of required keys. Never print the values.

Write the output artifact (use the current epoch from state.md):
---
epoch: <epoch>
result: pending # flip to `approved` only after user confirms AND pre-flight passes
---
# Briefing — <Topic>
## Target URL
<absolute http(s) URL>
## Dev Server Start Command
<shell command, or "null">
## Repo Context
- framework: <...>
- design_system: <...>
- component_inventory_summary: <one-line list>
- design_tokens_summary: <key tokens that matter for this change>
## Human Acceptance (user-approved)
- [ ] <plain-language fact 1, with concrete value/element>
- [ ] <plain-language fact 2>
## Machine Assertions (derived — do NOT show user)
```jsonc
[
{ "id": "...", "human_ref": "...", "verify_method": "browser_evaluate", "verify_args": {...}, "expected": "...", "viewport": "desktop" }
]
```
## Secrets
- secrets_file_path: `<absolute path>`
- secrets_required: `[{key, why_needed, source_hint}, ...]` # NO values
- secrets_status: `provided` | `not_required`
## Login
```jsonc
{ "type": "none" | "simple_form" | "storage_state", "details": { ... } }
```
## Signals (which conditional dimensions to score)
```jsonc
{ "lighthouse": false, "performance": false, "console_zero_tolerance": true }
```
## Threshold
```jsonc
{ "total_min": 60, "brief_adherence_min": 6 }
```
## Pre-flight Results
- Playwright MCP: ✅ / ❌ <details>
- target_url reachable: ✅ / ❌ <http code>
- secrets file: ✅ / ❌ <required-key count vs found>

`result: pending` signals "briefing drafted but not yet approved, or not yet pre-flight clean."
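A sketch bundling the shell-checkable pre-flight items into functions, assuming `curl` is installed and required key names are passed as arguments; all function names are hypothetical. It checks each required key by name rather than relying on the `grep -c` count, because a count can be satisfied by unrelated keys. Only key names and counts are printed, never values. The Playwright MCP item is not shell-checkable and stays a manual confirmation.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if an HTTP status code counts as reachable.
is_reachable_code() {
  case "$1" in
    2??|3??) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: curl reports 000 when no HTTP response arrives.
check_url() {
  code="$(curl -sS --max-time 10 -o /dev/null -w '%{http_code}' "$1" 2>/dev/null)"
  if is_reachable_code "$code"; then
    echo "target_url reachable: ok ($code)"
  else
    echo "target_url reachable: FAIL (${code:-000})"
    return 1
  fi
}

# Hypothetical helper: file exists, is readable, and every key has a value.
check_secrets() {
  file="$1"; shift
  if [ ! -f "$file" ] || [ ! -r "$file" ]; then
    echo "secrets file: FAIL (missing or unreadable)"
    return 1
  fi
  missing=0
  for key in "$@"; do
    # -q keeps grep silent, so a matching line (and its value) is never shown.
    if ! grep -qE "^${key}=.+" "$file"; then
      echo "secrets file: missing value for $key"
      missing=$((missing + 1))
    fi
  done
  echo "secrets file: $(($# - missing))/$# required keys present"
  [ "$missing" -eq 0 ]
}
```

For example, `check_secrets "$SECRETS_FILE" KEY_A KEY_B` prints `secrets file: 2/2 required keys present` when both keys have values.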
"Briefing saved to the session's briefing-report.md. Please review the Human Acceptance list and the Secrets plan, then confirm or request changes."
If the user requests changes, iterate inside the artifact body and the underlying machine_assertions — keep result: pending. Do NOT show the raw machine_assertions block to the user unless they explicitly ask; the human_acceptance list is what they audit.
Once the user explicitly approves AND every pre-flight item is ✅, edit the artifact: change result: pending → result: approved.
That is the only action needed here. The SKILL.md main loop's step (e) reads the artifact's result: and calls update-status.sh to advance the state machine — do NOT call it yourself from this stage file.
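A minimal sketch of that single edit in a POSIX shell, assuming the front matter is delimited by `---` lines as in the template; `approve_artifact` is a hypothetical helper. It rewrites only a `result: pending` line between the first two `---` lines, leaves the body alone, and does not advance the state machine.

```shell
#!/bin/sh
# Hypothetical helper: flip result: pending -> approved in the front matter only.
approve_artifact() {
  artifact="$1"
  tmp="$(mktemp)" || return 1
  # fences counts --- lines; fences == 1 means we are inside the front matter.
  if awk '
    /^---$/ { fences++ }
    fences == 1 && /^result: *pending/ { sub(/pending/, "approved") }
    { print }
  ' "$artifact" > "$tmp"; then
    # cat (not mv) keeps the artifact's original permissions.
    cat "$tmp" > "$artifact"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```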
- … approved.
- The `machine_assertions` block is for downstream stages, not for user review.
- … vision-only items.
- … drives the state machine above.