# jie-worldstatelabs/perf-gate

Profile, propose hotspots, fix one, re-measure, accept only if the delta clears the threshold and is statistically significant; auto-revert on reject and try the next hotspot.
Paste in Claude Code and replace `<task_description>`: `/stagent:start --flow=cloud://jie-worldstatelabs/perf-gate <task_description>`
Stage: scoping (inline · interruptible · transitions: approved → baseline)
Runtime config (canonical): workflow.json → stages.scoping
Purpose: lock the measurement contract with the user — domain, repro script, primary + watchdog metrics, threshold, profiler, and N (warmup + measurement runs). Validate the repro is deterministic and runnable before any fix work begins.
Output artifact: write to the absolute path provided in your prompt
Valid results this stage writes: pending (contract drafted, awaiting user approval), approved (user explicitly confirmed)
This is an interruptible stage — the stop hook allows natural pauses for Q&A.
Note: by the time you read this, `state.md` already exists with `status: scoping` and the current `epoch` is set. Read `state.md` for the epoch, then immediately write the artifact at the path shown in your I/O context with `result: pending` so the stop hook knows the stage is in progress. Iterate, then overwrite with `result: approved` when the user signs off.
Scan the project worktree to detect stack:
- `package.json` → Node.js (look for `"scripts": {"start", "build", "test", "bench"}` and check dependencies for vite, next, webpack, clinic, hyperfine-as-script, etc.)
- `pyproject.toml` / `requirements.txt` → Python (look for pytest-benchmark, scipy, py-spy)
- `Cargo.toml` → Rust (look for `[[bench]]`, criterion, flamegraph)
- `go.mod` → Go (`go test -bench`, pprof)
- `pubspec.yaml` → Flutter / Dart
- `Makefile` → check for `bench`, `profile`, `test` targets
- `bench/`, `benchmarks/`, `perf/` directories — surface them; reuse if compatible

Record findings in `repo_context = {language, runtime, package_manager, existing_benchmarks}`.
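The scan above can be sketched as a small marker-file lookup. This is an illustrative helper, not part of the perf-gate runtime; the `MARKERS` mapping and `detect_stack` name are ours, and a real scan would also infer runtime and package manager.

```python
from pathlib import Path

# Marker files -> detected stack; mirrors the scan list above (illustrative).
MARKERS = {
    "package.json": "node",
    "pyproject.toml": "python",
    "requirements.txt": "python",
    "Cargo.toml": "rust",
    "go.mod": "go",
    "pubspec.yaml": "dart",
}

def detect_stack(root="."):
    """Return a partial repo_context from marker files and bench dirs."""
    root = Path(root)
    langs = [lang for f, lang in MARKERS.items() if (root / f).exists()]
    benches = [d for d in ("bench", "benchmarks", "perf")
               if (root / d).is_dir()]
    return {"language": langs[0] if langs else "unknown",
            "existing_benchmarks": benches or "none"}
```

A project containing only `go.mod` and no bench directories would yield `{"language": "go", "existing_benchmarks": "none"}`.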
Cover at minimum:
Domain — pick one: api_latency / cli_startup / render_fps / build_time / bundle_size / memory_peak / db_query / custom. If custom, ask for a one-line description of what's being measured.
Repro script — what command reproduces the workload? Constraint: it must run as-is from the project root (no `cd` prefix).
Primary metric — `{name, unit, lower_is_better}`. Examples: `{name:"p95_latency_ms", unit:"ms", lower_is_better:true}`, `{name:"fps_min", unit:"fps", lower_is_better:false}`. Exactly ONE primary.
Watchdog metrics — a list of secondary metrics, each `{name, unit, lower_is_better, max_regression_pct}`. Defaults to suggest based on domain:
- api_latency / cli_startup → memory_peak (10%)
- render_fps → frame_p99_ms (15%), memory_peak (10%)
- build_time → bundle_size (5%)
- bundle_size → build_time (10%)
- memory_peak → p95_latency_ms or domain-equivalent throughput (10%)
- db_query → lock_wait_ms (10%)

The user can add or remove watchdogs; they are optional and default to empty if the user opts out.
Threshold — `{type, pct_min, absolute_min, significance_required}`. Default `{type:"pct", pct_min:20, significance_required:true}`. Confirm or adjust.
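A minimal sketch of how a delta could be evaluated against this threshold contract. The function name and the use of summary values (e.g. medians) for baseline/candidate are our assumptions, not the tool's actual implementation; significance is checked separately.

```python
def clears_threshold(baseline, candidate, lower_is_better=True,
                     pct_min=20.0, absolute_min=None):
    """Return True if candidate beats baseline by the contracted margin.

    baseline/candidate are summary values of the primary metric.
    Illustrative helper, not part of the perf-gate runtime.
    """
    # Normalize so that an improvement is always a positive delta.
    delta = (baseline - candidate) if lower_is_better else (candidate - baseline)
    if delta <= 0:
        return False  # no improvement at all
    pct = 100.0 * delta / abs(baseline)
    if pct < pct_min:
        return False  # relative gain too small
    if absolute_min is not None and delta < absolute_min:
        return False  # absolute gain too small (type "absolute" or "both")
    return True

# Example: p95 latency drops from 120 ms to 90 ms -> 25% improvement
print(clears_threshold(120.0, 90.0, lower_is_better=True, pct_min=20))  # True
```

With `type:"both"`, pass both `pct_min` and `absolute_min`; with the default `type:"pct"`, only the percentage gate applies.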
Run counts — repro_warmup_runs (default 1) and repro_measurement_runs (default 5, min 3). Confirm or adjust based on repro cost.
Profiler — recommend per stack from this table, then let the user pick:
| Domain / stack | Recommended tool |
|---|---|
| Node.js CPU-bound | node --prof + clinic flame / clinic doctor |
| Node.js startup | node --cpu-prof + hyperfine for wall time |
| Browser render / FPS | Chrome DevTools mcp__plugin_chrome-devtools-mcp_chrome-devtools__performance_start_trace |
| Python | py-spy record -o profile.svg -- <cmd> or cProfile + snakeviz |
| Go | go test -cpuprofile=cpu.prof or runtime/pprof |
| Rust | cargo flamegraph or samply |
| Bundle size | vite-bundle-visualizer / source-map-explorer |
| CLI wall time | hyperfine --warmup 1 --runs N -- '<cmd>' |
| DB query | EXPLAIN ANALYZE + pg_stat_statements |
| Memory peak (C/C++) | valgrind --tool=massif |
| Memory peak (Node) | node --heap-prof |
| Memory peak (Python) | tracemalloc |
| Generic macOS | Instruments (Time Profiler / Allocations) |
Capture profiler = {tool, command, output_path} — command is the full shell command including the output file path.
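For a Python repro, a capture built from the table above might look like the following; the file names are illustrative.

```python
# Hypothetical profiler capture for a py-spy run; paths are illustrative.
profiler = {
    "tool": "py-spy",
    "command": "py-spy record -o profile.svg -- python bench.py",
    "output_path": "profile.svg",
}

# The contract requires the command to embed the output file path:
assert profiler["output_path"] in profiler["command"]
```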
Significance test — default mann_whitney (uses scipy if importable; falls back to mean_2sigma). Ask only if the user has a strong preference.
Secrets — if the repro needs credentials/API keys, ask for a secrets_file_path (path reference only — sensitive values stay local, never enter cloud artifacts). Otherwise leave blank.
Stop asking when you have enough; usually 4–7 turns covers it.
Run the proposed repro_script once to confirm it works end-to-end. You don't need final timing data here — you're checking the exit code, that the primary metric value is observable in the output, and that the run looks deterministic.
If the repro fails, iterate with the user. Do NOT advance.
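The pre-flight step amounts to a single supervised run. A minimal sketch, assuming the repro is a plain shell command; the helper name, timeout, and return shape are ours.

```python
import shlex
import subprocess

def preflight(repro_script, timeout_s=600):
    """Run the repro once end-to-end; illustrative sketch only.

    Checks only that the command exits cleanly -- timing data
    comes later, in the baseline stage.
    """
    proc = subprocess.run(shlex.split(repro_script), capture_output=True,
                          text=True, timeout=timeout_s)
    return {"exit_code": proc.returncode,
            "ok": proc.returncode == 0,
            "stderr_tail": proc.stderr[-2000:]}

result = preflight("echo hello")  # trivially runnable stand-in repro
assert result["ok"]
```

A non-zero exit code or a missing metric in the output means the contract is not yet runnable, so the stage must not advance.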
Write the output artifact at the path in your I/O context:
---
epoch: <epoch from state.md>
result: pending
---
# Scoping Report: <topic>
## Repo Context
- language: <...>
- runtime: <...>
- package_manager: <...>
- existing_benchmarks: <path or "none">
## Contract
### Domain
<one of the enum values>
### Repro
- script: `<command>`
- warmup_runs: <N>
- measurement_runs: <N>
### Metrics
- primary: `{name, unit, lower_is_better}`
- watchdogs:
- `{name, unit, lower_is_better, max_regression_pct}`
- ...
### Threshold
- type: <pct|absolute|both>
- pct_min: <number>
- absolute_min: <number>
- significance_required: <bool>
### Significance Test
<mann_whitney | mean_2sigma>
### Profiler
- tool: <name>
- command: `<full shell command writing to output_path>`
- output_path: `<relative or absolute path>`
### Secrets
<path_reference or "none">
## Pre-flight result
<exit code, observed metric value, determinism notes>

Then tell the user: "Contract drafted. Please review and confirm to start measurement, or request changes."
If the user requests changes, iterate the body. Keep result: pending.
Once the user explicitly approves, edit the artifact: change result: pending → result: approved. That is the only action needed here. The main loop reads the artifact's result: and calls update-status.sh to advance — do NOT call it yourself.