Large-refactor gate that captures invariants as executable checks, decomposes the refactor into atomic rollback-able steps, applies and verifies each step (with up to 5 retries), then runs a full invariant audit at the end with auto-repair
by jie-worldstatelabs · updated Apr 27, 2026 · 4 stages · 0 runs
Purpose: capture the invariants the refactor must preserve as executable verification commands (test names, shell commands, metric queries) plus tolerance bounds, then confirm the baseline is green. Without an executable, currently-passing baseline, regressions cannot be distinguished from pre-existing brokenness — so this stage hard-stops if any invariant is currently FAIL.
Output artifact: write to the absolute path provided in your prompt
Valid results this stage writes: pending (interview / verification in progress), done (every invariant has a verification command and a confirmed PASS or explicit SKIPPED baseline)
This is an interruptible inline stage. The stop hook allows natural pauses for Q&A.
Immediately write the output artifact at the path shown in your I/O context with frontmatter result: pending and a stub body. This signals to the stop hook that the stage is in progress, so a natural pause for user Q&A is safe.
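A minimal sketch of writing that stub, assuming the artifact path and epoch are already known from the I/O context. The function name `write_pending_stub` and the stub heading are hypothetical; only the `epoch` and `result` frontmatter fields come from this stage's own template.

```bash
# Hypothetical helper: write a stub artifact whose frontmatter says result: pending.
# $1 = absolute artifact path (from the I/O context), $2 = current epoch.
write_pending_stub() {
  local artifact_path="$1" epoch="$2"
  # Create the parent directory in case this is the first write of the run.
  mkdir -p "$(dirname "$artifact_path")"
  printf '%s\n' \
    '---' \
    "epoch: ${epoch}" \
    'result: pending' \
    '---' \
    '# Invariants (capture in progress)' > "$artifact_path"
}
```

Calling `write_pending_stub "$ARTIFACT_PATH" "$EPOCH"` before the first question keeps the stage visibly in progress; the two variable names are placeholders for whatever the prompt supplies.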
Read the baseline_state run file from the path provided in your I/O context. It contains the pre-refactor git state (backup_branch, head_sha, stash_ref, and a rollback_command). Quote those values back to the user before any interview begins — they MUST know which branch / stash will be used if rollback fires later.
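The read-and-quote step might look like the sketch below. It assumes the run file stores one `key: value` pair per line, which this stage does not actually specify, so treat the parsing as illustrative. `baseline_value` and `quote_baseline` are hypothetical names.

```bash
# Assumption: the baseline_state run file holds one "key: value" pair per line.
# Print the value for one key (first match only).
baseline_value() {
  local run_file="$1" key="$2"
  # Strip the "key:" prefix and leading whitespace; keep the rest verbatim.
  sed -n "s/^${key}:[[:space:]]*//p" "$run_file" | head -n 1
}

# Print the four rollback-relevant values so they can be quoted back to the user.
quote_baseline() {
  local run_file="$1" key
  for key in backup_branch head_sha stash_ref rollback_command; do
    printf '%s: %s\n' "$key" "$(baseline_value "$run_file" "$key")"
  done
}
```

Because the values are printed without any rewriting, the same output can be pasted into the final report, which is what the verbatim requirement asks for.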
psql -d app -tAc "SELECT count(*) FROM users WHERE email IS NULL" | grep -q '^0$'
tolerance — what counts as a pass. For perf invariants, an explicit numeric bound (e.g. p95 ≤ 200ms). For functional invariants, a strict equality.
baseline_result — PASS, FAIL, or SKIPPED. You MUST run each verification command at least once now to fill this in. SKIPPED is only valid when the command genuinely cannot run today (e.g. it depends on a service not in the dev env) AND the user explicitly accepts that gap.
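One way to fill in `baseline_result` and apply a numeric tolerance is sketched here. It assumes a verification command signals success through a zero exit status, as the `psql ... | grep -q` example does. The helper names and the log directory layout are hypothetical.

```bash
# Run one verification command, capture its output, and print PASS or FAIL.
# SKIPPED is never produced here: it requires explicit user acceptance.
# $1 = invariant ID (e.g. I1), $2 = command string, $3 = directory for captured output.
run_invariant() {
  local id="$1" cmd="$2" log_dir="$3"
  mkdir -p "$log_dir"
  if bash -c "$cmd" > "$log_dir/$id.log" 2>&1; then
    echo PASS
  else
    echo FAIL
  fi
}

# Numeric tolerance check for perf invariants, e.g. p95 <= 200 (milliseconds).
# Returns 0 when measured <= bound, non-zero otherwise.
within_bound() {
  local measured="$1" bound="$2"
  awk -v m="$measured" -v b="$bound" 'BEGIN { exit !(m + 0 <= b + 0) }'
}
```

For example, `run_invariant I1 "$cmd" ./baseline-logs` prints the baseline result and leaves the captured output in `./baseline-logs/I1.log`, ready to paste under Baseline Confirmation.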
If ANY I<n> baseline is FAIL, do not transition out of this stage. Surface the failing I-ID(s) to the user with the captured output. The user must either:
fix the baseline first (in a separate session — outside this workflow), then resume here, OR
explicitly remove that invariant from the list (because it's a known-broken thing the refactor isn't expected to preserve).
Either way, keep the artifact at result: pending and pause for user input. Do NOT silently downgrade FAIL to SKIPPED.
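The hard-stop rule can be expressed as a small gate. This sketch assumes results are available as `ID RESULT` lines on stdin, an input format chosen purely for illustration; `baseline_gate` is a hypothetical name.

```bash
# Read "ID RESULT" lines on stdin. Print the failing IDs and return non-zero
# if any baseline is FAIL; SKIPPED and PASS both let the gate through.
baseline_gate() {
  local id result failed=""
  while read -r id result; do
    if [ "$result" = "FAIL" ]; then
      failed="$failed $id"
    fi
  done
  if [ -n "$failed" ]; then
    echo "FAIL baseline:$failed"
    return 1
  fi
  echo "baseline green"
}
```

A non-zero return is the cue to leave the artifact at `result: pending` and surface the listed I-IDs to the user. The gate never rewrites FAIL to SKIPPED on its own.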
Once every invariant has baseline_result ∈ {PASS, SKIPPED} AND the user has explicitly confirmed the captured set is complete, overwrite the artifact with:
```markdown
---
epoch: <epoch>
result: done
---
# Invariants — <Refactor Topic>
## Baseline State (from run_file)
- backup_branch: <value>
- head_sha: <value>
- stash_ref: <value or "none">
- rollback_command: <value>
## Refactor Scope
<file paths / module boundaries the user named>
## Invariants
| ID | Description | Verification Command | Tolerance | Baseline |
|----|-------------|----------------------|-----------|----------|
| I1 | <plain English> | <exact command / test name> | <bound> | PASS |
| I2 | ... | ... | ... | PASS |
| I3 | ... | ... | ... | SKIPPED — <reason user accepted> |
## Baseline Confirmation
All non-SKIPPED I-IDs ran cleanly at <ISO timestamp>. Output captured at: <path or inline below>.
<paste verbatim output of each I<n> verification command, or summarize if huge>
## Notes
<anything the planner needs — e.g. "I4 takes 8 minutes, batch into a smoke subset for per-step runs">
```
Writing the artifact with result: done is the only output. The SKILL.md main loop's step (e) reads the artifact and calls update-status.sh to advance — do NOT call it yourself from this stage file.
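As a sanity check before finishing, the frontmatter can be read back to confirm what was written. This is only a sketch of such a read. How the SKILL.md main loop actually parses the artifact is not described here, and `artifact_result` is a hypothetical name.

```bash
# Print the value of "result:" from the leading frontmatter block only.
# Prints nothing if the file has no frontmatter or no result field inside it.
artifact_result() {
  awk '
    NR == 1 && $0 != "---" { exit }   # file does not start with frontmatter
    NR > 1 && $0 == "---"  { exit }   # closing delimiter: stop before the body
    /^result:[[:space:]]*/ { sub(/^result:[[:space:]]*/, ""); print; exit }
  ' "$1"
}
```

The check is read-only by design: it confirms the written value without advancing the workflow, which stays the main loop's job.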
Do NOT skip running a verification command "because it looks fine." Every non-SKIPPED I-ID needs an actually-executed PASS in this epoch.
Do NOT auto-fix a failing baseline. That's out of scope for this workflow.
Do NOT ask the user more than 5 questions. If you need a 6th, you missed an answer in the repo scan — go re-read.
The baseline_state run file is captured ONCE at workflow setup. If apply_step later needs to roll back, it uses those exact values — so the values you read here MUST appear verbatim in your report.