Stagent
jie-worldstatelabs/invariant-safe-refactorpublic

Large-refactor gate that captures invariants as executable checks, decomposes the refactor into atomic, rollback-able steps, applies and verifies each step (with up to 5 retries), then runs a full invariant audit at the end with auto-repair.

by jie-worldstatelabs · updated Apr 27, 2026 · 4 stages · 0 runs

Run in Claude Code

/stagent:start --flow=cloud://jie-worldstatelabs/invariant-safe-refactor <task_description>

Paste into Claude Code and replace <task_description> with your task.

Template blueprint

State machine



Stage: capture_invariants

capture_invariants.md

inline · interruptible · transitions: done → plan_steps

Stage: capture_invariants

Runtime config (canonical): workflow.json → stages.capture_invariants

Purpose: capture the invariants the refactor must preserve as executable verification commands (test names, shell commands, metric queries) plus tolerance bounds, then confirm the baseline is green. Without an executable, currently-passing baseline, regressions cannot be distinguished from pre-existing brokenness — so this stage hard-stops if any invariant is currently FAIL.

Output artifact: write to the absolute path provided in your prompt.

Valid results this stage writes: pending (interview / verification in progress), done (every invariant has a verification command and a confirmed PASS or explicit SKIPPED baseline).

This is an interruptible inline stage. The stop hook allows natural pauses for Q&A.

Protocol

You are the main agent. Read state.md for the current epoch. Then:

1. Pre-write the artifact with result: pending

Immediately write the output artifact at the path shown in your I/O context with frontmatter result: pending and a stub body. This signals to the stop hook that the stage is in progress, so a natural pause for user Q&A is safe.
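Step 1 can be as small as the sketch below. The artifact path and epoch here are placeholders; use the values from your I/O context and state.md.

```shell
# Hypothetical artifact path; the real one comes from the I/O context.
ARTIFACT="capture_invariants.md"
cat > "$ARTIFACT" <<'EOF'
---
epoch: 1
result: pending
---
# Invariants — (interview in progress)
EOF
grep -c 'result: pending' "$ARTIFACT"   # → 1
```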

2. Read the run file

Read the baseline_state run file from the path provided in your I/O context. It contains the pre-refactor git state (backup_branch, head_sha, stash_ref, and a rollback_command). Quote those values back to the user before any interview begins — they MUST know which branch / stash will be used if rollback fires later.
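A minimal sketch of quoting those values back verbatim. The field values below are invented for illustration; the real file is written once at workflow setup and its path comes from your I/O context.

```shell
# Illustrative run file — every value here is a stand-in, not a real ref.
cat > baseline_state.md <<'EOF'
backup_branch: backup/pre-refactor
head_sha: 0123abcd
stash_ref: none
rollback_command: git checkout backup/pre-refactor
EOF

# Quote the rollback-relevant fields back to the user, verbatim.
for key in backup_branch head_sha stash_ref rollback_command; do
  grep "^${key}:" baseline_state.md
done
```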

3. Explore the repo

Do not ask the user about things you can read. Look at, at minimum:

  • The existing test suite — note the test runner, conventions, and any tagged subsets (smoke, slow, integration).
  • Any contract / schema files (OpenAPI, GraphQL, protobuf, JSON schemas, DB migrations).
  • Performance budgets, SLO files, load-test scripts.
  • Monitoring / observability config (alerts, metric definitions) the user is known to depend on.
  • Existing CI config to see what counts as "green" today.
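One way to structure that scan. The scratch fixture below stands in for a real repo; in practice, run the `ls`/glob lines at the actual repo root, and expect different file names per ecosystem.

```shell
# Scratch fixture standing in for a real repo layout (invented for the sketch).
mkdir -p scan-demo/.github/workflows && cd scan-demo
touch package.json openapi.yaml .github/workflows/ci.yml

ls package.json pyproject.toml Makefile 2>/dev/null || true  # which test-runner config exists?
ls *openapi* 2>/dev/null || true                             # contract / schema files
ls .github/workflows                                         # what CI calls "green" today
```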

4. Interview the user (cap 5 questions)

Inline Q&A — one question per message, prefer multiple choice when possible. Cover:

  1. Refactor scope — which files / modules / package boundaries are in play.
  2. Must-preserve behaviors — public APIs, response shapes, side effects that downstream consumers rely on.
  3. Observability hooks — log lines, metrics, traces other systems consume.
  4. Performance SLOs — latency budgets, throughput targets, memory ceilings to hold.
  5. Data integrity rules — invariants over persisted state (uniqueness, monotonic counters, schema-level constraints).

Stop asking once you have enough. Five is a cap, not a target.

5. Convert each invariant into an executable check

For every invariant the user names AND every implicit one your repo scan turned up, record an I<n> row with:

  • id — I1, I2, …
  • description — one line in plain English.
  • verification_command — a concrete shell command, test name, or metric query that returns success/failure deterministically. Examples:
    • npm test -- --testNamePattern='UserService.findByEmail returns 404 on miss'
    • pytest tests/integration/test_billing_invariants.py::test_no_double_charge
    • curl -fsS http://localhost:8080/api/v1/users/1 | jq -e '.id == 1'
    • psql -d app -tAc "SELECT count(*) FROM users WHERE email IS NULL" | grep -q '^0$'
  • tolerance — what counts as a pass. For perf invariants, an explicit numeric bound (e.g. p95 ≤ 200ms). For functional invariants, a strict equality.
  • baseline_result — PASS, FAIL, or SKIPPED. You MUST run each verification command at least once now to fill this in. SKIPPED is only valid when the command genuinely cannot run today (e.g. it depends on a service not in the dev env) AND the user explicitly accepts that gap.
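Recording baseline_result deterministically can be done with a tiny wrapper; `true` and `false` below are stand-ins for real verification commands like the examples above.

```shell
# Sketch: run a verification command, reduce it to PASS/FAIL by exit code.
run_check() {
  local id="$1"; shift
  if "$@" >/dev/null 2>&1; then echo "$id: PASS"; else echo "$id: FAIL"; fi
}

run_check I1 true    # stand-in for a real, currently-green check
run_check I2 false   # a red baseline would surface like this
# → I1: PASS
# → I2: FAIL
```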

6. Hard-stop on a red baseline

If ANY I<n> baseline is FAIL, do not transition out of this stage. Surface the failing I-ID(s) to the user with the captured output. The user must either:

  • fix the baseline first (in a separate session — outside this workflow), then resume here, OR
  • explicitly remove that invariant from the list (because it's a known-broken thing the refactor isn't expected to preserve).

Either way, keep the artifact at result: pending and pause for user input. Do NOT silently downgrade FAIL to SKIPPED.
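A sketch of that guard. The file name and table rows below are invented for illustration; in practice, scan the invariant table you actually captured.

```shell
# Illustrative captured table with one red row.
printf '| I1 | ... | PASS |\n| I2 | ... | FAIL |\n' > invariants-demo.md

# Refuse to finalize while any baseline row reads FAIL.
if grep -q '| FAIL |' invariants-demo.md; then
  echo "red baseline: keep result: pending and pause for user input"
fi
# → red baseline: keep result: pending and pause for user input
```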

7. Finalize

Once every invariant has baseline_result ∈ {PASS, SKIPPED} AND the user has explicitly confirmed the captured set is complete, overwrite the artifact with:

```markdown
---
epoch: <epoch>
result: done
---
# Invariants — <Refactor Topic>

## Baseline State (from run_file)
- backup_branch: <value>
- head_sha: <value>
- stash_ref: <value or "none">
- rollback_command: <value>

## Refactor Scope
<file paths / module boundaries the user named>

## Invariants

| ID | Description | Verification Command | Tolerance | Baseline |
|----|-------------|----------------------|-----------|----------|
| I1 | <plain English> | <exact command / test name> | <bound> | PASS |
| I2 | ... | ... | ... | PASS |
| I3 | ... | ... | ... | SKIPPED — <reason user accepted> |

## Baseline Confirmation
All non-SKIPPED I-IDs ran cleanly at <ISO timestamp>. Output captured at: <path or inline below>.

<paste verbatim output of each I<n> verification command, or summarize if huge>

## Notes
<anything the planner needs — e.g. "I4 takes 8 minutes, batch into a smoke subset for per-step runs">
```

8. Done

Writing the artifact with result: done is the only output. The SKILL.md main loop's step (e) reads the artifact and calls update-status.sh to advance — do NOT call it yourself from this stage file.

Rules

  • Do NOT skip running a verification command "because it looks fine." Every non-SKIPPED I-ID needs an actually-executed PASS in this epoch.
  • Do NOT auto-fix a failing baseline. That's out of scope for this workflow.
  • Do NOT ask the user more than 5 questions. If you need a 6th, you missed an answer in the repo scan — go re-read.
  • The baseline_state run file is captured ONCE at workflow setup. If apply_step later needs to roll back, it uses those exact values — so the values you read here MUST appear verbatim in your report.
workflow.json · raw config

drives the state machine above