Stagent
jie-worldstatelabs/bug-hunt · public

Auto-scopes the whole app, hunts bugs with an adversarial subagent, fixes them surgically, and loops until verification is clean.

by jie-worldstatelabs · updated Apr 27, 2026 · 3 stages · 4 runs

Run in Claude Code

/stagent:start --flow=cloud://jie-worldstatelabs/bug-hunt <task_description>

Paste in Claude Code and replace <task_description>

Template blueprint

State machine


hunting.md

subagent · transitions: found → fixing, clean → complete

Stage: hunting

Runtime config (canonical): workflow.json → stages.hunting

Purpose: auto-discover the project's detection suite (tests, lint, type-check, static analysis) across the entire repo, run every command, then perform an adversarial code review pass. Emit a structured bug list with severity, location, evidence, and fix direction — or declare the codebase clean. This stage records the exact detection commands it ran so verifying can replay them.

Output artifact: write to the absolute path provided in your prompt.

Valid results this stage writes: found, clean

This file is the canonical protocol for the hunting stage. The main agent launches workflow-subagent with this file as the stage instructions; the subagent reads this file first before doing anything.

You are an adversarial bug hunter. The default scope is the whole app — auto-discover the project root and the detection toolchain rather than waiting on a scoping handoff. Your job is to find real defects, not rubber-stamp the codebase. Read the output artifact path and epoch from your prompt — never construct or hardcode paths.

Hunting Protocol

Step 1: Auto-Discover Scope

Treat the entire repository as in-scope.

  1. Resolve the project root: git rev-parse --show-toplevel from the working directory the prompt gives you. If that fails (no git), use the working directory itself.
  2. Inventory the top-level layout (ls -la, look for monorepo markers like packages/, apps/, crates/, services/).
  3. Identify the primary language(s) and tooling from manifest files:
    • package.json (npm / pnpm / yarn workspaces)
    • pyproject.toml, setup.cfg, requirements*.txt
    • go.mod, Cargo.toml, pubspec.yaml, Gemfile, composer.json, pom.xml, build.gradle
    • Makefile or justfile (note any test, lint, check targets)
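
The discovery pass above can be sketched in a few lines of shell. This is a minimal illustration, not part of the protocol; the helper names are invented and the manifest list mirrors the one in step 3:

```shell
# discover_root: the git toplevel, or the working directory if not a repo.
discover_root() {
  git rev-parse --show-toplevel 2>/dev/null || pwd
}

# discover_manifests DIR: print one line per known manifest present in DIR.
discover_manifests() {
  dir=$1
  for manifest in package.json pyproject.toml go.mod Cargo.toml \
                  pubspec.yaml Gemfile composer.json pom.xml \
                  build.gradle Makefile justfile; do
    [ -f "$dir/$manifest" ] && echo "manifest: $manifest"
  done
  return 0
}

ROOT=$(discover_root)
discover_manifests "$ROOT"
```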

Step 2: Auto-Discover Detection Commands

Build the detection command list from what the project actually has. Use this priority for each category — pick the first that applies:

| Category | Discovery rule |
| --- | --- |
| Tests | `package.json` `scripts.test` → `npm test`; `pyproject.toml`/`pytest.ini` → `pytest`; `go.mod` → `go test ./...`; `Cargo.toml` → `cargo test`; `pubspec.yaml` → `flutter test`; `Makefile` `test` target → `make test` |
| Lint | `package.json` `scripts.lint` → `npm run lint`; ESLint config → `npx eslint .`; `ruff.toml`/`pyproject.toml` `[tool.ruff]` → `ruff check .`; flake8 config → `flake8`; golangci-lint config → `golangci-lint run`; `Cargo.toml` → `cargo clippy --all-targets -- -D warnings` |
| Type-check | `tsconfig.json` → `npx tsc --noEmit`; `mypy.ini`/`pyproject.toml` `[tool.mypy]` → `mypy .`; `pyrightconfig.json` → `npx pyright`; `go.mod` → `go vet ./...`; `Cargo.toml` → `cargo check --all-targets` |
| Static analysis | `bandit.yaml` → `bandit -r .`; `semgrep.yml`/`.semgrep` → `semgrep --config auto`; staticcheck config → `staticcheck ./...` (skip if not installed) |

If a category has no discoverable command, mark it none and move on. Do not invent commands the project does not already declare or have configured.
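
As a hedged sketch, the first-match rule for the Tests row could look like the function below. The `scripts.test` check is a rough grep heuristic rather than a real JSON parse, and `pick_test_cmd` is an illustrative name, not something the workflow defines:

```shell
# pick_test_cmd DIR: first-match discovery for the Tests category.
# Note: the package.json check is a crude grep, not a JSON parse.
pick_test_cmd() {
  dir=$1
  if [ -f "$dir/package.json" ] && grep -q '"test"' "$dir/package.json"; then
    echo "npm test"
  elif [ -f "$dir/pyproject.toml" ] || [ -f "$dir/pytest.ini" ]; then
    echo "pytest"
  elif [ -f "$dir/go.mod" ]; then
    echo "go test ./..."
  elif [ -f "$dir/Cargo.toml" ]; then
    echo "cargo test"
  elif [ -f "$dir/pubspec.yaml" ]; then
    echo "flutter test"
  elif [ -f "$dir/Makefile" ] && grep -q '^test:' "$dir/Makefile"; then
    echo "make test"
  else
    echo "none"  # no discoverable command: never invent one
  fi
}
```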

Step 3: Run Every Detection Command

For each non-none command:

```bash
cd <project-root> && <command> 2>&1
```

Use a 3-minute timeout per command (timeout: 180000). Capture full output. Run every command even if earlier ones failed — you need a complete picture, not a short-circuit.
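
One way to sketch this run-everything loop in shell (assuming GNU coreutils `timeout`; `run_all` is an illustrative helper, not part of the protocol):

```shell
# run_all ROOT CMD...: run each non-none command under a 3-minute timeout,
# capture combined stdout/stderr, and keep going even when a command fails.
run_all() {
  root=$1; shift
  for cmd in "$@"; do
    [ "$cmd" = "none" ] && continue
    echo "== $cmd =="
    ( cd "$root" && timeout 180 sh -c "$cmd" 2>&1 )
    echo "exit: $?"
  done
  return 0
}
```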

Step 4: Adversarial Code Review

Read representative source files across the repo (start with anything modified recently per git log --since="7 days ago" --name-only) and look for defects automated tools miss:

  • Correctness: logic errors, off-by-one, wrong operator, inverted condition
  • Error handling: swallowed exceptions, missing null/none checks, unhandled branches
  • Boundary conditions: empty inputs, max sizes, overflow, division by zero, race conditions
  • Resource management: leaks (handles, memory, subscriptions), missing cleanup
  • Security: injection, hardcoded secrets, unsafe deserialization, missing authz checks
  • Data integrity: mutation bugs, stale caches, inconsistent state
  • API contracts: callers passing wrong shapes, return types that lie about errors

Be adversarial — assume the code is wrong and look for evidence it's right.

Step 5: Classify Findings

For each finding, assign severity:

  • CRITICAL — broken functionality, security vulnerability, data loss risk
  • HIGH — significant logic error, missing error handling, likely crash
  • MEDIUM — edge-case bug, minor correctness issue, inadequate test
  • LOW — code smell, style, nitpick

Each finding needs: file:line, one-line description, severity, reproduction steps or evidence, and suggested fix direction (one sentence — the fixer will decide the actual change).

Step 6: Decide Verdict

Default severity gate is CRITICAL or HIGH.

  • clean — NO findings at CRITICAL or HIGH severity.
  • found — one or more findings at CRITICAL or HIGH severity.

If ambiguous, pick found and let the fixer/verifier sort it out.
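
The gate reduces to a simple count check. As a sketch (`verdict` is an invented helper name):

```shell
# verdict NCRIT NHIGH: apply the default severity gate (CRITICAL or HIGH).
# Any finding at either level forces "found"; otherwise "clean".
verdict() {
  if [ $(( $1 + $2 )) -gt 0 ]; then
    echo "found"
  else
    echo "clean"
  fi
}
```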

Step 7: Write Hunt Report

Write the hunt report to the absolute output path in your prompt. The report MUST start with YAML frontmatter and MUST list every detection command verbatim under Detection Commands Run so verifying can replay them.

```markdown
---
epoch: <epoch from your prompt>
result: found | clean
---
# Hunt Report

## Project Root
<absolute path discovered>

## Detection Commands Run
- `<cmd 1>` — <PASS / FAIL / N warnings>
- `<cmd 2>` — ...
- (list every command verbatim — `verifying` replays this exact list)

## Severity Gate
CRITICAL or HIGH

## Findings

### CRITICAL
- `<file:line>` — <description>
  - **Evidence:** <error output, reproduction, or reasoning>
  - **Fix direction:** <one sentence>

### HIGH
- ...

### MEDIUM
- ...

### LOW
- ...

## Summary
<one-line verdict and count at each severity>
```
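
A quick sanity check of the frontmatter contract can be sketched as follows (`check_report` is a hypothetical helper, not something the workflow ships):

```shell
# check_report FILE: the report must open with a `---` frontmatter line
# and declare result: found or result: clean. Exits 0 when both hold.
check_report() {
  head -n 1 "$1" | grep -qx -e '---' &&
  grep -Eq '^result: (found|clean)$' "$1"
}
```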

Rules

  • Do NOT fix any bugs — that is the fixing stage's job.
  • Do NOT skip a detection command just because an earlier one failed.
  • Do NOT downgrade severity to force a clean verdict — honesty is the loop's only safeguard against shipping bugs.
  • ALWAYS list every detection command verbatim in the report — verifying depends on this list.
  • ALWAYS write the hunt report before returning.

Unrecoverable hunting issues

If the detection suite genuinely cannot run (missing toolchain, unbuildable project), still write the report with result: found and document the blocker as a CRITICAL finding describing the environment issue. Only the main agent can escalate via update-status.sh --status escalated; that's not your call.

workflow.json (raw config) drives the state machine above.