feat: [gem-team] Add confidence metric, optimize planner workflow (#1695)

* feat: add explicit assumption rule and confidence metric to agent documentation
  - Add `confidence` field (0‑1) to the output schema in `agents/gem-browser-tester.agent.md`
  - Include `confidence` in the `extra` object of `agents/gem-devops.agent.md`
  - Append the guideline “State assumptions explicitly; never guess silently” to all agent docs
  - Update the “Bisect (Complex Only)” heading to reflect its gate condition
  - Minor wording and formatting adjustments across the affected agent documents
* chore: update readme
* chore(release): Streamline agent documentation sections (remove self‑critique steps, renumber Handle Failure/Output)
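
A consumer of these agent outputs might check the new `confidence` value along the following lines. This is only a sketch under stated assumptions: the commit says the field is 0‑1 and appears in the output schema for one agent and inside `extra` for another, but the helper name `read_confidence`, the `status` key in the sample payloads, and the choice to treat a missing value as `None` are all hypothetical and not taken from the agent docs.

```python
import json
from typing import Any, Optional


def read_confidence(output: dict[str, Any]) -> Optional[float]:
    """Return the confidence from an agent output, or None if it is absent.

    Assumption: the value sits either at the top level of the output
    or inside its `extra` object; the real schemas may differ.
    """
    value = output.get("confidence")
    if value is None:
        extra = output.get("extra")
        if isinstance(extra, dict):
            value = extra.get("confidence")
    if value is None:
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"confidence must be a number, got {type(value).__name__}")
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence must be within 0-1, got {value}")
    return float(value)


# Hypothetical payloads; only the `confidence` and `extra` names come from the commit.
top_level = json.loads('{"status": "completed", "confidence": 0.85}')
nested = json.loads('{"status": "completed", "extra": {"confidence": 1}}')
print(read_confidence(top_level))
print(read_confidence(nested))
```

Raising on out-of-range values rather than clamping them keeps a malformed agent output visible, which seems in keeping with the "never guess silently" guideline added in this commit.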
2026-08-01 23:12:29 +00:00 · 2026-05-14 05:02:32 +05:00
parent 352def3ca2
commit d5c855ece0
19 changed files with 158 additions and 190 deletions
@@ -140,19 +140,14 @@ CODE SIMPLIFIER. Mission: remove dead code, reduce complexity, consolidate dupli
 - Ensure no broken imports/references
 - Check no functionality broken

-### 5. Self-Critique
-
- Check: tests pass, no broken imports
- Skip: behavior preservation analysis — covered by test runs
-
-### 6. Handle Failure
+### 5. Handle Failure

 - IF tests fail after changes: Revert or fix without behavior change
 - IF unsure if code is used: Don't remove — mark "needs manual review"
 - IF breaks contracts: Stop and escalate
 - Log failures to docs/plan/{plan_id}/logs/

-### 7. Output
+### 6. Output

 Return JSON per `Output Format`
 </workflow>
@@ -227,6 +222,9 @@ Return JSON per `Output Format`
 - MUST verify tests pass after every change
 - Use existing tech stack. Preserve patterns — don't introduce new abstractions.
 - Always use established library/framework patterns
+- State assumptions explicitly; never guess silently
+- Minimum code, nothing speculative
+- Surgical changes, don't refactor adjacent code

 ### I/O Optimization