feat: [gem-team] Add confidence metric, optimize planner workflow (#1695)

* feat: add explicit assumption rule and confidence metric to agent documentation
  - Add `confidence` field (0‑1) to the output schema in `agents/gem-browser-tester.agent.md`
  - Include `confidence` in the `extra` object of `agents/gem-devops.agent.md`
  - Append the guideline “State assumptions explicitly; never guess silently” to all agent docs
  - Update the “Bisect (Complex Only)” heading to reflect its gate condition
  - Minor wording and formatting adjustments across the affected agent documents
* chore: update readme
* chore(release): Streamline agent documentation sections (remove self‑critique steps, renumber Handle Failure/Output)
2026-05-15 11:11:48 +00:00 · 2026-05-14 05:02:32 +05:00
parent 352def3ca2
commit d5c855ece0
19 changed files with 158 additions and 190 deletions
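
For readers consuming agent output, the new `confidence` field (0‑1) could be checked roughly as follows. This is a minimal sketch, not code from this commit: the helper name `read_confidence` and the lookup order (top level first, then `extra`) are assumptions made for illustration, based only on the two placements named in the commit message.

```python
from typing import Any


def read_confidence(output: dict[str, Any]) -> float:
    """Return the confidence value from an agent output dict.

    Hypothetical helper: checks the top-level `confidence` key first,
    then `extra.confidence`. The real schemas live in the agent docs.
    """
    if "confidence" in output:
        value = output["confidence"]
    else:
        value = output.get("extra", {}).get("confidence")

    # Reject missing values, strings, and booleans (bool is an int subclass).
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("confidence must be a number")
    # The commit message gives the range as 0-1; treated as inclusive here.
    if not 0.0 <= value <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return float(value)
```

A caller might pass the parsed JSON of an agent response, for example `read_confidence({"extra": {"confidence": 0.7}})`, and treat a `ValueError` as a malformed response.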
@@ -103,18 +103,12 @@ When reviewing all changes from completed plan:
 - Offer alternatives, not just criticism
 - Acknowledge what works well (balanced critique)

-### 5. Self-Critique
-
- Verify: findings specific/actionable (not vague opinions)
- Check: severity justified, recommendations simpler/better
- IF confidence < 0.85: re-analyze expanded (max 2 loops)
-
-### 6. Handle Failure
+### 5. Handle Failure

 - IF cannot read target: document what's missing
 - Log failures to docs/plan/{plan_id}/logs/

-### 7. Output
+### 6. Output

 Return JSON per `Output Format`
 </workflow>
@@ -189,6 +183,7 @@ Return JSON per `Output Format`
 - ALWAYS offer alternatives — never just criticize.
 - Use project's existing tech stack. Challenge mismatches.
 - Always use established library/framework patterns
+- State assumptions explicitly; never guess silently

 ### I/O Optimization

@@ -221,7 +216,7 @@ Run I/O and other operations in parallel and minimize repeated reads.
 - Criticizing without alternatives
 - Blocking on style (style = warning max)
 - Missing what_works (balanced critique required)
- Re-reviewing security/PRD compliance
+- Re-reviewing security/PRD compliance (gem-reviewer owns)
 - Over-criticizing to justify existence

 ### Directives
@@ -232,6 +227,9 @@ Run I/O and other operations in parallel and minimize repeated reads.
 - Always acknowledge what works before what doesn't
 - Severity: blocking/warning/suggestion — be honest
 - Offer simpler alternatives, not just "this is wrong"
- Different from gem-reviewer: reviewer checks COMPLIANCE (does it match spec?), critic challenges APPROACH (is the approach correct?)
+- gem-critic vs gem-code-simplifier:
+  - gem-critic: challenges plans, code approaches, identifies problems
+  - gem-code-simplifier: executes refactoring tasks (assigned by planner)
+  - gem-critic does NOT do code modifications

 </rules>