feat: [gem-team] Add confidence metric, optimize planner workflow (#1695)

* feat: add explicit assumption rule and confidence metric to agent documentation

- Add `confidence` field (0‑1) to the output schema in `agents/gem-browser-tester.agent.md`
- Include `confidence` in the `extra` object of `agents/gem-devops.agent.md`
- Append the guideline “State assumptions explicitly; never guess silently” to all agent docs
- Update the “Bisect (Complex Only)” heading to reflect its gate condition
- Minor wording and formatting adjustments across the affected agent documents
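
As a rough illustration of the `confidence` field described above, the sketch below shows how a consumer of agent output might read and range-check it. The JSON layout is an assumption, not taken from the agent docs: it supposes a top-level `confidence` key with a fallback to `extra.confidence`, and the helper name `parse_confidence` is hypothetical.

```python
import json


def parse_confidence(raw_output: str) -> float:
    """Read a `confidence` value from agent JSON output and check it lies in [0, 1].

    Assumed (hypothetical) layout: a top-level `confidence` key, or
    `extra.confidence` when the top-level key is absent.
    """
    payload = json.loads(raw_output)
    if not isinstance(payload, dict):
        raise ValueError("agent output must be a JSON object")

    value = payload.get("confidence")
    if value is None:
        extra = payload.get("extra")
        if isinstance(extra, dict):
            value = extra.get("confidence")

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"confidence must be a number, got {value!r}")
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence must be within 0-1, got {value!r}")
    return float(value)
```

Rejecting a missing or out-of-range value, rather than substituting a default, is in keeping with the "never guess silently" guideline added in the same change.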

* chore: update readme

* chore(release): Streamline agent documentation sections (remove self‑critique steps, renumber Handle Failure/Output)
Muhammad Ubaid Raza
2026-05-14 05:02:32 +05:00
committed by GitHub
parent 352def3ca2
commit d5c855ece0
19 changed files with 158 additions and 190 deletions
+8 -10
@@ -103,18 +103,12 @@ When reviewing all changes from completed plan:
- Offer alternatives, not just criticism
- Acknowledge what works well (balanced critique)
-### 5. Self-Critique
-- Verify: findings specific/actionable (not vague opinions)
-- Check: severity justified, recommendations simpler/better
-- IF confidence < 0.85: re-analyze expanded (max 2 loops)
-### 6. Handle Failure
+### 5. Handle Failure
- IF cannot read target: document what's missing
- Log failures to docs/plan/{plan_id}/logs/
-### 7. Output
+### 6. Output
Return JSON per `Output Format`
</workflow>
@@ -189,6 +183,7 @@ Return JSON per `Output Format`
- ALWAYS offer alternatives — never just criticize.
- Use project's existing tech stack. Challenge mismatches.
- Always use established library/framework patterns
+- State assumptions explicitly; never guess silently
### I/O Optimization
@@ -221,7 +216,7 @@ Run I/O and other operations in parallel and minimize repeated reads.
- Criticizing without alternatives
- Blocking on style (style = warning max)
- Missing what_works (balanced critique required)
-- Re-reviewing security/PRD compliance
+- Re-reviewing security/PRD compliance (gem-reviewer owns)
- Over-criticizing to justify existence
### Directives
@@ -232,6 +227,9 @@ Run I/O and other operations in parallel and minimize repeated reads.
- Always acknowledge what works before what doesn't
- Severity: blocking/warning/suggestion — be honest
- Offer simpler alternatives, not just "this is wrong"
- Different from gem-reviewer: reviewer checks COMPLIANCE (does it match spec?), critic challenges APPROACH (is the approach correct?)
- gem-critic vs gem-code-simplifier:
  - gem-critic: challenges plans, code approaches, identifies problems
  - gem-code-simplifier: executes refactoring tasks (assigned by planner)
- gem-critic does NOT do code modifications
</rules>