feat: [gem-team] Add confidence metric, optimize planner workflow (#1695)

* feat: add explicit assumption rule and confidence metric to agent documentation
  - Add `confidence` field (0-1) to the output schema in `agents/gem-browser-tester.agent.md`
  - Include `confidence` in the `extra` object of `agents/gem-devops.agent.md`
  - Append the guideline “State assumptions explicitly; never guess silently” to all agent docs
  - Update the “Bisect (Complex Only)” heading to reflect its gate condition
  - Minor wording and formatting adjustments across the affected agent documents

* chore: update readme

* chore(release): Streamline agent documentation sections (remove self-critique steps, renumber Handle Failure/Output)
2026-08-03 07:52:41 +00:00 · 2026-05-14 05:02:32 +05:00
parent 352def3ca2
commit d5c855ece0
19 changed files with 158 additions and 190 deletions
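The devops agent's `Output Format` now carries `extra.confidence`, which the docs describe only as `"number (0-1)"`. The following is a minimal sketch of how a downstream consumer might enforce that range. It assumes the orchestrator receives the agent's reply as a JSON string. `read_confidence` is a hypothetical helper and is not part of the gem-team agents. Rejecting out-of-range values, rather than clamping them, is also an assumption.

```python
import json


def read_confidence(raw: str) -> float:
    """Hypothetical helper: parse an agent reply and return extra.confidence.

    Field names mirror the devops agent's Output Format; the validation
    rules (type check, inclusive 0-1 range, reject instead of clamp) are
    assumptions, not something the agent docs specify.
    """
    data = json.loads(raw)
    extra = data.get("extra") or {}
    if not isinstance(extra, dict):
        raise ValueError("extra must be an object")
    confidence = extra.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("extra.confidence must be a number")
    # Chained comparison is False for NaN, so NaN is rejected here too
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"extra.confidence out of range: {confidence}")
    return float(confidence)
```

Rejecting a bad value surfaces a malformed agent reply immediately. An orchestrator that prefers to keep running could clamp the value instead, at the cost of hiding the error.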
@@ -154,17 +154,12 @@ Production Readiness:

 - Run health checks, verify resources allocated, check CI/CD status

-### 5. Self-Critique
-
- Check: resources healthy, no orphans
- Skip: security, cost — covered by post-deploy checks
-
-### 6. Handle Failure
+### 5. Handle Failure

 - Apply mitigation strategies from failure_modes
 - Log failures to docs/plan/{plan_id}/logs/

-### 7. Output
+### 6. Output

 Return JSON per `Output Format`
 </workflow>
@@ -201,7 +196,9 @@ Return JSON per `Output Format`
  "plan_id": "[plan_id]",
  "summary": "[≤3 sentences]",
  "failure_type": "transient|fixable|needs_replan|escalate",
-  "extra": {},
+  "extra": {
+    "confidence": "number (0-1)",
+  },
 }
 ```

@@ -230,6 +227,9 @@ Return JSON per `Output Format`
 - Atomic operations preferred
 - Verify health checks pass before completing
 - Always use established library/framework patterns
+- State assumptions explicitly; never guess silently
+- Minimum code, nothing speculative
+- Surgical changes, don't refactor adjacent code

 ### I/O Optimization