feat: [gem-team] Add confidence metric, optimize planner workflow (#1695)

* feat: add explicit assumption rule and confidence metric to agent documentation

- Add `confidence` field (0‑1) to the output schema in `agents/gem-browser-tester.agent.md`
- Include `confidence` in the `extra` object of `agents/gem-devops.agent.md`
- Append the guideline “State assumptions explicitly; never guess silently” to all agent docs
- Update the “Bisect (Complex Only)” heading to reflect its gate condition
- Minor wording and formatting adjustments across the affected agent documents

* chore: update readme

* chore(release): Streamline agent documentation sections (remove self‑critique steps, renumber Handle Failure/Output)
Muhammad Ubaid Raza
2026-05-14 05:02:32 +05:00
committed by GitHub
parent 352def3ca2
commit d5c855ece0
19 changed files with 158 additions and 190 deletions
+5 -8
@@ -107,24 +107,19 @@ For each step in flow.steps:
 - Network: filter failed (status ≥ 400)
 - Accessibility: audit (scores for a11y, seo, best_practices)
-### 6. Self-Critique
-- Check: all flows passed, zero console errors
-- Skip: detailed metrics, PRD coverage — covered by integration check
-### 7. Handle Failure
+### 6. Handle Failure
 - Capture evidence (screenshots, logs, traces)
 - Classify: transient (retry) | flaky (mark, log) | regression (escalate) | new_failure (flag)
 - Log failures, retry: 3x exponential backoff per step
-### 8. Cleanup
+### 7. Cleanup
 - Close pages, clear flow_context
 - Remove orphaned resources
 - Delete temporary fixtures if cleanup=true
-### 9. Output
+### 8. Output
 Return JSON per `Output Format`
 </workflow>
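Reviewer note: the "retry: 3x exponential backoff per step" rule in Handle Failure could be sketched roughly as below. This is only an illustration, not the agent's actual implementation. The names `TransientError` and `retry_step`, the 0.5 s base delay, and reading "3x" as three retries after the first attempt are all assumptions that do not appear in the agent document.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for a failure classified as transient."""


def retry_step(
    step: Callable[[], T],
    retries: int = 3,            # assumed reading of "3x": three retries
    base_delay: float = 0.5,     # assumed base delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `step`, retrying transient failures with exponential backoff."""
    attempt = 0
    while True:
        try:
            return step()
        except TransientError:
            if attempt >= retries:
                # Retries exhausted: let the caller classify and escalate.
                raise
            # Delay doubles on every retry: base, 2*base, 4*base, ...
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

Only exceptions of the hypothetical `TransientError` type are retried. Anything else propagates immediately, which mirrors the split between "transient (retry)" and the other failure classes.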
@@ -208,6 +203,7 @@ Use `${fixtures.field.path}` for variable interpolation.
 "flaky_tests": ["scenario_id"],
 "failures": [{ "type": "string", "criteria": "string", "details": "string", "flow_id": "string", "scenario": "string", "step_index": "number", "evidence": ["string"] }],
 "flow_results": [{ "flow_id": "string", "status": "passed|failed", "steps_completed": "number", "steps_total": "number", "duration_ms": "number" }],
+"confidence": "number (0-1)",
 },
 }
 ```
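Reviewer note: a consumer of this output may want to check that the new `confidence` value really is a number in the 0-1 range before acting on it. The sketch below is one way to do that, not code from the repository. The function name `read_confidence` and the choice to reject booleans and missing values are assumptions.

```python
from numbers import Real
from typing import Any, Mapping


def read_confidence(payload: Mapping[str, Any]) -> float:
    """Return payload["confidence"] as a float, or raise ValueError.

    `payload` stands for whichever object carries the field; this helper
    is hypothetical and is not defined by the agent documents.
    """
    value = payload.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, Real):
        raise ValueError(f"confidence must be a number, got {value!r}")
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence must be within 0-1, got {value!r}")
    return float(value)
```

For example, `read_confidence({"confidence": 0.85})` returns `0.85`, while a missing field or a value such as `1.2` raises `ValueError`.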
@@ -240,6 +236,7 @@ Use `${fixtures.field.path}` for variable interpolation.
 - NEVER fail without re-taking snapshot on element not found
 - NEVER use SPEC-based accessibility validation
 - Always use established library/framework patterns
+- State assumptions explicitly; never guess silently
 ### I/O Optimization