mirror of
https://github.com/github/awesome-copilot.git
synced 2026-05-15 11:11:48 +00:00
feat: [gem-team] Add confidence metric, optimize planner workflow (#1695)
* feat: add explicit assumption rule and confidence metric to agent documentation
  - Add `confidence` field (0‑1) to the output schema in `agents/gem-browser-tester.agent.md`
  - Include `confidence` in the `extra` object of `agents/gem-devops.agent.md`
  - Append the guideline “State assumptions explicitly; never guess silently” to all agent docs
  - Update the “Bisect (Complex Only)” heading to reflect its gate condition
  - Minor wording and formatting adjustments across the affected agent documents
* chore: update readme
* chore(release): Streamline agent documentation sections (remove self‑critique steps, renumber Handle Failure/Output)
Committed by: GitHub
Parent: 352def3ca2
Commit: d5c855ece0
@@ -146,18 +146,13 @@ For each platform in task_definition.platforms:
 - Frame rate: iOS (Core Animation FPS), Android (`adb shell dumpsys gfxstats`)
 - Bundle size (JS/Flutter)
 
-### 6. Self-Critique
-
-- Check: all tests passed, zero crashes
-- Skip: performance, device farm — covered by integration check
-
-### 7. Handle Failure
+### 6. Handle Failure
 
 - Capture evidence (screenshots, videos, logs, crash reports)
 - Classify: transient (retry) | flaky (mark, log) | regression (escalate) | platform_specific | new_failure
 - Log failures, retry: 3x exponential backoff
 
-### 8. Error Recovery
+### 7. Error Recovery
 
 | Error | Recovery |
 | ---------------------- | ----------------------------------------------------------------------------------- |
@@ -166,13 +161,13 @@ For each platform in task_definition.platforms:
 | Android build fail | Check Gradle, `./gradlew clean`, rebuild |
 | Simulator unresponsive | iOS: `xcrun simctl shutdown all && xcrun simctl boot all` / Android: `adb emu kill` |
 
-### 9. Cleanup
+### 8. Cleanup
 
 - Stop Metro if started
 - Close simulators/emulators if opened
 - Clear artifacts if `cleanup = true`
 
-### 10. Output
+### 9. Output
 
 Return JSON per `Output Format`
 </workflow>
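The Handle Failure step in the first hunk specifies "retry: 3x exponential backoff" but does not say how the delays are computed. A minimal sketch of one reading, assuming three retries after the initial attempt and a 1-second base delay that doubles each time; `retry_with_backoff`, `run_test`, and the `sleep` parameter are hypothetical names and are not part of the agent document:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    run_test: Callable[[], T],
    max_retries: int = 3,          # assumption: "3x" means three retries
    base_delay_s: float = 1.0,     # assumption: base delay is not given in the source
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call run_test, retrying failed calls with doubling delays."""
    for attempt in range(max_retries + 1):
        try:
            return run_test()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller classify and escalate
            sleep(base_delay_s * 2 ** attempt)  # 1s, 2s, 4s with the defaults
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. Only failures classified as transient would normally go through this path; a regression should be escalated rather than retried.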
@@ -246,6 +241,7 @@ Return JSON per `Output Format`
 "extra": {
 "execution_details": { "platforms_tested": ["ios", "android"], "framework": "string", "tests_total": "number", "time_elapsed": "string" },
 "test_results": { "ios": { "total": "number", "passed": "number", "failed": "number", "skipped": "number" }, "android": {...} },
+"confidence": "number (0-1)",
 "performance_metrics": { "cold_start_ms": {...}, "memory_mb": {...}, "bundle_size_kb": "number" },
 "gesture_results": [{ "gesture_id": "string", "status": "passed|failed", "platform": "string" }],
 "push_notification_results": [{ "scenario_id": "string", "status": "passed|failed", "platform": "string" }],
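This hunk adds `confidence` to the `extra` object as a number in the range 0-1. A consumer of the agent output may want to check that range before acting on the value. A minimal sketch, assuming the JSON has already been parsed into a dict; `validate_confidence` is a hypothetical helper and is not defined anywhere in the agent documents:

```python
from typing import Any, Mapping


def validate_confidence(extra: Mapping[str, Any]) -> float:
    """Return extra["confidence"] as a float, or raise ValueError if it is invalid."""
    value = extra.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"confidence must be a number, got {value!r}")
    # The chained comparison is False for NaN, so NaN is rejected here too.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence must be within [0, 1], got {value!r}")
    return float(value)
```

For example, `validate_confidence({"confidence": 0.85})` returns `0.85`, while a missing key, a string such as `"0.9"`, or a value of `1.5` raises `ValueError`.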
@@ -288,6 +284,7 @@ Return JSON per `Output Format`
 - NEVER skip app lifecycle testing
 - NEVER test simulator only if device farm required
 - Always use established library/framework patterns
+- State assumptions explicitly; never guess silently
 
 ### I/O Optimization
 