From 4dea29454734cdec032ee87832892a476e95583c Mon Sep 17 00:00:00 2001
From: Muhammad Ubaid Raza
Date: Tue, 24 Feb 2026 19:56:16 +0500
Subject: [PATCH] chore: Add evidence to browser tester

---
 agents/gem-browser-tester.agent.md       | 29 ++++++++++++++++--------
 agents/gem-devops.agent.md               |  6 ++---
 agents/gem-documentation-writer.agent.md |  4 ++--
 agents/gem-implementer.agent.md          | 14 +++++++-----
 agents/gem-planner.agent.md              |  4 ++--
 agents/gem-researcher.agent.md           |  4 ++--
 agents/gem-reviewer.agent.md             |  4 ++--
 7 files changed, 38 insertions(+), 27 deletions(-)

diff --git a/agents/gem-browser-tester.agent.md b/agents/gem-browser-tester.agent.md
index 945f5f57..61dd8249 100644
--- a/agents/gem-browser-tester.agent.md
+++ b/agents/gem-browser-tester.agent.md
@@ -15,11 +15,14 @@ Browser automation, UI/UX and Accessibility (WCAG) auditing, Performance profili
-- Initialize: Set up tool registry (navigate, click, type, snapshot, wait) with consistent error handling and evidence capture. Track tabs with UUIDs for multi-tab flows. Identify plan_id, task_def. Map validation_matrix to scenarios.
-- Execute: Run validation matrix scenarios using tool registry functions. Follow Observation-First loop for each scenario: Navigate → Snapshot → Action. Verify UI state after each step.
-- Verify: Follow verification_criteria (validation matrix, console errors, network requests, accessibility audit).
+- Initialize: Identify plan_id, task_def. Map scenarios.
+- Execute: Run scenarios iteratively using available browser tools. For each scenario:
+  - Navigate to target URL, perform specified actions (click, type, etc.) using preferred browser tools.
+  - After each scenario, verify outcomes against expected results.
+  - If any scenario fails verification, capture detailed failure information (steps taken, actual vs expected results) for analysis.
+- Verify: After all scenarios complete, run verification_criteria: check console errors, network requests, and accessibility audit.
 - Handle Failure: If verification fails and task has failure_modes, apply mitigation strategy.
-- Reflect (Medium/ High priority or complexity or failed only): Self-review against AC and SLAs.
+- Reflect (Medium/ High priority or complex or failed only): Self-review against AC and SLAs.
 - Cleanup: Close browser sessions.
 - Return JSON per
@@ -30,10 +33,9 @@ Browser automation, UI/UX and Accessibility (WCAG) auditing, Performance profili
 - Think-Before-Action: Validate logic and simulate expected outcomes via an internal block before any tool execution or final response; verify pathing, dependencies, and constraints to ensure "one-shot" success.
 - Context-efficient file/ tool output reading: prefer semantic search, file outlines, and targeted line-range reads; limit to 200 lines per read
 - Follow Observation-First loop (Navigate → Snapshot → Action).
-- Prefer accessibility_snapshot over visual screenshots for element identification - accessibility snapshots provide structured DOM/ARIA data that's more reliable for automation than pixel-based visual analysis.
-- Use reference_cache for WCAG standards when performing accessibility audits.
+- Always use accessibility snapshot over visual screenshots for element identification or visual state verification. Accessibility snapshots provide structured DOM/ARIA data that's more reliable for automation than pixel-based visual analysis.
+- For failure evidence, capture screenshots to visually document issues, but never use screenshots for element identification or state verification.
 - Evidence storage (in case of failures): directory structure docs/plan/{plan_id}/evidence/{task_id}/ with subfolders screenshots/, logs/, network/. Files named by timestamp and scenario.
-- Use UIDs from take_snapshot; avoid raw CSS/XPath.
 - Never navigate to production without approval.
 - Retry Transient Failures: For click, type, navigate actions - retry 2-3 times with 1s delay on transient errors (timeout, element not found, network issues). Escalate after max retries.
 - Errors: transient→handle, persistent→escalate
@@ -52,8 +54,8 @@ task_definition: object # Full task from plan.yaml
- Learn from execution, user guidance, decisions, patterns
- Complete → Store discoveries → Next: Read & apply
+  - Learn from execution, user guidance, decisions, patterns
+  - Complete → Store discoveries → Next: Read & apply
@@ -85,7 +87,14 @@ task_definition: object # Full task from plan.yaml
     "console_errors": 0,
     "network_failures": 0,
     "accessibility_issues": 0,
-    "evidence_path": "docs/plan/{plan_id}/evidence/{task_id}/"
+    "evidence_path": "docs/plan/{plan_id}/evidence/{task_id}/",
+    "failures": [
+      {
+        "criteria": "console_errors|network_requests|accessibility|validation_matrix",
+        "details": "Description of failure with specific errors",
+        "scenario": "Scenario name if applicable"
+      }
+    ]
   }
 }
 ```
diff --git a/agents/gem-devops.agent.md b/agents/gem-devops.agent.md
index 344e9076..cbfbe202 100644
--- a/agents/gem-devops.agent.md
+++ b/agents/gem-devops.agent.md
@@ -20,7 +20,7 @@ Containerization (Docker) and Orchestration (K8s), CI/CD pipeline design and aut
 - Execute: Run infrastructure operations using idempotent commands. Use atomic operations.
 - Verify: Follow verification_criteria (infrastructure deployment, health checks, CI/CD pipeline, idempotency).
 - Handle Failure: If verification fails and task has failure_modes, apply mitigation strategy.
-- Reflect (Medium/ High priority or complexity or failed only): Self-review against quality standards.
+- Reflect (Medium/ High priority or complex or failed only): Self-review against quality standards.
 - Cleanup: Remove orphaned resources, close connections.
 - Return JSON per
@@ -59,8 +59,8 @@ task_definition: object # Full task from plan.yaml
- Learn from execution, user guidance, decisions, patterns
- Complete → Store discoveries → Next: Read & apply
+  - Learn from execution, user guidance, decisions, patterns
+  - Complete → Store discoveries → Next: Read & apply
diff --git a/agents/gem-documentation-writer.agent.md b/agents/gem-documentation-writer.agent.md
index 45575d28..2465586d 100644
--- a/agents/gem-documentation-writer.agent.md
+++ b/agents/gem-documentation-writer.agent.md
@@ -50,8 +50,8 @@ task_definition: object # Full task from plan.yaml
- Learn from execution, user guidance, decisions, patterns
- Complete → Store discoveries → Next: Read & apply
+  - Learn from execution, user guidance, decisions, patterns
+  - Complete → Store discoveries → Next: Read & apply
diff --git a/agents/gem-implementer.agent.md b/agents/gem-implementer.agent.md
index b5c12201..200d2004 100644
--- a/agents/gem-implementer.agent.md
+++ b/agents/gem-implementer.agent.md
@@ -15,11 +15,13 @@ Full-stack implementation and refactoring, Unit and integration testing (TDD/VDD
-- TDD Red: Write failing tests FIRST, confirm they FAIL.
-- TDD Green: Write MINIMAL code to pass tests, avoid over-engineering, confirm PASS.
-- TDD Verify: Follow verification_criteria (get_errors, typecheck, unit tests, failure mode mitigations).
+- Analyze: Parse plan_id, objective. Read research findings efficiently (`docs/plan/{plan_id}/research_findings_*.yaml`) to extract relevant insights for planning.
+- Execute: Implement code changes using TDD approach:
+  - TDD Red: Write failing tests FIRST, confirm they FAIL.
+  - TDD Green: Write MINIMAL code to pass tests, avoid over-engineering, confirm PASS.
+  - TDD Verify: Follow verification_criteria (get_errors, typecheck, unit tests, failure mode mitigations).
 - Handle Failure: If verification fails and task has failure_modes, apply mitigation strategy.
-- Reflect (Medium/ High priority or complexity or failed only): Self-review for security, performance, naming.
+- Reflect (Medium/ High priority or complex or failed only): Self-review for security, performance, naming.
 - Return JSON per
@@ -60,8 +62,8 @@ task_definition: object # Full task from plan.yaml
- Learn from execution, user guidance, decisions, patterns
- Complete → Store discoveries → Next: Read & apply
+  - Learn from execution, user guidance, decisions, patterns
+  - Complete → Store discoveries → Next: Read & apply
diff --git a/agents/gem-planner.agent.md b/agents/gem-planner.agent.md
index b44e7ee6..0299bed9 100644
--- a/agents/gem-planner.agent.md
+++ b/agents/gem-planner.agent.md
@@ -163,8 +163,8 @@ research_findings_paths: [string] # Paths to research_findings_*.yaml files
- Learn from execution, user guidance, decisions, patterns
- Complete → Store discoveries → Next: Read & apply
+  - Learn from execution, user guidance, decisions, patterns
+  - Complete → Store discoveries → Next: Read & apply
diff --git a/agents/gem-researcher.agent.md b/agents/gem-researcher.agent.md
index f5a7ec2d..a76f4ce9 100644
--- a/agents/gem-researcher.agent.md
+++ b/agents/gem-researcher.agent.md
@@ -218,8 +218,8 @@ complexity: "simple|medium|complex" # Optional, auto-detected
- Learn from execution, user guidance, decisions, patterns
- Complete → Store discoveries → Next: Read & apply
+  - Learn from execution, user guidance, decisions, patterns
+  - Complete → Store discoveries → Next: Read & apply
diff --git a/agents/gem-reviewer.agent.md b/agents/gem-reviewer.agent.md
index 2e79bf8a..0e8702c9 100644
--- a/agents/gem-reviewer.agent.md
+++ b/agents/gem-reviewer.agent.md
@@ -62,8 +62,8 @@ task_definition: object # Full task from plan.yaml
- Learn from execution, user guidance, decisions, patterns
- Complete → Store discoveries → Next: Read & apply
+  - Learn from execution, user guidance, decisions, patterns
+  - Complete → Store discoveries → Next: Read & apply
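
The retry rule that this patch keeps in gem-browser-tester.agent.md says to retry click, type and navigate 2-3 times with a 1s delay on transient errors (timeout, element not found, network issues) and to escalate after max retries. Its "Errors" line adds that transient errors are handled while persistent ones are escalated. The control flow can be read as the minimal Python sketch below. It is not part of the agent. `TransientError`, `EscalationError` and `with_retry` are hypothetical names. The agent calls browser tools rather than Python callables, and `max_retries=3` is one reading of "2-3 times".

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for timeouts, element-not-found and network errors."""


class EscalationError(Exception):
    """Hypothetical signal that retries are exhausted and the caller must escalate."""


def with_retry(
    action: Callable[[], T],
    max_retries: int = 3,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `action`, retrying only transient failures, then escalate.

    Any exception that is not a TransientError is treated as persistent and
    propagates immediately, without a retry (persistent -> escalate).
    """
    retries = 0
    while True:
        try:
            return action()
        except TransientError as exc:
            if retries >= max_retries:
                # Retry budget spent: hand the failure up instead of looping.
                raise EscalationError(
                    f"still failing after {max_retries} retries"
                ) from exc
            retries += 1
            sleep(delay_s)  # fixed delay between attempts (1s by default)
```

The `sleep` parameter is injected so the sketch can be tested without waiting. With the defaults, an action that keeps timing out is attempted four times before escalation: one initial try plus three retries.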
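
The browser tester's evidence layout is docs/plan/{plan_id}/evidence/{task_id}/ with screenshots/, logs/ and network/ subfolders, and files are named by timestamp and scenario. A per-file path could be computed as in the sketch below. This sketch rests on assumptions. `evidence_path` is a hypothetical helper, and the `<UTC timestamp>_<scenario slug>.<ext>` file name is only one possible reading of "named by timestamp and scenario".

```python
import re
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Subfolders named in the browser tester's "Evidence storage" rule.
EVIDENCE_KINDS = ("screenshots", "logs", "network")


def evidence_path(plan_id: str, task_id: str, kind: str, scenario: str,
                  ext: str, when: datetime) -> PurePosixPath:
    """Build docs/plan/{plan_id}/evidence/{task_id}/{kind}/<stamp>_<slug>.<ext>.

    The "<UTC timestamp>_<scenario slug>" file name is an assumed scheme;
    the agent file only says files are named by timestamp and scenario.
    """
    if kind not in EVIDENCE_KINDS:
        raise ValueError(f"unknown evidence kind: {kind!r}")
    # Lower-case the scenario name and collapse anything unsafe into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", scenario.lower()).strip("-")
    # Normalise to UTC so files from different machines sort consistently.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return PurePosixPath("docs", "plan", plan_id, "evidence", task_id, kind,
                         f"{stamp}_{slug}.{ext}")
```

The `evidence_path` field in the JSON report still points at the directory `docs/plan/{plan_id}/evidence/{task_id}/`. The sketch only decides where individual files inside that directory would go.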
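
The new `failures` array in the browser tester's output format records each failed check. Every entry has a `criteria` value (one of console_errors, network_requests, accessibility, validation_matrix), free-text `details`, and a `scenario` name when one applies. The Python sketch below shows how the fields from that hunk could be assembled. `failure_entry` and `verification_block` are hypothetical helpers. The key that encloses these fields in the full output lies outside the hunk, so the sketch does not name it.

```python
import json
from typing import Any, Dict, List, Optional

# Allowed "criteria" values, taken from the output format in the patch.
CRITERIA = ("console_errors", "network_requests", "accessibility", "validation_matrix")


def failure_entry(criteria: str, details: str,
                  scenario: Optional[str] = None) -> Dict[str, str]:
    """One element of "failures"; "scenario" is included only when applicable."""
    if criteria not in CRITERIA:
        raise ValueError(f"unknown criteria: {criteria!r}")
    entry = {"criteria": criteria, "details": details}
    if scenario is not None:
        entry["scenario"] = scenario
    return entry


def verification_block(plan_id: str, task_id: str, console_errors: int,
                       network_failures: int, accessibility_issues: int,
                       failures: List[Dict[str, str]]) -> Dict[str, Any]:
    """The fields shown in the patched hunk, in the same order."""
    return {
        "console_errors": console_errors,
        "network_failures": network_failures,
        "accessibility_issues": accessibility_issues,
        "evidence_path": f"docs/plan/{plan_id}/evidence/{task_id}/",
        "failures": failures,
    }


if __name__ == "__main__":
    # Example: one console error seen during the final verification pass.
    report = verification_block(
        "p1", "t7", 1, 0, 0,
        [failure_entry("console_errors", "TypeError: cart is undefined")],
    )
    print(json.dumps(report, indent=2))
```

Keeping `scenario` optional matches "Scenario name if applicable". Because the Verify step runs after all scenarios complete, a console or network failure found there may not belong to any single scenario.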