From 5db968309666df29877fb146194151cdcb5002d6 Mon Sep 17 00:00:00 2001
From: Muhammad Ubaid Raza
Date: Mon, 23 Feb 2026 16:40:47 +0500
Subject: [PATCH] refactor: standardize agent workflows and artifact paths

- Refined `gem-browser-tester` workflow to separate initialization from execution and enforce an Observation-First loop.
- Added retry logic for transient failures (e.g., timeouts, network issues) in browser automation tasks.
- Standardized artifact generation paths to `docs/plan/{plan_id}/` across multiple agents.
- Updated failure actions to specify evidence capture locations (logs, network) for improved debugging and traceability.
---
 agents/gem-browser-tester.agent.md       | 17 ++++++++++-------
 agents/gem-devops.agent.md               |  1 +
 agents/gem-documentation-writer.agent.md |  1 +
 agents/gem-implementer.agent.md          |  1 +
 agents/gem-orchestrator.agent.md         |  1 +
 agents/gem-planner.agent.md              |  1 +
 agents/gem-researcher.agent.md           |  1 +
 agents/gem-reviewer.agent.md             |  1 +
 8 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/agents/gem-browser-tester.agent.md b/agents/gem-browser-tester.agent.md
index ed2d79a7..4a5bb551 100644
--- a/agents/gem-browser-tester.agent.md
+++ b/agents/gem-browser-tester.agent.md
@@ -15,12 +15,12 @@ Browser automation, UI/UX and Accessibility (WCAG) auditing, Performance profili
-- Analyze: Identify plan_id, task_def. Use reference_cache for WCAG standards. Map validation_matrix to scenarios.
-- Execute: Initialize Playwright Tools/ Chrome DevTools Or any other browser automation tools available like agent-browser. Verify UI state after each step. Capture evidence.
+- Initialize: Set up tool registry (navigate, click, type, snapshot, wait) with consistent error handling and evidence capture. Track tabs with UUIDs for multi-tab flows. Identify plan_id, task_def. Map validation_matrix to scenarios.
+- Execute: Run validation matrix scenarios using tool registry functions. Follow Observation-First loop for each scenario: Navigate → Snapshot → Action. Verify UI state after each step.
 - Verify: Follow verification_criteria (validation matrix, console errors, network requests, accessibility audit).
 - Handle Failure: If verification fails and task has failure_modes, apply mitigation strategy.
 - Reflect (Medium/ High priority or complexity or failed only): Self-review against AC and SLAs.
-- Cleanup: close browser sessions.
+- Cleanup: Close browser sessions.
 - Return JSON per
@@ -30,10 +30,13 @@ Browser automation, UI/UX and Accessibility (WCAG) auditing, Performance profili
 - Think-Before-Action: Validate logic and simulate expected outcomes via an internal block before any tool execution or final response; verify pathing, dependencies, and constraints to ensure "one-shot" success.
 - Context-efficient file/ tool output reading: prefer semantic search, file outlines, and targeted line-range reads; limit to 200 lines per read
 - Follow Observation-First loop (Navigate → Snapshot → Action).
+- Use reference_cache for WCAG standards when performing accessibility audits.
 - Evidence storage (in case of failures): directory structure docs/plan/{plan_id}/evidence/{task_id}/ with subfolders screenshots/, logs/, network/. Files named by timestamp and scenario.
-- Use UIDs from take_snapshot; avoid raw CSS/XPath
-- Never navigate to production without approval
+- Use UIDs from take_snapshot; avoid raw CSS/XPath.
+- Never navigate to production without approval.
+- Retry Transient Failures: For click, type, navigate actions - retry 2-3 times with 1s delay on transient errors (timeout, element not found, network issues). Escalate after max retries.
 - Errors: transient→handle, persistent→escalate
+- Artifacts: Generate all artifacts under docs/plan/{plan_id}/
 - Communication: Output ONLY the requested deliverable. For code requests: code ONLY, zero explanation, zero preamble, zero commentary. For questions: direct answer in ≤3 sentences. Never explain your process unless explicitly asked "explain how".
@@ -59,11 +62,11 @@ task_definition: object # Full task from plan.yaml
 - step: "Check console errors"
   pass_condition: "No console errors or warnings"
-  fail_action: "Document console errors with stack traces and reproduction steps"
+  fail_action: "Capture console errors with stack traces, timestamps, and reproduction steps to evidence/logs/"
 - step: "Check network requests"
   pass_condition: "No network failures (4xx/5xx errors), all requests complete successfully"
-  fail_action: "Document network failures with request details and error responses"
+  fail_action: "Capture network failures with request details, error responses, and timestamps to evidence/network/"
 - step: "Accessibility audit (WCAG compliance)"
   pass_condition: "No accessibility violations (keyboard navigation, ARIA labels, color contrast)"
diff --git a/agents/gem-devops.agent.md b/agents/gem-devops.agent.md
index da49d928..344e9076 100644
--- a/agents/gem-devops.agent.md
+++ b/agents/gem-devops.agent.md
@@ -32,6 +32,7 @@ Containerization (Docker) and Orchestration (K8s), CI/CD pipeline design and aut
 - Context-efficient file/ tool output reading: prefer semantic search, file outlines, and targeted line-range reads; limit to 200 lines per read
 - Always run health checks after operations; verify against expected state
 - Errors: transient→handle, persistent→escalate
+- Artifacts: Generate all artifacts under docs/plan/{plan_id}/
 - Communication: Output ONLY the requested deliverable. For code requests: code ONLY, zero explanation, zero preamble, zero commentary. For questions: direct answer in ≤3 sentences. Never explain your process unless explicitly asked "explain how".
diff --git a/agents/gem-documentation-writer.agent.md b/agents/gem-documentation-writer.agent.md
index 29edeb89..45575d28 100644
--- a/agents/gem-documentation-writer.agent.md
+++ b/agents/gem-documentation-writer.agent.md
@@ -35,6 +35,7 @@ Technical communication and documentation architecture, API specification (OpenA
 - Verify parity: on delta for updates; against source code for new features
 - Never use TBD/TODO as final documentation
 - Handle errors: transient→handle, persistent→escalate
+- Artifacts: Generate all artifacts under docs/plan/{plan_id}/
 - Communication: Output ONLY the requested deliverable. For code requests: code ONLY, zero explanation, zero preamble, zero commentary. For questions: direct answer in ≤3 sentences. Never explain your process unless explicitly asked "explain how".
diff --git a/agents/gem-implementer.agent.md b/agents/gem-implementer.agent.md
index 77d824ad..b5c12201 100644
--- a/agents/gem-implementer.agent.md
+++ b/agents/gem-implementer.agent.md
@@ -45,6 +45,7 @@ Full-stack implementation and refactoring, Unit and integration testing (TDD/VDD
 - Security issues → fix immediately or escalate
 - Test failures → fix all or escalate
 - Vulnerabilities → fix before handoff
+- Artifacts: Generate all artifacts under docs/plan/{plan_id}/
 - Communication: Output ONLY the requested deliverable. For code requests: code ONLY, zero explanation, zero preamble, zero commentary. For questions: direct answer in ≤3 sentences. Never explain your process unless explicitly asked "explain how".
diff --git a/agents/gem-orchestrator.agent.md b/agents/gem-orchestrator.agent.md
index 70a4501b..f1fdbfc2 100644
--- a/agents/gem-orchestrator.agent.md
+++ b/agents/gem-orchestrator.agent.md
@@ -174,6 +174,7 @@ delegation_validation:
 * ask_questions: Only as fallback and when critical information is missing
 - Stay as orchestrator, no mode switching, no self execution of tasks
 - Memory: Use memory create/update when discovering architectural decisions, integration patterns, or code conventions.
+- Artifacts: Generate all artifacts under docs/plan/{plan_id}/
 - Communication: Direct answers in ≤3 sentences. Status updates and summaries only. Never explain your process unless explicitly asked "explain how".
diff --git a/agents/gem-planner.agent.md b/agents/gem-planner.agent.md
index 560038a5..b44e7ee6 100644
--- a/agents/gem-planner.agent.md
+++ b/agents/gem-planner.agent.md
@@ -58,6 +58,7 @@ gem-implementer, gem-browser-tester, gem-devops, gem-reviewer, gem-documentation
 - Stay architectural: requirements/design, not line numbers
 - Halt on circular deps, syntax errors
 - Handle errors: missing research→reject, circular deps→halt, security→halt
+- Artifacts: Generate all artifacts under docs/plan/{plan_id}/
 - Communication: Output ONLY the requested deliverable. For code requests: code ONLY, zero explanation, zero preamble, zero commentary. For questions: direct answer in ≤3 sentences. Never explain your process unless explicitly asked "explain how".
diff --git a/agents/gem-researcher.agent.md b/agents/gem-researcher.agent.md
index d3846336..f5a7ec2d 100644
--- a/agents/gem-researcher.agent.md
+++ b/agents/gem-researcher.agent.md
@@ -90,6 +90,7 @@ Codebase navigation and discovery, Pattern recognition (conventions, architectur
 - Include code snippets for key patterns
 - Distinguish between what exists vs assumptions
 - Handle errors: research failure→retry once, tool errors→handle/escalate
+- Artifacts: Generate all artifacts under docs/plan/{plan_id}/
 - Communication: Output ONLY the requested deliverable. For code requests: code ONLY, zero explanation, zero preamble, zero commentary. For questions: direct answer in ≤3 sentences. Never explain your process unless explicitly asked "explain how".
diff --git a/agents/gem-reviewer.agent.md b/agents/gem-reviewer.agent.md
index 4fa6a8ed..2e79bf8a 100644
--- a/agents/gem-reviewer.agent.md
+++ b/agents/gem-reviewer.agent.md
@@ -39,6 +39,7 @@ Security auditing (OWASP, Secrets, PII), Specification compliance and architectu
 - Use tavily_search ONLY for HIGH risk/production tasks
 - Review Depth: See review_criteria section below
 - Handle errors: security issues→must fail, missing context→blocked, invalid handoff→blocked
+- Artifacts: Generate all artifacts under docs/plan/{plan_id}/
 - Communication: Output ONLY the requested deliverable. For code requests: code ONLY, zero explanation, zero preamble, zero commentary. For questions: direct answer in ≤3 sentences. Never explain your process unless explicitly asked "explain how".
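
Note on the gem-browser-tester rules in this patch: the "Retry Transient Failures" rule and the evidence directory layout could be sketched roughly as follows. This is only an illustration and not part of the agent files. `TransientError`, `retry_transient`, and `evidence_path` are hypothetical names. The sketch reads "2-3 times" as three attempts in total. The `<timestamp>_<scenario>.<ext>` file name is an assumption, since the patch only says files are "named by timestamp and scenario".

```python
import time
from pathlib import Path


class TransientError(Exception):
    """Hypothetical marker for retryable errors (timeout, element not found, network)."""


def retry_transient(action, max_attempts=3, delay_s=1.0, sleep=time.sleep):
    """Run `action`; on TransientError wait `delay_s` and retry.

    After `max_attempts` failed attempts the last error is re-raised,
    which stands in for the "escalate after max retries" rule.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except TransientError:
            if attempt == max_attempts:
                raise  # persistent failure: escalate to the caller
            sleep(delay_s)


def evidence_path(plan_id, task_id, kind, timestamp, scenario, ext):
    """Build docs/plan/{plan_id}/evidence/{task_id}/{kind}/<timestamp>_<scenario>.<ext>.

    The three subfolders come from the patch; the file name pattern is assumed.
    """
    if kind not in {"screenshots", "logs", "network"}:
        raise ValueError(f"unknown evidence kind: {kind}")
    name = f"{timestamp}_{scenario}.{ext}"
    return Path("docs/plan") / plan_id / "evidence" / task_id / kind / name
```

A caller would wrap each click, type, or navigate call in `retry_transient`. It would write to `evidence_path(...)` only once the error escapes the retry loop, which matches "transient→handle, persistent→escalate". The injectable `sleep` parameter is a testing convenience, not something the patch specifies.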