[gem-team] Designer Updates, handle failures in all agents (#1474)

* feat: move to xml top tags for better llm parsing and structure
  - Orchestrator is now purely an orchestrator
  - Added new clarify phase for immediate user request understanding and task parsing before workflow
  - Enforce review/critic to plan instead of 3x plan generation retries for better error handling and self-correction
  - Add hints to all agents
  - Optimize definitions for simplicity/conciseness while maintaining clarity
* feat(critic): add holistic review and final review enhancements
* chore: bump marketplace version to 1.10.0
  - Updated `.github/plugin/marketplace.json` to version 1.10.0.
  - Revised `agents/gem-browser-tester.agent.md` to improve the BROWSER TESTER role documentation with a clearer structure, explicit role header, and organized knowledge sources section.
* refactor: streamline verification and self-critique steps across browser-tester, code-simplifier, critic, and debugger agents
* feat(researcher): improve mode selection workflow and research implementation details
  - Refine **Clarify** mode description to emphasize minimal research for detecting ambiguities.
  - Reorder steps and clarify intent detection (`continue_plan`, `modify_plan`, `new_task`).
  - Add explicit sub-steps for presenting architectural and task-specific clarifications.
  - Update **Research** mode section with clearer initialization workflow.
  - Simplify and reformat the confidence calculation comments for readability.
  - Minor formatting tweaks and added blank lines for visual separation.
* Update gem-orchestrator.agent.md
* docs(gem-browser-tester): enhance BROWSER TESTER role description and clarify workflow steps
  - Expanded the BROWSER TESTER role with explicit responsibilities and constraints
  - Reformatted the Knowledge Sources list using consistent numbered items for readability
  - Updated the Workflow section to detail initialization, execution, and teardown steps more clearly
  - Refined the Output Format and Research Format Guide structures to use proper markdown syntax
  - Improved overall formatting and consistency of documentation for better maintainability
* docs: fix typo in delegation description
2026-06-14 20:05:15 +00:00 · 2026-04-29 06:49:09 +05:00
parent f047d64ce3
commit 689ac4d33c
18 changed files with 2212 additions and 810 deletions
@@ -17,7 +17,9 @@
    "./agents/gem-mobile-tester.md"
  ],
  "author": {
-    "name": "Awesome Copilot Community"
+    "email": "mubaidr@gmail.com",
+    "name": "mubaidr",
+    "url": "https://github.com/mubaidr"
  },
  "description": "Multi-agent orchestration framework for spec-driven development and automated verification.",
  "keywords": [
@@ -32,8 +34,8 @@
    "prd",
    "mobile"
  ],
-  "license": "MIT",
+  "license": "Apache-2.0",
  "name": "gem-team",
-  "repository": "https://github.com/github/awesome-copilot",
-  "version": "1.6.6"
+  "repository": "https://github.com/mubaidr/gem-team",
+  "version": "1.13.0"
 }
@@ -1,9 +1,23 @@
 # 💎 Gem Team
-
+>
 > Multi-agent orchestration framework for spec-driven development and automated verification.
+>
+> **Turning Model Quality into System Quality.**
+>

-[![Copilot Plugin](https://img.shields.io/badge/Plugin-Awesome%20Copilot-0078D4?style=flat-square&logo=microsoft)](https://awesome-copilot.github.com/plugins/#file=plugins%2Fgem-team)
-![Version](https://img.shields.io/badge/Version-1.6.6-6366f1?style=flat-square)
+![VS Code](https://img.shields.io/badge/VS_Code-5A6D7C?style=flat)
+![VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-5A6D7C?style=flat)
+![Copilot CLI](https://img.shields.io/badge/Copilot_CLI-5A6D7C?style=flat)
+![Cursor](https://img.shields.io/badge/Cursor-5A6D7C?style=flat)
+![OpenCode](https://img.shields.io/badge/OpenCode-5A6D7C?style=flat)
+![Claude Code](https://img.shields.io/badge/Claude_Code-5A6D7C?style=flat)
+![Windsurf](https://img.shields.io/badge/Windsurf-5A6D7C?style=flat)
+
+---
+
+## 🚀 Quick Start
+
+See [all installation options](#-installation) below.

 ---

@@ -17,6 +31,8 @@
 - ♻️ **Pattern Reuse** — Codebase pattern discovery prevents reinventing wheels
 - 📏 **Established Patterns** — Uses library/framework conventions over custom implementations
 - 🪞 **Self-Correcting** — All agents self-critique at 0.85 confidence threshold
+- 🧠 **Context Scaffolding** — Maps large-scale dependencies _before_ the model reads code, preventing context-loss in legacy repos
+- ⚖️ **Intent vs. Compliance** — Shifts the burden from writing "perfect prompts" to enforcing strict, YAML-based approval gates
 - 📋 **Source Verified** — Every factual claim cites its source; no guesswork
 - ♿ **Accessibility-First** — WCAG compliance validated at spec and runtime layers
 - 🔬 **Smart Debugging** — Root-cause analysis with stack trace parsing + confidence-scored fixes
@@ -26,7 +42,7 @@
 - 🛠️ **Skills & Guidelines** — Built-in skill & guidelines (web-design-guidelines)
 - 📐 **Spec-Driven** — Multi-step refinement defines "what" before "how"
 - 🌊 **Wave-Based** — Parallel agents with integration gates per wave
- 🗂️ **Verified-Plan** — Complex tasks: Plan → Verificationn → Critic
+- 🗂️ **Verified-Plan** — Complex tasks: Plan → Verification → Critic
 - 🔎 **Final Review** — Optional user-triggered comprehensive review of all changed files
 - 🩺 **Diagnose-then-Fix** — gem-debugger diagnoses → gem-implementer fixes → re-verifies
 - ⚠️ **Pre-Mortem** — Failure modes identified BEFORE execution
@@ -34,35 +50,66 @@
 - 📝 **Contract-First** — Contract tests written before implementation
 - 📱 **Mobile Agents** — Native mobile implementation (React Native, Flutter) + iOS/Android testing

---
+### 🚀 The "System-IQ" Multiplier

-## 📦 Installation
+Raw reasoning isn't enough in single-pass chat. Gem-Team wraps your preferred LLM in a rigid, verification-first loop, fundamentally boosting its effective capability on SWE-benchmarks:

-```bash
-# Using Copilot CLI
-copilot plugin install gem-team@awesome-copilot
-```
+- **For Small Models (e.g., Qwen 1.7B - 8B):** The framework provides the "executive brain." Task decomposition and isolated 50-line chunks can up to **double** their localized debugging success rates.
+- **For Reasoning Models (e.g., DeepSeek 3.2):** TDD loops and parallel research stabilize their native file I/O fragility, yielding up to a **+25% lift** in execution reliability.
+- **For SOTA Models (e.g., GLM 5.1, Kimi K2.5):** The `gem-reviewer` acts as a noise-filter, pruning verbosity and enforcing strict PRD compliance to prevent over-engineering.

-> **[Install Gem Team Now →](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%253A%252F%252Fraw.githubusercontent.com%252Fgithub%252Fawesome-copilot%252Fmain%252F.%252Fagents)**
+### 🎨 Design Support
+
+Gem Team includes specialized design agents with **anti-"AI slop" guidelines** for distinctive, modern aesthetics:
+
+| Agent | Focus | Key Capabilities |
+|:------|:------|:-----------------|
+| **DESIGNER** | Web UI/UX | Layouts, themes, design systems, accessibility (WCAG), 7 design movements (Brutalism → Maximalism), 5-level elevation system |
+| **DESIGNER-MOBILE** | Mobile UI/UX | iOS HIG, Material 3, safe areas, haptics, platform-specific adaptations of design movements |
+
+**Anti-AI Slop Principles:**
+- Distinctive fonts (Cabinet Grotesk, Satoshi, Clash Display — never Inter/Roboto defaults)
+- 60-30-10 color strategy with sharp accents
+- Break predictable layouts (asymmetric grids, overlap, bento patterns)
+- Purposeful motion with orchestrated page loads
+- Design movement library: Brutalism, Neo-brutalism, Glassmorphism, Claymorphism, Minimalist Luxury, Retro-futurism, Maximalism
+
+Both agents include quality checklists for generating unique, memorable designs.

 ---

 ## 🔄 Core Workflow

-**Phase Flow:** User Goal → Orchestrator → Discuss (medium|complex) → PRD → Research → Planning → Plan Review (medium|complex) → Execution → Summary → [Optional] Final Review
+**Phase Flow:** User Goal → Orchestrator → Discuss (medium|complex) → PRD → Research → Planning → Plan Review (medium|complex) → Execution → Summary → (Optional) Final Review

 **Error Handling:** Diagnose-then-Fix loop (Debugger → Implementer → Re-verify)

 **Orchestrator** auto-detects phase and routes accordingly. Any feedback or steer message is handled to re-plan.

-| Condition | Phase |
-|:----------|:------|
-| No plan + simple | Research |
-| No plan + medium\|complex | Discuss → PRD → Research |
-| Plan + pending tasks | Execution |
-| Plan + feedback | Planning |
-| Plan + completed → Summary | User decision (feedback / final review / approve) |
-| User requests final review | Final Review (parallel gem-reviewer + gem-critic) |
+| Condition | Phase | Outcome |
+|:----------|:------|:--------|
+| No plan + simple | Research → Planning | Quick execution path |
+| No plan + medium\|complex | Discuss → PRD → Research | Spec-driven approach |
+| Plan + pending tasks | Execution | Wave-based implementation |
+| Plan + feedback | Planning | Replan with steer |
+| Plan + completed | Summary | User decision (feedback / final review / approve) |
+| User requests final review | Final Review | Parallel review by gem-reviewer + gem-critic |
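
Reviewer note (not part of the diff): the condition-to-phase table added above can be read as a small routing function. The sketch below is only an illustration of that table. The `WorkflowState` fields, the `route_phase` name, and the precedence between rows (final review first, feedback before pending tasks) are assumptions made for this example; the README does not state how the orchestrator resolves overlapping conditions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkflowState:
    """Hypothetical snapshot of what the orchestrator knows (names are illustrative)."""
    has_plan: bool = False
    complexity: str = "simple"  # "simple", "medium", or "complex"
    pending_tasks: bool = False
    has_feedback: bool = False
    completed: bool = False
    final_review_requested: bool = False


def route_phase(state: WorkflowState) -> List[str]:
    """Return the phase sequence for a state, one branch per table row."""
    # Assumed precedence: an explicit final-review request wins over everything.
    if state.final_review_requested:
        return ["Final Review"]
    if not state.has_plan:
        if state.complexity == "simple":
            return ["Research", "Planning"]
        return ["Discuss", "PRD", "Research"]
    # Assumed precedence: feedback triggers a replan before execution continues.
    if state.has_feedback:
        return ["Planning"]
    if state.pending_tasks:
        return ["Execution"]
    if state.completed:
        return ["Summary"]
    raise ValueError("state matches no row of the routing table")
```

For example, `route_phase(WorkflowState(complexity="complex"))` yields the spec-driven path, while a state with a plan and pending tasks yields `["Execution"]`.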
+
+---
+
+## 📦 Installation
+
+| Method | Command / Link | Docs |
+|:-------|:---------------|:-----|
+| **Code** | **[Install Now](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%253A%252F%252Fraw.githubusercontent.com%252Fgithub%252Fawesome-copilot%252Fmain%252F.%252Fagents)** | [Copilot Docs](https://docs.github.com/en/copilot/using-github-copilot/using-github-copilot-chat) |
+| **Code Insiders** | **[Install Now](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%253A%252F%252Fraw.githubusercontent.com%252Fgithub%252Fawesome-copilot%252Fmain%252F.%252Fagents)** | [Copilot Docs](https://docs.github.com/en/copilot/using-github-copilot/using-github-copilot-chat) |
+| **APM <br/> (All AI coding agents)** | `apm install mubaidr/gem-team` | [APM Docs](https://microsoft.github.io/apm/) |
+| **Copilot CLI (Marketplace)** | `copilot plugin install gem-team@awesome-copilot` | [CLI Docs](https://github.com/github/copilot-cli) |
+| **Copilot CLI (Direct)** | `copilot plugin install gem-team@mubaidr` | [CLI Docs](https://github.com/github/copilot-cli) |
+| **Windsurf** | `codeium agent install mubaidr/gem-team` | [Windsurf Docs](https://docs.codeium.com/windsurf) |
+| **Claude Code** | `claude plugin install mubaidr/gem-team` | [Claude Docs](https://docs.anthropic.com/en/docs/claude-code) |
+| **OpenCode** | `opencode plugin install mubaidr/gem-team` | [OpenCode Docs](https://opencode.ai/docs/) |
+| **Manual <br/> (Copy agent files)** | VS Code: `~/.vscode/agents/` <br/> VS Code Insiders: `~/.vscode-insiders/agents/` <br/> GitHub Copilot: `~/.github/copilot/agents/` <br/> GitHub Copilot (project): `.github/plugin/agents/` <br/> Windsurf: `~/.windsurf/agents/` <br/> Claude: `~/.claude/agents/` <br/> Cursor: `~/.cursor/agents/` <br/> OpenCode: `~/.opencode/agents/` | — |

 ---

@@ -117,48 +164,21 @@ flowchart

 | Role | Description | Output | Recommended LLM |
 |:-----|:------------|:-------|:---------------|
-| 🎯 **ORCHESTRATOR** (`gem-orchestrator`) | The team lead: Orchestrates research, planning, implementation, and verification | 📋 PRD, plan.yaml | **Closed:** GPT-5.4, Gemini 3.1 Pro, Claude Sonnet 4.6<br>**Open:** GLM-5, Kimi K2.5, Qwen3.5 |
-| 🔍 **RESEARCHER** (`gem-researcher`) | Codebase exploration — patterns, dependencies, architecture discovery | 🔍 findings | **Closed:** Gemini 3.1 Pro, GPT-5.4, Claude Sonnet 4.6<br>**Open:** GLM-5, Qwen3.5-9B, DeepSeek-V3.2 |
-| 📋 **PLANNER** (`gem-planner`) | DAG-based execution plans — task decomposition, wave scheduling, risk analysis | 📄 plan.yaml | **Closed:** Gemini 3.1 Pro, Claude Sonnet 4.6, GPT-5.4<br>**Open:** Kimi K2.5, GLM-5, Qwen3.5 |
-| 🔧 **IMPLEMENTER** (`gem-implementer`) | TDD code implementation — features, bugs, refactoring. Never reviews own work | 💻 code | **Closed:** Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro<br>**Open:** DeepSeek-V3.2, GLM-5, Qwen3-Coder-Next |
-| 🧪 **BROWSER TESTER** (`gem-browser-tester`) | E2E browser testing, UI/UX validation, visual regression with Playwright | 🧪 evidence | **Closed:** GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Flash<br>**Open:** Llama 4 Maverick, Qwen3.5-Flash, MiniMax M2.7 |
-| 🚀 **DEVOPS** (`gem-devops`) | Infrastructure deployment, CI/CD pipelines, container management | 🌍 infra | **Closed:** GPT-5.4, Gemini 3.1 Pro, Claude Sonnet 4.6<br>**Open:** DeepSeek-V3.2, GLM-5, Qwen3.5 |
-| 🛡️ **REVIEWER** (`gem-reviewer`) | Security auditing, code review, OWASP scanning, PRD compliance verification | 📊 review report | **Closed:** Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro<br>**Open:** Kimi K2.5, GLM-5, DeepSeek-V3.2 |
-| 📝 **DOCUMENTATION** (`gem-documentation-writer`) | Technical documentation, README files, API docs, diagrams, walkthroughs | 📝 docs | **Closed:** Claude Sonnet 4.6, Gemini 3.1 Flash, GPT-5.4 Mini<br>**Open:** Llama 4 Scout, Qwen3.5-9B, MiniMax M2.7 |
-| 🔬 **DEBUGGER** (`gem-debugger`) | Root-cause analysis, stack trace diagnosis, regression bisection, error reproduction | 🔬 diagnosis | **Closed:** Gemini 3.1 Pro (Retrieval King), Claude Opus 4.6, GPT-5.4<br>**Open:** DeepSeek-V3.2, GLM-5, Qwen3-Coder-Next |
-| 🎯 **CRITIC** (`gem-critic`) | Challenges assumptions, finds edge cases, spots over-engineering and logic gaps | 💬 critique | **Closed:** Claude Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro<br>**Open:** Kimi K2.5, GLM-5, Qwen3.5 |
-| ✂️ **SIMPLIFIER** (`gem-code-simplifier`) | Refactoring specialist — removes dead code, reduces complexity, consolidates duplicates | ✂️ change log | **Closed:** Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro<br>**Open:** DeepSeek-V3.2, GLM-5, Qwen3-Coder-Next |
-| 🎨 **DESIGNER** (`gem-designer`) | UI/UX design specialist — layouts, themes, color schemes, design systems, accessibility | 🎨 DESIGN.md | **Closed:** GPT-5.4, Gemini 3.1 Pro, Claude Sonnet 4.6<br>**Open:** Qwen3.5, GLM-5, MiniMax M2.7 |
-| 📱 **IMPLEMENTER-MOBILE** (`gem-implementer-mobile`) | Mobile implementation — React Native, Expo, Flutter with TDD | 💻 code | **Closed:** Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro<br>**Open:** DeepSeek-V3.2, GLM-5, Qwen3-Coder-Next |
-| 📱 **DESIGNER-MOBILE** (`gem-designer-mobile`) | Mobile UI/UX specialist — HIG, Material Design, safe areas, touch targets | 🎨 DESIGN.md | **Closed:** GPT-5.4, Gemini 3.1 Pro, Claude Sonnet 4.6<br>**Open:** Qwen3.5, GLM-5, MiniMax M2.7 |
-| 📱 **MOBILE TESTER** (`gem-mobile-tester`) | Mobile E2E testing — Detox, Maestro, iOS/Android simulators | 🧪 evidence | **Closed:** GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Flash<br>**Open:** Llama 4 Maverick, Qwen3.5-Flash, MiniMax M2.7 |
-
-### Agent File Skeleton
-
-Each `.agent.md` file follows this structure:
-
-```
---                                    # Frontmatter: description, name, triggers
-# Role                                 # One-line identity
-# Expertise                            # Core competencies
-# Knowledge Sources                    # Prioritized reference list
-# Workflow                             # Step-by-step execution phases
-  ## 1. Initialize                     # Setup and context gathering
-  ## 2. Analyze/Execute                # Role-specific work
-  ## N. Self-Critique                  # Confidence check (≥0.85)
-  ## N+1. Handle Failure               # Retry/escalate logic
-  ## N+2. Output                       # JSON deliverable format
-# Input Format                         # Expected JSON schema
-# Output Format                        # Return JSON schema
-# Rules
-  ## Execution                         # Tool usage, batching, error handling
-  ## Constitutional                    # IF-THEN decision rules
-  ## Anti-Patterns                     # Behaviors to avoid
-  ## Anti-Rationalization              # Excuse → Rebuttal table
-  ## Directives                        # Non-negotiable commands
-```
-
-All agents share: Execution rules, Constitutional rules, Anti-Patterns, and Directives sections. Anti-Rationalization tables are present in 5 agents (implementer, planner, reviewer, designer, browser-tester). Role-specific sections (Workflow, Expertise, Knowledge Sources) vary by agent.
+| 🎯 **ORCHESTRATOR** | The team lead: Orchestrates research, planning, implementation, and verification | 📋 PRD, plan.yaml | **Closed:** GPT-5.4, Gemini 3.1 Pro, Claude Sonnet 4.6<br>**Open:** GLM-5, Kimi K2.5, Qwen3.5 |
+| 🔍 **RESEARCHER** | Codebase exploration — patterns, dependencies, architecture discovery | 🔍 findings | **Closed:** Gemini 3.1 Pro, GPT-5.4, Claude Sonnet 4.6<br>**Open:** GLM-5, Qwen3.5-9B, DeepSeek-V3.2 |
+| 📋 **PLANNER** | DAG-based execution plans — task decomposition, wave scheduling, risk analysis | 📄 plan.yaml | **Closed:** Gemini 3.1 Pro, Claude Sonnet 4.6, GPT-5.4<br>**Open:** Kimi K2.5, GLM-5, Qwen3.5 |
+| 🔧 **IMPLEMENTER** | TDD code implementation — features, bugs, refactoring. Never reviews own work | 💻 code | **Closed:** Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro<br>**Open:** DeepSeek-V3.2, GLM-5, Qwen3-Coder-Next |
+| 🧪 **BROWSER TESTER** | E2E browser testing, UI/UX validation, visual regression with Playwright | 🧪 evidence | **Closed:** GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Flash<br>**Open:** Llama 4 Maverick, Qwen3.5-Flash, MiniMax M2.7 |
+| 🚀 **DEVOPS** | Infrastructure deployment, CI/CD pipelines, container management | 🌍 infra | **Closed:** GPT-5.4, Gemini 3.1 Pro, Claude Sonnet 4.6<br>**Open:** DeepSeek-V3.2, GLM-5, Qwen3.5 |
+| 🛡️ **REVIEWER** | **Zero-Hallucination Filter** — Security auditing, code review, OWASP scanning, PRD compliance verification | 📊 review report | **Closed:** Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro<br>**Open:** Kimi K2.5, GLM-5, DeepSeek-V3.2 |
+| 📝 **DOCUMENTATION** | Technical documentation, README files, API docs, diagrams, walkthroughs | 📝 docs | **Closed:** Claude Sonnet 4.6, Gemini 3.1 Flash, GPT-5.4 Mini<br>**Open:** Llama 4 Scout, Qwen3.5-9B, MiniMax M2.7 |
+| 🔬 **DEBUGGER** | Root-cause analysis, stack trace diagnosis, regression bisection, error reproduction | 🔬 diagnosis | **Closed:** Gemini 3.1 Pro (Retrieval King), Claude Opus 4.6, GPT-5.4<br>**Open:** DeepSeek-V3.2, GLM-5, Qwen3-Coder-Next |
+| 🎯 **CRITIC** | Challenges assumptions, finds edge cases, spots over-engineering and logic gaps | 💬 critique | **Closed:** Claude Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro<br>**Open:** Kimi K2.5, GLM-5, Qwen3.5 |
+| ✂️ **SIMPLIFIER** | Refactoring specialist — removes dead code, reduces complexity, consolidates duplicates | ✂️ change log | **Closed:** Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro<br>**Open:** DeepSeek-V3.2, GLM-5, Qwen3-Coder-Next |
+| 🎨 **DESIGNER** | UI/UX design specialist — layouts, themes, color schemes, design systems, accessibility | 🎨 DESIGN.md | **Closed:** GPT-5.4, Gemini 3.1 Pro, Claude Sonnet 4.6<br>**Open:** Qwen3.5, GLM-5, MiniMax M2.7 |
+| 📱 **IMPLEMENTER-MOBILE** | Mobile implementation — React Native, Expo, Flutter with TDD | 💻 code | **Closed:** Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro<br>**Open:** DeepSeek-V3.2, GLM-5, Qwen3-Coder-Next |
+| 📱 **DESIGNER-MOBILE** | Mobile UI/UX specialist — HIG, Material Design, safe areas, touch targets | 🎨 DESIGN.md | **Closed:** GPT-5.4, Gemini 3.1 Pro, Claude Sonnet 4.6<br>**Open:** Qwen3.5, GLM-5, MiniMax M2.7 |
+| 📱 **MOBILE TESTER** | Mobile E2E testing — Detox, Maestro, iOS/Android simulators | 🧪 evidence | **Closed:** GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Flash<br>**Open:** Llama 4 Maverick, Qwen3.5-Flash, MiniMax M2.7 |

 ---

@@ -193,7 +213,7 @@ Contributions are welcome! Please feel free to submit a Pull Request. [CONTRIBUT

 ## 📄 License

-This project is licensed under the MIT License.
+This project is licensed under the Apache License 2.0.

 ## 💬 Support