Add tool guardian hook (#1044)

* Tool Guardian Hook

  Add Tool Guardian hook for blocking dangerous tool operations. Introduces a preToolUse hook that scans Copilot agent tool invocations against ~20 threat patterns (destructive file ops, force pushes, DB drops, permission abuse, network exfiltration) and blocks or warns before execution.

* Address review feedback: move hook to .github/, remove accidental log file
  - Move hooks/tool-guardian/ to .github/hooks/tool-guardian/
  - Remove accidentally committed guard.log
  - Update all path references in README.md

* Move log directory to .github/, revert hook files back to hooks/
  - Revert hook files from .github/hooks/ back to hooks/tool-guardian/
  - Update default log path to .github/logs/copilot/tool-guardian/
  - Update all path references in README.md and hooks.json
2026-06-27 17:51:02 +00:00 · 2026-03-19 10:36:48 +05:30
parent cb6cf924fb
commit 7446df7054
4 changed files with 402 additions and 0 deletions
@@ -36,3 +36,4 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-hooks) for guidelines on how to
 | [Secrets Scanner](../hooks/secrets-scanner/README.md) | Scans files modified during a Copilot coding agent session for leaked secrets, credentials, and sensitive data | sessionEnd | `hooks.json`<br />`scan-secrets.sh` |
 | [Session Auto-Commit](../hooks/session-auto-commit/README.md) | Automatically commits and pushes changes when a Copilot coding agent session ends | sessionEnd | `auto-commit.sh`<br />`hooks.json` |
 | [Session Logger](../hooks/session-logger/README.md) | Logs all Copilot coding agent session activity for audit and analysis | sessionStart, sessionEnd, userPromptSubmitted | `hooks.json`<br />`log-prompt.sh`<br />`log-session-end.sh`<br />`log-session-start.sh` |
+| [Tool Guardian](../hooks/tool-guardian/README.md) | Blocks dangerous tool operations (destructive file ops, force pushes, DB drops) before the Copilot coding agent executes them | preToolUse | `guard-tool.sh`<br />`hooks.json` |
@@ -0,0 +1,183 @@
+---
+name: 'Tool Guardian'
+description: 'Blocks dangerous tool operations (destructive file ops, force pushes, DB drops) before the Copilot coding agent executes them'
+tags: ['security', 'safety', 'preToolUse', 'guardrails']
+---
+
+# Tool Guardian Hook
+
+Blocks dangerous tool operations before a GitHub Copilot coding agent executes them, acting as a safety net against destructive commands, force pushes, database drops, and other high-risk actions.
+
+## Overview
+
+AI coding agents can autonomously execute shell commands, file operations, and database queries. Without guardrails, a misinterpreted instruction could lead to irreversible damage. This hook intercepts every tool invocation at the `preToolUse` event and scans it against ~20 threat patterns across 6 categories:
+
+- **Destructive file ops**: `rm -rf /`, deleting `.env` or `.git`
+- **Destructive git ops**: `git push --force` to main/master, `git reset --hard`
+- **Database destruction**: `DROP TABLE`, `DROP DATABASE`, `TRUNCATE`, `DELETE FROM` without `WHERE`
+- **Permission abuse**: `chmod 777`, recursive world-writable permissions
+- **Network exfiltration**: `curl | bash`, `wget | sh`, uploading files via `curl --data @`
+- **System danger**: `sudo`, `npm publish`
+
+## Features
+
+- **Two guard modes**: `block` (exit non-zero to prevent execution) or `warn` (log only)
+- **Safer alternatives**: Every blocked pattern includes a suggestion for a safer command
+- **Allowlist support**: Skip specific patterns via `TOOL_GUARD_ALLOWLIST`
+- **Structured logging**: JSON Lines output for integration with monitoring tools
+- **Bounded runtime**: Runs under a 10-second timeout (`timeoutSec`) and makes no network calls
+- **Minimal dependencies**: Needs only `bash` and standard Unix tools (`grep`, `sed`); `jq` is optional for input parsing
+
+## Installation
+
+1. Copy the hook folder to your repository:
+
+   ```bash
+   cp -r hooks/tool-guardian your-repo/hooks/
+   ```
+
+2. Ensure the script is executable:
+
+   ```bash
+   chmod +x hooks/tool-guardian/guard-tool.sh
+   ```
+
+3. Create the logs directory and add it to `.gitignore`:
+
+   ```bash
+   mkdir -p .github/logs/copilot/tool-guardian
+   echo ".github/logs/" >> .gitignore
+   ```
+
+4. Commit the hook configuration to your repository's default branch.
+
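A quick post-install check can catch a wrong path or a missing execute bit before the agent first invokes the hook. The sketch below is hypothetical: the function `check_hook_config` and its default config path `hooks/tool-guardian/hooks.json` are assumptions based on step 1, not part of this PR. It reads the `"bash"` entry with `sed` rather than a JSON parser, so it only handles the simple layout shown in the Configuration section.

```shell
#!/bin/sh
# Hypothetical post-install check (not shipped with the hook): confirm that
# the script referenced by hooks.json exists and is executable.
# Run from the repository root, because the config uses "cwd": ".".
check_hook_config() {
  config=${1:-hooks/tool-guardian/hooks.json}
  [ -f "$config" ] || { echo "missing $config" >&2; return 1; }

  # Take the first "bash": "<path>" value; a crude sed parse, no jq needed.
  script=$(sed -n 's/.*"bash"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config" | head -n 1)
  [ -n "$script" ] || { echo "no \"bash\" entry in $config" >&2; return 1; }

  # Step 2 of the installation: the script must carry an execute bit.
  [ -x "$script" ] || { echo "$script is missing or not executable" >&2; return 1; }

  echo "ok: $config -> $script"
}
```

Running `check_hook_config` after step 2 should print an `ok:` line; a failure message points at the step to revisit.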
+## Configuration
+
+The hook is configured in `hooks.json` to run on the `preToolUse` event:
+
+```json
+{
+  "version": 1,
+  "hooks": {
+    "preToolUse": [
+      {
+        "type": "command",
+        "bash": "hooks/tool-guardian/guard-tool.sh",
+        "cwd": ".",
+        "env": {
+          "GUARD_MODE": "block"
+        },
+        "timeoutSec": 10
+      }
+    ]
+  }
+}
+```
+
+### Environment Variables
+
+| Variable | Values | Default | Description |
+|----------|--------|---------|-------------|
+| `GUARD_MODE` | `warn`, `block` | `block` | `warn` logs threats only; `block` exits non-zero to prevent tool execution |
+| `SKIP_TOOL_GUARD` | `true` | unset | Disable the guardian entirely |
+| `TOOL_GUARD_LOG_DIR` | path | `.github/logs/copilot/tool-guardian` | Directory where guard logs are written |
+| `TOOL_GUARD_ALLOWLIST` | comma-separated | unset | Literal substrings, not regexes; if any appears in the invocation, scanning is skipped entirely (e.g., `git push --force,npm publish`) |
+
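`TOOL_GUARD_ALLOWLIST` entries are compared as literal substrings, and a single hit skips every pattern for that invocation. The POSIX `sh` sketch below mirrors the `is_allowlisted` logic in `guard-tool.sh` under that reading; the name `allowlist_match` is hypothetical. It also shows a consequence worth knowing: an allowlisted prefix exempts anything chained after it in the same command.

```shell
#!/bin/sh
# Hypothetical helper mirroring the allowlist check in guard-tool.sh.
# Usage: allowlist_match "<toolName> <toolInput>" "<comma-separated allowlist>"
# Returns 0 if any trimmed, non-empty entry is a substring of the text.
allowlist_match() {
  text=$1
  rest=$2,                      # trailing comma so the last entry is consumed too
  while [ -n "$rest" ]; do
    entry=${rest%%,*}           # text up to the first comma
    rest=${rest#*,}             # drop that entry and its comma
    # Trim surrounding whitespace, as the hook does with sed.
    entry=$(printf '%s' "$entry" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
    [ -z "$entry" ] && continue
    case $text in
      *"$entry"*) return 0 ;;   # quoted: literal substring, no glob or regex
    esac
  done
  return 1
}

# An allowlisted force push also exempts whatever is chained after it.
if allowlist_match "bash git push --force origin main && rm -rf /" "git push --force"; then
  echo "skipped: the whole invocation bypasses scanning"
fi
```

Because matching is all-or-nothing, narrow entries (for example a full `git push --force origin feature-x`) are safer than short prefixes.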
+## How It Works
+
+1. Before the Copilot coding agent executes a tool, the hook receives the tool invocation as JSON on stdin
+2. Extracts the `toolName` and `toolInput` fields (via `jq` if available; falls back to `grep`/`sed` when `jq` is missing or returns empty fields)
+3. Checks the combined text against the allowlist; if any entry appears, skips all scanning
+4. Scans the combined text against ~20 regex threat patterns across 6 threat categories
+5. Reports findings with category, severity, matched text, and a safer alternative
+6. Writes a structured JSON log entry for audit purposes
+7. In `block` mode, exits non-zero to prevent the tool from executing
+8. In `warn` mode, logs the threat and allows execution to proceed
+
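The steps above can be condensed into a small, hypothetical `mini_guard` function. It is a sketch rather than the shipped script: it uses only the `sed` fallback for parsing, checks two of the ~20 patterns, omits the allowlist and logging, and returns instead of exiting so other shell code can call it.

```shell
#!/bin/sh
# Hypothetical, stripped-down version of the guard-tool.sh flow.
# Reads {"toolName": "...", "toolInput": "..."} on stdin and returns
# 1 in block mode when a threat pattern matches, 0 otherwise.
mini_guard() {
  input=$(cat)
  mode=${GUARD_MODE:-block}

  # Step 2: pull out the two string fields (sed only, no jq).
  name=$(printf '%s' "$input" | sed -n 's/.*"toolName"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  cmd=$(printf '%s' "$input" | sed -n 's/.*"toolInput"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  combined="$name $cmd"

  # Step 4: two sample patterns out of the hook's ~20.
  found=0
  for regex in 'git push --force.*(main|master)' 'rm -rf /'; do
    if printf '%s\n' "$combined" | grep -qiE "$regex"; then
      echo "threat: $regex" >&2
      found=$((found + 1))
    fi
  done

  # Steps 7-8: only block mode turns a finding into a non-zero status.
  if [ "$found" -gt 0 ] && [ "$mode" = "block" ]; then
    return 1
  fi
  return 0
}

echo '{"toolName":"bash","toolInput":"git status"}' | mini_guard && echo "allowed: git status"
```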
+## Threat Categories
+
+| Category | Severity | Key Patterns | Suggestion |
+|----------|----------|-------------|------------|
+| `destructive_file_ops` | critical | `rm -rf /`, `rm -rf ~`, `rm -rf .`, delete `.env`/`.git` | Use targeted paths or `mv` to back up |
+| `destructive_git_ops` | critical/high | `git push --force` to main/master, `git reset --hard`, `git clean -fd` | Use `--force-with-lease`, `git stash`, dry-run |
+| `database_destruction` | critical/high | `DROP TABLE`, `DROP DATABASE`, `TRUNCATE`, `DELETE FROM <table>;` without `WHERE` | Use migrations, backups, add a `WHERE` clause |
+| `permission_abuse` | high | `chmod 777`, `chmod -R 777` | Use `755` for dirs, `644` for files |
+| `network_exfiltration` | critical/high | `curl \| bash`, `wget \| sh`, `curl --data @file` | Download first, review, then execute |
+| `system_danger` | high | `sudo`, `npm publish` | Use least privilege; `--dry-run` first |
+
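Each table row corresponds to one or more `PATTERNS` entries in `guard-tool.sh`, stored as `CATEGORY:::SEVERITY:::REGEX:::SUGGESTION` strings. The sketch below splits one entry with the same parameter expansions the script uses and matches it with `grep -oiE`; `check_entry` is a hypothetical helper name, and the tab-separated output mirrors the format the script stores internally before printing its table.

```shell
#!/bin/sh
# Hypothetical helper: split one PATTERNS entry on ':::' and test it against
# a "<toolName> <toolInput>" string, printing a tab-separated finding on a match.
check_entry() {
  entry=$1
  text=$2
  category=${entry%%:::*}       # everything before the first ':::'
  rest=${entry#*:::}
  severity=${rest%%:::*}
  rest=${rest#*:::}
  regex=${rest%%:::*}           # may contain '|', hence the ':::' delimiter
  suggestion=${rest#*:::}

  # Same case-insensitive extended-regex match the hook uses; keep the first hit.
  match=$(printf '%s\n' "$text" | grep -oiE "$regex" | head -n 1)
  [ -n "$match" ] || return 1
  printf '%s\t%s\t%s\t%s\n' "$category" "$severity" "$match" "$suggestion"
}

check_entry "destructive_git_ops:::high:::git reset --hard:::Use 'git stash' to preserve changes, or 'git reset --soft'" \
  "bash git reset --hard HEAD~1"
```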
+## Examples
+
+### Safe command (exit 0)
+
+```bash
+echo '{"toolName":"bash","toolInput":"git status"}' | bash hooks/tool-guardian/guard-tool.sh
+```
+
+### Blocked command (exit 1)
+
+```bash
+echo '{"toolName":"bash","toolInput":"git push --force origin main"}' | \
+  GUARD_MODE=block bash hooks/tool-guardian/guard-tool.sh
+```
+
+```
+🛡️  Tool Guardian: 1 threat(s) detected in 'bash' invocation
+
+  CATEGORY                 SEVERITY   MATCH                                    SUGGESTION
+  --------                 --------   -----                                    ----------
+  destructive_git_ops      critical   git push --force origin main             Use 'git push --force-with-lease' or push to a feature branch
+
+🚫 Operation blocked: resolve the threats above or adjust TOOL_GUARD_ALLOWLIST.
+   Set GUARD_MODE=warn to log without blocking.
+```
+
+### Warn mode (exit 0, threat logged)
+
+```bash
+echo '{"toolName":"bash","toolInput":"rm -rf /"}' | \
+  GUARD_MODE=warn bash hooks/tool-guardian/guard-tool.sh
+```
+
+### Allowlisted command (exit 0)
+
+```bash
+echo '{"toolName":"bash","toolInput":"git push --force origin main"}' | \
+  TOOL_GUARD_ALLOWLIST="git push --force" bash hooks/tool-guardian/guard-tool.sh
+```
+
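To check the four examples above in one pass, a smoke test along these lines could be used. It is a sketch that assumes the hook is at `hooks/tool-guardian/guard-tool.sh` (overridable through a hypothetical `GUARD_SCRIPT` variable) and that it runs from the repository root; each run appends an entry to `guard.log`, exactly as a normal hook invocation would.

```shell
#!/bin/sh
# Hypothetical smoke test for the documented examples (not part of the hook).
GUARD_SCRIPT=${GUARD_SCRIPT:-hooks/tool-guardian/guard-tool.sh}

# expect_exit WANT PAYLOAD [NAME=value ...]
# Pipes PAYLOAD into the hook with the extra environment given and compares
# the hook's exit status with WANT.
expect_exit() {
  want=$1
  payload=$2
  shift 2
  printf '%s' "$payload" | env "$@" bash "$GUARD_SCRIPT" >/dev/null 2>&1
  got=$?
  if [ "$got" -eq "$want" ]; then
    echo "ok   exit=$got  $*  $payload"
  else
    echo "FAIL exit=$got (want $want)  $*  $payload"
    return 1
  fi
}

# Only run the examples if the hook is actually present.
if [ -f "$GUARD_SCRIPT" ]; then
  expect_exit 0 '{"toolName":"bash","toolInput":"git status"}'
  expect_exit 1 '{"toolName":"bash","toolInput":"git push --force origin main"}' GUARD_MODE=block
  expect_exit 0 '{"toolName":"bash","toolInput":"rm -rf /"}' GUARD_MODE=warn
  expect_exit 0 '{"toolName":"bash","toolInput":"git push --force origin main"}' TOOL_GUARD_ALLOWLIST="git push --force"
fi
```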
+## Log Format
+
+By default, guard events are appended to `.github/logs/copilot/tool-guardian/guard.log` in JSON Lines format:
+
+```json
+{"timestamp":"2026-03-16T10:30:00Z","event":"threats_detected","mode":"block","tool":"bash","threat_count":1,"threats":[{"category":"destructive_git_ops","severity":"critical","match":"git push --force origin main","suggestion":"Use 'git push --force-with-lease' or push to a feature branch"}]}
+```
+
+```json
+{"timestamp":"2026-03-16T10:30:00Z","event":"guard_passed","mode":"block","tool":"bash"}
+```
+
+```json
+{"timestamp":"2026-03-16T10:30:00Z","event":"guard_skipped","reason":"allowlisted","tool":"bash"}
+```
+
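Because every entry is compact single-line JSON, a log can be summarized without `jq`. The helpers below (`count_events`, `count_categories`, and the `GUARD_LOG` variable) are hypothetical and assume the exact `"key":"value"` spacing the hook writes; with `jq` installed, a JSON-aware query is the sturdier choice.

```shell
#!/bin/sh
# Hypothetical helpers for reading guard.log with grep/sed/sort/uniq only.
GUARD_LOG=${GUARD_LOG:-.github/logs/copilot/tool-guardian/guard.log}

# Count entries per "event" value (threats_detected, guard_passed, guard_skipped).
count_events() {
  sed -n 's/.*"event":"\([a-z_]*\)".*/\1/p' "${1:-$GUARD_LOG}" | sort | uniq -c | sort -rn
}

# Count individual findings per threat category across all entries.
count_categories() {
  grep -o '"category":"[a-z_]*"' "${1:-$GUARD_LOG}" \
    | sed 's/"category":"\([a-z_]*\)"/\1/' | sort | uniq -c | sort -rn
}

if [ -f "$GUARD_LOG" ]; then
  count_events
  count_categories
fi
```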
+## Customization
+
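+- **Add custom patterns**: Edit the `PATTERNS` array in `guard-tool.sh`; each entry is a `CATEGORY:::SEVERITY:::REGEX:::SUGGESTION` string, and `REGEX` is matched case-insensitively as an extended regex (`grep -iE`)
+- **Adjust severity**: Change a pattern's severity label; severity is informational only, since any match blocks execution in `block` mode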
+- **Allowlist known commands**: Use `TOOL_GUARD_ALLOWLIST` for commands that are safe in your context
+- **Change log location**: Set `TOOL_GUARD_LOG_DIR` to route logs to your preferred directory
+
+## Disabling
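Before adding a custom pattern, it helps to run the regex against commands that should and should not trigger it, using the same `grep -iE` call and `<toolName> <toolInput>` text the hook builds. The `try_pattern` helper and the Terraform pattern below are illustrative assumptions, not part of this hook.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a custom PATTERNS regex: match it the way
# the hook does ("bash <command>", case-insensitive ERE) against sample commands.
try_pattern() {
  regex=$1
  shift
  for sample in "$@"; do
    if printf '%s\n' "bash $sample" | grep -qiE "$regex"; then
      printf 'MATCH    %s\n' "$sample"
    else
      printf 'no match %s\n' "$sample"
    fi
  done
}

# Example project-specific entry one might add (hypothetical category name):
#   "infra_destruction:::critical:::terraform destroy|terraform apply .*-destroy:::Run 'terraform plan -destroy' first"
try_pattern 'terraform destroy|terraform apply .*-destroy' \
  'terraform destroy -auto-approve' \
  'terraform apply -destroy' \
  'terraform plan -destroy'
```

The last sample should report `no match`, confirming the preview command stays allowed.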
+
+To temporarily disable the guardian:
+
+- Set `SKIP_TOOL_GUARD=true` in the hook environment
+- Or remove the `preToolUse` entry from `hooks.json`
+
+## Limitations
+
+- Pattern-based detection; does not perform semantic analysis of command intent
+- May produce false positives for commands that match patterns in safe contexts (use the allowlist to suppress these)
+- Scans the text representation of tool input; cannot detect obfuscated or encoded commands
+- Requires tool invocations to be passed as JSON on stdin with `toolName` and `toolInput` fields
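The false-positive and detection limits above are easy to see with the hook's own `rm -rf /` regex: simple flag reordering slips past it, while an ordinary cleanup of an absolute subdirectory is flagged. This sketch only matches strings with `grep`; it never executes the commands.

```shell
#!/bin/sh
# Illustrates pattern-matching limits using the hook's 'rm -rf /' regex.
# Strings are only matched; nothing is executed.
classify_cmd() {
  if printf 'bash %s\n' "$1" | grep -qiE 'rm -rf /'; then
    echo flagged
  else
    echo missed
  fi
}

for cmd in 'rm -rf /' 'rm -fr /' 'rm -r -f /' 'rm -rf /tmp/build'; do
  printf '%-8s %s\n' "$(classify_cmd "$cmd")" "$cmd"
done
```

Here `rm -fr /` and `rm -r -f /` are reported as `missed`, and `rm -rf /tmp/build` as `flagged`.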
@@ -0,0 +1,202 @@
+#!/bin/bash
+
+# Tool Guardian Hook
+# Blocks dangerous tool operations (destructive file ops, force pushes, DB drops,
+# etc.) before the Copilot coding agent executes them.
+#
+# Environment variables:
+#   GUARD_MODE           - "warn" (log only) or "block" (exit non-zero on threats) (default: block)
+#   SKIP_TOOL_GUARD      - "true" to disable entirely (default: unset)
+#   TOOL_GUARD_LOG_DIR   - Directory for guard logs (default: .github/logs/copilot/tool-guardian)
+#   TOOL_GUARD_ALLOWLIST - Comma-separated patterns to skip (default: unset)
+
+set -euo pipefail
+
+# ---------------------------------------------------------------------------
+# Early exit if disabled
+# ---------------------------------------------------------------------------
+if [[ "${SKIP_TOOL_GUARD:-}" == "true" ]]; then
+  exit 0
+fi
+
+# ---------------------------------------------------------------------------
+# Read tool invocation from stdin (JSON with toolName + toolInput)
+# ---------------------------------------------------------------------------
+INPUT=$(cat)
+
+MODE="${GUARD_MODE:-block}"
+LOG_DIR="${TOOL_GUARD_LOG_DIR:-.github/logs/copilot/tool-guardian}"
+TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
+
+mkdir -p "$LOG_DIR"
+LOG_FILE="$LOG_DIR/guard.log"
+
+# ---------------------------------------------------------------------------
+# Extract tool name and input text
+# ---------------------------------------------------------------------------
+TOOL_NAME=""
+TOOL_INPUT=""
+
+if command -v jq &>/dev/null; then
+  TOOL_NAME=$(printf '%s' "$INPUT" | jq -r '.toolName // empty' 2>/dev/null || echo "")
+  TOOL_INPUT=$(printf '%s' "$INPUT" | jq -r '.toolInput // empty' 2>/dev/null || echo "")
+fi
+
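+# Fallback: extract with grep/sed if jq is unavailable or returned empty fields.
+# POSIX character classes keep this portable to BSD/macOS grep and sed, and
+# `|| true` stops `set -euo pipefail` from aborting the hook when grep finds
+# no match (e.g. a missing field or a non-string toolInput).
+if [[ -z "$TOOL_NAME" ]]; then
+  TOOL_NAME=$(printf '%s' "$INPUT" | grep -oE '"toolName"[[:space:]]*:[[:space:]]*"[^"]*"' | head -1 | sed 's/.*"toolName"[[:space:]]*:[[:space:]]*"//;s/"$//' || true)
+fi
+if [[ -z "$TOOL_INPUT" ]]; then
+  TOOL_INPUT=$(printf '%s' "$INPUT" | grep -oE '"toolInput"[[:space:]]*:[[:space:]]*"[^"]*"' | head -1 | sed 's/.*"toolInput"[[:space:]]*:[[:space:]]*"//;s/"$//' || true)
+fi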
+
+# Combine for pattern matching
+COMBINED="${TOOL_NAME} ${TOOL_INPUT}"
+
+# ---------------------------------------------------------------------------
+# Parse allowlist
+# ---------------------------------------------------------------------------
+ALLOWLIST=()
+if [[ -n "${TOOL_GUARD_ALLOWLIST:-}" ]]; then
+  IFS=',' read -ra ALLOWLIST <<< "$TOOL_GUARD_ALLOWLIST"
+fi
+
+is_allowlisted() {
+  local text="$1"
+  for pattern in "${ALLOWLIST[@]}"; do
+    pattern=$(printf '%s' "$pattern" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
+    [[ -z "$pattern" ]] && continue
+    if [[ "$text" == *"$pattern"* ]]; then
+      return 0
+    fi
+  done
+  return 1
+}
+
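+# Check the allowlist early: if the combined text matches, skip all scanning.
+# json_escape is defined further down, so escape the tool name inline here.
+if [[ ${#ALLOWLIST[@]} -gt 0 ]] && is_allowlisted "$COMBINED"; then
+  SAFE_TOOL_NAME=$(printf '%s' "$TOOL_NAME" | sed 's/\\/\\\\/g; s/"/\\"/g')
+  printf '{"timestamp":"%s","event":"guard_skipped","reason":"allowlisted","tool":"%s"}\n' \
+    "$TIMESTAMP" "$SAFE_TOOL_NAME" >> "$LOG_FILE"
+  exit 0
+fi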
+
+# ---------------------------------------------------------------------------
+# Threat patterns (6 categories, ~20 patterns)
+#
+# Each entry: "CATEGORY:::SEVERITY:::REGEX:::SUGGESTION"
+# Uses ::: as delimiter to avoid conflicts with regex pipe characters
+# ---------------------------------------------------------------------------
+PATTERNS=(
+  # Destructive file operations
+  "destructive_file_ops:::critical:::rm -rf /:::Use targeted 'rm' on specific paths instead of root"
+  "destructive_file_ops:::critical:::rm -rf ~:::Use targeted 'rm' on specific paths instead of home directory"
+  "destructive_file_ops:::critical:::rm -rf \.:::Use targeted 'rm' on specific files instead of current directory"
+  "destructive_file_ops:::critical:::rm -rf \.\.:::Never remove parent directories recursively"
+  "destructive_file_ops:::critical:::(rm|del|unlink).*\.env:::Use 'mv' to back up .env files before removing"
+  "destructive_file_ops:::critical:::(rm|del|unlink).*\.git([^a-zA-Z0-9_.-]|$):::Never delete the .git directory; use 'git' commands to manage repo state"
+
+  # Destructive git operations
+  "destructive_git_ops:::critical:::git push --force.*(main|master):::Use 'git push --force-with-lease' or push to a feature branch"
+  "destructive_git_ops:::critical:::git push -f.*(main|master):::Use 'git push --force-with-lease' or push to a feature branch"
+  "destructive_git_ops:::high:::git reset --hard:::Use 'git stash' to preserve changes, or 'git reset --soft'"
+  "destructive_git_ops:::high:::git clean -fd:::Use 'git clean -n' (dry run) first to preview what will be deleted"
+
+  # Database destruction
+  "database_destruction:::critical:::DROP TABLE:::Use 'ALTER TABLE' or create a migration with rollback support"
+  "database_destruction:::critical:::DROP DATABASE:::Create a backup first; consider revoking DROP privileges"
+  "database_destruction:::critical:::TRUNCATE:::Use 'DELETE FROM ... WHERE' with a condition for safer data removal"
+  "database_destruction:::high:::DELETE FROM [a-zA-Z_]+ *;:::Add a WHERE clause to 'DELETE FROM' to avoid deleting all rows"
+
+  # Permission abuse
+  "permission_abuse:::high:::chmod 777:::Use 'chmod 755' for directories or 'chmod 644' for files"
+  "permission_abuse:::high:::chmod -R 777:::Use specific permissions ('chmod -R 755') and limit scope"
+
+  # Network exfiltration
+  "network_exfiltration:::critical:::curl.*\|.*bash:::Download the script first, review it, then execute"
+  "network_exfiltration:::critical:::wget.*\|.*sh:::Download the script first, review it, then execute"
+  "network_exfiltration:::high:::curl.*--data.*@:::Review what data is being sent before using 'curl --data @file'"
+
+  # System danger
+  "system_danger:::high:::sudo :::Avoid 'sudo' — run commands with the least privilege needed"
+  "system_danger:::high:::npm publish:::Use 'npm publish --dry-run' first to verify package contents"
+)
+
+# ---------------------------------------------------------------------------
+# Escape a string for safe JSON embedding
+# ---------------------------------------------------------------------------
+json_escape() {
+  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g; s/	/\\t/g'
+}
+
+# ---------------------------------------------------------------------------
+# Scan combined text against threat patterns
+# ---------------------------------------------------------------------------
+THREATS=()
+THREAT_COUNT=0
+
+for entry in "${PATTERNS[@]}"; do
+  category="${entry%%:::*}"
+  rest="${entry#*:::}"
+  severity="${rest%%:::*}"
+  rest="${rest#*:::}"
+  regex="${rest%%:::*}"
+  suggestion="${rest#*:::}"
+
+  if printf '%s\n' "$COMBINED" | grep -qiE "$regex" 2>/dev/null; then
+    local_match=$(printf '%s\n' "$COMBINED" | grep -oiE "$regex" 2>/dev/null | head -1)
+    THREATS+=("${category}	${severity}	${local_match}	${suggestion}")
+    THREAT_COUNT=$((THREAT_COUNT + 1))
+  fi
+done
+
+# ---------------------------------------------------------------------------
+# Output and logging
+# ---------------------------------------------------------------------------
+if [[ $THREAT_COUNT -gt 0 ]]; then
+  echo ""
+  echo "🛡️  Tool Guardian: $THREAT_COUNT threat(s) detected in '$TOOL_NAME' invocation"
+  echo ""
+  printf "  %-24s %-10s %-40s %s\n" "CATEGORY" "SEVERITY" "MATCH" "SUGGESTION"
+  printf "  %-24s %-10s %-40s %s\n" "--------" "--------" "-----" "----------"
+
+  # Build JSON findings array
+  FINDINGS_JSON="["
+  FIRST=true
+  for threat in "${THREATS[@]}"; do
+    IFS=$'\t' read -r category severity match suggestion <<< "$threat"
+
+    # Truncate match for display
+    display_match="$match"
+    if [[ ${#match} -gt 38 ]]; then
+      display_match="${match:0:35}..."
+    fi
+    printf "  %-24s %-10s %-40s %s\n" "$category" "$severity" "$display_match" "$suggestion"
+
+    if [[ "$FIRST" != "true" ]]; then
+      FINDINGS_JSON+=","
+    fi
+    FIRST=false
+    FINDINGS_JSON+="{\"category\":\"$(json_escape "$category")\",\"severity\":\"$(json_escape "$severity")\",\"match\":\"$(json_escape "$match")\",\"suggestion\":\"$(json_escape "$suggestion")\"}"
+  done
+  FINDINGS_JSON+="]"
+
+  echo ""
+
+  # Write structured log entry
+  printf '{"timestamp":"%s","event":"threats_detected","mode":"%s","tool":"%s","threat_count":%d,"threats":%s}\n' \
+    "$TIMESTAMP" "$MODE" "$(json_escape "$TOOL_NAME")" "$THREAT_COUNT" "$FINDINGS_JSON" >> "$LOG_FILE"
+
+  if [[ "$MODE" == "block" ]]; then
+    echo "🚫 Operation blocked: resolve the threats above or adjust TOOL_GUARD_ALLOWLIST."
+    echo "   Set GUARD_MODE=warn to log without blocking."
+    exit 1
+  else
+    echo "⚠️  Threats logged in warn mode. Set GUARD_MODE=block to prevent dangerous operations."
+  fi
+else
+  # Log clean result
+  printf '{"timestamp":"%s","event":"guard_passed","mode":"%s","tool":"%s"}\n' \
+    "$TIMESTAMP" "$MODE" "$(json_escape "$TOOL_NAME")" >> "$LOG_FILE"
+fi
+
+exit 0
@@ -0,0 +1,16 @@
+{
+  "version": 1,
+  "hooks": {
+    "preToolUse": [
+      {
+        "type": "command",
+        "bash": "hooks/tool-guardian/guard-tool.sh",
+        "cwd": ".",
+        "env": {
+          "GUARD_MODE": "block"
+        },
+        "timeoutSec": 10
+      }
+    ]
+  }
+}