add mini-context-graph skill (#1580)

* add mini-context-graph skill

* remove pycache files

* filename case update to SKILL.md

* update readme
This commit is contained in:
Nixon Kurian
2026-05-05 09:34:37 +05:30
committed by GitHub
parent 1f96bce626
commit 746ba555b6
16 changed files with 2343 additions and 0 deletions
@@ -0,0 +1,196 @@
# Ingestion Instructions
This file defines how the agent extracts entities and relations from a raw document.
---
## Step 1: Read the Document
Read the provided text carefully. Identify:
- **Entities**: noun phrases that refer to real-world objects, systems, components, actors, concepts, or events.
- **Relations**: verb phrases that describe how one entity affects, contains, causes, uses, or is related to another.
---
## Step 2: Extract Entities
For each entity:
- Record its **name** (normalized: lowercased, with leading/trailing whitespace stripped)
- Assign a **type**: a short label (1–3 words) that categorizes the entity
### Entity Type Examples
| Entity Name | Suggested Type |
|-------------|---------------|
| Python interpreter | software |
| memory leak | issue |
| operating system | system |
| database | infrastructure |
| user | actor |
| API endpoint | interface |
| server | infrastructure |
**Rules:**
- Types must be general enough to reuse across documents
- Do NOT create unique types per entity (e.g., avoid `python-interpreter-type`)
- Use `ontology.md` normalization rules to canonicalize types
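A minimal sketch of this normalization, with a tiny inline synonym table standing in for the full mapping in `ontology.md` (the helper name and table subset are illustrative, not part of the skill's API):
```python
# Hypothetical helper; the authoritative synonym table lives in ontology.md.
SYNONYMS = {"bug": "issue", "defect": "issue", "host": "infrastructure"}

def normalize_entity(name: str, raw_type: str) -> tuple[str, str]:
    """Lowercase and strip the name; canonicalize the type."""
    clean_type = raw_type.strip().lower().replace("_", " ").replace("-", " ")
    return name.strip().lower(), SYNONYMS.get(clean_type, clean_type)

print(normalize_entity("  Memory Leak ", "Bug"))  # ('memory leak', 'issue')
```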
---
## Step 3: Extract Relations
For each pair of entities with an explicit connection in the text:
- Record the **source** entity name
- Record the **target** entity name
- Record the **relation type**: a verb or verb phrase (normalized: lowercase)
- Assign a **confidence** score between 0 and 1:
- 1.0 = stated explicitly ("A causes B")
- 0.8 = strongly implied ("A is linked to B")
- 0.6 = weakly implied ("A may affect B")
- < 0.6 = do NOT include
---
## Step 4: Output Format
Produce a JSON object in this exact format:
```json
{
"entities": [
{ "name": "entity name", "type": "entity type", "supporting_text": "exact quote mentioning this entity" }
],
"relations": [
{
"source": "source entity name",
"target": "target entity name",
"type": "relation type",
"confidence": 0.9,
"supporting_text": "exact quote that justifies this relation"
}
]
}
```
The `supporting_text` field is **required for provenance**. It must be a verbatim or near-verbatim quote from the document that mentions or supports the entity/relation. This is what links graph nodes and edges back to their source.
---
## Rules
- All names and types must be **lowercase**
- Only include relations where **both entities** are present in the entities list
- Do NOT invent entities or relations not supported by the text
- Prefer **reusing existing entity and relation types** from the ontology over creating new ones
- One entity can appear in multiple relations (as source or target)
- Always include `supporting_text` — this enables evidence retrieval and audit trails
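These rules lend themselves to a mechanical pre-check. Here is a hypothetical validator sketch (not part of the skill's API) that flags violations before ingestion:
```python
def validate_extraction(output: dict) -> list[str]:
    """Return a list of rule violations for an extraction result."""
    errors = []
    names = {e["name"] for e in output.get("entities", [])}
    for e in output.get("entities", []):
        if e["name"] != e["name"].lower() or e["type"] != e["type"].lower():
            errors.append(f"not lowercase: {e['name']}")
        if not e.get("supporting_text"):
            errors.append(f"missing supporting_text: {e['name']}")
    for r in output.get("relations", []):
        if r["source"] not in names or r["target"] not in names:
            errors.append(f"endpoint not in entities list: {r['source']} -> {r['target']}")
        if r.get("confidence", 0.0) < 0.6:
            errors.append(f"confidence below 0.6: {r['source']} -> {r['target']}")
        if not r.get("supporting_text"):
            errors.append(f"missing supporting_text: {r['source']} -> {r['target']}")
    return errors
```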
---
## Step 5: Write Wiki Pages (Required)
After calling `skill.ingest_with_content(...)`, you MUST write wiki pages:
### 5a. Write a summary page for the document
```python
from scripts.tools import wiki_store
# `title`, `source`, `doc_id`, `entities`, and `relations` come from the
# ingestion output produced in Steps 2-4.
wiki_store.write_page(
    category="summary",
    title=f"{title} Summary",
    content=f"""---
title: {title}
source_document: {doc_id}
tags: [summary]
---
# {title}
**Source:** {source}
## Key Claims
{chr(10).join(f'- [[{r["source"].replace(" ", "-")}]] {r["type"]} [[{r["target"].replace(" ", "-")}]] (confidence: {r["confidence"]})' for r in relations)}
## Entities
{chr(10).join(f'- [[{e["name"].replace(" ", "-")}]] ({e["type"]})' for e in entities)}
## Open Questions
- (Add questions from reading the document here)
""",
    summary=f"Summary of {title}",
)
```
### 5b. Write or update entity pages
For each **new** entity not already in the wiki, write an entity page:
```python
# `entity_name`, `entity_type`, and `doc_id` come from the extraction output.
wiki_store.write_page(
    category="entity",
    title=entity_name,
    content=f"""---
title: {entity_name}
type: {entity_type}
source_document: {doc_id}
tags: [{entity_type}]
---
# {entity_name}
(Description from the document or prior knowledge.)
## Relations
(List any wikilinks to related entities extracted from relations.)
## Mentioned in
- [[{doc_id}-summary]]
""",
    summary=f"{entity_name}: {entity_type}",
)
```
For **existing** entity pages, read the current page, append new information or updated relations, and flag any contradictions, as sketched below.
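A minimal sketch of that update flow, assuming `wiki_store.read_page_by_slug` returns the page body as a string (it is used that way in retrieval.md) and that slugs are the dash-joined entity names:
```python
# Sketch only: real merge logic should dedupe headings and avoid
# re-adding mentions that are already on the page.
slug = entity_name.replace(" ", "-")
existing = wiki_store.read_page_by_slug(slug)
mention = f"- [[{doc_id}-summary]]"
if mention not in existing:
    wiki_store.write_page(
        category="entity",
        title=entity_name,
        content=existing + "\n" + mention,
        summary=f"{entity_name}: updated from {doc_id}",
    )
```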
---
## Example
**Input document:**
```
System crashes due to memory leaks.
Memory leaks occur when objects are not released.
```
**Expected extraction output:**
```json
{
"entities": [
{ "name": "system crash", "type": "issue", "supporting_text": "system crashes due to memory leaks" },
{ "name": "memory leak", "type": "issue", "supporting_text": "memory leaks occur when objects are not released" },
{ "name": "object", "type": "component", "supporting_text": "objects are not released" }
],
"relations": [
{
"source": "memory leak",
"target": "system crash",
"type": "causes",
"confidence": 1.0,
"supporting_text": "System crashes due to memory leaks."
},
{
"source": "object",
"target": "memory leak",
"type": "contributes to",
"confidence": 0.9,
"supporting_text": "Memory leaks occur when objects are not released."
}
]
}
```
@@ -0,0 +1,163 @@
# Lint Instructions
This file defines the wiki health-check workflow.
Run this periodically (or after a large batch of ingests) to keep the wiki
clean and accurate. The pattern is from Karpathy's LLM Wiki: detect contradictions,
orphans, broken links, stale claims, and data gaps.
---
## When to Run
- After ingesting 5+ documents
- When the user asks "check the wiki" or "health check"
- When answers seem inconsistent or contradictory
- Before a major synthesis or presentation
---
## Step 1: Run the Automated Health Check
```python
from scripts.tools import wiki_store
issues = wiki_store.lint_wiki()
# Returns:
# {
# "orphan_pages": [list of slugs in files but not in index],
# "missing_pages": [list of slugs in index but file deleted],
# "broken_wikilinks": {slug: [broken link targets]},
# "isolated_pages": [slugs with no wikilinks at all],
# }
```
---
## Step 2: Triage Each Issue Type
### Orphan Pages
Pages exist on disk but are not in the index. They are invisible to search.
**Fix**: Add them to the index or delete if stale.
```python
# To add to index, re-write the page (this auto-updates the index):
wiki_store.write_page(category="...", title="...", content=existing_content)
# To delete (manual step — confirm with user first):
# rm wiki/{category}/{slug}.md
```
### Missing Pages
Slugs that remain in the index after their files were deleted, leaving dangling references.
**Fix**: Either recreate the page from knowledge or remove from index.
### Broken Wikilinks
`[[slug]]` references that point to pages that don't exist.
**Fix**: Create the missing page, or correct the link.
### Isolated Pages
Pages with no `[[wikilinks]]` — they are unreachable via link traversal.
**Fix**: Add links from/to related pages.
---
## Step 3: Check for Contradictions
Read the wiki index and scan for pages that might contradict each other:
```python
pages = wiki_store.list_pages()
# Returns [{slug, category, summary, date}, ...]
```
Look for:
- Same entity with conflicting `type` in different pages
- Same relation with different direction in different pages
- Newer ingests that update/supersede older claims
**When you find a contradiction:**
- Add a `## Contradictions` section to the relevant entity/topic pages:
```markdown
## Contradictions
- doc_001 says X; doc_003 says not-X — unresolved
```
- Flag it in the log:
```python
# Handled by wiki_store.write_page which auto-appends to log.md
```
---
## Step 4: Check for Stale Claims
Review pages ingested more than N days ago (use the `date` field from the index).
Ask: "Has any newer document superseded this claim?"
**When a claim is stale:**
- Update the page: add a `## Superseded` section or update the body.
- Mark the old claim with _(superseded by [[newer-doc-summary]])_.
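A sketch of the staleness scan, assuming the index `date` field is an ISO `YYYY-MM-DD` string and picking N = 30 days as an illustrative threshold:
```python
from datetime import date, timedelta

from scripts.tools import wiki_store

cutoff = date.today() - timedelta(days=30)  # N = 30 is an assumed default
stale = [
    p for p in wiki_store.list_pages()
    if date.fromisoformat(p["date"]) < cutoff
]
for p in stale:
    print(f"review for supersession: {p['slug']} (ingested {p['date']})")
```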
---
## Step 5: Check for Missing Cross-References
For each entity page, check: does it link back to all summary pages that mention it?
For each summary page, check: does it link to all entity pages it extracted?
**Fix**: Read the page and add missing `[[slug]]` links.
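A sketch of the missing-link scan in one direction (summary pages to entity pages), assuming `[[slug]]` links can be collected with a regex over the page body:
```python
import re

from scripts.tools import wiki_store

WIKILINK = re.compile(r"\[\[([^\]|]+)")
pages = wiki_store.list_pages()
entity_slugs = {p["slug"] for p in pages if p["category"] == "entity"}
for page in pages:
    if page["category"] != "summary":
        continue
    body = wiki_store.read_page_by_slug(page["slug"])
    linked = set(WIKILINK.findall(body))
    # Entities mentioned in the body but never wikilinked are candidates
    # for new [[slug]] links; which ones to add is a judgment call.
    candidates = {s for s in entity_slugs - linked if s.replace("-", " ") in body}
    print(page["slug"], "missing links:", candidates or "none")
```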
---
## Step 6: Identify Data Gaps
Review entity pages that lack:
- A proper description (just a stub)
- Any `## Relations` section
- Any `## Mentioned in` links
These are candidates for deeper research or new ingests.
---
## Step 7: Log the Lint Pass
```python
# wiki_store.write_page automatically logs the activity.
# For a manual lint summary, append to log.md via write_page on a topic:
wiki_store.write_page(
    category="topic",
    title="Lint Pass YYYY-MM-DD",
    content="# Lint Pass\n\n## Issues Found\n\n...\n\n## Fixed\n\n...",
    summary="Lint pass results",
)
```
---
## Quick Lint Commands
```python
from scripts.tools import wiki_store
# Full health check
issues = wiki_store.lint_wiki()
# Get recent history
log = wiki_store.get_log(last_n=10)
# List all pages
all_pages = wiki_store.list_pages()
# Search for a concept across wiki
results = wiki_store.search_wiki("memory leak")
```
---
## Rules
- NEVER delete pages without user confirmation
- NEVER auto-resolve a contradiction — flag it for human review
- File all lint results as a topic page in the wiki (so the history is visible)
- Prefer adding cross-references over rewriting existing content
@@ -0,0 +1,99 @@
# Ontology Instructions
This file defines the rules for maintaining and evolving the dynamic ontology used by the Context Graph.
---
## Core Principle
The ontology is **NOT fixed**. Types and relations emerge from documents as they are ingested.
However, the ontology must remain **compact, consistent, and reusable**.
---
## Entity Type Rules
### Normalization
When assigning an entity type, apply these transformations:
1. Convert to **lowercase**
2. Strip leading/trailing whitespace
3. Replace underscores and hyphens with spaces
4. Merge synonymous types using the mapping table below
### Synonym Mapping (Entity Types)
| Variant | Canonical Type |
|---------|---------------|
| component, module, class, function | component |
| bug, defect, fault, error, failure | issue |
| server, host, machine, node | infrastructure |
| user, person, operator, admin, actor | actor |
| app, application, service, program, software | software |
| database, datastore, db, storage | storage |
| api, endpoint, interface, connection | interface |
| event, incident, occurrence, trigger | event |
| concept, idea, principle, theory | concept |
| process, thread, task, job, workflow | process |
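One straightforward encoding of the table above is a flat variant-to-canonical dictionary. This is an illustrative sketch (only a subset of rows is shown); the table remains the authoritative mapping:
```python
# Illustrative subset of the entity-type synonym table.
ENTITY_TYPE_SYNONYMS = {
    "module": "component", "class": "component", "function": "component",
    "bug": "issue", "defect": "issue", "fault": "issue", "error": "issue",
    "server": "infrastructure", "host": "infrastructure", "node": "infrastructure",
    "user": "actor", "person": "actor", "operator": "actor", "admin": "actor",
    "app": "software", "application": "software", "service": "software",
    "database": "storage", "datastore": "storage", "db": "storage",
    # ...remaining rows follow the same pattern
}

def canonical_entity_type(raw: str) -> str:
    """Apply the normalization steps, then the synonym mapping."""
    t = raw.strip().lower().replace("_", " ").replace("-", " ")
    return ENTITY_TYPE_SYNONYMS.get(t, t)
```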
### Adding New Types
If an entity does not match any existing type:
- Create a **new type** if it is genuinely distinct
- Keep the label short (1–3 words, lowercase)
- Consider whether an existing type is close enough before creating a new one
### Constraint
- Maximum ~50 distinct entity types across the entire ontology
- If the limit is approached, merge similar types rather than creating new ones
---
## Relation Type Rules
### Normalization
When assigning a relation type:
1. Convert to **lowercase**
2. Strip whitespace
3. Use verb phrases in **present tense** (e.g., "causes", "contains", "uses")
4. Merge synonyms using the mapping table below
### Synonym Mapping (Relation Types)
| Variant | Canonical Relation |
|---------|-------------------|
| triggers, leads to, results in, produces | causes |
| is part of, belongs to, lives in, sits in | contains |
| depends on, requires, needs | depends on |
| uses, calls, invokes, consumes | uses |
| affects, impacts, influences | affects |
| creates, instantiates, spawns | creates |
| connects to, links to, references | connects to |
| inherits from, extends, subclasses | extends |
| reads from, queries, fetches | reads from |
| writes to, stores in, persists to | writes to |
### Adding New Relations
- Only add new relation types if no existing type accurately describes the relationship
- Prefer canonical relations over creating new ones
---
## Ontology Update Protocol
When processing extracted entities/relations from `ingestion.md`:
1. For each entity type:
- Run through the synonym mapping
- Call `ontology_store.normalize_type(type_name)` to get the canonical form
- Call `ontology_store.add_type(canonical_type)` to register it
2. For each relation type:
- Run through the synonym mapping
- Call `ontology_store.normalize_relation(relation_name)` to get the canonical form
- Call `ontology_store.add_relation(canonical_relation)` to register it
3. Use the **canonical** type/relation names when creating nodes and edges in the graph.
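Putting the protocol together as a sketch (the import path mirrors `wiki_store` and is an assumption, but the four `ontology_store` calls are the ones named above):
```python
from scripts.tools import ontology_store  # assumed path, by analogy with wiki_store

def register_ontology(entities: list[dict], relations: list[dict]) -> None:
    """Normalize and register every extracted type, then canonicalize in place."""
    for entity in entities:
        canonical = ontology_store.normalize_type(entity["type"])
        ontology_store.add_type(canonical)
        entity["type"] = canonical  # graph nodes use the canonical type
    for relation in relations:
        canonical = ontology_store.normalize_relation(relation["type"])
        ontology_store.add_relation(canonical)
        relation["type"] = canonical  # graph edges use the canonical relation
```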
@@ -0,0 +1,163 @@
# Retrieval Instructions
This file defines how the agent answers queries using the two-layer retrieval strategy:
**wiki-first** (fast path), then **graph traversal with evidence** (deep path).
---
## Overview
Retrieval is a 7-step process:
1. Parse the query
2. **Check the wiki first** (fast path)
3. Find seed nodes in the graph
4. Expand the graph via BFS
5. Prune noisy nodes
6. Build the subgraph with provenance
7. Return structured context
---
## Step 1: Parse the Query
Read the query string and identify:
- **Key noun phrases**: potential entity names (e.g., "system crash", "memory leak")
- **Keywords**: individual meaningful words (e.g., "crash", "leak", "memory")
- Normalize all terms to **lowercase**
Ignore stopwords (e.g., "the", "a", "is", "why", "does", "how", "what").
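A minimal sketch of the keyword pass (the stopword list here is illustrative, and noun-phrase detection is left to the agent):
```python
import re

STOPWORDS = {"the", "a", "an", "is", "are", "why", "does", "how", "what", "to", "of"}

def parse_query(query: str) -> list[str]:
    """Lowercase, tokenize, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(parse_query("Why does the system crash?"))  # ['system', 'crash']
```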
---
## Step 2: Check the Wiki First (Fast Path)
Before touching the graph, search the wiki. The wiki contains compiled knowledge —
cross-references already resolved, contradictions flagged, syntheses written.
```python
from scripts.tools import wiki_store
results = wiki_store.search_wiki(query)
```
For each relevant result, read the page:
```python
content = wiki_store.read_page_by_slug(result["slug"])
```
**If the wiki has a sufficient answer:**
- Synthesize from wiki pages.
- Cite the source pages (e.g., "According to [[memory-leak]] and [[system-crash]]...").
- File the answer as a new wiki topic page if it's valuable and not already captured:
```python
wiki_store.write_page(category="topic", title="Why System Crashes", content=..., summary=...)
```
- **Return early** — no graph traversal needed.
**If the wiki answer is incomplete or missing:** proceed to Step 3.
---
## Step 3: Find Seed Nodes
Call `index_store.search(query)` with the original query string.
This returns node IDs matching entity names or keywords.
If no seed nodes are found:
- Try searching with individual keywords from Step 1.
- If still no results, return an empty subgraph: "No relevant entities found."
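A sketch of this fallback, reusing the `parse_query` sketch from Step 1 and assuming `index_store.search` returns a list of node IDs:
```python
from scripts.tools import index_store  # assumed path, by analogy with wiki_store

seeds = list(index_store.search(query))
if not seeds:
    for keyword in parse_query(query):
        seeds.extend(index_store.search(keyword))
seeds = list(dict.fromkeys(seeds))  # dedupe, preserving order
if not seeds:
    answer = {"nodes": {}, "edges": []}  # "No relevant entities found."
```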
---
## Step 4: Expand the Graph (BFS)
Call `retrieval_engine.retrieve(seed_node_ids, depth=2)`.
BFS from seed nodes:
- **Depth 1**: direct neighbors
- **Depth 2**: neighbors of neighbors
Rules:
- Only traverse edges with confidence ≥ MIN_CONFIDENCE (from config.py)
- Do NOT traverse beyond depth 2
- Collect all visited node IDs
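A sketch of the expansion, assuming the graph is available as a list of `{source, target, confidence}` edge dicts (the real `retrieval_engine` may store it differently):
```python
from collections import deque

def expand(edges: list[dict], seed_ids: list[str],
           min_confidence: float, max_depth: int = 2) -> dict[str, int]:
    """Depth-limited BFS over confidence-filtered edges; returns node -> depth."""
    adjacency: dict[str, list[str]] = {}
    for e in edges:
        if e["confidence"] >= min_confidence:  # skip low-confidence edges entirely
            adjacency.setdefault(e["source"], []).append(e["target"])
            adjacency.setdefault(e["target"], []).append(e["source"])
    depth = {nid: 0 for nid in seed_ids}
    queue = deque(seed_ids)
    while queue:
        node = queue.popleft()
        if depth[node] >= max_depth:
            continue  # never traverse beyond depth 2
        for neighbor in adjacency.get(node, []):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth
```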
---
## Step 5: Prune Nodes
- Limit total nodes to MAX_NODES (from config.py)
- Prioritize:
1. Seed nodes (always include)
2. Nodes at depth 1
3. Nodes at depth 2 (as space allows)
- Remove nodes only weakly connected (edge confidence < MIN_CONFIDENCE)
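A sketch of the pruning pass over the `depth` map from the BFS sketch above, with `max_nodes` standing in for MAX_NODES. Edges below MIN_CONFIDENCE were already dropped during expansion, so this pass only enforces the node budget and depth priority:
```python
def prune(depth: dict[str, int], seed_ids: list[str], max_nodes: int) -> set[str]:
    """Keep seeds unconditionally, then fill with depth-1 and depth-2 nodes."""
    keep = set(seed_ids)
    for node, _depth in sorted(depth.items(), key=lambda item: item[1]):
        if len(keep) >= max_nodes:
            break
        keep.add(node)
    return keep
```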
---
## Step 6: Build the Subgraph with Provenance
For a standard query, call:
```python
subgraph = skill.query(query)
# Returns: {"nodes": {node_id: {name, type, source_document, source_chunks}},
# "edges": [{source, target, type, confidence, source_document, supporting_text, chunk_id}]}
```
For queries requiring evidence (citations, fact-checking), call:
```python
result = skill.query_with_evidence(query)
# Returns:
# {
# "query": str,
# "subgraph": {"nodes": {...}, "edges": [...]},
# "supporting_documents": [
# {
# "doc_id": str,
# "doc_title": str,
# "supporting_chunks": [{"chunk_id": str, "text": str}, ...]
# }
# ],
# "evidence_chain": "memory leak --[causes]--> system crash"
# }
```
---
## Step 7: Return Structured Context
Return the result with:
- **Subgraph**: nodes + edges (the graph answer)
- **Supporting documents**: source chunks that prove each relation
- **Evidence chain**: human-readable path summary
- **Wiki references**: links to relevant wiki pages found in Step 2
**If valuable, file the answer back into the wiki:**
```python
wiki_store.write_page(
    category="topic",
    title=query,
    content=f"# {query}\n\n**Evidence chain:** {result['evidence_chain']}\n\n...",
    summary="...",
)
```
This way, future queries on the same topic find the answer instantly in the wiki.
---
## Rules
- NEVER fabricate nodes or edges not present in the graph
- NEVER traverse deeper than depth 2
- ALWAYS check the wiki before the graph (wiki-first)
- Always include seed nodes in the result, even if they have no edges
- Prefer edges with higher confidence when pruning
- File valuable answers back into the wiki as topic pages
- Return an empty subgraph (not an error) if no relevant nodes are found