---
name: mini-context-graph
description: A persistent, compounding knowledge base combining Karpathy's LLM Wiki pattern with a structured knowledge graph. Ingest documents once; the LLM writes wiki pages, extracts entities/relations into the graph, and stores raw content for evidence retrieval. Knowledge accumulates and cross-references; it is never re-derived from scratch.
---
# Mini Context Graph Skill
## The Core Idea
Standard RAG re-discovers knowledge from scratch on every query. This skill is different:
- Wiki layer — The LLM writes and maintains persistent markdown pages (summaries, entity pages, topic syntheses). Cross-references are already there. The wiki gets richer with every ingest.
- Graph layer — Entities and relations are extracted once and stored as a navigable knowledge graph. BFS traversal answers structural queries without re-reading sources.
- Raw source layer — Original documents are stored immutably with chunks. Provenance links tie every graph node and edge back to the exact text that supports it.
The LLM writes; the Python tools handle all bookkeeping.
## Three Layers

| Layer | Where | What the LLM does | What Python does |
|---|---|---|---|
| Raw Sources | `data/documents.json` | Reads (never modifies) | Stores chunks + metadata |
| Wiki | `wiki/` (markdown) | Writes/updates pages | Manages `index.md` + `log.md` |
| Graph | `data/graph.json` | Extracts entities + relations | Persists, deduplicates, traverses |
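Taken together, the paths above imply roughly the following on-disk layout. This is a sketch: only the files and modules named in this README are certain; the rest of the tree is an assumption.

```
data/
  documents.json   # raw sources: immutable docs + chunks + provenance
  graph.json       # entities, relations, confidence scores
wiki/
  index.md         # page index (maintained by Python)
  log.md           # operation log (maintained by Python)
references/
  ingestion.md     # entity/relation extraction rules
  ontology.md      # type normalization rules
  lint.md          # lint workflow
scripts/
  contextgraph     # ContextGraphSkill
  tools            # wiki_store, documents_store, ...
```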
## ⚡ Quick Start for Agents

```python
from scripts.contextgraph import ContextGraphSkill
from scripts.tools import wiki_store

skill = ContextGraphSkill()

# ===== INGEST WITH FULL RAG + WIKI =====
# 1. Read references/ingestion.md and references/ontology.md first
# 2. Extract entities and relations (LLM reasoning step)
entities = [
    {"name": "memory leak", "type": "issue", "supporting_text": "memory leaks cause crashes"},
    {"name": "system crash", "type": "issue", "supporting_text": "system crashes due to memory leaks"},
]
relations = [
    {"source": "memory leak", "target": "system crash", "type": "causes",
     "confidence": 1.0, "supporting_text": "System crashes due to memory leaks."},
]

result = skill.ingest_with_content(
    doc_id="doc_001",
    title="System Crash Analysis",
    source="/docs/incident_report.pdf",
    raw_content="System crashes due to memory leaks. Memory leaks occur when objects are not released.",
    entities=entities,
    relations=relations,
)
# result = {"doc_id": "doc_001", "chunk_count": 1, "nodes_added": 2, "edges_added": 1}

# 3. Write a wiki summary page for this document
wiki_store.write_page(
    category="summary",
    title="System Crash Analysis Summary",
    content="""---
title: System Crash Analysis
source_document: doc_001
tags: [summary, incident]
---

# System Crash Analysis

**Source:** incident_report.pdf

## Key Claims
- [[memory-leak]] causes [[system-crash]] (confidence: 1.0)

## Entities
- [[memory-leak]] (issue)
- [[system-crash]] (issue)
""",
    summary="Incident report: memory leaks cause system crashes.",
)

# ===== QUERY WITH EVIDENCE =====
result = skill.query_with_evidence("Why does the system crash?")
# Returns: {"query": ..., "subgraph": ..., "supporting_documents": [...], "evidence_chain": ...}

# ===== WIKI SEARCH (read wiki before answering) =====
pages = wiki_store.search_wiki("memory leak")
# Returns: [{slug, category, path, snippet}, ...]
```
## Operations

### Ingest
When a user provides a new document:
1. Read `references/ingestion.md` for the entity/relation extraction rules.
2. Read `references/ontology.md` for the type normalization rules.
3. Extract entities and relations using your LLM reasoning.
4. Call `skill.ingest_with_content(...)`, which stores raw content + chunks + graph nodes + provenance.
5. Write a wiki summary page using `wiki_store.write_page(category="summary", ...)`.
6. Update entity pages: for each new/updated entity, write or update `wiki_store.write_page(category="entity", ...)`.
7. Update topic pages if the document touches an existing synthesis topic.

A single document ingest will typically touch 3–10 wiki pages; step 6 is sketched below.
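A minimal sketch of step 6, under stated assumptions: that `read_page` returns the page body as a string (or `None` for a missing page) and accepts a slug as `title`. Only the two signatures come from the API table below; the merge logic is illustrative.

```python
from scripts.tools import wiki_store

# Merge new evidence into an existing entity page, or start one.
# Assumption: read_page returns the page body as a string, or None
# when the page does not exist yet; the README does not specify this.
existing = wiki_store.read_page(category="entity", title="memory-leak")
body = existing or "# memory leak\n\nType: issue\n"
body += "\n- New evidence in [[doc_001]]: traced to an unreleased cache handle.\n"

wiki_store.write_page(
    category="entity",
    title="memory-leak",
    content=body,
    summary="Entity page for the memory-leak issue.",
)
```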
### Query
When a user asks a question:
1. Check the wiki first: `wiki_store.search_wiki(query)` to find relevant pages. Read them.
2. If the wiki has a good answer, synthesize from wiki pages (fast path).
3. If deeper graph traversal is needed, call `skill.query_with_evidence(query)`.
4. Return the answer with evidence citations from `supporting_documents`.
5. If the answer is valuable, file it back as a new wiki topic page (the whole flow is sketched below).
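A minimal sketch of that wiki-first flow. Treating a search hit's `slug` as the `title` argument of `read_page` is an assumption; the rest uses only calls and return keys shown elsewhere in this README.

```python
from scripts.contextgraph import ContextGraphSkill
from scripts.tools import wiki_store

skill = ContextGraphSkill()
question = "Why does the system crash?"

# Fast path: read matching wiki pages before doing anything else.
hits = wiki_store.search_wiki(question)
pages = [wiki_store.read_page(category=h["category"], title=h["slug"]) for h in hits]

if not pages:
    # Slow path: graph traversal with provenance, for citable evidence.
    result = skill.query_with_evidence(question)
    citations = result["supporting_documents"]
    # ...synthesize an answer citing `citations`, then consider filing it
    # back as a wiki topic page via wiki_store.write_page(...).
```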
### Lint
Periodically health-check the wiki:
```python
from scripts.tools import wiki_store

issues = wiki_store.lint_wiki()
# Returns: {orphan_pages, missing_pages, broken_wikilinks, isolated_pages}
```
Ask the LLM to review and fix: broken links, orphan pages, stale claims, and missing cross-references. See `references/lint.md` for the full lint workflow.
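One way to turn the lint report into review items, assuming each value in the returned dict is a list (the per-item shape is not specified in this README):

```python
from scripts.tools import wiki_store

issues = wiki_store.lint_wiki()

# Flatten the report so the LLM can work through problems one by one.
for kind in ("broken_wikilinks", "orphan_pages", "missing_pages", "isolated_pages"):
    for item in issues.get(kind, []):
        print(f"{kind}: {item}")
```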
## Ingestion Constraints
- ❌ Do NOT hallucinate entities not present in the text
- ❌ Do NOT add relations without explicit textual evidence
- ❌ Do NOT add edges with confidence < 0.6
- ✅ Provide `supporting_text` for every entity and relation; this enables provenance (a guard is sketched after this list)
- ✅ Write a wiki summary page for every ingested document
- ✅ Update existing entity pages when new information arrives
- ✅ Flag contradictions in wiki pages when new data conflicts with old claims
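A hypothetical pre-ingest guard enforcing the machine-checkable constraints above; it is not part of the skill's API.

```python
# Hypothetical helper, not part of the skill's API.
def validate_extraction(entities: list[dict], relations: list[dict]) -> list[dict]:
    for item in entities + relations:
        # Every entity and relation must carry provenance text.
        if not item.get("supporting_text"):
            raise ValueError(f"missing supporting_text: {item}")
    # Drop edges below the 0.6 confidence floor rather than ingesting them.
    return [r for r in relations if r.get("confidence", 0.0) >= 0.6]
```

Pass the filtered list as the `relations=` argument of `skill.ingest_with_content(...)`.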
## Retrieval Constraints

- 🔒 Traversal depth MUST NOT exceed 2 (config: `MAX_GRAPH_DEPTH`)
- 🔒 Only edges with confidence ≥ 0.6 are traversed (config: `MIN_CONFIDENCE`)
- 🔒 Maximum 50 nodes returned (config: `MAX_NODES`)
- ❌ Do NOT fabricate nodes or edges not in the graph (a result check is sketched below)
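A sketch of a post-query sanity check. The limit values are quoted from the list above (the real ones live in the skill's config), and the subgraph shape — a dict with `"nodes"` and `"edges"` lists whose edges carry a `"confidence"` field — is an assumption.

```python
# Limits quoted above; the authoritative values live in the skill's config.
MAX_GRAPH_DEPTH = 2
MIN_CONFIDENCE = 0.6
MAX_NODES = 50

# Hypothetical check on a query result; adjust to the real subgraph shape.
def check_subgraph(subgraph: dict) -> None:
    assert len(subgraph.get("nodes", [])) <= MAX_NODES
    assert all(e.get("confidence", 0.0) >= MIN_CONFIDENCE
               for e in subgraph.get("edges", []))
```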
## Full Python API Reference

| Method | Purpose | When to Use |
|---|---|---|
| `skill.ingest_with_content(doc_id, title, source, raw_content, entities, relations)` | Full RAG ingest: raw docs + graph + provenance | Every new document |
| `skill.add_node(name, node_type)` | Add single entity (no provenance) | Quick additions without a source doc |
| `skill.add_edge(source_name, target_name, relation, confidence)` | Add single relation | Quick additions without a source doc |
| `skill.query(query)` | Graph-only retrieval → subgraph | Structural queries |
| `skill.query_with_evidence(query)` | Graph + provenance → subgraph + source chunks | Queries requiring citations |
| `wiki_store.write_page(category, title, content, summary)` | Write/update a wiki page | After every ingest; after answering queries |
| `wiki_store.read_page(category, title)` | Read a wiki page | Before answering; for cross-referencing |
| `wiki_store.search_wiki(query)` | Keyword search across wiki | Fast path before graph traversal |
| `wiki_store.list_pages(category)` | List all wiki pages | Getting an overview |
| `wiki_store.get_log(last_n)` | Read recent operations | Understanding wiki history |
| `wiki_store.lint_wiki()` | Health check | Periodic maintenance |
| `documents_store.list_documents()` | List all ingested raw sources | Audit / provenance checking |
| `documents_store.search_chunks(query)` | Chunk-level search | Finding specific evidence |
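A sketch of the last two rows in use, for a provenance audit. Importing `documents_store` from `scripts.tools` alongside `wiki_store` is an assumption (this README never shows its import), as is both calls returning lists.

```python
# Assumption: documents_store lives next to wiki_store in scripts.tools.
from scripts.tools import documents_store

for doc in documents_store.list_documents():
    print(doc)  # every ingested raw source, for provenance checking

evidence = documents_store.search_chunks("memory leak")
print(evidence)  # chunk-level hits backing a specific claim
```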
## Design Philosophy

> "The wiki is a persistent, compounding artifact. The cross-references are already there. The synthesis already reflects everything you've read." — Karpathy
| Layer | What Happens | Who Owns It |
|---|---|---|
| LLM Reasoning | Extraction, synthesis, writing wiki pages | Agent (`.md` guidance files) |
| Wiki Persistence | Index, log, file I/O | `wiki_store.py` |
| Graph Persistence | Dedup, index, BFS traverse | `graph_store.py`, `retrieval_engine.py` |
| Raw Source Storage | Immutable docs + chunks + provenance | `documents_store.py` |
The human curates sources and asks questions. The LLM writes the wiki, extracts the graph, and answers with citations. Python handles all bookkeeping.