add mini-context-graph skill (#1580)

* add mini-context-graph skill

* remove pycache files

* filename case update to SKILL.md

* update readme
This commit is contained in:
Nixon Kurian
2026-05-05 09:34:37 +05:30
committed by GitHub
parent 1f96bce626
commit 746ba555b6
16 changed files with 2343 additions and 0 deletions
@@ -0,0 +1,196 @@
# Ingestion Instructions
This file defines how the agent extracts entities and relations from a raw document.
---
## Step 1: Read the Document
Read the provided text carefully. Identify:
- **Entities**: noun phrases that refer to real-world objects, systems, components, actors, concepts, or events.
- **Relations**: verb phrases that describe how one entity affects, contains, causes, uses, or is related to another.
---
## Step 2: Extract Entities
For each entity:
- Record its **name** (normalized: lowercased, with leading/trailing whitespace stripped)
- Assign a **type**: a short label (1–3 words) that categorizes the entity
### Entity Type Examples
| Entity Name | Suggested Type |
|-------------|---------------|
| Python interpreter | software |
| memory leak | issue |
| operating system | system |
| database | infrastructure |
| user | actor |
| API endpoint | interface |
| server | infrastructure |
**Rules:**
- Types must be general enough to reuse across documents
- Do NOT create unique types per entity (e.g., avoid `python-interpreter-type`)
- Use `ontology.md` normalization rules to canonicalize types
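A minimal sketch of this normalization, with a tiny inline synonym table standing in for the full mapping in `ontology.md` (the helper name and table subset are illustrative, not part of the skill's API):
```python
# Hypothetical helper; the authoritative synonym table lives in ontology.md.
SYNONYMS = {"bug": "issue", "defect": "issue", "host": "infrastructure"}

def normalize_entity(name: str, raw_type: str) -> tuple[str, str]:
    """Lowercase and strip the name; canonicalize the type."""
    clean_type = raw_type.strip().lower().replace("_", " ").replace("-", " ")
    return name.strip().lower(), SYNONYMS.get(clean_type, clean_type)

print(normalize_entity("  Memory Leak ", "Bug"))  # ('memory leak', 'issue')
```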
---
## Step 3: Extract Relations
For each pair of entities with an explicit connection in the text:
- Record the **source** entity name
- Record the **target** entity name
- Record the **relation type**: a verb or verb phrase (normalized: lowercase)
- Assign a **confidence** score between 0 and 1:
- 1.0 = stated explicitly ("A causes B")
- 0.8 = strongly implied ("A is linked to B")
- 0.6 = weakly implied ("A may affect B")
- < 0.6 = do NOT include
---
## Step 4: Output Format
Produce a JSON object in this exact format:
```json
{
"entities": [
{ "name": "entity name", "type": "entity type", "supporting_text": "exact quote mentioning this entity" }
],
"relations": [
{
"source": "source entity name",
"target": "target entity name",
"type": "relation type",
"confidence": 0.9,
"supporting_text": "exact quote that justifies this relation"
}
]
}
```
The `supporting_text` field is **required for provenance**. It must be a verbatim or near-verbatim quote from the document that mentions or supports the entity/relation. This is what links graph nodes and edges back to their source.
---
## Rules
- All names and types must be **lowercase**
- Only include relations where **both entities** are present in the entities list
- Do NOT invent entities or relations not supported by the text
- Prefer **reusing existing entity and relation types** from the ontology over creating new ones
- One entity can appear in multiple relations (as source or target)
- Always include `supporting_text` — this enables evidence retrieval and audit trails
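These rules lend themselves to a mechanical pre-check. Here is a hypothetical validator sketch (not part of the skill's API) that flags violations before ingestion:
```python
def validate_extraction(output: dict) -> list[str]:
    """Return a list of rule violations for an extraction result."""
    errors = []
    names = {e["name"] for e in output.get("entities", [])}
    for e in output.get("entities", []):
        if e["name"] != e["name"].lower() or e["type"] != e["type"].lower():
            errors.append(f"not lowercase: {e['name']}")
        if not e.get("supporting_text"):
            errors.append(f"missing supporting_text: {e['name']}")
    for r in output.get("relations", []):
        if r["source"] not in names or r["target"] not in names:
            errors.append(f"endpoint not in entities list: {r['source']} -> {r['target']}")
        if r.get("confidence", 0.0) < 0.6:
            errors.append(f"confidence below 0.6: {r['source']} -> {r['target']}")
        if not r.get("supporting_text"):
            errors.append(f"missing supporting_text: {r['source']} -> {r['target']}")
    return errors
```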
---
## Step 5: Write Wiki Pages (Required)
After calling `skill.ingest_with_content(...)`, you MUST write wiki pages:
### 5a. Write a summary page for the document
```python
from scripts.tools import wiki_store
# `title`, `source`, `doc_id`, `entities`, and `relations` come from the
# ingestion output produced in Steps 2-4.
wiki_store.write_page(
    category="summary",
    title=f"{title} Summary",
    content=f"""---
title: {title}
source_document: {doc_id}
tags: [summary]
---
# {title}
**Source:** {source}
## Key Claims
{chr(10).join(f'- [[{r["source"].replace(" ", "-")}]] {r["type"]} [[{r["target"].replace(" ", "-")}]] (confidence: {r["confidence"]})' for r in relations)}
## Entities
{chr(10).join(f'- [[{e["name"].replace(" ", "-")}]] ({e["type"]})' for e in entities)}
## Open Questions
- (Add questions from reading the document here)
""",
    summary=f"Summary of {title}",
)
```
### 5b. Write or update entity pages
For each **new** entity not already in the wiki, write an entity page:
```python
# `entity_name`, `entity_type`, and `doc_id` come from the extraction output.
wiki_store.write_page(
    category="entity",
    title=entity_name,
    content=f"""---
title: {entity_name}
type: {entity_type}
source_document: {doc_id}
tags: [{entity_type}]
---
# {entity_name}
(Description from the document or prior knowledge.)
## Relations
(List any wikilinks to related entities extracted from relations.)
## Mentioned in
- [[{doc_id}-summary]]
""",
    summary=f"{entity_name}: {entity_type}",
)
```
For **existing** entity pages, read the current page, append new information or updated relations, and flag any contradictions, as sketched below.
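A minimal sketch of that update flow, assuming `wiki_store.read_page_by_slug` returns the page body as a string (it is used that way in retrieval.md) and that slugs are the dash-joined entity names:
```python
# Sketch only: real merge logic should dedupe headings and avoid
# re-adding mentions that are already on the page.
slug = entity_name.replace(" ", "-")
existing = wiki_store.read_page_by_slug(slug)
mention = f"- [[{doc_id}-summary]]"
if mention not in existing:
    wiki_store.write_page(
        category="entity",
        title=entity_name,
        content=existing + "\n" + mention,
        summary=f"{entity_name}: updated from {doc_id}",
    )
```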
---
## Example
**Input document:**
```
System crashes due to memory leaks.
Memory leaks occur when objects are not released.
```
**Expected extraction output:**
```json
{
"entities": [
{ "name": "system crash", "type": "issue", "supporting_text": "system crashes due to memory leaks" },
{ "name": "memory leak", "type": "issue", "supporting_text": "memory leaks occur when objects are not released" },
{ "name": "object", "type": "component", "supporting_text": "objects are not released" }
],
"relations": [
{
"source": "memory leak",
"target": "system crash",
"type": "causes",
"confidence": 1.0,
"supporting_text": "System crashes due to memory leaks."
},
{
"source": "object",
"target": "memory leak",
"type": "contributes to",
"confidence": 0.9,
"supporting_text": "Memory leaks occur when objects are not released."
}
]
}
```
@@ -0,0 +1,163 @@
# Lint Instructions
This file defines the wiki health-check workflow.
Run this periodically (or after a large batch of ingests) to keep the wiki
clean and accurate. The pattern is from Karpathy's LLM Wiki: detect contradictions,
orphans, broken links, stale claims, and data gaps.
---
## When to Run
- After ingesting 5+ documents
- When the user asks "check the wiki" or "health check"
- When answers seem inconsistent or contradictory
- Before a major synthesis or presentation
---
## Step 1: Run the Automated Health Check
```python
from scripts.tools import wiki_store
issues = wiki_store.lint_wiki()
# Returns:
# {
# "orphan_pages": [list of slugs in files but not in index],
# "missing_pages": [list of slugs in index but file deleted],
# "broken_wikilinks": {slug: [broken link targets]},
# "isolated_pages": [slugs with no wikilinks at all],
# }
```
---
## Step 2: Triage Each Issue Type
### Orphan Pages
Pages exist on disk but are not in the index. They are invisible to search.
**Fix**: Add them to the index or delete if stale.
```python
# To add to index, re-write the page (this auto-updates the index):
wiki_store.write_page(category="...", title="...", content=existing_content)
# To delete (manual step — confirm with user first):
# rm wiki/{category}/{slug}.md
```
### Missing Pages
Slugs that remain in the index after their files were deleted, leaving dangling references.
**Fix**: Either recreate the page from knowledge or remove from index.
### Broken Wikilinks
`[[slug]]` references that point to pages that don't exist.
**Fix**: Create the missing page, or correct the link.
### Isolated Pages
Pages with no `[[wikilinks]]` — they are unreachable via link traversal.
**Fix**: Add links from/to related pages.
---
## Step 3: Check for Contradictions
Read the wiki index and scan for pages that might contradict each other:
```python
pages = wiki_store.list_pages()
# Returns [{slug, category, summary, date}, ...]
```
Look for:
- Same entity with conflicting `type` in different pages
- Same relation with different direction in different pages
- Newer ingests that update/supersede older claims
**When you find a contradiction:**
- Add a `## Contradictions` section to the relevant entity/topic pages:
```markdown
## Contradictions
- doc_001 says X; doc_003 says not-X — unresolved
```
- Flag it in the log:
```python
# Handled by wiki_store.write_page which auto-appends to log.md
```
---
## Step 4: Check for Stale Claims
Review pages ingested more than N days ago (use the `date` field from the index).
Ask: "Has any newer document superseded this claim?"
**When a claim is stale:**
- Update the page: add a `## Superseded` section or update the body.
- Mark the old claim with _(superseded by [[newer-doc-summary]])_.
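A sketch of the staleness scan, assuming the index `date` field is an ISO `YYYY-MM-DD` string and picking N = 30 days as an illustrative threshold:
```python
from datetime import date, timedelta

from scripts.tools import wiki_store

cutoff = date.today() - timedelta(days=30)  # N = 30 is an assumed default
stale = [
    p for p in wiki_store.list_pages()
    if date.fromisoformat(p["date"]) < cutoff
]
for p in stale:
    print(f"review for supersession: {p['slug']} (ingested {p['date']})")
```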
---
## Step 5: Check for Missing Cross-References
For each entity page, check: does it link back to all summary pages that mention it?
For each summary page, check: does it link to all entity pages it extracted?
**Fix**: Read the page and add missing `[[slug]]` links.
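A sketch of the missing-link scan in one direction (summary pages to entity pages), assuming `[[slug]]` links can be collected with a regex over the page body:
```python
import re

from scripts.tools import wiki_store

WIKILINK = re.compile(r"\[\[([^\]|]+)")
pages = wiki_store.list_pages()
entity_slugs = {p["slug"] for p in pages if p["category"] == "entity"}
for page in pages:
    if page["category"] != "summary":
        continue
    body = wiki_store.read_page_by_slug(page["slug"])
    linked = set(WIKILINK.findall(body))
    # Entities mentioned in the body but never wikilinked are candidates
    # for new [[slug]] links; which ones to add is a judgment call.
    candidates = {s for s in entity_slugs - linked if s.replace("-", " ") in body}
    print(page["slug"], "missing links:", candidates or "none")
```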
---
## Step 6: Identify Data Gaps
Review entity pages that lack:
- A proper description (just a stub)
- Any `## Relations` section
- Any `## Mentioned in` links
These are candidates for deeper research or new ingests.
---
## Step 7: Log the Lint Pass
```python
# wiki_store.write_page automatically logs the activity.
# For a manual lint summary, append to log.md via write_page on a topic:
wiki_store.write_page(
    category="topic",
    title="Lint Pass YYYY-MM-DD",
    content="# Lint Pass\n\n## Issues Found\n\n...\n\n## Fixed\n\n...",
    summary="Lint pass results",
)
```
---
## Quick Lint Commands
```python
from scripts.tools import wiki_store
# Full health check
issues = wiki_store.lint_wiki()
# Get recent history
log = wiki_store.get_log(last_n=10)
# List all pages
all_pages = wiki_store.list_pages()
# Search for a concept across wiki
results = wiki_store.search_wiki("memory leak")
```
---
## Rules
- NEVER delete pages without user confirmation
- NEVER auto-resolve a contradiction — flag it for human review
- File all lint results as a topic page in the wiki (so the history is visible)
- Prefer adding cross-references over rewriting existing content
@@ -0,0 +1,99 @@
# Ontology Instructions
This file defines the rules for maintaining and evolving the dynamic ontology used by the Context Graph.
---
## Core Principle
The ontology is **NOT fixed**. Types and relations emerge from documents as they are ingested.
However, the ontology must remain **compact, consistent, and reusable**.
---
## Entity Type Rules
### Normalization
When assigning an entity type, apply these transformations:
1. Convert to **lowercase**
2. Strip leading/trailing whitespace
3. Replace underscores and hyphens with spaces
4. Merge synonymous types using the mapping table below
### Synonym Mapping (Entity Types)
| Variant | Canonical Type |
|---------|---------------|
| component, module, class, function | component |
| bug, defect, fault, error, failure | issue |
| server, host, machine, node | infrastructure |
| user, person, operator, admin, actor | actor |
| app, application, service, program, software | software |
| database, datastore, db, storage | storage |
| api, endpoint, interface, connection | interface |
| event, incident, occurrence, trigger | event |
| concept, idea, principle, theory | concept |
| process, thread, task, job, workflow | process |
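One straightforward encoding of the table above is a flat variant-to-canonical dictionary. This is an illustrative sketch (only a subset of rows is shown); the table remains the authoritative mapping:
```python
# Illustrative subset of the entity-type synonym table.
ENTITY_TYPE_SYNONYMS = {
    "module": "component", "class": "component", "function": "component",
    "bug": "issue", "defect": "issue", "fault": "issue", "error": "issue",
    "server": "infrastructure", "host": "infrastructure", "node": "infrastructure",
    "user": "actor", "person": "actor", "operator": "actor", "admin": "actor",
    "app": "software", "application": "software", "service": "software",
    "database": "storage", "datastore": "storage", "db": "storage",
    # ...remaining rows follow the same pattern
}

def canonical_entity_type(raw: str) -> str:
    """Apply the normalization steps, then the synonym mapping."""
    t = raw.strip().lower().replace("_", " ").replace("-", " ")
    return ENTITY_TYPE_SYNONYMS.get(t, t)
```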
### Adding New Types
If an entity does not match any existing type:
- Create a **new type** if it is genuinely distinct
- Keep the label short (1–3 words, lowercase)
- Consider whether an existing type is close enough before creating a new one
### Constraint
- Maximum ~50 distinct entity types across the entire ontology
- If the limit is approached, merge similar types rather than creating new ones
---
## Relation Type Rules
### Normalization
When assigning a relation type:
1. Convert to **lowercase**
2. Strip whitespace
3. Use verb phrases in **present tense** (e.g., "causes", "contains", "uses")
4. Merge synonyms using the mapping table below
### Synonym Mapping (Relation Types)
| Variant | Canonical Relation |
|---------|-------------------|
| triggers, leads to, results in, produces | causes |
| is part of, belongs to, lives in, sits in | contains |
| depends on, requires, needs | depends on |
| uses, calls, invokes, consumes | uses |
| affects, impacts, influences | affects |
| creates, instantiates, spawns | creates |
| connects to, links to, references | connects to |
| inherits from, extends, subclasses | extends |
| reads from, queries, fetches | reads from |
| writes to, stores in, persists to | writes to |
### Adding New Relations
- Only add new relation types if no existing type accurately describes the relationship
- Prefer canonical relations over creating new ones
---
## Ontology Update Protocol
When processing extracted entities/relations from `ingestion.md`:
1. For each entity type:
- Run through the synonym mapping
- Call `ontology_store.normalize_type(type_name)` to get the canonical form
- Call `ontology_store.add_type(canonical_type)` to register it
2. For each relation type:
- Run through the synonym mapping
- Call `ontology_store.normalize_relation(relation_name)` to get the canonical form
- Call `ontology_store.add_relation(canonical_relation)` to register it
3. Use the **canonical** type/relation names when creating nodes and edges in the graph.
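Putting the protocol together as a sketch (the import path mirrors `wiki_store` and is an assumption, but the four `ontology_store` calls are the ones named above):
```python
from scripts.tools import ontology_store  # assumed path, by analogy with wiki_store

def register_ontology(entities: list[dict], relations: list[dict]) -> None:
    """Normalize and register every extracted type, then canonicalize in place."""
    for entity in entities:
        canonical = ontology_store.normalize_type(entity["type"])
        ontology_store.add_type(canonical)
        entity["type"] = canonical  # graph nodes use the canonical type
    for relation in relations:
        canonical = ontology_store.normalize_relation(relation["type"])
        ontology_store.add_relation(canonical)
        relation["type"] = canonical  # graph edges use the canonical relation
```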
@@ -0,0 +1,163 @@
# Retrieval Instructions
This file defines how the agent answers queries using the two-layer retrieval strategy:
**wiki-first** (fast path), then **graph traversal with evidence** (deep path).
---
## Overview
Retrieval is a 7-step process:
1. Parse the query
2. **Check the wiki first** (fast path)
3. Find seed nodes in the graph
4. Expand the graph via BFS
5. Prune noisy nodes
6. Build the subgraph with provenance
7. Return structured context
---
## Step 1: Parse the Query
Read the query string and identify:
- **Key noun phrases**: potential entity names (e.g., "system crash", "memory leak")
- **Keywords**: individual meaningful words (e.g., "crash", "leak", "memory")
- Normalize all terms to **lowercase**
Ignore stopwords (e.g., "the", "a", "is", "why", "does", "how", "what").
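A minimal sketch of the keyword pass (the stopword list here is illustrative, and noun-phrase detection is left to the agent):
```python
import re

STOPWORDS = {"the", "a", "an", "is", "are", "why", "does", "how", "what", "to", "of"}

def parse_query(query: str) -> list[str]:
    """Lowercase, tokenize, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(parse_query("Why does the system crash?"))  # ['system', 'crash']
```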
---
## Step 2: Check the Wiki First (Fast Path)
Before touching the graph, search the wiki. The wiki contains compiled knowledge —
cross-references already resolved, contradictions flagged, syntheses written.
```python
from scripts.tools import wiki_store
results = wiki_store.search_wiki(query)
```
For each relevant result, read the page:
```python
content = wiki_store.read_page_by_slug(result["slug"])
```
**If the wiki has a sufficient answer:**
- Synthesize from wiki pages.
- Cite the source pages (e.g., "According to [[memory-leak]] and [[system-crash]]...").
- File the answer as a new wiki topic page if it's valuable and not already captured:
```python
wiki_store.write_page(category="topic", title="Why System Crashes", content=..., summary=...)
```
- **Return early** — no graph traversal needed.
**If the wiki answer is incomplete or missing:** proceed to Step 3.
---
## Step 3: Find Seed Nodes
Call `index_store.search(query)` with the original query string.
This returns node IDs matching entity names or keywords.
If no seed nodes are found:
- Try searching with individual keywords from Step 1.
- If still no results, return an empty subgraph: "No relevant entities found."
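A sketch of this fallback, reusing the `parse_query` sketch from Step 1 and assuming `index_store.search` returns a list of node IDs:
```python
from scripts.tools import index_store  # assumed path, by analogy with wiki_store

seeds = list(index_store.search(query))
if not seeds:
    for keyword in parse_query(query):
        seeds.extend(index_store.search(keyword))
seeds = list(dict.fromkeys(seeds))  # dedupe, preserving order
if not seeds:
    answer = {"nodes": {}, "edges": []}  # "No relevant entities found."
```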
---
## Step 4: Expand the Graph (BFS)
Call `retrieval_engine.retrieve(seed_node_ids, depth=2)`.
BFS from seed nodes:
- **Depth 1**: direct neighbors
- **Depth 2**: neighbors of neighbors
Rules:
- Only traverse edges with confidence ≥ MIN_CONFIDENCE (from config.py)
- Do NOT traverse beyond depth 2
- Collect all visited node IDs
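A sketch of the expansion, assuming the graph is available as a list of `{source, target, confidence}` edge dicts (the real `retrieval_engine` may store it differently):
```python
from collections import deque

def expand(edges: list[dict], seed_ids: list[str],
           min_confidence: float, max_depth: int = 2) -> dict[str, int]:
    """Depth-limited BFS over confidence-filtered edges; returns node -> depth."""
    adjacency: dict[str, list[str]] = {}
    for e in edges:
        if e["confidence"] >= min_confidence:  # skip low-confidence edges entirely
            adjacency.setdefault(e["source"], []).append(e["target"])
            adjacency.setdefault(e["target"], []).append(e["source"])
    depth = {nid: 0 for nid in seed_ids}
    queue = deque(seed_ids)
    while queue:
        node = queue.popleft()
        if depth[node] >= max_depth:
            continue  # never traverse beyond depth 2
        for neighbor in adjacency.get(node, []):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth
```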
---
## Step 5: Prune Nodes
- Limit total nodes to MAX_NODES (from config.py)
- Prioritize:
1. Seed nodes (always include)
2. Nodes at depth 1
3. Nodes at depth 2 (as space allows)
- Remove nodes only weakly connected (edge confidence < MIN_CONFIDENCE)
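A sketch of the pruning pass over the `depth` map from the BFS sketch above, with `max_nodes` standing in for MAX_NODES. Edges below MIN_CONFIDENCE were already dropped during expansion, so this pass only enforces the node budget and depth priority:
```python
def prune(depth: dict[str, int], seed_ids: list[str], max_nodes: int) -> set[str]:
    """Keep seeds unconditionally, then fill with depth-1 and depth-2 nodes."""
    keep = set(seed_ids)
    for node, _depth in sorted(depth.items(), key=lambda item: item[1]):
        if len(keep) >= max_nodes:
            break
        keep.add(node)
    return keep
```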
---
## Step 6: Build the Subgraph with Provenance
For a standard query, call:
```python
subgraph = skill.query(query)
# Returns: {"nodes": {node_id: {name, type, source_document, source_chunks}},
# "edges": [{source, target, type, confidence, source_document, supporting_text, chunk_id}]}
```
For queries requiring evidence (citations, fact-checking), call:
```python
result = skill.query_with_evidence(query)
# Returns:
# {
# "query": str,
# "subgraph": {"nodes": {...}, "edges": [...]},
# "supporting_documents": [
# {
# "doc_id": str,
# "doc_title": str,
# "supporting_chunks": [{"chunk_id": str, "text": str}, ...]
# }
# ],
# "evidence_chain": "memory leak --[causes]--> system crash"
# }
```
---
## Step 7: Return Structured Context
Return the result with:
- **Subgraph**: nodes + edges (the graph answer)
- **Supporting documents**: source chunks that prove each relation
- **Evidence chain**: human-readable path summary
- **Wiki references**: links to relevant wiki pages found in Step 2
**If valuable, file the answer back into the wiki:**
```python
wiki_store.write_page(
    category="topic",
    title=query,
    content=f"# {query}\n\n**Evidence chain:** {result['evidence_chain']}\n\n...",
    summary="...",
)
```
This way, future queries on the same topic find the answer instantly in the wiki.
---
## Rules
- NEVER fabricate nodes or edges not present in the graph
- NEVER traverse deeper than depth 2
- ALWAYS check the wiki before the graph (wiki-first)
- Always include seed nodes in the result, even if they have no edges
- Prefer edges with higher confidence when pruning
- File valuable answers back into the wiki as topic pages
- Return an empty subgraph (not an error) if no relevant nodes are found