add mini-context-graph skill (#1580)
* add mini-context-graph skill
* remove pycache files
* filename case update to SKILL.md
* update readme
@@ -0,0 +1,196 @@
# Ingestion Instructions

This file defines how the agent extracts entities and relations from a raw document.

---

## Step 1: Read the Document

Read the provided text carefully. Identify:

- **Entities**: noun phrases that refer to real-world objects, systems, components, actors, concepts, or events.
- **Relations**: verb phrases that describe how one entity affects, contains, causes, uses, or is related to another.

---

## Step 2: Extract Entities

For each entity:

- Record its **name** (normalized: lowercase, strip leading/trailing whitespace)
- Assign a **type**: a short label (1–3 words) that categorizes the entity

### Entity Type Examples

| Entity Name | Suggested Type |
|-------------|----------------|
| Python interpreter | software |
| memory leak | issue |
| operating system | system |
| database | infrastructure |
| user | actor |
| API endpoint | interface |
| server | infrastructure |

**Rules:**

- Types must be general enough to reuse across documents
- Do NOT create unique types per entity (e.g., avoid `python-interpreter-type`)
- Use `ontology.md` normalization rules to canonicalize types

---

## Step 3: Extract Relations

For each pair of entities with an explicit connection in the text:

- Record the **source** entity name
- Record the **target** entity name
- Record the **relation type**: a verb or verb phrase (normalized: lowercase)
- Assign a **confidence** score between 0 and 1:
  - 1.0 = stated explicitly ("A causes B")
  - 0.8 = strongly implied ("A is linked to B")
  - 0.6 = weakly implied ("A may affect B")
  - < 0.6 = do NOT include

---

## Step 4: Output Format

Produce a JSON object in this exact format:

```json
{
  "entities": [
    { "name": "entity name", "type": "entity type", "supporting_text": "exact quote mentioning this entity" }
  ],
  "relations": [
    {
      "source": "source entity name",
      "target": "target entity name",
      "type": "relation type",
      "confidence": 0.9,
      "supporting_text": "exact quote that justifies this relation"
    }
  ]
}
```

The `supporting_text` field is **required for provenance**. It must be a verbatim or near-verbatim quote from the document that mentions or supports the entity/relation. This is what links graph nodes and edges back to their source.

---

## Rules

- All names and types must be **lowercase**
- Only include relations where **both entities** are present in the entities list
- Do NOT invent entities or relations not supported by the text
- Prefer **reusing existing entity and relation types** from the ontology over creating new ones
- One entity can appear in multiple relations (as source or target)
- Always include `supporting_text` — this enables evidence retrieval and audit trails
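
These rules are mechanical enough to enforce before accepting an extraction. A minimal validator sketch; `validate_extraction` is an illustrative helper, not part of the skill's API:

```python
# Minimal validator for the extraction output (illustrative; not part of the skill's API).
def validate_extraction(data: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the output is clean."""
    errors = []
    entity_names = {e.get("name") for e in data.get("entities", [])}

    for e in data.get("entities", []):
        if e.get("name") != (e.get("name") or "").strip().lower():
            errors.append(f"entity name not normalized: {e.get('name')!r}")
        if not e.get("supporting_text"):
            errors.append(f"entity missing supporting_text: {e.get('name')!r}")

    for r in data.get("relations", []):
        if r.get("source") not in entity_names or r.get("target") not in entity_names:
            errors.append(f"relation references unknown entity: {r.get('source')!r} -> {r.get('target')!r}")
        if r.get("confidence", 0) < 0.6:
            errors.append(f"relation below confidence floor: {r.get('source')!r} -> {r.get('target')!r}")
        if not r.get("supporting_text"):
            errors.append("relation missing supporting_text")

    return errors
```
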
---

## Step 5: Write Wiki Pages (Required)

After calling `skill.ingest_with_content(...)`, you MUST write wiki pages:

### 5a. Write a summary page for the document

```python
from scripts.tools import wiki_store

wiki_store.write_page(
    category="summary",
    title=f"{title} Summary",
    content=f"""---
title: {title}
source_document: {doc_id}
tags: [summary]
---

# {title}

**Source:** {source}

## Key Claims

{chr(10).join(f'- [[{r["source"].replace(" ", "-")}]] {r["type"]} [[{r["target"].replace(" ", "-")}]] (confidence: {r["confidence"]})' for r in relations)}

## Entities

{chr(10).join(f'- [[{e["name"].replace(" ", "-")}]] ({e["type"]})' for e in entities)}

## Open Questions

- (Add questions from reading the document here)
""",
    summary=f"Summary of {title}",
)
```

### 5b. Write or update entity pages

For each **new** entity not already in the wiki, write an entity page:

```python
wiki_store.write_page(
    category="entity",
    title=entity_name,
    content=f"""---
title: {entity_name}
type: {entity_type}
source_document: {doc_id}
tags: [{entity_type}]
---

# {entity_name}

(Description from the document or prior knowledge.)

## Relations

(List any wikilinks to related entities extracted from relations.)

## Mentioned in

- [[{doc_id}-summary]]
""",
    summary=f"{entity_name}: {entity_type}",
)
```

For **existing** entity pages, read the current page and append new information, updated relations, or flag contradictions.
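
A minimal sketch of that update flow, reusing the variables from the snippet above. It assumes `read_page_by_slug` (used in the retrieval instructions) returns the page body as a string, and it derives the slug with the same space-to-hyphen convention as the wikilinks above:

```python
# Sketch: update an existing entity page instead of overwriting it.
slug = entity_name.replace(" ", "-")  # assumed slug convention, matching the wikilinks above
existing = wiki_store.read_page_by_slug(slug)

# Only re-write if this document is not already referenced.
if f"[[{doc_id}-summary]]" not in existing:
    wiki_store.write_page(
        category="entity",
        title=entity_name,
        content=existing + f"\n- [[{doc_id}-summary]]\n",
        summary=f"{entity_name}: {entity_type}",
    )
```
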
---

## Example

**Input document:**

```
System crashes due to memory leaks.
Memory leaks occur when objects are not released.
```

**Expected extraction output:**

```json
{
  "entities": [
    { "name": "system crash", "type": "issue", "supporting_text": "system crashes due to memory leaks" },
    { "name": "memory leak", "type": "issue", "supporting_text": "memory leaks occur when objects are not released" },
    { "name": "object", "type": "component", "supporting_text": "objects are not released" }
  ],
  "relations": [
    {
      "source": "memory leak",
      "target": "system crash",
      "type": "causes",
      "confidence": 1.0,
      "supporting_text": "System crashes due to memory leaks."
    },
    {
      "source": "object",
      "target": "memory leak",
      "type": "contributes to",
      "confidence": 0.9,
      "supporting_text": "Memory leaks occur when objects are not released."
    }
  ]
}
```
@@ -0,0 +1,163 @@
# Lint Instructions

This file defines the wiki health-check workflow.

Run this periodically (or after a large batch of ingests) to keep the wiki clean and accurate. The pattern is from Karpathy's LLM Wiki: detect contradictions, orphans, broken links, stale claims, and data gaps.

---

## When to Run

- After ingesting 5+ documents
- When the user asks "check the wiki" or "health check"
- When answers seem inconsistent or contradictory
- Before a major synthesis or presentation

---

## Step 1: Run the Automated Health Check

```python
from scripts.tools import wiki_store

issues = wiki_store.lint_wiki()
# Returns:
# {
#   "orphan_pages": [list of slugs in files but not in index],
#   "missing_pages": [list of slugs in index but file deleted],
#   "broken_wikilinks": {slug: [broken link targets]},
#   "isolated_pages": [slugs with no wikilinks at all],
# }
```

---

## Step 2: Triage Each Issue Type

### Orphan Pages

Pages exist on disk but are not in the index. They are invisible to search.

**Fix**: Add them to the index, or delete them if stale.

```python
# To add to index, re-write the page (this auto-updates the index):
wiki_store.write_page(category="...", title="...", content=existing_content)

# To delete (manual step — confirm with user first):
# rm wiki/{category}/{slug}.md
```

### Missing Pages

Slugs in the index whose files were deleted. These are dangling references.

**Fix**: Either recreate the page from knowledge or remove it from the index.

### Broken Wikilinks

`[[slug]]` references that point to pages that don't exist.

**Fix**: Create the missing page, or correct the link.

### Isolated Pages

Pages with no `[[wikilinks]]` — they are unreachable via link traversal.

**Fix**: Add links from/to related pages.
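
All four issue types can be surfaced in one report-only pass. A minimal sketch that prints findings and fixes nothing automatically (deletion always needs user confirmation, per the rules below):

```python
from scripts.tools import wiki_store

issues = wiki_store.lint_wiki()

# Report-only triage: surface every issue, change nothing automatically.
for slug in issues["orphan_pages"]:
    print(f"orphan (on disk, not indexed): {slug}")
for slug in issues["missing_pages"]:
    print(f"missing (indexed, file deleted): {slug}")
for slug, targets in issues["broken_wikilinks"].items():
    print(f"broken links in {slug}: {', '.join(targets)}")
for slug in issues["isolated_pages"]:
    print(f"isolated (no wikilinks): {slug}")
```
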
---

## Step 3: Check for Contradictions

Read the wiki index and scan for pages that might contradict each other:

```python
pages = wiki_store.list_pages()
# Returns [{slug, category, summary, date}, ...]
```

Look for:

- Same entity with conflicting `type` in different pages
- Same relation with different direction in different pages
- Newer ingests that update/supersede older claims
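
The first check can be bootstrapped from the index alone, because entity pages are written with a `name: type` summary (see the ingestion instructions, Step 5b). A rough sketch for surfacing candidates; real contradictions still need a human read:

```python
from collections import defaultdict

from scripts.tools import wiki_store

# Group entity pages by name using the "name: type" summary convention
# from the ingestion instructions; flag names carrying more than one type.
types_by_name = defaultdict(set)
for page in wiki_store.list_pages():
    if page["category"] == "entity" and ":" in page["summary"]:
        name, _, etype = page["summary"].partition(":")
        types_by_name[name.strip().lower()].add(etype.strip())

for name, types in types_by_name.items():
    if len(types) > 1:
        print(f"possible contradiction: {name} typed as {sorted(types)}")
```
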
**When you find a contradiction:**

- Add a `## Contradictions` section to the relevant entity/topic pages:

  ```markdown
  ## Contradictions
  - doc_001 says X; doc_003 says not-X — unresolved
  ```

- Flag it in the log:

  ```python
  # Handled by wiki_store.write_page, which auto-appends to log.md
  ```

---

## Step 4: Check for Stale Claims

Review pages ingested more than N days ago (use the `date` field from the index).
Ask: "Has any newer document superseded this claim?"
**When a claim is stale:**

- Update the page: add a `## Superseded` section or update the body.
- Mark the old claim with _(superseded by [[newer-doc-summary]])_.

---

## Step 5: Check for Missing Cross-References

For each entity page, check: does it link back to all summary pages that mention it?
For each summary page, check: does it link to all entity pages it extracted?

**Fix**: Read the page and add missing `[[slug]]` links.
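
A sketch of the back-link check, assuming `read_page_by_slug` returns the page text and pages reference each other with `[[slug]]` wikilinks:

```python
import re

from scripts.tools import wiki_store

WIKILINK = re.compile(r"\[\[([^\]]+)\]\]")

pages = wiki_store.list_pages()
summaries = [p["slug"] for p in pages if p["category"] == "summary"]
entities = {p["slug"] for p in pages if p["category"] == "entity"}

# For each summary page, find entity links; report entities that never link back.
for slug in summaries:
    text = wiki_store.read_page_by_slug(slug)
    for target in WIKILINK.findall(text):
        if target in entities:
            entity_text = wiki_store.read_page_by_slug(target)
            if f"[[{slug}]]" not in entity_text:
                print(f"{target} does not link back to {slug}")
```
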
---

## Step 6: Identify Data Gaps

Review entity pages that lack:

- A proper description (just a stub)
- Any `## Relations` section
- Any `## Mentioned in` links

These are candidates for deeper research or new ingests.

---

## Step 7: Log the Lint Pass

```python
# wiki_store.write_page automatically logs the activity.
# For a manual lint summary, append to log.md via write_page on a topic:
wiki_store.write_page(
    category="topic",
    title="Lint Pass YYYY-MM-DD",
    content="# Lint Pass\n\n## Issues Found\n\n...\n\n## Fixed\n\n...",
    summary="Lint pass results",
)
```

---

## Quick Lint Commands

```python
from scripts.tools import wiki_store

# Full health check
issues = wiki_store.lint_wiki()

# Get recent history
log = wiki_store.get_log(last_n=10)

# List all pages
all_pages = wiki_store.list_pages()

# Search for a concept across the wiki
results = wiki_store.search_wiki("memory leak")
```

---

## Rules

- NEVER delete pages without user confirmation
- NEVER auto-resolve a contradiction — flag it for human review
- File all lint results as a topic page in the wiki (so the history is visible)
- Prefer adding cross-references over rewriting existing content
@@ -0,0 +1,99 @@
# Ontology Instructions

This file defines the rules for maintaining and evolving the dynamic ontology used by the Context Graph.

---

## Core Principle

The ontology is **NOT fixed**. Types and relations emerge from documents as they are ingested.
However, the ontology must remain **compact, consistent, and reusable**.

---

## Entity Type Rules

### Normalization

When assigning an entity type, apply these transformations:

1. Convert to **lowercase**
2. Strip leading/trailing whitespace
3. Replace underscores and hyphens with spaces
4. Merge synonymous types using the mapping table below

### Synonym Mapping (Entity Types)

| Variant | Canonical Type |
|---------|----------------|
| component, module, class, function | component |
| bug, defect, fault, error, failure | issue |
| server, host, machine, node | infrastructure |
| user, person, operator, admin, actor | actor |
| app, application, service, program, software | software |
| database, datastore, db, storage | storage |
| api, endpoint, interface, connection | interface |
| event, incident, occurrence, trigger | event |
| concept, idea, principle, theory | concept |
| process, thread, task, job, workflow | process |
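
The normalization steps and this table are easy to mirror in code. A minimal sketch; the dict literal simply restates the table, and `normalize_entity_type` is an illustrative stand-in for `ontology_store.normalize_type` (see the update protocol below):

```python
# Sketch of entity-type normalization; the dict restates the table above.
# Canonical names pass through unchanged because unknown keys default to themselves.
ENTITY_TYPE_SYNONYMS = {
    "module": "component", "class": "component", "function": "component",
    "bug": "issue", "defect": "issue", "fault": "issue",
    "error": "issue", "failure": "issue",
    "server": "infrastructure", "host": "infrastructure",
    "machine": "infrastructure", "node": "infrastructure",
    "user": "actor", "person": "actor", "operator": "actor", "admin": "actor",
    "app": "software", "application": "software",
    "service": "software", "program": "software",
    "database": "storage", "datastore": "storage", "db": "storage",
    "api": "interface", "endpoint": "interface", "connection": "interface",
    "incident": "event", "occurrence": "event", "trigger": "event",
    "idea": "concept", "principle": "concept", "theory": "concept",
    "thread": "process", "task": "process", "job": "process", "workflow": "process",
}

def normalize_entity_type(raw: str) -> str:
    t = raw.strip().lower().replace("_", " ").replace("-", " ")
    return ENTITY_TYPE_SYNONYMS.get(t, t)
```
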
### Adding New Types

If an entity does not match any existing type:

- Create a **new type** if it is genuinely distinct
- Keep the label short (1–3 words, lowercase)
- Consider whether an existing type is close enough before creating a new one

### Constraint

- Maximum ~50 distinct entity types across the entire ontology
- If the limit is approached, merge similar types rather than creating new ones

---

## Relation Type Rules

### Normalization

When assigning a relation type:

1. Convert to **lowercase**
2. Strip whitespace
3. Use verb phrases in **present tense** (e.g., "causes", "contains", "uses")
4. Merge synonyms using the mapping table below

### Synonym Mapping (Relation Types)

| Variant | Canonical Relation |
|---------|--------------------|
| triggers, leads to, results in, produces | causes |
| is part of, belongs to, lives in, sits in | contains |
| depends on, requires, needs | depends on |
| uses, calls, invokes, consumes | uses |
| affects, impacts, influences | affects |
| creates, instantiates, spawns | creates |
| connects to, links to, references | connects to |
| inherits from, extends, subclasses | extends |
| reads from, queries, fetches | reads from |
| writes to, stores in, persists to | writes to |

Note: `is part of` and `belongs to` are inverse phrasings of `contains`; when mapping them to the canonical relation, swap source and target so the edge keeps the correct direction.

### Adding New Relations

- Only add new relation types if no existing type accurately describes the relationship
- Prefer canonical relations over creating new ones

---

## Ontology Update Protocol

When processing extracted entities/relations from `ingestion.md`:

1. For each entity type:
   - Run it through the synonym mapping
   - Call `ontology_store.normalize_type(type_name)` to get the canonical form
   - Call `ontology_store.add_type(canonical_type)` to register it

2. For each relation type:
   - Run it through the synonym mapping
   - Call `ontology_store.normalize_relation(relation_name)` to get the canonical form
   - Call `ontology_store.add_relation(canonical_relation)` to register it

3. Use the **canonical** type/relation names when creating nodes and edges in the graph (a sketch of this wiring follows).
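
A minimal sketch of the protocol, where `extraction` is the JSON object produced by the ingestion instructions and the import path for `ontology_store` is an assumption:

```python
# Sketch of the update protocol; the ontology_store import path is an assumption.
from scripts.tools import ontology_store

for entity in extraction["entities"]:
    canonical_type = ontology_store.normalize_type(entity["type"])
    ontology_store.add_type(canonical_type)
    entity["type"] = canonical_type  # use the canonical name on the node

for relation in extraction["relations"]:
    canonical = ontology_store.normalize_relation(relation["type"])
    ontology_store.add_relation(canonical)
    relation["type"] = canonical  # use the canonical name on the edge
```
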
@@ -0,0 +1,163 @@
# Retrieval Instructions

This file defines how the agent answers queries using the two-layer retrieval strategy:
**wiki-first** (fast path), then **graph traversal with evidence** (deep path).

---

## Overview

Retrieval is a 7-step process:

1. Parse the query
2. **Check the wiki first** (fast path)
3. Find seed nodes in the graph
4. Expand the graph via BFS
5. Prune noisy nodes
6. Build the subgraph with provenance
7. Return structured context

---

## Step 1: Parse the Query

Read the query string and identify:

- **Key noun phrases**: potential entity names (e.g., "system crash", "memory leak")
- **Keywords**: individual meaningful words (e.g., "crash", "leak", "memory")
- Normalize all terms to **lowercase**

Ignore stopwords (e.g., "the", "a", "is", "why", "does", "how", "what").
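
A minimal sketch of the keyword side of this step (the stopword list is illustrative, not exhaustive; noun-phrase detection is left to the agent's judgment):

```python
# Sketch of query parsing: lowercase, strip punctuation, drop stopwords.
STOPWORDS = {"the", "a", "an", "is", "are", "why", "does", "how", "what"}

def parse_query(query: str) -> list[str]:
    """Lowercase the query and keep meaningful keywords."""
    words = query.lower().split()
    return [w.strip("?,.!") for w in words if w.strip("?,.!") not in STOPWORDS]

parse_query("Why does the system crash?")  # -> ["system", "crash"]
```
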
---

## Step 2: Check the Wiki First (Fast Path)

Before touching the graph, search the wiki. The wiki contains compiled knowledge —
cross-references already resolved, contradictions flagged, syntheses written.

```python
from scripts.tools import wiki_store

results = wiki_store.search_wiki(query)
```

For each relevant result, read the page:

```python
content = wiki_store.read_page_by_slug(result["slug"])
```

**If the wiki has a sufficient answer:**

- Synthesize from wiki pages.
- Cite the source pages (e.g., "According to [[memory-leak]] and [[system-crash]]...").
- File the answer as a new wiki topic page if it's valuable and not already captured:

  ```python
  wiki_store.write_page(category="topic", title="Why System Crashes", content=..., summary=...)
  ```

- **Return early** — no graph traversal needed.

**If the wiki answer is incomplete or missing:** proceed to Step 3.

---

## Step 3: Find Seed Nodes

Call `index_store.search(query)` with the original query string.

This returns node IDs matching entity names or keywords.

If no seed nodes are found:

- Try searching with individual keywords from Step 1.
- If still no results, return an empty subgraph: "No relevant entities found."
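
A sketch of this fallback, assuming `index_store` is importable like the other tools, `search` returns a list of node IDs, and `parse_query` from Step 1 is available:

```python
from scripts.tools import index_store  # import path is an assumption

seed_ids = index_store.search(query)
if not seed_ids:
    # Fall back to individual keywords from Step 1.
    for keyword in parse_query(query):
        seed_ids.extend(index_store.search(keyword))
    seed_ids = list(dict.fromkeys(seed_ids))  # de-duplicate, keep order

if not seed_ids:
    result = {"nodes": {}, "edges": []}  # empty subgraph, not an error
```
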
---

## Step 4: Expand the Graph (BFS)

Call `retrieval_engine.retrieve(seed_node_ids, depth=2)`.

BFS from seed nodes:

- **Depth 1**: direct neighbors
- **Depth 2**: neighbors of neighbors

Rules:

- Only traverse edges with confidence ≥ MIN_CONFIDENCE (from config.py)
- Do NOT traverse beyond depth 2
- Collect all visited node IDs
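
Conceptually, the traversal inside `retrieval_engine.retrieve` behaves like the following sketch. The adjacency map from node ID to `(neighbor_id, confidence)` pairs is an assumed data structure; only the depth and confidence rules come from this file:

```python
from collections import deque

MIN_CONFIDENCE = 0.6  # illustrative; the real value comes from config.py

def bfs_expand(adjacency: dict, seeds: list, max_depth: int = 2) -> set:
    """Collect node IDs reachable from the seeds within max_depth,
    following only edges at or above the confidence floor."""
    visited = set(seeds)
    frontier = deque((node, 0) for node in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # never traverse beyond depth 2
        for neighbor, confidence in adjacency.get(node, []):
            if confidence >= MIN_CONFIDENCE and neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return visited
```
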
---

## Step 5: Prune Nodes

- Limit total nodes to MAX_NODES (from config.py)
- Prioritize:
  1. Seed nodes (always include)
  2. Nodes at depth 1
  3. Nodes at depth 2 (as space allows)
- Remove nodes that are only weakly connected (edge confidence < MIN_CONFIDENCE)
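
A sketch of the pruning order, assuming the traversal also recorded each node's depth (seeds at depth 0):

```python
MAX_NODES = 50  # illustrative; the real value comes from config.py

def prune(depth_by_node: dict) -> set:
    """Keep seeds first, then depth-1 nodes, then depth-2 nodes, up to MAX_NODES."""
    kept = set()
    for node, depth in sorted(depth_by_node.items(), key=lambda item: item[1]):
        if len(kept) >= MAX_NODES and depth > 0:
            break  # seeds are always included; stop adding deeper nodes
        kept.add(node)
    return kept
```
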
---

## Step 6: Build the Subgraph with Provenance

For a standard query, call:

```python
subgraph = skill.query(query)
# Returns: {"nodes": {node_id: {name, type, source_document, source_chunks}},
#           "edges": [{source, target, type, confidence, source_document, supporting_text, chunk_id}]}
```

For queries requiring evidence (citations, fact-checking), call:

```python
result = skill.query_with_evidence(query)
# Returns:
# {
#   "query": str,
#   "subgraph": {"nodes": {...}, "edges": [...]},
#   "supporting_documents": [
#     {
#       "doc_id": str,
#       "doc_title": str,
#       "supporting_chunks": [{"chunk_id": str, "text": str}, ...]
#     }
#   ],
#   "evidence_chain": "memory leak --[causes]--> system crash"
# }
```

---

## Step 7: Return Structured Context

Return the result with:

- **Subgraph**: nodes + edges (the graph answer)
- **Supporting documents**: source chunks that prove each relation
- **Evidence chain**: human-readable path summary
- **Wiki references**: links to relevant wiki pages found in Step 2

**If valuable, file the answer back into the wiki:**

```python
wiki_store.write_page(
    category="topic",
    title=query,
    content=f"# {query}\n\n**Evidence chain:** {result['evidence_chain']}\n\n...",
    summary="...",
)
```

This way, future queries on the same topic find the answer instantly in the wiki.

---

## Rules

- NEVER fabricate nodes or edges not present in the graph
- NEVER traverse deeper than depth 2
- ALWAYS check the wiki before the graph (wiki-first)
- Always include seed nodes in the result, even if they have no edges
- Prefer edges with higher confidence when pruning
- File valuable answers back into the wiki as topic pages
- Return an empty subgraph (not an error) if no relevant nodes are found