* add mini-context-graph skill * remove pycache files * filename case update to SKILL.md * update readme
4.3 KiB
Retrieval Instructions
This file defines how the agent answers queries using the two-layer retrieval strategy: wiki-first (fast path), then graph traversal with evidence (deep path).
Overview
Retrieval is a 7-step process:
- Parse the query
- Check the wiki first (fast path)
- Find seed nodes in the graph
- Expand the graph via BFS
- Prune noisy nodes
- Build the subgraph with provenance
- Return structured context
Step 1: Parse the Query
Read the query string and identify:
- Key noun phrases: potential entity names (e.g., "system crash", "memory leak")
- Keywords: individual meaningful words (e.g., "crash", "leak", "memory")
- Normalize all terms to lowercase
Ignore stopwords (e.g., "the", "a", "is", "why", "does", "how", "what").
Step 2: Check the Wiki First (Fast Path)
Before touching the graph, search the wiki. The wiki contains compiled knowledge — cross-references already resolved, contradictions flagged, syntheses written.
from scripts.tools import wiki_store
results = wiki_store.search_wiki(query)
For each relevant result, read the page:
content = wiki_store.read_page_by_slug(result["slug"])
If the wiki has a sufficient answer:
- Synthesize from wiki pages.
- Cite the source pages (e.g., "According to memory-leak and system-crash...").
- File the answer as a new wiki topic page if it's valuable and not already captured:
wiki_store.write_page(category="topic", title="Why System Crashes", content=..., summary=...) - Return early — no graph traversal needed.
If the wiki answer is incomplete or missing: proceed to Step 3.
Step 3: Find Seed Nodes
Call index_store.search(query) with the original query string.
This returns node IDs matching entity names or keywords.
If no seed nodes are found:
- Try searching with individual keywords from Step 1.
- If still no results, return an empty subgraph: "No relevant entities found."
Step 4: Expand the Graph (BFS)
Call retrieval_engine.retrieve(seed_node_ids, depth=2).
BFS from seed nodes:
- Depth 1: direct neighbors
- Depth 2: neighbors of neighbors
Rules:
- Only traverse edges with confidence ≥ MIN_CONFIDENCE (from config.py)
- Do NOT traverse beyond depth 2
- Collect all visited node IDs
Step 5: Prune Nodes
- Limit total nodes to MAX_NODES (from config.py)
- Prioritize:
- Seed nodes (always include)
- Nodes at depth 1
- Nodes at depth 2 (as space allows)
- Remove nodes only weakly connected (edge confidence < MIN_CONFIDENCE)
Step 6: Build the Subgraph with Provenance
For a standard query, call:
subgraph = skill.query(query)
# Returns: {"nodes": {node_id: {name, type, source_document, source_chunks}},
# "edges": [{source, target, type, confidence, source_document, supporting_text, chunk_id}]}
For queries requiring evidence (citations, fact-checking), call:
result = skill.query_with_evidence(query)
# Returns:
# {
# "query": str,
# "subgraph": {"nodes": {...}, "edges": [...]},
# "supporting_documents": [
# {
# "doc_id": str,
# "doc_title": str,
# "supporting_chunks": [{"chunk_id": str, "text": str}, ...]
# }
# ],
# "evidence_chain": "memory leak --[causes]--> system crash"
# }
Step 7: Return Structured Context
Return the result with:
- Subgraph: nodes + edges (the graph answer)
- Supporting documents: source chunks that prove each relation
- Evidence chain: human-readable path summary
- Wiki references: links to relevant wiki pages found in Step 2
If valuable, file the answer back into the wiki:
wiki_store.write_page(
category="topic",
title=query,
content=f"# {query}\n\n**Evidence chain:** {result['evidence_chain']}\n\n...",
summary="...",
)
This way, future queries on the same topic find the answer instantly in the wiki.
Rules
- NEVER fabricate nodes or edges not present in the graph
- NEVER traverse deeper than depth 2
- ALWAYS check the wiki before the graph (wiki-first)
- Always include seed nodes in the result, even if they have no edges
- Prefer edges with higher confidence when pruning
- File valuable answers back into the wiki as topic pages
- Return an empty subgraph (not an error) if no relevant nodes are found