Ingestion Instructions
This file defines how the agent extracts entities and relations from a raw document.
Step 1: Read the Document
Read the provided text carefully. Identify:
- Entities: noun phrases that refer to real-world objects, systems, components, actors, concepts, or events.
- Relations: verb phrases that describe how one entity affects, contains, causes, uses, or is related to another.
Step 2: Extract Entities
For each entity:
- Record its name (normalized: lowercase, strip leading/trailing whitespace)
- Assign a type: a short label (1–3 words) that categorizes the entity
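The normalization rule above can be sketched in a few lines. This is only an illustration: the helper names `normalize_name` and `has_valid_type_length` are hypothetical and not part of the skill's scripts, and the word-count check covers only the "1–3 words" guideline, not the ontology rules.

```python
def normalize_name(raw: str) -> str:
    """Lowercase the text and strip leading/trailing whitespace."""
    return raw.strip().lower()


def has_valid_type_length(entity_type: str) -> bool:
    """Return True when the type label is 1 to 3 whitespace-separated words."""
    return 1 <= len(entity_type.split()) <= 3


if __name__ == "__main__":
    print(normalize_name("  Python Interpreter "))
    print(has_valid_type_length("infrastructure"))
```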
Entity Type Examples
| Entity Name | Suggested Type |
|---|---|
| Python interpreter | software |
| memory leak | issue |
| operating system | system |
| database | infrastructure |
| user | actor |
| API endpoint | interface |
| server | infrastructure |
Rules:
- Types must be general enough to reuse across documents
- Do NOT create unique types per entity (e.g., avoid python-interpreter-type)
- Use ontology.md normalization rules to canonicalize types
Step 3: Extract Relations
For each pair of entities with an explicit connection in the text:
- Record the source entity name
- Record the target entity name
- Record the relation type: a verb or verb phrase (normalized: lowercase)
- Assign a confidence score between 0 and 1:
  - 1.0 = stated explicitly ("A causes B")
  - 0.8 = strongly implied ("A is linked to B")
  - 0.6 = weakly implied ("A may affect B")
  - < 0.6 = do NOT include
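As a rough sketch of the cutoff above, a post-processing step could drop any relation scored below 0.6. The function names and the `MIN_CONFIDENCE` constant are hypothetical; the dictionary keys follow the output format described in Step 4.

```python
MIN_CONFIDENCE = 0.6  # relations scored below this are not included


def keep_relation(relation: dict) -> bool:
    """Return True when the confidence lies in the accepted range [0.6, 1.0]."""
    return MIN_CONFIDENCE <= relation["confidence"] <= 1.0


def filter_relations(relations: list) -> list:
    """Keep only the relations that pass the confidence cutoff."""
    return [r for r in relations if keep_relation(r)]


if __name__ == "__main__":
    candidates = [
        {"source": "a", "target": "b", "type": "causes", "confidence": 1.0},
        {"source": "a", "target": "c", "type": "may affect", "confidence": 0.4},
    ]
    print(filter_relations(candidates))
```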
Step 4: Output Format
Produce a JSON object in this exact format:
{
"entities": [
{ "name": "entity name", "type": "entity type", "supporting_text": "exact quote mentioning this entity" }
],
"relations": [
{
"source": "source entity name",
"target": "target entity name",
"type": "relation type",
"confidence": 0.9,
"supporting_text": "exact quote that justifies this relation"
}
]
}
The supporting_text field is required for provenance. It must be a verbatim or near-verbatim quote from the document that mentions or supports the entity/relation. This is what links graph nodes and edges back to their source.
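One way to check this provenance requirement is sketched below. It assumes that "near-verbatim" means the quote matches the document after ignoring case and collapsing whitespace; that interpretation and the helper name `quote_is_grounded` are illustrative assumptions, not part of the skill.

```python
import re


def _squash(text: str) -> str:
    """Collapse runs of whitespace to single spaces and lowercase the text."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_is_grounded(supporting_text: str, document: str) -> bool:
    """Return True when the quote is non-empty and occurs in the document,
    ignoring case and whitespace differences (an assumed reading of
    "near-verbatim")."""
    needle = _squash(supporting_text)
    return bool(needle) and needle in _squash(document)


if __name__ == "__main__":
    doc = "System crashes due to memory leaks.\nMemory leaks occur when objects are not released."
    print(quote_is_grounded("objects are not released", doc))
```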
Rules
- All names and types must be lowercase
- Only include relations where both entities are present in the entities list
- Do NOT invent entities or relations not supported by the text
- Prefer reusing existing entity and relation types from the ontology over creating new ones
- One entity can appear in multiple relations (as source or target)
- Always include supporting_text; this enables evidence retrieval and audit trails
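The rules above lend themselves to a mechanical check before ingestion. The following is a minimal sketch, assuming the extraction is already parsed into a Python dict in the Step 4 format; `validate_extraction` is a hypothetical helper, and it does not check whether types exist in the ontology.

```python
def validate_extraction(extraction: dict) -> list:
    """Return a list of human-readable rule violations (empty when none found)."""
    problems = []
    names = set()

    for entity in extraction.get("entities", []):
        name = entity.get("name", "")
        names.add(name)
        for field in ("name", "type"):
            value = entity.get(field, "")
            if value != value.lower():
                problems.append(f"entity {name!r}: {field} is not lowercase")
        if not entity.get("supporting_text"):
            problems.append(f"entity {name!r}: missing supporting_text")

    for relation in extraction.get("relations", []):
        label = f"{relation.get('source')!r} -> {relation.get('target')!r}"
        # Both endpoints must appear in the entities list.
        for end in ("source", "target"):
            if relation.get(end) not in names:
                problems.append(f"relation {label}: {end} is not a listed entity")
        rel_type = relation.get("type", "")
        if rel_type != rel_type.lower():
            problems.append(f"relation {label}: type is not lowercase")
        if not relation.get("supporting_text"):
            problems.append(f"relation {label}: missing supporting_text")

    return problems
```

An empty return value means none of the checked rules were violated; anything else can be fed back to the extraction step for correction.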
Step 5: Write Wiki Pages (Required)
After calling skill.ingest_with_content(...), you MUST write wiki pages:
5a. Write a summary page for the document
from scripts.tools import wiki_store
wiki_store.write_page(
category="summary",
title=f"{title} Summary",
content=f"""---
title: {title}
source_document: {doc_id}
tags: [summary]
---
# {title}
**Source:** {source}
## Key Claims
{chr(10).join(f'- [[{r["source"].replace(" ", "-")}]] {r["type"]} [[{r["target"].replace(" ", "-")}]] (confidence: {r["confidence"]})' for r in relations)}
## Entities
{chr(10).join(f'- [[{e["name"].replace(" ", "-")}]] ({e["type"]})' for e in entities)}
## Open Questions
- (Add questions from reading the document here)
""",
summary=f"Summary of {title}",
)
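To make the generated page easier to picture, the sketch below reproduces only the wikilink formatting used in the Key Claims and Entities sections: spaces in entity names become hyphens inside the double brackets. The helper names `wikilink` and `claim_line` are hypothetical and exist only for this illustration.

```python
def wikilink(name: str) -> str:
    """Turn an entity name into a wikilink, replacing spaces with hyphens."""
    return "[[" + name.replace(" ", "-") + "]]"


def claim_line(relation: dict) -> str:
    """Format one relation the way the Key Claims section lists it."""
    return (
        f'- {wikilink(relation["source"])} {relation["type"]} '
        f'{wikilink(relation["target"])} (confidence: {relation["confidence"]})'
    )


if __name__ == "__main__":
    relation = {
        "source": "memory leak",
        "target": "system crash",
        "type": "causes",
        "confidence": 1.0,
    }
    print(claim_line(relation))
    # prints: - [[memory-leak]] causes [[system-crash]] (confidence: 1.0)
```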
5b. Write or update entity pages
For each new entity not already in the wiki, write an entity page:
wiki_store.write_page(
category="entity",
title=entity_name,
content=f"""---
title: {entity_name}
type: {entity_type}
source_document: {doc_id}
tags: [{entity_type}]
---
# {entity_name}
(Description from the document or prior knowledge.)
## Relations
(List any wikilinks to related entities extracted from relations.)
## Mentioned in
- [[{doc_id}-summary]]
""",
summary=f"{entity_name}: {entity_type}",
)
For existing entity pages, read the current page and append new information, updated relations, or flag contradictions.
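The append step for an existing page might look like the sketch below. It works on the page text only and makes no assumption about how `wiki_store` reads pages; it does assume that "Mentioned in" is the last section, as in the template above. The helper name `add_mention` is hypothetical.

```python
def add_mention(page: str, summary_slug: str) -> str:
    """Append a '- [[slug]]' line under '## Mentioned in' unless already present.

    Assumes 'Mentioned in' is the final section of the page, so the new line
    can simply be added at the end.
    """
    line = f"- [[{summary_slug}]]"
    if line in page.splitlines():
        return page  # already recorded; keep the page unchanged
    body = page.rstrip("\n")
    if "## Mentioned in" not in body.splitlines():
        body += "\n## Mentioned in"
    return body + "\n" + line + "\n"
```

Updating relations or flagging contradictions needs judgment about the page content and is left to the agent.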
Example
Input document:
System crashes due to memory leaks.
Memory leaks occur when objects are not released.
Expected extraction output:
{
"entities": [
{ "name": "system crash", "type": "issue", "supporting_text": "system crashes due to memory leaks" },
{ "name": "memory leak", "type": "issue", "supporting_text": "memory leaks occur when objects are not released" },
{ "name": "object", "type": "component", "supporting_text": "objects are not released" }
],
"relations": [
{
"source": "memory leak",
"target": "system crash",
"type": "causes",
"confidence": 1.0,
"supporting_text": "System crashes due to memory leaks."
},
{
"source": "object",
"target": "memory leak",
"type": "contributes to",
"confidence": 0.9,
"supporting_text": "Memory leaks occur when objects are not released."
}
]
}