Beyond RAG: Implementing Karpathy's 'LLM Wiki' Pattern with Obsidian

Standard Retrieval-Augmented Generation (RAG) pipelines have hit a hard ceiling in production. If you are building enterprise knowledge systems, you already know the bottleneck: vector RAG is fundamentally reactive. Every query starts from zero. In this post, I will break down the "LLM Wiki" pattern that Andrej Karpathy published as a GitHub Gist (past 5,000 stars at the time of writing) and show you how I implemented it with Claude Code and Obsidian.

The core shift: stop retrieving knowledge at query time, and start compiling it at ingest time.

1. The Problem with Standard RAG: Induced Amnesia

Standard RAG architectures suffer from what I call "induced amnesia." Every time a user submits a query, the system retrieves raw chunks from a vector database and forces the LLM to synthesize an answer from scratch. There is no accumulation. No compounding.

As Karpathy puts it in the Gist:

"The LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up."

When a query requires multi-hop reasoning across dozens of disconnected documents, top-K semantic retrieval leads to two compounding failure modes:

Context fragmentation: the retrieved chunks are semantically similar but logically disconnected.
Hallucination under uncertainty: the LLM fills gaps between chunks with plausible but unverifiable content.

This is not a problem you solve by tuning chunk size or switching embedding models. It is an architectural problem.

2. The LLM Wiki Pattern: Knowledge Compiled at Ingest

Karpathy's pattern inverts the pipeline entirely. Instead of retrieving from raw documents at query time, the LLM acts as an asynchronous, continuous maintainer of a persistent directory of Markdown files.

When a new document is introduced:

The LLM reads it and extracts entities, claims, and relationships.
It physically writes or updates interlinked Markdown summary files.
The knowledge graph grows incrementally — compounding with every new source.

At query time, the LLM uses the index to find the relevant pre-synthesized, dense Markdown pages and reads them. No embeddings. No top-K retrieval. No reconstruction from fragments.

The key insight, again from the Gist: the wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions are already flagged. The synthesis already reflects everything ingested.

3. The 3-Tier Architecture: Sources, Wiki, Schema

To implement this pattern robustly, the architecture is partitioned into three distinct layers:

Layer 1 — Sources (Raw): The immutable ground truth. PDFs, CSVs, raw text. The LLM has read-only access. This layer is never modified.

Layer 2 — The Wiki: A directory of LLM-generated Markdown files. Entity pages, concept summaries, indexes, and logs. The LLM has full write access here and only here.

Layer 3 — The Schema: The agent's operating instructions. These live in a CLAUDE.md or equivalent agent directive file that explicitly defines how the LLM should parse, link, and structure data. This is the most critical layer: without a strict schema, the agent drifts and generates inconsistent structures across ingest cycles.
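Here is a minimal Python sketch that scaffolds the three layers. It assumes the schema file sits at the project root, where Claude Code looks for CLAUDE.md, next to `raw/` and `wiki/`. The function name `init_llm_wiki` and the seed contents of `index.md` and `log.md` are hypothetical. The sketch only creates the folders. Keeping `raw/` read-only for the agent is still the job of the sandboxed write tool.

```python
from pathlib import Path
from typing import Union

def init_llm_wiki(root: Union[str, Path]) -> Path:
    """Create the three-tier layout under `root` without overwriting existing files."""
    root = Path(root)
    # Layer 1: raw sources. Humans drop files here; the agent only reads them.
    (root / "raw").mkdir(parents=True, exist_ok=True)
    # Layer 2: the agent-owned wiki, with index.md and log.md as hub pages.
    (root / "wiki" / "entities").mkdir(parents=True, exist_ok=True)
    seeds = {
        root / "wiki" / "index.md": "# Index\n",
        root / "wiki" / "log.md": "# Log\n",
        # Layer 3: the schema, written by a human and read by the agent.
        root / "CLAUDE.md": "# LLM Wiki Schema & Agent Rules\n",
    }
    for path, text in seeds.items():
        if not path.exists():  # never clobber a wiki that is already running
            path.write_text(text, encoding="utf-8")
    return root
```

The function is idempotent, so you can run it again on an existing vault without losing log entries or schema edits.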

Defining the Schema (CLAUDE.md)

This is the production CLAUDE.md I use in my own implementation:

```markdown
# LLM Wiki Schema & Agent Rules

You are the sole maintainer of this knowledge base.
When instructed to process a new file from the `/raw` directory,
execute the following routine without deviation:

1. **Read & Extract:** Analyze the raw source. Identify core entities,
   claims, metrics, and relationships.
2. **Entity Generation:** For every new entity, create a dedicated
   `[Entity Name].md` file in `/wiki/entities/`. For existing entities,
   update rather than duplicate.
3. **Bidirectional Linking:** Use Obsidian-style wikilinks (`[[Page Name]]`)
   for all cross-references. Every entity page must link back to its sources.
4. **Index Update:** Append a one-line summary and wikilink to `/wiki/index.md`.
5. **Contradiction Flagging:** If new data contradicts an existing claim,
   add a `> ⚠️ CONFLICT:` callout block in the affected page. Do not silently overwrite.
6. **Logging:** Append a chronological entry to `/wiki/log.md`:
   `## [YYYY-MM-DD] ingest | [Source Name] | [N entities created/updated]`
```

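The contradiction flagging rule (step 5) is the most important refinement of Karpathy's base pattern. The Gist already has the agent flag contradictions, but here that behavior becomes a hard rule with a fixed, searchable callout format. In a long-running wiki, conflicting sources otherwise accumulate silently. Explicit flagging turns the wiki into a self-auditing system.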
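Because the flags are plain text, you can audit them without an LLM call. The helper below is hypothetical and is not part of Karpathy's Gist or any Obsidian API. It lists every open callout so you can review them before a lint pass. It assumes the callout starts its line exactly as step 5 prescribes.

```python
import re
from pathlib import Path
from typing import Dict, List, Union

# Step 5 callout: "> ⚠️ CONFLICT: ...". The emoji variation selector (U+FE0F)
# is optional because the warning sign is not always written with it.
CONFLICT_RE = re.compile(r"^>\s*\u26a0\ufe0f?\s*CONFLICT:\s*(.*)$")

def find_conflicts(wiki_dir: Union[str, Path]) -> Dict[str, List[str]]:
    """Map each wiki page (path relative to wiki_dir) to its open CONFLICT callouts."""
    wiki_dir = Path(wiki_dir)
    conflicts: Dict[str, List[str]] = {}
    for page in sorted(wiki_dir.rglob("*.md")):
        hits = []
        for line in page.read_text(encoding="utf-8").splitlines():
            match = CONFLICT_RE.match(line.strip())
            if match:
                hits.append(match.group(1).strip())
        if hits:
            conflicts[page.relative_to(wiki_dir).as_posix()] = hits
    return conflicts
```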

4. Implementation: Giving the Agent "Hands"

The agent needs a sandboxed file-system tool. The key constraint: the LLM must never be able to write outside the /wiki directory. This is a hard security boundary, not a soft convention.

```python
import os

def update_wiki_file(path: str, content: str, mode: str = "w") -> str:
    """
    Sandboxed write tool for the LLM Wiki agent.
    Enforces strict path containment within /wiki.

    Args:
        path: Relative path within the wiki directory.
        content: Content to write or append.
        mode: 'w' to overwrite, 'a' to append.

    Returns:
        Confirmation string on success.

    Raises:
        ValueError: If mode is not 'w' or 'a'.
        PermissionError: If the resolved path escapes /wiki (path traversal attempt).
    """
    if mode not in ("w", "a"):
        raise ValueError(f"Unsupported mode '{mode}'; expected 'w' or 'a'.")

    # realpath collapses '..' segments and follows symlinks before the check
    base_path = os.path.realpath("./wiki")
    target_path = os.path.realpath(os.path.join(base_path, path))

    # Hard boundary: reject any path that resolves outside /wiki.
    # commonpath compares whole path components, so a sibling directory such as
    # './wiki_evil' cannot slip through the way it would with str.startswith.
    if os.path.commonpath([base_path, target_path]) != base_path:
        raise PermissionError(
            f"Path traversal attempt blocked. "
            f"Resolved path '{target_path}' is outside wiki boundary '{base_path}'."
        )

    os.makedirs(os.path.dirname(target_path), exist_ok=True)
    with open(target_path, mode, encoding="utf-8") as f:
        f.write(content)

    return f"OK: {path} ({'appended' if mode == 'a' else 'written'})"
```

For a production agent, this function is registered as a tool in your framework of choice (Claude tool use, LangChain, OpenAI function calling). The LLM never touches the file system directly — it calls this tool, and the tool enforces the boundary.

A minimal Claude tool use registration looks like this:

```python
tools = [
    {
        "name": "update_wiki_file",
        "description": "Write or append content to a file in the wiki directory.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Relative path within /wiki, e.g. 'entities/transformer.md'"
                },
                "content": {"type": "string"},
                "mode": {"type": "string", "enum": ["w", "a"], "default": "w"}
            },
            "required": ["path", "content"]
        }
    }
]
```
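On the application side, each `tool_use` block in Claude's response carries an `id`, a `name`, and an `input` object. The result goes back as a `tool_result` block that references the same id. The dispatcher below is a sketch under two assumptions. First, blocks are plain dicts; with the Python SDK you would read `.id`, `.name`, and `.input` from the block objects instead. Second, `dispatch_tool_use` and its registry are hypothetical names, not part of any SDK.

```python
from typing import Any, Callable, Dict

ToolFn = Callable[..., str]

def dispatch_tool_use(block: Dict[str, Any], registry: Dict[str, ToolFn]) -> Dict[str, Any]:
    """Execute one tool_use block and build the tool_result block to send back."""
    result: Dict[str, Any] = {"type": "tool_result", "tool_use_id": block["id"]}
    handler = registry.get(block["name"])
    if handler is None:
        result.update(content=f"Unknown tool: {block['name']}", is_error=True)
        return result
    try:
        # block["input"] holds the arguments the model produced for input_schema
        result["content"] = handler(**block["input"])
    except (TypeError, ValueError, OSError) as exc:
        # PermissionError is an OSError subclass, so blocked writes land here too;
        # returning the error lets the model correct itself instead of crashing the loop
        result.update(content=f"{type(exc).__name__}: {exc}", is_error=True)
    return result
```

In practice the registry would be `{"update_wiki_file": update_wiki_file}`, and each returned dict goes into the next user message. A traversal attempt therefore reaches the model as an error it can react to, while the sandbox check itself stays in Python.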

5. Obsidian as the Agent's IDE

In this architecture, you do not write the wiki. The LLM does. Obsidian is the IDE you use to read, audit, and navigate what the agent has built.

The workflow is three steps:

Ingest: Drop a source file into raw/.
Execute: python agent.py process raw/your-document.pdf
Audit: Open Obsidian Graph View and inspect the new nodes and edges.
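The `agent.py` script in step 2 is not reproduced in this post. For illustration only, a skeleton of its `process` command might look like the code below. It checks that the source really lives under `raw/` and builds the instruction that triggers the CLAUDE.md routine. The model call and tool loop are left as a comment. The function names and prompt wording are assumptions.

```python
import argparse
from pathlib import Path
from typing import List, Optional

def build_ingest_prompt(source: Path, raw_dir: Path = Path("raw")) -> str:
    """Build the user turn that asks the agent to run the CLAUDE.md ingest routine."""
    source, raw_dir = source.resolve(), raw_dir.resolve()
    # Read-side counterpart of the write sandbox: only files under raw/ are sources
    if raw_dir not in source.parents:
        raise ValueError(f"{source} is not inside {raw_dir}")
    relative = source.relative_to(raw_dir).as_posix()
    return f"Process the new file `raw/{relative}` following the ingest routine in CLAUDE.md."

def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="agent.py")
    commands = parser.add_subparsers(dest="command", required=True)
    process = commands.add_parser("process", help="ingest one source file from raw/")
    process.add_argument("source", type=Path)
    args = parser.parse_args(argv)
    prompt = build_ingest_prompt(args.source)
    # Hypothetical next step: send `prompt` to the model together with the `tools`
    # list, dispatch each tool_use block, and repeat until no more tools are requested.
    print(prompt)
    return 0
```

As a script, `main()` would be called as `sys.exit(main())` under an `if __name__ == "__main__":` guard.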

Below is a screenshot from my own running implementation — the wiki has been processing documents about my freelance positioning, market analysis, and technical concepts. Each node is a Markdown file the agent created or updated autonomously:

Obsidian Graph View of my LLM Wiki implementation, showing nodes including index, log, paolo_chignoli, market_analysis_2026, positioning_strategy, rag, ai_agents, and others, all interconnected via wikilinks generated by the agent.

Notice the hub structure: index and log are highly connected (as the schema mandates), while concept nodes like rag, ai_agents, and schema_driven_llm cluster around entity nodes. This structure emerges from the schema rules — the agent is not making aesthetic choices, it is following the CLAUDE.md directive.

A practical note on the "real-time" claim you often see in descriptions of this pattern: Obsidian's Graph View does not hot-reload automatically. You need to either reopen the vault or use the "Reload app without saving" command after the agent writes new files. Minor UX detail, but worth knowing before you expect live updates.

Leveraging YAML Frontmatter + Dataview

Instruct the agent to include YAML frontmatter in every entity file:

```markdown
---
type: concept
status: stable
last_updated: 2026-04-28
source_reliability: high
sources:
  - "[[Vaswani2017]]"
  - "[[Karpathy-LLM-Wiki-Gist]]"
---

# Attention Mechanism

...
```

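With the Dataview community plugin installed, these fields become queryable. A `dataview` code block can then act as an automated dashboard inside the wiki:

```dataview
TABLE status, last_updated, source_reliability
FROM "wiki/entities"
WHERE type = "concept"
SORT last_updated DESC
```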
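The Dataview table only works if every page uses the same keys with the same spelling. One option is to have your tooling stamp the frontmatter and let the model supply only the values. The helper below is hypothetical; neither Dataview nor Claude provides it.

```python
from datetime import date
from typing import Iterable

def render_frontmatter(page_type: str, status: str, last_updated: date,
                       source_reliability: str, sources: Iterable[str]) -> str:
    """Render the entity-page frontmatter with a fixed key order."""
    lines = [
        "---",
        f"type: {page_type}",
        f"status: {status}",
        f"last_updated: {last_updated.isoformat()}",
        f"source_reliability: {source_reliability}",
        "sources:",
    ]
    # Quote each wikilink: unquoted, YAML would parse [[Name]] as a nested list
    lines += [f'  - "[[{name}]]"' for name in sources]
    lines.append("---")
    return "\n".join(lines) + "\n"
```

Calling `render_frontmatter("concept", "stable", date(2026, 4, 28), "high", ["Vaswani2017", "Karpathy-LLM-Wiki-Gist"])` reproduces the frontmatter of the Attention Mechanism example exactly.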

6. The Three Core Agent Operations

Ingest (Create / Update)

The agent reads the source, checks index.md for existing entities, updates pages for known concepts, and creates new Markdown files for novel ones. The log.md always gets an append entry. This is the most expensive operation: each document costs at least one full LLM pass, and often several tool-call rounds as the agent reads and rewrites the pages it touches.
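The non-LLM parts of this operation are plain bookkeeping. The sketch below assumes the model has already returned a list of entity names for the source; that extraction is the LLM's job and is not shown. `plan_ingest` and `append_log` are hypothetical helpers. The code reads the schema's log format as a literal `[YYYY-MM-DD]` date followed by the source name and the entity count.

```python
import re
from datetime import date
from pathlib import Path
from typing import Dict, List, Set

WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)")  # page name before any |alias or #heading

def known_entities(index_md: Path) -> Set[str]:
    """Page names already linked from index.md."""
    if not index_md.exists():
        return set()
    return {name.strip() for name in WIKILINK_RE.findall(index_md.read_text(encoding="utf-8"))}

def plan_ingest(extracted: List[str], index_md: Path) -> Dict[str, List[str]]:
    """Split extracted entity names into pages to update and pages to create."""
    known = known_entities(index_md)
    plan: Dict[str, List[str]] = {"update": [], "create": []}
    for name in dict.fromkeys(extracted):  # drop duplicates, keep first-seen order
        plan["update" if name in known else "create"].append(name)
    return plan

def append_log(log_md: Path, source_name: str, n_entities: int, day: date) -> str:
    """Append one entry in the schema's log format and return it."""
    entry = f"## [{day.isoformat()}] ingest | {source_name} | {n_entities} entities created/updated\n"
    with log_md.open("a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```

Keeping this logic in code means the "update rather than duplicate" rule from step 2 of the schema no longer depends on the model remembering what the index contains.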

Query (Read)

The agent identifies relevant summary pages via index.md and reads the pre-compiled content to answer the question. No vector embeddings required. This is the payoff: query cost is a fraction of standard RAG because the synthesis has already happened.
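In Karpathy's version, the model reads `index.md` itself and picks the pages to open. If the index grows too large to read on every query, a cheap pre-filter can help. Swapped in here for illustration and not part of the pattern, it ranks index entries by word overlap with the question and hands only the top few pages to the model. `select_pages` and its stop-word list are hypothetical.

```python
import re
from typing import List, Tuple

WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)")
WORD_RE = re.compile(r"[a-z0-9]+")
# A tiny stop-word list; a real deployment would want a better tokenizer
STOP_WORDS = {"a", "an", "and", "are", "between", "for", "how", "in", "is",
              "my", "of", "on", "or", "the", "to", "what", "with"}

def select_pages(index_text: str, question: str, k: int = 3) -> List[str]:
    """Return up to k page names whose index.md lines share the most words with the question."""
    q_words = set(WORD_RE.findall(question.lower())) - STOP_WORDS
    scored: List[Tuple[int, str]] = []
    for line in index_text.splitlines():
        link = WIKILINK_RE.search(line)
        if not link:
            continue  # headings and blank lines carry no page reference
        overlap = len(q_words & set(WORD_RE.findall(line.lower())))
        if overlap:
            scored.append((overlap, link.group(1).strip()))
    # Highest overlap first; sort is stable, so ties keep index.md order
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scored[:k]]
```

Lexical overlap misses synonyms. Treat it as a budget cap on what the model reads, not as a replacement for the model's own judgment over the index.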

Lint (Refactor)

The operational superpower that most implementations skip. Periodically prompt the agent to:

Scan for orphan pages (nodes with no inbound wikilinks).
Resolve or escalate ⚠️ CONFLICT flags.
Merge redundant entity pages.
Rebuild index.md from scratch for consistency.

Linting is what keeps the wiki coherent over months of incremental ingest. Without it, entropy accumulates.
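The orphan scan, the first lint step above, needs no LLM at all. In the sketch below, the `find_orphans` helper is hypothetical. It matches wikilinks to file names exactly and case-sensitively, which simplifies Obsidian's own link resolution. It also ignores links *from* `index.md` and `log.md` by default: the schema makes the index link every page, so counting those links would hide every orphan.

```python
import re
from pathlib import Path
from typing import Dict, List, Set, Tuple, Union

WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)")  # page name before any |alias or #heading

def find_orphans(wiki_dir: Union[str, Path], hubs: Tuple[str, ...] = ("index", "log"),
                 count_hub_links: bool = False) -> List[str]:
    """List pages that no other page links to (hub pages are never reported)."""
    pages = {p.stem: p for p in Path(wiki_dir).rglob("*.md")}  # assumes unique file names
    inbound: Dict[str, Set[str]] = {name: set() for name in pages}
    for name, path in pages.items():
        if name in hubs and not count_hub_links:
            continue  # index.md links every page by design, which would mask all orphans
        for target in WIKILINK_RE.findall(path.read_text(encoding="utf-8")):
            target = target.strip()
            if target in inbound and target != name:  # skip dangling and self-links
                inbound[target].add(name)
    return sorted(n for n, sources in inbound.items() if not sources and n not in hubs)
```

The resulting list is a natural input for the lint prompt. Ask the agent to either link each orphan from a related page or merge it away.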

7. FinOps: Where Does the Compute Actually Go?

The LLM Wiki pattern shifts compute from query time to ingest time. This is the critical trade-off to understand before adopting it:

| | Standard Vector RAG | LLM Wiki |
|---|---|---|
| Ingest cost | ~$0.0001/doc (embeddings only) | ~$0.01–0.05/doc (full LLM synthesis) |
| Query cost | High (context stuffing + synthesis) | Low (read pre-compiled pages) |
| Accuracy on multi-hop queries | Moderate (fragmented chunks) | High (coherent synthesis) |
| Query-time latency | High (retrieval + synthesis on every query) | Low (read pre-compiled pages) |
| Maintenance | None | Requires periodic linting passes |
| Cost as the corpus grows | Synthesis cost paid again on every query | Synthesis paid once per document, amortized across queries |

The cost estimates assume Claude Sonnet-class models. At 100 documents, ingest costs roughly $1–5 in LLM calls versus about one cent in embeddings. At 10,000 documents, the gap grows to roughly $100–500 versus about $1.
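The trade-off reduces to a break-even count: the wiki's extra ingest spend divided by what it saves on each query. The sketch below uses the ingest midpoints from the table ($0.03 vs. $0.0001 per document). The post gives no per-query dollar figures, so the $0.02 and $0.005 query costs are placeholder assumptions to replace with your own measurements. The model also ignores lint passes and re-ingests, which push the break-even point further out.

```python
def break_even_queries(n_docs: int, wiki_ingest_per_doc: float, rag_ingest_per_doc: float,
                       rag_cost_per_query: float, wiki_cost_per_query: float) -> float:
    """Number of queries after which the wiki's extra ingest spend has been repaid."""
    extra_ingest = n_docs * (wiki_ingest_per_doc - rag_ingest_per_doc)
    saving_per_query = rag_cost_per_query - wiki_cost_per_query
    if saving_per_query <= 0:
        return float("inf")  # queries are not cheaper, so the wiki never pays back on cost
    return extra_ingest / saving_per_query

# Ingest figures from the table above; the per-query costs are placeholders, not measurements
print(round(break_even_queries(100, 0.03, 0.0001, 0.02, 0.005)))  # prints 199
```

Under these placeholder numbers, the break-even point scales linearly with the corpus: about 200 queries at 100 documents, and roughly 20,000 at 10,000 documents.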

When to use LLM Wiki vs. standard RAG

Use LLM Wiki when:

The same corpus is queried hundreds or thousands of times (query savings amortize ingest cost).
Hallucinations carry real cost — regulatory, legal, or reputational.
The corpus is relatively stable and grows incrementally rather than in bulk.
You need human-auditable knowledge, not a black-box vector space.

Stick with vector RAG when:

Data volume is high and transient (customer support logs, real-time feeds).
The corpus changes faster than you can ingest it.
Budget is constrained at the ingest phase.

8. What I Actually Used It For

The screenshot above is not a toy example. I used this pattern to build a personal knowledge base that ingests documents about my own freelance practice: market positioning, client profiles, technical skill inventory, project history.

The result is a wiki where I can ask "what are the overlapping needs between my target clients in the retail sector and my RAG architecture work?" and get a synthesized answer — because the agent has already cross-referenced market_analysis_2026, target_clients, and service_offerings into a coherent graph.

This is the practical case Karpathy describes: corporate memory for a single operator. In my case, that operator is me as a contractor. The same pattern should carry over to small teams or enterprise documentation systems where the cost of a hallucinated answer exceeds the cost of LLM synthesis at ingest, within the ingest-cost and scale limits discussed in the FinOps section.

Conclusion

The LLM Wiki pattern is not a replacement for RAG. It is a different architectural choice with a different cost profile and a different failure mode. RAG fails at synthesis. LLM Wiki fails at scale and ingest cost.

For high-value, stable, deeply queried corpora — technical documentation, research, institutional memory, personal knowledge management — the compounding nature of a maintained wiki is a structural advantage that vector retrieval cannot replicate.

Karpathy's original Gist is worth reading in full. The implementation takes an afternoon to stand up. The compounding starts immediately.

I work as an AI Engineer and Data Scientist, available for contract engagements on RAG architecture, agent systems, and LLM integration. If this post was useful, feel free to reach out.