On April 3, 2026, Andrej Karpathy posted something on X that resonated well beyond the usual AI news cycle. He wasn't announcing a new model or a benchmark result. He was describing a change in how he personally uses LLMs — a shift from generating code to generating knowledge structure. He called it the Karpathy LLM knowledge base, a form of AI-powered personal knowledge management that builds a self-maintaining wiki from raw research material. Within days the post had spawned a GitHub Gist, a wave of community implementations, and serious debate about whether this approach makes RAG pipelines obsolete for personal use.
This article breaks down exactly how Karpathy's system works, why he made the design choices he did, and how you can build an identical setup for your own research or engineering work.
Karpathy's framing is precise: he is spending a large fraction of his LLM token budget not on code generation but on knowledge manipulation. The system is a personal wiki — a collection of interlinked Markdown files — that an LLM writes and maintains autonomously as new source material is added.
The result at the time of posting: a single research topic had grown to roughly 100 articles and 400,000 words — longer than most PhD dissertations — without Karpathy writing a single word of it directly. The LLM does the writing, the linking, the categorizing, and the consistency checking.
What makes this different from AI-assisted note-taking tools like Notion AI or standard summary bots is the active maintenance loop. The LLM isn't just summarizing documents once. It is incrementally compiling a structured knowledge base, running "health checks" to detect inconsistencies, and generating backlinks as new concepts appear. It behaves less like a chatbot and more like a diligent research librarian who never sleeps.
The entire system rests on three directories. The simplicity is intentional — Karpathy chose a structure that any LLM agent can navigate without custom tooling.
Raw source material goes directly into a folder called raw/. This includes research papers (PDFs converted to Markdown), GitHub repositories, web articles clipped via the Obsidian Web Clipper, datasets, meeting notes, and screenshots. The Obsidian Web Clipper converts web pages to .md files and saves images locally so the LLM can reference them through its vision capabilities — no external URLs, no link rot.
The raw folder is append-only. Nothing is edited here. It is the single source of truth for everything the LLM has ever read.
The wiki/ directory is where the LLM outputs structured knowledge. It writes encyclopedia-style articles for each concept it identifies across the raw material, creates backlinks between related articles, and maintains an index file that summarizes the entire wiki at a glance. When a new source is added to raw/, the LLM reads the existing index, identifies which wiki articles need to be updated or created, and runs a targeted update — it does not rewrite everything from scratch.
The format is pure Markdown throughout. This is deliberate: Markdown is the most compact, LLM-readable, and human-auditable structured format that exists. No proprietary schema, no vector embeddings — just files a human can open and read in any text editor or view through Obsidian's graph view.
The outputs/ folder stores query responses, synthesized reports, and analysis results. When Karpathy asks a question — "what are the three most promising architectures for long-context reasoning?" — the LLM reads the wiki index, drills into relevant articles, and writes the answer to outputs/ as a Markdown document. This gives every query a persistent, auditable record.
Karpathy describes two modes of LLM operation in the system: the compilation step and the linting pass.
The compilation step happens when new material arrives in raw/. The LLM reads the new source, extracts key concepts, checks whether those concepts already exist as wiki articles, and either creates new articles or appends to existing ones. It then updates the index file and generates backlinks. At ~100 articles and 400,000 words, the entire wiki index fits comfortably within a modern LLM's context window, which means the LLM can check for duplicates and contradictions without any retrieval system.
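The incremental, not-from-scratch behavior depends on knowing which raw files have already been compiled. A minimal way to track that in Python — assuming a manifest file of my own invention (`.ingested.json`), which is not part of Karpathy's gist:

```python
import json
import pathlib

MANIFEST = ".ingested.json"  # assumed bookkeeping file, not in Karpathy's schema


def _seen(root: pathlib.Path) -> set[str]:
    path = root / "wiki" / MANIFEST
    return set(json.loads(path.read_text())) if path.exists() else set()


def unprocessed_sources(root: pathlib.Path) -> list[pathlib.Path]:
    # Everything in raw/ that the compilation step has not ingested yet.
    return sorted(p for p in (root / "raw").iterdir() if p.name not in _seen(root))


def mark_ingested(root: pathlib.Path, paths: list[pathlib.Path]) -> None:
    # Bookkeeping lives in wiki/, because raw/ itself is append-only and never edited.
    seen = _seen(root) | {p.name for p in paths}
    (root / "wiki" / MANIFEST).write_text(json.dumps(sorted(seen)))
```

In practice the agent can run `unprocessed_sources` at the start of each compilation step and `mark_ingested` at the end, so a crashed run simply re-processes the pending files.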
The linting pass is a periodic "health check" that runs independently of new ingestion. The LLM scans the entire wiki for inconsistencies — articles that contradict each other, concepts mentioned in one article but lacking their own entry, index entries that are stale or missing. It can also identify gaps: topics that appear multiple times in the raw material but have no dedicated wiki article yet. These gaps become prompts for the LLM to author new content or flag the gap for human review.
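One of those checks — backlinks whose target article does not exist yet — is mechanical enough to sketch directly. This is my own minimal version of the idea, assuming articles link to each other with standard Markdown links:

```python
import pathlib
import re

# Matches [Link Text](target.md), ignoring anchors and external URLs.
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)#]+\.md)\)")


def missing_link_targets(wiki_dir: pathlib.Path) -> dict[str, list[str]]:
    # Map each article to the backlink targets that do not exist yet.
    gaps: dict[str, list[str]] = {}
    for article in wiki_dir.glob("*.md"):
        targets = [target for _, target in LINK_RE.findall(article.read_text())]
        missing = sorted({t for t in targets if not (wiki_dir / t).exists()})
        if missing:
            gaps[article.name] = missing
    return gaps
```

Each entry the function returns is exactly the kind of gap the linting pass turns into a stub article or a flag for human review.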
The system has a third critical component that receives less attention than the three folders: the schema configuration file. In Karpathy's GitHub Gist "idea file," this is a CLAUDE.md file (for Claude Code) or AGENTS.md file (for Codex) that defines the rules the LLM must follow when operating the knowledge base.
The schema file specifies the directory layout and its rules, the procedure for ingesting a new source, the format of the index file, the checks to run during a linting pass, and which query responses are written to outputs/ and which are inline responses.

Karpathy describes this schema as co-evolved: he refines it over time based on how the wiki develops. The human's primary editorial role is not writing articles but writing and refining the schema that instructs the LLM on how to write them. Seen this way, the knowledge base is less a product of the LLM and more a product of Karpathy's instruction-writing; the LLM just executes at scale.
The standard enterprise answer to personal knowledge management at scale is a RAG pipeline — chunk your documents, embed them into a vector database, run similarity search at query time, and inject the retrieved chunks into the LLM context. Karpathy's approach deliberately skips this for personal use, and the reasoning is technically sound.
RAG's core problem is chunking: documents are split into fragments that lose their surrounding context. A paragraph from a research paper might be retrieved without the paragraph that defines the key term it uses. The LLM then has to work around that gap, which introduces retrieval noise and hallucination risk.
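The failure mode is easy to reproduce. A naive fixed-size chunker splits a definition away from the sentence that relies on it (the example text here is invented for illustration):

```python
def chunk(text: str, size: int) -> list[str]:
    # Naive fixed-size chunking: splits on character count, not meaning.
    return [text[i:i + size] for i in range(0, len(text), size)]


doc = ("We define the policy ratio r(t) as the probability ratio between "
       "the new and old policies. Several sections later: clipping r(t) "
       "stabilizes the update.")

chunks = chunk(doc, 90)
# A retriever that matches only the later chunk hands the LLM "clipping r(t)"
# with no definition of r(t) in sight.
```

Real pipelines use smarter splitters and overlap windows, but the underlying problem — a retrieved fragment that presumes context it no longer carries — only gets mitigated, never eliminated.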
The Karpathy approach sidesteps chunking entirely. The wiki articles are already human-readable summaries of the raw material, written by an LLM that has read the full context. At query time, the LLM reads the wiki index — a compact summary of all 100+ articles — and pulls in the specific articles it needs. This is context-loading, not retrieval, and it works because modern LLMs have context windows large enough to hold the index plus several full articles simultaneously.
When does RAG still win? At scale beyond a few hundred articles or millions of words, the index itself becomes too large to fit in context, and retrieval becomes necessary again. Karpathy's approach is explicitly positioned for personal and small-team use — a single researcher's knowledge base, not a company-wide document store.
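A back-of-the-envelope check makes that cutoff concrete. The ~1.3 tokens-per-word figure below is a rough heuristic of mine, not a quoted number, and real counts depend on the tokenizer:

```python
def estimated_tokens(words: int, tokens_per_word: float = 1.3) -> int:
    # Crude heuristic for English prose; actual counts are tokenizer-dependent.
    return round(words * tokens_per_word)


def fits_in_context(index_words: int, article_words: list[int],
                    window_tokens: int = 200_000) -> bool:
    # Can the index plus the selected articles be loaded wholesale?
    return estimated_tokens(index_words + sum(article_words)) <= window_tokens
```

A ~2,000-word index plus five 4,000-word articles is roughly 28,600 estimated tokens — comfortable in a 200k-token window. Loading the entire 400,000-word wiki at once (~520,000 tokens) is not, which is exactly the point where retrieval re-enters the picture.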
Karpathy's tool choices are minimal and pragmatic:
- An AI coding agent: Claude Code (reading CLAUDE.md) or Codex (reading AGENTS.md), pointed at the knowledge directory.
- The Obsidian Web Clipper, which saves web pages as Markdown into raw/ with images stored locally. One-click ingestion from the browser into the knowledge base.
- Plain Markdown files throughout, readable in any text editor or through Obsidian's graph view.

The idea file is available at gist.github.com/karpathy/442a6bf555914893e9891c11519de94f. It is designed to be copied, pasted, and adapted, not forked and compiled. This is itself a statement about the LLM agent era: instead of sharing a codebase, you share the concept, and the recipient's agent builds the implementation for their specific setup.
Here is the minimum viable setup for replicating Karpathy's approach for your own personal knowledge management with AI.
Step 1: Create the directory structure
```shell
mkdir -p ~/knowledge/raw
mkdir -p ~/knowledge/wiki
mkdir -p ~/knowledge/outputs
```
Step 2: Create your schema file
Create ~/knowledge/CLAUDE.md (or AGENTS.md for Codex). The schema file should define at minimum:
```markdown
# Knowledge Base Schema

## Directories
- raw/: Source documents. Append-only. Never edit.
- wiki/: LLM-authored articles. One .md file per concept.
- outputs/: Query responses and reports.

## On ingesting a new source in raw/:
1. Read wiki/INDEX.md to understand existing articles.
2. Identify new concepts in the source not yet in the wiki.
3. Create or update articles in wiki/ for each concept.
4. Add backlinks to existing related articles.
5. Update wiki/INDEX.md with any new entries.

## Index format (wiki/INDEX.md):
- One line per article: [Article Title](filename.md) - one-sentence summary

## On answering a query:
1. Read wiki/INDEX.md.
2. Identify relevant articles and read them.
3. Write answer to outputs/YYYY-MM-DD-query-slug.md.

## On a linting pass:
1. Read all wiki articles.
2. Flag contradictions, missing backlinks, and referenced concepts lacking articles.
3. Create stubs for missing articles.
```
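Under that index format, the first lines of wiki/INDEX.md might look like this (the article names are invented for illustration, not taken from Karpathy's wiki):

```markdown
- [Rotary Position Embeddings](rotary-position-embeddings.md) - How RoPE encodes token positions, and which models use it.
- [KV Cache](kv-cache.md) - Memory and latency trade-offs of caching attention keys and values.
- [Long-Context Benchmarks](long-context-benchmarks.md) - Evaluations that stress reasoning over 100k+ token inputs.
```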
Step 3: Ingest your first source
Drop a research paper or article into raw/. Then open your preferred AI coding agent in ~/knowledge/ and run:
```
Process the new file in raw/ according to CLAUDE.md.
```
The LLM reads the schema, reads the new source, creates wiki articles, and updates the index. Your first knowledge base entry is done.
Step 4: Run a linting pass periodically
```
Run a linting pass on wiki/ according to CLAUDE.md.
```
Schedule this weekly or after every ten new sources. The LLM will fill gaps, fix stale links, and suggest new articles. For developers interested in how LLM-optimized metadata works at web scale, the llms.txt specification applies a similar philosophy to public websites.
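If you want the schedule automated, the lint prompt can be run headlessly from cron. A sketch, assuming the Claude Code CLI is installed as `claude` and that its non-interactive print mode (`-p`) accepts the prompt as an argument — verify both against your installed version before relying on it:

```shell
# crontab entry: every Monday at 09:00, run the linting pass non-interactively
0 9 * * 1 cd ~/knowledge && claude -p "Run a linting pass on wiki/ according to CLAUDE.md"
```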
This system is not a replacement for every knowledge management scenario, and the constraints are real: it is scoped to a single researcher or small team, and once the wiki grows past a few hundred articles the index no longer fits in the context window and retrieval becomes necessary again.
The system works because Karpathy inverted the usual human-LLM dynamic: instead of asking the LLM questions, he instructs the LLM to ask itself what's missing. Karpathy's core insight isn't about folders or Obsidian; it's that LLMs are now capable enough to act as knowledge compilers, not just query responders.
For teams evaluating whether to build this or invest in a proper RAG pipeline, the honest answer is: start with this approach, and only move to RAG when the context window becomes a genuine bottleneck rather than a hypothetical one. You'll likely be surprised how far structured Markdown takes you.