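From Syneira Vault to Karpathy's LLM Wiki to Google's OKF, three independent developments point in the same direction: in the AI era, the core of knowledge management is no longer storage and retrieval but curation and compilation.
This April and May, I was building a personal knowledge system for myself.
The reason was simple: the information I handle every day comes from too many kinds of sources. Screenshots, PDFs, transcripts of chats with Claude, posts forwarded in WeChat groups, the occasional saved technical article. These were scattered across my phone's photo album, iCloud, chat histories, and browser bookmarks. They felt useful when I saved them, and two days later I couldn't find them.
What I wanted wasn't complicated: one folder I could drop anything into, in any format. AI reads the content, identifies the topic, and files it under the right project. It runs once a day and produces a processing log. Once a week it produces a summary telling me what I paid the most attention to that week and which scattered pieces of information might be connected.
So I used Claude Code to build something I call Syneira Vault. At its core it is just a folder of Markdown files. A navigation file at the root tells the AI how the vault is structured and where each type of information should go. Identity information, writing preferences, and brand guidelines each get their own file, and so does the status of each project. Before processing any new input, the AI reads the navigation file, then classifies. When it misfiles something, I correct it, the corrected rule is written back into the navigation file, and it doesn't make the same mistake next time.
There's no real technical barrier. Everything is Markdown, stored locally, and opens in any editor. Claude reads it today, another model can read it tomorrow, without changing a word.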
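About two weeks after I finished the system, Karpathy posted a gist.
Andrej Karpathy is the former director of AI at Tesla and a member of OpenAI's founding team. He posted a file on GitHub called llm-wiki.md, the length of a single Markdown document, and it collected more than five thousand stars in two months.
The pattern he described felt very familiar.
He says that most people use LLMs on documents through RAG: upload a pile of files, retrieve the relevant chunks at query time, generate an answer. This works, but it starts from zero every time. Ask a question that takes five documents to answer, and the LLM has to find and stitch the pieces together again on each query. Nothing accumulates.
His alternative: have the LLM do more than retrieve at query time. It actively compiles knowledge into a persistent, structured Markdown knowledge base and then keeps maintaining it. When a new source comes in, the LLM reads it, updates the relevant pages, adds cross-references, and flags anything that contradicts existing conclusions. Knowledge is organized once and then kept current, rather than re-derived on every query.
He splits this knowledge base into three layers:
CLAUDE.md), which tells the LLM how the knowledge base is structured, what its processing conventions are, and what its input rules are.
The reasons humans give up on maintaining a wiki are exactly the things LLMs are good at. An LLM doesn't get bored, doesn't forget to update cross-references, and can process 15 files in one pass.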
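About six weeks after that, Google Cloud released Open Knowledge Format.
What OKF does, in one sentence: it takes the pattern Karpathy described, "organize knowledge in Markdown for AI to read," and writes it up as a formal, versioned, open specification.
Concretely, OKF defines a structure called a "knowledge bundle." A knowledge bundle is a folder in which each Markdown file describes one "concept," which can be a data table, a business metric, or an API. At the top of each file, a few structured fields are written in YAML. The only required one is type, which tells the reader what kind of thing the file describes. Relationships between files are expressed with ordinary Markdown links.
No dedicated SDK, no database, no service to register for. It is released under the Apache 2.0 license, so anyone can use it.
In its blog post, Google calls the problem it wants to solve the "context-assembly problem." Knowledge inside an enterprise is scattered across metadata catalogs, wikis, code comments, and assorted tools. Every time a new agent is built, its context has to be pieced together from those places all over again. Each tool vendor has its own knowledge graph schema, so knowledge is locked behind the tool that produced it. What OKF aims to do is give knowledge a common packaging format that different producers and consumers can read from one another.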
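Looking at these three things together, what's interesting isn't any single one of them but that they point in the same direction.
When I built my vault I hadn't read Karpathy's gist (it hadn't been published yet). When Karpathy wrote the gist, he probably didn't expect Google to standardize it two months later. But we each independently made similar choices: Markdown rather than a database, a folder structure rather than a proprietary platform, AI for classification and maintenance, humans for judgment and decisions.
This is not a coincidence. It is a consensus taking shape: in the AI era, the core of knowledge management is no longer storage and retrieval but curation and compilation.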
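After looking at these three things together, one question kept turning over in my head. It isn't a technical question. It's about people.
Karpathy's wiki is useful not because he has a cool folder structure, and not because he uses some advanced prompt. It's because he is himself one of the top thinkers in his field. He knows what material to feed into raw/, knows what questions to ask the AI, and can judge whether each passage the AI generates is correct, precise, and complete. The tool amplifies his ability, but the ability is his own.
I realized something similar while building my vault. AI sorts for me, generates summaries, and surfaces connections, and it does these well. But each time a weekly report comes out, whether those "discovered connections" are actually worth anything is not something the AI can judge. "These three notes from different sources could be strung together into an article" looks like something the AI said, but only I know whether they really can be strung together, who I would want to say it to once they are, and what angle to take. If I don't spend time reading the organized material, don't think about how the pieces relate, and don't make the "this is worth writing, that can wait" calls, the vault is just a very tidy filing cabinet.
AI can help you turn data into information, and can even give that information the initial shape of knowledge. But between "looks like knowledge" and "has really become my knowledge" there is one step, and that step is you sitting down to read, think, and use it.
This isn't denying the value of these tools. Quite the opposite. My attention used to be consumed by organizing, archiving, and searching. I saved piles of screenshots and articles, couldn't find them later, and when I did find them I had forgotten why I saved them. Now that AI has cleared away that manual labor, I have the energy for the part that truly needs a person.
The greatest value of an AI knowledge base is not that it thinks for you but that it frees up room for you to think. Formats can be standardized and maintenance can be automated, but thinking can't be delegated.
The third design principle I wrote for my vault is "AI makes judgments, I make decisions." After building the system and using it for a while, I think that sentence can be pushed one step further: AI does the organizing, I do the digesting. The system brings information in, sorts it, and finds connections. But for that information to become the argument of my next article, the direction of my next project, or my own independent read on a trend, what's needed isn't a better system. It's me sitting there and thinking it through.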
llm-wiki.md (GitHub Gist, 2026.04)
Rooted in life, grounded in application.
Practical > Flashy · Usable > Good-looking
From a personal Markdown vault to Karpathy's LLM Wiki to Google's Open Knowledge Format, three independent developments point in the same direction: AI-era knowledge management is about curation and compilation, not storage and retrieval.
Earlier this year, I started building a personal knowledge system. Not because I had a mess to clean up, but because I was outgrowing my own workflow.
By day I'm a lead data engineer, working on AI enablement projects in healthcare. Outside of that, I volunteer with a nonprofit AI literacy program in our local community, contribute to a few open-source projects, and write about AI tools and workflows on the side. At any given time I'm tracking multiple threads: technical architecture for work, research for things I'm writing, community education content, tools and frameworks I'm evaluating. I read a lot. I build a lot. I'm always in the middle of several things at once, and I like it that way.
But I realized that all the learning and practicing I was doing wasn't compounding the way it should. Insights from one project that could inform another stayed in the conversation where I first thought of them. Technical patterns I explored for work never made it into my content planning. My thinking was happening across too many surfaces, and nothing was connecting the dots.
What I wanted was a system that could grow alongside me. Not just a filing cabinet, but something that captures how I think, what I'm tracking, and how different threads relate to each other. Something where AI handles the intake, the sorting, the cross-referencing, while I stay focused on the parts that actually need me: reading critically, forming opinions, making judgment calls, deciding what matters.
So I built a local Markdown vault with Claude Code. A navigation file at the root tells the AI the structure of the knowledge base and where different types of input should go. Identity files, writing preferences, project status docs, each in their own Markdown file. AI reads the nav before processing anything new. When it misroutes something, I correct it, the correction gets written back into the nav, and next time it gets it right.
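To make that correction loop concrete, here is a minimal Python sketch. It is not the vault's actual implementation: in my setup the model reads the navigation file and classifies by content, whereas this stand-in uses plain keyword matching. The `Nav` class, the folder names, and the keywords are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Nav:
    """Hypothetical in-memory stand-in for the vault's navigation file."""

    # keyword -> destination folder (illustrative rules, not real ones)
    rules: dict[str, str] = field(default_factory=dict)
    default: str = "inbox"

    def route(self, text: str) -> str:
        """Return the folder for a new input; the first matching keyword wins."""
        lowered = text.lower()
        for keyword, folder in self.rules.items():
            if keyword in lowered:
                return folder
        return self.default

    def correct(self, keyword: str, folder: str) -> None:
        """Write a human correction back so the next run routes correctly."""
        self.rules[keyword.lower()] = folder


nav = Nav(rules={"okf": "projects/knowledge-formats"})
note = "Notes on FHIR pipeline latency"

first = nav.route(note)  # no rule matches yet, so it falls back to the default
nav.correct("fhir", "projects/work-architecture")
second = nav.route(note)  # the written-back rule now applies
```

The point of the sketch is only the shape of the loop: routing reads the rules, a misroute produces a correction, and the correction becomes a rule that persists for the next input.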
Everything is plain text. No database, no proprietary format. Works with any model, any editor, any OS. The vault is mine. The AI is a collaborator, not a dependency.
About two weeks after I got my system running, Andrej Karpathy posted a GitHub gist called llm-wiki.md. One Markdown file. Five thousand stars in two months.
What he described felt immediately familiar.
His argument: most people use LLMs with documents through RAG. Upload files, retrieve relevant chunks at query time, generate an answer. It works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask something that requires synthesizing five documents, and the model has to find and piece together the fragments every single time.
His alternative: instead of only retrieving at query time, have the LLM actively compile and maintain a persistent, structured Markdown knowledge base. When a new source comes in, the LLM reads it, updates the relevant pages, adds cross-references, flags contradictions with existing content. Knowledge gets organized once, then kept current, rather than re-derived on every query.
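A rough Python sketch of what "organize once, then keep current" could look like as a data flow. This is my own illustration under simplifying assumptions, not code from the gist: the `Page` structure and `ingest` function are hypothetical, and the facts an LLM would extract from each source are passed in by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One wiki page (hypothetical structure for illustration)."""

    title: str
    claims: dict[str, str] = field(default_factory=dict)  # claim key -> statement
    see_also: set[str] = field(default_factory=set)  # cross-referenced page titles
    conflicts: list[str] = field(default_factory=list)  # flagged contradictions


def ingest(wiki: dict[str, Page], source: str, facts: dict[str, dict[str, str]]) -> None:
    """Fold one source into the wiki.

    `facts` maps page title -> {claim key: statement}. In the pattern described
    above an LLM would extract these; here they are supplied manually.
    """
    touched = list(facts)
    for title, claims in facts.items():
        page = wiki.setdefault(title, Page(title))
        for key, statement in claims.items():
            old = page.claims.get(key)
            if old is not None and old != statement:
                # Keep the existing claim and flag the disagreement for review.
                page.conflicts.append(f"{key}: '{old}' vs '{statement}' ({source})")
            else:
                page.claims[key] = statement
        # Pages updated by the same source cross-reference each other.
        page.see_also.update(t for t in touched if t != title)


wiki: dict[str, Page] = {}
ingest(wiki, "paper-a", {"RAG": {"cost": "re-derives per query"},
                         "Wiki": {"cost": "compiled once"}})
ingest(wiki, "blog-b", {"RAG": {"cost": "cheap per query"}})
```

After the second call, the RAG page still holds its original claim, carries one flagged conflict, and links to the Wiki page. The state persists between sources, which is the contrast with re-deriving everything at query time.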
He split the system into three layers:
CLAUDE.md in Claude Code) telling the LLM how the knowledge base is structured and what rules to follow.
The things that make humans give up on maintaining wikis are exactly what LLMs are good at. They don't get bored, they don't forget to update cross-references, they can process fifteen files at once.
Six weeks later, Google Cloud released Open Knowledge Format, version 0.1.
In one sentence: OKF takes the pattern Karpathy described, using Markdown to organize knowledge for AI consumption, and turns it into a formal, versioned, open specification.
An OKF "knowledge bundle" is a folder. Each Markdown file describes one concept: a database table, a business metric, an API, an operating procedure. A small YAML frontmatter block at the top of each file holds structured fields for querying and filtering. The only required field is type. Files link to each other with standard Markdown links.
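As an illustration of how little machinery that structure needs, here is a small Python sketch that reads the frontmatter of one concept file, checks the required type field, and lists the files it links to. The parser is deliberately minimal and only handles flat `key: value` lines, so a real consumer would want a proper YAML parser. The sample file, the `title` field, the `table` value, and the function names are my assumptions, not taken from the OKF specification.

```python
import re


def read_frontmatter(markdown: str) -> dict[str, str]:
    """Parse a flat `key: value` block between leading '---' fences.

    Minimal on purpose: nested YAML, lists, and quoting are not handled.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing fence: treat the file as having no frontmatter


def is_valid_concept(markdown: str) -> bool:
    """A concept file needs a non-empty `type`; everything else is optional."""
    return bool(read_frontmatter(markdown).get("type"))


def linked_files(markdown: str) -> list[str]:
    """Return targets of standard Markdown links that point at .md files."""
    return re.findall(r"\[[^\]]*\]\(([^)\s]+\.md)\)", markdown)


# Hypothetical concept file; only `type` is required per the description above.
concept = """---
type: table
title: orders
---
The `orders` table. See [revenue](../metrics/revenue.md).
"""

fields = read_frontmatter(concept)
links = linked_files(concept)
```

Because the bundle is just files with a small header and ordinary links, a reader like this fits in a few dozen lines of standard-library code, which is the practical meaning of "no SDK."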
No SDK, no database, no service to sign up for. Apache 2.0 license.
Google's blog post frames the problem as "context assembly." Enterprise knowledge is scattered across metadata catalogs, wikis, code comments, and various tools. Every new AI agent has to reassemble context from these sources. Every vendor has its own knowledge graph schema, so knowledge gets locked behind whatever tool produced it. OKF's goal is to give knowledge a universal packaging format that any producer can write and any consumer can read.
I didn't read Karpathy's gist before building my vault. It hadn't been published yet. Karpathy probably wasn't thinking about Google standardizing his pattern two months later. But all three of us, independently, made similar choices: Markdown over databases, folder structures over proprietary platforms, AI for classification and maintenance, humans for judgment and decisions.
That's not coincidence. It's a consensus forming.
Karpathy's wiki is valuable not because he has a cool folder structure or some advanced prompt. It's because he is one of the sharpest thinkers in the field. He knows what to feed into raw/, what questions to ask the AI, and he can tell whether what the AI wrote back is right, incomplete, or missing something important. The tool amplified his ability. The ability was already his.
I noticed something similar with my own vault. AI does the sorting, generates summaries, surfaces connections. It's good at that. But when the weekly brief comes back saying "these three notes from different sources could become an article," only I know whether that's actually true. Whether it would resonate with my audience. What angle to take. If I don't sit down and read what's been organized for me, don't think about how the pieces relate, don't make the call on what's worth pursuing and what to let go, the vault is just a very tidy filing cabinet.
This isn't a critique of the tools. It's the opposite. Before the vault, my attention was split between doing the actual thinking and managing the logistics of where things lived. Now that AI handles intake and organization, I have room for the part that only I can do.
So maybe the real picture looks like this: the biggest value of an AI knowledge base isn't that it thinks for you. It's that it clears the space for you to think. Formats can be standardized. Maintenance can be automated. Thinking can't be delegated.
When I wrote my vault's design principles, one of them was "AI makes judgments, I make decisions." After using the system for a while, I'd push that one step further: AI does the organizing, I do the digesting. The system brings information in, sorts it, finds connections. But for that information to become the point of my next article, the direction of my next project, my own read on where a trend is heading, what I need isn't a better system. It's to sit with it for a while.
llm-wiki.md (GitHub Gist, April 2026)
Life first, tools that work.
Useful > Flashy · Usable > Impressive