I Read the OpenClaw and Hermes Source Code: One Bets on Reuse, the Other on Owning the Loop

I pointed Claude Code at the OpenClaw and Hermes codebases. Same product shape, opposite bets: one reuses an upstream runtime and spends its budget on the platform, the other owns the entire loop.

Also on Substack

I Read the OpenClaw and Hermes Source Code: One Bets on Reuse, the Other on Owning the Loop

Today I pointed Claude Code at the source code of OpenClaw and Nous Research’s Hermes Agent and read through both. From the outside they look like the same kind of thing: a personal AI assistant you install on your own machine or a cheap VPS, reachable through Telegram, Slack, or Feishu, following you across platforms and sessions while you work. The Hermes repo even ships a hermes claw migrate command, an open admission that it sees OpenClaw users as a migration target. But once I finished reading the code I noticed something interesting. For the same product shape, the two bet on completely opposite primitives underneath. OpenClaw chose to reuse an upstream open-source project at the runtime layer and spend its engineering budget one level up, on channels and platform contracts. Hermes chose to own the entire loop, pulling every vendor quirk and every research-side bet straight into its main loop. That one fork in the road pulls the two codebases into completely different shapes.

OpenClaw’s default runtime isn’t hand-rolled. It embeds Mario Zechner’s open-source Pi agent core directly into its own process to run the LLM calls and tool dispatch. Model providers all talk plain HTTPS through plugins. extensions/anthropic hits api.anthropic.com, extensions/codex hits OpenAI’s own app-server, extensions/google hits generativelanguage.googleapis.com. Adding a vendor means adding a provider plugin, not touching the main loop. Because it reuses rather than rewrites that layer, all of OpenClaw’s engineering energy goes one level up: channel integration, session management, plugin contracts, cross-vendor orchestration. There’s also an optional plugin called acpx that lets you bring external coding harnesses like Claude Code, gemini-cli, Cursor, and opencode in as a runtime option. On that path OpenClaw actually spawns a child process and reverse-exposes itself as an MCP server so the external harness can call channel tools, which completely inverts the host and hosted relationship. That’s a brand-new pattern in the agent ecosystem. But it’s an explicit, optional path. By default OpenClaw runs Pi inside its own process. In other words it’s not “doesn’t write the loop,” it’s “doesn’t rewrite the loop.” The loop runs in its own process, but that code isn’t OpenClaw’s to maintain long term.

Hermes goes the exact opposite way. run_agent.py is a single 13,000-line file. The whole AIAgent class hand-writes its own while loop, each model provider gets a ProviderTransport subclass to convert message and tool formats, and retry, streaming, prompt cache, and credential rotation are all internalized in the main loop, with no dependency on any vendor binary sitting on PATH. The price is a dozen reactive patches crammed into the main code: scrubbing surrogate characters out of messages, repairing tool-call JSON, retry logic for the times Codex spits out half a thinking block. Every one of those traces back to a model upgrade or a vendor SDK changing its protocol. None of them is sexy in isolation, but each corresponds to a moment of “I caught this case,” and that’s where product stability comes from. OpenClaw eats the same provider-API-protocol changes. Its pi-embedded-helpers group has all the same patches for half-finished reasoning blocks, weird token characters, and corrupted transcript files. The difference is that because the loop isn’t OpenClaw’s to maintain, every time Pi ships a major version it has to re-pull the adaptation layer. The wound just sits in a different place. Hermes carries its wounds inside the main loop. OpenClaw carries them along the seam where it syncs with upstream. Neither has it easy.

Hermes also makes a fairly aggressive research bet. After every chat round it forks a complete second AIAgent in a background thread, redirects stdout and stderr straight to /dev/null, and gives it only the memory and skills toolset, letting this “review fork” decide whether to create a new skill or update MEMORY.md and USER.md. That’s genuine cross-session self-learning, and each trigger is an extra full-model call, doubling the token cost. Nous is a model company, they sell the Hermes line of open-source LLMs, so the repo also carries a completely separate path of batch_runner.py, tinker-atropos/, and trajectory_compressor.py, specifically to generate RL training data for the next generation of tool-calling models. That’s a different thing from the review fork, but the same instinct underneath. For Nous, that doubled cost is R&D investment. For the user it’s extra token spend, and what it buys is a capability dimension OpenClaw didn’t bet on: cross-session learning and self-improvement. OpenClaw isn’t structurally incapable of it. The loop runs in its own process, every tool dispatch can hang a hook, and wiring in a review fork wouldn’t be hard in theory. It just didn’t put its research chips here. Cross-session memory is handled for now by a passive vector-store plugin like memory-lancedb. That’s a product choice, not a capability gap.

The engineering baselines are different bets too. OpenClaw is a monorepo with plugin contracts, 130-plus extensions/* subpackages, a gateway daemon that registers itself as a system service via launchd on macOS and schtasks on Windows so it auto-starts, and an Ed25519 device-identity signature on the WebSocket handshake between CLI and daemon. That’s enterprise-grade RPC. The boundary between core and plugin is governed by strict documented rules, and there’s more supervisory code than business code. Hermes is the reverse: a single-process asyncio app where channel adapters are same-process tasks, session state is managed in SQLite with FTS5, and there’s a dedicated messages_fts_trigram virtual table built for CJK search. Neither neglects the Chinese ecosystem. OpenClaw’s README lists Feishu, WeChat, and QQ clearly, and Hermes’s gateway/platforms/ has feishu, dingtalk, wecom, weixin, yuanbao, and qqbot all present. The difference isn’t Chinese versus English, it’s plugin runtime versus single-process async, a structural choice.

Put the two side by side and OpenClaw is betting that the runtime layer is being commoditized fast, that an open agent core like Pi is already good enough, and that writing your own is a waste of effort better spent on channel integration, plugin contracts, and cross-vendor orchestration. Hermes is betting that the agent loop itself is far from finished, hand-writing the quirks and the self-improving research capability straight in, and turning the repo into a model company’s RL data-generation base along the way. At this point in 2026, with Anthropic abstracting Claude Code into the claude-agent-sdk, OpenAI open-sourcing Codex, and Google open-sourcing gemini-cli, with Pi, claude-agent-sdk, and Hermes all grinding against each other at the runtime layer, the “reuse upstream and bet on the platform” path that OpenClaw took has the better odds. But if the self-improving line genuinely works two years out, the research breakthrough Hermes is betting on would leave OpenClaw in the dust.

If I were learning from these, the order I’d go is: read Pi first to see what a clean open-source agent loop looks like, then read Hermes’s run_agent.py to see what a loop hits in the real world, then come back to OpenClaw to see what orchestration you can build on top of a loop. One sample of each beats ten LangChain tutorials. The thing most worth tracking in this wave of agents isn’t which project wins. It’s how fast the runtime is being commoditized, how research-side capabilities like memory and self-learning are starting to sink into the runtime, and how the orchestration layer is growing entirely new product shapes. Seeing clearly what each layer is becoming, before deciding which layer you want to build on, is the question I now ask myself every time I read a new agent project.

02 · More writing

The Agent Platform Wars Are Really a Quiet Rewrite of Serverless

The Founder-Fit Question: Your Personality Decides Your Direction