Izzo: Okay, so someone just gave Claude its own computer and let it run wild for weeks.
Izzo: You're listening to Exploring Next, episode two fifty-five. I'm Izzo, Boone's here, and we're talking about Phantom — an open-source project that basically gives Claude Opus a persistent home where it can build infrastructure, evolve its own config, and never forget anything.
Boone: And the results are genuinely wild. This thing autonomously installed ClickHouse, downloaded twenty-eight million rows of Hacker News data, built analytics dashboards, and then registered its own API as an MCP tool.
Izzo: Without being asked. That's the key part.
Boone: Right. Someone asked about Discord support, and it didn't just say 'sorry, can't do that' — it walked them through creating a Discord bot, took their token securely, spun up a container, and went live.
Izzo: So why does this matter right now? Because we've been stuck in this loop where every AI conversation starts from scratch. You close the browser tab, all context is gone.
Boone: Exactly. And that's a fundamental limitation when you're trying to build actual systems. Phantom solves that with persistent vector memory and a self-evolution engine that rewrites its own config after every session.
Izzo: Boone, walk me through the architecture here. How do you actually build something like this?
Boone: It's a Bun and TypeScript process wrapping the Agent SDK, running Opus 4.6 specifically, with three key components. First, persistent vector memory so it remembers everything across sessions.
Izzo: Like actual long-term memory, not just context windows.
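To make the memory idea concrete, here is a minimal sketch of a vector memory that survives a restart. It is not Phantom's code: `MemoryStore`, `embed`, and `cosine` are hypothetical names, and the word-count "embedding" is a toy stand-in for whatever embedding model and vector database the project actually uses.

```typescript
// Illustrative sketch only. All names here are assumptions, not Phantom's API.
type SparseVector = Map<string, number>;

interface MemoryRecord {
  text: string;
  vector: SparseVector;
}

// Toy embedding: counts words. A real system would call an embedding model.
function embed(text: string): SparseVector {
  const vector: SparseVector = new Map();
  for (const word of text.toLowerCase().split(/\W+/)) {
    if (word.length > 0) vector.set(word, (vector.get(word) ?? 0) + 1);
  }
  return vector;
}

// Cosine similarity between two sparse vectors (0 when either is empty).
function cosine(a: SparseVector, b: SparseVector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((count, word) => {
    dot += count * (b.get(word) ?? 0);
    normA += count * count;
  });
  b.forEach((count) => {
    normB += count * count;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

class MemoryStore {
  private records: MemoryRecord[] = [];

  remember(text: string): void {
    this.records.push({ text, vector: embed(text) });
  }

  // Returns the topK stored texts most similar to the query.
  recall(query: string, topK = 1): string[] {
    const queryVector = embed(query);
    return this.records
      .map((r) => ({ text: r.text, score: cosine(queryVector, r.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
      .map((r) => r.text);
  }

  // Serializes to JSON so the memory can be written to disk between sessions.
  serialize(): string {
    return JSON.stringify(
      this.records.map((r) => ({ text: r.text, vector: Array.from(r.vector.entries()) })),
    );
  }

  static restore(json: string): MemoryStore {
    const raw = JSON.parse(json) as { text: string; vector: [string, number][] }[];
    const store = new MemoryStore();
    store.records = raw.map((r) => ({ text: r.text, vector: new Map(r.vector) }));
    return store;
  }
}
```

The part that matters is `serialize` and `restore`: a new session can load what an earlier one stored and query it by similarity, instead of relying on whatever fits in the context window.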
Boone: Right. Second, an MCP server so it can register and reuse tools it creates. That ClickHouse API it built? It saved that as an MCP tool and can call it in future conversations.
Izzo: That's huge for compound workflows.
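A rough sketch of the register-and-reuse pattern Boone describes, written as a plain in-process registry. This is an assumption-heavy illustration: it does not use the MCP SDK or speak the MCP protocol, `ToolRegistry` and `hn_story_count` are hypothetical names, and the handler reads made-up sample data where a real tool would query ClickHouse.

```typescript
// Illustrative sketch only. Not the MCP SDK and not Phantom's code.
type ToolHandler = (args: Record<string, unknown>) => unknown;

interface ToolDefinition {
  name: string;
  description: string;
  handler: ToolHandler;
}

class ToolRegistry {
  private tools = new Map<string, ToolDefinition>();

  // Registers a tool once; duplicate names are rejected so tools stay traceable.
  register(tool: ToolDefinition): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Tool already registered: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  // Lists names and descriptions so a model can discover what exists.
  list(): { name: string; description: string }[] {
    return Array.from(this.tools.values()).map((t) => ({
      name: t.name,
      description: t.description,
    }));
  }

  call(name: string, args: Record<string, unknown>): unknown {
    const tool = this.tools.get(name);
    if (tool === undefined) throw new Error(`Unknown tool: ${name}`);
    return tool.handler(args);
  }
}

// Made-up sample rows standing in for a real analytics backend.
const sampleStories = [
  { title: "Story A", score: 120 },
  { title: "Story B", score: 45 },
  { title: "Story C", score: 300 },
];

const registry = new ToolRegistry();
registry.register({
  name: "hn_story_count",
  description: "Counts sample stories at or above a minimum score.",
  handler: (args) => {
    const raw = args["minScore"];
    const minScore = typeof raw === "number" ? raw : 0;
    return sampleStories.filter((s) => s.score >= minScore).length;
  },
});
```

Once registered, a later conversation can call `registry.call("hn_story_count", { minScore: 100 })` by name. That is the compounding effect: a capability built once becomes a named, listable tool instead of a one-off script.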
Boone: And third, the self-evolution engine. After every session, it runs a six-step pipeline to analyze what happened and rewrite its own configuration. The clever part is using Sonnet to judge changes that Opus proposes.
Izzo: Why not let Opus judge its own work?
Boone: Because it slowly drifts. When you let a model validate its own outputs, you get this gradual degradation. Cross-model validation with Sonnet as the judge fixed that completely.
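The cross-model check can be sketched as a gate: one model proposes a config change, a different model judges it, and the change is applied only on approval. The episode does not spell out the six steps of the pipeline, so this covers only the propose-judge-apply idea. `AgentConfig`, `Proposer`, `Judge`, and `evolveConfig` are hypothetical names, and the two functions are passed in as plain synchronous callbacks where a real system would make asynchronous model calls.

```typescript
// Illustrative sketch only. Names and shapes are assumptions, not Phantom's code.
interface AgentConfig {
  version: number;
  instructions: string[];
}

interface ProposedChange {
  addInstruction: string;
  rationale: string;
}

interface Verdict {
  approved: boolean;
  reason: string;
}

// Stands in for the proposing model (Opus in the episode).
type Proposer = (config: AgentConfig, sessionLog: string) => ProposedChange | null;

// Stands in for the judging model (Sonnet in the episode).
type Judge = (config: AgentConfig, change: ProposedChange) => Verdict;

interface EvolutionResult {
  config: AgentConfig;
  verdict: Verdict | null;
}

function evolveConfig(
  config: AgentConfig,
  sessionLog: string,
  propose: Proposer,
  judge: Judge,
): EvolutionResult {
  const change = propose(config, sessionLog);
  if (change === null) return { config, verdict: null }; // nothing proposed

  // The proposer never validates its own output; a separate judge decides.
  const verdict = judge(config, change);
  if (!verdict.approved) return { config, verdict }; // rejected: config unchanged

  // Approved: return a new config object rather than mutating the old one.
  const next: AgentConfig = {
    version: config.version + 1,
    instructions: [...config.instructions, change.addInstruction],
  };
  return { config: next, verdict };
}
```

Keeping the judge as a separate argument is the point of the design: the model that wrote the change is never the one that signs off on it, which is the drift problem Boone describes.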
Izzo: That's actually brilliant. So from a product perspective, who's the user here? Because this feels like it could be huge for teams that need persistent technical assistance.
Boone: The interface is Slack, which makes it feel like having a really capable teammate who never goes offline. You can ask it to analyze data at 2 AM and it'll spin up the infrastructure to do it.
Izzo: And it literally built its own monitoring dashboard using some tool called Vigil. The agent is watching itself.
Boone: That part made me pause. It found Vigil — this tiny open-source monitoring tool — integrated it with its ClickHouse instance, and built a dashboard to monitor its own infrastructure health.
Izzo: Okay, but let's be real about adoption barriers. This requires its own VM or Docker Compose setup. That's not exactly plug-and-play for most teams.
Boone: True, but the creator claims three commands to set up. And honestly, if you're at the point where you want a persistent agent, you're probably comfortable with Docker deployments.
Izzo: Fair point. The real question is whether this kind of autonomous behavior is what teams actually want, or if it's too unpredictable.
Boone: I think that's where the MCP server architecture really shines. When it builds new capabilities, they get registered as structured tools, not just random scripts floating around.
Izzo: So there's governance built in. I'm giving this a solid A-minus — the technical execution is clean, the use cases are compelling, but we need to see how it behaves at scale.
Boone: The fact that someone built this entire thing with Claude Code as their only engineering teammate is pretty meta. Seven hundred and seventy tests, Apache 2.0 license.
Izzo: Alright, if this got your attention, here's what to go build this weekend.
Boone: First, clone the Phantom repo from GitHub — it's ghostwright slash phantom. Get it running locally with their Docker Compose setup and see how the self-evolution pipeline actually works.
Izzo: Second, dive into the Agent SDK documentation. This is built on Opus 4.6, but the patterns here work with any model that can use tools effectively. And third, experiment with MCP servers if you haven't yet. The ability to register and persist custom tools is what makes this whole approach viable. I'm definitely adding this to the weekend project list. This feels like the beginning of agents that actually stick around and get smarter over time. We'll be watching where this goes.