Justy: Exploring Next, episode 311. Today’s about Kimi K2.6 and why agents that keep working for hours, even days, are starting to break the tools people already bought.
Cody: And that matters right now because a lot of teams are already past toy demos. They’re handing agents real work, then discovering the orchestration layer was built for something that finished before lunch.
Justy: That’s the user story I keep coming back to. If you’re in an enterprise, you don’t care that an agent can think for five days if you can’t tell what it changed on day three.
Cody: Right, and Moonshot’s pitch with Kimi K2.6 is basically continuous execution. They say internal agents ran for hours, and one ran for five straight days on monitoring and incident response.
Justy: Five days is wild. That sounds less like a chat session and more like a system component you have to trust.
Cody: Exactly. The clever part is their improved Agent Swarms setup. Moonshot says it can manage up to 300 sub-agents across 4,000 coordinated steps at once, and the model itself decides how orchestration happens.
Justy: So not a fixed lead-agent playbook with rigid roles?
Cody: Yeah, that’s the contrast. Claude Code and Codex both use structured orchestration with lead agents, sub-agents, or background execution, while K2.6 leans more on the model to figure out the control flow. I think that’s interesting, but also a little fragile.
Justy: Fragile in the product sense, not the demo sense.
Cody: Exactly. Long-running agents keep touching tools, APIs, and databases while the world changes underneath them. If the task runs for minutes, you can get away with loose state. If it runs for days, state management becomes the whole problem.
Justy: And for buyers, that becomes governance fast. If an agent can generate code or system changes faster than review cycles, the bottleneck moves to accountability, not capability.
Cody: That’s what ArmorCode’s CPO was pointing at. It’s not enough to scan after the fact. You need context, prioritization, and a clear paper trail for what the agent did and why.
Justy: So who actually uses this first? My read is platform teams, security teams, and very early adopters who already have automation muscle. Normal product teams will hit the adoption barrier when they ask, ‘who owns rollback?’
Cody: Yeah, and also teams doing incident response, code maintenance, or background monitoring. K2.6 is available on Hugging Face, through its API, in Kimi Code, and in the Kimi app, so the surface area is broad enough for experiments.
Justy: If I were a builder, I’d try one weekend project: take a repo, give an agent a long-running maintenance task, and force it to checkpoint state every few steps. Then break the environment and see if it recovers.
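One way to try the checkpoint-and-recover experiment Justy describes is sketched below in Python. It is a minimal illustration, not anything Kimi K2.6 or Kimi Code provides: the names `save_checkpoint`, `load_checkpoint`, `run_task`, and the `every` parameter are all hypothetical, and each "step" is assumed to be a plain callable that returns a JSON-serializable result.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename over the old one."""
    path = Path(path)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # a crash never leaves a half-written checkpoint


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text())
    return {"next_step": 0, "results": []}


def run_task(steps, path, every=2):
    """Run steps in order, resuming from the last checkpoint (hypothetical helper)."""
    state = load_checkpoint(path)
    for i in range(state["next_step"], len(steps)):
        state["results"].append(steps[i]())
        state["next_step"] = i + 1
        # Persist every `every` steps, and always after the final step.
        if state["next_step"] % every == 0 or state["next_step"] == len(steps):
            save_checkpoint(path, state)
    return state["results"]
```

If a step raises partway through, calling `run_task` again with the same checkpoint path resumes from the last saved step. Any steps completed after that checkpoint are run again, so under this design they need to be safe to repeat.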
Cody: I’d add a second test. Run a tiny swarm with one lead task and a few sub-agents, then log every tool call and decision. The interesting metric is not completion, it’s how messy the recovery looks when something changes mid-run.
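The "log every tool call and decision" idea can be sketched as an append-only JSON Lines file. This is an assumption-laden illustration rather than a feature of any agent framework mentioned here: `AuditLog`, `logged_tool`, and the field names (`agent`, `kind`, `tool`, `ok`) are hypothetical, and tools are assumed to be ordinary Python callables.

```python
import json
import time
from pathlib import Path


class AuditLog:
    """Append-only JSON Lines log of agent decisions and tool calls (hypothetical)."""

    def __init__(self, path):
        self.path = Path(path)

    def record(self, agent, kind, **fields):
        entry = {"ts": time.time(), "agent": agent, "kind": kind, **fields}
        with self.path.open("a") as f:
            # default=str keeps logging from failing on non-JSON values.
            f.write(json.dumps(entry, default=str) + "\n")
        return entry

    def entries(self):
        if not self.path.exists():
            return []
        lines = self.path.read_text().splitlines()
        return [json.loads(line) for line in lines if line]


def logged_tool(log, agent, name, fn):
    """Wrap a tool so every call, successful or failed, lands in the log."""
    def wrapper(*args, **kwargs):
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            log.record(agent, "tool_call", tool=name, args=args,
                       kwargs=kwargs, ok=False, error=repr(exc))
            raise  # the caller still sees the failure
        log.record(agent, "tool_call", tool=name, args=args,
                   kwargs=kwargs, ok=True, result=result)
        return result
    return wrapper
```

A lead task might call `log.record("lead", "decision", note="split work")` before handing wrapped tools to each sub-agent. Reading the file back in order then shows what each agent did around the moment the environment changed, which is the recovery picture Cody is after.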
Justy: That’s the real shift here. Kimi K2.6 is interesting, but the bigger story is that orchestration is becoming a product problem, a training problem, and a trust problem all at once.
Justy: We’ll keep following where this goes. I’m Justy, and this was Exploring Next.