Justy: You ever have your cloud AI go down right when you're in the middle of something? Or they move a feature behind a new tier? I'm so done with that.
Cody: Same. That's exactly why the local agent thing stopped feeling like a hobby project. There's this framework called Hermes — crossed a hundred and forty thousand GitHub stars in three months.
Justy: Hermes. I've seen the logo, I think. What's the actual pitch? Like, my notes app already has an AI thing.
Cody: So the big difference is it's built to improve itself. Every time it hits a weird task or gets feedback, it writes a new skill and keeps it. It's not just calling an API and forgetting what happened.
Justy: Right, right.
Cody: And it runs these little contained sub-agents for specific jobs, so one doesn't get confused by what the other is doing. Nous Research curates all the plugins too, so it doesn't break every time you sneeze at it.
Justy: Nous, that's the same crew behind some of the open-weight model releases, yeah?
Cody: Yeah. So there's pedigree. The other thing is, put the same model in Hermes versus another framework and Hermes returns better results. It's doing active orchestration, not just wrapping a chat call.
Justy: Okay, but what model even runs this locally? I don't have a data center.
Cody: Qwen 3.6. Twenty-seven billion or thirty-five billion parameters. The thirty-five B runs in about twenty gigs of memory, and it's outperforming their old hundred-twenty-billion-parameter model. The twenty-seven B is supposedly matching stuff that used to need four hundred billion parameters and seventy-plus gigs.
Justy: That's wild. So my RTX can actually do this without catching fire.
Cody: That's the whole point. RTX PCs, RTX Pro workstations, DGX Spark — Tensor Cores matter here because it's not just one prompt. It's multistep tasks, refining skills, running while you sleep. You want throughput and you want it local.
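A rough way to sanity-check the memory figures Cody quotes is to estimate the size of the weights alone. The sketch below assumes memory is dominated by the weights and ignores the KV cache, activations, and runtime overhead. The 4-bit and 16-bit settings are illustrative assumptions, not something stated in the conversation.

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes).

    Assumption: ignores KV cache, activations, and runtime overhead.
    """
    return params_billions * bits_per_param / 8


def implied_bits_per_param(params_billions: float, memory_gb: float) -> float:
    """Bits per parameter implied if `memory_gb` held nothing but weights."""
    return memory_gb * 8 / params_billions


# 35B parameters at an assumed 4 bits each: 17.5 GB of weights.
print(weight_memory_gb(35, 4))
# The same 35B parameters at 16 bits each: 70.0 GB of weights.
print(weight_memory_gb(35, 16))
# "About twenty gigs" for 35B implies roughly 4.6 bits per parameter.
print(round(implied_bits_per_param(35, 20), 2))
```

On these assumptions, the "about twenty gigs" figure is consistent with a quantized build at roughly 4 to 5 bits per parameter, which would leave some headroom on a 24 GB card.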
Justy: Who's this for, though? Like, am I shipping a product with this or is this a power user weekend thing?
Cody: Right now it's mostly developers and serious prosumers. The barrier is you still have to manage the stack — local model server, the agent loop, maybe some Docker stuff. It's not 'install and double-click' yet.
Justy: So it's still rough.
Cody: A little. But the 'always on' part is real. You set it to watch a folder, or your messages, or whatever — it just keeps going. The self-improving part means two weeks in it's doing things you didn't explicitly teach it.
Justy: That's either exciting or terrifying.
Cody: Both. I tried pointing it at a messy downloads folder and told it to organize by project. First pass was okay. Third pass it had written a skill that checked file contents, not just extensions.
Justy: Okay, that is actually useful.
Cody: And it was running on a single GPU while I was also using the machine.
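The jump Cody describes, from sorting by extension to checking file contents, can be illustrated with a small content-sniffing helper. This is a minimal sketch and not the skill Hermes actually generated: the function names and the signature table are hypothetical, and only a handful of well-known file signatures are covered.

```python
import codecs
from pathlib import Path

# Hypothetical signature table: leading bytes mapped to a coarse label.
# Note that "zip" also covers formats built on ZIP, such as .docx and .xlsx.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"PK\x03\x04", "zip"),
]


def sniff_kind(head: bytes) -> str:
    """Guess a file's kind from its first bytes, ignoring its name."""
    if not head:
        return "empty"
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    # Fall back to "text" if the bytes decode as UTF-8. The incremental
    # decoder tolerates a multi-byte character cut off at the end of `head`.
    try:
        codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
        return "text"
    except UnicodeDecodeError:
        return "unknown"


def classify(path: Path, sample_size: int = 512) -> str:
    """Read the first `sample_size` bytes of `path` and classify them."""
    with path.open("rb") as f:
        return sniff_kind(f.read(sample_size))
```

A PDF saved with the wrong extension would still come back as `pdf` here, which is the kind of mistake an extension-only pass cannot catch. Grouping by project, as in Cody's example, would need more than this, such as reading the extracted text.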
Justy: What about the model selection? Do I need to be a Qwen stan or can I swap?
Cody: It's provider- and model-agnostic by design. Qwen 3.6 is just the one they optimized for because the efficiency is so good. But you could point it at a different model or provider.
Justy: Alright. If someone's gonna try this, like this weekend, what do they actually do?
Cody: Grab the Hermes repo, get Qwen 3.6 running through Ollama or llama.cpp on your RTX GPU, if you have one. Then pick one boring repetitive task and let it try to build a skill for it. Don't ask it to do ten things. One thing.
Justy: One thing. I like that. Cody, I might actually try this.
Cody: Send me a screenshot when it inevitably moves your important PDFs to a folder named 'misc'.
Justy: I will blame you directly.
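For anyone following Cody's weekend plan, a quick first check is whether the local model server answers at all before wiring up an agent. The sketch below talks to Ollama's `/api/generate` endpoint on its default port using only the Python standard library. The model tag is a placeholder assumption (check `ollama list` for the tag actually installed), and nothing here uses or represents Hermes's own API.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address
MODEL_TAG = "qwen3.6:35b"  # placeholder tag, an assumption; use your installed tag


def build_request(prompt: str, model: str = MODEL_TAG,
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str, timeout: float = 120.0) -> str:
    """Send one prompt and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `ask_local_model("Suggest a folder name for tax PDFs.")` should return a short completion. If it raises a connection error, the server is not up yet, which is worth sorting out before adding an agent loop on top.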