Justy: The spicy part is not agents writing code. It’s the claim that code might stop being the thing you worship.
Cody: Yeah, that’s the real argument. Not autocomplete got better. More like, if the model can reason through ambiguity, then the durable artifact shifts up a layer.
Justy: Which is such an Exploring Next problem to get excited about on a Tuesday. Episode four hundred fourteen and we’re back to arguing about specs like it’s somehow new again.
Cody: Baruch’s framing is basically that prompt engineering is too squishy to build on. He calls it kind of magical mumbling, and says the serious thing is context engineering. So, rules, skills, scripts, feedback loops, evaluation. Stuff you can actually manage like an engineering system instead of a lucky chat session.
Justy: Right.
Justy: I actually like that distinction. Because the central claim, I think, is not just use more context. It’s that context becomes the architecture surface. If the spec is precise enough, then code is more like a generated byproduct you can replace when requirements change.
Cody: Mm-hm.
Cody: And the evidence he leans on is that older attempts at spec-to-code failed on ambiguity. Michael Stiefel brings up the old C A S E tools dream, and Baruch says the missing piece was a reasoning machine that can handle human language being annoyingly fuzzy.
Justy: Which, honestly, maps to real product work almost too well. Nobody asks for a feature cleanly. They say, make it fast, make it secure, make it easier. And then somebody has to sit there and do the annoying but important follow-up of, okay, compared to what?
Cody: Exactly.
Cody: That’s where his question-loop idea is strongest. He basically says an agent should behave like a decent architect. Keep asking clarifying questions until the requirement is pinned down enough to act on. Not one giant prompt. An iterative loop with the client, architect, developer, whoever owns the intent.
Justy: My week was weirdly that in miniature. I spent forty minutes untangling whether someone meant faster checkout or fewer support tickets, which sounds the same in a planning doc and is very much NOT the same thing. Anyway, that made this hit harder for me.
Cody: Yeah, same point. Most failures start in unstated assumptions, not syntax.
Justy: And yes, that’s why people should care. If you manage product, architecture, or a platform team, this is practical because it changes where rigor goes. More effort up front on intent, constraints, acceptance criteria, and traceability.
Cody: Yeah.
Cody: I also buy his shift-left point, with a caveat. He says if the spec is the source of truth, then you can evaluate quality before code exists. That’s useful. You can inspect the requirements, the clarifying answers, maybe the policy constraints, and catch bad assumptions early. But I would not oversell that as replacing testing.
Justy: That part felt grounded to me. Not magical. More like testing changes meaning a bit. You’re not only checking code behavior. You’re also validating whether the captured intent was actually the right intent.
Cody: Right, right.
Cody: And the traceability claim is interesting. If the agent asks follow-up questions and those answers get preserved, you have a record of where a misunderstanding entered. That’s better than the usual archaeology where somebody digs through tickets, docs, and one cursed chat thread from six months ago.
Justy: One cursed chat thread is the true source of truth in half of software.
Cody: Where I start pushing back a little is the microservices part. Baruch says microservices fit this model best right now because context windows are limited, so smaller bounded services are easier to regenerate from changed specs. I think that is directionally sensible. But I don’t think that makes microservices the best paradigm in some general sense.
Justy: Oh, there he is. My beloved rain cloud.
Cody: Justy, you know I’m right. If your domain is naturally cohesive, slicing it into services just to please a model could be a mess. You’d trade one limitation for five new ones. His own caveat kind of saves him here, because he says this is the current state of the art, not eternal truth.
Justy: I think that’s the practical read too. Not everybody should run out and atomize their stack. But teams already operating with clear service boundaries probably get a cleaner path to this style of regeneration. And the human architect still matters because somebody has to orchestrate the whole thing and watch for system behavior that only shows up in the seams.
Cody: No way.
Cody: Yeah, that last bit may be the most important sentence in the whole conversation. Humans own correctness. Humans own the context. Humans own emergent properties. The agent can draft, question, regenerate, maybe even do a lot of the boring translation. But it does not get to declare the system good just because the local outputs look neat.
Justy: So the actual takeaway for me is pretty simple. The exciting part is not, wow, the bot writes code. It’s that good teams might finally have a reason to treat specs, constraints, and decision records like living product assets instead of paperwork you abandon after kickoff.
Cody: That’s a good read, Justy. Also, if your spec says midnight furniture migration, I’m rejecting the pull request on sight.
Justy: Cody, fair. Keep your cursed chat threads. I’m keeping the better specs fantasy for at least one more cross-country trip.