Justy: If your agent is chewing through context like crazy, the bill and the lag show up fast. That’s the part people feel.
Justy: Welcome back to Exploring Next, episode 347. I’m Justy, and Cody’s here with me in person, which is nice because we can argue about token bills without a screen in the way.
Cody: Yeah, and this one’s timely because a lot of agent apps are doing the dumb expensive thing. They load a ton of text up front, then keep paying to resend it. That scales badly.
Justy: And the user doesn’t care that the prompt was elegant. They care that the thing took forever and the usage dashboard looks weird. So who is this actually for?
Cody: Mostly teams building production agents. Support workflows, internal ops, coding assistants, research assistants. Anywhere the agent keeps looping over the same app state, docs, or user history. Strands Agents is interesting because it leans into tools and session-aware state instead of treating every turn like a fresh essay.
Justy: That feels like a real adoption barrier, though. People already have some agent stack half-working. If the new thing means rewiring the whole flow, they’ll probably stay put.
Cody: Right, and that’s the trade-off. The clever part is the pattern: keep the prompt smaller in the moment, and let the model call out for only what it needs. The article’s point is basically that you can cut token usage a lot by not stuffing everything into the prompt. I think the headline number was 96% in some cases, which is huge if it holds in your workload.
Justy: Ninety-six is wild. But I’m always thinking, okay, what’s the story for the person actually paying for this? Is it a startup with one agent, or a bigger team with lots of sessions and lots of repeated context?
Cody: Bigger teams feel it first. If you have dozens or hundreds of agent sessions, tiny inefficiencies become real money. And latency matters too. Smaller prompts mean faster turns, which makes the agent feel less like it’s thinking in molasses. [chuckles]
Justy: So how does it work under the hood? Because I’ve seen a lot of agent frameworks say they’re efficient and then they just move the mess around.
Cody: That’s fair. Strands Agents is built around orchestration rather than one giant prompt blob. The model can use tools, and those tools can fetch fresh context from external systems or session memory. So instead of serializing your whole world into the context window, you make context a resource the agent requests. That’s the part I find genuinely smart.
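Show notes: a minimal sketch of the "context as a resource" idea Cody describes here, assuming a hypothetical in-memory session store. It deliberately does not use the Strands Agents API; `SESSION_STATE`, `stuffed_prompt`, `lean_prompt`, and `get_context` are illustrative names, not anything from the article or the library.

```python
import json

# Hypothetical session state; a real app would likely keep this in a database.
SESSION_STATE = {
    "profile": {"name": "Ada", "plan": "pro"},
    "open_tickets": [{"id": 101, "subject": "Login loop"}],
    "history": ["turn %d: ..." % i for i in range(200)],
}


def stuffed_prompt(question, state):
    """The expensive pattern: serialize the whole state into every prompt."""
    return "STATE:\n" + json.dumps(state) + "\n\nQUESTION: " + question


def lean_prompt(question, state):
    """The lean pattern: only advertise which keys exist."""
    keys = ", ".join(sorted(state))
    return "AVAILABLE CONTEXT KEYS: " + keys + "\n\nQUESTION: " + question


def get_context(key, state):
    """Stand-in for a tool the agent calls to pull one piece of state on demand."""
    if key not in state:
        return "unknown key: " + key
    return json.dumps(state[key])


if __name__ == "__main__":
    question = "What plan is this user on?"
    print("stuffed chars:", len(stuffed_prompt(question, SESSION_STATE)))
    print("lean chars:   ", len(lean_prompt(question, SESSION_STATE)))
    print("tool result:  ", get_context("profile", SESSION_STATE))
```

The lean prompt stays roughly constant in size as the session history grows, while the stuffed prompt grows with it. The cost moves into the tool call, which only pays for the key the agent actually asked for.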
Justy: And the weird part is that it sounds more annoying for the developer, but better for the product. Which is usually how these things go.
Cody: Yeah. More moving pieces on the back end, but cleaner behavior for the user. The downside is you need good tool boundaries and decent retrieval. If your tool calls are sloppy, the agent just becomes a confused little tourist asking for the wrong map.
Justy: [sighs] That’s a very vivid image. I buy it, though. And from a product angle, the barrier is not just technical. It’s also trust. Teams need to believe the agent won’t forget something important because it wasn’t in the prompt.
Cody: Exactly. The article’s core idea is that agents should be stateful without being bloated. That’s a nice middle ground. I do think the article is probably a little optimistic if someone reads it as "just swap frameworks and your costs vanish." You still need good evals, logging, and a sense of what context actually matters.
Justy: Yeah, I’d push that too. Teams don’t adopt a framework because it’s elegant. They adopt it when the first few workflows feel safer, cheaper, and easier to ship. Otherwise it sits in a repo and everybody nods at it.
Cody: Build Next-wise, I’d start simple. Use the AWS Strands Agents repo and wire up one agent that can answer questions from a small docs folder. Then add a tool that fetches only the relevant file chunks on demand, and log tokens before and after.
Justy: For a solo builder, that’s a solid weekend project. You could do the same thing with a local markdown folder and a tiny command-line app. No big platform needed.
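Show notes: a rough sketch of the weekend project described above, assuming a local folder of markdown files and plain keyword overlap as the retrieval step. The function names are hypothetical, the 4-characters-per-token figure is only a rule of thumb rather than a real tokenizer, and none of this is the Strands Agents API; in practice `fetch_relevant` is the piece you would register as a tool in whatever framework you use.

```python
import re
from pathlib import Path


def estimate_tokens(text):
    # Assumption: roughly 4 characters per token. Swap in a real tokenizer
    # if you need numbers you can put on an invoice.
    return len(text) // 4


def load_chunks(folder):
    """Split every .md file in the folder into paragraph-sized chunks."""
    chunks = []
    for path in sorted(Path(folder).glob("*.md")):
        for para in path.read_text(encoding="utf-8").split("\n\n"):
            para = para.strip()
            if para:
                chunks.append((path.name, para))
    return chunks


def words(text):
    """Lowercased set of alphanumeric words, used for naive matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def fetch_relevant(question, chunks, k=2):
    """Return up to k chunks that share the most words with the question."""
    q = words(question)
    scored = [(len(q & words(text)), name, text) for name, text in chunks]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(name, text) for _, name, text in scored[:k]]


def compare(question, chunks, k=2):
    """Estimated tokens for 'send everything' versus 'send only top-k chunks'."""
    everything = "\n\n".join(text for _, text in chunks)
    selected = "\n\n".join(text for _, text in fetch_relevant(question, chunks, k))
    return estimate_tokens(everything), estimate_tokens(selected)
```

Calling `compare("When does billing run?", load_chunks("docs"))` gives a before-and-after pair you can log per question. Keyword overlap is a crude stand-in for real retrieval, so treat the savings it reports as a starting point, not a benchmark.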
Cody: Yeah, and if you want to get more serious, compare that against a basic LangChain or custom tool-calling setup. Same task, same inputs, different orchestration style. Measure latency, token use, and how often the agent asks for the wrong thing.
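Show notes: one possible shape for that comparison, assuming each orchestration style is wrapped in a function that takes a question and returns a dict with a token count and the name of the tool it chose. The `run_benchmark` harness, the result keys, and `fake_strategy` are all hypothetical; the fake strategy is only there so the harness runs without any model or framework behind it.

```python
import time
from statistics import mean


def run_benchmark(strategy, cases):
    """Run one strategy over (question, expected_tool) pairs.

    strategy(question) must return a dict with 'tokens' and 'tool' keys.
    """
    latencies, tokens, wrong = [], [], 0
    for question, expected_tool in cases:
        start = time.perf_counter()
        result = strategy(question)
        latencies.append(time.perf_counter() - start)
        tokens.append(result["tokens"])
        if result["tool"] != expected_tool:
            wrong += 1
    return {
        "mean_latency_s": mean(latencies),
        "mean_tokens": mean(tokens),
        "wrong_tool_rate": wrong / len(cases),
    }


def fake_strategy(question):
    # Stand-in for a real agent call: always picks the same tool.
    return {"tokens": len(question) // 4, "tool": "search_docs"}


if __name__ == "__main__":
    cases = [
        ("where is the refund policy?", "search_docs"),
        ("what is my plan?", "get_profile"),
    ]
    print(run_benchmark(fake_strategy, cases))
```

Running the same `cases` list through one wrapper per framework keeps the task and inputs fixed, so differences in the three numbers come from the orchestration style and not from the test set.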
Justy: That’s the real test. Not whether it sounds clever in a blog post, but whether your app gets cheaper and less annoying to use. Alright, I’m gonna call that a win for lunch-table engineering today.
Justy: We’ll leave it there. Exploring Next, episode 347. Cody, thanks for the deep dive, and yeah, I’m still thinking about that tourist with the wrong map.