Justy: Okay, I just read this thing and my first thought is, holy crap, they’re finally talking about the spiral in agents.

Cody: Mm-hm.

Justy: Like, every time you spin up a sub-agent or call a tool, you’re just dumping more tokens into the mix, and then the next turn has to re-read all of it. It gets ridiculous fast.

Cody: Right. And that’s where this Nemotron 3 Ultra thing slots in. They’re pitching it as the heavy lifter for the steps.

Justy: Frontier reasoning plus five times the throughput? That’s the dream for anyone running long workflows.

Cody: Yeah, if the throughput claim holds. Five times is… a lot.

Justy: Well, they’re leaning on NVFP4 quantization and LatentMoE. Cody, you’ve seen MoE routing before. Does LatentMoE actually cut the compute the way they say?

Cody: It’s MoE but with a compressed latent space for the router. So instead of firing a bunch of experts every step, it picks a few and only materializes those. And NVFP4 is four-bit floating point with a scale shared across each small block of values, so yeah, you can fit way more on the same GPU with little accuracy loss.

Justy: And the hybrid Mamba-Transformer layers, that’s for the long context, right?

Cody: Mamba handles the sequential part cheaply, Transformer layers kick in when you need the expressivity. It’s a trick to keep long-context inference from blowing up.

Justy: So, and this is such an Exploring Next take, if you’re running a multi-agent pipeline and you keep hitting context limits or token costs, swapping in Nemotron 3 Ultra for the orchestration layer could actually be a no-brainer.

Cody: God, you already have the product slide in your head.

Justy: I’m just saying, thirty percent lower token cost on SWE-bench? That’s real money for teams shipping agents at scale.

Cody: Okay, but look at the EnterpriseOps-Gym number: thirty-three percent on long-horizon planning. That’s behind GLM and Qwen. So it’s not all wins.

Justy: Fair. But they’re still leading on PinchBench and long-context RULER at a million tokens. And they hit ninety-five percent on RULER; no one else in the table even reaches a million.

Cody: Yeah… and the open weights and recipes are a big deal. If you’re in a domain where none of the teachers fit, you can fine-tune it with your own data.

Justy: Which, by the way: Multi-Teacher On-Policy Distillation with more than ten domain-specific teachers. That’s how they’re getting the specialization without starting from scratch every time.

Cody: Mm-hm. And the RL pipeline’s fully transparent, so you can audit what the model’s actually learning from.

Justy: Anyway, I flew in late last night and my brain’s still on west coast time, so bear with me...

Cody: You sound like you drank three espressos on the plane.

Justy: I did. But the thing that’s sticking with me is the angle. You don’t need Nemotron for every single call, just the ones where the agent has to think hard.

Cody: Right. So you’d pair it with a smaller, faster model for the routine stuff and only route the complex turns to Nemotron.

Justy: Exactly. And if the throughput and token efficiency pan out, the math might actually work.

Cody: I mean… I’m still side-eyeing that five-x claim until I see third-party repros. But the architecture checks out.

Justy: And the benchmarks, even with the mixed bag, are strong enough that I’d at least kick the tires.

Cody: Yeah. If your agents are blowing through context or tokens, it’s worth a look. If not… eh.

Justy: There it is, the Cody caveat. Always a caveat.

Cody: Someone’s gotta be the skeptic.

Justy: Alright, forty-six-whatever this is. Next time you’re in LA, we’re testing Nemotron on my to-do list. See if it can finally organize my inbox.

Cody: Good luck with that.
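
The pairing idea from the conversation (a small, fast model for routine turns, a larger reasoning model only for the hard ones) can be sketched roughly as below. This is a minimal illustration, not anything from the Nemotron release: the model labels, the `Turn` fields, the keyword hints, and the thresholds are all hypothetical placeholders that a real pipeline would replace with its own signals.

```python
from dataclasses import dataclass

# Hypothetical labels, not real endpoint or model names.
SMALL_MODEL = "small-fast-model"
LARGE_MODEL = "large-reasoning-model"

# Hypothetical hint words that suggest a turn needs heavier reasoning.
HARD_HINTS = {"plan", "debug", "refactor", "prove"}


@dataclass
class Turn:
    prompt: str
    context_tokens: int  # accumulated history the model must re-read this turn
    tool_calls: int      # tool or sub-agent calls made so far in the task


def pick_model(turn: Turn,
               max_small_context: int = 8000,
               max_small_tool_calls: int = 3) -> str:
    """Route a turn to the large model only when a simple heuristic fires.

    The thresholds are assumed values chosen for illustration only.
    """
    words = set(turn.prompt.lower().split())
    if turn.context_tokens > max_small_context:
        return LARGE_MODEL  # long history: send to the long-context model
    if turn.tool_calls > max_small_tool_calls:
        return LARGE_MODEL  # deep into a multi-step task
    if words & HARD_HINTS:
        return LARGE_MODEL  # the prompt itself looks like a hard step
    return SMALL_MODEL


if __name__ == "__main__":
    turns = [
        Turn("Summarize this email", context_tokens=1200, tool_calls=0),
        Turn("Plan the migration across three services", context_tokens=1200, tool_calls=0),
        Turn("Continue", context_tokens=50000, tool_calls=1),
    ]
    for t in turns:
        print(f"{t.prompt!r} -> {pick_model(t)}")
```

Whether routing like this actually saves money depends on the throughput and token-efficiency claims holding up, which is the open question Cody raises.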