Cody: Okay so I just read this paper and I need to know if I’m missing something.
Justy: Mm-hm.
Cody: They’re letting an LLM be its own data engineer. Like, the whole pipeline: planning, generating, testing, iterating. No humans touching it.
Justy: Right.
Cody: And they’re saying GPT-5.2 built a curriculum that improved a student model by fifty-seven percent.
Justy: Fifty-seven POINT twenty-nine.
Cody: Of course you remember the decimal.
Justy: I mean, that’s the headline. And it’s NOT a typo.
Cody: So the stuck problem here is obvious: we’ve been hand-rolling these data curation workflows forever, right? Domain adaptation always needs domain data, and getting it is slow, expensive, or both.
Justy: Exactly. And for most teams, the moment you move off general tasks (finance, legal, internal docs), you’re basically stuck unless you’ve got a dataset someone already built.
Cody: Which is never tailored enough. So they’re flipping it: what if the model just writes its own training data, tests it, and keeps rewriting until it works?
Justy: And the kicker is they’re treating data like code. Optimize, measure, iterate.
Cody: Yeah. So the agent starts by defining the domain, say, medical records, then designs prompts, synthesizes a dataset, trains a student model on it, evaluates the student on a test set…
Justy: Mm-hm.
Cody: …and if the student sucks at, I dunno, extracting dosage info, the agent goes back and generates more data targeting that specific gap. Rinse, repeat.
Justy: Okay, so this is the part where I’m supposed to say this is Exploring Next gold, right? Agent-driven specialization.
Cody: You would.
Justy: But come on, imagine shipping this. You’re a startup with zero labeled data in your niche. You fire up DataAgent, point it at your docs, and a week later you’ve got a model that actually understands your stuff.
Cody: A week? Justy, you’re assuming the compute budget of a small country.
Justy: Fine, a month. With cloud costs that make your CFO cry.
Cody: And that’s before we talk about the feedback loop problems. If your eval set’s even slightly skewed, you’re just teaching the model to game the metric.
Justy: Right, right. But the paper’s not claiming it’s production-ready. It’s saying the capability exists. Autonomous data engineering as a measurable thing.
Cody: Fair. And the code’s on GitHub: DataAgent. So if someone wanted to poke at it, they could.
Justy: So you’re saying it’s research-only for now.
Cody: I’m saying if I were building this, I’d start with a tiny domain and a very tight eval harness. And I’d still expect to debug for weeks.
Justy: Meanwhile, my brain’s already writing the product spec. ‘Just add your docs, we’ll handle the rest.’
Cody: That’s such a you move.
Justy: Anyway. Flight’s delayed, so I’m just sitting here in the airport lounge, reading papers like a weirdo.
Cody: Of course you are. What’s your ETA?
Justy: Another two hours. Anyway, this thing feels like it could unblock a lot of teams.
Cody: Maybe. But I’d bet good money the first three companies that try it hit a wall on eval data quality.
Justy: Or they overfit to their own benchmarks and ship a model that’s amazing at the test set and useless in the wild.
Cody: Bingo. And that’s episode four-forty, I guess.
Justy: God, we’re up to four-forty already? Anyway, DataAgent’s on GitHub if you’re brave. I’m gonna go find a coffee that doesn’t taste like jet fuel.
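
Show note: the loop Cody walks through (synthesize data, train a student, evaluate it, then generate more data for whatever the student is weak at) can be sketched roughly as below. This is a toy illustration only, not DataAgent's actual code or API. Every name here (`run_data_loop`, `toy_synthesize`, `toy_train_and_eval`, `EXAMPLES_NEEDED`) is hypothetical, and the "training" step is a stand-in that pretends a skill's score depends only on how many examples cover it.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Example = Dict[str, str]
Scores = Dict[str, float]


@dataclass
class LoopState:
    dataset: List[Example] = field(default_factory=list)
    history: List[Scores] = field(default_factory=list)  # one entry per eval round


def run_data_loop(
    skills: List[str],
    synthesize: Callable[[str, int], List[Example]],
    train_and_eval: Callable[[List[Example]], Scores],
    target: float = 0.8,
    max_rounds: int = 5,
    batch: int = 10,
) -> LoopState:
    """Generate data, evaluate the student, then top up only the weak skills."""
    state = LoopState()
    # Initial pass: cover every skill evenly.
    for skill in skills:
        state.dataset.extend(synthesize(skill, batch))
    for _ in range(max_rounds):
        scores = train_and_eval(state.dataset)
        state.history.append(scores)
        weak = [s for s in skills if scores[s] < target]
        if not weak:
            break  # every skill meets the target
        # Targeted pass: add data only where the student underperforms.
        for skill in weak:
            state.dataset.extend(synthesize(skill, batch))
    return state


# --- Toy stand-ins (assumptions, not a real model or a real LLM call) ---

# Pretend each skill needs this many examples before the student masters it.
EXAMPLES_NEEDED = {"dosage": 30, "diagnosis": 10}


def toy_synthesize(skill: str, n: int) -> List[Example]:
    # A real system would prompt an LLM here; this just fabricates placeholders.
    return [{"skill": skill, "text": f"synthetic {skill} example {i}"} for i in range(n)]


def toy_train_and_eval(dataset: List[Example]) -> Scores:
    # Stand-in for "train a student, score it on a held-out test set".
    counts = Counter(ex["skill"] for ex in dataset)
    return {s: min(1.0, counts[s] / need) for s, need in EXAMPLES_NEEDED.items()}


state = run_data_loop(list(EXAMPLES_NEEDED), toy_synthesize, toy_train_and_eval)
for round_no, scores in enumerate(state.history):
    print(round_no, scores)
```

In this toy run the "dosage" skill starts below the target, so only dosage data gets topped up in later rounds. Cody's caveat applies directly: the loop is only as good as whatever `train_and_eval` measures, so in a real setup the evaluation data would need to be held out from generation and representative of real usage, or the loop simply optimizes for the metric.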