Justy: Okay so — Apex. A fast, specialized model for React Native. And I have to ask, Cody, is this actually different, or is this just another 'we fine-tuned something and called it a product' situation?
Cody: Oh, it's different in one way, which is that they're at least honest about what they're doing. They say straight up — general models miss the framework conventions, the library behavior, the cross-platform stuff that decides whether a React Native answer is actually useful.
Justy: Mm-hm.
Cody: And I don't think that's wrong. But then the question becomes — does training on React Native repos actually get you there? Or are you just... building a model that's good at React Native code in the same way a model trained on GitHub is good at GitHub code. Which is to say, kind of, but also not really.
Justy: I think the economic argument is actually the more interesting one. The article points to GitHub Copilot's shift to usage-based billing as a signal — running agentic workflows on frontier models is expensive. And smaller, optimized models are proving they can alter that cost-performance curve.
Cody: Sure. But Cursor Composer 2 and Windsurf SWE-1 are also general-ish tools that happen to use smaller models well. That's not the same as saying a model trained specifically on React Native is better.
Justy: No, but React Native genuinely has weird constraints. Native modules, third-party libraries that break across versions, the whole iOS-Android split. That's not just 'general coding with a React skin.'
Cody: That's fair. But here's where I get skeptical. They evaluated Apex against React Native Evals. Who runs React Native Evals?
Justy: Oh, that's a good question.
Cody: Because if it's Callstack, that's... that's a little bit of a conflict of interest, Justy. They're saying 'within its specific domain, this optimized model alters the performance-to-cost ratio significantly.' Significantly. What does that mean? We don't know. We have their word.
Justy: To be fair, they also say the model is in private beta with their own engineers. That's not nothing. They've been running it for a couple of months: experiments started February thirteenth, and internal testing began April second.
Cody: Right, but that's also the problem. 'Internal testing with our open-source developers.' Their developers. On their code. Which means the training data probably looks a lot like the test set.
Justy: Okay, but they specifically say they did not do a random web scrape. They cherry-picked the data around the libraries and frameworks their engineers see in daily delivery. That's either a weakness or a strength depending on how you look at it.
Cody: Both. It's both. It means the model is probably really good at the stuff Callstack works on, and maybe not great at the stuff Callstack doesn't work on. Which is... fine? But it's not 'a specialized model for React Native.' It's a model specialized in what Callstack does in React Native.
Justy: That's... actually a pretty important distinction.
Cody: Yeah. And then there's the base model choice. They started with proof-of-concept experiments on Devstral and Qwen, landed on Gemma 4 because it was already stronger for React Native before specialization. So how much is the specialization actually doing versus just picking a good base?
Justy: I mean, that's a fair question, but also — isn't that the whole point of fine-tuning? You pick a base that's close and then you push it further?
Cody: Sure, but we can't tell from the article how much further it actually got pushed. They trained with SFT and GRPO. Fine. Those are standard techniques. But there's no ablation study, no comparison to just prompting Gemma 4 really well.
Justy: Right, right.
Cody: So here's my actual read — the economic logic is sound, the specialization thesis is plausible, and React Native might genuinely be a good candidate for it. But the evidence is thin and self-reported. We need public benchmarks. We need someone else running the evals. We need actual users who aren't Callstack engineers.
Justy: And the private beta is the right move for exactly that reason. They're being cautious about the claims, at least.
Cody: I guess. Though 'private beta while we finish the legal and operational work' is also a very careful way of saying 'we're not ready to be judged yet.' Which is fair, but also — that's not the same as 'this works.'
Justy: No, no. And I think where we land is — this is probably useful if you're a React Native team. The cost angle is real, the domain focus is probably real, and Callstack knows this space better than most. But we're taking their word on the performance claims until someone else runs the numbers.
Cody: Yeah. That's the honest version. The theory is solid. The execution — we'll see.
Justy: Alright, I'll take it. Still better than another 'we built an AI coding tool' press release.
Cody: Low bar though, Justy.
Justy: It really is. Alright, that's Apex. We'll see where it lands.