Justy: Sixty-five percent. That's the number that stuck with me.
Cody: Miro pointing agents at Snowflake and getting it wrong two out of three times?
Justy: Yeah — and it wasn't the model, it was the context. Ten thousand tables, no semantic layer, and the agents just… hallucinated joins.
Cody: Right.
Justy: Which is wild because that's the exact pitch everyone's making for text-to-SQL right now. Just point the agent at the warehouse!
Cody: And it falls apart the second your org is big enough to have multiple teams naming things differently. Schema tells you what columns exist. It doesn't tell you which ones actually go together for a business question.
Justy: Okay so DataHub's new thing — Context Intelligence — basically says, instead of giving the agent raw schema, give it the query history. The joins that already worked.
Cody: That's the core argument, yeah. They've been doing lineage tracking for years, pulling query logs from warehouses. Now they're flipping that same infrastructure to build what they call a semantic index.
Justy: Wait, so the plumbing isn't new?
Cody: That's the part I actually like. They didn't build this from scratch. The query log extraction and SQL parsing were already running in production for lineage. They've been pulling from Postgres, Snowflake, and BigQuery for years. The new layer sits on top.
Justy: That's smart product sequencing. You already have the pipes, now you monetize the insight.
Cody: Sure, but let's talk about what they're actually doing with those logs. They filter for what they call golden queries — high-quality analyst queries and scheduled pipelines that represent proven business logic.
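To make the golden query filter concrete, here is a minimal Python sketch. It is not DataHub's implementation: the `QueryLogEntry` fields, the thresholds, and the rule that scheduled pipelines skip the distinct-user check are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryLogEntry:
    """One aggregated row from a warehouse query log (hypothetical shape)."""
    sql: str
    run_count: int        # how many times this statement ran
    distinct_users: int   # how many different people ran it
    success_rate: float   # fraction of runs that finished without error
    is_scheduled: bool    # True if it runs from a scheduled pipeline


def is_golden(entry, min_runs=20, min_users=3, min_success=0.95):
    """Heuristic filter; every threshold here is an illustrative assumption."""
    if entry.success_rate < min_success:
        return False
    if entry.is_scheduled:
        # A pipeline runs under one service account, so skip the user check.
        return entry.run_count >= min_runs
    return entry.run_count >= min_runs and entry.distinct_users >= min_users


def select_golden(entries):
    """Keep only the log entries that pass the heuristic."""
    return [e for e in entries if is_golden(e)]


LOG = [
    QueryLogEntry("SELECT region, SUM(amount) FROM orders GROUP BY region", 90, 1, 1.0, True),
    QueryLogEntry("SELECT plan, COUNT(*) FROM churned_accounts GROUP BY plan", 45, 6, 0.98, False),
    QueryLogEntry("SELECT * FROM tmp_scratch LIMIT 10", 2, 1, 0.5, False),
]
golden = select_golden(LOG)
for entry in golden:
    print(entry.sql)
```

The nightly rollup and the widely shared analyst query pass; the one-off scratch query does not. Any real filter would need tuning against the actual warehouse.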
Justy: Mm-hm.
Cody: Then they invert the SQL patterns into structured text definitions they call semantic anchors. Those anchors become the retrieval basis the agent queries before it generates SQL.
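As a rough illustration of turning SQL patterns into text definitions, the sketch below pulls simple equality joins out of a few queries with regular expressions and writes each recurring join as a sentence. The function names, the `min_support` cutoff, and the output wording are hypothetical, and a production system would use a real SQL parser rather than regexes.

```python
import re
from collections import Counter

# Matches "FROM orders o" or "JOIN customers AS c"; captures table and optional alias.
TABLE_RE = re.compile(
    r"\b(?:FROM|JOIN)\s+([\w.]+)"
    r"(?:\s+(?:AS\s+)?(?!(?:JOIN|ON|WHERE|INNER|LEFT|RIGHT|FULL|CROSS|GROUP|ORDER|LIMIT)\b)(\w+))?",
    re.IGNORECASE,
)
# Matches only the first equality in an ON clause, e.g. "ON o.customer_id = c.id".
ON_RE = re.compile(r"\bON\s+(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)", re.IGNORECASE)


def extract_joins(sql):
    """Return order-insensitive ((table, column), (table, column)) join pairs."""
    aliases = {}
    for table, alias in TABLE_RE.findall(sql):
        aliases[(alias or table).lower()] = table.lower()
    joins = []
    for la, lc, ra, rc in ON_RE.findall(sql):
        left = (aliases.get(la.lower(), la.lower()), lc.lower())
        right = (aliases.get(ra.lower(), ra.lower()), rc.lower())
        joins.append(tuple(sorted([left, right])))
    return joins


def build_anchors(golden_sql, min_support=2):
    """Describe joins that appear in at least min_support distinct queries."""
    counts = Counter(j for sql in golden_sql for j in set(extract_joins(sql)))
    anchors = []
    for ((lt, lc), (rt, rc)), n in counts.most_common():
        if n >= min_support:
            anchors.append(
                f"{lt} joins {rt} on {lt}.{lc} = {rt}.{rc} (seen in {n} golden queries)"
            )
    return anchors


GOLDEN = [
    "SELECT c.region, SUM(o.amount) FROM orders o "
    "JOIN customers c ON o.customer_id = c.id GROUP BY c.region",
    "select count(*) from customers AS cu join orders AS ord on cu.id = ord.customer_id",
    "SELECT * FROM orders o JOIN refunds r ON r.order_id = o.id",
]
anchors = build_anchors(GOLDEN)
print(anchors)
```

The two queries that join orders to customers use different aliases and a different operand order, yet they collapse into one anchor. The refunds join appears only once, so it falls below the cutoff.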
Justy: So instead of the agent guessing how to join tables, it looks up what joins humans already validated?
Cody: In theory, yes. And there's a human review step on top — domain experts can resolve conflicting definitions and simulate changes before publishing.
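One way to picture the review step: before anything is published, group candidate definitions by term and surface the terms where teams disagree. This is only a sketch under assumed inputs. The `(term, definition, team)` tuples and the example definitions are invented, and it does not cover simulating changes.

```python
from collections import defaultdict


def find_conflicts(candidates):
    """Return terms that have more than one distinct definition.

    candidates: iterable of (term, definition, team) tuples (hypothetical shape).
    """
    by_term = defaultdict(lambda: defaultdict(set))
    for term, definition, team in candidates:
        # Normalize case and whitespace so cosmetic differences do not count.
        normalized = " ".join(definition.lower().split())
        by_term[term][normalized].add(team)
    return {
        term: {d: sorted(teams) for d, teams in defs.items()}
        for term, defs in by_term.items()
        if len(defs) > 1
    }


CANDIDATES = [
    ("active_user", "logged in within 30 days", "product"),
    ("active_user", "logged in within 7 days", "growth"),
    ("revenue", "SUM(orders.amount)", "finance"),
    ("revenue", "sum(orders.amount)", "sales"),
]
conflicts = find_conflicts(CANDIDATES)
for term, definitions in conflicts.items():
    print(term, definitions)
```

Only `active_user` is flagged for a domain expert to resolve; the two `revenue` entries differ in letter case alone and are treated as the same definition.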
Justy: Cody, I have to ask — how many orgs actually have clean, consistent query history? Like, is the golden query problem itself a mess?
Cody: That's exactly where I push back. If your analysts are writing garbage SQL and your scheduled pipelines are held together by duct tape, your golden queries are just well-documented garbage. The approach assumes a certain maturity. Orgs that actually have analysts who know the data and write consistent queries — this helps them a lot. Orgs where everyone's just winging it? Query history won't save you.
Justy: But for the Miro case — ten thousand tables — someone's been writing correct queries against that environment. You're not starting from zero.
Cody: Right, and that's where I think the argument does hold. If you're a company big enough to have that many tables, you almost certainly have analysts who know the right joins. The problem is that knowledge is trapped in their heads and in their query tabs.
Justy: Extracting it is the actual product insight. Like, the reframing here is — this is a retrieval problem, not a generation problem. The model doesn't need to be smarter about joins. It needs better context about what's already worked.
Cody: I agree with that framing. The model was never going to figure out that your finance team calls it gross_revenue and your sales team calls it total_bookings from first principles. That's lookup, not reasoning.
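The "lookup, not reasoning" point can be sketched as a small alias index that is consulted before any SQL is generated. Everything in `ANCHORS` is hypothetical, including the definition text and the assumption that `gross_revenue` and `total_bookings` resolve to one concept. The prompt format is also invented for illustration.

```python
import re

# Hypothetical anchor store: one business concept, the names teams use for it,
# and a definition assumed to have been recovered from validated queries.
ANCHORS = {
    "revenue": {
        "aliases": {"gross_revenue", "total_bookings", "revenue"},
        "definition": "SUM(orders.amount) over orders where status = 'closed'",
    },
    "active_user": {
        "aliases": {"active_user", "active_users", "mau"},
        "definition": "COUNT(DISTINCT events.user_id) over the last 30 days",
    },
}


def retrieve(question):
    """Return (concept, definition) pairs whose aliases appear in the question."""
    tokens = set(re.findall(r"[a-z_]+", question.lower()))
    return [
        (concept, anchor["definition"])
        for concept, anchor in ANCHORS.items()
        if tokens & anchor["aliases"]
    ]


def build_prompt(question):
    """Prepend retrieved definitions so the model does not have to guess them."""
    lines = [f"- {concept}: {definition}" for concept, definition in retrieve(question)]
    context = "\n".join(lines) if lines else "- (no matching definitions)"
    return f"Known definitions:\n{context}\n\nQuestion: {question}\nWrite SQL."


finance_hits = retrieve("Show gross_revenue by region")
sales_hits = retrieve("What were total_bookings last quarter?")
print(build_prompt("What were total_bookings last quarter?"))
```

Both phrasings retrieve the same definition, which is the whole point: the model receives the mapping as context instead of inferring it from column names.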
Justy: And they're exposing this via MCP, LangChain, Google's Agent Development Kit, and CrewAI. So it's pluggable.
Cody: Which is table stakes for this kind of thing, honestly. If you built a context layer and only exposed it through your own SDK, nobody would adopt it.
Justy: True. Okay, I got sidetracked earlier — I was looking at my coffee and realized I haven't actually had coffee yet today. I got up, started reading this DataHub piece, and just… forgot.
Cody: That is very on-brand for you.
Justy: It really is. Anyway — who do you think this actually changes things for? Like, practically?
Cody: Data teams at mid-to-large companies who've been putting off building a semantic layer. This is basically a shortcut to one, derived from behavior instead of declared rules.
Justy: That's a way better pitch than most semantic layer vendors give. They all say 'define your metrics once.' This says 'we'll learn them from what people already did.'
Cody: The learned-versus-declared distinction is real. But I'll say it again — you need the behavioral data to be worth learning from. If your warehouse is chaos, query logs just give you a map of the chaos.
Justy: Fair. But if DataHub has fifteen thousand contributors and three thousand production deployments on the open source side, there's a decent chance its deployment base isn't total chaos.
Cody: Decent chance. Not guaranteed.
Justy: Same! Shows you where the long tail actually lives. Alright Cody, go find your own golden queries today. Preferably before your own SQL starts hallucinating.
Cody: That's… that's actually what I'm trying to avoid.