Justy: I genuinely cannot believe we're on episode four-oh-eight and we're still talking about RAG.
Cody: I mean, it keeps not being solved, so.
Justy: Fair point. Okay so this one's actually interesting though — it's about moving beyond pure vector search, adding graph structure to RAG for production systems. The core argument is that vectors capture meaning but they throw away relationships, and for interconnected enterprise data that's a real problem.
Cody: Right, and that part I actually buy. The example they use is pretty concrete: you've got a supply chain where Supplier A provides Component X to Factory Y, and that's structured data sitting in SQL. Then there's an unstructured news report about flooding at Supplier A's facility. Vector search will surface the news article if you ask about production risks, but it has no idea that Factory Y depends on that supplier.
Justy: So the LLM gets the news but can't answer which downstream factories are at risk. It either hallucinates the connection or just gives up even though the data's technically in the system somewhere.
Cody: Exactly. That's the multi-hop reasoning gap. Vectors are flat — they know this chunk is similar to that query, but they don't know that A connects to B connects to C.
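A minimal sketch of the multi-hop idea Cody describes, assuming the supply chain example is stored as a plain adjacency map. The relation names `PROVIDES` and `USED_BY` and the `downstream` helper are hypothetical, chosen only for illustration; they are not from the article.

```python
from collections import deque

# Hypothetical edges mirroring the supply chain example from the episode.
EDGES = {
    "Supplier A": [("PROVIDES", "Component X")],
    "Component X": [("USED_BY", "Factory Y")],
    "Factory Y": [],
}

def downstream(start, edges):
    """Breadth-first walk returning every node reachable from start."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for _relation, target in edges.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order

at_risk = downstream("Supplier A", EDGES)
print(at_risk)  # ['Component X', 'Factory Y']
```

A similarity score over chunks has no equivalent of this walk: the flood report and Factory Y only become connected once the edges are explicit.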
Justy: How was your week by the way? I feel like I haven't talked to you since — was it Thursday?
Cody: Yeah, Thursday. It was fine, mostly heads down on this pipeline refactor. Finally got the DAG scheduling sorted out Friday afternoon which, honestly, I didn't think was going to happen. So that was a win. How about you?
Justy: Pretty chill. Finally replaced the garbage disposal which sounds like nothing but it's been broken for like three weeks and I just kept... not calling someone. Anyway — this graph RAG thing, the pattern they describe is a three-layer stack. Ingestion, storage, retrieval.
Cody: Mm-hm. And the ingestion part is where they're most opinionated. The author worked on logging infrastructure at Meta and their takeaway was you have to enforce structure at ingestion time. You can't try to reconstruct relationships from messy data after the fact. So during ingestion you run entity extraction, with an LLM or NER, pull out the nodes and edges, and link them to what's already in the graph.
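The ingestion step described here might look roughly like the sketch below. Everything in it is an assumption for illustration: `extract_triples` is a hard-coded stand-in for an LLM or NER extractor, `canonical` is a deliberately naive form of entity linking, and the `AFFECTED_BY` relation is invented. It is not the article's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # canonical key -> display name
    edges: set = field(default_factory=set)    # (source key, relation, target key)

def canonical(name):
    """Naive entity linking: case and whitespace normalization only."""
    return " ".join(name.lower().split())

def extract_triples(text):
    """Stand-in for an LLM or NER extractor; returns hard-coded triples."""
    if "Supplier A" in text:
        return [("Supplier A", "AFFECTED_BY", "Flooding Event")]
    return []

def ingest(graph, text):
    """Extract triples from text and merge them into the existing graph."""
    for source, relation, target in extract_triples(text):
        for name in (source, target):
            # Reuse the existing node if the canonical key is already present.
            graph.nodes.setdefault(canonical(name), name)
        graph.edges.add((canonical(source), relation, canonical(target)))

graph = Graph()
graph.nodes["supplier a"] = "Supplier A"    # already loaded from the SQL side
graph.nodes["component x"] = "Component X"
graph.edges.add(("supplier a", "PROVIDES", "component x"))

ingest(graph, "Flooding reported at Supplier A facility.")
```

The point of the design is that the news report attaches to the existing Supplier A node at write time rather than living as a disconnected chunk that someone has to relate to the supplier later.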
Justy: Which makes sense in theory but Cody, that's also the part where I get skeptical. Entity extraction with an LLM is noisy. You're adding another model to your pipeline that's going to miss things or hallucinate relationships, and now your graph has garbage in it.
Cody: Right, right. And a graph with wrong edges is arguably worse than no graph at all, because now you're traversing confidently in the wrong direction. At least with flat vector search the failure mode is obvious: it just doesn't connect the dots. With a bad graph you get very confident wrong answers.
Justy: So who should actually care about this? Like, is this only supply chain people or is it broader?
Cody: The article names supply chain, financial compliance, and fraud detection specifically. And I think those are the right examples. Those are domains where the data is genuinely interconnected — like, fraud detection is literally about finding patterns across relationship chains. That's not a vector problem, that's a graph problem that happens to also have unstructured text attached.
Justy: Okay. I think the practical question is whether your users are actually asking multi-hop questions or if they're mostly doing semantic search with slightly more context. Because if it's the latter, adding Neo4j and entity extraction pipelines is a lot of operational overhead for marginal improvement.
Cody: Yeah, and that's where I think the article overgeneralizes a bit. It's framed as "vectors are insufficient, you need graphs." But really it's "vectors are insufficient for this specific class of question." Most RAG deployments I see are still basically semantic search over documents, and for that, flat vectors work fine.
Justy: It's the classic architecture article problem though. You find a real gap and then the implication becomes everyone should adopt the solution.
Cody: I mean, that's literally every architecture blog post ever written. But to be fair, the retrieval pattern they describe is actually elegant. You do a vector scan to find entry points in the graph based on semantic similarity, then traverse from those nodes to gather structured context. So you get the best of both — the fuzzy matching of vectors and the deterministic paths of the graph.
Justy: That part I like. It's not replace vectors with graphs, it's use vectors to find the front door and then walk the graph to get the full picture. That's a product I can imagine building.
Cody: Sure. The storage model is interesting too. They store vector embeddings as properties on specific nodes, so a RiskEvent node would have both its graph relationships and its vector embedding. So you're not maintaining two separate systems that might drift out of sync.
Justy: Oh, that's smart. I was assuming they'd have the vector DB and the graph side by side and have to reconcile them somehow.
Cody: No, and that's actually the detail that makes this feel production-viable to me. Dual systems that have to stay consistent is a nightmare. Embeddings as node properties keeps it single-source.
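The retrieval pattern and storage model discussed above can be sketched in a few lines of plain Python. This is an in-memory illustration under stated assumptions, not the Neo4j and Cypher reference implementation: the `Node` class, the `retrieve` function, the `top_k` and `max_hops` parameters, and the toy two-dimensional embeddings are all made up for the example, and the supply chain is simplified to skip the component hop.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    label: str
    embedding: list  # stored on the node itself, not in a separate index
    neighbors: list = field(default_factory=list)  # names of connected nodes

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(graph, query_embedding, top_k=1, max_hops=2):
    """Vector scan for entry points, then a bounded traversal from each one."""
    ranked = sorted(graph.values(),
                    key=lambda n: cosine(n.embedding, query_embedding),
                    reverse=True)
    frontier = [n.name for n in ranked[:top_k]]
    context = list(frontier)
    for _ in range(max_hops):
        next_frontier = []
        for name in frontier:
            for neighbor in graph[name].neighbors:
                if neighbor not in context:
                    context.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return context

# Toy 2-d embeddings, made up for illustration.
graph = {
    "Flood report": Node("Flood report", "RiskEvent", [0.9, 0.1], ["Supplier A"]),
    "Supplier A": Node("Supplier A", "Supplier", [0.2, 0.8], ["Factory Y"]),
    "Factory Y": Node("Factory Y", "Factory", [0.1, 0.9], []),
    "Cafeteria menu": Node("Cafeteria menu", "Document", [-0.7, 0.3], []),
}
print(retrieve(graph, query_embedding=[1.0, 0.0]))
# ['Flood report', 'Supplier A', 'Factory Y']
```

Because the embedding lives on the same record as the relationships, there is nothing to reconcile between a vector store and a graph store. The `top_k` and `max_hops` bounds are one hedge against noisy entry points pulling in large subgraphs.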
Justy: The reference implementation uses Python, Neo4j, and OpenAI. Cypher queries for the graph traversal part.
Cody: Which is fine for a reference but I do wonder about scale. Neo4j's community edition has limits, and once you're talking enterprise supply chain data the graph gets big fast. The hybrid query — vector scan plus traversal — could get expensive if your entry points are noisy and you're traversing huge subgraphs.
Justy: Right. But I think the pattern is still sound even if the implementation needs work. The insight that structure has to come in at ingestion, not be reconstructed later, that's the Meta lesson and it's a good one. I've seen so many teams try to do the reconstruction thing and it's always a mess.
Cody: Agreed. That's probably the most broadly applicable takeaway even if you never touch a graph database. If your data has structure, preserve it early. Don't flatten everything and hope you can piece it back together.
Justy: Alright Cody, I'll let you get back to your DAG. Thanks for walking through this one with me.
Cody: Anytime. The garbage disposal thanks you too.