Justy: So you've got a stack of documents—call it a thousand pages—and you need to answer a question that touches five different parts of five different files. You chunk it, throw chunks at an LLM, get back some facts. Then what? Suddenly you're sitting on a pile of evidence and no clear way to combine it all without just… concatenating everything and hoping the context window holds.
Cody: Right. That's the aggregation bottleneck. And it gets worse because each chunk extraction is local. One document might mention that a company was founded in 2015. Another chunk says it was founded in 2014. The LLM doesn't know these contradict because it never saw them side by side.
Justy: So SLIDERS flips that. Instead of concatenating whatever text comes back from each chunk, you extract into a database.
Cody: Exactly. You define a schema—let's say companies, funding rounds, executives. You run extraction on each document chunk and insert structured records into those tables instead of collecting text snippets.
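To make the schema-plus-insert idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and the `insert_records` helper are illustrative assumptions, not the schema or code SLIDERS ships, and the `extracted` list stands in for whatever an LLM extraction step would return.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from SLIDERS.
SCHEMA = """
CREATE TABLE companies (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    founded_year INTEGER,
    source_doc TEXT,
    source_chunk INTEGER
);
CREATE TABLE funding_rounds (
    id INTEGER PRIMARY KEY,
    company_name TEXT NOT NULL,
    round_type TEXT,
    amount_usd REAL,
    source_doc TEXT,
    source_chunk INTEGER
);
"""

def insert_records(conn, table, records):
    """Insert a list of dicts (one per extracted record) into a table.

    The table and column names are interpolated into the SQL, so they must
    come from trusted code, never from user input. Values are bound safely.
    """
    for rec in records:
        cols = ", ".join(rec)
        placeholders = ", ".join("?" for _ in rec)
        conn.execute(
            f"INSERT INTO {table} ({cols}) VALUES ({placeholders})",
            tuple(rec.values()),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Stand-in for the structured output of an extraction step over two chunks.
extracted = [
    {"name": "Acme", "founded_year": 2015, "source_doc": "report_a.pdf", "source_chunk": 3},
    {"name": "Acme", "founded_year": 2014, "source_doc": "report_b.pdf", "source_chunk": 7},
]
insert_records(conn, "companies", extracted)
row_count = conn.execute("SELECT COUNT(*) FROM companies").fetchone()[0]
```

Keeping the source document and chunk index on every row is a design choice that pays off later: conflicting rows can be traced back to where they came from.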
Justy: And then you query it with SQL.
Cody: Right. SQL is far more precise than reasoning over concatenated text. You can say, 'give me all companies founded between 2010 and 2015 that raised Series A funding,' and the database returns exactly that.
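The example question maps onto a short join. This sketch assumes a hypothetical two-table layout (`companies`, `funding_rounds`) with made-up rows; it only illustrates the kind of query being described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and toy data, for illustration only.
conn.executescript("""
CREATE TABLE companies (name TEXT PRIMARY KEY, founded_year INTEGER);
CREATE TABLE funding_rounds (company_name TEXT, round_type TEXT);
INSERT INTO companies VALUES ('Acme', 2012), ('Globex', 2008), ('Initech', 2014);
INSERT INTO funding_rounds VALUES
    ('Acme', 'Series A'), ('Globex', 'Series A'), ('Initech', 'Seed');
""")

# Companies founded between 2010 and 2015 (inclusive) that raised a Series A.
QUERY = """
SELECT DISTINCT c.name
FROM companies AS c
JOIN funding_rounds AS f ON f.company_name = c.name
WHERE c.founded_year BETWEEN 2010 AND 2015
  AND f.round_type = 'Series A'
ORDER BY c.name
"""
matches = [row[0] for row in conn.execute(QUERY)]
print(matches)  # ['Acme']
```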
Justy: But if I'm extracting from a thousand documents, I'm probably going to get duplicates or conflicts.
Cody: That's where reconciliation comes in. SLIDERS flags duplicate and inconsistent records, then uses metadata and the extraction rationales to decide which record is right or whether the records should be merged.
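A rough sketch of what a reconciliation pass could look like. This is a deliberately simple stand-in heuristic (group by normalized name, merge when values agree, flag when they disagree), not the procedure SLIDERS uses; the `reconcile` function and the record fields are assumptions for illustration.

```python
from collections import defaultdict

def reconcile(records):
    """Group records by normalized company name.

    Records that agree on founded_year are merged into one, keeping every
    source. Records that disagree are returned as a conflict, with their
    rationales intact, so a later step (a person or a model) can decide.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["name"].strip().lower()].append(rec)

    merged, conflicts = [], []
    for recs in groups.values():
        years = {r["founded_year"] for r in recs}
        if len(years) == 1:
            # Same value everywhere: collapse to one record, keep provenance.
            merged.append({
                "name": recs[0]["name"],
                "founded_year": years.pop(),
                "sources": [r["source"] for r in recs],
            })
        else:
            # Disagreement: keep every candidate for review.
            conflicts.append({"name": recs[0]["name"], "candidates": recs})
    return merged, conflicts

# Toy records; the rationale strings mimic quoted evidence from each chunk.
records = [
    {"name": "Acme", "founded_year": 2015, "source": "report_a", "rationale": "'founded in 2015'"},
    {"name": "acme ", "founded_year": 2014, "source": "report_b", "rationale": "'since 2014'"},
    {"name": "Globex", "founded_year": 2008, "source": "report_a", "rationale": "'est. 2008'"},
    {"name": "Globex", "founded_year": 2008, "source": "report_c", "rationale": "'founded 2008'"},
]
merged, conflicts = reconcile(records)
```

Here the two Globex rows collapse into one record with two sources, while the two Acme rows surface as a conflict, which is the 2014-versus-2015 situation described earlier.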
Justy: How much does this actually improve over just throwing everything at GPT-4?
Cody: On existing benchmarks, SLIDERS beats GPT-4 by 6.6 points. At 3.9M and 36M token scales, it improves by 19 and 32 points respectively.
Justy: But that's a benchmark. Real world, who's building with this?
Cody: SLIDERS is clever, but not plug-and-play. You need to design your extraction schema and set up a database. If your documents are semi-structured, like financial reports or contracts, and you can define a clear schema, it's worth it. If they're totally unstructured, the overhead might outweigh the benefit.
Justy: And latency? The reconciliation step adds a pass over the data.
Cody: Right. So it's not real-time like simple RAG. But for batch jobs like 'analyze these documents overnight,' it's solid. The reconciliation overhead is worth it because you get consistency checks that text concatenation doesn't give you.
Justy: If you were building this yourself, what would you do differently?
Cody: I might start with a lighter schema—maybe key-value pairs instead of full relational normalization. Get extraction and reconciliation working, then add structure as you learn what questions matter. Also, use SQLite instead of a full database server for solo projects.
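One way to read the "lighter schema" suggestion is a single key-value table of facts in SQLite. The `facts` table, its columns, and the sample rows are hypothetical; the point is that even this flat layout lets a plain query surface values that need reconciling.

```python
import sqlite3

# In-memory for the example; pass a file path such as "facts.db" to persist.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE facts (
    entity TEXT NOT NULL,
    attribute TEXT NOT NULL,
    value TEXT NOT NULL,
    source TEXT NOT NULL
)
""")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?)",
    [
        ("Acme", "founded_year", "2015", "report_a"),
        ("Acme", "founded_year", "2014", "report_b"),
        ("Acme", "ceo", "J. Doe", "report_a"),
    ],
)

# Entity/attribute pairs with more than one distinct value are
# candidates for reconciliation.
disputed = conn.execute("""
SELECT entity, attribute, COUNT(DISTINCT value) AS n_values
FROM facts
GROUP BY entity, attribute
HAVING COUNT(DISTINCT value) > 1
""").fetchall()
print(disputed)  # [('Acme', 'founded_year', 2)]
```

The trade-off is that every value is stored as text, so range filters and joins get clumsier; that is usually the signal to promote a frequently queried attribute into a typed column.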
Justy: What can people actually grab and try?
Cody: The GitHub repo is at stanford-oval/sliders. If you want to experiment, run it on a small document set first—like a single annual report. See if the schema makes sense for your domain.
Justy: For someone working solo?
Cody: Start with a simple project. Take a public dataset like Wikipedia articles or SEC filings, define a minimal schema, write a Python script to extract facts using an LLM API, insert them into SQLite, and write SQL queries to answer test questions. You'll see immediately where the schema breaks.
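The weekend project can be sketched end to end in a few dozen lines. Everything here is an assumption for illustration: `extract_facts` is a regex placeholder standing in for an LLM API call, the chunker just splits on blank lines, and the single `companies` table is a minimal made-up schema.

```python
import re
import sqlite3

def split_chunks(text):
    """Split a document into paragraph chunks on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def extract_facts(chunk):
    """Placeholder for an LLM call.

    A real version would send the chunk and the schema to a model and parse
    structured output. This regex only matches '<Name> was founded in <year>'.
    """
    return [
        {"name": m.group(1), "founded_year": int(m.group(2))}
        for m in re.finditer(r"([A-Z][a-z]+) was founded in (\d{4})", chunk)
    ]

def build_db(doc_name, text):
    """Chunk a document, extract facts, and load them into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE companies "
        "(name TEXT, founded_year INTEGER, source TEXT, chunk INTEGER)"
    )
    for i, chunk in enumerate(split_chunks(text)):
        for fact in extract_facts(chunk):
            conn.execute(
                "INSERT INTO companies VALUES (?, ?, ?, ?)",
                (fact["name"], fact["founded_year"], doc_name, i),
            )
    conn.commit()
    return conn

sample = (
    "Acme was founded in 2015 in Ohio.\n\n"
    "It sells anvils.\n\n"
    "Globex was founded in 1999."
)
conn = build_db("sample.txt", sample)

# Test question: which companies were founded before 2010?
answer = conn.execute(
    "SELECT name FROM companies WHERE founded_year < 2010"
).fetchall()
print(answer)  # [('Globex',)]
```

Swapping the regex for a model call and the sample string for a real filing is where the schema starts to show its gaps, which is the point of the exercise.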
Justy: That's a weekend project.
Cody: Yep. And if it works, you've learned whether structured extraction is worth the effort for your use case. The reconciliation piece is the part that makes it work in the real world, where extraction is messy and inconsistent.
Justy: This is Exploring Next, episode 330. Thanks for walking through the paper with me, Cody.