Justy: The annoying part is when an agent sort of works, then quietly hands the next step a mess.
Justy: Welcome to Exploring Next, episode 316. I'm Justy, in Cody's kitchen again, and today we're talking about panini and precision for agent workflows.
Cody: Yeah, and this matters right now because more people are wiring models into tools, file systems, and little orchestrators. A vague answer is no longer just annoying. It can trigger the wrong action.
Justy: Exactly. For a regular user, that shows up as questions like: why did the app say it checked the file when it actually checked the wrong path? Or why did two agents disagree, and nobody told me which one was stale?
Cody: So the panini repo's premise is pretty sharp. The author argues a lot of agent failures are precision failures, not capability failures. The model can do the task, but it leaves the relation between actor, object, tool, and cause implicit.
Justy: Which is a product problem fast. If your system sounds polished but hides the causal chain, support gets mystery bugs instead of actionable reports.
Cody: Right. panini tries to force a schema onto outputs using six roles. kartā is the acting component, karma is what got acted on, karaṇa is the tool or mechanism, sampradāna is the purpose, apādāna is the source or root cause, and adhikaraṇa is the layer or location where it happened.
Justy: I like that it's aimed at the handoff surface. Not just, can the model answer, but can another process consume the answer without guessing. [pause] That's a very different bar.
Cody: And the examples are concrete. One is a file-reader tool returning empty output with no error. The terse version just lists possible causes. The panini version labels the actor as the file-reader tool, the object as file content, the instrument as the read operation, the locus as the tool output layer, then breaks causes into file state versus tool layer.
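As a rough illustration of the six roles and the file-reader example Cody describes, here is a minimal Python sketch. The `RoleRecord` class, its ASCII field names (diacritics dropped), and the `missing_roles` helper are assumptions made for illustration; they are not the repo's actual schema or output format. The purpose and root-cause fields are left empty because the episode does not state them for this example.

```python
from dataclasses import dataclass, asdict


@dataclass
class RoleRecord:
    """Hypothetical container for the six roles named in the episode."""
    karta: str       # acting component
    karma: str       # what got acted on
    karana: str      # tool or mechanism
    sampradana: str  # purpose
    apadana: str     # source or root cause
    adhikarana: str  # layer or location where it happened

    def missing_roles(self) -> list:
        """Return the names of roles that were left blank."""
        return [name for name, value in asdict(self).items() if not value.strip()]


# Values taken from the file-reader example; blanks are roles the episode did not fill in.
record = RoleRecord(
    karta="file-reader tool",
    karma="file content",
    karana="read operation",
    sampradana="",
    apadana="",
    adhikarana="tool output layer",
)

print(record.missing_roles())  # ['sampradana', 'apadana']
```

A downstream step could refuse a handoff, or ask for a retry, whenever `missing_roles()` is non-empty, which is one way to make the structure something the next process actually consumes.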
Justy: That feels useful for teams building internal agents, dev tools, ops dashboards, maybe customer support workflows. Anyone with a chain where step four depends on what step three actually meant.
Cody: Another good example is two agents disagreeing on world state. panini turns that into a resolution hierarchy tied to a truth-grounding mechanism, timestamps, and a structured handoff with confidence and source fields. I think that's the clever part. It's not asking for prettier prose. It's asking for machine-legible causality.
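One way to picture the two-agent disagreement case is a small resolver over structured claims. This is a sketch under stated assumptions: the `Claim` fields, the `GROUNDED_SOURCES` set, and the ordering (grounded source first, then newest timestamp, then confidence) are hypothetical choices for illustration, not the hierarchy the repo defines.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Claim:
    """Hypothetical handoff record with source, confidence, and timestamp."""
    agent: str
    value: str
    source: str          # how the agent learned the value
    confidence: float    # self-reported, 0.0 to 1.0
    observed_at: datetime


# Assumption: a direct read of the world counts as truth-grounded; a cache does not.
GROUNDED_SOURCES = {"direct_read"}


def resolve(a: Claim, b: Claim) -> Claim:
    """Pick one claim: grounded beats ungrounded, then newer, then more confident."""
    def rank(claim: Claim):
        return (claim.source in GROUNDED_SOURCES, claim.observed_at, claim.confidence)
    return max((a, b), key=rank)


planner = Claim("planner", "config.yaml exists", "cached", 0.9,
                datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
runner = Claim("tool-runner", "config.yaml missing", "direct_read", 0.7,
               datetime(2024, 1, 1, 11, 55, tzinfo=timezone.utc))

winner = resolve(planner, runner)
print(winner.agent)  # tool-runner
```

Here the tool-runner wins even though its claim is older and less confident, because it is the only one backed by a direct read. Swapping the tuple order in `rank` changes the policy, which is the kind of decision the structured fields make explicit.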
Justy: Though I can see an adoption barrier. Most teams do not want their outputs to suddenly look academic or bulky. If you're shipping a user-facing assistant, readability still matters.
Cody: Totally. The repo kind of admits that. There are modes. viveka is lighter and mainly strips hedging. shuddha is the full structured version. sutra goes even more compressed and aphoristic. I'd probably avoid that one unless your downstream parser really wants it. [chuckles]
Justy: The numbers are interesting too. On the main eval, panini gets score-3 completeness on 87 percent of Anthropic runs and 95 percent on OpenAI, versus 30 and 65 percent for terse prompting. That's a real jump.
Cody: Yeah, and ambiguity drops on Anthropic, from 5.7 hedges per response in terse mode to 4.0 in panini, so roughly 30 percent fewer. On OpenAI it's basically flat because the terse baseline was already low. So the gain depends on model behavior.
Justy: The cost is tokens. Anthropic especially. The average goes from 392 tokens in terse to 793 in panini, roughly double. That's not a rounding error. If you're doing ten-step loops, that bill shows up.
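To put the ten-step loop in numbers, here is a back-of-the-envelope sketch using the two Anthropic averages quoted above. It assumes every step in the loop costs exactly the quoted average, which is a simplification; real loops also pay for prompts and accumulated context, and none of that is modeled here.

```python
TERSE_AVG_TOKENS = 392    # Anthropic average quoted in the episode
PANINI_AVG_TOKENS = 793   # Anthropic average quoted in the episode


def loop_tokens(avg_tokens: int, steps: int) -> int:
    """Total tokens for a loop, assuming each step costs the average."""
    return avg_tokens * steps


steps = 10
extra = loop_tokens(PANINI_AVG_TOKENS, steps) - loop_tokens(TERSE_AVG_TOKENS, steps)
ratio = PANINI_AVG_TOKENS / TERSE_AVG_TOKENS

print(extra)            # 4010 extra tokens per ten-step loop
print(round(ratio, 2))  # 2.02
```

Multiplying `extra` by your provider's per-token price and your daily loop count gives a first estimate of what the added structure costs.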
Cody: I could be wrong, but I read that as a fair trade in failure-sensitive systems. If step attribution matters, extra tokens may be cheaper than silent retries or bad tool calls. In a simple single-turn app, I'd skip it.
Justy: Same. My honest hesitation is whether teams keep the discipline. A prompt skill helps, but if the orchestrator doesn't actually consume kartā, karma, and apādāna, then you're paying for structure nobody uses.
Cody: That's the real test, Justy. The repo measures output quality, not end-to-end business outcomes yet. Still, it's refreshingly explicit about that. This is an experiment, not a grand theory.
Justy: If you want to try it, the quick path is npx skills add dpaul0501/panini, then turn on panini mode or add the always-on instructions to your agent prompt. Solo builders can do that in an afternoon.
Cody: And if you want the evals, clone the repo, run uv sync, then use uv run python evals/llm_run.py --provider openai or --provider anthropic with your API key set. There's also support for OpenAI-compatible endpoints, so you can poke at local stacks too.
Justy: A good weekend project is a tiny two-agent workflow: planner plus tool-runner. Run the same tasks in terse mode and shuddha mode, then compare whether your logs are easier to debug when the tool fails or the handoff gets fuzzy.
Justy: That's episode 316 of Exploring Next. From Cody's kitchen, with too much coffee and just enough structure, we'll catch you next time.