Engineering blog

How we build Koji — extraction pipelines, benchmarking methodology, schema design, and lessons from running document AI in production.

June 29, 2026 / extraction, hitl, product, strategy

The Review Queue Is the Product

Most document AI vendors treat human review as the embarrassing fallback for when automation fails. We treat it as the core feature — the part that makes the rest trustworthy.

June 26, 2026 / extraction, schemas, workflow, agents

Don't Let the Agent Grade Its Own Homework

An AI agent can improve a document extraction schema on its own — read the failures, edit the config, re-test. The hard part isn't the loop. It's making sure the agent can't lie to itself about whether it worked.

June 18, 2026 / extraction, benchmarking, llm

HTML Tables Won't Save Your Extraction Accuracy

We tested four table encodings — markdown, HTML, JSON, and CSV — across three models and 232 documents. Re-encoding tables changed accuracy by amounts indistinguishable from noise. The only thing that moved was the token bill.

June 11, 2026 / security, multi-tenancy, engineering

We Don't Trust Our WHERE Clauses

In a multi-tenant system, the worst bug is the query that silently returns another customer's data. Here's how Koji makes that structurally impossible with Postgres row-level security — and the test that proves it holds.

June 5, 2026 / extraction, schemas, workflow

Schema TDD: Building Document Extraction Without Opening a Browser

Schema development isn't a configuration task — it's an engineering discipline. Here's how an iterative push-extract-inspect loop gets you to 96% accuracy in hours, not weeks.

May 29, 2026 / extraction, benchmarking, models

Bigger Models Don't Extract Better

We tested GPT-4o-mini, GPT-4o, Llama 3 8B, and Llama 3 70B on 165 documents. GPT-4o is worse than GPT-4o-mini at structured extraction — and we found out why.

May 22, 2026 / open-source, strategy

Why Open Source for Document AI

We made Koji open source because the security claims that matter most are the ones you can verify yourself.

May 20, 2026 / security, architecture

Where Your Documents Go During Extraction

The first question every security team asks when evaluating document AI: 'If I upload a policy PDF, who sees it?' Here's exactly what happens at every stage.

May 18, 2026 / extraction, methodology

Null Semantics: When "Nothing" Is the Right Answer

Every extraction system can pull values out of documents. The harder problem is knowing when a value isn't there — and handling that correctly.

May 16, 2026 / extraction, infrastructure

Rate Limits, Retries, and the Hidden Accuracy Killer in LLM Pipelines

We spent weeks investigating a 6% accuracy variance. The root cause wasn't the model or the prompts — it was silent HTTP 429 errors treated as 'field not found.'

May 14, 2026 / extraction, routing

Why Heuristic Routing Fails on Long Documents

When a 120-page insurance policy goes through extraction, the AI sees fragments. If the router picks the wrong chunks, the AI can't extract what isn't in front of it.

May 10, 2026 / benchmarking, methodology

Benchmarking Document Extraction: How We Measure Accuracy Across 1,100 Documents

Every document extraction vendor claims 95%+ accuracy. None of them publish how they measure it. We built an open, reproducible benchmark — here's the methodology.

May 6, 2026 / extraction, architecture

Schema-Driven Extraction: Configuration Over Code for Document AI

Most extraction approaches rely on prompt engineering. Schema-driven extraction replaces the hope with a contract — typed fields, validation rules, and routing hints in a YAML file.
