The OCR trap: when the invoice PDF lies
A founder forwards 200 vendor PDFs. Generic OCR returns '100000', '10,00,00', and '1.00.000': three number formats for what looks like the same amount. OCR returns text; reconciliation needs fields with confidence.
Riya at her desk, eyebrow raised. Laptop screen shows three OCR outputs side-by-side, each from a different invoice PDF: '100000', '10,00,00', '1.00.000'.
“200 PDFs. Three formats. Same number. Or is it?”
Transcript:
Is this ten thousand, one lakh, or a hundred thousand euros?
OCR returns text. Reconciliation needs fields.
Riya pointing at the laptop. The three invoices are now parsed into a structured table with amount, unit, and confidence-score columns. Invoice A: ₹1,00,000 INR, confidence 0.97 (high). Invoice B: ₹1,00,000 INR, confidence 0.82 (review). Invoice C: €1,00,000 EUR, confidence 0.74, locale flagged.
“A raw-text dump is not a vendor invoice. It's a guess in a costume.”
Transcript:
Field-level confidence. Unit detected. Locale flagged. Now I can reconcile.
Document intelligence is OCR + structure + confidence. Not just OCR.
Riya with a slight smile. Screen shows two columns: high-confidence invoices auto-matched on the left, two flagged invoices queued for human review on the right with the confidence band and reason visible.
“Trust, but verify the confidence band.”
Transcript:
Auto-match what's clean. Review what's flagged. Stop arguing with raw text.
The audit asks: how did you decide? Field-level confidence is the answer.
The longer take
OCR (optical character recognition) has been a commodity capability for over two decades. Modern OCR engines and vision models extract the characters from a clean invoice PDF with high per-character accuracy. The trap is that high per-character accuracy is not the same as a reconcilable invoice.
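Consider the three invoices in the panels: '100000', '10,00,00', '1.00.000'. Each is correctly OCR'd: those are the characters on the page. But what each one means depends on locale, number format, and currency. '100000' with no separator could be one lakh rupees (INR) or one hundred thousand dollars (USD); the digits fix the magnitude but not the currency. '10,00,00' groups digits in pairs with commas, which resembles but does not match the Indian lakh format (1,00,000), so it likely reflects a nonstandard invoice template or an OCR slip. '1.00.000' uses periods as group separators, as much of continental Europe does, but with Indian lakh grouping instead of the European groups of three (100.000). A reconciliation engine that treats all three as raw strings will fail to match them. An engine that parses each to the numeric value 100,000 and attaches the correct currency can match the invoices that genuinely share an amount and currency and keep apart the ones that do not, but only if the parsing is locale-aware and currency-aware.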
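A minimal sketch of locale-aware amount parsing, in Python. The locale table (`LOCALES`), its three entries, and the functions `parse_amount`, `interpret`, and `review_flags` are hypothetical names for illustration, not any particular product's API; a real system would draw separator rules from a locale library and the vendor's country. It also assumes the currency symbol has already been stripped from the string. The idea: read the same characters under each candidate locale, record the value each reading produces, and check whether the digit grouping follows that locale's convention.

```python
import re
from decimal import Decimal
from typing import Dict, List, Optional, Tuple

# Hypothetical locale table, for illustration only. Each entry is
# (group separator, decimal separator, regex for a well-formed integer part).
# Bare digits with no separators are accepted in every locale.
LOCALES: Dict[str, Tuple[str, str, str]] = {
    "en_IN": (",", ".", r"\d+|\d{1,2}(,\d{2})*,\d{3}"),  # lakh grouping: 1,00,000
    "en_US": (",", ".", r"\d+|\d{1,3}(,\d{3})*"),        # groups of three: 100,000
    "de_DE": (".", ",", r"\d+|\d{1,3}(\.\d{3})*"),       # dot groups, comma decimal: 100.000,00
}


def parse_amount(raw: str, locale: str) -> Optional[Tuple[Decimal, bool]]:
    """Read `raw` under one locale's conventions.

    Returns None if the string cannot be a number in that locale; otherwise
    (value, well_formed), where well_formed is False when the digits parse
    but the grouping breaks the locale's convention.
    """
    group_sep, decimal_sep, pattern = LOCALES[locale]
    int_part, _, frac = raw.strip().partition(decimal_sep)
    if frac and not frac.isdecimal():
        return None
    digits = int_part.replace(group_sep, "")
    if not digits.isdecimal():
        return None
    well_formed = re.fullmatch(pattern, int_part) is not None
    value = Decimal(digits + "." + frac) if frac else Decimal(digits)
    return value, well_formed


def interpret(raw: str) -> Dict[str, Tuple[Decimal, bool]]:
    """Every locale under which `raw` parses, with the value it parses to."""
    readings = {}
    for locale in LOCALES:
        result = parse_amount(raw, locale)
        if result is not None:
            readings[locale] = result
    return readings


def review_flags(raw: str) -> List[str]:
    """Reasons a human should look at this amount; empty means it parsed cleanly."""
    readings = interpret(raw)
    if not readings:
        return ["unparseable"]
    flags = []
    if len({value for value, _ in readings.values()}) > 1:
        flags.append("locale_ambiguity")  # same characters, different amounts
    if not any(ok for _, ok in readings.values()):
        flags.append("unusual_format")    # digits parse, grouping fits no locale
    return flags
```

On the panel strings, `review_flags` returns `[]` for '100000' (every locale agrees on the magnitude, so the currency still has to come from a symbol or the vendor record) and `['unusual_format']` for both '10,00,00' and '1.00.000'. A string such as '1,000' comes back as `['locale_ambiguity']`, because the German-locale reading makes it 1 rather than 1,000.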
The architectural difference between OCR and document intelligence is structure. Document intelligence parses the raw text into typed fields (amount, currency, date, vendor, line items), assigns a per-field confidence score, and flags edge cases (multi-currency, locale ambiguity, unusual formats) for human review. The output is not text; it is a structured invoice object that downstream reconciliation can join on amount and currency without re-doing the parsing.
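One way to picture the structured invoice object is a pair of small Python dataclasses. This is a sketch under stated assumptions: `ExtractedField`, `Invoice`, `match_key`, and `min_confidence` are hypothetical names, line items are left out for brevity, and the file names, vendors, and dates are placeholders. Only the confidence scores (0.97 and 0.74) come from the panels. Each field carries its typed value, a confidence score, the raw OCR text it was read from, and any flags, and reconciliation joins on the typed amount and currency instead of the raw string.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Generic, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ExtractedField(Generic[T]):
    """One typed field plus the evidence behind it."""
    value: T
    confidence: float            # extractor's score in [0, 1] (assumed scale)
    raw_text: str                # the OCR'd characters, kept for the audit trail
    flags: Tuple[str, ...] = ()  # e.g. ("locale_ambiguity",)


@dataclass(frozen=True)
class Invoice:
    """A structured invoice; line items are omitted to keep the sketch short."""
    source_pdf: str
    vendor: ExtractedField[str]
    invoice_date: ExtractedField[date]
    amount: ExtractedField[Decimal]
    currency: ExtractedField[str]  # ISO 4217 code such as "INR" or "EUR"

    def match_key(self) -> Tuple[Decimal, str]:
        """What reconciliation joins on: typed amount and currency, never raw text."""
        return (self.amount.value, self.currency.value)

    def min_confidence(self) -> float:
        """The weakest field decides whether the invoice needs a human look."""
        fields = (self.vendor, self.invoice_date, self.amount, self.currency)
        return min(f.confidence for f in fields)


# Hypothetical invoices A and C: same digits, different currency.
# File names, vendors, and dates are illustrative placeholders.
inv_a = Invoice(
    source_pdf="invoice_a.pdf",
    vendor=ExtractedField("Vendor A", 0.99, "VENDOR A"),
    invoice_date=ExtractedField(date(2024, 1, 15), 0.98, "15/01/2024"),
    amount=ExtractedField(Decimal("100000"), 0.97, "100000"),
    currency=ExtractedField("INR", 0.97, "₹"),
)
inv_c = Invoice(
    source_pdf="invoice_c.pdf",
    vendor=ExtractedField("Vendor C", 0.99, "VENDOR C"),
    invoice_date=ExtractedField(date(2024, 1, 15), 0.98, "15.01.2024"),
    amount=ExtractedField(Decimal("100000"), 0.74, "1.00.000", ("unusual_format",)),
    currency=ExtractedField("EUR", 0.90, "€"),
)
```

Here `inv_a.match_key()` is `(Decimal('100000'), 'INR')` and `inv_c.match_key()` is `(Decimal('100000'), 'EUR')`, so the two never join even though their digits agree, and `inv_c.min_confidence()` returns 0.74, the amount field's score. Keeping `raw_text` next to the typed value is what lets a reviewer see exactly what the PDF said.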
This matters operationally because any growing business receives a large, unstructured stream of vendor invoices. A D2C founder uploading 200 vendor PDFs a month cannot manually validate the OCR for each one. Building this by hand, with a vendor-specific parser for every supplier, costs more than just typing the data in. Document intelligence exists as a category because the alternatives are typing everything or silently accepting OCR errors into the books.
What good document intelligence looks like in practice: 80–90% of fields auto-extract with high enough confidence to skip review. The remaining 10–20% surface as a review queue with the specific field, the confidence score, and the reason (low confidence, locale ambiguity, unusual format). A human resolves the review queue in minutes per month, not hours per week. The audit trail records which fields were auto-accepted and which were reviewed, with the original PDF attached. The reconciliation downstream joins on the structured fields, not the raw text.
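The review loop described above can be sketched as a routing step in Python. Everything here is an assumption for illustration: the names `FieldResult`, `AuditEntry`, `review_reason`, and `route` are hypothetical, and the 0.90 cut-off is simply a value that reproduces the panels' split (0.97 auto-matched, 0.82 and 0.74 reviewed), not a recommended setting. Flagged fields go to review regardless of score, and every decision, automatic or not, lands in an audit log tied to the source PDF.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Assumed cut-off: 0.97 auto-accepts, 0.82 and 0.74 go to review, as in the panels.
AUTO_ACCEPT_THRESHOLD = 0.90


@dataclass(frozen=True)
class FieldResult:
    source_pdf: str               # the original document, kept for the audit trail
    name: str                     # e.g. "amount", "currency"
    value: str
    confidence: float
    flags: Tuple[str, ...] = ()   # e.g. ("locale_ambiguity",)


@dataclass(frozen=True)
class AuditEntry:
    source_pdf: str
    field_name: str
    decision: str                 # "auto_accepted" or "queued_for_review"
    reason: str
    confidence: float


def review_reason(f: FieldResult) -> str:
    """Empty string means the field is clean enough to skip human review."""
    if f.flags:
        return ", ".join(f.flags)  # flagged fields are reviewed regardless of score
    if f.confidence < AUTO_ACCEPT_THRESHOLD:
        return "low_confidence"
    return ""


def route(
    fields: Iterable[FieldResult],
) -> Tuple[List[FieldResult], List[Tuple[FieldResult, str]], List[AuditEntry]]:
    """Split fields into auto-accepted and review queues, logging every decision."""
    accepted: List[FieldResult] = []
    review_queue: List[Tuple[FieldResult, str]] = []
    audit: List[AuditEntry] = []
    for f in fields:
        reason = review_reason(f)
        if reason:
            review_queue.append((f, reason))
            decision = "queued_for_review"
        else:
            accepted.append(f)
            decision, reason = "auto_accepted", "above_threshold"
        audit.append(AuditEntry(f.source_pdf, f.name, decision, reason, f.confidence))
    return accepted, review_queue, audit
```

Feeding the three panel amounts through `route` (confidence 0.97; 0.82; and 0.74 with a locale flag) auto-accepts the first, queues the second as `low_confidence` and the third as `locale_ambiguity`, and writes three audit entries, one per decision.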