EP05 · document-intelligence · ocr · ap · manga

The OCR trap: when the invoice PDF lies

By Amit Mishra · Founder, ReconPe · Narrated by Riya Bhattacharya, CA

A founder forwards 200 vendor PDFs. Generic OCR returns '100000', '10,00,00', '1.00.000' — three different number formats for the same amount. OCR returns text; reconciliation needs fields with confidence.

1. SETUP · Panel 1 of 3

200 PDFs. Three formats. Same number. Or is it?

Transcript
Dialogue — Riya

Is this ten thousand, one lakh, or a hundred thousand euros?

Side monologue

OCR returns text. Reconciliation needs fields.

2. REINFORCE · Panel 2 of 3

A raw-text dump is not a vendor invoice. It's a guess in a costume.

Transcript
Dialogue — Riya

Field-level confidence. Unit detected. Locale flagged. Now I can reconcile.

Side monologue

Document intelligence is OCR + structure + confidence. Not just OCR.

3. TURNAROUND · Panel 3 of 3

Trust, but verify the confidence band.

Transcript
Dialogue — Riya

Auto-match what's clean. Review what's flagged. Stop arguing with raw text.

Side monologue

The audit asks: how did you decide? Field-level confidence is the answer.

The longer take

OCR (optical character recognition) has been a commodity capability for over two decades. Modern OCR engines and vision models can extract the characters from an invoice PDF with high per-character accuracy. The trap is that high per-character accuracy is not the same as a reconcilable invoice.

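Consider the three invoices in the panels: '100000', '10,00,00', '1.00.000'. Each is technically correctly OCR'd: those are the characters on the page. But the value each one represents depends on locale, format, and currency. '100000' with no separator is one lakh, which is the same number as one hundred thousand; what the string cannot tell you is whether it means ₹1,00,000 or $100,000. '10,00,00' has the six digits of one lakh, but its grouping is malformed, since standard Indian grouping writes one lakh as 1,00,000. '1.00.000' keeps the lakh grouping but uses periods as separators, the way continental European locales mark thousands; it does not match the European three-digit grouping (100.000) either. A reconciliation engine that treats all three as raw strings will fail to match them. An engine that parses them all as ₹1,00,000 will succeed, but only if the parsing is locale-aware (it knows which separator convention applies), unit-aware (it knows the currency), and willing to flag a malformed grouping instead of guessing.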
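
A minimal sketch of locale-aware amount parsing, assuming the expected separator convention for each vendor is already known and the currency symbol has been stripped. The `CONVENTIONS` table, `grouping_is_valid`, `parse_amount`, and the flag strings are hypothetical illustrations, not ReconPe's implementation. The sketch accepts an amount only when its digit grouping matches the convention, and returns a reason instead of a number when it does not.

```python
from __future__ import annotations

from decimal import Decimal

# Hypothetical convention table: (group separator, decimal separator, grouping).
# A real system would pick the convention from vendor master data or the
# invoice's language/country, not from the amount string itself.
CONVENTIONS = {
    "en_IN": (",", ".", "indian"),   # 1,00,000.50
    "en_US": (",", ".", "western"),  # 100,000.50
    "de_DE": (".", ",", "western"),  # 100.000,50
}


def grouping_is_valid(integer_part: str, sep: str, style: str) -> bool:
    """Check digit groups. Indian: 1-2 digit head, 2-digit groups, 3-digit tail.
    Western: 1-3 digit head, then 3-digit groups."""
    groups = integer_part.split(sep)
    if not all(g.isdigit() for g in groups):
        return False
    if len(groups) == 1:
        return True  # no separators, nothing to validate
    head, middle, tail = groups[0], groups[1:-1], groups[-1]
    if len(tail) != 3:
        return False
    if style == "indian":
        return 1 <= len(head) <= 2 and all(len(g) == 2 for g in middle)
    return 1 <= len(head) <= 3 and all(len(g) == 3 for g in middle)


def parse_amount(raw: str, convention: str) -> tuple[Decimal | None, list[str]]:
    """Return (value, flags); value is None when the string breaks the convention."""
    group_sep, decimal_sep, style = CONVENTIONS[convention]
    parts = raw.strip().split(decimal_sep)
    if len(parts) > 2:
        return None, [f"more than one decimal separator for {convention}"]
    integer_part, fraction = parts[0], (parts[1] if len(parts) == 2 else "")
    if not grouping_is_valid(integer_part, group_sep, style):
        return None, [f"digit grouping does not match {convention}"]
    if fraction and not fraction.isdigit():
        return None, ["non-numeric fraction"]
    flags = []
    if group_sep not in integer_part:
        flags.append("no group separator: format does not confirm the locale")
    number = integer_part.replace(group_sep, "") + ("." + fraction if fraction else "")
    return Decimal(number), flags


for raw, convention in [("1,00,000", "en_IN"), ("100000", "en_IN"),
                        ("10,00,00", "en_IN"), ("1.00.000", "de_DE")]:
    print(raw, convention, parse_amount(raw, convention))
```

Under these assumptions '1,00,000' (en_IN) parses cleanly, '100000' parses but carries a flag because nothing in it confirms the locale, and '10,00,00' (en_IN) and '1.00.000' (de_DE) are rejected with a reason instead of being coerced into a number. None of this settles the currency: the string alone cannot say whether 100000 is rupees or dollars, so that has to come from another field on the invoice.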

The architectural difference between OCR and document intelligence is structure. Document intelligence parses the raw text into typed fields (amount, currency, date, vendor, line items), assigns a per-field confidence score, and flags edge cases (multi-currency, locale ambiguity, unusual formats) for human review. The output is not text; it is a structured invoice object that downstream reconciliation can join on amount and currency without re-doing the parsing.
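
To make the distinction concrete, here is a minimal sketch of what a structured invoice object might look like. The class names, field names, and flag strings are hypothetical (not ReconPe's schema), confidence is assumed to be a 0 to 1 score from some upstream extractor, and line items are left out for brevity. The point is that the join key is built from typed fields, so a bank line of 100000.00 INR matches an invoice amount of 100000 INR without any string handling.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal


@dataclass
class ExtractedField:
    """One typed field pulled from the PDF, with the extractor's confidence."""
    value: object                                    # Decimal, str, date, ...
    confidence: float                                # assumed 0.0-1.0 scale
    flags: list[str] = field(default_factory=list)   # e.g. "locale_ambiguous"


@dataclass
class StructuredInvoice:
    """Hypothetical output shape: typed fields instead of a raw-text dump."""
    source_pdf: str
    vendor: ExtractedField
    invoice_date: ExtractedField
    amount: ExtractedField      # value is a Decimal, already locale-parsed
    currency: ExtractedField    # value is an ISO 4217 code such as "INR"

    def match_key(self) -> tuple[Decimal, str]:
        return (self.amount.value, self.currency.value)


@dataclass
class BankLine:
    ref: str
    amount: Decimal
    currency: str


def join_on_amount_currency(invoices, bank_lines):
    """Pair each invoice with every bank line sharing its (amount, currency) key."""
    by_key: dict[tuple[Decimal, str], list[BankLine]] = {}
    for line in bank_lines:
        by_key.setdefault((line.amount, line.currency), []).append(line)
    return [
        (inv.source_pdf, line.ref)
        for inv in invoices
        for line in by_key.get(inv.match_key(), [])
    ]


invoice = StructuredInvoice(
    source_pdf="vendor_a_april.pdf",
    vendor=ExtractedField("Vendor A", 0.97),
    invoice_date=ExtractedField(date(2024, 4, 12), 0.95),
    amount=ExtractedField(Decimal("100000"), 0.99),
    currency=ExtractedField("INR", 0.90, ["currency_inferred_from_vendor_country"]),
)
bank = [BankLine("bank-ref-1", Decimal("100000.00"), "INR"),
        BankLine("bank-ref-2", Decimal("100000"), "USD")]
print(join_on_amount_currency([invoice], bank))  # [('vendor_a_april.pdf', 'bank-ref-1')]
```

Because `Decimal("100000")` and `Decimal("100000.00")` compare and hash as equal, the trailing zeros on the bank side do not break the match, while the same number in USD correctly does not match.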

The reason this matters operationally is that a growing business receives a large volume of vendor invoices, and they arrive in no consistent format. A D2C founder uploading 200 vendor PDFs a month cannot manually validate the OCR for each one. Writing and maintaining a vendor-specific parser for every supplier costs more than just typing the data in. Document intelligence exists as a category because the alternative is either typing or silently accepting OCR errors into the books.

What good document intelligence looks like in practice: 80–90% of fields auto-extract with high enough confidence to skip review. The remaining 10–20% surface as a review queue with the specific field, the confidence score, and the reason (low confidence, locale ambiguity, unusual format). A human resolves the review queue in minutes per month, not hours per week. The audit trail records which fields were auto-accepted and which were reviewed, with the original PDF attached. The reconciliation downstream joins on the structured fields, not the raw text.
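
The auto-accept and review split can be sketched as a small routing function. The 0.95 threshold, the `FieldResult` and `AuditEntry` names, and the reason strings are assumptions for illustration; a real threshold would be tuned against fields a human has already reviewed. One design choice worth noting: a flagged field (for example, locale ambiguity) goes to review even when its confidence is high, because an extractor can be confidently reading the wrong convention.

```python
from __future__ import annotations

from dataclasses import dataclass, field

AUTO_ACCEPT_THRESHOLD = 0.95  # hypothetical cut-off, not a recommended value


@dataclass
class FieldResult:
    name: str
    value: object
    confidence: float                               # assumed 0.0-1.0 scale
    flags: list[str] = field(default_factory=list)


@dataclass
class AuditEntry:
    source_pdf: str
    field_name: str
    decision: str        # "auto_accepted" or "sent_to_review"
    confidence: float
    reason: str


def route(source_pdf: str, fields: list[FieldResult],
          threshold: float = AUTO_ACCEPT_THRESHOLD):
    """Split fields into auto-accepted values and a review queue, logging both."""
    accepted: dict[str, object] = {}
    audit: list[AuditEntry] = []
    for f in fields:
        if f.flags:  # flagged fields always get a human look
            decision, reason = "sent_to_review", ", ".join(f.flags)
        elif f.confidence < threshold:
            decision = "sent_to_review"
            reason = f"low confidence ({f.confidence:.2f} < {threshold:.2f})"
        else:
            decision, reason = "auto_accepted", "confidence at or above threshold, no flags"
            accepted[f.name] = f.value
        audit.append(AuditEntry(source_pdf, f.name, decision, f.confidence, reason))
    review_queue = [e for e in audit if e.decision == "sent_to_review"]
    return accepted, review_queue, audit


fields = [
    FieldResult("vendor", "Vendor A", 0.98),
    FieldResult("amount", "100000", 0.97, ["locale_ambiguous"]),
    FieldResult("invoice_date", "2024-04-12", 0.80),
]
accepted, review_queue, audit = route("vendor_a_april.pdf", fields)
print(accepted)  # {'vendor': 'Vendor A'}
for item in review_queue:
    print(item.field_name, item.confidence, item.reason)
```

Every decision, auto-accepts included, lands in the audit list with its confidence and reason, so "how did you decide?" has a per-field answer.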

New episode every Monday.

Riya teaches reconciliation in three panels a week. Free and ungated, with no newsletter sign-up required.