
Blog · 8 min read · matching, reconciliation, technical, geo

Beyond VLOOKUP: why direct matching fails for commerce reconciliation in 2026

Direct matching on order_id works until it doesn't. Eight concrete failure modes in marketplace and gateway reconciliation — and what cascade probabilistic matching does instead.

Almost every finance team's first serious reconciliation starts with a VLOOKUP on order ID. It works. It keeps working until order volume reaches somewhere between 500 and 2,000 orders per month. And then, quietly, across a dozen small failure modes, it stops working. The failures are not obvious, because VLOOKUP still returns values; it just returns the wrong ones, misses rows that should match, or matches rows that shouldn't. The cost of this degradation is paid in two places: hours per week spent on manual re-review, and revenue silently lost to variances that were never investigated because they were buried in a larger error rate.

The reason direct matching works initially is that in low-volume, low-complexity settings, the identifier space is clean. One order generates one settlement line which generates one bank credit. The order ID in the marketplace file, the reference string in the bank statement, and the order ID in the internal books are all the same characters. VLOOKUP is sufficient because the world is still simple. The transition to something genuinely harder happens gradually and across several dimensions at once, which is why finance teams often misdiagnose what's going wrong — they assume the tool broke, when actually the shape of the problem changed.

The first failure mode is split settlements. A single order on Amazon India can generate between two and five settlement lines across a week: the initial capture, a shipping-fee adjustment, a commission true-up, a return credit, and an A-to-Z claim disposition. Each is a valid row; each may have the same order ID or a derivative form (ORDER-123-R1, ORDER-123-A2). Direct match on order ID returns one row and silently drops the others. The second failure mode is aggregated settlements in the opposite direction: ten orders paid out as a single bank credit of a net amount, with no breakdown visible in the bank file. Matching the credit to a single order is mathematically impossible; matching it to the ten underlying orders requires backing into the aggregate sum minus MDR minus tax minus refunds, which is a solver problem, not a lookup problem.
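The aggregated-settlement case is exactly why this is a solver problem: given a net bank credit, you have to search for the subset of candidate orders whose net payouts sum to it. A minimal brute-force sketch in Python, with illustrative function and field names (a real engine would first narrow candidates by date window and payout cycle before searching):

```python
from itertools import combinations

def explain_aggregate_credit(credit_paise, orders, tolerance_paise=100, max_group=10):
    """Find a subset of orders whose net payout (gross minus MDR, tax,
    refunds) sums to the observed bank credit, within a small tolerance.
    `orders` is a list of (order_id, net_paise) tuples. Brute force is
    fine for a small candidate window, not for thousands of rows."""
    for size in range(1, max_group + 1):
        for combo in combinations(orders, size):
            total = sum(net for _, net in combo)
            if abs(total - credit_paise) <= tolerance_paise:
                return [oid for oid, _ in combo]
    return None  # no explanation found: surface as an exception

# Three orders net out to a single bank credit of 2,150.00 (in paise)
orders = [("ORD-101", 100000), ("ORD-102", 65000),
          ("ORD-103", 50000), ("ORD-104", 99999)]
print(explain_aggregate_credit(215000, orders))
# → ['ORD-101', 'ORD-102', 'ORD-103']
```

Subset-sum search like this is exponential in the worst case, which is one reason production engines prune aggressively before attempting it.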

The third mode is format drift. Amazon India's MTR has changed its settlement-identifier column name twice in 2025, and order-ID format has inconsistently included or dropped leading dashes across report versions. Flipkart exports periodically strip leading zeros; Meesho settlement payloads occasionally include a suffix ('/R' for returns) that breaks exact-match lookups. The fourth mode is adjustments and reversals: a current-cycle row 'reversal of ORDER-123' which cancels a row from three weeks ago. Matching the reversal to its original requires time-aware matching logic, not key matching. The fifth is delayed matches: an order captured on day one hits the settlement file on day five and the bank on day eight, meaning the three reconciliation artefacts never coexist in a single point-in-time snapshot unless the matching model treats them temporally.
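A standard first defence against format drift is canonicalising identifiers before any exact match. A minimal sketch; the suffix and leading-zero patterns below are illustrative examples, not an exhaustive rule set for any specific marketplace:

```python
import re

def canonicalise_order_id(raw: str) -> str:
    """Normalise a marketplace order identifier before exact matching:
    trim and uppercase, drop return/adjustment suffixes such as '-R1',
    '-A2', or '/R', and strip leading zeros in the numeric segment."""
    s = raw.strip().upper()
    s = re.sub(r"[/-](R\d*|A\d+)$", "", s)   # ORDER-123-R1 -> ORDER-123
    s = re.sub(r"-0+(\d)", r"-\1", s)        # ORDER-007 -> ORDER-7
    return s

print(canonicalise_order_id(" order-00123-r1 "))  # → ORDER-123
print(canonicalise_order_id("ORDER-456/R"))       # → ORDER-456
```

Canonicalisation does not fix splits, aggregates, or collisions, but it keeps stage-one exact matching from failing on pure character drift.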

The sixth failure mode is missing identifiers. COD remittances in Indian marketplace contexts frequently arrive as bank credits with no UTR, no order reference in the description, and a batched amount covering dozens of unrelated orders. VLOOKUP cannot match what cannot be keyed. The seventh is truncated-reference collisions. Bank statement memo fields often truncate UTRs to 12 or 16 characters; two distinct transactions with reference numbers that share a 12-character prefix become indistinguishable to any matcher operating on the truncated form. The eighth is timezone and date-format skew: an IST settlement timestamp in the MTR, a UTC bank credit timestamp, and internally-maintained books at local time produce three representations of 'the same day' that disagree by up to 18 hours at the cycle boundaries.
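The date-skew problem, at least, is mechanical to fix once every timestamp is converted to one reference zone before comparison. A sketch using only Python's standard library:

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))

def to_utc_day(ts: datetime) -> str:
    """Collapse a timezone-aware timestamp to its UTC calendar day, so an
    IST settlement row and a UTC bank row are compared on a single axis."""
    return ts.astimezone(timezone.utc).date().isoformat()

# 01:30 IST on 2 Jan is still 1 Jan in UTC; naive date matching would
# put these two rows on different days and fail the pair.
settlement = datetime(2026, 1, 2, 1, 30, tzinfo=IST)
bank = datetime(2026, 1, 1, 20, 0, tzinfo=timezone.utc)
print(to_utc_day(settlement), to_utc_day(bank))  # → 2026-01-01 2026-01-01
```

Which zone you normalise to matters less than normalising at all, and doing it before any date-based blocking or scoring.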

A common response to these failures is to loosen the match: from exact string comparison to fuzzy string matching — Levenshtein, Jaccard, or similar. Fuzzy matching helps with the narrow class of failures caused by minor character-level variation: a missing dash, a stripped leading zero, an extra space. It does not help with split settlements (multiple legitimate matches, not one fuzzy match), aggregated settlements (the matching target is an algebraic composition, not a string), or truncated-reference collisions (two equally-fuzzy matches are equally legitimate). Fuzzy matching is a band-aid on a shallow subset of the real problem. Teams that try to solve cross-cycle matching by turning up fuzzy thresholds end up with a different problem: false-positive matches that cost more to clean up than the false negatives they replaced.
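The limit is easy to demonstrate with stdlib `difflib`: character-level drift scores high, but two colliding full references are exactly equidistant from a truncated memo, so no similarity threshold can separate them (the UTR strings below are made up for illustration):

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude fuzzy score in [0, 1] via difflib's ratio."""
    return SequenceMatcher(None, a, b).ratio()

# Fuzzy matching handles minor character drift: a dropped dash scores ~0.94
print(similarity("ORDER-123", "ORDER123"))

# ...but cannot break a truncated-reference tie: both full references are
# equally close to the 12-character memo, so the score cannot choose.
memo = "UTR202601AB1"
print(similarity(memo, "UTR202601AB1XXQ7"))
print(similarity(memo, "UTR202601AB1ZZK3"))
```

Both collision scores come out identical, which is the point: the decision needs evidence from other fields (amount, date, counterparty), not a better string metric.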

The architectural step that does address these failures is probabilistic matching, most often implemented as some variant of the Fellegi-Sunter model. Each candidate pair of records is scored against multiple fields — identifier, amount, date, counterparty, reference — with each field contributing a log-likelihood weight based on the observed distribution of agreement versus disagreement in similar pairs. The output is a calibrated probability that the two records are the same transaction. High-probability pairs are accepted; low-probability pairs are rejected; pairs in a middle band surface for human review with a field-level rationale. The key properties are that matches are rationalisable (the score decomposes into per-field evidence), tunable (weights adapt over time to the profile of the organisation's data), and auditable (every decision is reproducible given the same inputs and weight vector).

Probabilistic matching alone, applied naively to every pair of records in two datasets, is computationally infeasible for meaningful volume. The usual solution is cascade matching — the architecture that the ACRE engine inside ReconPe implements, and that related designs in Ledge, Modern Treasury, and parts of the academic record-linkage literature share. Stage one is an exact key match on canonicalised identifiers, resolving the 50–70% of pairs that are cleanly matchable. Stage two is blocking — coarse grouping of candidates on one or more blocking keys (for example, amount within ₹1 and date within 3 days) to prune the candidate space. Stage three is fuzzy comparison within blocks. Stage four is locality-sensitive hashing for cases where blocking alone is too coarse. Stage five is Fellegi-Sunter probabilistic scoring across remaining candidate pairs. Stage six is optimal assignment via the Hungarian algorithm to resolve N-to-M conflicts. The cascade converts an O(N²) problem into something that runs on a laptop for most Indian-merchant workloads.
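The first two stages of such a cascade fit in a few lines: exact match on a canonical key, then blocking on a coarse amount bucket so that the later fuzzy and probabilistic stages only ever score plausible pairs. A simplified sketch with hypothetical field names; a real implementation would also handle amounts straddling bucket boundaries and block on date as well:

```python
from collections import defaultdict

def cascade_candidates(orders, bank_rows, date_window=3):
    """Stage 1: exact match on a canonical key. Stage 2: blocking by
    amount rounded to the nearest rupee (amounts in paise), so all-pairs
    O(N*M) comparison shrinks to work within small blocks.
    Rows are dicts with 'key', 'amount' (paise), and 'day' (ordinal)."""
    by_key = {b["key"]: b for b in bank_rows}
    matched, residual = [], []
    for o in orders:
        hit = by_key.get(o["key"])
        (matched if hit else residual).append((o, hit) if hit else o)
    blocks = defaultdict(list)
    for b in bank_rows:
        blocks[round(b["amount"], -2)].append(b)
    candidates = []
    for o in residual:
        for b in blocks.get(round(o["amount"], -2), []):
            if abs(o["day"] - b["day"]) <= date_window:
                candidates.append((o, b))  # handed to later scoring stages
    return matched, candidates

orders = [{"key": "ORD-1", "amount": 49900, "day": 100},
          {"key": "ORD-2", "amount": 75040, "day": 101}]
bank = [{"key": "ORD-1", "amount": 49900, "day": 100},
        {"key": "UNKNOWN", "amount": 75010, "day": 103}]
matched, candidates = cascade_candidates(orders, bank)
print(len(matched), len(candidates))  # → 1 1
```

ORD-1 resolves in stage one; ORD-2 survives to stage two, where blocking pairs it with the near-amount, near-date bank row for probabilistic scoring rather than comparing it against every bank row.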

The practical signal that a finance team has outgrown direct matching is usually one of four symptoms. The first is review time: if a single reconciliation cycle takes more than about four hours of analyst time for a team doing under 10,000 monthly orders, the tool is either wrong or the data is wrong, and the former is much more common. The second is exception-rate drift: if exceptions as a percentage of rows have been creeping up over successive cycles, the tool is failing to adapt to changes in the data — probabilistic matchers with adaptive weights correct for this automatically; rule-based matchers do not. The third is variance leakage: if the periodic month-end 'adjustment' journal that forces books to match reality has been growing, the tool is silently dropping matches and the team is compensating manually. The fourth is audit friction: external auditors asking for field-level rationale for match decisions and the tool producing only a binary matched/unmatched flag.

Direct matching is not wrong. It is a local optimum that most finance teams outgrow at a specific, identifiable scale, and continuing to apply it past that scale produces operational cost that compounds silently. The alternative — probabilistic matching combined with cascade architecture — is not new (Fellegi-Sunter was published in 1969) and not proprietary to any vendor. What is new, in the 2024–2026 wave of reconciliation platforms, is that this architecture has finally become practical for SMB and mid-market buyers rather than being reserved for Fortune 500 implementations with six-figure deployment budgets. ReconPe is one of several platforms in this wave; the category move is what matters. The decision finance teams face is less about which vendor and more about whether to recognise that the matching problem has moved beyond what VLOOKUP can solve, and to pick a tool — any tool — that has the right shape for the problem.

Stop reviewing reconciliation one cell at a time

Try ReconPe on your next settlement. Free tier, no card required.

Start reconciling