🚀 Launching Soon: Get 1 Month of Growth Tier Free by signing up for Early Access

Blog
·8 min readfinanceopsar-to-glmatchingclose

The ₹25 lakh problem: why your GL never matches your AR file row-for-row

ByAmit Mishra·Founder, ReconPe

AR-to-GL tie-out fails on most close days because the cardinality is wrong. Sub-ledgers carry many invoice rows; the GL carries one consolidated entry. Subset-sum matching is the engineering answer — bounded, audit-grade, and surprisingly hard to do correctly.

Every controller has had this conversation: a single ₹25 lakh GL journal entry sits there, the corresponding AR aging file has 248 invoice rows for the period, and somewhere inside that 248-row file is the subset of twelve invoices whose amounts sum to exactly ₹25 lakh. The accountant who posted the JE knew which twelve. They aren't here this week. The audit reviewer is asking why the sub-ledger doesn't tie out. The reconciliation tool reports the GL row as MISSING_SOURCE and twelve of the AR rows as MISSING_TARGET, and you spend the morning recreating which twelve, by hand, with a calculator and a coffee.

This is the AR-to-GL cardinality problem, and it is the single most common reason finance teams report month-end close as 'still hard' even with reconciliation software deployed. The standard 1-to-1 matching engine — the only kind most reconciliation tools ship — assumes that for every source row there is exactly one target row, with optional tolerance for value drift. That assumption is correct for transaction-level reconciliation (one Razorpay payment matches one bank credit) and wrong for sub-ledger to GL tie-out, where the GL deliberately summarises many sub-ledger rows into a single posting. The mismatch isn't a bug in the GL or the sub-ledger; it's the right shape for both. The reconciliation tool is the one that has the wrong model.

The engineering answer is many-to-1 matching, sometimes called subset-sum matching: given an unmatched target row with amount T and an unmatched source pool with N candidate rows, find the subset whose amounts sum to T within tolerance. Stated this way it sounds straightforward. The reason it isn't shipped in most reconciliation products is that the naive implementation — try all 2^N subsets — is intractable for N greater than about 25, and the corner cases compound quickly. Currency partitioning. Counterparty partitioning. Date-window pruning. Tolerance handling that absorbs TDS withholding without absorbing genuine breaks. A subset-sum candidate generator that runs against a 5,000-row AR aging without taking forty minutes per target row.

The pruning is the load-bearing engineering. Without it, a ten-thousand row pool produces a search space larger than the number of atoms in a small bacterium. The four pruning axes that make subset-sum tractable in practice are: currency (no cross-currency sums), counterparty (only sum rows sharing a counterparty fingerprint), date window (sources must fall within a configurable window of the target's posting date — typically 90 days back, 7 days forward), and overshoot (any partial subset whose sum already exceeds target plus tolerance gets pruned, with the source rows sorted descending by absolute amount so overshoot kicks in early in the search). With these four prunings, a depth-first search on a typical AR-to-GL dataset completes in milliseconds per target, not minutes.

Tolerance is the second non-obvious problem. The naive answer — match within 0.5% relative tolerance — fails on small invoices, where 0.5% of ₹2,000 is ₹10, smaller than the typical TDS rounding residual. The naive answer fails on large invoices the other way, where 0.5% of ₹2 crore is ₹1 lakh, large enough to absorb a real break. The right answer is the maximum of relative tolerance and an absolute floor — typically 0.5% or ₹100, whichever is larger. This handles small invoices correctly (₹100 absolute tolerance accommodates rounding) and large invoices correctly (0.5% of ₹2 crore is ₹1 lakh, which is the appropriate band for a transaction of that size). Configurable per rule for tighter tolerance on high-value entities.

Once the subset is found, the exception model matters. A 1-to-1 matcher that finds nothing emits one MISSING_SOURCE for the GL row and twelve MISSING_TARGETs for the AR rows — thirteen exceptions for one human decision. A subset-sum matcher that finds the group should emit one AGGREGATION_CANDIDATE exception ('these twelve AR rows sum to this GL row, residual ₹0, please confirm') and suppress the thirteen 1-to-1 exceptions, because they are not really exceptions — they are the components of an aggregation that has already been solved. The reduction from thirteen exceptions to one is the user-visible value. ReconPe's wedge demo dataset — twenty-five AR rows summing to one ₹25 lakh GL JE — drops the exception count from twenty-six to fourteen, and the remaining thirteen are real noise rows that genuinely have no match.

There is a category of objection that sounds reasonable and is wrong. 'But the AR file should have a journal-entry-batch-id column linking each invoice to the GL row it sums into.' In a perfectly designed accounting system, yes. In practice — across Tally, Zoho Books, NetSuite, SAP, custom-built ledgers, and the Excel exports finance teams actually work with — the batch reference is missing or unreliable about half the time. Subset-sum is the matching algorithm for the AR file you actually have, not the AR file you wish you had. When the batch reference is reliable, the engine uses it (it becomes a counterparty-equivalent partition that further bounds the search). When it isn't, the engine still finds the subset by amount.

The uncomfortable truth about most reconciliation products is that the rules-based heritage of the category baked in 1-to-1 cardinality as an architectural assumption decades ago, and the assumption has propagated forward through every generation of tooling. Modern products are increasingly probabilistic on the matching axis (Bayesian scoring, embedding similarity, learned weights) but still cardinality-restricted on the pairing axis. Subset-sum matching is the pairing-axis upgrade that the category needs and most vendors haven't shipped. ReconPe's ACRE engine ships it as Stage 1.5 — running between the standard exact + fuzzy + LSH matching pass and the Bayesian scorer, on the unmatched pool, behind a feature flag so existing customer reconciliations are unaffected by default. For sub-ledger to GL tie-out, it converts the question 'why don't these tie out' from a half-day investigation into one click of confirm.

Stop reviewing reconciliation one cell at a time

Try ReconPe on your next settlement. Free tier, no card required.

Start reconciling