Subset-sum: matching ₹25 lakh to twelve invoices
The GL posts one ₹25,00,000 journal entry. The AR aging has twelve open invoices for that customer. Which subset matches? Humans give up at n=8. Riya walks through the deterministic answer.
Riya at her desk in a navy shirt, looking at a laptop with a two-pane view: left pane shows one GL journal entry for ₹25,00,000 from Customer X dated 31 March, right pane shows twelve AR invoices for the same customer with amounts ranging from ₹85,000 to ₹4,60,000.
“One JE. Twelve candidates. Where do you start?”
Transcript›
Which twelve invoices? Or eight? Or six and a credit note?
Many-to-one. The hardest cardinality. Spreadsheets give up here.
Riya pointing at the laptop. An inset shows a tree of candidate subsets being pruned by the search algorithm — branches whose partial sums exceed ₹25,00,000 plus tolerance are crossed out. An annotation reads: Target: ₹25,00,000 plus or minus ₹500 tolerance (rounding / payment-side adjustment).
“At n=8, the search space is 256 subsets. At n=20, it's a million.”
Transcript›
A subset-sum search. Sum the candidates. Keep the ones within ₹500 of the target. Drop the rest.
Currency + counterparty + date-window pruning kills most of the search space before it starts.
Riya with a slight smile. An inset shows the resolved match: eight of twelve candidate invoices summing to ₹25,00,500 against the ₹25 lakh target, residual ₹500 within tolerance, the other four invoices explicitly marked as carry-forward to the next cycle.
“The match, the residual, and what's carried forward.”
Transcript›
Eight invoices match the ₹25 lakh JE within tolerance. The other four are open AR for next cycle. Both halves of the answer.
You see the subset. You see the residual. You see what's still open. The auditor sees all three.
The longer take
Many-to-one reconciliation is the cardinality that breaks spreadsheets. One side has a single record — typically a GL journal entry, a bank credit, or a settlement payout. The other side has multiple records that should sum to it — typically AR invoices, AP bills, or settlement-line components. The question is: which subset on the many-side matches the single record on the one-side, within tolerance?
For small n this is tractable manually. At n=4 there are 16 possible subsets; an analyst can squint at the file and find the right combination by inspection. At n=8, the search space is 256. At n=12, it is 4,096. At n=20, it is over a million. The combinatorial explosion is the reason most spreadsheet-based reconciliations cap out around n=8 and beyond that resort to either ignoring the unresolved variances or posting a 'round-off adjustment' journal to force the books to tie.
The algorithmic solution is bounded subset-sum search. The naive version enumerates all 2^n subsets — infeasible at large n. The practical version prunes aggressively: by counterparty (only candidates for the same customer count), by date window (only candidates within the relevant period), by currency, by partial-sum bounds (any branch whose partial sum already exceeds the target plus tolerance is dropped). With these prunes, real-world reconciliation cases at n=20 to n=50 are computationally tractable in milliseconds on a laptop.
The configurable parameter that matters most is the residual tolerance. For a TDS-bearing customer payment, the residual is typically Section 194-Q withholding (0.1%) plus rounding — for a ₹25 lakh invoice, that's ₹2,500 of TDS plus up to ₹100 of rounding. For a GST-input scenario, the tolerance might absorb a small input credit adjustment. For an FX scenario, the tolerance has to cover the cross-currency conversion plus the bank's FX margin. Setting the tolerance correctly is the difference between auto-matching 90% of cases and reviewing each one by hand. Too tight, and real matches drop out as 'unmatched'; too loose, and false positives sneak in.
The reason this is worth getting right is that the alternative is silent leakage. The 'round-off adjustment' journal that gets posted each month to force the books to tie is the line that hides every unresolved aggregation. Over a year, those aggregations can sum to material amounts that show up only when an auditor traces them backwards. A subset-sum engine doesn't eliminate the variances — but it makes the unresolved ones visible as exceptions you can disposition consciously, rather than absorbed into a single line that no one looks at.