From rule engines to probabilistic matching: a brief history of reconciliation software
Reconciliation tooling has evolved through three distinct generations — fixed-format batch matchers, rule-based platforms, and probabilistic engines. Understanding the lineage explains why so many tools feel stuck.
Reconciliation software predates most of what we'd recognise as enterprise software. By the late 1970s, large banks were running batch programs on mainframes to reconcile nostro accounts against MT940 statements arriving via the SWIFT network — formats and processes that, in their essential structure, have not changed in fifty years. Understanding how the category evolved from there is useful because each generation's assumptions are baked into tools still on the market today.
The first generation was fixed-format batch reconciliation. A mainframe COBOL program ingested two files in known schemas, compared them on hard-coded keys, and produced a reconciliation report in a third fixed format. The model assumed both inputs were deterministic, that the keys never changed, and that any discrepancy was a true exception requiring human investigation. This model is genuinely good at what it was designed for — reconciling MT940 statements where the data is structured and the volume is bounded. It is also brittle in exactly the ways the modern data environment exposes: any schema drift breaks ingestion, any new file format requires a new program, and any matching beyond exact key requires custom logic.
The second generation, broadly the 1990s through the 2010s, was rule-based reconciliation platforms. The shift was that the matching logic moved out of code and into configuration. A user (typically still a developer or a power-user accountant) could define matching rules through a UI: 'match on invoice number, then on amount within 1 percent, then on date within 3 days.' Tolerance handling, multi-pass matching, and exception categorisation became configurable rather than hard-coded. This generation produced the platforms that still dominate the enterprise reconciliation market — Blackline, FloQast, Trintech, and the reconciliation modules in SAP and Oracle.
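The multi-pass rule style quoted above can be sketched in a few lines. This is an illustrative toy, not any vendor's actual engine: the record fields, pass names, and the 1-percent and 3-day tolerances are assumptions taken from the example rule in the text. The key property to notice is that the verdict is strictly binary per pass.

```python
# Toy second-generation rule engine. Field names ('invoice', 'amount',
# 'date') and tolerances are illustrative assumptions, not a real product's
# configuration. A pair matches under the first pass that accepts it;
# anything that falls through every pass lands in the exception queue.
from datetime import date

PASSES = [
    ("invoice",     lambda a, b: a["invoice"] == b["invoice"]),
    ("amount_1pct", lambda a, b: abs(a["amount"] - b["amount"])
                                 <= 0.01 * abs(a["amount"])),
    ("date_3d",     lambda a, b: abs((a["date"] - b["date"]).days) <= 3),
]

def rules_match(a, b):
    """Return the name of the first pass that matches, or None (exception)."""
    for name, pred in PASSES:
        if pred(a, b):
            return name
    return None

ledger = {"invoice": "INV-1001", "amount": 500.00, "date": date(2024, 3, 1)}
bank   = {"invoice": "INV-1OO1", "amount": 498.00, "date": date(2024, 3, 3)}
print(rules_match(ledger, bank))  # → 'amount_1pct' (invoice has an OCR typo)
```

Note what the binary verdict loses: the pair above is almost certainly a match (amount off by 0.4 percent, date by two days, invoice number mangled by one character), but the engine can only say which rule fired, not how confident it is.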
Rule-based platforms solved the configurability problem but inherited a deeper limitation. They are still binary: a given pair either matches under the configured rules or it doesn't. There is no concept of 'this pair is 87 percent likely to be a match given the field-level evidence', which means there is no principled way to auto-approve high-confidence matches and only surface the genuinely uncertain ones. The result, in practice, is exception queues that are far larger than they need to be — the rule engine flags everything that doesn't match the rule, including cases that any human reviewer would close in two seconds because the answer is obvious from the available data.
The intellectual ancestor of the third generation is older than any of this software. Fellegi and Sunter published the formal theory of probabilistic record linkage in 1969, in the context of merging census records. The theory treats matching as a statistical inference problem: given two records and the agreement profile across their fields, what is the posterior probability that they are the same underlying entity? The framework gives you a principled way to combine evidence across fields, weight more reliable fields more heavily, and produce a calibrated confidence score rather than a binary verdict.
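The Fellegi-Sunter calculation is compact enough to show directly. In their framework, each field has an m-probability (chance the field agrees given a true match) and a u-probability (chance it agrees given a non-match); the log of their ratio is the field's evidence weight, and the summed weights convert prior odds into a posterior. The m/u values and the prior below are made-up illustrations, not estimates from real data.

```python
# Toy Fellegi-Sunter scoring. The m/u probabilities and the prior match
# rate are assumed for illustration; in practice they are estimated from
# labelled or unlabelled data (e.g. via EM).
import math

FIELDS = {               # field: (m, u)
    "invoice": (0.95, 0.001),
    "amount":  (0.90, 0.05),
    "date":    (0.85, 0.10),
}

def match_weight(agreement):
    """Sum log2 likelihood-ratio weights over a field agreement profile."""
    w = 0.0
    for field, agrees in agreement.items():
        m, u = FIELDS[field]
        w += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return w

def posterior(agreement, prior=1e-4):
    """P(same entity | evidence): prior odds scaled by the summed weights."""
    odds = (prior / (1 - prior)) * 2 ** match_weight(agreement)
    return odds / (1 + odds)

profile = {"invoice": True, "amount": True, "date": False}
print(round(posterior(profile), 3))  # → 0.222
```

The output is exactly the thing rule engines cannot produce: a calibrated probability, where agreement on a highly discriminating field (invoice) contributes far more weight than agreement on a common one (date).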
Probabilistic matching took decades to migrate from academic record linkage into operational reconciliation. Several things had to happen first: cheap general-purpose compute, scalable approximate-similarity search (LSH and its successors), and — perhaps most importantly — finance teams reaching the volume scale where deterministic rule engines stopped being humanly viable. By the late 2010s these conditions were broadly met, and a third generation of reconciliation tools started appearing. Their distinguishing property is that exceptions are graded, not flagged: every candidate pair gets a confidence score, the threshold for human review is tunable, and the system gets more accurate over time as users confirm or reject suggestions.
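The "graded, not flagged" behaviour reduces, operationally, to triage over confidence scores: auto-close above one threshold, queue for review in the uncertain band, discard below a floor. A minimal sketch, with threshold values and pair IDs that are purely illustrative:

```python
# Graded triage over scored candidate pairs. Thresholds are illustrative;
# in a real third-generation system they are tunable per client and the
# scores come from the probabilistic matching engine.
AUTO_APPROVE = 0.98   # above: close automatically, log for audit
REVIEW_FLOOR = 0.40   # below: treat as a non-match, spend no human time

def triage(candidates):
    """Partition (pair_id, score) tuples into three queues."""
    queues = {"auto": [], "review": [], "reject": []}
    for pair_id, score in candidates:
        if score >= AUTO_APPROVE:
            queues["auto"].append(pair_id)
        elif score >= REVIEW_FLOOR:
            queues["review"].append(pair_id)
        else:
            queues["reject"].append(pair_id)
    return queues

scored = [("p1", 0.995), ("p2", 0.87), ("p3", 0.12), ("p4", 0.63)]
print(triage(scored))  # → {'auto': ['p1'], 'review': ['p2', 'p4'], 'reject': ['p3']}
```

Only the middle queue reaches a human, which is the mechanism behind the smaller exception queues described above; user confirmations on that queue are also the training signal that lets the system improve over time.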
Layered on top of this in the early 2020s came large language models. In our view, their right place in reconciliation is downstream of the matching engine — explaining why an exception was flagged, suggesting plausible resolutions based on similar past cases, narrating risk in plain language for non-technical stakeholders. Using an LLM as the matching engine itself is a category error: matching needs to be deterministic and auditable, not generated by an inference call that may produce different output for the same input. The right architecture is deterministic matching producing structured exceptions, with an AI layer explaining and resolving them.
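The layering described above can be made concrete with a structural sketch. The class and function names here are illustrative assumptions, not any product's actual API; the point is the boundary: the matching layer is a pure function of its inputs, and the explanation layer only ever consumes its structured output.

```python
# Structural sketch of deterministic matching feeding an explanation layer.
# Names are hypothetical. The matching layer never calls the AI layer;
# the AI layer never influences a verdict.
from dataclasses import dataclass

@dataclass(frozen=True)
class FlaggedException:
    pair_id: str
    score: float
    field_evidence: dict   # per-field agreement, e.g. {"amount": False}

def match_layer(scored_pairs, threshold=0.95):
    """Deterministic: same inputs always yield the same exceptions."""
    return [FlaggedException(pid, s, ev)
            for pid, s, ev in scored_pairs if s < threshold]

def explain_layer(exc):
    """Narrates, never decides. An LLM call could replace this template
    without touching the matching path."""
    off = [f for f, agrees in exc.field_evidence.items() if not agrees]
    return f"{exc.pair_id}: score {exc.score:.2f}; disagrees on {', '.join(off)}"

excs = match_layer([("p7", 0.61, {"invoice": True, "amount": False})])
print(explain_layer(excs[0]))  # → p7: score 0.61; disagrees on amount
```

Because the explanation layer sits strictly downstream, swapping its template for a model call changes the quality of the narration but cannot change a single match verdict — which is what keeps the system auditable.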
This history matters because it explains why the category feels uneven. Many tools sold as 'AI reconciliation' in 2026 are second-generation rule engines with an LLM bolted on as a chat interface. The matching is still binary; the AI just answers questions about the binary output. Other tools are third-generation in matching but have weak workflow and multi-tenancy, designed for academic record linkage rather than operational finance. The product space that's actually scarce is third-generation matching, audit-grade workflow, and downstream LLM reasoning, integrated into a single platform.
That's the bet ReconPe is built on. The matching engine is probabilistic in how it weighs evidence, deterministic in that the same inputs always produce the same scores, and auditable throughout; the workflow handles multi-step approval, role-based permissions, and aged-exception escalation; the AI layer explains, suggests, and narrates without ever being on the matching path. Each layer is good at what it's good at, and none is asked to do work outside its designed role. The category will keep evolving, but the structural separation between match, manage, and reason is, we think, the right factoring — and it explains why each of the three generations of tools persists in the market, each appropriate for a different cohort of customer.