Blocking (record linkage)
A technique in record linkage in which records are partitioned into smaller subsets (blocks) such that only records within the same block are compared — making the matching problem computationally tractable on large datasets.
Naive record linkage compares every record in dataset A against every record in dataset B, which takes |A| × |B| comparisons and is impractical at scale. Blocking is the structural fix: choose one or more 'blocking keys' such that records in the same block are plausibly matchable, and skip cross-block comparison entirely. The cost is that a true match whose records land in different blocks is never compared.
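A minimal sketch of single-key blocking, to make the saving concrete. The record layout (`id`, `date`, `amount_cents`), the toy data, and the helper names `block_pairs` and `date_amount_key` are assumptions for illustration, not part of any particular engine.

```python
from collections import defaultdict

def block_pairs(records_a, records_b, key):
    """Yield (a, b) pairs whose blocking keys are equal."""
    # Index dataset B by blocking key so each A record only meets its own block.
    blocks_b = defaultdict(list)
    for rec_b in records_b:
        blocks_b[key(rec_b)].append(rec_b)
    for rec_a in records_a:
        for rec_b in blocks_b.get(key(rec_a), []):
            yield rec_a, rec_b

def date_amount_key(rec):
    # Hypothetical blocking key: full date plus exact amount in cents.
    return (rec["date"], rec["amount_cents"])

# Hypothetical toy data; amounts are integer cents to avoid float-equality issues.
ledger = [
    {"id": "A1", "date": "2024-03-01", "amount_cents": 12000},
    {"id": "A2", "date": "2024-03-01", "amount_cents": 7550},
    {"id": "A3", "date": "2024-03-02", "amount_cents": 12000},
]
bank = [
    {"id": "B1", "date": "2024-03-01", "amount_cents": 12000},
    {"id": "B2", "date": "2024-03-02", "amount_cents": 12000},
    {"id": "B3", "date": "2024-03-05", "amount_cents": 7550},
]

candidates = list(block_pairs(ledger, bank, date_amount_key))
naive_count = len(ledger) * len(bank)  # every A record against every B record

print(f"naive comparisons:   {naive_count}")
print(f"blocked comparisons: {len(candidates)}")
```

On this toy data the blocked pass compares 2 pairs instead of 9. It also never compares A2 with B3, whose dates differ, which is the kind of miss a key that is too strict produces.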
Block-key choice is the central design decision. A strict key (full date + amount) creates many small blocks with high precision but misses candidate matches where the key has any error. A relaxed key (date prefix + amount rounded to nearest hundred) creates fewer larger blocks with higher recall but more comparisons per block. Multi-pass blocking — running several blocking strategies and unioning the candidate sets — is a common compromise.
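The trade-off can be sketched with two passes over the same toy data. Everything here is an illustrative assumption: the field names, the `strict_key` and `relaxed_key` functions, and the choice of a year-month prefix as the "date prefix".

```python
from collections import defaultdict

def candidate_ids(records_a, records_b, key):
    """Return the set of (a_id, b_id) pairs that share a blocking key."""
    blocks_b = defaultdict(list)
    for rec_b in records_b:
        blocks_b[key(rec_b)].append(rec_b["id"])
    return {
        (rec_a["id"], b_id)
        for rec_a in records_a
        for b_id in blocks_b.get(key(rec_a), [])
    }

def strict_key(rec):
    # Full date + exact amount: small blocks, but any key error loses the pair.
    return (rec["date"], rec["amount_cents"])

def relaxed_key(rec):
    # Year-month prefix + amount rounded to the nearest hundred currency units.
    # Integer arithmetic sidesteps Python's round-half-to-even behaviour.
    nearest_hundred = (rec["amount_cents"] + 5000) // 10000 * 100
    return (rec["date"][:7], nearest_hundred)

# Hypothetical toy data: A2 and B2 are the same payment posted a day apart.
ledger = [
    {"id": "A1", "date": "2024-03-01", "amount_cents": 12000},
    {"id": "A2", "date": "2024-03-14", "amount_cents": 48050},
]
bank = [
    {"id": "B1", "date": "2024-03-01", "amount_cents": 12000},
    {"id": "B2", "date": "2024-03-15", "amount_cents": 48050},
    {"id": "B3", "date": "2024-03-20", "amount_cents": 51000},
]

strict = candidate_ids(ledger, bank, strict_key)
relaxed = candidate_ids(ledger, bank, relaxed_key)
multi_pass = strict | relaxed  # union of candidate sets; duplicates collapse

print(sorted(strict))
print(sorted(relaxed))
print(sorted(multi_pass))
```

The strict pass finds only (A1, B1). The relaxed pass recovers (A2, B2) but also admits (A2, B3), a pair the later scoring step has to reject. Because the union is a set, a pair found by several passes is scored only once.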
Modern reconciliation engines often cascade blocking: an exact key pass first, a relaxed key pass second, and a locality-sensitive hashing (LSH) pass third, which groups records with similar but not identical field values. Each pass catches candidates the previous one missed; the union is then scored with Fellegi-Sunter or another probabilistic framework. This is the structure of ReconPe's ACRE blocking stage.
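As an illustration of the scoring step that follows blocking, here is a minimal sketch of Fellegi-Sunter match weights under the usual conditional-independence assumption. The m and u probabilities, the field names, and the `match_weight` helper are hypothetical values chosen for the example. In practice they are estimated from data, and nothing here is taken from ReconPe's implementation.

```python
import math

# Hypothetical per-field probabilities:
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PARAMS = {
    "date": {"m": 0.95, "u": 0.05},
    "amount_cents": {"m": 0.90, "u": 0.01},
    "reference": {"m": 0.80, "u": 0.001},
}

def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum of log2 likelihood ratios over fields (Fellegi-Sunter weight)."""
    total = 0.0
    for field, p in params.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(p["m"] / p["u"])              # agreement weight
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))  # disagreement weight
    return total

# Hypothetical candidate pairs, as they might come out of the blocking passes.
a1 = {"id": "A1", "date": "2024-03-01", "amount_cents": 12000, "reference": "INV-1001"}
b1 = {"id": "B1", "date": "2024-03-01", "amount_cents": 12000, "reference": "INV-1001"}
b2 = {"id": "B2", "date": "2024-03-02", "amount_cents": 12000, "reference": "INV-1001"}
b3 = {"id": "B3", "date": "2024-03-09", "amount_cents": 9900, "reference": "INV-2040"}
candidates = [(a1, b3), (a1, b1), (a1, b2)]

# Highest weight first; thresholds on the weight would then separate
# matches, possible matches, and non-matches.
scored = sorted(
    ((match_weight(x, y), x["id"], y["id"]) for x, y in candidates),
    reverse=True,
)
for weight, a_id, b_id in scored:
    print(f"{a_id} ~ {b_id}: {weight:+.2f}")
```

Fields with a low u, such as a reference number that rarely agrees by chance, contribute the largest agreement weights. This is why a pair that disagrees only on date still scores well above one that disagrees on every field.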