LSH (Locality-Sensitive Hashing)

A family of hashing techniques designed so that similar inputs map to the same hash bucket with high probability. It is used to find approximate-match candidates in large datasets without comparing every pair.

Locality-Sensitive Hashing addresses the scaling problem in record linkage: comparing every pair of records across two datasets of n and m rows takes O(n*m) comparisons, which quickly becomes intractable above a few hundred thousand rows. LSH hashes records so that similar ones are likely to collide in the same bucket, and only pairs that share a bucket are then compared.
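The bucketing idea can be sketched in a few lines. The example below is a minimal illustration, not any particular product's implementation: it assumes records are short free-text strings, uses MinHash signatures split into bands as the bucketing scheme, and all function names and parameters (`minhash_signature`, `candidate_pairs`, `num_hashes`, `bands`) are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def tokens(text):
    """Lower-cased whitespace tokens of a record (an assumed tokenisation)."""
    return set(text.lower().split())


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(token_set, num_hashes=32):
    """One minimum hash value per seed; similar sets share many positions."""
    return [min(_hash(seed, t) for t in token_set) for seed in range(num_hashes)]


def candidate_pairs(records, num_hashes=32, bands=16):
    """Return pairs of record ids that share at least one band bucket."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for rec_id, text in records.items():
        token_set = tokens(text)
        if not token_set:
            continue  # an empty record has no signature, so skip it
        sig = minhash_signature(token_set, num_hashes)
        for b in range(bands):
            # Records collide in a band only if all rows in that band agree.
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(rec_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


records = {
    "a": "acme payments ltd invoice 1042",
    "b": "invoice 1042 acme payments ltd",
    "c": "zebra logistics refund 77",
}
print(candidate_pairs(records))
```

Only the pairs returned by `candidate_pairs` would go on to detailed scoring. More bands with fewer rows each make collisions more likely (higher recall, more candidates); fewer bands with more rows make them stricter.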

Several LSH families exist for different similarity measures: MinHash for Jaccard similarity over sets (useful for token-overlap matching of free-text fields), SimHash for cosine similarity over vectors (used in near-duplicate document detection), and random-projection LSH for Euclidean distance.
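As an illustration of the cosine-similarity family, the sketch below builds a SimHash-style fingerprint from random hyperplanes: each bit records which side of a hyperplane the vector falls on, and the fraction of differing bits approximates the angle between two vectors. The function names, the 64-bit fingerprint length, and the fixed seed are assumptions made for the example.

```python
import math
import random


def random_hyperplanes(dim, num_bits=64, seed=0):
    """Draw num_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def simhash_bits(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return [
        1 if sum(v * r for v, r in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    ]


def estimated_cosine(bits_a, bits_b):
    """Estimate cosine similarity from the fraction of differing bits."""
    differing = sum(a != b for a, b in zip(bits_a, bits_b))
    angle = math.pi * differing / len(bits_a)
    return math.cos(angle)


planes = random_hyperplanes(dim=3)
a = simhash_bits([1.0, 2.0, 3.0], planes)
b = simhash_bits([1.1, 1.9, 3.2], planes)
print(estimated_cosine(a, b))
```

Because only the sign of each projection is kept, the fingerprint ignores vector length, which matches cosine similarity. The estimate gets tighter as `num_bits` grows.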

In reconciliation, LSH is typically used as a 'blocking' stage before more expensive scoring. ReconPe's ACRE engine implements LSH as an optional blocking layer that ships disabled by default; the production cascade relies on exact key, relaxed fuzzy, and token-set layers, with a keyless amount+date blocker when reference identifiers are unreliable.
