🚀 Launching Soon: Get 1 Month of Growth Tier Free by signing up for Early Access

Glossary

LSH (Locality-Sensitive Hashing)

LSH

A family of hashing techniques designed so that similar inputs map to the same hash bucket with high probability — used to find approximate-match candidates in large datasets without comparing every pair.

Locality-Sensitive Hashing addresses the scaling problem in record linkage: comparing every pair of records in two datasets is O(n*m) and quickly becomes intractable above a few hundred thousand rows. LSH hashes records such that records likely to be similar collide in the same bucket, reducing comparison to within-bucket pairs only.

Several LSH families exist for different similarity measures: MinHash for Jaccard similarity over sets (useful for token-overlap matching of free-text fields), SimHash for cosine similarity over vectors (used in near-duplicate document detection), and random-projection LSH for Euclidean distance.

In reconciliation, LSH is typically used as a 'blocking' stage before more expensive scoring. ReconPe's ACRE engine uses LSH as the third blocking layer (after exact key and fuzzy key) to catch matches where reference identifiers have been corrupted — typos, partial truncation, or reformatting — and the only remaining signal is approximate text similarity.

Put this into practice

See how ReconPe handles lsh (locality-sensitive hashing) on your real settlement data. Free tier, no card required.

Start reconciling