Levenshtein distance

A string-similarity metric that counts the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another.

Levenshtein distance, named for Vladimir Levenshtein who introduced it in 1965, measures the edit distance between two strings. The string 'kitten' and 'sitting' have a Levenshtein distance of 3: substitute k→s, substitute e→i, insert g.

In reconciliation, Levenshtein distance is the standard primitive for fuzzy matching of identifiers: order IDs that may have a missing leading zero, vendor names that may differ in spacing or casing, UTRs that may be truncated. A typical use is to compute distance between candidate field values and accept the match if the distance is below a threshold proportional to string length.

Naive Levenshtein is O(n*m) in time and space, but the standard two-row dynamic-programming optimisation reduces space to O(min(n,m)). For very large reference sets, Levenshtein is usually combined with LSH or a trie-based prefilter to avoid computing distance against every candidate.

Did your brands pay your schemes right?

Sub-ledger to GL, in seconds

Software does matching. People do judgement.

Marketplaces

Compare

Learn

Levenshtein distance

Related terms

Put this into practice