The fuzzystrmatch
extension in PostgreSQL provides functions for fuzzy string matching. It is useful for approximate string comparisons, spell-checking, and searching similar words in a database.
Your Nile database arrives with fuzzystrmatch
extension already enabled.
Available Functions
The fuzzystrmatch
extension provides several functions for different types of string matching algorithms:
Soundex
The soundex()
function returns a four-character Soundex code based on how a word sounds:
SELECT soundex('example');
Difference
The difference()
function compares two Soundex codes and returns a similarity score from 0 to 4 (higher means more similar):
SELECT difference('example', 'exampel');
Levenshtein Distance
The levenshtein()
function computes the edit distance (number of single-character edits required to transform one string into another):
SELECT levenshtein('example', 'exampel');
Levenshtein Distance with Custom Costs
You can specify different costs for insertions, deletions, and substitutions:
SELECT levenshtein('example', 'exampel', 1, 2, 1);
The metaphone()
function returns a phonetic representation of a word, useful for English-language fuzzy searches:
SELECT metaphone('example', 10);
Example: Finding Similar Names
If you have a table with names and want to find names similar to a given input:
CREATE TABLE contacts (
tenant_id UUID,
name TEXT,
email EMAILADDR,
PRIMARY KEY (tenant_id, email)
);
-- Create a tenant first
INSERT INTO tenants (id, name) VALUES
('11111111-1111-1111-1111-111111111111', 'Example Corp');
INSERT INTO contacts (tenant_id, name, email) VALUES
('11111111-1111-1111-1111-111111111111', 'John', 'john@example.com'),
('11111111-1111-1111-1111-111111111111', 'Jon', 'jon@example.com'),
('11111111-1111-1111-1111-111111111111', 'Johnny', 'johnny@example.com'),
('11111111-1111-1111-1111-111111111111', 'Jonathan', 'jonathan@example.com');
SELECT name FROM contacts WHERE difference(name, 'Jon') > 2;
Use Cases
- Finding similar names in a customer database.
- Detecting typos in text input.
- Enhancing search functionality with approximate string matching.
Limitations
- Soundex is optimized for English and may not work well for other languages.
- Levenshtein distance can be computationally expensive for large datasets.
- Phonetic matching may not always align perfectly with intended pronunciations.
Conclusion
The fuzzystrmatch
extension provides multiple methods for fuzzy string matching, making it a valuable tool for approximate searches and typo detection in PostgreSQL databases.
For more details, refer to the PostgreSQL documentation.
Responses are generated using AI and may contain mistakes.