String similarity processing function

oracle queries, for example, the jobs related to table SALES in stored procedures
SELECT * FROM (SELECT a.name,upper(b.what)AS what,SYS.UTL_MATCH.edit_distance_similarity
(a.name,upper(b.what)) AS similarity FROM dba_source a,dba_jobs b
WHERE  SYS.UTL_MATCH.edit_distance_similarity(a.name,upper(b.what))>80
AND upper(a.text)  LIKE '%SALES%'
AND b.what NOT LIKE '%dbms_refresh%')
ORDER BY 3 DESC;

It has its own similarity function sys. UTL? Match. Edit? Distance? Similarity, which can be used directly.

In mysql8, I found only one search related document Document address It's not suitable for my own use. I found a user-defined function that can be used to calculate the string similarity.

DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `COMPARE_STRING`( s1 text, s2 text) RETURNS int(11)
    DETERMINISTIC
BEGIN 
    DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT; 
    DECLARE s1_char CHAR; 
    DECLARE cv0, cv1 text; 
    SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0; 
    IF s1 = s2 THEN 
      RETURN 0;     
    ELSEIF s1_len = 0 THEN 
      RETURN s2_len; 
    ELSEIF s2_len = 0 THEN 
      RETURN s1_len; 
    ELSE 
      WHILE j <= s2_len DO 
        SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1; 
      END WHILE; 
      WHILE i <= s1_len DO 
        SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1; 
        WHILE j <= s2_len DO 
          SET c = c + 1; 
          IF s1_char = SUBSTRING(s2, j, 1) THEN  
            SET cost = 0; ELSE SET cost = 1; 
          END IF; 
          SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost; 
          IF c > c_temp THEN SET c = c_temp; END IF; 
            SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1; 
            IF c > c_temp THEN  
              SET c = c_temp;  
            END IF; 
            SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1; 
        END WHILE; 
        SET cv1 = cv0, i = i + 1; 
      END WHILE; 
    END IF; 
    RETURN c; 
  END$$
DELIMITER ;

DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `SIMILARITY_STRING`(a text, b text) RETURNS double
BEGIN
RETURN ABS(((COMPARE_STRING(a, b) / length(b)) * 100) - 100);
END$$
DELIMITER ;

It's very useful to try it. Some logic can be modified properly by yourself.

In gp, I have been looking for a long time and found a function that looks high-end. This function needs to be installed:

[gpadmin@SZWPLDB1085 ~]$ psql -d postgres -f $GPHOME/share/postgresql/contrib/fuzzystrmatch.sql
SET
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION

Then there will be some more functions in the postgres Library:

Usage in Document address

Tags: Oracle PostgreSQL SQL

Posted on Mon, 04 May 2020 16:33:26 -0700 by tartou2