Filtering sensitive words in PHP with the trie tree extension and swoole

In some apps, website comments, and articles, you will see parts of the text shown as * or similar characters. Besides hiding certain content outright, a service often has to process sensitive words entered by users and display them as * or other characters, depending on the business requirements.

Here is a brief introduction to how this can be handled.

Original address: little time personal blog > http://small.aiweimeng.top/index.php/archives/18.html

PHP extension installation instructions:

1. Install the trie tree PHP extension, which provides the trie_filter_* functions used below. Installation tutorial: http://blog.41ms.com/post/39.html
2. Install the swoole extension. Installation tutorial: http://www.swoole.com/

**Code Description:**

1. reload_dict.php reads the sensitive word dictionary and writes it to the trie tree file, so the tree can be regenerated whenever the dictionary changes

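```php
<?php
/**
 * Thesaurus maintenance update
 * Date: 2018/11/7
 * Time: 9:42
 */

// Set the memory limit; adjust it to the size of the dictionary
ini_set('memory_limit', '128M');

// Read the sensitive word dictionary, one word per line
$handle = fopen('dict.txt', 'r');

if ($handle === false)
{
    exit("Cannot open dict.txt\n");
}

// Generate an empty trie tree filter
$resTrie = trie_filter_new();

// fgets() returns false at the end of the file, which ends the loop
while (($line = fgets($handle)) !== false)
{
    $item = trim($line);

    // Skip blank lines
    if ($item === '')
    {
        continue;
    }

    // Add sensitive words to the trie tree one by one
    trie_filter_store($resTrie, $item);
}

fclose($handle);

// Generate the trie tree file
$blackword_tree = 'blackword.tree';

trie_filter_save($resTrie, $blackword_tree);
```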
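
To make the data structure concrete, here is a minimal plain-PHP sketch of a trie. It assumes a hypothetical `SimpleTrie` class that is not the extension's implementation and makes no claim about how the extension handles overlapping matches. It stores words byte by byte in nested arrays and reports each match as an [offset, length] pair, the shape that getFilterWords() in FilterHelper.php reads.

```php
<?php
// Hypothetical illustration only; the real extension is implemented in C.
final class SimpleTrie
{
    /** @var array Nested arrays keyed by byte; the key 'end' marks a complete word */
    private $root = array();

    /** Add one word to the trie. */
    public function store(string $word): void
    {
        $node = &$this->root;
        for ($i = 0, $n = strlen($word); $i < $n; $i++) {
            $byte = $word[$i];
            if (!isset($node[$byte])) {
                $node[$byte] = array();
            }
            $node = &$node[$byte];
        }
        // 'end' cannot collide with a child key, because child keys are single bytes
        $node['end'] = true;
        unset($node);
    }

    /** Return every match in $text as an [offset, length] pair (in bytes). */
    public function searchAll(string $text): array
    {
        $matches = array();
        $n = strlen($text);

        // Try every start offset and walk the trie as far as the text allows
        for ($start = 0; $start < $n; $start++) {
            $node = $this->root;
            for ($i = $start; $i < $n && isset($node[$text[$i]]); $i++) {
                $node = $node[$text[$i]];
                if (isset($node['end'])) {
                    $matches[] = array($start, $i - $start + 1);
                }
            }
        }

        return $matches;
    }
}

$trie = new SimpleTrie();
$trie->store('bad');
$trie->store('badword');

// Both stored words start at offset 2, so this sketch reports [2, 3] and [2, 7]
$found = $trie->searchAll('a badword here');
```

The lookup cost depends on the length of the text and of the longest stored word, not on the number of words in the dictionary, which is why a trie suits large word lists.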

2. Trie tree object helper class

FilterHelper.php provides access to the trie tree object. It avoids loading the tree again on every call and reloads it when the tree file changes, so the in-memory tree stays in sync with the sensitive word dictionary.
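
Before the class itself, the reload-on-change idea can be tried without the extension. The following is a minimal sketch in plain PHP, assuming a hypothetical `MtimeCache` class that is not part of the original code. It caches the contents of a single file and rereads them only when the file's modification time differs from the one seen at the last load, with file_get_contents() standing in for trie_filter_load().

```php
<?php
// Hypothetical helper for illustration only; it is not part of FilterHelper.php.
final class MtimeCache
{
    /** @var string|null Cached file contents */
    private static $data = null;

    /** @var int|null Modification time seen at the last load */
    private static $mtime = null;

    /** @var int Number of times the file was actually read */
    public static $loads = 0;

    public static function get(string $file): string
    {
        // PHP caches stat results, so clear them before asking for the mtime
        clearstatcache();
        $newMtime = filemtime($file);

        // Load on first use, or reload when the file has changed
        if (self::$data === null || $newMtime !== self::$mtime) {
            self::$data = file_get_contents($file);
            self::$mtime = $newMtime;
            self::$loads++;
        }

        return self::$data;
    }
}

// Usage: the second call is served from memory, the third one reloads
$file = tempnam(sys_get_temp_dir(), 'dict');
file_put_contents($file, 'v1');
$first = MtimeCache::get($file);
$second = MtimeCache::get($file);

file_put_contents($file, 'v2');
touch($file, time() + 10); // force a different mtime for the demonstration
$third = MtimeCache::get($file);
unlink($file);
```

In a long-running process such as a swoole server the static properties survive between requests, which is what makes this kind of cache worthwhile.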

```php
<?php
/**
 * Filter assistant
 * getResTrie     Provide the trie tree object
 * getFilterWords Extract the filtered strings
 * Date: 2018/11/7
 * Time: 9:49
 */
class FilterHelper
{
    // Trie tree object
    private static $_resTrie = null;

    // Update time of the dictionary tree
    private static $_mtime = null;

    /**
     * Private constructor: the class is only used through its static methods
     */
    private function __construct() {}

    /**
     * Private so that objects cannot be cloned
     */
    private function __clone() {}

    /**
     * Provide the trie tree object
     *
     * @param string $tree_file Path of the dictionary tree file
     * @param int    $new_time  Modification time of the tree file at the time of the call
     * @return mixed Trie tree handle returned by trie_filter_load()
     */
    public static function getResTrie($tree_file, $new_time)
    {
        if (is_null(self::$_mtime))
        {
            self::$_mtime = $new_time;
        }

        // Load the tree on first use, or reload it when the file has changed
        if (($new_time != self::$_mtime) || is_null(self::$_resTrie))
        {
            self::$_resTrie = trie_filter_load($tree_file);
            self::$_mtime = $new_time;

            // Log the time at which the dictionary file was reloaded
            echo date('Y-m-d H:i:s') . "\tdictionary reload success!\n";
        }

        return self::$_resTrie;
    }

    /**
     * Extract the sensitive words found in the source string
     *
     * @param string $str Source string
     * @param array  $res Matches as [offset, length] pairs; [1, 3] means 3 bytes starting at offset 1
     * @return array Unique matched words
     */
    public static function getFilterWords($str, $res)
    {
        $result = array();
        foreach ($res as $k => $v)
        {
            $word = substr($str, $v[0], $v[1]);

            if (!in_array($word, $result))
            {
                $result[] = $word;
            }
        }

        return $result;
    }
}
```
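
The introduction mentions displaying * in place of sensitive words, while the code in this post only extracts them. As a minimal sketch, assuming matches arrive as [offset, length] pairs in bytes (the shape getFilterWords() reads), a hypothetical maskWords() helper that is not part of the original code could overwrite each matched span:

```php
<?php
/**
 * Hypothetical helper (not in the original post): overwrite every matched span with a mask.
 *
 * @param string $str     Source string
 * @param array  $matches [offset, length] pairs in bytes
 * @param string $mask    Single-byte mask character
 * @return string
 */
function maskWords(string $str, array $matches, string $mask = '*'): string
{
    foreach ($matches as $match) {
        list($offset, $length) = $match;

        // Overwrite byte for byte so the offsets of later matches stay valid
        $str = substr_replace($str, str_repeat($mask, $length), $offset, $length);
    }

    return $str;
}

// Usage with a hand-written match; in filter.php the pairs would come from trie_filter_search_all()
$masked = maskWords('hello badword world', array(array(6, 7)));
echo $masked, "\n"; // hello ******* world
```

Because the replacement is byte for byte, a multi-byte UTF-8 character is shown as several mask characters. Masking one character per character would need the offsets to be adjusted after each replacement.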


3. External filtering HTTP access interface

filter.php uses swoole to expose the filter to external callers as an HTTP interface.

```php
<?php
/**
 * Provide external filtering HTTP access interface
 * Date: 2018/11/7
 * Time: 9:59
 */

// Set the maximum running memory of the script, and adjust it according to the size of the dictionary
ini_set('memory_limit', '512M');

// Set time zone
date_default_timezone_set('PRC');

// Load assistant file
require_once('FilterHelper.php');

// IP and port the HTTP service binds to
$serv = new \swoole_http_server("127.0.0.1", 9502);

/**
 * Process a request
 */
$serv->on('Request', function($request, $response) {

    // Receive the GET request parameter
    $content = isset($request->get['content']) ? $request->get['content'] : '';

    $result = '';

    if (!empty($content)) {

        // The path of the dictionary tree file. By default, it is under the current directory
        $tree_file = 'blackword.tree';

        // Clear the file state cache so that filemtime() returns a fresh value
        clearstatcache();

        // The modification time of the dictionary tree file at the time of the request
        $new_mtime = filemtime($tree_file);

        // Get the latest trie tree object
        $resTrie = FilterHelper::getResTrie($tree_file, $new_mtime);

        // Run the filter
        $arrRet = trie_filter_search_all($resTrie, $content);

        // Extract the sensitive words that were found
        $a_data = FilterHelper::getFilterWords($content, $arrRet);

        $result = json_encode($a_data);
    }

    // Set the HTTP response headers and send the result
    $response->cookie("User", "W.Y.P");
    $response->header("X-Server", "W.Y.P WebServer(Unix) (Red-Hat/Linux)");
    $response->header('Content-Type', 'text/html; charset=utf-8');
    $response->end($result);
});

$serv->start();
```
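
Once filter.php is running, any HTTP client can call it. The following is a minimal sketch, assuming the service listens on 127.0.0.1:9502 as configured above and answers with a JSON array of matched words. The helper names buildFilterUrl() and parseFilterResponse() are hypothetical and not part of the original code.

```php
<?php
// Hypothetical client helpers for illustration; not part of the original post.

/** Build the request URL; http_build_query() URL-encodes the text. */
function buildFilterUrl(string $content, string $base = 'http://127.0.0.1:9502/'): string
{
    return $base . '?' . http_build_query(array('content' => $content));
}

/** Decode the response body; filter.php sends an empty body when 'content' is empty. */
function parseFilterResponse(string $body): array
{
    if ($body === '') {
        return array();
    }

    $words = json_decode($body, true);

    // Treat anything that is not a JSON array as "no matches"
    return is_array($words) ? $words : array();
}

$url = buildFilterUrl('some text to check');
echo $url, "\n";

// With the server running, the call itself could look like this:
// $words = parseFilterResponse((string) file_get_contents($url));
```

Since the interface reads the text from a GET parameter, long inputs may run into URL length limits. Reading the text from a POST body instead would be a design change to filter.php and is not shown here.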

Tags: PHP Unix Linux

Posted on Sun, 01 Dec 2019 17:12:01 -0800 by Prismatic