Getting the posts under a SegmentFault tag

Preface

SegmentFault does not currently offer a public API. In 2017, Gaoyang said that one was planned, although I was not yet using SegmentFault at the time.

I threw together a PHP script that fetches the posts under a given tag. There is nothing technically sophisticated about it; it is meant to run on a schedule.

The goal is to help answer questions where I can, and to report posts that turn out to be advertisements.

URLs to fetch

The main targets are a tag's activity feed, its technical Q&A, and its column articles. In each URL, * stands for the tag name.

  1. Tag activity: https://segmentfault.com/t/*
  2. Technical Q&A: https://segmentfault.com/t/*/questions
  3. Column articles: https://segmentfault.com/t/*/blogs
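As a rough illustration of the URL patterns above, the three addresses for one tag could be built as follows. The helper name buildTagUrls and its array keys are made up for this sketch, and URL-encoding the tag with rawurlencode is an assumption about how the site handles tags with special characters.

```php
<?php
// Hypothetical helper (not part of the original script):
// builds the three listing URLs for a single tag name.
function buildTagUrls(string $tag): array
{
    // Assumption: special characters in a tag are percent-encoded in the path.
    $base = 'https://segmentfault.com/t/' . rawurlencode($tag);

    return [
        'activity'  => $base,                 // tag activity feed
        'questions' => $base . '/questions',  // technical Q&A
        'blogs'     => $base . '/blogs',      // column articles
    ];
}

print_r(buildTagUrls('php'));
```

Any one of the returned values can then be used as the $url setting of the script.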

XPath nodes

PHP can also collect data with XPath, and doing so is relatively simple.

  1. Tag activity

    • Title: //h4/a/text()
    • Link: //h4/a/@href
  2. Technical Q&A

    To be more precise, you can also match on the class attribute.

    • Title: //h2[@class="title"]/a/text()
    • Link: //h2[@class="title"]/a/@href
  3. Column articles

    • Title: //h2/a/text()
    • Link: //h2/a/@href
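To show how such expressions behave, here is a minimal sketch that runs the Q&A pair against a small hand-written HTML fragment. The markup, titles, and paths in $html are invented for illustration and only imitate the structure the expressions expect; the real page layout may differ.

```php
<?php
// Made-up sample markup; not copied from SegmentFault.
$html = <<<'SAMPLE'
<html><body>
  <h2 class="title"><a href="/q/1">How do I parse a page in PHP?</a></h2>
  <h2 class="title"><a href="/q/2">Why is my cron job not running?</a></h2>
</body></html>
SAMPLE;

$dom = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath  = new DOMXPath($dom);
$titles = $xpath->query('//h2[@class="title"]/a/text()');
$links  = $xpath->query('//h2[@class="title"]/a/@href');

// Both queries walk the same <a> elements, so index i of one
// list corresponds to index i of the other.
$rows = [];
for ($i = 0; $i < $titles->length; $i++) {
    $rows[] = [
        'title' => trim($titles->item($i)->nodeValue),
        'url'   => 'https://segmentfault.com' . $links->item($i)->nodeValue,
    ];
}

print_r($rows);
```

Pairing the two result lists by index only works while every matched title has exactly one link, which holds here because both expressions select from the same a element.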

Usage

Fill in the relevant settings in the code: the URL, the XPath nodes, and the JSON file path.

Optionally, you can use @Easy's PushBear for one-to-many push notifications; in that case you also need to fill in the SendKey.

crontab schedule

The crontab entry runs the script every 30 minutes to fetch the first page of results.

*/30 * * * * php /www/wwwroot/tag.php >> /tmp/sf.log

PHP code

<?php
$url = ''; // URL to collect
$key = ''; // SendKey of PushBear
$title_xpath = ''; // XPath node for title
$url_xpath = ''; // Corresponding linked XPath node
$json_path = "/tmp/sf.json";

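$html = file_get_contents($url);
if ($html === false) {
    // Stop early if the page could not be fetched
    exit("Failed to fetch {$url}" . PHP_EOL);
}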

$dom = new DOMDocument();
// Load HTML from a string
@$dom->loadHTML($html);
// Normalize the HTML
$dom->normalize();
// Load DOM with DOMXpath for query
$xpath = new DOMXPath($dom);

// Get the corresponding xpath data
$title_hrefs = $xpath->query($title_xpath); // Title

$data = [];
for ($i = 0; $i < $title_hrefs->length; $i++) {
    $href = $title_hrefs->item($i);
    $title = $href->nodeValue;
    $data[$i]['title'] = $title;
}

// Get the corresponding xpath data
$url_hrefs = $xpath->query($url_xpath); // Link
for ($i = 0; $i < $url_hrefs->length; $i++) {
    $href = $url_hrefs->item($i);
    // Use a separate variable so the configured $url is not overwritten
    $link = $href->nodeValue;
    $data[$i]['url'] = 'https://segmentfault.com' . $link;
}

$json = json_encode($data);

// Check whether the JSON file from the previous run exists
if (file_exists($json_path)) {
    // It exists: read the previous result
    $old = file_get_contents($json_path);
    // The content has changed
    if ($old != $json) {
        // Overwrite the file with the new result
        file_put_contents($json_path, $json);
        // Fall back to an empty list if the old file is not valid JSON
        $oldInfo = json_decode($old, true) ?: [];
        // Keep only the items that were not in the previous run
        $data = getDiffArrayByTitle($data, $oldInfo);
    } else {
        // Nothing has changed since the previous run
        echo date('Y-m-d H:i:s', time()) . " Same content" . PHP_EOL;
        return;
    }
} else {
    // First run: no file yet, so write it
    file_put_contents($json_path, $json);
}

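$str = "";
// Use $index here: reusing $key would overwrite the PushBear SendKey
foreach ($data as $index => $item) {
    $num = $index + 1;
    $str .= "{$num}. [{$item['title']}]({$item['url']}) \n\n";
}

// Push only when a SendKey is configured and there is something new
if (!empty($key) && $str !== "") {
    echo sendByBear('*** tag activity', $str, $key);
}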

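function getDiffArrayByTitle($arr1, $arr2, $pk = 'title')
{
    $res = [];
    $tmpArr = []; // Initialize so an empty $arr2 does not leave it undefined
    // Index the old items by key for quick lookup
    foreach ($arr2 as $item) {
        $tmpArr[$item[$pk]] = $item;
    }
    // Keep the items of $arr1 whose key is not in the old data
    foreach ($arr1 as $v) {
        if (!isset($tmpArr[$v[$pk]])) {
            $res[] = $v;
        }
    }
    return $res;
}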

function sendByBear($text, $desp = '', $key = '')
{
    $postData = http_build_query(
        array(
            'text' => $text,
            'desp' => $desp,
            'sendkey' => $key
        )
    );

    $opts = array('http' =>
        array(
            'method'  => 'POST',
            'header'  => 'Content-type: application/x-www-form-urlencoded',
            'content' => $postData
        )
    );

    $context = stream_context_create($opts);

    $result = file_get_contents('https://pushbear.ftqq.com/sub', false, $context);

    return $result;
}
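The comparison step in the script reports only the entries whose title did not appear in the previous run. The following standalone sketch shows that idea on made-up data; the function name newItemsByTitle and the sample titles and URLs are hypothetical, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical standalone variant of the title-based comparison.
function newItemsByTitle(array $current, array $previous): array
{
    // Index the previous run by title so each lookup is a single isset().
    $seen = array_column($previous, null, 'title');

    // Keep only the entries whose title was not present before,
    // then reindex so numbering starts at 0 again.
    return array_values(array_filter(
        $current,
        fn (array $item): bool => !isset($seen[$item['title']])
    ));
}

// Made-up sample data for two consecutive runs.
$previous = [
    ['title' => 'Post A', 'url' => 'https://segmentfault.com/a/1'],
    ['title' => 'Post B', 'url' => 'https://segmentfault.com/a/2'],
];
$current = [
    ['title' => 'Post C', 'url' => 'https://segmentfault.com/a/3'],
    ['title' => 'Post A', 'url' => 'https://segmentfault.com/a/1'],
];

print_r(newItemsByTitle($current, $previous));
```

Because the comparison is keyed on the title alone, a post whose title is edited would be reported again, while two different posts with the same title would be treated as one.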

Epilogue

The code is also available in a GitHub repository. If SegmentFault considers this article to infringe on its rights, the official team may delete it.

Tags: PHP JSON crontab Attribute

Posted on Wed, 04 Dec 2019 08:12:52 -0800 by jcanker