Scraping and batch-downloading photos of girls with Node

Background

On Saturday morning I was woken up by a barrage of phone calls from Cuihua.

It seemed like a silly thing to be asked first thing in the morning.

I figured I'd deal with it quickly and get it out of the way.

Of course I'll help anyone who asks for help!

I couldn't help frowning at the red exclamation mark here.

When I opened the website, the garish GIF pictures were just what I expected, and they almost blinded my titanium-alloy dog eyes.

After a few visits to admire the pictures, I got the job done.

Main text

Modules used

  • http: create servers and handle stream-related processing (used here to send requests)
  • fs: work with files and folders (read and write)
  • cheerio: roughly speaking, the jQuery of the Node world

Start with the whole page

So that this post can be published normally, I swapped the link Cuihua gave me for the webmaster's home (sc.chinaz.com) for the demonstration.

// Import the required modules
var http = require('http');
var cheerio = require('cheerio');
var fs = require('fs');
// Define the target site to crawl
var Url = 'http://sc.chinaz.com/tupian/';
http.get(Url, function (res) {
  var htmlData = '';
  // Collect the page data as it arrives
  res.on('data', function (chunk) {
    htmlData += chunk;
  });
  // All data has been received
  res.on('end', function () {
    // Filter out the required elements
    filterContent(htmlData);
  });
}).on('error', function (err) {
  console.log('Error getting data!', err.message);
});
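
One detail worth guarding against when the page is accumulated by appending each chunk to a string: every chunk is a Buffer that gets converted to text on its own, so a multi-byte character split across two chunks can come out garbled. Below is a minimal sketch of one way around this, assuming the page is UTF-8. The helper name joinChunks is hypothetical and not part of the original code.

```javascript
// Hypothetical helper: collect raw Buffers and decode once at the end.
function joinChunks(chunks, encoding = 'utf8') {
  return Buffer.concat(chunks).toString(encoding);
}

// Simulate a response whose chunk boundary falls inside a multi-byte character.
const bytes = Buffer.from('图片', 'utf8'); // 6 bytes, 3 per character
const chunks = [bytes.subarray(0, 4), bytes.subarray(4)];

// Naive approach: every chunk is stringified separately.
let naive = '';
chunks.forEach((chunk) => { naive += chunk; });

console.log(naive === '图片');              // false: the second character is broken
console.log(joinChunks(chunks) === '图片'); // true
```

In the http.get callback this would mean pushing each chunk onto an array in the 'data' handler and calling joinChunks in the 'end' handler. Calling res.setEncoding('utf8') before reading is another option.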

Filtering

Analyzing the page structure shows that the pictures we need are all inside the container node.

Traverse each .box and get the src and alt of a > img.

// Filter the page content
function filterContent(htmlData) {
  if (htmlData) {
    var $ = cheerio.load(htmlData);
    // Get the node that holds the pictures
    var Content = $('#container');
    // Store the scraped picture information
    var ContentData = [];
    Content.find('.box').each(function (index, element) {
      var img = $(this).find('a').children('img');
      // Why src2? Printing src returned nothing; the real address turned out to be in src2
      var rawSrc = img.attr('src2');
      // Skip boxes without a picture so formatUrl is never called on undefined
      if (!rawSrc) {
        return;
      }
      var src = formatUrl(rawSrc);
      // The alt text is a title, not a URL, so it is used as-is
      var name = img.attr('alt');
      // Hand the information to the download function
      download(src, name);
      // Keep a copy here as well
      ContentData.push({
        src,
        name
      });
    });
    // Print the scraped picture information
    console.log(ContentData);
  } else {
    console.log('html null');
  }
}
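
http.get needs an absolute URL, while image attributes on a scraped page may hold relative or protocol-relative addresses. Here is a small sketch of normalizing them against the page address with Node's built-in URL class. The helper name toAbsolute and the sample paths are made up for illustration and are not taken from the target site.

```javascript
// Hypothetical helper: resolve a scraped attribute value against the page URL.
function toAbsolute(src, pageUrl) {
  if (!src) return null;          // attribute missing on this element
  try {
    return new URL(src, pageUrl).href;
  } catch (err) {
    return null;                  // not a parsable URL
  }
}

const page = 'http://sc.chinaz.com/tupian/';
console.log(toAbsolute('//img.example.com/a_s.jpg', page)); // http://img.example.com/a_s.jpg
console.log(toAbsolute('/upload/b_s.jpg', page));           // http://sc.chinaz.com/upload/b_s.jpg
console.log(toAbsolute(undefined, page));                   // null
```

If the resolved address starts with https:, the https module would be needed in place of http for the download.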

The scraped links all contain "_s", which marks a thumbnail, so a helper method is needed to convert them.

// Strip the thumbnail marker to get the HD link
function formatUrl(imgUrl) {
  return imgUrl.replace('_s', '');
}
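
Note that replace('_s', '') removes the first "_s" found anywhere in the string, which could hit a directory or file name by accident. Below is a slightly stricter sketch, assuming the thumbnail marker always sits right before the file extension. That is an assumption about the site's naming scheme, and toHdUrl is a hypothetical name.

```javascript
// Hypothetical stricter variant: strip "_s" only when it directly precedes the extension.
function toHdUrl(imgUrl) {
  if (typeof imgUrl !== 'string') return '';
  return imgUrl.replace(/_s(\.[a-z0-9]+)$/i, '$1');
}

const sample = 'http://example.com/pic_set/cat_s.jpg';
console.log(toHdUrl(sample));            // http://example.com/pic_set/cat.jpg
console.log(sample.replace('_s', ''));   // http://example.com/picet/cat_s.jpg
```
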
// Picture download function
function download(url, name) {
  http.get(url, function (res) {
    let imgData = '';
    // Read the response as a binary string
    res.setEncoding('binary');
    // Collect the data as it arrives
    res.on('data', (chunk) => {
      imgData += chunk;
    });
    res.on('end', () => {
      // Create the folder if it does not exist, to prevent an error
      if (!fs.existsSync('./images')) {
        fs.mkdirSync('./images');
      }
      fs.writeFile(`./images/${name}.jpg`, imgData, 'binary', (error) => {
        if (error) {
          console.log(error);
        } else {
          console.log(`${name}----Download succeeded!`);
        }
      });
    });
  }).on('error', (error) => {
    // Report a failed request instead of letting it crash the process
    console.log(`${name}----Download failed: ${error.message}`);
  });
}
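
Because the file name comes straight from the alt text, a title containing a slash or another reserved character could make the write fail or land in an unexpected place. Here is a rough sketch of building a safer target path. The helper name targetPath, the 'untitled' fallback, and the set of replaced characters are illustrative choices, not part of the original code.

```javascript
const path = require('path');

// Hypothetical helper: build a safe target path from a scraped title.
function targetPath(name, dir = './images') {
  const safe = String(name || 'untitled')
    .replace(/[\\/:*?"<>|]/g, '_') // path separators and characters Windows rejects
    .trim();
  return path.join(dir, `${safe || 'untitled'}.jpg`);
}

console.log(targetPath('sunset / beach')); // images/sunset _ beach.jpg (with POSIX separators)
console.log(targetPath(undefined));        // images/untitled.jpg (with POSIX separators)
```

The result could be passed to fs.writeFile in place of the template string used in download.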

Results

And finally, the fruits of my labor.

Tags: node.js encoding

Posted on Sat, 09 Nov 2019 08:33:43 -0800 by daftdog