Python: Batch-Querying Douban Book Ratings (source code included)

The high-scoring e-book lists shared on the Lazy Disk public account were generated with a Python batch query.
Douban's regular book API is no longer open for this kind of call, but a few searches turn up a usable interface:

https://book.douban.com/j/subject_suggest?q=book name

This interface can be used to look up the URL of a book's page on Douban.
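One practical note: when the book name contains non-ASCII characters, it must be URL-encoded in the query string. `requests` handles this automatically if the query is passed via `params`. A minimal sketch (the helper name `build_suggest_url` is mine, not from the original script):

```python
import requests

def build_suggest_url(title):
    """Build the suggest-interface URL with the query properly encoded
    (hypothetical helper; requests percent-encodes non-ASCII titles)."""
    req = requests.Request(
        "GET",
        "https://book.douban.com/j/subject_suggest",
        params={"q": title},
    )
    return req.prepare().url

# An ASCII title passes through unchanged; a Chinese title is percent-encoded.
print(build_suggest_url("abc"))
print(build_suggest_url("红楼梦"))
```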

A function to get a single book's URL:

import json
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"}

def get_book(title):
    # Query the suggest interface for the title
    url = "https://book.douban.com/j/subject_suggest?q=%s" % title
    rsp = requests.get(url, headers=headers)
    rs_dict = json.loads(rsp.text)
    # Take the first suggestion's detail-page URL
    url_ = rs_dict[0]['url']
    print(url_)
    return get_detail(url_)  # get_detail is defined below

get_book("The Dream of Red Mansion")
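One caveat (revisited at the end of this article): the suggest interface returns an empty list when Douban has no match for the title, in which case indexing `rs_dict[0]` raises an `IndexError`. A small hedged guard, with a hypothetical helper name not in the original script:

```python
def first_suggest_url(suggestions):
    """Return the first suggestion's detail-page URL, or None when the
    suggest interface found nothing (hypothetical helper)."""
    if not suggestions:
        return None
    return suggestions[0].get("url")

# The two shapes the interface can return:
print(first_suggest_url([]))  # None
print(first_suggest_url([{"url": "https://book.douban.com/subject/1007305/"}]))
```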

Running this function prints a URL:

https://book.douban.com/subject/1007305/

which opens the book's detail page on Douban.

Next, scrape the rating from the detail page:

def get_detail(url):
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"}
    # url = "https://book.douban.com/subject/34869428/"
    web_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(web_data.text, 'lxml')
    # The rating sits in a <strong> tag inside the rating block
    rank = soup.select('#interest_sectl > div > div.rating_self.clearfix > strong')[0].get_text().strip()
    print(rank)
    return rank

get_detail("https://book.douban.com/subject/34869428/")

Running this function outputs the rating: 9.6
Complete Code

# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup
import requests, time, json

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"}

def get_detail(url):
    # url = "https://book.douban.com/subject/34869428/"
    web_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(web_data.text, 'lxml')
    # The rating sits in a <strong> tag inside the rating block
    rank = soup.select('#interest_sectl > div > div.rating_self.clearfix > strong')[0].get_text().strip()
    # print(rank)
    return rank

def get_book(title):
    url = "https://book.douban.com/j/subject_suggest?q=%s" % title
    rsp = requests.get(url, headers=headers)
    rs_dict = json.loads(rsp.text)
    # print(rs_dict)
    url_ = rs_dict[0]['url']
    # print(url_)
    return url_, get_detail(url_)

if __name__ == '__main__':
    book_list = ["The Dream of Red Mansion", "Romance of the Three Kingdoms", "Water Margin", "Journey to the West"]
    for i in book_list:
        url, rank = get_book(i)
        print(i, rank, url)

Running the code directly prints:

The Dream of Red Mansion 9.6 https://book.douban.com/subject/1007305/
Romance of the Three Kingdoms 9.2 https://book.douban.com/subject/1019568/
Water Margin 8.6 https://book.douban.com/subject/1008357/
Journey to the West 8.9 https://book.douban.com/subject/1029553/

For clarity, the four classic novels are used here as a plain list. For large-scale crawling, it is recommended to store results in a database and add delays, anti-ban measures, proxies, cookies, multithreaded crawling, and so on; those topics are not expanded here.
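As a small sketch of the "delay" part mentioned above: a polite fetch helper that sleeps between attempts and retries on failure. The helper name, the retry count, and the injectable `get` argument (there so the retry logic can be exercised without the network) are my own assumptions, not part of the original script:

```python
import time
import requests

def polite_get(url, headers=None, retries=3, delay=1.0, get=requests.get):
    """Fetch a URL, retrying on request errors with a pause between
    attempts. `get` is injectable purely so the logic is testable offline
    (an assumption of this sketch, not the original code)."""
    last_exc = None
    for _ in range(retries):
        try:
            rsp = get(url, headers=headers, timeout=10)
            rsp.raise_for_status()
            return rsp
        except requests.RequestException as exc:
            last_exc = exc
            time.sleep(delay)  # back off before the next attempt
    raise last_exc
```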

The disadvantage of https://book.douban.com/j/subject_suggest?q=book name is that some books cannot be found through it. The workaround is to use Selenium to simulate a manual search (Selenium is a versatile tool, but slower). Still, the interface already covers most book queries, and filtering by rating is how the high-scoring books in the articles above were selected. With 20 books already chosen, Selenium is no longer needed.
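The rating filter mentioned above can be sketched like this. The function name and the `(title, rank, url)` tuple shape are assumptions for illustration; `get_detail` returns the rating as a string such as "9.6":

```python
def filter_high_rated(results, threshold=9.0):
    """Keep (title, rank, url) tuples whose rating string parses to a
    float at or above `threshold`; entries without a parsable rating are
    skipped (hypothetical helper, not in the original script)."""
    picked = []
    for title, rank, url in results:
        try:
            if float(rank) >= threshold:
                picked.append((title, rank, url))
        except ValueError:
            continue  # some detail pages show no rating at all
    return picked

books = [
    ("The Dream of Red Mansion", "9.6", "https://book.douban.com/subject/1007305/"),
    ("Water Margin", "8.6", "https://book.douban.com/subject/1008357/"),
]
print(filter_high_rated(books))  # keeps only the 9.6 entry
```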



Posted on Mon, 03 Feb 2020 18:21:05 -0800 by DuNuNuBatman