Python: using Scrapy to simulate a Douban login with a verification code

A few months ago I dabbled a little in Python crawlers, but I have forgotten a lot since then, so now I'm practicing again.

# -*- coding: utf-8 -*-
import scrapy
import urllib.request
from PIL import Image

class DoubanLoginSpider(scrapy.Spider):
    name = 'douban_login'
    allowed_domains = ['']
#    start_urls = ['']
    headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0"}

    def start_requests(self):
        # Override start_requests to request the login page
        return [scrapy.FormRequest("", headers=self.headers, meta={"cookiejar":1}, callback=self.parse_before_login)]

    def parse_before_login(self, response):
        # Fill in the login form and, if present, view the verification code
        print("Form filling before login")
        captcha_id = response.xpath('//input[@name="captcha-id"]/@value').extract_first()
        captcha_image_url = response.xpath('//img[@id="captcha_image"]/@src').extract_first()
        if captcha_image_url is None:
            print("No verification code at login")
            formdata = {
                "source": "index_nav",
                "form_email": "",
                # Fill in your password here
                "form_password": "*********",
            }
        else:
            print("Login with verification code")
            save_image_path = r"D:\EclipseWorkspace\python\douban\douban\spiders\cpatcha.jpeg"
            # Download the verification code image to a local file
            urllib.request.urlretrieve(captcha_image_url, save_image_path)
            # Open the image so that we can read the verification code in it
            im = Image.open(save_image_path)
            im.show()
            captcha_solution = input('Enter the verification code shown in the opened picture: ')
            formdata = {
                "source": "None",
                "redir": "",
                "form_email": "",
                # Fill in your password here
                "form_password": "**********",
                "captcha-solution": captcha_solution,
                "captcha-id": captcha_id,
                "login": "Sign in",
            }

        print("Logging in")
        #Submit Form 
        return scrapy.FormRequest.from_response(response, meta={"cookiejar":response.meta["cookiejar"]}, headers=self.headers, formdata=formdata, callback=self.parse_after_login)
    def parse_after_login(self, response):
        # Verify that the login succeeded
        account = response.xpath('//a[@class="bn-more"]/span/text()').extract_first()
        if account is None:
            print("Login failed")
        else:
            print("Login successful, the current account is %s" % account)

I'm a beginner too, so I may not understand this very deeply; this is just a simple record of what I learned.

The domain name and headers at the top need no explanation, so go straight to the start_requests method. The request this method returns asks for Douban's login page.

Where do you find the address of this login page? Open Douban's login page in Google Chrome, press F12, and look for the action attribute of the login form; the address the form submits to is the one to request.
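
The same lookup can be done offline on a saved copy of the page. Below is a minimal sketch using only the standard library's html.parser; the names FormActionFinder and find_form_actions, the sample HTML, and the example.com URL are all assumptions made for illustration, not Douban's real markup or address.

```python
from html.parser import HTMLParser


class FormActionFinder(HTMLParser):
    """Collect the action attribute of every <form> tag in a page."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes attrs as a list of (name, value) pairs
        if tag == "form":
            action = dict(attrs).get("action")
            if action is not None:
                self.actions.append(action)


def find_form_actions(html):
    """Return the action URLs of all forms found in the HTML text."""
    finder = FormActionFinder()
    finder.feed(html)
    return finder.actions


# Hypothetical page source; the URL is a placeholder, not the real login address.
SAMPLE_HTML = """
<html><body>
  <form method="post" action="https://example.com/login">
    <input name="form_email">
    <input name="form_password" type="password">
  </form>
</body></html>
"""

print(find_form_actions(SAMPLE_HTML))  # ['https://example.com/login']
```

Whatever address turns up this way is what would go into the request in start_requests.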

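The request also carries the headers and a meta dictionary with a cookiejar key, which tells Scrapy which cookie session to use, and it names the next method as its callback. The cookiejar key is not carried over automatically, so the later login request passes it along again.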

In that next method, we first need to check whether the page contains a verification code. The first couple of login attempts may not show one, but after several attempts a verification code appears, so both cases have to be handled.

To see it, deliberately fail the login several times until the verification code appears, then press F12 to find the address of the verification code image.

Then handle the two cases, with and without a verification code, separately. See the code for details.
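
To make the two cases easier to see, the form-building step can be pulled out into a small helper. This is only a sketch: build_formdata is a hypothetical name, the field names are copied from the spider above, and it branches on whether a captcha id was found, whereas the spider checks the image address.

```python
def build_formdata(email, password, captcha_id=None, captcha_solution=None):
    """Return the login form fields, with captcha fields only when needed."""
    if captcha_id is None:
        # No verification code on the page: the short form is enough
        return {
            "source": "index_nav",
            "form_email": email,
            "form_password": password,
        }
    # A verification code is present: send its id and the typed-in answer too
    return {
        "source": "None",
        "redir": "",
        "form_email": email,
        "form_password": password,
        "captcha-solution": captcha_solution,
        "captcha-id": captcha_id,
        "login": "Sign in",
    }


# Placeholder credentials and captcha values, for illustration only
plain = build_formdata("user@example.com", "secret")
with_captcha = build_formdata("user@example.com", "secret", "abc123", "tiger")
print(sorted(plain))
print(with_captcha["captcha-solution"])
```

Either dictionary can then be passed as formdata when the form is submitted.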

Tags: Python Windows Google

Posted on Mon, 20 Apr 2020 08:40:06 -0700 by anybody99