Python Crawler Introductory Tutorial 59-100: Advanced Crawler Techniques, CAPTCHA 5: Geetest Recognition II

Picture comparison

Yesterday's post saved the two pictures locally. The first thing to do today is compare them and locate the gap.

Gap picture

Complete picture

Calculating notch coordinates

By comparing the RGB values of every pixel in the two pictures, we can find the x value of the first pixel that differs, which is the distance the slider has to move.

    def get_distance(self, cut_image, full_image):
        """Return the x coordinate of the first pixel that differs between the images."""
        threshold = 50  # minimum per-channel difference that counts as "different"
        for i in range(cut_image.size[0]):
            for j in range(cut_image.size[1]):
                pixel1 = cut_image.getpixel((i, j))
                pixel2 = full_image.getpixel((i, j))
                res_R = abs(pixel1[0] - pixel2[0])  # Red channel difference
                res_G = abs(pixel1[1] - pixel2[1])  # Green channel difference
                res_B = abs(pixel1[2] - pixel2[2])  # Blue channel difference

                if res_R > threshold and res_G > threshold and res_B > threshold:
                    return i  # Distance to be moved
        return None  # No differing pixel found
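
As a minimal, self-contained sketch of the same column-by-column comparison, the version below works on plain nested lists of RGB tuples instead of PIL images, so it runs without any image files. The function name `first_diff_column` and the `pixels[x][y]` layout are assumptions made for this illustration and are not part of the original script.

```python
def first_diff_column(cut_pixels, full_pixels, threshold=50):
    """Return the first x where a pixel differs by more than `threshold`
    on all three RGB channels, or None if no such pixel exists.

    Both arguments are hypothetical stand-ins for PIL images:
    pixels[x][y] is an (R, G, B) tuple.
    """
    for x, (cut_col, full_col) in enumerate(zip(cut_pixels, full_pixels)):
        for cut_px, full_px in zip(cut_col, full_col):
            if all(abs(c - f) > threshold
                   for c, f in zip(cut_px[:3], full_px[:3])):
                return x
    return None


# Build two 10x4 "images": identical except for a dark gap at x = 6..7.
full = [[(200, 200, 200)] * 4 for _ in range(10)]
cut = [list(col) for col in full]
for x in (6, 7):
    for y in (1, 2):
        cut[x][y] = (40, 40, 40)

gap_x = first_diff_column(cut, full)
print(gap_x)  # 6
```

Returning `None` when nothing differs lets the caller tell "no gap found" apart from a gap at x = 0.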

Geetest has a dedicated algorithm for detecting user behavior. Here is an older article on it.

https://blog.csdn.net/ieternite/article/details/51483491

If we feed the gap position calculated above straight into the previous script, verification fails with the prompt "the monster ate the pie chart" even when the slider lands in the right place. Evidently Geetest recognizes that the movement is not human. So we need to look at how the trajectory of a natural mouse drag differs from the one our code produces.

When the mouse drags the slider, the movement should also mimic human behavior. For this, you can refer to the following article.

https://www.cnblogs.com/xiao-apple36/p/8878960.html
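
One common way to approximate a human drag is to speed up over the first part of the distance and slow down near the end. The sketch below splits a distance into integer steps along those lines. The function name `build_track`, its parameters, and their default values are all assumptions chosen for illustration; this is not Geetest's detection model, and it comes with no guarantee of passing verification.

```python
def build_track(distance, accel=2.0, decel=-3.0, dt=1.0, switch_ratio=0.7):
    """Split `distance` pixels into integer steps: speed up, then slow down.

    All parameter names and defaults are hypothetical tuning knobs.
    """
    distance = int(distance)
    track = []
    moved = 0
    velocity = 0.0
    switch_point = distance * switch_ratio  # where deceleration begins
    while moved < distance:
        a = accel if moved < switch_point else decel
        # Uniformly accelerated motion: s = v*t + a*t^2 / 2
        step = velocity * dt + 0.5 * a * dt * dt
        velocity = max(velocity + a * dt, 0.0)
        step = max(1, round(step))          # always advance at least 1 px
        step = min(step, distance - moved)  # never overshoot the target
        track.append(step)
        moved += step
    return track


track = build_track(100)
print(track, sum(track))
```

Each step in the returned list could then be passed to `move_by_offset` in place of a purely random span. Adding small random jitter to the steps and to the pauses between them is a natural extension.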

Moving slider

This part works the same way as our earlier slider CAPTCHA recognition and is implemented with Selenium.

    # Moving slider
    # Requires: import random, time; from selenium.webdriver import ActionChains
    def start_move(self, distance):
        # find_element_by_xpath was removed in Selenium 4.3; there, use find_element(By.XPATH, ...)
        element = self.driver.find_element_by_xpath('//div[@class="gt_slider_knob gt_show"]')

        # Offset by half the slider width, plus a fixed correction
        distance -= element.size.get('width') / 2
        distance += 15
        distance = int(distance)  # move_by_offset expects integer offsets

        # Press and hold the left mouse button
        ActionChains(self.driver).click_and_hold(element).perform()
        time.sleep(0.5)
        while distance > 0:
            if distance > 20:
                # Far from the gap: move faster
                span = random.randint(5, 8)
            else:
                # Near the gap: move slowly
                span = random.randint(2, 3)
            ActionChains(self.driver).move_by_offset(span, 0).perform()
            distance -= span
            time.sleep(random.randint(10, 50) / 100)

        # distance is now zero or negative, so this corrects any overshoot
        ActionChains(self.driver).move_by_offset(distance, 1).perform()
        ActionChains(self.driver).release(on_element=element).perform()

Result of a run: the first verification failed, and after a wait of about 7 seconds the second attempt succeeded.

The last thing to handle is verification failure, which requires retrying.

Validation failed

Failure handling can be written directly after the drag call; it is ordinary control-flow code.

        self.start_move(dis)

        # If the error tip appears, the verification failed
        try:
            WebDriverWait(self.driver, 5).until(
                EC.element_to_be_clickable((By.XPATH, '//div[@class="gt_ajax_tip gt_error"]')))
            print("Validation failed")
            return
        except TimeoutException:
            pass

        # Determine whether the validation is successful
        try:
            WebDriverWait(self.driver, 10).until(
                EC.element_to_be_clickable((By.XPATH, '//div[@class="gt_ajax_tip gt_success"]')))
        except TimeoutException:
            print("Revalidation....")
            time.sleep(5)
            # Recursive drag after failure
            self.analog_drag()
        else:
            print("Verify success")
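
The recursive retry above has no upper bound, so repeated failures keep nesting calls. A bounded loop is one alternative. The sketch below is a minimal illustration only: `drag_until_verified`, `attempt`, `max_attempts`, and `wait_seconds` are hypothetical names, and `attempt` stands in for one full drag that returns True on success.

```python
import time


def drag_until_verified(attempt, max_attempts=3, wait_seconds=5, sleep=time.sleep):
    """Call `attempt()` until it returns True or the attempts run out.

    `attempt` is a hypothetical callable that performs one drag and
    returns True on success, False on failure.
    """
    for n in range(1, max_attempts + 1):
        if attempt():
            print(f"Verify success on attempt {n}")
            return True
        if n < max_attempts:
            print("Revalidation....")
            sleep(wait_seconds)  # give the widget time to refresh
    print("Giving up")
    return False


# Demo with a fake attempt that fails once, then succeeds.
# The sleep is stubbed out so the demo finishes immediately.
outcomes = iter([False, True])
result = drag_until_verified(lambda: next(outcomes), sleep=lambda s: None)
print(result)  # True
```

Passing `sleep` as a parameter is a small design choice that makes the loop easy to exercise without real delays.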

Closing notes

That completes the Geetest verification code, although many parts of it still need adjusting.

For example:

    element = self.driver.find_element_by_xpath('//div[@class="gt_slider_knob gt_show"]')

Getting the element this way can easily fail to find the target, and the program then exits with an error, so it needs to be made more robust.

The driver also needs to be closed promptly; otherwise a lot of chromedriver.exe processes will pile up in Task Manager.
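
One way to make sure the driver is always shut down, even when the script raises an error, is to wrap it in a context manager that calls `quit()` on exit. The sketch below shows the pattern with a stand-in class so it runs without a browser. `managed_driver` and `FakeDriver` are hypothetical names for this illustration; with Selenium you would pass something like `webdriver.Chrome` as the factory.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with `factory()` and always call quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()  # runs on normal exit and when an exception is raised


class FakeDriver:
    """Stand-in used only to demonstrate the pattern without a browser."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


with managed_driver(FakeDriver) as driver:
    pass  # the crawling and drag logic would go here

print(driver.closed)  # True
```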

Cracking this CAPTCHA follows essentially the same approach as the slider CAPTCHA. The core of it is processing the two pictures. I hope you find it useful.


Tags: Python Selenium

Posted on Sun, 19 May 2019 03:00:10 -0700 by colbyg