Crack a verification code with Python in just a few minutes!

Before crawling data from some web pages, we first need to obtain the page's verification code and then splice it into the URL to fetch the data. Looking at this verification code, it is made of simple letters and numbers with some interference added. A quick search on Baidu shows that Python can crack simple verification codes like this. The specific steps are as follows:

1: Install the required packages on Windows:

1. Install Pillow: pip install pillow

2. Install Tesseract OCR. Download address: https://ask.hellobi.com/blog/tangyudi/ (reference link)

3. Install pytesseract: pip install pytesseract
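
To confirm that the three installs above are visible to Python, a small check like the following may help. It is only a sketch: the function name check_environment is made up for this example, and it assumes the Tesseract executable is named tesseract and is expected on PATH.

```python
import importlib.util
import shutil


def check_environment():
    """Report which pieces of the OCR toolchain this interpreter can see."""
    return {
        # Pillow is installed as "pillow" but imported as "PIL".
        "Pillow": importlib.util.find_spec("PIL") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # pytesseract is only a wrapper; it needs the tesseract program itself.
        "tesseract on PATH": shutil.which("tesseract") is not None,
    }


for name, found in check_environment().items():
    print(f"{name}: {'OK' if found else 'missing'}")
```

Anything reported as missing should be installed (or added to PATH) before moving on to the next step.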

2: Configure the environment and modify the pytesseract.py file:

1. After installing the packages above, the environment variables for Tesseract OCR must be configured.

2. In pytesseract.py, change the tesseract_cmd line to:

              tesseract_cmd=r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
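
As an alternative to editing the installed pytesseract.py file, pytesseract also lets you set the same value from your own script through pytesseract.pytesseract.tesseract_cmd. The sketch below assumes the install path quoted above; the helper names resolve_tesseract_cmd and configure_pytesseract are hypothetical and exist only for this example.

```python
import os
import shutil

# Install location quoted in this post; adjust it for your machine.
DEFAULT_WINDOWS_PATH = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(candidates=(DEFAULT_WINDOWS_PATH,)):
    """Return a usable tesseract executable path, or None if none is found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract(cmd):
    """Point pytesseract at cmd; return False if pytesseract is not installed."""
    try:
        import pytesseract
    except ImportError:
        return False
    pytesseract.pytesseract.tesseract_cmd = cmd
    return True


cmd = resolve_tesseract_cmd()
if cmd is not None:
    configure_pytesseract(cmd)
```

Setting the path at runtime has the advantage that the change is not lost when pytesseract is upgraded or reinstalled.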

3: Install the corresponding packages on CentOS:

Please follow this link to install: http://blog.csdn.net/diandianxiyu'geek/article/details/50522582 (reference link)

4: The code is as follows:

# -*- coding: utf-8 -*-
# Written for Python 3, so the reload(sys) / setdefaultencoding hack is not needed.
import time

import pytesseract
import requests
from PIL import Image, ImageEnhance

# The author's own module; second_header presumably holds the request headers.
from PublicCode import search_config

# Crack the verification code
# Millisecond timestamp, sent as the t parameter
t = int(round(time.time() * 1000))


def get_guid(t, second_header):
    # Ask the server for a new guid.
    url = 'http://cri.gz.gov.cn/Search/NewGuid?t=%s' % t
    # Pass headers by keyword: the second positional argument of requests.get is params.
    result = requests.get(url, headers=second_header)
    return result.text


def get_image(guid):
    # Download the verification code image for this guid and save it as 1.jpg.
    url = 'http://cri.gz.gov.cn/Search/ValidateCode?t=1517210875615&guid=%s' % guid
    res = requests.get(url)
    with open('1.jpg', "wb") as f:
        f.write(res.content)
    return Image.open('1.jpg')


# Lookup table for binarization: gray values below the threshold map to 0, the rest to 1.
threshold = 150
table = []
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)


def getverify1(name):
    # Convert to grayscale, binarize, boost the contrast, then run OCR.
    im = Image.open(name)
    imgry = im.convert('L')
    imgry.save('g' + name)
    out = imgry.point(table, '1')
    out.save('b' + name)
    string = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
    # Re-open the binarized file that was just saved (the original hard-coded 'b1.jpg').
    im = Image.open('b' + name)
    enhancer = ImageEnhance.Contrast(im)
    im = enhancer.enhance(6)
    text = pytesseract.image_to_string(im, config=string)
    # strip('') removes nothing; strip() removes the surrounding whitespace.
    text = text.strip()
    text = text.upper()
    return text


def main():
    guid = get_guid(t, search_config.second_header)
    get_image(guid)
    code = getverify1('1.jpg')
    return guid, code
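
To make two steps of the code above easier to see in isolation, here is a small sketch that needs neither Pillow nor Tesseract. build_threshold_table rebuilds the same 256-entry lookup table used for binarization, and clean_ocr_text is a slightly stricter variant of the strip/upper step. Both function names are made up for this example, and the cleanup assumes the verification code contains only letters and numbers, as described at the start of this post.

```python
import re


def build_threshold_table(threshold=150):
    """256-entry lookup table: gray values below threshold -> 0, all others -> 1."""
    return [0 if value < threshold else 1 for value in range(256)]


def clean_ocr_text(raw):
    """Drop everything except ASCII letters and digits, then upper-case the rest."""
    return re.sub(r"[^0-9A-Za-z]", "", raw).upper()


table = build_threshold_table(150)
print([table[v] for v in (0, 149, 150, 255)])  # [0, 0, 1, 1]
print(clean_ocr_text(" a7 k-9\n"))             # A7K9
```

Passing this table to point(table, '1') turns every dark pixel black and every light pixel white, which removes much of the faint interference before OCR. Filtering the OCR result afterwards discards stray spaces and punctuation that Tesseract sometimes returns.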

Tags: Python pip Windows CentOS

Posted on Fri, 31 Jan 2020 00:55:38 -0800 by toro04