Querying 12306 remaining train tickets with Python 3

1, Principle of the remaining-ticket query

A normal user opens the 12306 official website in a browser, selects the departure station, destination station and date, and clicks to query the remaining tickets. To do the same with Python there are two approaches: one drives a real browser through the Selenium automation framework; the other uses Python's HTTP libraries, such as requests, to fetch the data directly. This article implements the second approach.

2, Implementing the remaining-ticket query

The browser queries the remaining tickets by visiting a URL of the following form:
https://kyfw.12306.cn/otn/leftTicket/query?leftTicketDTO.train_date=2017-09-30&leftTicketDTO.from_station=BJP&leftTicketDTO.to_station=SHH&purpose_codes=ADULT
From this URL we can see that the train date is 2017-09-30, the departure station is BJP (Beijing), and the destination station is SHH (Shanghai). A Python program therefore only needs to assemble these key fields into the URL, fetch the page with one of Python's own packages such as requests or urllib, and then parse the response that the browser would otherwise render.

1. Encapsulate the remaining ticket query URL

#Query the link header of remaining tickets
query_url = "https://kyfw.12306.cn/otn/leftTicket/query?"
url = query_url + "leftTicketDTO.train_date=" + train_date \
    + "&leftTicketDTO.from_station=" + from_station \
    + "&leftTicketDTO.to_station=" + to_station \
    + "&purpose_codes=ADULT"
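String concatenation works, but the same URL can also be built with the standard library's `urlencode`, which keeps the parameters readable. A minimal sketch; the parameter names and values are taken from the example URL above, and `build_query_url` is a helper name introduced here for illustration:

```python
from urllib.parse import urlencode

def build_query_url(train_date, from_station, to_station):
    # Assemble the query parameters exactly as the 12306 URL above expects them
    params = {
        'leftTicketDTO.train_date': train_date,
        'leftTicketDTO.from_station': from_station,
        'leftTicketDTO.to_station': to_station,
        'purpose_codes': 'ADULT',
    }
    return 'https://kyfw.12306.cn/otn/leftTicket/query?' + urlencode(params)

url = build_query_url('2017-09-30', 'BJP', 'SHH')
```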

2. Send the request and read the content of the web page

import requests

#Too many unclosed HTTP connections cause "Max retries exceeded with url";
#increase the number of retries
requests.adapters.DEFAULT_RETRIES = 5
#requests uses the urllib3 library; HTTP connections default to keep-alive.
#Set keep_alive to False on the session so connections are closed after use.
s = requests.session()
s.keep_alive = False
#Send the query request and get the remaining-ticket page
r = s.get(url, allow_redirects=True, verify=False, timeout=10)
if r.status_code == 200:
    # station_dict = r.json()['data']['map']
    traindatas = r.json()['data']['result']
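Before indexing into the JSON it is worth guarding against error or empty responses. A minimal sketch; the dict below only imitates the shape of a successful response as used in this article, its values are invented, and `extract_result` is a helper name introduced here:

```python
# Imitates the shape of a successful 12306 query response; values are invented
payload = {'status': True, 'data': {'result': ['...|record1', '...|record2']}}

def extract_result(payload):
    # Return the list of '|'-separated train records, or [] on a bad payload
    data = payload.get('data') or {}
    return data.get('result') or []

traindatas = extract_result(payload)
```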

3. Parse the response into the trainInfo data structure

import re

for data in traindatas:
    trainInfo = {}
    #Each record is a '|'-separated string; capture the field after each '|'
    trainRowItem = re.compile(r'\|([^|]*)').findall(data)
    trainInfo['train_no'] = trainRowItem[2]
    #Map the station abbreviations back to their Chinese names
    trainInfo['from_station_name'] = stationDictChineseMapAbbr[trainRowItem[3]]
    trainInfo['to_station_name'] = stationDictChineseMapAbbr[trainRowItem[4]]
    # trainInfo['from_station_name'] = trainRowItem[3]
    # trainInfo['to_station_name'] = trainRowItem[4]
    trainInfo['start_time'] = trainRowItem[7]
    trainInfo['arrive_time'] = trainRowItem[8]
    trainInfo['duration'] = trainRowItem[9]
    trainInfo['swz_num'] = trainRowItem[31]
    ...
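To see the regex at work, here is a self-contained sketch run on a made-up record string. The field layout mirrors the indices used above (each capture i corresponds to the field after the (i+1)-th '|'), but every value below is invented for illustration:

```python
import re

# Build a made-up record: 40 '|'-separated fields, with the fields this
# article reads filled in at the indices its parsing code expects
fields = [''] * 40
fields[3] = '24000000G10'   # train_no (invented internal id)
fields[8] = '09:00'         # start_time
fields[9] = '13:28'         # arrive_time
fields[10] = '04:28'        # duration
fields[32] = '5'            # swz_num (business-seat count)
data = '|'.join(fields)

# Same pattern as above: capture the text after each '|'
trainRowItem = re.compile(r'\|([^|]*)').findall(data)
trainInfo = {
    'train_no': trainRowItem[2],
    'start_time': trainRowItem[7],
    'arrive_time': trainRowItem[8],
    'duration': trainRowItem[9],
    'swz_num': trainRowItem[31],
}
print(trainInfo)
```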

4. Print the trainInfo records to the terminal

#Format the output as a table
from prettytable import PrettyTable

#One column name per field in each row (16 in total)
header = ['No.', 'train no.', 'departure station', 'arrival station',
          'departure time', 'arrival time', 'duration', 'business seat',
          'first class seat', 'second class seat', 'advanced soft sleeper',
          'soft sleeper', 'moving sleeper', 'hard sleeper', 'hard seat', 'no seat']
pt = PrettyTable()
pt.field_names = header
for i, trainInfo in enumerate(trainInfoList):
    pt.add_row([i, trainInfo['train_no'], trainInfo['from_station_name'],
                trainInfo['to_station_name'], trainInfo['start_time'],
                trainInfo['arrive_time'], trainInfo['duration'],
                trainInfo['swz_num'], trainInfo['zy_num'], trainInfo['ze_num'],
                trainInfo['gjrw_num'], trainInfo['rw_num'], trainInfo['dw_num'],
                trainInfo['yw_num'], trainInfo['yz_num'], trainInfo['wz_num']])
#Terminal output
print(pt)
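PrettyTable handles the column alignment; if it is not installed, a rough tabular output can be approximated with the standard library alone. A sketch with a reduced set of columns and invented sample data:

```python
# Plain-text table fallback using only the standard library; data is invented
header = ['No.', 'train no.', 'departure time', 'arrive time', 'second class']
rows = [
    [0, 'G101', '06:44', '12:38', '354'],
    [1, 'G103', '07:17', '13:10', '12'],
]
# Width of each column = widest cell in that column (header included)
widths = [max(len(str(x)) for x in col) for col in zip(header, *rows)]
fmt = '  '.join('{:<%d}' % w for w in widths)
print(fmt.format(*header))
for row in rows:
    print(fmt.format(*row))
```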

5. Final result

Complete code link

Tags: Python JSON Selenium Session

Posted on Tue, 05 May 2020 05:38:20 -0700 by Dude0