Python Automation Crawling of Urban Distance [Small Project Required]


Create a database locally, store Excel data in city table, and record all prefecture-level cities and counties, county-level cities and districts of | Hubei Province as sample table data.Using python's xlrd package, define process_data package to store operation excel data, generate classes of sql statements, define op_postgresql package to store database operation objects, and define various methods

PS:: Many other people tend to give up when they are learning Python because they don't have good tutorials or no one to guide them. For this reason, I built a Python full-stack development communication.Skirt: After a long period of fighting and thinking (digital homophonic) can be found, which contains the latest Python tutorial items to take, problems that you don't understand can be solved by old drivers, and they can work with each other.Supervise common progress

 

Urban Distance Crawling-Task Planning

Create a database locally, store excel data in city table, and record all prefecture-level cities and counties, county-level cities and districts of | Hubei Province as sample table data.

 

Create a database locally, store excel data in city table, and record all prefecture-level cities and counties, county-level cities and districts of | Hubei Province as sample table data.Prepare to create the public/config.py extension package, then use python's xlrd package to define the process_data package to store the operation excel data and generate classes for sql statements.
Define the op_postgresql package to store the database's operation objects and define various methods
Create a crwler package to store the crawler's operation object - > discover the map api invoked by the other site - > change to the package invoked by the German map api - store the operation object
Create a log folder to hold logs of database operations
Create a data folder to hold the initial excel data

 

Basic database construction:

Sample data table format:
Table name: sample_table

name column data type length distribution fk Required fields Remarks
Territory name address text       TRUE Territory name
Territory type ad_type integer       TRUE 0 - is a prefecture level city; 1 - is a county, county level city, district.
Latitude and longitude coordinates text       TRUE Latitude and longitude of Geographic Names
···              

Format of sample 1-1 place route table

Table name: sample_route

name column data type length distribution fk Required fields Remarks
starting point origin text         starting point
Purpose destination text         Purpose
distance distance integer         distance
Route route text         Route
···              
 

Create Configuration Information Interface

Conveniently store the specific variables and configuration information we need.

public/config.py

#!/usr/bin/python
# -*- coding: utf-8 -*-
#__author__: stray_camel

import os,sys
#The parent directory of the directory where the current package is located
src_path = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
 

Create an interface to read excel data

Using python's xlrd package, define the process_data package to store the operation excel data and generate classes for sql statements

Reference github source readme document
No document s were found on the PyPI, so we can only go to github to find the source code. xlrd handles excel basic guide

import xlrd
book = xlrd.open_workbook("myfile.xls")
print("The number of worksheets is {0}".format(book.nsheets))
print("Worksheet name(s): {0}".format(book.sheet_names()))
sh = book.sheet_by_index(0)
print("{0} {1} {2}".format(sh.name, sh.nrows, sh.ncols))
print("Cell D30 is {0}".format(sh.cell_value(rowx=29, colx=3)))
for rx in range(sh.nrows):
print(sh.row(rx))

Create process_data/excel2sql.py extension package for easy import later
Get excel's data to construct sql statement and create city table (hubei province) sample table

process_data/excel2sql.py

#!/usr/bin/python
# -*- coding: utf-8 -*-
#__author__: stray_camel

import xlrd,sys,os,logging
from public import config
class Excel2Sql(object):
    def __init__(
        self, 
        url:"str File path of type", 
        sheet:"excel Form name in"):
        self.f_name = url
        # Store specific form name data in excel
        self.sh_data = xlrd.open_workbook(self.f_name).sheet_by_name(sheet)
        self.rows = self.sh_data.nrows
        self.cols = self.sh_data.ncols

When we generate this Excel2Sql object, we want to follow a similar pattern

excel_data = excel2sql.Excel2Sql("fiel_name","sheet_name")

Code to read excel files directly and get data from a form.So when we initialize an object, we want to assign values to its properties.

In the excel table, we store the data as follows:

Province/municipality directly under the Central Government Prefectural City County, county-level city, District
Beijing Beijing Dongcheng District
... ... ...

 

 

 

Then we want to be able to access one of these functions by calling this class (interface) and only get all the data of a province/municipality directly under the Central Government, like Hubei province, we point to get 103 counties and districts in Qizhong.

Define methods in class Excel2Sql:

def init_SampleViaProvince_name(
        self, 
        Province_name:"Provincial Name"
        ) ->"insert Data, in list form[('Territory name 1','1','Latitude and longitude'),('Geographic Name 2','1','Latitude and longitude')]":
        geo_app = Geo_mapInterface(config.geo_key)
        all_data = [self.sh_data.row_values(i) for i in range(self.rows)]

        cities_data=[[["".join(i),1],["".join(i[1:len(i)]),1]][i[0]==i[1]] for i in all_data if i[0] == Province_name]
        for i in cities_data:
            i.append(geo_app.get_coordinatesViaaddress("".join(i[0])))
        # cities_data=[[["".join(i),1,'test1'],["".join(i[1:len(i)]),1,'test2']][i[0]==i[1]] for i in all_data if i[0] == Province_name]
        return cities_data

We can then test whether the class is correctly constructed or debug:
Write at the end of the file:

if __name__ == "__main__":
    test = Excel2Sql(config.src_path+"\\data\\2019 Latest National Table of Administrative Levels of Cities, Provinces, Counties(194).xls","National list of cities, provinces, cities and counties")
    print(test.init_SampleViaProvince_name("Beijing"))

Test results:

(env) PS F:\Viewer Data> & f:/Viewer Data/env/Scripts/python.exe f:/Viewer Data/City Distance Crawling/process_data/excel2sql.py
 ['Dongcheng District, Beijing', 1,'116.416357, 39.928353'], ['Xicheng District, Beijing', 1,'116.365868, 39.912289'], ['Chongwen District, Beijing', 1,
'116.416357,39.416357,39.92838353'], ['Xuanwu District in Beijing', 1,'116.365868, 116.365868,39.912289'], ['Chaoyyang District in Beijing', 1,'116.601144,39.94858574'], ['Fengttai District in Beijing', 1,'Fengttai District District in Beijing Beijing', 1,'116.2871717149 49,39.85848484848484842727'], ['Shijingjingshan Mountain Mountain Mountain Mountain Mountain Mountain Mountain Area in Beijing', 1,'116.222229292982 82,39.90909066666611'], ['Haidiadiadian District', 1,'], ['Mentougou District, Beijing', 1,'116.102009, 39.940646'], ['BeijingFangshan District', 1,'116.143267, 39.749144'], ['Tongzhou District, Beijing', 1,'116.656435, 39.909946'], ['Shunyi District, Beijing', 1,'116.654561, 40.130347'], ['Chang, Beijing'
Ping District', 1,'116.231204,40.220660'], ['Daxing District of Beijing', 1,'116.341014,39.784747'], ['Pinggu District of Beijing', 1,'117.121383,40.140701'], ['Huairou District of Beijing', 1,'116.642349,40.315704'], ['Miyun County of Beijing', 1,'116.843177,40.376834'], ['Beijing'
City Yanqing County', 1,'115.974848, 40.456951']]
 

Create an OP database postgresql (and all other databases, too ~) interface

Define the op_postgresql package to store the database's operation objects and define various methods

The curd of the database is really from sophomore to sophomore.
General package for accessing postgresql databases: psycopg2
Visit the official website
In this Operations Documentation site, the ideas used have been clearly written out at http://initd.org/psycopg/docs/usage.html

 

 

Hope you don't have to take detours on the Internet (see less translated documents.)
http://initd.org/psycopg/

The same pattern applies, calling postgresql's driver/interface, setting parameters to log in, and accessing the database.Set cursor, inject sql data, fetch return value.

  • One thing to note here is that xss injection is prevented by default and parameter access is generally set when writing code.
  • Note that log files are generated and logs are printed

Specific process is not detailed, just code

op_postgresql/opsql.py:

#!/usr/bin/python
# -*- coding: utf-8 -*-
#__author__: stray_camel

'''
//Define encapsulation of basic operations for mysql database
1.Data Insertion
2.Empty table
3.Query all data of a table
'''
import logging
import psycopg2
from public import config
class OperationDbInterface(object):
    #Define Initial Connection Database
    def __init__(self, 
    host_db : 'Database Service Host' = 'localhost', 
    user_db: 'Database User Name' = 'postgres', 
    passwd_db: 'Database Password' = '1026shenyang', 
    name_db: 'Database Name' = 'linezone', 
    port_db: 'Port number, integer number'=5432):
        try:
            self.conn=psycopg2.connect(database=name_db, user=user_db, password=passwd_db, host=host_db, port=port_db)#Create database links
        except psycopg2.Error as e:
            print("Failed to create database connection|postgresql Error %d: %s" % (e.args[0], e.args[1]))
            logging.basicConfig(stream=open(config.src_path + '/log/syserror.log', encoding="utf-8", mode="a"), level = logging.DEBUG,format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s')
            logger = logging.getLogger(__name__)
            logger.exception(e)
        self.cur=self.conn.cursor()

    #Define the insertion of data into the sample table
    def insert_sample_data(self, 
    condition : "insert Sentence" = "insert into sample_data(address,ad_type,coordinates) values (%s,%s,%s)", 
    params : "insert Data, List Form[('Territory name 1','1','Latitude and longitude'),('Geographic Name 2','1','Latitude and longitude')]" = [('Territory name 1','1','Latitude and longitude'),('Geographic Name 2','1','Latitude and longitude')]
    ) -> "Bulk insert data results as a dictionary" :
        try:
            self.cur.executemany(condition,params)
            self.conn.commit()
            result={'code':'0000','message':'Successful batch insert operation','data':len(params)}
            logging.basicConfig(stream=open(config.src_path + '/log/syserror.log', encoding="utf-8", mode="a"), level = logging.DEBUG, format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s')
            logger = logging.getLogger(__name__)
            logger.info("In sample table sample_data Insert data into{}Bar, operation:{}!".format(result['data'],result['message']))
        except psycopg2.Error as e:
            self.conn.rollback()  # Perform rollback operation
            result={'code':'9999','message':'Perform Bulk Insert Exception','data':

Tags: Python Database Excel SQL

Posted on Thu, 06 Feb 2020 19:49:16 -0800 by Right Wing Liberal