Storing data crawled with the Scrapy framework into a MySQL database

Scrapy is currently a fairly mainstream crawler framework for crawling website data, and it is also very simple to use.
1. After the project is created, set ROBOTSTXT_OBEY to False in settings.py. Otherwise, Scrapy follows the robots protocol by default and you will not be able to crawl any data.
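The relevant line in settings.py looks like this (in a freshly generated project it is set to True):

```python
# settings.py
# Do not obey robots.txt rules, so requests are not filtered out
ROBOTSTXT_OBEY = False
```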
2. Write your crawler in the spider file. You can use XPath or CSS selectors to parse the data. Once you know which data you want to extract, declare your fields in the items.py file:

import scrapy
class JobspiderItem(scrapy.Item):
    zwmc = scrapy.Field()
    zwxz = scrapy.Field()
    zpyq = scrapy.Field()
    gwyq = scrapy.Field()

3. Then, in the spider file, import the field class declared in items.py, create the item object, assign the parsed values, and finally don't forget to yield the item:

item = JobspiderItem()
item['zwmc'] = zwmc    # job title parsed above
item['zwxz'] = money   # salary
item['zpyq'] = zpyq    # recruitment requirements
item['gwyq'] = gzyq    # post requirements
yield item
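Because parse() is a generator, each yield hands one populated item to the engine, which then passes it through the pipelines. A connection-free sketch of that flow (a plain dict stands in for JobspiderItem and the row values are made up, so it runs without Scrapy):

```python
def parse_jobs(rows):
    """Generator mimicking a Scrapy parse() callback: one item per scraped row."""
    for zwmc, money, zpyq, gzyq in rows:
        item = {}                # in the real spider: item = JobspiderItem()
        item['zwmc'] = zwmc      # job title
        item['zwxz'] = money     # salary
        item['zpyq'] = zpyq      # recruitment requirements
        item['gwyq'] = gzyq      # post requirements
        yield item               # handed to the engine, then to the pipelines

# One made-up row, as if it had been parsed with XPath/CSS selectors
items = list(parse_jobs([("Python Developer", "15k-25k", "Bachelor", "Scrapy")]))
```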

4. The next step is to save the items to the MySQL database:
1. Import the item class and the pymysql module in the pipelines.py file:

from .items import JobspiderItem
import pymysql

2. Then connect to and write to the database. Here I created the MySQL database and data table manually in advance, not in the code:

class JobspiderPipeline(object):
    def __init__(self):
        # 1. Establish the database connection (the connection values below
        #    are placeholders; replace them with your own).
        self.connect = pymysql.connect(
            host='localhost',          # localhost connects to the local database
            port=3306,                 # port number of the MySQL server
            user='root',               # user name of the database
            password='your_password',  # local database password
            db='jobspider',            # name of the database (not the table)
            charset='utf8mb4'          # encoding format
        )
        # 2. Create a cursor to operate on the table.
        self.cursor = self.connect.cursor()

    def process_item(self, item, spider):
        # 3. Put the item data into the database; this is written synchronously
        #    by default. The %s placeholders let pymysql escape the values.
        insert_sql = "INSERT INTO job(zwmc, zwxz, zpyq, gwyq) VALUES (%s, %s, %s, %s)"
        self.cursor.execute(insert_sql, (item['zwmc'], item['zwxz'], item['zpyq'], item['gwyq']))

        # 4. Commit the operation so the row is actually written.
        self.connect.commit()
        return item

    def close_spider(self, spider):
        # 5. Close the cursor and the connection when the spider finishes.
        self.cursor.close()
        self.connect.close()
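One detail worth calling out: formatting values into the SQL string with % leaves quoting and injection problems, while passing a parameter tuple to cursor.execute() lets pymysql escape the values itself. A minimal, connection-free illustration (the sample values are made up):

```python
# INSERT statement with %s placeholders; pymysql fills them in safely.
insert_sql = "INSERT INTO job(zwmc, zwxz, zpyq, gwyq) VALUES (%s, %s, %s, %s)"

# A plain dict stands in here for the scraped Item (values are made up).
item = {'zwmc': 'Python Developer', 'zwxz': '15k-25k',
        'zpyq': "3+ years' experience", 'gwyq': 'Scrapy, MySQL'}
params = (item['zwmc'], item['zwxz'], item['zpyq'], item['gwyq'])

# In the pipeline this would be:
# self.cursor.execute(insert_sql, params)
```

Even a value containing a single quote, like zpyq above, is inserted without breaking the statement.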

5. Finally, go to settings.py and uncomment the ITEM_PIPELINES setting.
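The uncommented block in settings.py looks like this (the project/module name jobspider is assumed; the number is the pipeline's run order, and lower numbers run first):

```python
# settings.py
ITEM_PIPELINES = {
    'jobspider.pipelines.JobspiderPipeline': 300,
}
```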

Tags: Database MySQL encoding

Posted on Sun, 05 Jan 2020 12:57:11 -0800 by the max