Python Valentine's Day super skill: export your WeChat chat history and generate a word cloud

Preface

The text and images in this article come from the Internet and are for learning and exchange only, not for any commercial purpose. Copyright belongs to the original author. If you have any concerns, please contact us promptly.

Author: Python practical book


Doesn't this make a sweet little token of love, with sound and pictures~

Today we'll collect the daily conversations between you and your partner, and use them to create a little romance that belongs only to the two of you!

First, we need to export the chat data between ourselves and our partner.

WeChat's backup feature cannot export your messages as plain text; the backup is actually a SQLite database. The methods circulating online for extracting the text are troublesome: on iOS you need iTunes to back up the whole phone, and on Android you need root access on the device. Here we introduce a method that exports only the chat data with your partner, with no full-phone backup and no rooting of your phone.

The trick is to export through an Android emulator, which works for both iOS and Android and avoids any side effects on your actual phone. First, use the desktop version of WeChat to back up the chat history between you and your partner. Taking Windows as an example:

  1. Download the Nox emulator (NoxPlayer)

  2. Install WeChat inside the Nox emulator

  3. In the Windows desktop WeChat client, open the backup feature, shown in the lower-left corner of the figure

  4. Click "Back up chat history to computer"

  5. Confirm the backup on your phone: tap "Select chat history" below, then choose only the conversation with your partner

  1. After the backup finishes, open the emulator and log in to WeChat inside it

  2. Once logged in, go back to the desktop WeChat client, open "Backup and Restore", and choose "Restore chat history to phone"

  3. Check the chat history we just backed up, then confirm on the (emulated) phone to start the restore

  4. Enable root permission in the Nox emulator's settings

  5. In the emulator's browser, search for the RE File Manager, download it (Figure 1) and open it after installation. A dialog will pop up asking for root permission; choose to grant it permanently. Then open RE File Manager (Figure 2) and navigate to the following folder (Figure 3), which is where applications store their data:

/data/data/com.tencent.mm/MicroMsg

Then enter the subfolder whose name is a long string of digits and letters, e.g. 4262333387ddefc95fee35aa68003cc5, as shown in Figure 3.

  1. Find the file EnMicroMsg.db in that folder and copy it to the Nox emulator's shared folder (Figure 4). The shared folder is located at /mnt/shell/emulated/0/others (Figure 5). On Windows, you can then pick up the database file (EnMicroMsg.db) under C:\Users\<your username>\Nox_share\OtherShare.

  2. After exporting the database, use a tool called SQLCipher to read the data. The database password is the first seven characters of the MD5 hash of a particular string. For example, for "355757010761231857456862" (written with a space above, but there is actually none in the middle), compute its MD5 and take the first seven characters. Where this string comes from is explained in detail below.

Sounds really "simple and clear", right? Don't worry. Next, I'll show you how to obtain the IMEI and uin.

First the IMEI: it can be found in the emulator's system settings, under the property settings in the upper-right corner, as shown in the figure.

Now that we have IMEI, what about UIN?

Similarly, open this file with the RE File Manager:

long-press the file, tap the three dots in the upper-right corner, choose "Open with" and then "Text viewer", and look for the default uin; the number that follows is what we want!

With these two strings of digits in hand, we can compute the password. If my IMEI is 355757010762041 and my uin is 857749862, the combined string is 355757010762041857749862. Feed this string into any free online MD5 calculator.

The first seven characters of the result are our password; in this example, 6782538.
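Instead of an online MD5 tool, the same calculation can be done locally with Python's standard library. A minimal sketch (the function name is my own):

```python
import hashlib

def wechat_db_password(imei, uin):
    # The password is the first 7 characters of the lowercase
    # hex MD5 digest of the IMEI concatenated with the uin
    return hashlib.md5((imei + uin).encode('utf-8')).hexdigest()[:7]

# Using the sample values from above:
print(wechat_db_password('355757010762041', '857749862'))
```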

Now we reach the core step: using SQLCipher to export the chat text!

Click File - Open Database and select the database file we just copied out; a dialog will ask for the password. Enter the seven-character password we just computed to open the database. Select the message table, and there it is: the chat history between you and your partner!

We can export it as a CSV file: File - Export - Table as CSV.

Next, we'll use Python to extract the actual chat text (the content column), as shown below. Although SQLCipher also lets you run a SELECT, it does not let you export the result of a SELECT, which is very inconvenient, so we might as well write the filter ourselves:

import pandas
import sqlite3

conn = sqlite3.connect('chat_log.db')
# Create a new database file named chat_log.db
df = pandas.read_csv('chat_logs.csv', sep=",")
# Read the CSV file we exported in the previous step (change to your own file name)
df.to_sql('my_chat', conn, if_exists='append', index=False)
# Store it in a table named my_chat

conn = sqlite3.connect('chat_log.db')
# Connect to the database
cursor = conn.cursor()
# Get a cursor
cursor.execute('select content from my_chat where length(content)<30')
# Limit content to fewer than 30 characters, because longer rows often
# contain system messages sent by WeChat rather than real chat text
value = cursor.fetchall()
# fetchall returns the filtered results

data = open("Chat record.txt", 'w+', encoding='utf-8')
for i in value:
    data.write(i[0] + '\n')
# Write the filtered results to Chat record.txt

data.close()
cursor.close()
conn.close()
# Close the connection

 

Remember to convert the CSV file's encoding to UTF-8 first, otherwise the script may fail to run:
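A text editor can do the conversion, or you can re-encode the file with a few lines of Python. A sketch assuming the exported CSV is in GBK, the usual ANSI encoding on Chinese Windows (adjust src_encoding if yours differs; the function name and file names are placeholders):

```python
def convert_to_utf8(src_path, dst_path, src_encoding='gbk'):
    # Read the file in its original encoding and write it back out as UTF-8.
    # errors='replace' keeps the conversion from crashing on stray bytes.
    with open(src_path, 'r', encoding=src_encoding, errors='replace') as src:
        text = src.read()
    with open(dst_path, 'w', encoding='utf-8') as dst:
        dst.write(text)
```

For example, convert_to_utf8('chat_logs_gbk.csv', 'chat_logs.csv') would produce the UTF-8 file that the extraction script above reads.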

Step 2: generate a word cloud from the chat data obtained in Step 1

1. Import our chat records and segment each line into words

The chat record is one sentence per line. We need a word-segmentation tool to break each line into an array of words; this is where the jieba segmentation library comes in.

After segmentation we need to remove modal particles, punctuation marks and so on (the stop words), and we also need to add some custom dictionary entries. Pet names between lovers, for example, cannot be recognized by jieba's default dictionary and must be defined by hand. Take "little fool, don't catch a cold": the default segmentation result is

little / fool / don't / catch a cold

If you add "little fool" to the custom dictionary (mywords.txt in our example below), the segmentation becomes

little fool / don't / catch a cold
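For reference, a jieba user dictionary such as mywords.txt is just a plain text file with one entry per line: the word itself, optionally followed by a frequency and a part-of-speech tag separated by spaces. The entries below are only illustrative:

```
little fool
sweetie 5
silly melon 3 n
```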

Now let's segment our chat records:

import jieba
import codecs

def load_file_segment():
    # Read the text file and segment it into words
    jieba.load_userdict("mywords.txt")
    # Load our custom dictionary
    f = codecs.open(u"Chat record.txt", 'r', encoding='utf-8')
    # Open the file
    content = f.read()
    # Read the whole file into content
    f.close()
    # Close the file
    segment = []
    # Holds the segmentation results
    segs = jieba.cut(content)
    # Segment the whole text
    for seg in segs:
        if len(seg) > 1 and seg != '\r\n':
            # Keep a token only if it is longer than one character and not a line break
            segment.append(seg)
    return segment

print(load_file_segment())

 

  2. Count the frequency of each word after segmentation

To make the calculation convenient we need a package called pandas, and to count the occurrences of each word we need a package called numpy. In cmd/terminal, run the following commands to install pandas and numpy:

pip install pandas
pip install numpy
import pandas
import numpy

def get_words_count_dict():
    segment = load_file_segment()
    # Get the segmentation results
    df = pandas.DataFrame({'segment': segment})
    # Convert the word array into a pandas DataFrame
    stopwords = pandas.read_csv("stopwords.txt", index_col=False, quoting=3, sep="\t", names=['stopword'], encoding="utf-8")
    # Load the stop words
    df = df[~df.segment.isin(stopwords.stopword)]
    # Keep only the words that are not stop words
    words_count = df.groupby('segment')['segment'].agg(count=numpy.size)
    # Group by word and count occurrences (older pandas used
    # .agg({"count": numpy.size}), which newer versions no longer accept)
    words_count = words_count.reset_index().sort_values(by="count", ascending=False)
    # reset_index keeps the segment column; sort with the largest counts first
    return words_count

print(get_words_count_dict())
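If you'd rather avoid the pandas dependency for this counting step, the standard library's collections.Counter does the same job. A minimal sketch over the same inputs (a list of segmented words and a set of stop words; the function name is my own):

```python
from collections import Counter

def word_frequencies(segments, stopwords):
    # Count each word, skipping stop words; most frequent first
    counts = Counter(w for w in segments if w not in stopwords)
    return counts.most_common()

print(word_frequencies(['love', 'you', 'love', 'the'], {'the'}))
# → [('love', 2), ('you', 1)]
```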

 

The complete code, wordCloud.py, with detailed comments, is as follows:

import jieba
import numpy
import codecs
import pandas
import matplotlib.pyplot as plt
from imageio import imread
# scipy.misc.imread was removed from newer SciPy; imageio.imread is a drop-in replacement
from wordcloud import WordCloud, ImageColorGenerator

def load_file_segment():
    # Read the text file and segment it into words
    jieba.load_userdict("mywords.txt")
    # Load our custom dictionary
    f = codecs.open(u"Chat record.txt", 'r', encoding='utf-8')
    # Open the file
    content = f.read()
    # Read the whole file into content
    f.close()
    # Close the file
    segment = []
    # Holds the segmentation results
    segs = jieba.cut(content)
    # Segment the whole text
    for seg in segs:
        if len(seg) > 1 and seg != '\r\n':
            # Keep a token only if it is longer than one character and not a line break
            segment.append(seg)
    return segment

def get_words_count_dict():
    segment = load_file_segment()
    # Get the segmentation results
    df = pandas.DataFrame({'segment': segment})
    # Convert the word array into a pandas DataFrame
    stopwords = pandas.read_csv("stopwords.txt", index_col=False, quoting=3, sep="\t", names=['stopword'], encoding="utf-8")
    # Load the stop words
    df = df[~df.segment.isin(stopwords.stopword)]
    # Keep only the words that are not stop words
    words_count = df.groupby('segment')['segment'].agg(count=numpy.size)
    # Group by word and count occurrences
    words_count = words_count.reset_index().sort_values(by="count", ascending=False)
    # reset_index keeps the segment column; sort with the largest counts first
    return words_count

words_count = get_words_count_dict()
# Get the words and their frequencies

bimg = imread('ai.jpg')
# Read the template image that shapes the word cloud
wordcloud = WordCloud(background_color='white', mask=bimg, font_path='simhei.ttf')
# Create the word cloud object: background color, shape mask and font

# If your template image has a transparent background, use these two lines instead:
# bimg = imread('ai.png')
# wordcloud = WordCloud(background_color=None, mode='RGBA', mask=bimg, font_path='simhei.ttf')

words = words_count.set_index("segment").to_dict()
# Turn the words and frequencies into a dictionary
wordcloud = wordcloud.fit_words(words["count"])
# Fit the words and frequencies to the word cloud object
bimgColors = ImageColorGenerator(bimg)
# Generate colors from the template image
plt.axis("off")
# Hide the axes
plt.imshow(wordcloud.recolor(color_func=bimgColors))
# Recolor the word cloud with the template image's colors
plt.show()


Posted on Wed, 04 Dec 2019 14:41:20 -0800 by lansing