Python word clouds [Chinese / English]: a simple introduction for beginners

1. Analysis

To build a word cloud, you need to:

  • Gather the raw material, such as articles
  • Segment the content into words
  • Use a word cloud tool to build the cloud from the segmented content
  • Save the result as an image

2. Main modules required

  • jieba, for Chinese word segmentation
  • wordcloud, for building the word cloud

3. How the modules work

How wordcloud works

  • Text preprocessing
  • Word frequency statistics
  • Rendering the high-frequency words as a colored image
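
As a rough illustration of the first two steps, the sketch below lowercases a text, splits it into words, and counts them using only the standard library. The function name `count_words`, the regular expression, and the tiny stop-word set are assumptions made for this example; wordcloud's own preprocessing differs in detail.

```python
import re
from collections import Counter

# Hypothetical stop-word set, used for this example only.
STOP_WORDS = {"a", "the", "of", "is", "and", "to"}


def count_words(text):
    """Lowercase the text, split it into words, and count the non-stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


counts = count_words("The cat and the hat. The cat sat.")
print(counts.most_common(2))
```

A frequency mapping like this one can also be handed to `WordCloud.generate_from_frequencies` instead of raw text, which skips wordcloud's own counting step.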

How jieba works

  • Chinese word segmentation (several modes are available)

4. English word cloud

English word segmentation and word cloud construction need only the wordcloud module.

The specific implementation is as follows:

from wordcloud import WordCloud

string = 'Importance of relative word frequencies for font-size. With relative_scaling=0, only word-ranks are considered. With relative_scaling=1, a word that is twice as frequent will have twice the size. If you want to consider the word frequencies and not only their rank, relative_scaling around .5 often looks good.'
font = r'C:\Windows\Fonts\FZSTK.TTF'  # Path to a font file on Windows
wc = WordCloud(font_path=font,  # Required for Chinese text; otherwise the characters render as empty boxes
               background_color='white',
               width=1000,
               height=800,
               ).generate(string)
wc.to_file('ss.png')  # Save the image

5. Chinese word segmentation

The specific implementation is as follows:

import jieba
cut = jieba.cut(text)  # text is the string (sentence) you want to segment
string = ' '.join(cut)  # Join the words, separated by spaces

6. Chinese word cloud

A Chinese word cloud needs both the jieba and wordcloud modules.

The specific implementation is as follows:

import jieba
from wordcloud import WordCloud
from PIL import Image
import numpy as np

font = 'hwkt.ttf'
with open('Job demand.txt', 'r', encoding='utf-8') as f:
    content = f.read()
cut = jieba.cut(content)
cut_content = ' '.join(cut)
img = Image.open('22.png')  # Image whose shape the word cloud should take
img_array = np.array(img)  # Convert the image to an array

wc = WordCloud(
    background_color='white',
    mask=img_array,  # Without a mask, a plain rectangular image is generated
    font_path=font  # Chinese text needs a font that contains Chinese glyphs
)
wc.generate_from_text(cut_content)  # Build the word cloud
wc.to_file('new.png')  # Save the image
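
If no mask image such as `22.png` is at hand, a mask can also be built directly as a NumPy array. The sketch below assumes wordcloud's documented convention that pure white (255) entries are masked out and all other entries may be drawn on; the size and radius are arbitrary values chosen for illustration.

```python
import numpy as np

size, radius = 300, 130  # arbitrary values for illustration

# Coordinate grids: y varies down the rows, x across the columns.
y, x = np.ogrid[:size, :size]
center = size // 2

# True for pixels outside the circle; those become white (255) and are masked out.
outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
mask = np.where(outside, 255, 0).astype(np.uint8)

print(mask.shape, mask[center, center], mask[0, 0])
```

The resulting `mask` can be passed as `mask=mask` in place of `img_array`, which should give a circular word cloud.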

7. Results

The English word cloud looks like this:


The Chinese word cloud looks like this:


