Python operation Word and Excel and packaging

Installing the module

# Word action library
pip install docx
# Excel operation Library
pip install openpyxl
# Package exe tool
pip install pyinstaller

Word operation

Reference address: https://python-docx.readthedocs.io/en/latest/

  1. matters needing attention

    • Can only operate on docx format of openxml specification
  2. Word internal structure

# Sometimes, when the data cannot be obtained through the open method, the internal xml structure can be considered for processing
from docx import Document
doc= Document(path)
body_element = doc._body._body
# Show document internal structure
print(body_element.xml) 
  1. Instance: get document directory
#Get the namespace of xml
def xpath_ns(tree):
    "get xml namespace"
    nsmap = dict((k, v) for k, v in tree.nsmap.items() if k)
    return nsmap

doc= Document(path)
body_element = doc._body._body
ns= xpath_ns(body_element)
# Get the node where the directory is located
links = body_element.xpath('./w:p/w:hyperlink')
for link in links:
    # Get the contents of each directory
    runs= [Run(r,None) for r in link.xpath('w:r[@w:rsidRPr]',namespaces=ns)]
    for r in runs:
        # print contents
        print(r.text)
  1. Example: reading table contents
doc= Document(path)
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)
  1. Example: reading paragraph content
doc= Document(path)
for g in doc.paragraphs: 
    for run in g.runs:
        print(run.text)

Excel operation

Reference address: https://openpyxl.readthedocs.io/en/stable/usage.html

  1. Instances: formatting
import openpyxl
from openpyxl.styles import PatternFill, Border, Side, Alignment,Font

# Cell font
bft=Font(name="Meiryo UI",size=9,bold=True)
# Fill pattern
headFill=PatternFill("solid", fgColor="d9d9d9")
# Border line
thin = Side(border_style="thin", color="000000")
# Border fill style
border= Border(top=thin, left=thin, right=thin, bottom=thin)
# Alignment
align= Alignment(horizontal="center", vertical="center")
# Rerouting settings
wraptext= Alignment(vertical="center",wrapText=True)

bk= openpyxl.load_workbook(filename="test.xlsx")
oSheet=bk["test"]   
# Value setting data 
cell=oSheet.cell(row=row,column=col,value="hello world!")
cell.font=bft
cell.fill= headFill
cell.border= border
cell.alignment= align
#cell.alignment= wraptext
# Fixed first three rows and three columns
oSheet.freeze_panes='D4'
bk.save(expath)
bk.close()
  1. Example: automatic column width

The principle is to get the maximum width of each column and set it

import openpyxl

def getMaxLength(max,data):
    "Compare the number of characters to return the maximum value"
    length=len(str(data))
    if length > max:
        return length
    else:
        return max

book= openpyxl.load_workbook(filename="test.xlsx")
sheet=book["test"]   
for col in sheet.columns:
    max_length=0
    for cell in col:
        max_length=getMaxLength(max_length,cell.value)
    adjusted_width = (max_length + 2) * 1.2
    sheet.column_dimensions[col[0].column_letter].width = adjusted_width

Package exe file

Packaging purpose: we can run our programs on computers without python environment

  1. cmd to py file directory
    • Verify that pyinstaller is installed
  2. Perform packaging
PyInstaller test.py --onefile --noconsole
  • --onefile: means to package an exe file
  • --noconsole: no black command window
  1. After execution, the dist and build folders will be generated in the directory

    dist: the exe file in the folder is the exe we need.
    Now it can be executed on computers without Python environment.

Other operations

  1. Get folder file
import glob
files= glob.glob(docxPath+"/*.docx")
  1. String match replacement
import re
re.sub(r'^[0-9,.]*', "", text)
  1. Get file name
import ntpath
name=ntpath.basename(path)
  1. Determine whether the path is a file
from os.path import isfile
isfile(path)

Tags: xml pip Python Excel

Posted on Fri, 29 May 2020 08:06:54 -0700 by luuney