Office Automation 3: Automating Word with Python

1, Preparation before class

Processing Word documents in Python requires the python-docx library. Install it from a terminal with:

  • pip3 install python-docx

Note: you may not work with Word very often, and that is fine, but the invitation project at the end of this lesson is a useful reference. Wedding invitations, salary slips, notices and similar documents do not have to be written one by one (unless, of course, you want to write them by hand to show your sincerity).

Now, down to business.

2, Key points of knowledge

  • The material is organized top-down, following a What - Why - How line of thought.

1. Warm-up: a first look at python-docx

(1) Create a new blank Word document and insert text

# Import library
from docx import Document

# New blank document
doc_1 = Document()

# Add headings (level 0 is the document title; the default level is 1 and valid levels are 0-9)
doc_1.add_heading('New blank document title, level 0', level=0)
doc_1.add_heading('New blank document title, level 1', level=1)
doc_1.add_heading('New blank document title, level 2', level=2)

# New paragraph
paragraph_1 = doc_1.add_paragraph('This is the beginning of the first paragraph\n Please take care!')

# Bold
paragraph_1.add_run('Bold font').bold = True
paragraph_1.add_run('Normal font')

# Italics
paragraph_1.add_run('Italic Font').italic = True

# New paragraph (below the current paragraph)
paragraph_2 = doc_1.add_paragraph('The new second paragraph.')

# New paragraph (inserted above the specified paragraph)
prior_paragraph = paragraph_1.insert_paragraph_before('Paragraph inserted before the first paragraph of text')

# Add page break (flexible typesetting)
doc_1.add_page_break()

# New paragraph (on the new page, after the page break)
paragraph_3 = doc_1.add_paragraph('This is the first paragraph on the second page!')

# Save file (under current directory)
doc_1.save('doc_1.docx')

2. Main part: automating Word with Python

Before we start, we need to understand the page structure of a Word document:

  • Document - Document
  • Paragraph - Paragraph
  • Text block - Run

python-docx treats the whole file as a Document object, whose basic structure is as follows:

  • Each Document contains many Paragraph objects representing "paragraphs", stored in document.paragraphs.
  • Each Paragraph contains many Run objects representing "inline elements", stored in paragraph.runs.

In python-docx, the run is the most basic unit, and the text style within each run is uniform. In other words, when a Document object is built from a .docx file, python-docx splits the text into runs wherever the style changes.

(1) Introduction to the overall page structure

Let's use a small example to tie documents, paragraphs and text blocks together:

# Import library
from docx import Document
from docx.shared import RGBColor, Pt,Inches,Cm
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
from docx.oxml.ns import qn

# New document (Datawhale)
doc_1 = Document()

# Font settings (global)
'''Changing font.name alone is not enough for East Asian text; you also need to call set() on _element.rPr.rFonts.'''

doc_1.styles['Normal'].font.name = 'SimSun'  # SimSun (宋体); any font installed on your system works
doc_1.styles['Normal']._element.rPr.rFonts.set(qn('w:eastAsia'), 'SimSun')

# Add a heading (level 0 is the document title; the default level is 1, valid levels are 0-9, and level 0 is underlined automatically)
heading_1 = doc_1.add_heading('Jay Chou', level=0)
heading_1.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER   # Center alignment (left alignment is the default)

# New paragraph
paragraph_1 = doc_1.add_paragraph()
'''
Paragraph format: first line indented 0.75 cm, left aligned, 1.0 inch of space after the paragraph, 1.5 times line spacing.
'''
paragraph_1.paragraph_format.first_line_indent = Cm(0.75)
paragraph_1.paragraph_format.alignment = WD_PARAGRAPH_ALIGNMENT.LEFT
paragraph_1.paragraph_format.space_after = Inches(1.0)
paragraph_1.paragraph_format.line_spacing = 1.5

# Adjacent string literals inside parentheses are joined into one string
text = ('Jay Chou is a Chinese pop singer from Taiwan, China: a musician, composer, lyricist, '
        'producer and director, and one of the bosses of JVR Music. '
        'In recent years he has also moved into film. He is regarded as the most revolutionary and '
        'influential singer-songwriter in Asian pop music since 2000, and is known as the "king of Asian pop". '
        'He broke through the themes and forms of earlier Asian music, blending many kinds of musical material '
        'into a changeable style, and is best known for hip-hop and R&B that fuse Chinese and Western music. '
        'He can be called a pioneer of the "Chinese style" in Chinese pop music. His arrival '
        'broke the long stagnation of Asian pop music and opened a new chapter for it!')

r_1 = paragraph_1.add_run(text)
r_1.font.size = Pt(10)    # Font size
r_1.font.bold = True       # Bold
r_1.font.color.rgb = RGBColor(255, 0, 0)      # Color

# Save file (under current directory)
doc_1.save('Jay Chou.docx')

From the example above we can see that the smallest object we operate on is the text block: properties such as font size and color are set on the run. One level up, the format of a paragraph is set through paragraph_format.

(2) Font settings

In (1), you may have noticed that the font was set globally. What if we want different fonts in different parts of the document? In that case we define the font styles first and then apply them where they are needed.

'''Font setting 1.py'''
# Import library
from docx import Document
from docx.oxml.ns import qn
from docx.enum.style import WD_STYLE_TYPE

document = Document()  # New docx document

# Define a SimSun (宋体) character style
style_font = document.styles.add_style('SimSun', WD_STYLE_TYPE.CHARACTER)
style_font.font.name = 'SimSun'
document.styles['SimSun']._element.rPr.rFonts.set(qn('w:eastAsia'), 'SimSun')

# Define a KaiTi (楷体) character style
style_font = document.styles.add_style('KaiTi', WD_STYLE_TYPE.CHARACTER)
style_font.font.name = 'KaiTi'
document.styles['KaiTi']._element.rPr.rFonts.set(qn('w:eastAsia'), 'KaiTi')

# Define an STZhongsong (华文中宋) character style; substitute any installed font
style_font = document.styles.add_style('STZhongsong', WD_STYLE_TYPE.CHARACTER)
style_font.font.name = 'STZhongsong'
document.styles['STZhongsong']._element.rPr.rFonts.set(qn('w:eastAsia'), 'STZhongsong')

paragraph1 = document.add_paragraph()  # Add paragraph
run = paragraph1.add_run('aBCDefg This is Chinese', style='SimSun')  # Apply the SimSun style

font = run.font  # Font object of this run
font.name = 'Cambria'  # Set the Western font
paragraph1.add_run('aBCDefg This is Chinese', style='KaiTi').font.name = 'Cambria'
paragraph1.add_run('aBCDefg This is Chinese', style='STZhongsong').font.name = 'Cambria'

document.save('Font setting 1.docx')

'''Font setting 2.py'''
# Import library
from docx import Document
from docx.oxml.ns import qn
from docx.enum.style import WD_STYLE_TYPE

# Define a font-setting function
def font_setting(doc, text, font_cn):
    '''Add text as a new paragraph, using a new character style named after font_cn.'''
    style_add = doc.styles.add_style(font_cn, WD_STYLE_TYPE.CHARACTER)
    style_add.font.name = font_cn
    doc.styles[font_cn]._element.rPr.rFonts.set(qn('w:eastAsia'), font_cn)
    par = doc.add_paragraph()
    par.add_run(text, style=font_cn)

doc = Document()
a = 'Children, do you have many question marks'
b = 'Why are people reading comics there'
c = "I'm learning to draw and talk to the piano"

font_setting(doc, a, 'SimSun')       # 宋体
font_setting(doc, b, 'STZhongsong')  # 华文中宋
font_setting(doc, c, 'SimHei')       # 黑体

doc.save('Font setting 2.docx')

It is easy to see that Font setting 1.py and Font setting 2.py differ in whether the text goes into the same paragraph, and that Font setting 2.py wraps the work in a custom function. Choose whichever fits the scenario in your actual work.

(3) Insert pictures and tables

#Import library
from docx import Document
from docx.shared import Inches

# Open the document
doc_1 = Document('Jay Chou.docx')   # Document saved by the script above

# Add a picture
doc_1.add_picture('Jay Chou.jpg', width=Inches(1.0), height=Inches(1.0))

# Create a table with 2 rows and 1 column
table1 = doc_1.add_table(rows=2, cols=1)
table1.style = 'Medium Grid 1 Accent 1'  # Many table styles are available, e.g. 'Light Shading Accent 1'

# Set the content of the cell in row 1, column 1 to 'Yingkou'
table1.cell(0, 0).text = 'Yingkou'
# Set the content of the cell in row 2, column 1 to 'the people'
table1.rows[1].cells[0].text = 'the people'

# Add a new row at the bottom of the table
row_cells = table1.add_row().cells
# Add content to the first column of the new row
row_cells[0].text = 'come on.'

doc_1.save('Jay Chou cheers for Yingkou.docx')

(4) Set header and footer

In python-docx, headers and footers are set through the header and footer objects of each section.

from docx import Document
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT

document = Document() # New document

header = document.sections[0].header  # Get the header of the first section
print('Default number of paragraphs in header:', len(header.paragraphs))
paragraph = header.paragraphs[0]  # Get the first paragraph of the header
paragraph.add_run('This is the header of the first section')  # Add header content
footer = document.sections[0].footer  # Get the footer of the first section
paragraph = footer.paragraphs[0]  # Get the first paragraph of the footer
paragraph.add_run('This is the footer of the first section')  # Add footer content


'''Add two more sections to the document, for three sections in total. By default their headers and footers are "same as previous".
To stop using the content and style of the previous section, set header.is_linked_to_previous or footer.is_linked_to_previous to False,
which releases "link to previous" for that header or footer.'''
document.add_section()  # Add section 2
document.add_section()  # Add section 3

# Section 2: header and footer, center aligned
header = document.sections[1].header  # Get the header of section 2
header.is_linked_to_previous = False  # Do not use the content and style of the previous section
paragraph = header.paragraphs[0]
paragraph.add_run('This is the header of section 2')
paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER  # Center the header
footer = document.sections[1].footer  # Get the footer of section 2 (not the section 1 footer)
footer.is_linked_to_previous = False
footer.paragraphs[0].add_run('This is the footer of section 2')  # Add section 2 footer
footer.paragraphs[0].alignment = WD_PARAGRAPH_ALIGNMENT.CENTER  # Center the section 2 footer

# Section 3: header and footer, right aligned
header = document.sections[2].header  # Get the header of section 3
header.is_linked_to_previous = False  # Do not use the content and style of the previous section
paragraph = header.paragraphs[0]  # Get the paragraph in the header
paragraph.add_run('This is the header of section 3')
paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.RIGHT  # Right-align the header
footer = document.sections[2].footer  # Get the footer of section 3
footer.is_linked_to_previous = False
footer.paragraphs[0].add_run('This is the footer of section 3')  # Add section 3 footer
footer.paragraphs[0].alignment = WD_PARAGRAPH_ALIGNMENT.RIGHT  # Right-align the section 3 footer
document.save('Header footer 1.docx')  # Save document

Output:

Default number of paragraphs in header: 1

The results are as follows:

(5) Code extension

'''Alignment settings'''
from docx.enum.text import WD_ALIGN_PARAGRAPH
# LEFT: align left
# CENTER: centered
# RIGHT: align right
# JUSTIFY: justified (aligned at both ends)

'''Set paragraph line spacing'''
from docx.enum.text import WD_LINE_SPACING
from docx.shared import Pt
# SINGLE: single line spacing (default)
# ONE_POINT_FIVE: 1.5 times line spacing
# DOUBLE: double line spacing
# AT_LEAST: minimum value
# EXACTLY: fixed value
# MULTIPLE: multiple line spacing

# paragraph_format here stands for the paragraph_format attribute of a paragraph
paragraph_format.line_spacing_rule = WD_LINE_SPACING.EXACTLY  # Fixed value
paragraph_format.line_spacing = Pt(18)  # Fixed value of 18 pt
paragraph_format.line_spacing_rule = WD_LINE_SPACING.MULTIPLE  # Multiple line spacing
paragraph_format.line_spacing = 1.75  # 1.75 times line spacing

'''Set font properties'''
from docx.shared import RGBColor, Pt
# all_caps: all capital letters
# bold: bold
# color: font color
# double_strike: double strikethrough
# hidden: hidden
# imprint: imprint
# italic: italic
# name: font
# shadow: shadow
# strike: strikethrough
# subscript: subscript
# superscript: superscript
# underline: underline
---------------------------------------------------------------------------

3, Project practice

1, Requirements

As a company executive, you want to invite partners to attend a company meeting.

The list of participants is as follows:

The proposed invitation format is as follows:

Using the list of participants, generate the invitations in batch with Python.

2, Requirements analysis

The logic is relatively simple:

  • Read each row of the Excel file and extract the parameters, then fill them into the invitation and write it out.
  • Design the Word paragraph, font and other styles of the invitation.
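The row-to-parameters step can be sketched without Excel or Word at all, which makes it easy to check on its own. In the sketch below the helper name rows_to_invitations and the sample rows are hypothetical; the column order (company, office, name, date) follows the code in the next part, and the rows are plain tuples standing in for the values read from the worksheet.

```python
from datetime import datetime


def rows_to_invitations(rows):
    """Skip the header row and turn each data row into a dict of parameters."""
    invitations = []
    for company, office, name, date in rows[1:]:
        # Dates may arrive as datetime objects or as text like '2021-10-27 00:00:00'
        if isinstance(date, datetime):
            date_text = date.strftime('%Y-%m-%d')
        else:
            date_text = str(date).split()[0]
        invitations.append(
            {'company': company, 'office': office, 'name': name, 'date': date_text}
        )
    return invitations


# Made-up sample data: first row is the header, the rest are participants
sample_rows = [
    ('company', 'office', 'name', 'date'),
    ('Example Co.', 'Manager', 'Alice', datetime(2021, 10, 27)),
    ('Sample Ltd.', 'Director', 'Bob', '2021-10-27 00:00:00'),
]

invitations = rows_to_invitations(sample_rows)
for item in invitations:
    print(item)
```

Keeping this step separate from the document layout means the styling code only ever sees clean, already-formatted values.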

3, Code

# Import library
from openpyxl import load_workbook
from docx import Document
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
from docx.shared import RGBColor, Pt,Inches,Cm
from docx.oxml.ns import qn


path = r'D:\study\word automation'
# path is the folder that contains the Excel file; change it to match your setup

workbook = load_workbook(path + r'\excel reach word.xlsx')
sheet = workbook.active   # Default worksheet

n = 0   # Row counter, used to skip the header (the first row of the Excel sheet)
for row in sheet.rows:
    if n:
        company = row[0].value
        office = row[1].value
        name = row[2].value
        date = str(row[3].value).split()[0]
        print(company, office, name, date)


        doc = Document()
        heading_1 = 'Invitation'
        paragraph_1 = doc.add_heading(heading_1, level=1)
        # Center alignment
        paragraph_1.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER
        # Set a larger font size for the heading runs
        for run in paragraph_1.runs:
            run.font.size = Pt(17)

        greeting_word_1 = 'Dear '
        greeting_word_2 = ' company '
        greeting_word_3 = ', hello!'
        paragraph_2 = doc.add_paragraph()

        paragraph_2.add_run(greeting_word_1)
        r_1 = paragraph_2.add_run(company)
        r_1.font.bold = True  # Bold
        r_1.font.underline = True    #Underline

        paragraph_2.add_run(greeting_word_2)

        r_2 = paragraph_2.add_run(office)
        r_2.font.bold = True  # Bold
        r_2.font.underline = True    #Underline

        r_3 = paragraph_2.add_run(name)
        r_3.font.bold = True  # Bold
        r_3.font.underline = True    #Underline
        paragraph_2.add_run(greeting_word_3)

        paragraph_3 = doc.add_paragraph()
        paragraph_3.add_run("We sincerely invite you to attend the DataWhale open source 2050 event, held at the Bird's Nest in Beijing on October 27, 2021. We look forward to your participation.")
        paragraph_3.paragraph_format.first_line_indent = Cm(0.75)
        paragraph_3.paragraph_format.alignment = WD_PARAGRAPH_ALIGNMENT.LEFT
        paragraph_3.paragraph_format.space_after = Inches(1.0)
        paragraph_3.paragraph_format.line_spacing = 1.5

        paragraph_4 = doc.add_paragraph()
        date_word_1 = 'Invitation time: '
        paragraph_4.add_run(date_word_1)
        year, month, day = date.split('-')
        sign_date = '{}/{}/{}'.format(year, month, day)
        paragraph_4.add_run(sign_date).underline = True
        paragraph_4.alignment = WD_PARAGRAPH_ALIGNMENT.RIGHT
        
        # Set the font and color for the whole document
        for paragraph in doc.paragraphs:
            for run in paragraph.runs:
                run.font.color.rgb = RGBColor(0, 0, 0)
                run.font.name = 'KaiTi'  # KaiTi (楷体); any installed font works
                r = run._element.rPr.rFonts
                r.set(qn('w:eastAsia'), 'KaiTi')
        # Raw string so the backslash is kept as a path separator
        doc.save(path + r'\{}-invitation.docx'.format(name))
    n = n + 1

Keywords: Python doc

Added by allenmak on Sat, 29 Jan 2022 05:34:34 +0200