Python office automation: batch Word to PDF, appending a blank page to odd-page PDFs, and merging PDFs

Transferred from:

https://blog.csdn.net/m0_48010654/article/details/112605971

Preface
1, Fundamentals of office automation
1. Batch processing - import os
2. Batch processing - generate file list
3. Batch processing - loop statements
2, Batch Word to PDF
3, Insert blank PDF for odd pages
4, Merge PDF
Preface
My main reason for learning Python is office automation for everyday office work. The problems to solve are batch converting Word to PDF, merging PDFs, appending a blank page to PDFs with an odd page count (so that every document starts on a fresh sheet when the merged PDF is printed double-sided), inserting Excel content into Word, batch generating weekly reports, and so on.
As a beginner I have not fully mastered the material and have not yet learned how to package the code into a ready-to-use tool, so only part of the code is attached. The code below is what has worked well for me. The Python version is 3.8.
Special note: the code was collected online and works well after small modifications. Thanks to the bloggers who wrote the original code.
I have also used Python to insert Excel content into Word to generate files in batches, and to convert scanned images to PDF for text recognition. For reasons of space these are not listed for now. Thanks to the predecessors on CSDN.

1, Fundamentals of office automation
Office automation mainly relies on tools for processing Word, Excel and PDF files, plus a way to run them in batches.
The first step of batch processing is to use the os module to point at the files to process or the folder that holds them. The second step is a loop statement. The modules used in the code can be installed with pip install <module name> --index-url https://pypi.douban.com/simple. The --index-url option downloads from a mirror, which is much faster and avoids download errors.

1. Batch processing - import os
import os
os.getcwd() -- returns the current working directory (it takes no arguments).
os.chdir(r'c: --- ') -- changes the current working directory. Remember the "r" prefix (a raw string) so that backslashes in the path are not treated as escape characters.
os.walk(path) -- traverses the whole tree under path, yielding a (root, dirs, files) tuple for each directory.
os.listdir(path) -- lists all entries directly inside path (files in subfolders are not included).
You can filter PDF files using any of the following:
1. os.path.splitext(file)[1] == ".pdf" (splitext separates the file name from its extension).
2. file.endswith(".pdf")
3. file.split(".")[1] == "pdf" (only reliable when the name contains exactly one dot).
Further reading: [Python] The difference between os.path.splitext() and os.path.split()
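
To make the difference between os.path.split() and os.path.splitext() concrete, here is a small sketch. The path and the file names are made up for illustration, and the is_pdf helper is not part of the original post; it only shows one way to make the extension check case-insensitive.

```python
import os

path = os.path.join("reports", "2021.q1.summary.pdf")  # hypothetical example path

# os.path.split() cuts at the last path separator: (folder, file name)
folder, name = os.path.split(path)

# os.path.splitext() cuts at the last dot: (everything before it, extension)
stem, ext = os.path.splitext(name)


def is_pdf(filename):
    """Return True when the extension is .pdf, ignoring case."""
    return os.path.splitext(filename)[1].lower() == ".pdf"


names = ["a.pdf", "B.PDF", "notes.pdf.txt", "2021.q1.summary.pdf", "pdf"]
pdf_names = [n for n in names if is_pdf(n)]

print(folder, name, stem, ext)
print(pdf_names)
```

Because splitext looks only at the last dot, a name such as 2021.q1.summary.pdf is still recognised, while split(".")[1] would return "q1" for it.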

2. Batch processing - generate file list
Define a function that builds a list of full file paths: os.walk traverses the folder, os.path.join() combines each directory with the file name, and an if clause keeps only the names that end with ".pdf".

import os

def getFileName(filedir):
    # Walk filedir recursively and collect the full path of every PDF file
    file_list = [os.path.join(root, filespath)
                 for root, dirs, files in os.walk(filedir)
                 for filespath in files
                 if filespath.lower().endswith('.pdf')
                 ]
    return file_list
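
The same file list can also be built with pathlib. The following is only an alternative sketch, not the code the post relies on: list_pdfs is a name chosen here for illustration, and the temporary folder exists just to show the function running.

```python
import tempfile
from pathlib import Path


def list_pdfs(folder):
    """Return the sorted full paths of all PDF files under folder (recursive)."""
    return sorted(
        str(p.resolve())
        for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


# Small self-check in a throwaway folder
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for rel in ["a.pdf", "b.docx", "sub/c.PDF", "sub/notpdf"]:
        (root / rel).write_bytes(b"")
    found = [Path(p).name for p in list_pdfs(root)]

print(found)
```

Sorting the result gives a stable order from run to run, which os.walk on its own does not promise.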


3. Batch processing - loop statements
for i in range(n):
    todo(i)  # the operation to repeat
or
for item in items:
    todo(item)  # the operation to apply to each item
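
As a runnable version of the two loop forms, the sketch below walks over a made-up list of file names, once by index and once item by item. The file names are placeholders only.

```python
files = ["a.docx", "b.docx", "c.pdf"]  # hypothetical file names

# Form 1: loop over the indexes produced by range()
by_index = []
for i in range(len(files)):
    by_index.append((i, files[i]))

# Form 2: loop directly over the items; enumerate() supplies the index if needed
by_item = []
for i, name in enumerate(files):
    by_item.append((i, name))

print(by_index == by_item)
```

Both loops visit the same items in the same order; the second form is usually preferred when the index is not needed for anything else.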

2, Batch Word to PDF
The docx2pdf module is used; it is more concise and raises fewer errors than win32. The folder path is read with input(), so backslashes in it are taken literally and there is no need to double them as "\\". docx2pdf drives Microsoft Word, so Word must be installed.
map and a lambda function build the full path of each file in the folder.

# Word to PDF
# pip install --user -i https://pypi.tuna.tsinghua.edu.cn/simple/ docx2pdf

from docx2pdf import convert
import os

# Folder containing the files to convert
director = input("Please enter the path of the folder to convert: ")
FileList = map(lambda x: os.path.join(director, x), os.listdir(director))
for file in FileList:
    if file.lower().endswith((".docx", ".doc")):
        print(file)
        try:
            # splitext strips only the extension, even if the path contains other dots
            convert(file, os.path.splitext(file)[0] + ".pdf")
        except Exception as e:
            print('could not convert:', e)
print("finished")


Reference link: https://blog.csdn.net/cqcre/article/details/107218349
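
One way to make the batch conversion easier to check is to separate the planning step (which files go where) from the conversion itself. The sketch below is an illustration under assumptions, not the author's code: plan_conversions and convert_folder are hypothetical helper names, only .docx files are selected, and Word's temporary lock files (names starting with "~$") are skipped.

```python
import os


def plan_conversions(folder, names):
    """Return (source, target) path pairs for the .docx files in names."""
    pairs = []
    for name in names:
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".docx" or name.startswith("~$"):
            continue  # skip other file types and Word's temporary lock files
        pairs.append((os.path.join(folder, name),
                      os.path.join(folder, stem + ".pdf")))
    return pairs


def convert_folder(folder):
    """Convert every .docx in folder; needs docx2pdf and Microsoft Word."""
    # Imported here so that the planning step can be tried without docx2pdf
    from docx2pdf import convert
    for src, dst in plan_conversions(folder, sorted(os.listdir(folder))):
        convert(src, dst)


# Planning only: no file is touched and nothing is converted here
pairs = plan_conversions("docs", ["report.v2.docx", "~$report.v2.docx", "notes.txt"])
print(pairs)
```

Calling convert_folder(director) would then perform the actual conversion for the folder entered by the user.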

3, Insert blank PDF for odd pages
Code source: "Using Python to process PDF: add a blank page at the end of odd-page PDFs".
The code is used essentially as found; only the file paths are now read through input().

# Append a blank page to PDFs with an odd page count
# Written for PyPDF2 versions before 3.0 (PdfFileReader and PdfFileWriter were removed in 3.0)
import os, PyPDF2
pathofcwd = input("Please enter the path of the folder containing the PDFs to process: ")
# ^Folder containing the PDFs to process
class pdfReader:
    # ^All the PDF-handling code lives in this class
    blankPdfPath = input("Please enter the path of the blank PDF file: ")
    # ^Location of the blank-page PDF (a relative path is resolved against the folder above)
    def __init__(self,pdfPath):
        self.pdfPath = pdfPath
        self.blankPageFile, self.blankPage = self.openAndReadit(self.blankPdfPath)
        self.pdfFile, self.pdfReader = self.openAndReadit(self.pdfPath)
    
    def openAndReadit(self,pdfpath):
        """
        generate the pdfReader object for given path in parameter
        """
        pdfFile = open(pdfpath, 'rb')
        pdfReader = PyPDF2.PdfFileReader(pdfFile)
        return (pdfFile,pdfReader)

    def appendBlank(self):
        """
        no para, return a pdf writer with blankPage appended
        """
        pdfWriter = PyPDF2.PdfFileWriter()
        for pageNum in range(self.pdfReader.numPages):
            pageObj = self.pdfReader.getPage(pageNum)
            pdfWriter.addPage(pageObj)
        # add the blank page:
        pdfWriter.addPage(self.blankPage.getPage(0))
        return pdfWriter
    
    def closeAllFile(self):
        self.blankPageFile.close()
        self.pdfFile.close()

os.chdir(pathofcwd)
fileList = os.listdir()

pdfList = filter(
    lambda e:os.path.splitext(e)[1].lower()=='.pdf',
    fileList
)
# ^Keep only the PDF files

pdfReaderList = map(
    lambda e:pdfReader(e),
    pdfList
)
# ^Build a pdfReader object for each PDF path

pdfReaderList = filter(
    lambda e: e.pdfReader.numPages % 2 == 1,
    pdfReaderList
)
# ^Keep only the PDFs with an odd number of pages

pdfReaderList = list(pdfReaderList)

for reader in pdfReaderList:
    # ^Write each padded PDF next to the original as <name>_addBlank.pdf
    pdfAddBlankWriter = reader.appendBlank()
    outputPath = os.path.splitext(reader.pdfPath)[0]+'_addBlank'+'.pdf'
    with open(outputPath,'wb') as pdfOutputFile:
        pdfAddBlankWriter.write(pdfOutputFile)
    reader.closeAllFile()
    print("output written to: %s" % outputPath)
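
For readers on a current installation, the same idea can be sketched with pypdf, the successor of PyPDF2. This is a swapped-in alternative and not the code from the source post. It assumes pypdf is installed, and pad_to_even and blank_pages_needed are names invented here. With pypdf no separate blank PDF is needed, because add_blank_page() without a size reuses the size of the last page.

```python
def blank_pages_needed(page_count):
    """Return 1 for an odd page count and 0 for an even one."""
    return page_count % 2


def pad_to_even(src_path, dst_path):
    """Copy src_path to dst_path, appending one blank page if the count is odd."""
    # Assumption: pypdf is installed (pip install pypdf)
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    if blank_pages_needed(len(reader.pages)):
        # No width/height given: the blank page takes the size of the last page
        writer.add_blank_page()
    with open(dst_path, "wb") as fh:
        writer.write(fh)


print([blank_pages_needed(n) for n in (0, 1, 2, 7)])
```

A call such as pad_to_even("report.pdf", "report_addBlank.pdf") (file names are placeholders) would then write the padded copy next to the original.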


4, Merge PDF
The only change to the original code is that the folder path and the output file name are read through input().
Code source: "Merging PDF files with Python"

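# -*- coding:utf-8 -*-
# Use the PyPDF2 module to merge all PDF files in the same folder
# Written for PyPDF2 versions before 3.0, where PdfFileReader and PdfFileWriter are still available
# The only values to supply are the folder that holds the PDF files (file_dir) and the output file name (outfile)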

import os
from PyPDF2 import PdfFileReader, PdfFileWriter
import time

# Use the walk function of the os module to search the given directory, including subfolders
# Return the full path of every PDF file found
def getFileName(filedir):

    file_list = [os.path.join(root, filespath)
                 for root, dirs, files in os.walk(filedir)
                 for filespath in files
                 if filespath.lower().endswith('.pdf')
                 ]
    return file_list

# Merge all PDF files in the same directory
def MergePDF(filepath, outfile):

    output = PdfFileWriter()
    outputPages = 0
    pdf_fileName = getFileName(filepath)
    open_files = []  # source files must stay open until the output is written

    if pdf_fileName:
        for pdf_file in pdf_fileName:
            print("Path: %s" % pdf_file)

            # Read the source PDF file
            source = open(pdf_file, "rb")
            open_files.append(source)
            reader = PdfFileReader(source)

            # Get the total number of pages in the source PDF file
            pageCount = reader.getNumPages()
            outputPages += pageCount
            print("Number of pages: %d" % pageCount)

            # Add each page to the output
            # A narrower range() here would select only some of the pages
            for iPage in range(pageCount):
                output.addPage(reader.getPage(iPage))

        print("Total pages after merging: %d." % outputPages)
        # Write to the target PDF file
        with open(os.path.join(filepath, outfile), "wb") as outputStream:
            output.write(outputStream)
        for source in open_files:
            source.close()
        print("PDF merge complete!")

    else:
        print("There are no PDF files to merge!")

# Main function
def main():
    file_dir = input("Please enter the path of the folder containing the source PDFs: ") # The folder where the source PDFs are stored
    outfile = input("Please enter the output file name: ") # The name of the merged PDF file
    time1 = time.time()  # start timing after the prompts so typing time is not counted
    MergePDF(file_dir, outfile)
    time2 = time.time()
    print('Total time: %s s.' % (time2 - time1))

main()
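
Two details of the merge are worth a sketch: os.walk returns files in no particular order, and because the merged file is written into the same folder, a second run would pick up the previous output as an input. The helpers below are hypothetical (natural_key and merge_order are not part of the original script) and show one possible way to order the inputs by the numbers in their names and to leave out the output file.

```python
import os
import re


def natural_key(path):
    """Sort key that compares runs of digits in the file name as numbers."""
    name = os.path.basename(path).lower()
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


def merge_order(pdf_paths, outfile_path):
    """Return the inputs in natural order, leaving out the output file itself."""
    out = os.path.abspath(outfile_path)
    return sorted(
        (p for p in pdf_paths if os.path.abspath(p) != out),
        key=natural_key,
    )


ordered = merge_order(["10.pdf", "2.pdf", "merged.pdf", "1.pdf"], "merged.pdf")
print(ordered)
```

Inside MergePDF, the list returned by getFileName(filepath) could be passed through merge_order together with os.path.join(filepath, outfile) before the loop starts.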


Code sources
[1] Word to PDF
[2] Using Python to process PDF: add a blank page at the end of odd-page PDFs
[3] Merging PDF files with Python
--------
Copyright notice: This article is the original article of CSDN blogger "echo silent moan", which follows the CC 4.0 BY-SA copyright agreement. Please attach the original source link and this notice for reprint.
Original link: https://blog.csdn.net/m0_48010654/article/details/112605971

Added by john_6767 on Thu, 17 Feb 2022 17:17:02 +0200