Python 3 advanced office automation reading and writing Excel

Python 3 advanced office automation reading and writing Excel

Click here to learn the following while watching the video explanation

We often need to import data into excel, export data from excel, and process data in Excel.

If the amount of data to be processed is large, manual operation is very time-consuming.

We can automate Excel data processing through Python programs, which can help us save a lot of time.

Read data from Excel

Click here to learn the following while watching the video explanation

If we just want to} read the data in Excel files for processing, we can use} xlrd} this library.

First, we install the xlrd library and execute the following command

pip install xlrd==1.2.0

Note: the new version of xlrd only supports xls format, so we specify to install 1.2 0 old version, can support xlsx format.

Please Click here to download the Excel file include xlsx

There are three forms in this document, which record the monthly income in 2018, 2017 and 2016 respectively, as shown below

If we want to use the program to calculate the sum of all monthly incomes in 2016, 2017 and 2018, but do not include those months marked with an asterisk.

How do you do it?

Step by step, we first learn how to read the contents of Excel cells with a Python program.

Open in xlrd Library_ The workbook function opens an Excel file and returns a workbook object that represents the open Excel file.

You can get a lot of information about the Excel file through this Book object, such as the number of sheets in the Excel file and the names of all sheets.

We can use the following code to read the number and name of forms in the file:

import xlrd

book = xlrd.open_workbook("income.xlsx")

print(f"Number of forms included {book.nsheets}")
print(f"The names of the forms are: {book.sheet_names()}")

To read the data in the cells of a form, you must first obtain the} sheet object.

You can obtain form objects according to the form index or form name, using the following corresponding methods

# The form index starts from 0 and gets the first form object
book.sheet_by_index(0)

# Gets the form object named 2018
book.sheet_by_name('2018')

# Get all the form objects, put them into a list and return
book.sheets()

Description of all properties and methods of the form object. You can Click here to view the official documents

After obtaining the form object, you can get the following information according to its properties:

Number of form rows( nrows)
Number of columns( ncols)
Form name( name)
Form index( number)

as follows

import xlrd

book = xlrd.open_workbook("income.xlsx")

sheet = book.sheet_by_index(0)
print(f"Form name:{sheet.name} ")
print(f"Form index:{sheet.number}")
print(f"Number of form lines:{sheet.nrows}")
print(f"Number of form columns:{sheet.ncols}")

After obtaining the form object, you can use the {cell_value} method. The parameters are row number and column number to read the text content in the specified cell. It is as follows:

import xlrd

book = xlrd.open_workbook("income.xlsx")

sheet = book.sheet_by_index(0)

# Row and column numbers are calculated from 0
print(f"Cell A1 The content is: {sheet.cell_value(rowx=0, colx=0)}")

Operation result output

Cell A1 The content is: month

You can also use the # row_values # method with the line number as the parameter to read the contents of all cells in the specified line, store them in a list and return them.

As follows:

import xlrd

book = xlrd.open_workbook("income.xlsx")

sheet = book.sheet_by_index(0)

# Row and column numbers are calculated from 0
print(f"The first line is: {sheet.row_values(rowx=0)}")

Operation result output

The first line is: ['month', 'income']

You can also use the # col_values # method with the parameter column number to read the contents of all cells in the specified column, store them in a list and return them.

As follows:

import xlrd

book = xlrd.open_workbook("income.xlsx")

sheet = book.sheet_by_index(0)

# Row and column numbers are calculated from 0
print(f"The first column is: {sheet.col_values(colx=0)}")

Operation result output

The first column is: ['month', 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]

As you can see, the number is returned as a decimal.

With these methods, we can complete some data processing tasks. For example, we need to calculate the annual income in 2017

import xlrd

book = xlrd.open_workbook("income.xlsx")

sheet = book.sheet_by_name('2017')

# Revenue is in column 2
incomes = sheet.col_values(colx=1,start_rowx=1)

print(f"2017 The annual income is: {sum(incomes)}")

So how can we remove the monthly income with asterisk from the summary income?

We need to find out which months are marked with asterisks. Don't count them.

Refer to the following code

import xlrd

book = xlrd.open_workbook("income.xlsx")

sheet = book.sheet_by_name('2017')

# Revenue is in column 2
incomes = sheet.col_values(colx=1,start_rowx=1)

print(f"2017 The annual book income is: {int(sum(incomes))}")

# Remove the monthly income with asterisk
toSubstract = 0
# Month in column 1
monthes = sheet.col_values(colx=0)

for row,month in enumerate(monthes):
    if type(month) is str and month.endswith('*'):
        income = sheet.cell_value(row,1)
        print(month,income)
        toSubstract += income

print(f"2017 The annual real income is: {int(sum(incomes)- toSubstract)}")

Finally, to get the income for 3 years, you need to get all the sheet objects, use the above calculation method, and finally add the income.

As follows:

import xlrd

book = xlrd.open_workbook("income.xlsx")

# Get all sheet objects
sheets = book.sheets()

incomeOf3years = 0
for sheet in sheets:
    # Revenue is in column 2
    incomes = sheet.col_values(colx=1,start_rowx=1)
    # Remove the monthly income with asterisk
    toSubstract = 0
    # Month in column 1
    monthes = sheet.col_values(colx=0)

    for row,month in enumerate(monthes):
        if type(month) is str and month.endswith('*'):
            income = sheet.cell_value(row,1)
            print(month,income)
            toSubstract += income

    actualIncome = int(sum(incomes)- toSubstract)
    print(f"{sheet.name}The annual real income is: {actualIncome}")
    incomeOf3years += actualIncome

print(f'Total income is{incomeOf3years}')

Create a new Excel and write data

Click here to learn the following while watching the video explanation

xlrd , can only read Excel content. If you want to , create a new Excel and , write , data, you can use the , openpyxl , library.

openpyxl} library can read, write and modify files.

However, the openpyxl library does not support EXCEL documents in xls format in the old version of office 2003. If you want to read and write xls format documents, you can use Excel for corresponding format conversion.

Execute pip install openpyxl to install the library

Click here to view the openpyxl reference document

The following code demonstrates some basic usage of openpyxl.

import openpyxl

# Create an Excel workbook object
book = openpyxl.Workbook()

# When creating, a sheet will be automatically generated and obtained through active
sh = book.active

# Modify the current sheet title to payroll
sh.title = 'payroll'

# Save file
book.save('information.xlsx')

# Add a sheet called 'age table' and put it at the end
sh1 = book.create_sheet('Age table-last')

# Add a sheet and put it at the top
sh2 = book.create_sheet('Age table-First',0)

# Add a sheet and specify it as the second form
sh3 = book.create_sheet('Age table 2',1)

# Get a sheet object by name
sh = book['payroll']

# Write content to the first cell
sh['A1'] = 'Hello'

# Get the contents of a cell
print(sh['A1'].value)

# Write content to the first cell according to the row number and column number,
# Note that unlike xlrd, it starts with 1
sh.cell(2,2).value = 'White moon black feather'

# Get the content of a cell according to the row number and column number
print(sh.cell(1, 1).value)

book.save('information.xlsx')

The following example code writes the contents of the age table saved in the dictionary to the excel file

import openpyxl

name2Age = {
    'Fei Zhang' :  38,
    'Zhao Yun' :  27,
    'Xu Chu' :  36,
    'Dianwei' :  38,
    'Guan Yu' :  39,
    'Huang Zhong' :  49,
    'Xu Huang' :  43,
    'ma chao' :  23,
}

# Create an Excel workbook object
book = openpyxl.Workbook()

# When creating, a sheet will be automatically generated and obtained through active
sh = book.active

sh.title = 'Age table'

# Write title block
sh['A1'] =  'full name'
sh['B1'] =  'Age'

# Write content
row = 2

for name,age in name2Age.items():
    sh.cell(row, 1).value = name
    sh.cell(row, 2).value = age
    row += 1

# Save file
book.save('information.xlsx')

If your data is in a list or tuple, you can use the append method to add a new row at the end of the sheet and write data, such as

import openpyxl

name2Age = [
    ['Fei Zhang' ,  38 ] ,
    ['Zhao Yun' ,  27 ] ,
    ['Xu Chu' ,  36 ] ,
    ['Dianwei' ,  38 ] ,
    ['Guan Yu' ,  39 ] ,
    ['Huang Zhong' ,  49 ] ,
    ['Xu Huang' ,  43 ] ,
    ['ma chao' ,  23 ]
]

# Create an Excel workbook object
book = openpyxl.Workbook()
sh = book.active
sh.title = 'Age table'

# Write title block
sh['A1'] =  'full name'
sh['B1'] =  'Age'

for row in name2Age:
    # Data added to the next row
    sh.append(row)

# Save file
book.save('information.xlsx')

Modify data in Excel

Click here to learn the following while watching the video explanation

If you want to {modify an existing Excel file, you can also use the} openpyxl} library.

Modify cell contents

such as

import openpyxl

# Load excel file
wb = openpyxl.load_workbook('income.xlsx')

# Get sheet object
sheet = wb['2017']

sheet['A1'] = 'Modify it'

## Specify a different file name and save as another file
wb.save('income-1.xlsx')

Insert row, insert column

The insert_rows , and insert_cols , methods of the sheet object are used to insert , rows , and , columns, respectively. For example

import openpyxl

wb = openpyxl.load_workbook('income.xlsx')
sheet = wb['2018']

# Insert line 1 at line 2
sheet.insert_rows(2)

# Insert line 3 at line 3
sheet.insert_rows(3,3)

# Insert column 1 at the position of column 2
sheet.insert_cols(2)

# Insert 3 columns at the position of column 2
sheet.insert_cols(2,3)

## Specify a different file name and save as another file
wb.save('income-1.xlsx')

Delete row, delete column

Delete of sheet object_ Rows and delete_ The cols , method is used to delete , rows , and , columns respectively, for example

import openpyxl

wb = openpyxl.load_workbook('income.xlsx')
sheet = wb['2018']

# Delete line 1 at the position of line 2
sheet.delete_rows(2)

# Delete line 3 at the position of line 3
sheet.delete_rows(3,3)

# Delete column 1 at the position of column 2
sheet.delete_cols(2)

# Delete column 3 at the position of column 3
sheet.delete_cols(3,3)

## Specify a different file name and save as another file
wb.save('income-1.xlsx')

Text color, font and size

The {Font} style (including color, Font, size, underline, etc.) in the cell is set through the} Font} object

import openpyxl
# Import Font objects and colors color constants
from openpyxl.styles import Font,colors

wb = openpyxl.load_workbook('income.xlsx')
sheet = wb['2018']

# Specify the cell font color,
sheet['A1'].font = Font(color=colors.RED, #Use preset color constants
                        size=15,    # Set text size
                        bold=True,  # Set to bold
                        italic=True # Set to Italic
                        )

# You can also use colors represented by RGB numbers
sheet['B1'].font = Font(color="981818")

# Specify the font style of the whole line. Here, the third line is specified
font = Font(color="981818")
for y in range(1, 100): # Columns 1 to 100
    sheet.cell(row=3, column=y).font = font

# Specifies the font style of the whole column. Here, the second column is specified
font = Font(bold=True)
for x in range(1, 100): # Lines 1 to 100
    sheet.cell(row=x, column=2).font = font

wb.save('income-1.xlsx')

Background color

import openpyxl
# Import Font objects and colors color constants
from openpyxl.styles import PatternFill

wb = openpyxl.load_workbook('income.xlsx')
sheet = wb['2018']

# Specifies the background color of a cell
sheet['A1'].fill = PatternFill("solid", "E39191")

# Specifies the background color of the whole line. Here, the second line is specified
fill = PatternFill("solid", "E39191")
for y in range(1, 100): # Columns 1 to 100
    sheet.cell(row=2, column=y).fill = fill

wb.save('income-1.xlsx')

Insert picture

Here is the code to insert the picture

import openpyxl
from openpyxl.drawing.image import Image

wb = openpyxl.load_workbook('income.xlsx')
sheet = wb['2018']

# Insert the picture in line 1, column 4
sheet.add_image(Image('1.png'), 'D1')

## Specify a different file name and save as another file
wb.save('income-1.xlsx')

Excel COM interface

On the Windows platform, you can also operate excel through the COM interface of Excel application.

This method is equivalent to using Python program to modify through Excel application. Of course, there are no side effects

And it can realize some special functions, such as automatically printing Excel, merging cells and so on.

COM interface is characterized by fast file opening and slow reading and writing speed.

Opening {Excel files using Excel COM interface is much faster than the above two libraries. Because of the optimization of EXCEL program itself, it can be partially loaded, and the above two libraries are all read into memory first.

If you only read or modify a small amount of data from a large Excel file, the Excel COM interface will be much faster.

However, if you want to read a large amount of data in large Excel, do not use COM interface, it will be very slow.

To use Excel COM interface, first install pywin32 library, and enter the following command in the command line window:

pip install pywin32

For example, you can modify it like this

import win32com.client
excel = win32com.client.Dispatch("Excel.Application")

# excel.Visible = True     # You can make excel visible

# Fill in the absolute path of the Excel file to be modified here
workbook = excel.Workbooks.Open(r"d:\tmp\income1.xlsx")

# Get 2017 form
sheet = workbook.Sheets('2017')

# Modify the cell contents in the first row and first column of the form
# com interface, cell row number and column number start from 1
sheet.Cells(1,1).Value="Hello"

# Save content
workbook.Save()

# Close the Excel file
workbook.Close()

# excel process exit
excel.Quit()

# Release related resources
sheet = None
book = None
excel.Quit()
excel = None

Run it to find that Excel content can also be modified.

For details on using com interface to operate Excel, you can Click here to refer to Microsoft's official documents

COM interface, xlrd library to open data files and read data performance comparison, you can refer to the following code.

import time

def byCom():
    t1 = time.time()
    import win32com.client
    excel = win32com.client.Dispatch("Excel.Application")

    # excel.Visible = True     # You can make excel visible
    workbook = excel.Workbooks.Open(r"h:\tmp\ruijia\data.xlsx")

    sheet = workbook.Sheets(2)

    print(sheet.Cells(2,15).Value)
    print(sheet.UsedRange.Rows.Count)  #How many lines

    t2 = time.time()
    print(f'open: time consuming{t2 - t1}second')

    total = 0
    for row in range(2,sheet.UsedRange.Rows.Count+1):
        value = sheet.Cells(row,15).Value
        if type(value) not in [int,float]:
            continue
        total += value

    print(total)

    t3 = time.time()
    print(f'Read data: time consuming{t3 - t2}second')


def byXlrd():
    t1 = time.time()
    import xlrd

    # Load excel file
    srcBook = xlrd.open_workbook("data.xlsx")
    sheet = srcBook.sheet_by_index(1)

    print(sheet.cell_value(rowx=1,colx=14))
    print(sheet.nrows) #How many lines

    t2 = time.time()
    print(f'open: time consuming{t2 - t1}second')

    total = 0
    for row in range(1,sheet.nrows):
        value = sheet.cell_value(row, 14)
        if type(value) == str:
            continue
        total += value

    print(total)

    t3 = time.time()
    print(f'Read data: time consuming{t3 - t2}second')

byCom()
byXlrd()

 

Keywords: Python

Added by rostislav on Thu, 09 Dec 2021 14:56:28 +0200