Python 3 advanced office automation reading and writing Excel
Click here to learn the following while watching the video explanation
We often need to import data into excel, export data from excel, and process data in Excel.
If the amount of data to be processed is large, manual operation is very time-consuming.
We can automate Excel data processing through Python programs, which can help us save a lot of time.
Read data from Excel
Click here to learn the following while watching the video explanation
If we just want to} read the data in Excel files for processing, we can use} xlrd} this library.
First, we install the xlrd library and execute the following command
pip install xlrd==1.2.0
Note: the new version of xlrd only supports xls format, so we specify to install 1.2 0 old version, can support xlsx format.
Please Click here to download the Excel file include xlsx
There are three forms in this document, which record the monthly income in 2018, 2017 and 2016 respectively, as shown below
If we want to use the program to calculate the sum of all monthly incomes in 2016, 2017 and 2018, but do not include those months marked with an asterisk.
How do you do it?
Step by step, we first learn how to read the contents of Excel cells with a Python program.
Open in xlrd Library_ The workbook function opens an Excel file and returns a workbook object that represents the open Excel file.
You can get a lot of information about the Excel file through this Book object, such as the number of sheets in the Excel file and the names of all sheets.
We can use the following code to read the number and name of forms in the file:
import xlrd book = xlrd.open_workbook("income.xlsx") print(f"Number of forms included {book.nsheets}") print(f"The names of the forms are: {book.sheet_names()}")
To read the data in the cells of a form, you must first obtain the} sheet object.
You can obtain form objects according to the form index or form name, using the following corresponding methods
# The form index starts from 0 and gets the first form object book.sheet_by_index(0) # Gets the form object named 2018 book.sheet_by_name('2018') # Get all the form objects, put them into a list and return book.sheets()
Description of all properties and methods of the form object. You can Click here to view the official documents
After obtaining the form object, you can get the following information according to its properties:
Number of form rows( nrows) Number of columns( ncols) Form name( name) Form index( number)
as follows
import xlrd book = xlrd.open_workbook("income.xlsx") sheet = book.sheet_by_index(0) print(f"Form name:{sheet.name} ") print(f"Form index:{sheet.number}") print(f"Number of form lines:{sheet.nrows}") print(f"Number of form columns:{sheet.ncols}")
After obtaining the form object, you can use the {cell_value} method. The parameters are row number and column number to read the text content in the specified cell. It is as follows:
import xlrd book = xlrd.open_workbook("income.xlsx") sheet = book.sheet_by_index(0) # Row and column numbers are calculated from 0 print(f"Cell A1 The content is: {sheet.cell_value(rowx=0, colx=0)}")
Operation result output
Cell A1 The content is: month
You can also use the # row_values # method with the line number as the parameter to read the contents of all cells in the specified line, store them in a list and return them.
As follows:
import xlrd book = xlrd.open_workbook("income.xlsx") sheet = book.sheet_by_index(0) # Row and column numbers are calculated from 0 print(f"The first line is: {sheet.row_values(rowx=0)}")
Operation result output
The first line is: ['month', 'income']
You can also use the # col_values # method with the parameter column number to read the contents of all cells in the specified column, store them in a list and return them.
As follows:
import xlrd book = xlrd.open_workbook("income.xlsx") sheet = book.sheet_by_index(0) # Row and column numbers are calculated from 0 print(f"The first column is: {sheet.col_values(colx=0)}")
Operation result output
The first column is: ['month', 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
As you can see, the number is returned as a decimal.
With these methods, we can complete some data processing tasks. For example, we need to calculate the annual income in 2017
import xlrd book = xlrd.open_workbook("income.xlsx") sheet = book.sheet_by_name('2017') # Revenue is in column 2 incomes = sheet.col_values(colx=1,start_rowx=1) print(f"2017 The annual income is: {sum(incomes)}")
So how can we remove the monthly income with asterisk from the summary income?
We need to find out which months are marked with asterisks. Don't count them.
Refer to the following code
import xlrd book = xlrd.open_workbook("income.xlsx") sheet = book.sheet_by_name('2017') # Revenue is in column 2 incomes = sheet.col_values(colx=1,start_rowx=1) print(f"2017 The annual book income is: {int(sum(incomes))}") # Remove the monthly income with asterisk toSubstract = 0 # Month in column 1 monthes = sheet.col_values(colx=0) for row,month in enumerate(monthes): if type(month) is str and month.endswith('*'): income = sheet.cell_value(row,1) print(month,income) toSubstract += income print(f"2017 The annual real income is: {int(sum(incomes)- toSubstract)}")
Finally, to get the income for 3 years, you need to get all the sheet objects, use the above calculation method, and finally add the income.
As follows:
import xlrd book = xlrd.open_workbook("income.xlsx") # Get all sheet objects sheets = book.sheets() incomeOf3years = 0 for sheet in sheets: # Revenue is in column 2 incomes = sheet.col_values(colx=1,start_rowx=1) # Remove the monthly income with asterisk toSubstract = 0 # Month in column 1 monthes = sheet.col_values(colx=0) for row,month in enumerate(monthes): if type(month) is str and month.endswith('*'): income = sheet.cell_value(row,1) print(month,income) toSubstract += income actualIncome = int(sum(incomes)- toSubstract) print(f"{sheet.name}The annual real income is: {actualIncome}") incomeOf3years += actualIncome print(f'Total income is{incomeOf3years}')
Create a new Excel and write data
Click here to learn the following while watching the video explanation
xlrd , can only read Excel content. If you want to , create a new Excel and , write , data, you can use the , openpyxl , library.
openpyxl} library can read, write and modify files.
However, the openpyxl library does not support EXCEL documents in xls format in the old version of office 2003. If you want to read and write xls format documents, you can use Excel for corresponding format conversion.
Execute pip install openpyxl to install the library
Click here to view the openpyxl reference document
The following code demonstrates some basic usage of openpyxl.
import openpyxl # Create an Excel workbook object book = openpyxl.Workbook() # When creating, a sheet will be automatically generated and obtained through active sh = book.active # Modify the current sheet title to payroll sh.title = 'payroll' # Save file book.save('information.xlsx') # Add a sheet called 'age table' and put it at the end sh1 = book.create_sheet('Age table-last') # Add a sheet and put it at the top sh2 = book.create_sheet('Age table-First',0) # Add a sheet and specify it as the second form sh3 = book.create_sheet('Age table 2',1) # Get a sheet object by name sh = book['payroll'] # Write content to the first cell sh['A1'] = 'Hello' # Get the contents of a cell print(sh['A1'].value) # Write content to the first cell according to the row number and column number, # Note that unlike xlrd, it starts with 1 sh.cell(2,2).value = 'White moon black feather' # Get the content of a cell according to the row number and column number print(sh.cell(1, 1).value) book.save('information.xlsx')
The following example code writes the contents of the age table saved in the dictionary to the excel file
import openpyxl name2Age = { 'Fei Zhang' : 38, 'Zhao Yun' : 27, 'Xu Chu' : 36, 'Dianwei' : 38, 'Guan Yu' : 39, 'Huang Zhong' : 49, 'Xu Huang' : 43, 'ma chao' : 23, } # Create an Excel workbook object book = openpyxl.Workbook() # When creating, a sheet will be automatically generated and obtained through active sh = book.active sh.title = 'Age table' # Write title block sh['A1'] = 'full name' sh['B1'] = 'Age' # Write content row = 2 for name,age in name2Age.items(): sh.cell(row, 1).value = name sh.cell(row, 2).value = age row += 1 # Save file book.save('information.xlsx')
If your data is in a list or tuple, you can use the append method to add a new row at the end of the sheet and write data, such as
import openpyxl name2Age = [ ['Fei Zhang' , 38 ] , ['Zhao Yun' , 27 ] , ['Xu Chu' , 36 ] , ['Dianwei' , 38 ] , ['Guan Yu' , 39 ] , ['Huang Zhong' , 49 ] , ['Xu Huang' , 43 ] , ['ma chao' , 23 ] ] # Create an Excel workbook object book = openpyxl.Workbook() sh = book.active sh.title = 'Age table' # Write title block sh['A1'] = 'full name' sh['B1'] = 'Age' for row in name2Age: # Data added to the next row sh.append(row) # Save file book.save('information.xlsx')
Modify data in Excel
Click here to learn the following while watching the video explanation
If you want to {modify an existing Excel file, you can also use the} openpyxl} library.
Modify cell contents
such as
import openpyxl # Load excel file wb = openpyxl.load_workbook('income.xlsx') # Get sheet object sheet = wb['2017'] sheet['A1'] = 'Modify it' ## Specify a different file name and save as another file wb.save('income-1.xlsx')
Insert row, insert column
The insert_rows , and insert_cols , methods of the sheet object are used to insert , rows , and , columns, respectively. For example
import openpyxl wb = openpyxl.load_workbook('income.xlsx') sheet = wb['2018'] # Insert line 1 at line 2 sheet.insert_rows(2) # Insert line 3 at line 3 sheet.insert_rows(3,3) # Insert column 1 at the position of column 2 sheet.insert_cols(2) # Insert 3 columns at the position of column 2 sheet.insert_cols(2,3) ## Specify a different file name and save as another file wb.save('income-1.xlsx')
Delete row, delete column
Delete of sheet object_ Rows and delete_ The cols , method is used to delete , rows , and , columns respectively, for example
import openpyxl wb = openpyxl.load_workbook('income.xlsx') sheet = wb['2018'] # Delete line 1 at the position of line 2 sheet.delete_rows(2) # Delete line 3 at the position of line 3 sheet.delete_rows(3,3) # Delete column 1 at the position of column 2 sheet.delete_cols(2) # Delete column 3 at the position of column 3 sheet.delete_cols(3,3) ## Specify a different file name and save as another file wb.save('income-1.xlsx')
Text color, font and size
The {Font} style (including color, Font, size, underline, etc.) in the cell is set through the} Font} object
import openpyxl # Import Font objects and colors color constants from openpyxl.styles import Font,colors wb = openpyxl.load_workbook('income.xlsx') sheet = wb['2018'] # Specify the cell font color, sheet['A1'].font = Font(color=colors.RED, #Use preset color constants size=15, # Set text size bold=True, # Set to bold italic=True # Set to Italic ) # You can also use colors represented by RGB numbers sheet['B1'].font = Font(color="981818") # Specify the font style of the whole line. Here, the third line is specified font = Font(color="981818") for y in range(1, 100): # Columns 1 to 100 sheet.cell(row=3, column=y).font = font # Specifies the font style of the whole column. Here, the second column is specified font = Font(bold=True) for x in range(1, 100): # Lines 1 to 100 sheet.cell(row=x, column=2).font = font wb.save('income-1.xlsx')
Background color
import openpyxl # Import Font objects and colors color constants from openpyxl.styles import PatternFill wb = openpyxl.load_workbook('income.xlsx') sheet = wb['2018'] # Specifies the background color of a cell sheet['A1'].fill = PatternFill("solid", "E39191") # Specifies the background color of the whole line. Here, the second line is specified fill = PatternFill("solid", "E39191") for y in range(1, 100): # Columns 1 to 100 sheet.cell(row=2, column=y).fill = fill wb.save('income-1.xlsx')
Insert picture
Here is the code to insert the picture
import openpyxl from openpyxl.drawing.image import Image wb = openpyxl.load_workbook('income.xlsx') sheet = wb['2018'] # Insert the picture in line 1, column 4 sheet.add_image(Image('1.png'), 'D1') ## Specify a different file name and save as another file wb.save('income-1.xlsx')
Excel COM interface
On the Windows platform, you can also operate excel through the COM interface of Excel application.
This method is equivalent to using Python program to modify through Excel application. Of course, there are no side effects
And it can realize some special functions, such as automatically printing Excel, merging cells and so on.
COM interface is characterized by fast file opening and slow reading and writing speed.
Opening {Excel files using Excel COM interface is much faster than the above two libraries. Because of the optimization of EXCEL program itself, it can be partially loaded, and the above two libraries are all read into memory first.
If you only read or modify a small amount of data from a large Excel file, the Excel COM interface will be much faster.
However, if you want to read a large amount of data in large Excel, do not use COM interface, it will be very slow.
To use Excel COM interface, first install pywin32 library, and enter the following command in the command line window:
pip install pywin32
For example, you can modify it like this
import win32com.client excel = win32com.client.Dispatch("Excel.Application") # excel.Visible = True # You can make excel visible # Fill in the absolute path of the Excel file to be modified here workbook = excel.Workbooks.Open(r"d:\tmp\income1.xlsx") # Get 2017 form sheet = workbook.Sheets('2017') # Modify the cell contents in the first row and first column of the form # com interface, cell row number and column number start from 1 sheet.Cells(1,1).Value="Hello" # Save content workbook.Save() # Close the Excel file workbook.Close() # excel process exit excel.Quit() # Release related resources sheet = None book = None excel.Quit() excel = None
Run it to find that Excel content can also be modified.
For details on using com interface to operate Excel, you can Click here to refer to Microsoft's official documents
COM interface, xlrd library to open data files and read data performance comparison, you can refer to the following code.
import time def byCom(): t1 = time.time() import win32com.client excel = win32com.client.Dispatch("Excel.Application") # excel.Visible = True # You can make excel visible workbook = excel.Workbooks.Open(r"h:\tmp\ruijia\data.xlsx") sheet = workbook.Sheets(2) print(sheet.Cells(2,15).Value) print(sheet.UsedRange.Rows.Count) #How many lines t2 = time.time() print(f'open: time consuming{t2 - t1}second') total = 0 for row in range(2,sheet.UsedRange.Rows.Count+1): value = sheet.Cells(row,15).Value if type(value) not in [int,float]: continue total += value print(total) t3 = time.time() print(f'Read data: time consuming{t3 - t2}second') def byXlrd(): t1 = time.time() import xlrd # Load excel file srcBook = xlrd.open_workbook("data.xlsx") sheet = srcBook.sheet_by_index(1) print(sheet.cell_value(rowx=1,colx=14)) print(sheet.nrows) #How many lines t2 = time.time() print(f'open: time consuming{t2 - t1}second') total = 0 for row in range(1,sheet.nrows): value = sheet.cell_value(row, 14) if type(value) == str: continue total += value print(total) t3 = time.time() print(f'Read data: time consuming{t3 - t2}second') byCom() byXlrd()