Python + openpyxl + copy implements complete (content + format, etc.) copy and paste some contents in one EXCEL into another EXCEL

preface

Application background:
A worksheet needs to be split into multiple worksheets according to certain conditions, and multiple worksheets are stored in a workbook according to certain conditions, and the format, such as color, merged cells, font size, etc., is mainly used to preserve the format when copying the header.
Difficulties:
1. Save data to different worksheets in the same workbook.
2. Copy some contents in excel and keep the corresponding format.
Achieve the goal: perfectly simulate "split a summary table data, screen the responsible person into the corresponding responsible person table, divide each table into two sub tables of inventory and goods preparation, and process the header"

1, Write multiple dataframe s into different working sheets of the same Excel

When multiple dataframe s need to be written to the same excel, df.exe is used each time to_ If you write it in the form of Excel (file name), the system will recreate a new file. This means that the previous file will be overwritten, and you can only get the result file written by the last DF

The above problem can be solved by creating an ExcelWriter object

import pandas as pd

writer = pd.ExcelWriter(os.path.join(os.getcwd(), 'custom.xlsx'))
df1.to_excel(writer, sheet_name='custom sheet_name')#startcol=**, startrow=**)
df2.to_excel(writer, sheet_name='custom sheet_name')#startcol=**, startrow=**)
df3.to_excel(writer, sheet_name='custom sheet_name')#startcol=**, startrow=**)
...
 
 
writer.save()# Write to hard disk

#No sheet_name parameter, multiple dataframe s will be written to the same sheet of the same xlsx file by default

#When startRow and startcol are not transmitted, the default dataframe is in excel. In fact, the rows and columns in sheet1 are one

#It should be noted that the index starting numbers of python and excel are different. If startrow=2 and startcol=4, it is not written from the fourth column of the second row of Excel, but from the fifth column of the third row of Excel
#Be sure to use write Save(), otherwise the generated file has no corresponding content

Original link:

https://blog.csdn.net/weixin_42130167/article/details/89705581

2, Copy and paste some content in Excel completely

When pandas is used for data processing in the above, the output tables are not formatted, and even when reading the data with "merged cells", there will be some difficult to handle null values (based on personal understanding, the value displayed in the merged cells in excel table is the value of the first item, and the others are only null values).
In the actual work, the tables required by the leaders are good-looking formats and some headers, colors, etc. Therefore, pandas is used to completely copy and paste the specific header format.

1. Import module

import easygui as eg
import os
import openpyxl
import copy
import pandas as pd

2. Corresponding column width

The code is as follows:

def colwidth(s0, s1):
    for i, row in enumerate(s0.iter_cols()):
    	# i = chr(i+97).upper()
        i = convertToTitle(i+1)
        lk = s0.column_dimensions[i].width
        if lk == 0:
            lk = 8.38
        s1.column_dimensions[i].width = lk

Where s0 is the excel table to be copied and s1 is the excel table to be pasted. They are all Worksheet objects of openpyxl, that is, Worksheet objects.

The enumerate() function is used to combine a traversable data object (such as list, tuple or string) into an index sequence, and list data and data subscripts at the same time. It is generally used in the for loop.

openpyxl can read all column data of a sheet. In addition to the cols attribute, ITER can also be used_ Cols () method, the return value is the generator. This method can also specify to read rows and columns, and iter_rows is different in that it outputs data by column.

To sum up, enumerate (S0. Iter_columns()) generates A generator for the column corresponding to the subscript. i is initially equal to 0 by default and corresponds to 'column A'. The corresponding number can be converted into letters through chr() (used for the reference of column_dimensions below).
However, chr() cannot generate 'AA' for generating "two digit letters", but the actual excel table may exceed the 'AA' column, which will produce an error report equivalent to the effect of "exceeding the index value", which does not meet the actual requirements.

>>>i = 0 #The subscript corresponding to 'A' in the column generated by the generator is 0
>>>i = chr(i+97).upper()
>>>i
'A'
>>>i = 26
>>>a = chr(i+97).upper()
>>>a
'{'

At this time, it can be seen that when i exceeds 25 (26 letters, so the last item is 25), the corresponding value generated is not 'AA', but the corresponding value in the code of chr().
resolvent:

def convertToTitle(n):
	# N is the added value. At this time, 1 corresponds to 'A', and 0 has no correspondence. Therefore, during application, pay attention to whether the given n should be + 1
    """
    :type n: int
    :rtype: str
    """
    rStr = ""
    while n != 0:
        res = n % 26
        if res == 0:
            res = 26
            n -= 26
        rStr = chr(ord('A') + res - 1) + rStr
        n = n // 26
    return rStr
   # Return letter

At this point, the numerical subscripts correspond to the columns in the excel table one by one.

3. Corresponding row width

The code is as follows:

def rowheight(s0, s1):
    for i in range(1, s0.max_row+1):
        hg = s0.row_dimensions[i].height
        if hg == 0:
            hg = 15
        s1.row_dimensions[i].height = hg

The serial number of a row is composed of pure numbers. There is no problem between the conversion of letters and numbers, but pay attention to the corresponding relationship with the actual number of rows when using range.

4. Copy and paste cell contents

def copycell(s0, s1, min_row=None, max_row=None, min_col=None, max_col=None):
    for i, row in enumerate(s0.iter_rows(min_row=min_row, max_row=max_row, min_col=min_col, max_col=max_col)):
        for j, cell in enumerate(row):
            s1.cell(i+1, j+1, cell.value)
            s1.cell(i+1, j+1).number_format = cell.number_format
            s1.cell(i+1, j+1).alignment = copy.copy(cell.alignment)
            s1.cell(i+1, j+1).font = copy.copy(cell.font)
            s1.cell(i+1, j+1).border = copy.copy(cell.border)
            s1.cell(i+1, j+1).fill = copy.copy(cell.fill)

Where i represents the column, row represents all cells under this column, and cell objects form tuples. List all cells in the row and their corresponding row numbers through enumerate().
iter_rows() and iter_cols() has parameters (min_row=None, max_row=None, min_col=None, max_col=None). Therefore, the area we want to copy and paste can be delineated through parameter settings. For example, if I want to copy the first two lines, I will set max_row=2, so as to achieve the effect of copying and pasting the header.

5. Complete code

import easygui as eg
import os
import openpyxl
import copy
import pandas as pd


# Used for conversion between numbers and letters in excel columns
def convertToTitle(n):
    """
    :type n: int
    :rtype: str
    """
    rStr = ""
    while n != 0:
        res = n % 26
        if res == 0:
            res = 26
            n -= 26
        rStr = chr(ord('A') + res - 1) + rStr
        n = n // 26
    return rStr


# Copy column width
def colwidth(s0, s1):
    for k, co in enumerate(s0.iter_cols()):
        k = convertToTitle(k+1)
        lk = s0.column_dimensions[k].width
        if lk == 0:
            lk = 8.38
        s1.column_dimensions[k].width = lk


# Copy row width
def rowheight(s0, s1):
    for i in range(1, s0.max_row+1):
        hg = s0.row_dimensions[i].height
        if hg == 0:
            hg = 15
        s1.row_dimensions[i].height = hg


# Copy cell
def copycell(s0, s1, min_row=None, max_row=None, min_col=None, max_col=None):
    for i, row in enumerate(s0.iter_rows(min_row=min_row, max_row=max_row, min_col=min_col, max_col=max_col)):
        for j, cell in enumerate(row):
            s1.cell(i+1, j+1, cell.value)
            s1.cell(i+1, j+1).number_format = cell.number_format
            s1.cell(i+1, j+1).alignment = copy.copy(cell.alignment)
            s1.cell(i+1, j+1).font = copy.copy(cell.font)
            s1.cell(i+1, j+1).border = copy.copy(cell.border)
            s1.cell(i+1, j+1).fill = copy.copy(cell.fill)


# Perform replication
def copyfile(sht, sht1):
    colwidth(sht, sht1)
    rowheight(sht, sht1)
    copycell(sht, sht1)
    for merg in sht.merged_cells:
        sht1.merge_cells(str(merg))


# Copy header after data processing
def screen_data(fn):
    df = pd.read_excel(fn, header=1)
    g = os.path.dirname(fn)
    # Put the header to be copied and the table to be processed in the same directory
    header_file = os.path.join(g, 'Product summary_Header.xlsx')
    wb = openpyxl.load_workbook(header_file)
    name_list = list(df['person in charge'].drop_duplicates('first'))
    for name in name_list:
        file_name = os.path.join(g, name + 'Inventory and stock list.xlsx')
        writer = pd.ExcelWriter(file_name)
        # Perform data processing and generate corresponding tables
        for sheet_name in wb.sheetnames:
            header_df = pd.read_excel(header_file, header=1, sheet_name=sheet_name)
            header = header_df.columns
            mid_df = pd.DataFrame(df[df['person in charge'] == name], columns=header)
            mid_df.to_excel(writer, sheet_name=sheet_name, index=False, startrow=1)
        # Be sure to save and close first, otherwise you cannot generate a valid excel sheet
        writer.save()
        writer.close()
        # Copy the header of each worksheet of the corresponding workbook
        for sheet_name in wb.sheetnames:
            ws = wb[sheet_name]
            wb1 = openpyxl.load_workbook(file_name)
            ws1 = wb1[sheet_name]
            copyfile(ws, ws1)
            wb1.save(file_name)


if __name__ == '__main__':
    ff = eg.fileopenbox('Select file', 'Select file')
    screen_data(ff)

3, Summary

This blog post should provide some help in solving problems such as "how python saves multiple dataframes to different worksheets" and "how python copies and pastes the format in an excel table to another excel table", Or directly applicable (reaching out to the party) code. Because in the long-term process of self-study and self-use, I know the importance of Baidu and the mentality of looking for information on the Internet (it's best to apply it directly). Most of the time, you don't need to understand the principle. You just need to solve the current problem. Therefore, there are separate descriptions for key function points, hoping to help readers solve the problem.
For my non professional IT staff, various parameter adjustments and application problems in their actual work can be put forward in the comments and discussed and solved together, but I may not be able to solve them. Please forgive me.
This is the first time to write a formal blog post. If there are deficiencies, you can point out and correct them. Thank you.

Keywords: Python Excel

Added by benreisner on Sun, 19 Dec 2021 13:16:53 +0200