Break up. I have Python

preface

More and more developers say that Excel has not been opened since Python / panda was used. It is four words to process and visualize tables in Python - very fast!

Let me give some obvious examples

1. Delete duplicate lines and blank lines

We directly use dict.fromkeys to convert the current data into a dictionary. The default value is None. It doesn't matter if we can't use it. Then we use list to directly type the result and convert it to list.

In [135]:
for row in rows4:
    print(row)
('name', 'address')
('tom li', 'beijing')
('tom li', 'beijing')
('',)
('mary wang', 'shandong')
('mary wang', 'shandong')
('',)
('de8ug', 'guangzhou')
In [148]:
dict.fromkeys(rows4)
Out[148]:
{('name', 'address'): None,
 ('tom li', 'beijing'): None,
 ('',): None,
 ('mary wang', 'shandong'): None,
 ('de8ug', 'guangzhou'): None}
In [137]:
list(dict.fromkeys(rows4))
Out[137]:
[('name', 'address'),
 ('tom li', 'beijing'),
 ('',),
 ('mary wang', 'shandong'),
 ('de8ug', 'guangzhou')]

At this time, the duplicate data is directly removed. Note that the dict here is a new version of Python 3, so the order has no effect. If you are still using Python 2 or Python 3 Below 5, it is recommended to upgrade the python version.

Next, the processing of empty data. Observation ('',) is a tuple. If the data at the first position is an empty string, the overall length is 1, which can be removed directly through the loop. We can use the syntax sugar in Python to write the loop here. We can do it directly in one line. Finally, we add a judgment, leaving only a length greater than 1. Finally, we can use list to convert it into list.

In [179]:
list(x for x in dict.fromkeys(rows4) if len(x[0])>1)
Out[179]:
[('name', 'address'),
 ('tom li', 'beijing'),
 ('mary wang', 'shandong'),
 ('de8ug', 'guangzhou')]

After the above research is done, directly put the research results into the function to solve the problem of repeated lines and empty lines.

Note the row data we process at this time, so we don't cycle by column anymore. Moreover, after processing in the current sheet, the position of each line will be modified or deleted. So let's use old first\_ Rows = [x for X in sheet. Values] get the data of each old row. Note that the values are directly used to get the data after the sheet here, not the cell object. Old here\_ Rows is a list, so you can use the research just now to directly delete duplicate and empty rows of data.

Next, use sheet delete\_ rows(1, sheet.max\_row)
Delete all rows. The first parameter indicates starting from the first row, and the second parameter indicates the maximum number of rows. Finally, write the new data into the current sheet by looping the new row data.

In [189]:
def handle_duplicate(wb, sheetname):
    """
    Remove duplicate lines, empty lines
    First take out each line and empty it sheet,Write back after processing
    """
    print(f'Start processing worksheet:{sheetname}'.center(18, '-'))
    sheet = wb[sheetname]
    old_rows = [x for x in sheet.values]
    print('Before modification:', old_rows)
    new_rows = list(x for x in dict.fromkeys(old_rows) if len(x[0])>1)
    print('After modification->>', new_rows)

    # Delete all rows
    sheet.delete_rows(1, sheet.max_row)
    # Write new data
    for row in new_rows:
        sheet.append(row)

Run the test and view the results. Again, remember the test! If there are errors, check the code according to the error prompt, debug repeatedly and remove bugs.

In [190]:
wb = load_data()
handle_duplicate(wb, 'Repeat line')
save_as(wb)

2. Delete spaces

Deleting spaces also requires string functions, so here's a simple study. If we want to remove the space in the middle of the string, we can use split to split by default, and then use '' for the split result Just connect the join method. Note that the join is preceded by an empty string. There is no need to use strip to remove the spaces at both ends, because after split, there is only a list composed of several last strings.

In [192]:
a="a b c   "
In [194]:
a.strip()
Out[194]:
'a b c'
In [195]:
a.split()
Out[195]:
['a', 'b', 'c']
In [196]:
''.join(a.split())
Out[196]:
'abc'
In [ ]:

After successful research, write the function. This time it's called handle\_blank.

In [197]:
def handle_blank(wb, sheetname):
    """
    Cycle by column, Confirm the target through parameters
    """
    print(f'Start processing worksheet:{sheetname}'.center(18, '-'))
    sheet = wb[sheetname]
    for col in sheet.iter_cols():  # Loop all columns without parameters
        for cell in col:
            print('Before modification:', cell.value, end='')
            cell.value = ''.join(cell.value.split())
            print('After modification->>',cell.value)
In [198]:
handle_blank(wb, 'Space')

3. Modify date and time format

Sometimes, we need to modify the format of time-related cells in the table. Here, we need to use the time module datetime in Python to splice the required format and convert it with strftime.

Suppose we want to change the previous simple format of January / November day to the style of month, year and day, and add the separator / or -, in the middle, we need to use "% x" or "% Y-%m-%d" for operation. Note that the% plus letters here are only officially defined formats. When we use them, we can splice them and pass them to the function.

More specific splicing formats are as follows:

In [199]:
import datetime
In [209]:
d=datetime.datetime(2019,1,11)
In [203]:
d.strftime("%x")
Out[203]:
'01/11/19'
In [205]:
d.strftime("%Y-%m-%d")
Out[205]:
'2019-01-11'

After the research, we write the function.

First, you need to use M, d = cell value. Split ('/') divides the previous simple date to obtain m, which represents the month and date, and then converts it with datetime to generate a time-related object day. Note that the parameters inside are numbers, so use int conversion, and finally format and output the day. After writing the function, be sure to remember to test.

In [218]:
def handle_time(wb, sheetname):
    """
    Cycle by column, Confirm the target through parameters
    """
    print(f'Start processing worksheet:{sheetname}'.center(18, '-'))
    sheet = wb[sheetname]
    for col in sheet.iter_cols(max_col=1, min_row=2):  # Find the time column, the first column, starting from the second row
        for cell in col:
            print('Before modification:', cell.value, end='')
            m, d = cell.value.split('/')
            day = datetime.datetime(2019, int(m), int(d))
            cell.value = day.strftime("%Y-%m-%d")
            print('After modification->>',cell.value)

In [220]:
wb = load_data()
handle_time(wb, 'time')
save_as(wb)

4. Fix numbers and symbols

Next, we deal with operations related to numbers and symbols. Many of the prices before joining us have decimal points. At this time, I want to save two decimal places and prefix them with the RMB symbol. A new wave of research is needed.

There is a Decimal point. First, we need to ensure the number of digits. We require 2 digits here. Second, we need to round off the redundant digits. It can be done in the following two ways: one is Decimal and the other is round. The difference between the two is that Decimal("0.00") will automatically fill in 0 after specifying the number of digits, and round will automatically discard it when it encounters 0. And round is a little special in the calculation of rounding. See the official documents for details.

We use Decimal here to complete the related operations in the function. Remember the test!

In [227]:
from decimal import Decimal
In [240]:
a = 3.1
b=Decimal(a).quantize(Decimal("0.00"))
print(b)
3.10
In [244]:
round(a,2)  # The number of digits is automatically omitted 0
Out[244]:
3.1

In [247]:
def handle_num(wb, sheetname):
    """
    Cycle by column, Confirm the target through parameters
    """
    print(f'Start processing worksheet:{sheetname}'.center(18, '-'))
    sheet = wb[sheetname]
    for col in sheet.iter_cols(min_col=3, max_col=3, min_row=2):  # Find the time column, the first column, starting from the second row
        for cell in col:
            print('Before modification:', cell.value, end='')
#             cell.value = round(float(cell.value), 3)
            cell.value = '¥' + str(Decimal(cell.value).quantize(Decimal("0.00")))
            print('After modification->>',cell.value)
In [249]:
wb = load_data()
handle_num(wb, 'Numeric symbol')
save_as(wb)

last

If the article is helpful to you, remember to praise the author
Next, we will continue to keep up with new articles about Python, pay little attention and don't get lost.

Keywords: Python Excel pandas

Added by SharkBait on Wed, 15 Dec 2021 09:49:32 +0200