Python fundamentals: a self-taught beginner's notes on reading data from a pandas DataFrame

       

Contents

1, Read directly with square brackets []: DataFrame[column label][row label]

2, Use DataFrame.loc[row label, column label] to read by row/column label

3, Use DataFrame.iloc[row position, column position] to read by row/column position

4, DataFrame.ix[]

I am teaching myself Python. The DataFrame type in the pandas module is a powerful tool for processing tabular data. The following are my preliminary notes on the ways to read specified data from a DataFrame.

1, Read directly with square brackets []: DataFrame[column label][row label]

1. Read a whole column with a single []:

When the DataFrame has no string column labels defined, the columns carry the default integer labels (0, 1, 2, ...), so only those integers can be used to read a column.

When the DataFrame has string column labels defined, only the column label name can be used to read the corresponding column.

Put a list inside [] to read multiple columns.

PS: When [] contains an integer slice, whole rows in that range are read by row position, regardless of whether the DataFrame has column labels. The slice includes its start and excludes its end.

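2. Use two []: the first [] holds the column label to select, and the second [] holds the row label to select:

Whether or not the DataFrame defines string row labels, the second [] could address a row by its integer position in the pandas versions these notes were written against. Recent pandas releases deprecate this positional fallback on a non-integer index, so .iloc is the safer choice for positions.

The second [] accepts either a position slice or a row-label slice. A position slice includes its start and excludes its end; a row-label slice includes both endpoints.

Multiple columns cannot be combined with a single row label this way: once a list of columns has been selected, the result is still a DataFrame, so the second [] looks for a column again and raises KeyError. A slice in the second [] still selects rows.

When the row index consists of integers (as in df1 below), an integer in the second [] is treated as an index label. After the DataFrame is reordered, it therefore still refers to the original index value, not the new row position.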
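The point about reordering can be checked with a small sketch. The three-row frame and the names reordered, by_label and by_position are illustrative assumptions, not part of the original notes; sort_values is used here only as one way to reorder rows.

```python
import pandas as pd

# A small frame with default integer row and column labels (illustrative data).
df1 = pd.DataFrame(
    [[0, 1, 2],
     [10, 11, 12],
     [20, 21, 22]]
)

# Reorder the rows so that the index labels become 2, 1, 0.
reordered = df1.sort_values(by=0, ascending=False)

# With an integer index, the second [] looks up the index label 0,
# which now sits in the last row.
by_label = reordered[0][0]

# .iloc always counts positions, so this is the first row after reordering.
by_position = reordered[0].iloc[0]

print(by_label, by_position)
```

Here the label lookup returns the value from the row originally labelled 0, while .iloc returns the value from whichever row is now first.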

The code examples below use df, which has row and column labels defined, and df1, which has neither:

import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, 3, 4],
     [10, 11, 12, 13, 14],
     [20, 21, 22, 23, 24],
     [30, 31, 32, 33, 34],
     [40, 41, 42, 43, 44]],
    columns=list('ACBDE'),
    index=list('abcde')
)

df1 = pd.DataFrame(
    [[0, 1, 2, 3, 4],
     [10, 11, 12, 13, 14],
     [20, 21, 22, 23, 24],
     [30, 31, 32, 33, 34],
     [40, 41, 42, 43, 44]]
)


print(df['A'])        # Read column A
print(df['A':'C'])    # Returns an empty DataFrame: a slice inside [] selects rows, not columns, and no row labels fall between 'A' and 'C'
print(df[['A', 'C']]) # Read columns A and C
print(df[['A', 'B', 'C']]) # Read columns A, B and C; the result follows the order of the list

print(df1[0])         # With no column labels defined, reads the first column (default label 0)
print(df[0:3])        # An integer slice reads rows by position (first three rows)
print(df1[0:3])       # An integer slice reads rows by position (first three rows)
print(df1[[0, 2]])    # Read the first and third columns
print(df1[[0, 2, 1]])    # Read the first, third and second columns; the result follows the order of the list

print(df['A'][0])     # First row of column A, i.e. row a (positional fallback, deprecated in recent pandas; df['A'].iloc[0] is the safer form)
print(df['A']['a'])   # Row a of column A, i.e. the first row

print(df['A']['a':'c'])   # Rows a to c of column A; a row-label slice includes both endpoints
print(df['A'][0:3])   # First to third rows of column A; a position slice excludes its end
print(df['A'][[0, 3]])  # First and fourth rows of column A (positions 0 and 3; same positional fallback as above)

# The next two lines raise KeyError, so they are commented out to keep the script running.
# print(df[['A', 'D']]['a'])  # KeyError: 'a', cannot read multiple columns with a single row label
# print(df[['A', 'D']][0])    # KeyError: 0, cannot read multiple columns with a single row position

print(df['A':'B'][0:3])     # Returns an empty DataFrame: the label slice selects rows and matches none
print(df1[0:3][0:4])    # Reads the first three rows; the second [] slices rows again and changes nothing here
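The multi-column case is easier to see in isolation. The following is a minimal sketch on a smaller three-row frame; the names two_cols, row_lookup_failed, rows_by_label and rows_by_position are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, 3, 4],
     [10, 11, 12, 13, 14],
     [20, 21, 22, 23, 24]],
    columns=list('ACBDE'),
    index=list('abc'),
)

# Selecting a list of columns returns a DataFrame, not a Series.
two_cols = df[['A', 'D']]

# A single key in [] on a DataFrame is looked up among the columns,
# so the row label 'a' is not found.
try:
    two_cols['a']
    row_lookup_failed = False
except KeyError:
    row_lookup_failed = True

# A slice in [] selects rows instead, by label or by position.
rows_by_label = two_cols['a':'b']    # label slice, both endpoints included
rows_by_position = two_cols[0:2]     # position slice, end excluded

print(row_lookup_failed)
print(rows_by_label)
print(rows_by_position)
```

Both slices return rows a and b of columns A and D, which is why only the single-row form fails.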

2, Use DataFrame.loc[row label, column label] to read by row/column label

A comma separates the row label from the column label.

When row or column labels are defined, only the label names can be used.

When no row or column labels are defined, the default integer labels are used.

When the column label is omitted, the whole row is read. The row label cannot be omitted; use : to select every row.

Row and column labels accept lists and slices to read the selected range.

With .loc, slices include both endpoints, whether they are label-name slices or integer-label slices.

The code examples continue to use the two DataFrames above.

print(df1.loc[0])   # Read an entire row
print(df.loc['a'])  # Read an entire row
print(df.loc[['a', 'c', 'b']])    # Read multiple rows with a list; the result follows the order of the list

print(df.loc[['a', 'c'], ['A', 'B']])  # Use lists to read the selected rows and columns
print(df.loc['a':'c', ['A', 'B']])    # Combine a label slice (rows) with a list (columns)
print(df.loc['a':'c', 'A':'B'])     # Label slices include both endpoints; columns A to B are A, C, B in this df
print(df.loc['c':'a':-2, 'A':'B'])     # Slices can take a step; this reads rows c and a

print(df1.loc[0:2, 1:3])  # Integer-label slices also include both endpoints
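Two .loc cases are not shown in the calls above: reading a whole column with : in the row position, and reading a single cell. The sketch below uses a smaller three-row frame; the variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, 3, 4],
     [10, 11, 12, 13, 14],
     [20, 21, 22, 23, 24]],
    columns=list('ACBDE'),
    index=list('abc'),
)

# The row label cannot be left out, so ':' stands for "all rows".
whole_column = df.loc[:, 'A']

# A row label plus a column label reads a single cell.
single_value = df.loc['b', 'C']

# A label slice follows the stored column order (A, C, B, D, E),
# not alphabetical order.
sliced_cols = df.loc['a', 'A':'B']

print(whole_column)
print(single_value)
print(sliced_cols)
```

Because the columns are stored as A, C, B, D, E, the slice 'A':'B' covers three columns rather than two.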

3, Use DataFrame.iloc[row position, column position] to read by row/column position

A comma separates the row position from the column position. Only integer positions can be used, not label names.

Usage is otherwise similar to DataFrame.loc[row label, column label].

Slices include the start and exclude the end. This differs from slices in DataFrame.loc, which include both endpoints, so take extra care here.

print(df.iloc[0:2, 1:3])    # Slice includes the start and excludes the end: rows 0-1, columns 1-2
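The difference between the two slice conventions is easiest to see when the same numbers are passed to both accessors on a frame with default integer labels. This is a small sketch; by_label and by_position are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[0, 1, 2, 3, 4],
     [10, 11, 12, 13, 14],
     [20, 21, 22, 23, 24],
     [30, 31, 32, 33, 34],
     [40, 41, 42, 43, 44]]
)

# .loc treats 0:2 and 1:3 as label slices and includes both endpoints.
by_label = df1.loc[0:2, 1:3]

# .iloc treats the same numbers as positions and excludes the end.
by_position = df1.iloc[0:2, 1:3]

print(by_label.shape)     # three rows, three columns
print(by_position.shape)  # two rows, two columns
```

The same slice therefore returns a 3 x 3 block from .loc and a 2 x 2 block from .iloc.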

4, DataFrame.ix[]

.ix was deprecated in pandas 0.20.0 and removed in pandas 1.0, so it is not covered here.
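For code that once used .ix to mix a row label with a column position, one possible replacement is to convert one side so that .loc or .iloc can be used on its own. The sketch below is only one way to do this, on an illustrative three-row frame; via_loc and via_iloc are assumed names.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, 3, 4],
     [10, 11, 12, 13, 14],
     [20, 21, 22, 23, 24]],
    columns=list('ACBDE'),
    index=list('abc'),
)

# Goal: row label 'b' together with column position 2.

# Option 1: turn the column position into a label, then use .loc.
via_loc = df.loc['b', df.columns[2]]

# Option 2: turn the row label into a position, then use .iloc.
via_iloc = df.iloc[df.index.get_loc('b'), 2]

print(via_loc, via_iloc)
```

Both options read the same cell; Index.get_loc returns a single integer here because the row labels are unique.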

Keywords: Python

Added by tomfmason on Sat, 15 Jan 2022 18:50:34 +0200