I'm teaching myself Python. The DataFrame type in the pandas module is a powerful tool for working with tabular data. These are my notes from a first exploration of how to read specific data out of a DataFrame.
1, Read directly with square brackets []: DataFrame[column label][row label]
1. Use a single [] to read whole columns:
When the DataFrame has no string column labels defined, a column can only be read by its integer number (the default column labels are 0, 1, 2, ...).
When the DataFrame has string column labels defined, a column can only be read by its label name.
Put a list inside [] to read several columns.
PS: when [] holds an integer slice, whole rows in that range are read by row position, whether or not the DataFrame has column labels. The slice includes its start and excludes its end, as in ordinary Python slicing.
2. Use two []: the first [] holds the column label to select, the second [] the row label to select:
Whether or not the DataFrame has string row labels, the second [] can select a row by its integer position. (On a string-labelled index this positional fallback is deprecated since pandas 2.1; .iloc is the explicit alternative.)
The second [] also accepts a slice of positions or of row labels. A position slice includes its start and excludes its end; a label slice includes both ends.
When several columns are selected, the second [] cannot pick a single row: a scalar key there is looked up as a column label and raises KeyError. Only a row slice works.
After the DataFrame is reordered, an integer used in the second [] on an integer index (as in df1) still refers to the original index label, not to the new row position.
The code examples below use df, which has row and column labels defined, and df1, which has neither:
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, 3, 4],
     [10, 11, 12, 13, 14],
     [20, 21, 22, 23, 24],
     [30, 31, 32, 33, 34],
     [40, 41, 42, 43, 44]],
    columns=list('ACBDE'),
    index=list('abcde'),
)
df1 = pd.DataFrame(
    [[0, 1, 2, 3, 4],
     [10, 11, 12, 13, 14],
     [20, 21, 22, 23, 24],
     [30, 31, 32, 33, 34],
     [40, 41, 42, 43, 44]]
)

print(df['A'])              # Read column A
print(df['A':'C'])          # Empty DataFrame: a slice in [] selects rows, and no row label lies between 'A' and 'C'
print(df[['A', 'C']])       # Read columns A and C
print(df[['A', 'B', 'C']])  # Read columns A, B and C; the result follows the list order
print(df1[0])               # No column labels defined: reads the first column (label 0)
print(df[0:3])              # Integer slice: reads the first three rows by position
print(df1[0:3])             # Integer slice: reads the first three rows by position
print(df1[[0, 2]])          # Read the first and third columns
print(df1[[0, 2, 1]])       # Read the first, third and second columns, in list order

print(df['A'][0])           # First row of column A, i.e. row a (positional fallback, deprecated since pandas 2.1)
print(df['A']['a'])         # Row a of column A, i.e. the first row
print(df['A']['a':'c'])     # Rows a to c of column A; a label slice includes both ends
print(df['A'][0:3])         # First to third rows of column A; start included, end excluded
print(df['A'][[0, 3]])      # First and fourth rows of column A (positional fallback)
# print(df[['A', 'D']]['a'])  # KeyError: 'a', cannot read a single row from several columns
# print(df[['A', 'D']][0])    # KeyError: 0, cannot read a single row from several columns
print(df['A':'B'][0:3])     # Empty DataFrame: 'A':'B' is treated as a row slice and matches nothing
print(df1[0:3][0:4])        # First three rows; the second [] slices rows again and changes nothing here
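The point about reordering is not covered by the examples above, so here is a minimal sketch of it. The small three-row frame and the names reordered, col, by_label and by_position are made up for illustration; sort_index is just one convenient way to change the row order.

```python
import pandas as pd

# Same idea as df1 above: default integer row and column labels.
df1 = pd.DataFrame([[0, 1, 2], [10, 11, 12], [20, 21, 22]])

# Reverse the row order; each index label travels with its row.
reordered = df1.sort_index(ascending=False)
print(list(reordered.index))   # [2, 1, 0]

col = reordered[0]             # column labelled 0, values 20, 10, 0
by_label = col[0]              # integer key is matched against the index labels
by_position = col.iloc[0]      # .iloc always counts rows in the current order

print(by_label)                # 0  (the row whose label is 0, now last)
print(by_position)             # 20 (the row that is now first)
```

If the new row order is what matters, .iloc seems the safer choice, since it never consults the labels.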
2, Use DataFrame.loc[row label, column label] to read by row / column label
Row and column labels are separated by a comma.
When row or column labels are defined, only the label names can be used.
When no row or column labels are defined, the default integer labels are used.
If the column label is omitted, the whole row is read. The row label cannot be omitted; use : to select every row.
Row and column labels accept lists and slices to read a range of data.
Slices include both ends, whether they are slices of label names or of integer labels.
The code examples continue with the two DataFrames defined above.
print(df1.loc[0])                      # Read an entire row
print(df.loc['a'])                     # Read an entire row
print(df.loc[['a', 'c', 'b']])         # Read several rows with a list; the result follows the list order
print(df.loc[['a', 'c'], ['A', 'B']])  # Use lists to read the selected rows and columns
print(df.loc['a':'c', ['A', 'B']])     # Combine a label slice (rows) with a list (columns)
print(df.loc['a':'c', 'A':'B'])        # Label slices on both axes; both ends are included
print(df.loc['c':'a':-2, 'A':'B'])     # A slice can take a step
print(df1.loc[0:2, 1:3])               # Integer label slices also include both ends
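Two of the rules above have no example yet: reading a whole column through .loc, and what happens when an integer is passed to a DataFrame whose rows have string labels. A small sketch, using a made-up 3x3 frame and illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2], [10, 11, 12], [20, 21, 22]],
    columns=list('ABC'),
    index=list('abc'),
)

# A bare colon selects every row, so this reads the whole column B.
whole_column = df.loc[:, 'B']
print(whole_column.tolist())   # [1, 11, 21]

# A row label and a column label together select a single cell.
cell = df.loc['b', 'C']
print(cell)                    # 12

# 0 is not a row label of this DataFrame, so .loc rejects it.
try:
    df.loc[0]
    rejected = False
except KeyError:
    rejected = True
print(rejected)                # True
```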
3, Use DataFrame.iloc[row position, column position] to read by row / column number
Row and column positions are separated by a comma. Only integer positions can be used, not label names.
Usage is otherwise similar to DataFrame.loc[row label, column label].
Slices include the start and exclude the end, unlike DataFrame.loc slices, which include both ends. This difference deserves extra attention.
print(df.iloc[0:2, 1:3])  # Slice includes the start and excludes the end: rows 1-2, columns 2-3
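To make the difference between the two kinds of slice concrete, the sketch below puts .loc and .iloc side by side on a small made-up frame (the 4x4 data and the variable names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]],
    columns=list('ABCD'),
    index=list('abcd'),
)

by_label = df.loc['a':'c', 'A':'C']   # end labels are included
by_position = df.iloc[0:2, 0:2]       # end positions are excluded

print(by_label.shape)      # (3, 3)
print(by_position.shape)   # (2, 2)

# To get the same block with .iloc, stop one position later.
same_block = df.iloc[0:3, 0:3]
print(same_block.equals(by_label))    # True

# .iloc also accepts lists of positions, including negative ones.
corners = df.iloc[[0, -1], [0, 2]]
print(corners.values.tolist())        # [[0, 2], [30, 32]]
```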
4, DataFrame.ix[]
.ix was deprecated in pandas 0.20.0 and removed in pandas 1.0, so it is not covered here.
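The main thing .ix offered was mixing positions on one axis with labels on the other. A rough sketch of how that can be done with .loc or .iloc instead, on a made-up frame with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2], [10, 11, 12], [20, 21, 22]],
    columns=list('ABC'),
    index=list('abc'),
)

# Rows by position, columns by label.
# Option 1: turn the row positions into labels, then use .loc.
via_loc = df.loc[df.index[0:2], ['A', 'C']]

# Option 2: turn the column labels into positions, then use .iloc.
col_positions = df.columns.get_indexer(['A', 'C'])
via_iloc = df.iloc[0:2, col_positions]

print(via_loc.equals(via_iloc))   # True
print(via_loc.values.tolist())    # [[0, 2], [10, 12]]
```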