DataFrame: sorting and ranking

1. Overview

A DataFrame can be sorted in two ways, by index or by values, and its values can also be ranked:

Index sorting: sort_index()

Value sorting: sort_values()

Value ranking: rank()

Index sorting can be applied to the row index or to the column index, in ascending or descending order. The function is df.sort_index(axis=, ascending=, inplace=); pay special attention to these three parameters. axis chooses whether the row labels or the column labels are sorted, ascending chooses ascending or descending order, and inplace chooses whether the original DataFrame is modified.

Value sorting can likewise reorder rows or columns, in ascending or descending order. The function is df.sort_values(by=, axis=, ascending=, inplace=). It takes the same three parameters plus one more, by, which names the column (or, with axis=1, the row) whose values determine the order.

Note: axis=0 (the default) works on rows: sort_index(axis=0) reorders the rows by their index labels, and sort_values(axis=0) reorders the rows by the values in the column(s) given in by. axis=1 works on columns: sort_index(axis=1) reorders the columns by their names, and sort_values(axis=1) reorders the columns by the values in the row(s) given in by. ascending=True (the default) sorts in ascending order and ascending=False in descending order. inplace=True modifies the original DataFrame itself and returns None, so no assignment is needed. inplace=False (the default) leaves the original unchanged and returns a sorted copy, so the result must be assigned to a variable to keep it.

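A minimal sketch of the inplace behaviour described in the note, on a small made-up DataFrame (the column "x" and its labels are hypothetical). It uses print() instead of display() so it also runs outside Jupyter:

```python
import pandas as pd

# Hypothetical three-row frame whose row labels are out of order
df = pd.DataFrame({"x": [3, 1, 2]}, index=["c", "a", "b"])

# inplace=False (the default): a sorted copy is returned, df is left as it was
sorted_copy = df.sort_index()
original_order = df.index.tolist()
print(original_order)              # df still has its original row order
print(sorted_copy.index.tolist())  # the copy is sorted

# inplace=True: df itself is reordered and the method returns None
returned = df.sort_index(inplace=True)
print(returned)
print(df.index.tolist())
```

This is also why writing df = df.sort_index(inplace=True) is a mistake: it replaces df with None.
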
2. Index sorting: df.sort_index()

① Sort the row index in ascending order

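import pandas as pd
from IPython.display import display  # built in when running in Jupyter

df = pd.DataFrame({"A":[1,3,5,7,9],
                   "D":[1,2,3,4,5],
                   "C":[3,6,9,12,15],
                   "B":[2,4,6,8,10]},
                   index=list("acbed"))
display(df)
display(id(df))

# inplace=True sorts df itself, so id(df) stays the same
df.sort_index(axis=0,ascending=True,inplace=True)
display(df)
display(id(df))

# inplace=False (the default) returns a new DataFrame, so id(df1) differs from id(df)
df1 = df.sort_index(axis=0,ascending=True)
display(df1)
display(id(df1))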

The results are as follows: [Figure: output of the code above]

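To check the ordering without reading the whole table, this sketch repeats the call above on the same data and prints only the row labels and column A (print() stands in for display()):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5, 7, 9],
                   "D": [1, 2, 3, 4, 5],
                   "C": [3, 6, 9, 12, 15],
                   "B": [2, 4, 6, 8, 10]},
                  index=list("acbed"))

df.sort_index(axis=0, ascending=True, inplace=True)

# Each row keeps its values; only the order of the rows changes
print(df.index.tolist())
print(df["A"].tolist())
print(df.index.is_monotonic_increasing)
```

Because each row carries its values with it, column A reads 1, 5, 3, 9, 7 once the rows are put in label order.
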
② Sort the column index in descending order

df = pd.DataFrame({"A":[1,3,5,7,9],
                   "D":[1,2,3,4,5],
                   "C":[3,6,9,12,15],
                   "B":[2,4,6,8,10]},
                   index=list("acbed"))
display(df)

# axis=1 sorts the column labels; ascending=False puts them in descending order
df.sort_index(axis=1,ascending=False,inplace=True)
display(df)

The results are as follows: [Figure: output of the code above]

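The same result can be obtained without inplace. The sketch below, assuming the same data as above, also shows that sort_index accepts axis="columns" as a more readable alias for axis=1:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5, 7, 9],
                   "D": [1, 2, 3, 4, 5],
                   "C": [3, 6, 9, 12, 15],
                   "B": [2, 4, 6, 8, 10]},
                  index=list("acbed"))

# axis="columns" is an equivalent spelling of axis=1
by_number = df.sort_index(axis=1, ascending=False)
by_name = df.sort_index(axis="columns", ascending=False)

print(by_number.columns.tolist())
print(by_number.index.tolist())   # row order is untouched
print(by_number.equals(by_name))
```

Only the column order changes; the rows stay in their original a, c, b, e, d order.
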
3. Value sorting: df.sort_values()

① Sort the rows by one column in ascending order (the most useful case)

df = pd.DataFrame({"A":[3,1,5,9,7],
                   "D":[4,1,2,5,3],
                   "C":[3,15,9,6,12],
                   "B":[2,4,6,10,8]},
                   index=list("acbed"))
display(df)

df.sort_values(by="A",axis=0,ascending=True,inplace=True)
display(df)

The results are as follows: [Figure: output of the code above]

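Sorting by values moves whole rows, so the index labels follow their values. The sketch below prints the new label order and, as an aside, uses the ignore_index parameter (available since pandas 1.0) to renumber the result from 0 instead of keeping the old labels:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 1, 5, 9, 7],
                   "D": [4, 1, 2, 5, 3],
                   "C": [3, 15, 9, 6, 12],
                   "B": [2, 4, 6, 10, 8]},
                  index=list("acbed"))

# Rows are reordered by column A; each label travels with its row
by_a = df.sort_values(by="A")
print(by_a.index.tolist())
print(by_a["A"].tolist())

# ignore_index=True replaces the labels with 0..n-1 after sorting
renumbered = df.sort_values(by="A", ignore_index=True)
print(renumbered.index.tolist())
```
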
② Sort the columns by one row in descending order (rarely useful in practice)

df = pd.DataFrame({"A":[3,1,5,9,7],
                   "D":[4,1,2,5,3],
                   "C":[3,15,9,6,12],
                   "B":[2,4,6,10,8]},
                   index=list("acbed"))
display(df)

# with axis=1, by must name a row label (here row "a"), not a column name
df.sort_values(by="a",axis=1,ascending=False,inplace=True)
display(df)

The results are as follows: [Figure: output of the code above]

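With axis=1, the by argument is looked up among the row labels, so it has to name a row such as "a" or "e", not a column such as "A". A sketch on the same data that sorts the columns by row "e" and shows that a column name raises KeyError:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 1, 5, 9, 7],
                   "D": [4, 1, 2, 5, 3],
                   "C": [3, 15, 9, 6, 12],
                   "B": [2, 4, 6, 10, 8]},
                  index=list("acbed"))

# Row "e" holds A=9, D=5, C=6, B=10, so the columns become B, A, C, D
by_row_e = df.sort_values(by="e", axis=1, ascending=False)
print(by_row_e.columns.tolist())

# "A" is a column name, not a row label, so the lookup fails
try:
    df.sort_values(by="A", axis=1)
except KeyError as err:
    print("KeyError:", err)
    caught = True
else:
    caught = False
```
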
③ Sort by multiple columns (important)

df = pd.DataFrame({"A":[3,1,3,9,7],
                   "D":[666,1,888,5,3],
                   "C":[3,15,9,6,12],
                   "B":[2,4,6,10,8]},
                   index=list("acbed"))
display(df)

df.sort_values(by=["A","D"],axis=0,ascending=[True,False],inplace=True)
display(df)

The results are as follows: [Figure: output of the code above]

Note: in the figure above, the rows are sorted by column A in ascending order first; rows that share the same value in column A (rows a and b, both 3) are then sorted by column D in descending order.

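The ascending list is matched to the by list position by position. This sketch, assuming the same data as above, contrasts ascending=[True, False] with the default, where both keys are ascending and rows a and b come out in the opposite order:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 1, 3, 9, 7],
                   "D": [666, 1, 888, 5, 3],
                   "C": [3, 15, 9, 6, 12],
                   "B": [2, 4, 6, 10, 8]},
                  index=list("acbed"))

# A ascending, then D descending to break ties in A
mixed = df.sort_values(by=["A", "D"], ascending=[True, False])
# Both keys ascending (the default)
both_up = df.sort_values(by=["A", "D"])

print(mixed[["A", "D"]])
print(mixed.index.tolist())
print(both_up.index.tolist())
```
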
4. The na_position parameter of sort_values()

The na_position parameter sets where missing values (NaN) are placed: "first" puts them at the beginning, and "last" (the default) puts them at the end.

import numpy as np

df = pd.DataFrame({"A":[10,8,np.nan,2,4],
                   "D":[1,7,5,3,8],
                   "B":[5,2,8,4,1]},
                   index=list("abcde"))
display(df)

df.sort_values(by="A",axis=0,inplace=True,na_position="first")
display(df)

df.sort_values(by="A",axis=0,inplace=True,na_position="last")
display(df)

The results are as follows: [Figure: output of the code above]

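A sketch of the same idea without inplace, on the same data. It also shows that NaN placement does not depend on ascending: with the default na_position="last", the missing value stays at the end even in a descending sort:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 8, np.nan, 2, 4],
                   "D": [1, 7, 5, 3, 8],
                   "B": [5, 2, 8, 4, 1]},
                  index=list("abcde"))

nan_first = df.sort_values(by="A", na_position="first")
nan_default = df.sort_values(by="A")                  # na_position="last"
nan_desc = df.sort_values(by="A", ascending=False)    # NaN still goes last

print(nan_first.index.tolist())
print(nan_default.index.tolist())
print(nan_desc.index.tolist())
```
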
5. Value ranking: the rank() function

1) Common parameters of the rank() function

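The parameters most often used with rank() are: method ("average" by default; also "min", "max", "first" and "dense"), ascending (True by default), na_option ("keep" by default, which leaves NaN unranked; "top" gives NaN the smallest ranks and "bottom" the largest), pct (False by default; True divides each rank by the number of ranked values) and axis. The sketch below exercises several of them on a small hypothetical Series, using print() so it also runs outside Jupyter:

```python
import numpy as np
import pandas as pd

# Hypothetical scores: a tie (30, 30) and one missing value
s = pd.Series([30, 10, 30, np.nan, 20])

avg = s.rank()                       # method="average", na_option="keep": NaN stays NaN
desc = s.rank(ascending=False)       # the largest value gets rank 1
bottom = s.rank(na_option="bottom")  # NaN is ranked after every real value
pct = s.rank(pct=True)               # rank divided by the number of non-missing values

print(pd.DataFrame({"score": s, "average": avg, "descending": desc,
                    "na_bottom": bottom, "pct": pct}))
```
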
2) Raw data

x = {"name":["Zhang San","Li Si","Wang Wu","Zhao Liu","Zheng Qi","Chen Ba","Huang Jiu","Sun Shi"],
     "sales":[60,40,50,40,30,80,70,60]}
df = pd.DataFrame(x)
display(df)

The results are as follows: [Figure: output of the code above]

3) Using the rank() function

① method="first"

x = {"name":["Zhang San","Li Si","Wang Wu","Zhao Liu","Zheng Qi","Chen Ba","Huang Jiu","Sun Shi"],
     "sales":[60,40,50,40,30,80,70,60]}
df = pd.DataFrame(x)
display(df)

df["ranking"] = df["sales"].rank(method="first")
display(df)

The results are as follows: [Figure: output of the code above]

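rank() returns float64 ranks (1.0, 2.0, ...). Because method="first" never produces ties, the ranks can be cast to integers for a cleaner column. A sketch assuming the same sales data:

```python
import pandas as pd

x = {"name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Liu", "Zheng Qi", "Chen Ba", "Huang Jiu", "Sun Shi"],
     "sales": [60, 40, 50, 40, 30, 80, 70, 60]}
df = pd.DataFrame(x)

raw = df["sales"].rank(method="first")
print(raw.dtype)  # float64, even though the sales column holds integers

# Ties are broken by order of appearance, so every rank is a distinct whole number
df["ranking"] = raw.astype(int)
print(df)
```
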
② method="min"

x = {"name":["Zhang San","Li Si","Wang Wu","Zhao Liu","Zheng Qi","Chen Ba","Huang Jiu","Sun Shi"],
     "sales":[60,40,50,40,30,80,70,60]}
df = pd.DataFrame(x)
display(df)

df["ranking"] = df["sales"].rank(method="min")
display(df)

The results are as follows: [Figure: output of the code above]

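The examples here rank in ascending order, so the lowest sales figure gets rank 1. For a sales leaderboard one would usually rank in descending order instead. A sketch combining ascending=False with method="min", which gives the familiar "1, 2, 3, 3, 5" style of ranking:

```python
import pandas as pd

x = {"name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Liu", "Zheng Qi", "Chen Ba", "Huang Jiu", "Sun Shi"],
     "sales": [60, 40, 50, 40, 30, 80, 70, 60]}
df = pd.DataFrame(x)

# Highest sales get rank 1; tied sellers share the better (smaller) rank
df["ranking"] = df["sales"].rank(method="min", ascending=False).astype(int)
print(df.sort_values("ranking"))
```
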
③ method="max"

x = {"name":["Zhang San","Li Si","Wang Wu","Zhao Liu","Zheng Qi","Chen Ba","Huang Jiu","Sun Shi"],
     "sales":[60,40,50,40,30,80,70,60]}
df = pd.DataFrame(x)
display(df)

df["ranking"] = df["sales"].rank(method="max")
display(df)

The results are as follows: [Figure: output of the code above]

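method="min" and method="max" only disagree on tied values: a tied group gets the lowest rank of the group under "min" and the highest under "max". A sketch that finds those rows and checks them against Series.duplicated(keep=False):

```python
import pandas as pd

x = {"name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Liu", "Zheng Qi", "Chen Ba", "Huang Jiu", "Sun Shi"],
     "sales": [60, 40, 50, 40, 30, 80, 70, 60]}
df = pd.DataFrame(x)

ranks_min = df["sales"].rank(method="min")
ranks_max = df["sales"].rank(method="max")

# Rows where the two methods differ are exactly the rows with a shared sales value
differs = ranks_min != ranks_max
tied_names = df.loc[differs, "name"].tolist()
print(tied_names)
print(differs.tolist() == df["sales"].duplicated(keep=False).tolist())
```
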
④ method="average"

x = {"name":["Zhang San","Li Si","Wang Wu","Zhao Liu","Zheng Qi","Chen Ba","Huang Jiu","Sun Shi"],
     "sales":[60,40,50,40,30,80,70,60]}
df = pd.DataFrame(x)
display(df)

df["ranking"] = df["sales"].rank(method="average")
display(df)

The results are as follows: [Figure: output of the code above]

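To compare the methods at a glance, the sketch below builds one column per method for the same sales data. It adds method="dense", which is not covered above: tied values share a rank and the next distinct value gets the next integer, so no ranks are skipped.

```python
import pandas as pd

x = {"name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Liu", "Zheng Qi", "Chen Ba", "Huang Jiu", "Sun Shi"],
     "sales": [60, 40, 50, 40, 30, 80, 70, 60]}
df = pd.DataFrame(x)

methods = ["average", "min", "max", "first", "dense"]

# One column of ranks per method, aligned on the default 0..7 index
comparison = pd.DataFrame({m: df["sales"].rank(method=m) for m in methods})
comparison.insert(0, "sales", df["sales"])
comparison.index = df["name"]
print(comparison)
```
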
Added by unkwntech on Fri, 28 Jan 2022 15:49:41 +0200