Introduction to DataFrame structure

1. Explanation of DataFrame data structure


   index refers to the row index, columns to the column index, and values to the data itself. Both the row index and the column index are index objects. Viewed row by row, a DataFrame is a set of Series stacked vertically, and the index of each row Series is the column index [0,1,2,3]. Viewed column by column, it is a set of Series placed side by side, and the index of each column Series is the row index [0,1,2].
   The usual way to think about a DataFrame is column-wise: it is made up of columns that may each have a different data type. The DataFrame in the figure above consists of four Series, all sharing the row index [0,1,2].
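The row view and the column view can be checked directly. The snippet below is a minimal sketch: the 3x4 frame filled by np.arange is an assumption chosen to match the labels [0,1,2] and [0,1,2,3] mentioned above, not the data from the original figure.

```python
import numpy as np
import pandas as pd

# Illustrative 3x4 frame with default integer labels:
# rows are labelled [0, 1, 2], columns are labelled [0, 1, 2, 3].
df = pd.DataFrame(np.arange(12).reshape(3, 4))

row = df.loc[0]  # one row, returned as a Series
col = df[0]      # one column, returned as a Series

# A row Series is indexed by the column labels.
print(type(row).__name__, list(row.index))  # Series [0, 1, 2, 3]
# A column Series is indexed by the row labels.
print(type(col).__name__, list(col.index))  # Series [0, 1, 2]
```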

   A DataFrame can be compared to a table in MySQL:
   in a MySQL table there are usually many column fields, and different columns generally have different data types;
   if each column of the table is treated as a Series with its own data type, the table as a whole is a collection of Series of different types, which matches the description above.
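To make the table analogy concrete, here is a small sketch of a frame whose columns hold different types. The column names and values are invented for illustration and do not come from the original article.

```python
import pandas as pd

# Hypothetical table: names and values are made up for this example.
table = pd.DataFrame({
    "id": [1, 2, 3],
    "city": ["Beijing", "Tianjin", "Shanghai"],
    "score": [88.5, 92.0, 79.5],
})

# Each column is a Series with its own dtype, much like a typed SQL column.
print(table.dtypes)

# All column Series share the same row index.
print(table["id"].index.equals(table["score"].index))  # True
```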

2. index property and columns property of DataFrame

1) Construct a DataFrame
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(70,100,(3,5)), 
                  index=["Region 1", "Region 2", "Region 3"], 
                  columns=["Beijing","Tianjin", "Shanghai","Shenyang", "Guangzhou"])
display(df)

The results are as follows:

2) index and columns properties
df = pd.DataFrame(np.random.randint(70,100,(3,5)), 
                  index=["Region 1", "Region 2", "Region 3"], 
                  columns=["Beijing","Tianjin", "Shanghai","Shenyang", "Guangzhou"])
display(df)

x = df.index
display(x)
list(df.index)

y = df.columns
display(y)
list(df.columns)

The results are as follows:
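Because the frame above is filled with random numbers, its values differ on every run. The sketch below swaps in np.arange (an assumption made only so the output is reproducible) and shows that index and columns are Index objects that convert cleanly to plain lists.

```python
import numpy as np
import pandas as pd

# Fixed data instead of random integers so the example is reproducible.
df = pd.DataFrame(np.arange(15).reshape(3, 5),
                  index=["Region 1", "Region 2", "Region 3"],
                  columns=["Beijing", "Tianjin", "Shanghai", "Shenyang", "Guangzhou"])

rows = df.index.tolist()    # row labels as a plain Python list
cols = df.columns.tolist()  # column labels as a plain Python list
print(rows)
print(cols)

# The two label lists account for the shape of the frame.
print(df.shape == (len(rows), len(cols)))  # True
# Membership tests work directly on the Index object.
print("Shanghai" in df.columns)            # True
```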

① Modify the row index: df.index
df = pd.DataFrame(np.random.randint(70,100,(3,5)), 
                  index=["Region 1", "Region 2", "Region 3"], 
                  columns=["Beijing","Tianjin", "Shanghai","Shenyang", "Guangzhou"])
display(df)

df.index = ["a","b","c"]
display(df)

The results are as follows:

② Modify the column index: df.columns
df = pd.DataFrame(np.random.randint(70,100,(3,5)), 
                  index=["Region 1", "Region 2", "Region 3"], 
                  columns=["Beijing","Tianjin", "Shanghai","Shenyang", "Guangzhou"])
display(df)

# The new list must have exactly one label per column (5 here).
df.columns = ["a","b","c","d","e"]
display(df)

The results are as follows:
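Assigning to df.columns (or df.index) replaces every label at once, so the new list needs exactly as many entries as the axis has. The sketch below, using fixed np.arange data as an assumption for reproducibility, shows the error raised on a length mismatch and uses rename as one way to change only some labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(15).reshape(3, 5),
                  index=["Region 1", "Region 2", "Region 3"],
                  columns=["Beijing", "Tianjin", "Shanghai", "Shenyang", "Guangzhou"])

# Three labels for five columns: pandas rejects the assignment.
raised = False
try:
    df.columns = ["a", "b", "c"]
except ValueError as err:
    raised = True
    print("ValueError:", err)

# rename() maps old labels to new ones and returns a new DataFrame,
# so only the labels listed in the mapping change.
renamed = df.rename(columns={"Beijing": "a"}, index={"Region 1": "r1"})
print(list(renamed.columns))  # ['a', 'Tianjin', 'Shanghai', 'Shenyang', 'Guangzhou']
print(list(renamed.index))    # ['r1', 'Region 2', 'Region 3']
```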

3) The Index object of a DataFrame

   Looking at the DataFrame structure diagram, every df has both a row index (index) and a column index (columns). Both are the same kind of thing, an Index object. The two names exist only so that rows and columns can be told apart, for example when passing parameters while creating a df: the Index object for rows is called index and the one for columns is called columns.
   Remember: the elements of an Index object cannot be modified in place.

# pd.Index() is used to create an index object
x = pd.Index([1,2,3])
display(x)
display(type(x))

x[0] = 1  # raises TypeError: Index does not support mutable operations

The results are as follows:
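
Since a single element cannot be assigned, the usual workaround is to build a new Index and assign it as a whole. The following is a minimal sketch; the Series s and the replacement value 10 are illustrative assumptions.

```python
import pandas as pd

x = pd.Index([1, 2, 3])

# Element assignment is rejected because Index objects are immutable.
try:
    x[0] = 10
except TypeError as err:
    print("TypeError:", err)

# Build a new Index instead of editing the old one.
y = pd.Index([10] + x[1:].tolist())

# Swapping in the whole Index is allowed.
s = pd.Series([7, 8, 9], index=x)
s.index = y
print(list(s.index))  # [10, 2, 3]
print(list(x))        # [1, 2, 3], the original Index is unchanged
```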

3. name attribute

1) How to understand the name attribute of DataFrame


   Every row and every column extracted from a DataFrame is a Series, and each of these Series has a name: the label of the corresponding row or column. In the figure, eight colors (red, orange, yellow, green, blue, indigo, purple and black) are numbered 1-8, and each number marks one Series. The name of Series 1 is "Region 1", the name of Series 2 is "Region 2", and so on up to Series 8, whose name is "Guangzhou".

df = pd.DataFrame(np.random.randint(70,100,(3,5)), 
                  index=["Region 1", "Region 2", "Region 3"], 
                  columns=["Beijing","Tianjin", "Shanghai","Shenyang", "Guangzhou"])
display(df)

df.loc["Region 1"].name
df.loc["Region 2"].name
# ...... (likewise for the remaining rows and columns)
df["Guangzhou"].name

The results are as follows:
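The count of eight Series (three rows plus five columns) can be confirmed by collecting every name. This sketch again assumes fixed np.arange data in place of the random values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(15).reshape(3, 5),
                  index=["Region 1", "Region 2", "Region 3"],
                  columns=["Beijing", "Tianjin", "Shanghai", "Shenyang", "Guangzhou"])

# A row Series is named after its row label.
print(df.loc["Region 1"].name)  # Region 1
# A column Series is named after its column label.
print(df["Guangzhou"].name)     # Guangzhou

# Collect the names of all row Series, then all column Series.
names = [df.loc[r].name for r in df.index] + [df[c].name for c in df.columns]
print(len(names))  # 8
```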

2) Set the name attribute of the row index and column index: df.index.name and df.columns.name
df = pd.DataFrame(np.random.randint(70,100,(3,5)), 
                  index=["Region 1", "Region 2", "Region 3"], 
                  columns=["Beijing","Tianjin", "Shanghai","Shenyang", "Guangzhou"])
display(df)

df.index.name = "index_name"
df.columns.name = "columns_name"
display(df)

The results are as follows:


To sum up: as the demonstrations above show, every row and column of a DataFrame has a name, and the row index and column index of the DataFrame can each be given a name of their own.
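The two kinds of name are easy to confuse, so the sketch below prints them side by side. It assumes fixed np.arange data, and the labels "rows" and "cities" passed to rename_axis are hypothetical. rename_axis is shown as an alternative that returns a new frame rather than modifying df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(15).reshape(3, 5),
                  index=["Region 1", "Region 2", "Region 3"],
                  columns=["Beijing", "Tianjin", "Shanghai", "Shenyang", "Guangzhou"])

df.index.name = "index_name"
df.columns.name = "columns_name"

row = df.loc["Region 1"]
col = df["Beijing"]

# Series name = the row/column label; Series index name = name of the other axis.
print(row.name, "|", row.index.name)  # Region 1 | columns_name
print(col.name, "|", col.index.name)  # Beijing | index_name

# rename_axis sets both axis names in one call and leaves df untouched.
df2 = df.rename_axis(index="rows", columns="cities")
print(df2.index.name, df2.columns.name)  # rows cities
print(df.index.name, df.columns.name)    # index_name columns_name
```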

Added by kevinbarker on Fri, 28 Jan 2022 09:33:25 +0200