Python - pandas module - DataFrame data structure

Python - pandas module - DataFrame data structure

pandas

  • Numpy is more suitable for dealing with unified numerical array data
  • pandas is specially designed to handle tables and mixed data
  • panda has two data structures:
    1. Series
    2. DataFrame

DataFrame

  • DataFrame is a tabular (similar to Excel) data structure, which contains a set of ordered columns. Each column can be of different value types (value, string, bool, etc.)
  • Compared with Series (only row index), DataFrame has both row index and column index
  • It can be regarded as a dictionary composed of multiple Series sharing a column index

Create DataFrame object

Create function

obj = pandas.Dataframe()#Create function

Create from dictionary

  • Define a dictionary
  • The key of the dictionary is the column name of the DataFrame
  • The value of the dictionary is a serise values
  • When the index of the Dataframe is not set, it is transmitted to 0~N by default
    Take an example:
'''establish DataFrema'''
import pandas as pd
#Define a dictionary
dic = {'state':['Ohio','Ohio','Ohio','Nevada','Nevada','Nevada'],'year':[2000,2001,2002,2001,2002,2003],'pop':[1.5,1.7,3.6,2.4,2.9,3.2]}
#Create objects through DataFrame()
frame = pd.DataFrame(dic)
print('Create a dictionary without passing in a specified index of DataFrame Object:')
print(frame)

Created from a two-dimensional array

It's just that the elements created through a two-dimensional array are of the same type
If the index columns and index are not re passed in, they are assigned from 0 to n

import pandas as pd
import numpy as np
arr = np.arange(24).reshape(6,4)
frame = pd.DataFrame(arr,columns=['a','b','c','d'],index=['one','two','three','four','five','six'])
print(frame)

columns = specify column name, index = specify row index

Modify the column name of the frame
It is actually indexed in the dictionary according to the value passed in by columns

When the specified column name is included in the dictionary key

'''Specify column name'''
frame2 = pd.DataFrame(dic,columns=['pop','year','state'])
print('After modifying the column name DataFrame')
print(frame2)

When the specified column name is not included in the dictionary key

When the specified column name is not included in the dictionary key and the index cannot be found,
Create a new column with the value set to NaN (resulting in missing values)

'''The specified column names are not all in key in'''
frame3 = pd.DataFrame(dic,columns=['pop','year','state','debt'],index=['one','two','three','four','five','six'])
print('Specify the column name and row index DataFrame')
print(frame3)

View attribute values

View columns (column index)

print(frame3.columns)

View index (row index)

print(frame3.index)

View values (element array)

print(frame3.values)

Index column: view the specified column (return a Series)

Returns a Series

DataFrema.column

print('yera Columnar Series')
print(frame.year)

To view the specified column type:

print('year The data type of the column')
print(type(frame.year))

DataFrema[column]

#DataFrema[column]
print('stste Columnar Series')
print(frame['state'])

Index row: view the specified row

In Sreies, we use series [index], but in DataFrema, this method has been occupied by index columns, so we use another method to index columns

DataFrame.loc[index]

The data type of the row element is also a Series

import pandas as pd
dic = {'state':['Ohio','Ohio','Ohio','Nevada','Nevada','Nevada'],'year':[2000,2001,2002,2001,2002,2003],'pop':[1.5,1.7,3.6,2.4,2.9,3.2]}
frame = pd.DataFrame(dic,index=['one','two','three','four','five','six'])
print('three OK Series')
print(frame.loc['three'])
print('three Data type of row:',type(frame.loc['three']))

DataFrame.iloc [line number]

Line number starts at 0

print('Data in row 0 Series:')
print(frame.iloc[0])

Re index row dataframe reindex()

Change row position

'''Modify the row index. Column index'''
import pandas as pd
import numpy as np
arr = np.arange(24).reshape(6,4)
frame = pd.DataFrame(arr)
print('Before modification')
print(frame)
#Modify row index
frame2 = frame.reindex([4,5,2,1,3,0])
print('After modification')
print(frame2)

DataFrame.index = [] modify row index

DataFrame.columns = [] modify column index

Modify index

#Modify index name
import pandas as pd
import numpy as np
arr = np.arange(24).reshape(6,4)
frame = pd.DataFrame(arr)
#Modify index name
frame.columns=['a','b','c','d']
frame.index=['one','two','three','four','five','six']
#After modifying the index name
print(frame

Addition, deletion and modification of rows and columns

appoint:
axis = 1 indicates the column
Default representation line

data = DataFrame.drop(Listing, axis = 1)
data = DataFrame.drop(Row name)

Keywords: Python data structure Data Analysis pandas

Added by delfa on Sat, 25 Dec 2021 05:41:14 +0200