Hands on data analysis -- data reconstruction

1. Data consolidation

1.1 import basic library

import numpy as np
import pandas as pd

1.2 loading data

#Load all the data in the data folder and observe the relationship between them compared with the previous original data
text_left_up = pd.read_csv("train-left-up.csv")
text_left_down = pd.read_csv("train-left-down.csv")
text_right_up = pd.read_csv("train-right-up.csv")
text_right_down = pd.read_csv("train-right-down.csv")
text = pd.read_csv('train.csv')

#The four data are the overall data, which are divided into four blocks according to the upper left, lower left, upper right and lower right.

1.3 merging data

# Method 1: use concat Method: add data train-left-up.csv and train-right-up.csv Merge horizontally into one table, and # Save this table as result_up use concat Method: add train-left-down and train-right-down Horizontal merge into one # Table and save the table as result_down.  Then put the result above_ Up and result_down is merged vertically into result.
list_up = [text_left_up,text_right_up]
result_up = pd.concat(list_up,axis=1)

list_down=[text_left_down,text_right_down]
result_down = pd.concat(list_down,axis=1)

result = pd.concat([result_up,result_down])



#Method 2: use DataFrame's own methods, join method and append method
resul_up = text_left_up.join(text_right_up)

result_down = text_left_down.join(text_right_down)

result = result_up.append(result_down)


#Method 3: use the merge method of panels and the append method of DataFrame
result_up = pd.merge(text_left_up,text_right_up,left_index=True,right_index=True)

result_down = pd.merge(text_left_down,text_right_down,left_index=True,right_index=True)

result = resul_up.append(result_down)

[thinking] 1. Compare the different and same methods of merge, join and concat.
2. Think about why the append method of DataFrame is required in task 4 and task 5,
3. How to use merge or join to complete task 4 and task 5?

merge function

parameter	explain
left	Spliced left DataFrame object
right	Spliced right DataFrame object
on	The name of the column or index level to join. Must be found in the left and right DataFrame objects. If not passed and left_index and right_ If the index is False, the intersection of columns in the DataFrame will be inferred as a join key.
left_on	The column (index level) in the DataFrame on the left is used as the key. (column name, index level name, array with length equal to the length of DataFrame.)
right_on	ditto
left_index	left_index=True, the row label is used as its connection key. For a DataFrame with MultiIndex (hierarchy), the number of levels must match the number of connection keys in the DataFrame on the right.
right_index	ditto
how	(left, right, outer, inner). Default inner. inner is the intersection and outer is the union.
sort	: sort the resulting dataframes by the join key in dictionary order. The default is True, and setting it to False will significantly improve performance in many cases
copy	Always copy data from the passed DataFrame object (default = True)

join function

parameter	explain
how	Rules used to specify data retention after table consolidation (the same as merge)
on	The name of the column or index level to join.
suffix	If you encounter a column with the same name but different values in the process of merging with tables, and you want to keep them during merging, you can use suffixes to add suffixes to the duplicate column names of each table

Combat function

objs	series, dataframe or panel sequence lsit
axis	Axes that need to merge links. 0 is lie, 1 is hang, and the default axis=0
join	The connection mode is inner or outer

merge	Horizontal splicing, database style link merging, you can specify the key of the link
join	Transverse splicing
conbat	Both horizontal and vertical

2. Because append is used, longitudinal splicing can be performed (additional lines can be added).

3. Only merge and join can't work, because both are spliced horizontally.

2. Look at the data from another angle

2.1 changing data to Series type data

#What does this stack function do?
#Change the data from table structure to curly bracket structure, that is, its row index into column index

unit_result=text.stack().head(20)

3. Data aggregation and operation

3.1 group by () usage

#Group and aggregate according to the contents of one or more columns of the DataFrame itself
survived_sex = text['Survived'].groupby(text['Sex']).sum()

df  = text['Fare'].groupby(text['Sex'])

survived_pclass = text['Survived'].groupby(text['Pclass'])

three point two Usage of agg() function.

text.groupby('Sex').agg({'Fare': 'mean', 'Pclass': 'count'}).rename(columns=                                                                
                         {'Fare': 'mean_fare', 'Pclass': 'count_pclass'})

Keywords: Python Data Analysis Data Mining

Added by Begbie on Fri, 19 Nov 2021 13:29:49 +0200

Programming VIP