Hello, I'm Lao Wu. You can also call me Classmate Wu, and younger friends are welcome to call me Senior Brother Wu. Welcome to the world of data analysis; let's learn it together!
If you're interested, follow me or check out my Data analysis column, where there are many high-quality articles to share with you.
Previous posts:
Since I finished organizing the previous two blog posts about pandas, the read counts and feedback have made it clear that readers appreciate this kind of practical post. At the same time, I've found that even after covering so much of pandas, I still missed some knowledge points here and there; pandas simply has too many practical functions.
So in today's post I've organized further practical tips that the previous two posts either didn't mention or didn't cover in enough detail. Bookmark it now; I promise you'll use these sooner or later.
1. Various output options when writing a table to Excel
1.1 specify the sheet name
df.to_excel('test.xlsx', sheet_name='laowu')
1.2 specify the missing-value representation
df.to_excel('test.xlsx', na_rep='-')
1.3 representation of infinity
df.to_excel('test.xlsx', inf_rep='inf')
1.4 floating-point number format
# Keep two decimal places
df.to_excel('test.xlsx', float_format="%.2f")
1.5 output only specified columns
df.to_excel('test.xlsx', columns=['column1', 'column2', 'column3'])
1.6 without the header
# Write without the header row
df.to_excel('test.xlsx', header=False)
1.7 without index
df.to_excel('test.xlsx', index=False)
1.8 specify index labels
df.to_excel('test.xlsx', index_label=['team', 'name'])
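Note that passing two labels to `index_label` assumes the frame carries a two-level (multi-)index. A minimal sketch, with hypothetical `team`/`name` columns made up for illustration:

```python
import pandas as pd

# Hypothetical data: 'team' and 'name' become a two-level index
df = pd.DataFrame({'team': ['A', 'A'], 'name': ['wu', 'li'], 'score': [90, 85]})
df2 = df.set_index(['team', 'name'])

# index_label names the index column(s) in the written file:
# df2.to_excel('test.xlsx', index_label=['team', 'name'])
print(list(df2.index.names))  # ['team', 'name']
```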
1.9 specify where to start writing
# Start writing at the given row and column (0-based)
df.to_excel('test.xlsx', startrow=10, startcol=3)
1.10 specify the encoding
# Note: the encoding parameter was deprecated in pandas 1.5 and removed in 2.0
df.to_excel('test.xlsx', encoding='utf-8')
1.11 use a specified engine
df.style.to_excel('test.xlsx', engine='openpyxl')
1.12 freeze specified rows and columns
df.to_excel('test.xlsx', freeze_panes=(0,3))
2 save multiple DataFrames to different sheets of the same workbook
with pd.ExcelWriter(excel_name) as writer:
    df_all.to_excel(writer, sheet_name='All', index=False, header=False)
    df_tmall.to_excel(writer, sheet_name='Tmall (non-supermarket)', index=False, header=False)
    df_super.to_excel(writer, sheet_name='Tmall supermarket', index=False, header=False)
    df_jd.to_excel(writer, sheet_name='JD.COM', index=False, header=False)
    df_pdd.to_excel(writer, sheet_name='Pinduoduo', index=False, header=False)
3 select, modify or delete data that meets specified conditions
3.1 select, modify or delete rows that meet specified conditions
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
df = pd.DataFrame(a, index=['row0', 'row1', 'row2', 'row3'], columns=list('ABC'))
# select
cond_1 = (df.B >= 5) & (df.C < 12)
df_1 = df[cond_1]
# modify
cond_2 = (df.B >= 2) & (df.C < 12)
df.loc[cond_2, 'C'] = 0  # set field C to 0 in the rows that meet the condition
# df.loc[cond_2] = 0  # set every field to 0 in the rows that meet the condition
# delete
cond_3 = df['A'].isin([1, 7])
df_3 = df[~cond_3]
- Learn to use the `~` symbol flexibly; this negation operator is genuinely useful.
- Although what `isin` does can also be expressed with `|` (or) comparisons, `isin` is much more efficient than chaining `|`.
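As a quick check of the equivalence described in the second point, here is a small sketch (the column values are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 7, 10]})

# isin: a single vectorized membership test
mask_isin = df['A'].isin([1, 7])

# the equivalent chained comparisons joined with | (or)
mask_or = (df['A'] == 1) | (df['A'] == 7)

print(mask_isin.equals(mask_or))       # the two masks agree: True
print(df[~mask_isin]['A'].tolist())    # negated selection: [4, 10]
```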
3.2 select, modify or delete columns that meet specified conditions
# select columns whose first row contains a specific value
cols = [x for i, x in enumerate(df.columns) if df.iat[0, i] == 3]  # traverse row0 with enumerate and collect the columns containing 3
df_4 = df[cols]
# modify
df_5 = df.copy()
df_5.loc[:, cols] = 0
# delete
df_6 = df.drop(cols, axis=1)  # use the drop method to delete the columns with the specific value
3.3 delete rows or columns containing null values
df1 = pd.DataFrame(
    [[np.nan, 2, np.nan, 0],
     [3, 4, np.nan, 1],
     [np.nan, np.nan, np.nan, 5],
     [np.nan, 3, np.nan, 4]],
    columns=list('ABCD'))
print(df1)
df2 = df1.copy()
# delete the rows where column A is null
df1['A'] = df1['A'].fillna('null')  # mark all null values in column A as 'null'
df1 = df1[~df1['A'].isin(['null'])]
# delete the columns where the first row is null
df2[0:1] = df2[0:1].fillna('null')
cols = [x for i, x in enumerate(df2.columns) if df2.iat[0, i] == 'null']
df2 = df2.drop(cols, axis=1)
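For reference, pandas' built-in `dropna` handles the row case directly, and `notna` on the first row handles the column case without the fillna marker trick; a minimal sketch on the same data:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [[np.nan, 2, np.nan, 0],
     [3, 4, np.nan, 1],
     [np.nan, np.nan, np.nan, 5],
     [np.nan, 3, np.nan, 4]],
    columns=list('ABCD'))

# drop the rows where column A is null
rows_kept = df1.dropna(subset=['A'])

# drop the columns whose first-row value is null
cols_kept = df1.loc[:, df1.iloc[0].notna()]

print(rows_kept.index.tolist())    # [1]
print(cols_kept.columns.tolist())  # ['B', 'D']
```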
4 add a row of data
df = pd.DataFrame(np.arange(16).reshape((4, 4)), index=['a', 'b', 'c', 'd'], columns=['c1', 'c2', 'c3', 'c4'])
# The first method: add a row by label
df.loc['e'] = [1, 2, 3, 4]
# The second method: build a one-row DataFrame and concatenate
df_new = pd.DataFrame([5, 6, 7, 8]).T
## Align the columns of df_new with those of df, otherwise the values end up in different columns after merging
df_new.columns = df.columns
## To merge the two DataFrames, set ignore_index=True
df = pd.concat([df, df_new], ignore_index=True)
5. Modify data types
Usually we modify the data type of an entire column.
5.1 convert data that meets specified conditions
# convert only the values that meet a specific condition (while column A is still numeric)
cond = df.A > 60
df.loc[cond, 'A'] = df.loc[cond, 'A'].astype('str')
# convert whole columns; astype returns a copy, so assign the result back
df['A'] = df['A'].astype('str')
df['B'] = df['B'].astype(np.float64)
5.2 convert data of a specified data type
# Convert int64 columns to int16 and float64 columns to float32,
# leaving columns of any other dtype unchanged
new_types = {np.dtype('int64'): np.int16, np.dtype('float64'): np.float32}
df = df.astype({col: new_types[dtype] for col, dtype in df.dtypes.items() if dtype in new_types})
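A small self-contained sketch (with made-up columns) showing this dtype conversion and its effect on memory usage:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one int64 column and one float64 column
df = pd.DataFrame({'a': np.arange(4, dtype=np.int64),
                   'b': np.zeros(4, dtype=np.float64)})

# Map only the dtypes we want to shrink; other dtypes stay untouched
new_types = {np.dtype('int64'): np.int16, np.dtype('float64'): np.float32}
df_small = df.astype({col: new_types[t] for col, t in df.dtypes.items() if t in new_types})

print(df_small.dtypes['a'], df_small.dtypes['b'])             # int16 float32
print(df_small.memory_usage().sum() < df.memory_usage().sum())  # True
```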
6 making data tables beautiful
df.style
There is too much to say about this part, so I'll leave a placeholder here for now and fill it in later if needed.
Conclusion
Beyond this article, there are more knowledge points to share with you. Take your time exploring them; the links are below.
Recommended columns
👨👩👦👦 Machine learning: sharing practical machine-learning projects and explanations of common models
👨👩👦👦 Data analysis: sharing data analysis, practical projects and common skills
Review of previous contents
💚 Learn a full set of Python code [super detailed] Introduction to python, core syntax, data structure, advanced Python [to you who want to learn Python well]
❤️ Learn pandas complete code [super detailed] data viewing, input and output, selection, integration, cleaning, conversion, remodeling, mathematical and statistical methods and sorting
💙 Learn the full set of pandas code [super detailed] box operation, grouping aggregation, time series and data visualization
💜 Learn the full set of NumPy code [super detailed] basic operations, data types, array operations, copies and views, indexing, slicing and iteration, shape manipulation, universal functions and linear algebra
Follow me to learn more!