Hello, I'm Lao Wu. You can also call me Classmate Wu, and younger friends are welcome to call me Senior Brother Wu. Welcome to the world of data analysis; let's learn it together!
If you're interested, follow me or check out my Data analysis column, where there are many high-quality articles to share with you.
Previous posts:
Since I finished organizing the previous two blog posts about pandas, the read counts and feedback have made it clear that readers appreciate this kind of practical post. At the same time, I've found that even after covering so much of pandas, I still missed some knowledge points here and there; pandas simply has too many practical functions.
So in today's post I've organized further practical tips that the previous two posts either didn't mention or didn't cover in enough detail. Bookmark it now; I promise you'll use these sooner or later.
1. Various output options when writing a table to Excel
1.1 specify the sheet name
df.to_excel('test.xlsx', sheet_name='laowu')
1.2 specify the missing-value representation
df.to_excel('test.xlsx', na_rep='-')
1.3 representation of infinity
df.to_excel('test.xlsx', inf_rep='inf')
1.4 floating-point number format
# Keep two decimal places
df.to_excel('test.xlsx', float_format="%.2f")
1.5 output only specified columns
df.to_excel('test.xlsx', columns=['column1', 'column2', 'column3'])
1.6 without the header
# Write without the header row
df.to_excel('test.xlsx', header=False)
1.7 without index
df.to_excel('test.xlsx', index=False)
1.8 specify index labels
df.to_excel('test.xlsx', index_label=['team', 'name'])
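Note that passing two labels to `index_label` assumes the frame carries a two-level (multi-)index. A minimal sketch, with hypothetical `team`/`name` columns made up for illustration:

```python
import pandas as pd

# Hypothetical data: 'team' and 'name' become a two-level index
df = pd.DataFrame({'team': ['A', 'A'], 'name': ['wu', 'li'], 'score': [90, 85]})
df2 = df.set_index(['team', 'name'])

# index_label names the index column(s) in the written file:
# df2.to_excel('test.xlsx', index_label=['team', 'name'])
print(list(df2.index.names))  # ['team', 'name']
```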
1.9 specify where to start writing
# Start writing at the given row and column (0-based)
df.to_excel('test.xlsx', startrow=10, startcol=3)
1.10 specify the encoding
# Note: the encoding parameter was deprecated in pandas 1.5 and removed in 2.0
df.to_excel('test.xlsx', encoding='utf-8')
1.11 use a specified engine
df.style.to_excel('test.xlsx', engine='openpyxl')
1.12 freeze specified rows and columns
df.to_excel('test.xlsx', freeze_panes=(0,3))
2 save multiple DataFrames to different sheets of the same workbook
with pd.ExcelWriter(excel_name) as writer:
    df_all.to_excel(writer, sheet_name='All', index=False, header=False)
    df_tmall.to_excel(writer, sheet_name='Tmall (non-supermarket)', index=False, header=False)
    df_super.to_excel(writer, sheet_name='Tmall supermarket', index=False, header=False)
    df_jd.to_excel(writer, sheet_name='JD.COM', index=False, header=False)
    df_pdd.to_excel(writer, sheet_name='Pinduoduo', index=False, header=False)
3 select, modify or delete data that meets specified conditions
3.1 select, modify or delete rows that meet specified conditions
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
df = pd.DataFrame(a, index=['row0', 'row1', 'row2', 'row3'], columns=list('ABC'))
# select
cond_1 = (df.B >= 5) & (df.C < 12)
df_1 = df[cond_1]
# modify
cond_2 = (df.B >= 2) & (df.C < 12)
df.loc[cond_2, 'C'] = 0  # set field C to 0 in the rows that meet the condition
# df.loc[cond_2] = 0  # set every field to 0 in the rows that meet the condition
# delete
cond_3 = df['A'].isin([1, 7])
df_3 = df[~cond_3]
- Learn to use the `~` symbol flexibly; this negation operator is genuinely useful.
- Although what `isin` does can also be expressed with `|` (or) comparisons, `isin` is much more efficient than chaining `|`.
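As a quick check of the equivalence described in the second point, here is a small sketch (the column values are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 7, 10]})

# isin: a single vectorized membership test
mask_isin = df['A'].isin([1, 7])

# the equivalent chained comparisons joined with | (or)
mask_or = (df['A'] == 1) | (df['A'] == 7)

print(mask_isin.equals(mask_or))       # the two masks agree: True
print(df[~mask_isin]['A'].tolist())    # negated selection: [4, 10]
```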
3.2 select, modify or delete columns that meet specified conditions
# select columns whose first row contains a specific value
cols = [x for i, x in enumerate(df.columns) if df.iat[0, i] == 3]  # traverse row0 with enumerate and collect the columns containing 3
df_4 = df[cols]
# modify
df_5 = df.copy()
df_5.loc[:, cols] = 0
# delete
df_6 = df.drop(cols, axis=1)  # use the drop method to delete the columns with the specific value
3.3 delete rows or columns containing null values
df1 = pd.DataFrame(
    [[np.nan, 2, np.nan, 0],
     [3, 4, np.nan, 1],
     [np.nan, np.nan, np.nan, 5],
     [np.nan, 3, np.nan, 4]],
    columns=list('ABCD'))
print(df1)
df2 = df1.copy()
# delete the rows where column A is null
df1['A'] = df1['A'].fillna('null')  # mark all null values in column A as 'null'
df1 = df1[~df1['A'].isin(['null'])]
# delete the columns where the first row is null
df2[0:1] = df2[0:1].fillna('null')
cols = [x for i, x in enumerate(df2.columns) if df2.iat[0, i] == 'null']
df2 = df2.drop(cols, axis=1)
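For reference, pandas' built-in `dropna` handles the row case directly, and `notna` on the first row handles the column case without the fillna marker trick; a minimal sketch on the same data:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [[np.nan, 2, np.nan, 0],
     [3, 4, np.nan, 1],
     [np.nan, np.nan, np.nan, 5],
     [np.nan, 3, np.nan, 4]],
    columns=list('ABCD'))

# drop the rows where column A is null
rows_kept = df1.dropna(subset=['A'])

# drop the columns whose first-row value is null
cols_kept = df1.loc[:, df1.iloc[0].notna()]

print(rows_kept.index.tolist())    # [1]
print(cols_kept.columns.tolist())  # ['B', 'D']
```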
4 add a row of data
df = pd.DataFrame(np.arange(16).reshape((4, 4)), index=['a', 'b', 'c', 'd'], columns=['c1', 'c2', 'c3', 'c4'])
# The first method: add a row by label
df.loc['e'] = [1, 2, 3, 4]
# The second method: build a one-row DataFrame and concatenate
df_new = pd.DataFrame([5, 6, 7, 8]).T
## Align the columns of df_new with those of df, otherwise the values end up in different columns after merging
df_new.columns = df.columns
## To merge the two DataFrames, set ignore_index=True
df = pd.concat([df, df_new], ignore_index=True)
5. Modify data types
Usually we modify the data type of an entire column.
5.1 convert data that meets specified conditions
# convert only the values that meet a specific condition (while column A is still numeric)
cond = df.A > 60
df.loc[cond, 'A'] = df.loc[cond, 'A'].astype('str')
# convert whole columns; astype returns a copy, so assign the result back
df['A'] = df['A'].astype('str')
df['B'] = df['B'].astype(np.float64)
5.2 convert data of a specified data type
# Convert int64 columns to int16 and float64 columns to float32,
# leaving columns of any other dtype unchanged
new_types = {np.dtype('int64'): np.int16, np.dtype('float64'): np.float32}
df = df.astype({col: new_types[dtype] for col, dtype in df.dtypes.items() if dtype in new_types})
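A small self-contained sketch (with made-up columns) showing this dtype conversion and its effect on memory usage:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one int64 column and one float64 column
df = pd.DataFrame({'a': np.arange(4, dtype=np.int64),
                   'b': np.zeros(4, dtype=np.float64)})

# Map only the dtypes we want to shrink; other dtypes stay untouched
new_types = {np.dtype('int64'): np.int16, np.dtype('float64'): np.float32}
df_small = df.astype({col: new_types[t] for col, t in df.dtypes.items() if t in new_types})

print(df_small.dtypes['a'], df_small.dtypes['b'])             # int16 float32
print(df_small.memory_usage().sum() < df.memory_usage().sum())  # True
```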
6 making data tables beautiful
df.style
There is too much to say about this part, so I'll leave a placeholder here for now and fill it in later if needed.
Conclusion
Beyond this article, there are more knowledge points to share with you. Take your time exploring them; the links are below.
Recommended columns
👨👩👦👦 Machine learning: sharing practical machine-learning projects and explanations of common models
👨👩👦👦 Data analysis: sharing data analysis, practical projects and common skills
Review of previous contents
💚 Learn a full set of Python code [super detailed] Introduction to python, core syntax, data structure, advanced Python [to you who want to learn Python well]
❤️ Learn pandas complete code [super detailed] data viewing, input and output, selection, integration, cleaning, conversion, remodeling, mathematical and statistical methods and sorting
💙 Learn the full set of pandas code [super detailed] box operation, grouping aggregation, time series and data visualization
💜 Learn the full set of NumPy code [super detailed] basic operations, data types, array operations, copies and views, indexing, slicing and iteration, shape manipulation, universal functions and linear algebra
Follow me to learn more!