Pandas versus Excel Notes 2

I. Creating Sequences

series can represent excel rows and columns
How to create it:
1. Using dictionaries, key is converted to index and value is converted to data.
2. Use lists

"Mode 1"
s1=pd.Series()#Generate a sequence that represents rows and columns
d={'x':100,'y':200,'z':300}
s1=pd.Series(d)#Converting a dictionary into a sequence
print(s1)
"Mode 2"
L1=[100,200,300]
L2=['x','y','z']
s1=pd.Series(L1,index=L2)#Generate a sequence that represents rows and columns
#perhaps
s1=pd.Series([100,200,300],index=['x','y','z'])


Add the sequence to the dataframe

import pandas as pd
s1=pd.Series([1,2,3],index=[1,2,3],name='A')
s2=pd.Series([10,20,30],index=[1,2,3],name='B')
s3=pd.Series([100,200,300],index=[1,2,3],name='C')
df=pd.DataFrame({s1.name:s1,s2.name:s2,s3.name:s3})
df


Create dataframe directly

II. Reading of Data Area

excel for operation at this time

import pandas as pd
import xlrd
books=pd.read_excel('E:/ruanjianDM/people.xlsx',skiprows=3,usecols='C:I')#Skip the first three lines, C to column I
books['ID'].at[0]=100 #books['ID'].at[index] specifies columns and rows
books

3. Automatic Filling

Simple filling

"1.Fill 1-n number"
for i in books.index:
    books['ID'].at[i]=i+1 #Change the ID above to 1
"2.Fill in regular strings"
#Direct padding can cause errors, setting Instor to str type
books=pd.read_excel('E:/ruanjianDM/people.xlsx',skiprows=3,usecols='C:L',dtype={'ID':str,'Instore':str,'Date':str})
for i in books.index:
    books['Instore'].at[i]="Yes" if i%2==0 else "No"
books.head()

Fill in the date sequence

Date function: date(2018,1,1), date (year, month, day)
Time Delta (day = i) only has day, can only add date, minute, millisecond, hour, no month.

books.rename(columns={"Data":"Date"},inplace=True)#Reset Column Name
from datetime import date,timedelta
start=date(2018,1,1)
for i in books.index:
    books['Date'].at[i]=start+timedelta(days=i)#On the basis of start, it increases or decreases one day, two days and three days in turn.
#Increase one day
books.head()

IV. Column Operation and Function Computation

The operation in Excel is a cell and the operation in pandas is a column.
Special case: do not want to start from scratch, calculate some of the lines

from numpy import random  as nr
books.Prince=nr.rand(books.shape[0],1)
print(books.head())
for i in range(1,3):#Left closed right open interval, not 16
    books['Prince'].at[i]=books['Prince'].at[i]*books['ID'].at[i]
def add_2(x):
    return x+2
books['ID']=books['ID'].apply(add_2)#Write only the function name, no parentheses, ID column plus 2
print(books.head())


Five, sort

"Single Conditional Sorting"
#books.sort_values(by='Prince',inplace=True)#The default is ascending order
books.sort_values(by='Prince',ascending=False,inplace=True)
books.head()
"Multiple Conditional Sorting,Two conditions are considered to get the ranking result, which will not disturb the original row correspondence relationship."
books.sort_values(by=['ID','Prince'],ascending=[False,True],inplace=True)
books.head()


Screening filtration


The statement of the two filter conditions can be understood as a=books.loc[books ['ID']. apply(ID_3_to_6)] first filters a part of the table, which is being filtered.
a.loc[books['Prince'].apply(Prince)]

"Define filter function"
def ID_3_to_6(a):
    return 3<=a<6
def Prince(s):
    return 2<=s<=5
    
books.loc[books['ID'].apply(ID_3_to_6)]#Return TRUE if the condition is satisfied, Loc locates a line according to true and prints it.

books.loc[books['ID'].apply(ID_3_to_6)].loc[books['Prince'].apply(Prince)]#Two screening conditions


Keywords: Excel

Added by flforlife on Tue, 08 Oct 2019 14:01:02 +0300