01 Reading files in Python, with and without packages

Read files using open

  • .read()
    • With open, you can read files without any extra packages.
    • 'r' means read mode.
    • If the file and the notebook are in the same folder, the file name alone is enough; the full path is not needed.
    • file.read() reads everything in the file.
file = open('data.txt','r')
print(file.read())             #Read everything in this file
,A,B,C,D
0,foo,one,small,1
1,foo,one,large,2
2,foo,one,large,8
3,foo,two,small,3
4,foo,two,small,3
5,bar,one,large,4
6,bar,one,small,5
7,bar,two,small,6
8,bar,two,large,7
file = open('data.txt','r')
print(file.read(5))           #Read the first five characters in this file
,A,B,
  • .readlines()
    • Changing .read to .readlines gives a different result:
    • Everything is put in a list in which each item is one line of the original file; \n means a new line. All content is read, including formatting characters. Because the result is a list, a specific line can be picked out by index, which .read cannot do.
file = open('data.txt','r')
print(file.readlines())          #Read the entire content as a list
[',A,B,C,D\n', '0,foo,one,small,1\n', '1,foo,one,large,2\n', '2,foo,one,large,8\n', '3,foo,two,small,3\n', '4,foo,two,small,3\n', '5,bar,one,large,4\n', '6,bar,one,small,5\n', '7,bar,two,small,6\n', '8,bar,two,large,7\n']
file = open('data.txt','r')
print(file.readlines()[5])         #Read the line at index 5 (the sixth line, since counting starts at 0)
4,foo,two,small,3
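The reads above never close the file. A minimal sketch of the same three calls inside a with block, which closes the file automatically when the block ends. The file name sample.txt and its two-line contents are made up here so that the snippet runs on its own:

```python
import os
import tempfile

# Hypothetical sample file, created here so the example is self-contained.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'sample.txt')
with open(path, 'w') as f:
    f.write(',A,B\n0,foo,one\n')

# 'with' closes the file when the block ends, even if an error occurs.
with open(path, 'r') as f:
    everything = f.read()        # whole file as one string

with open(path, 'r') as f:
    first_five = f.read(5)       # only the first five characters

with open(path, 'r') as f:
    lines = f.readlines()        # list of lines, each ending in '\n'

print(repr(everything))
print(repr(first_five))
print(lines[1])                  # index 1 is the second line
```

The file is reopened before each call because a read moves the position to the end of the file, so a second read on the same handle would return nothing.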
  • Process the contents line by line while reading (using a for loop)
file = open('data.txt','r')
i = 1
for line in file:
    print('read line',i)
    i = i+1
    print(line)              #Each line already ends in \n, so print leaves a blank line after it
read line 1
,A,B,C,D

read line 2
0,foo,one,small,1

read line 3
1,foo,one,large,2

read line 4
2,foo,one,large,8

read line 5
3,foo,two,small,3

read line 6
4,foo,two,small,3

read line 7
5,bar,one,large,4

read line 8
6,bar,one,small,5

read line 9
7,bar,two,small,6

read line 10
8,bar,two,large,7
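The blank lines in the output come from the \n that each line already carries. A small sketch of one way to avoid them, assuming you only want the text of each line: strip the newline with rstrip and let enumerate do the counting. io.StringIO and its three made-up lines stand in for the open file so the snippet runs without data.txt:

```python
import io

# io.StringIO stands in for an open text file so the example needs no file on disk.
file = io.StringIO(',A,B\n0,foo,one\n1,bar,two\n')

cleaned = []
for i, line in enumerate(file, start=1):
    text = line.rstrip('\n')     # drop the newline that came from the file
    cleaned.append(text)
    print('read line', i)
    print(text)
```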
  • Read other forms of files
    • csv
file = open('data.csv','r') 
i =1
for line in file:
    print('read line',i) 
    i = i+1 
    print(line)
read line 1
,A,B,C,D

read line 2
0,foo,one,small,1

read line 3
1,foo,one,large,2

read line 4
2,foo,one,large,8

read line 5
3,foo,two,small,3

read line 6
4,foo,two,small,3

read line 7
5,bar,one,large,4

read line 8
6,bar,one,small,5

read line 9
7,bar,two,small,6

read line 10
8,bar,two,large,7
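Reading a csv file with open returns each line as one string. If the individual fields are needed without pandas, the standard library's csv module can split them. This is a sketch, not part of the original walkthrough; io.StringIO with a shortened, made-up copy of the table stands in for the real file:

```python
import csv
import io

# Stand-in for open('data.csv', 'r', newline=''); the contents are a shortened,
# made-up copy of the table used in this article.
file = io.StringIO(',A,B,C,D\n0,foo,one,small,1\n1,foo,one,large,2\n')

reader = csv.reader(file)
header = next(reader)            # first row: column names
rows = list(reader)              # remaining rows, each a list of strings

print(header)
print(rows)
```

Every field comes back as a string, so numeric columns still have to be converted by hand.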

Write to a file

  • 'w' means write mode.
  • Be sure to call file.close() when you are done; this also flushes anything still buffered to the file.
  • \n denotes a newline.
file = open('hello.txt','w')
file.write('Hello World!')
file.close()
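The example above writes a single line without a newline. A minimal sketch that writes two lines with \n and reads them back to check the result. The temporary folder is an assumption made here so that no existing hello.txt is overwritten:

```python
import os
import tempfile

# Hypothetical path in a temporary folder, so no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'hello.txt')

# 'w' creates the file, or empties it first if it already exists.
with open(path, 'w') as file:
    file.write('Hello World!\n')     # \n ends the first line
    file.write('Second line\n')

# Read the file back to check what was written.
with open(path, 'r') as file:
    lines = file.readlines()

print(lines)
```

With a with block the explicit file.close() is not needed, because the file is closed when the block ends.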

Use pandas to read files

  • open is the simplest way to read a txt file. When there is a lot of data, pandas is usually used instead.
  • pandas restores the tabular form of the data stored in the csv file.
  • The extra column Unnamed: 0 appears because pandas adds its own index while the table already contains one. To get rid of it, add index_col=0.
import pandas as pd
df = pd.read_csv('data.csv')
df
   Unnamed: 0    A    B      C  D
0           0  foo  one  small  1
1           1  foo  one  large  2
2           2  foo  one  large  8
3           3  foo  two  small  3
4           4  foo  two  small  3
5           5  bar  one  large  4
6           6  bar  one  small  5
7           7  bar  two  small  6
8           8  bar  two  large  7
import pandas as pd
df = pd.read_csv('data.csv',index_col=0)
df
     A    B      C  D
0  foo  one  small  1
1  foo  one  large  2
2  foo  one  large  8
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7
  • Read an Excel file
import pandas as pd
df = pd.read_excel('data.xlsx')
df
     A    B      C  D
0  foo  one  small  1
1  foo  one  large  2
2  foo  one  large  8
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7
  • Read a txt file
import pandas as pd
df = pd.read_table('data.txt')
df
,A,B,C,D
0 0,foo,one,small,1
1 1,foo,one,large,2
2 2,foo,one,large,8
3 3,foo,two,small,3
4 4,foo,two,small,3
5 5,bar,one,large,4
6 6,bar,one,small,5
7 7,bar,two,small,6
8 8,bar,two,large,7
import pandas as pd
df = pd.read_table('data.txt',index_col=0)          #Without sep=',' each whole line is a single column, which becomes the index
df
,A,B,C,D
0,foo,one,small,1
1,foo,one,large,2
2,foo,one,large,8
3,foo,two,small,3
4,foo,two,small,3
5,bar,one,large,4
6,bar,one,small,5
7,bar,two,small,6
8,bar,two,large,7
import pandas as pd
df = pd.read_table('data.txt',sep = ',',index_col=0)          #Split on commas and use the first column as the index
df
     A    B      C  D
0  foo  one  small  1
1  foo  one  large  2
2  foo  one  large  8
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7
  • Save files
df.to_excel('dat.xlsx')
df.to_csv('dat.csv')
df.to_csv('dat.txt')
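Saving with to_csv is also where the Unnamed: 0 column comes from: the index is written as a first column without a name. The following sketch shows the round trip. The two-row DataFrame is made up, and io.StringIO is used instead of real files so that nothing is written to disk:

```python
import io
import pandas as pd

# Made-up two-row table, used only for illustration.
df = pd.DataFrame({'A': ['foo', 'bar'], 'D': [1, 4]})

# to_csv writes the index as an unnamed first column by default.
buffer = io.StringIO()
df.to_csv(buffer)
text = buffer.getvalue()
print(text)

# Reading it back without index_col yields the extra 'Unnamed: 0' column.
plain = pd.read_csv(io.StringIO(text))
print(list(plain.columns))

# index_col=0 uses that first column as the index again.
restored = pd.read_csv(io.StringIO(text), index_col=0)
print(list(restored.columns))

# Alternatively, index=False leaves the index out when saving.
buffer2 = io.StringIO()
df.to_csv(buffer2, index=False)
print(buffer2.getvalue())
```

Which of the two to use depends on whether the index carries information worth keeping in the saved file.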

Read txt in complex format

  • skiprows lists the lines to skip when reading.
  • header=None means the data contains no row of column names; the columns are then numbered starting from 0.
  • names lets you supply your own column names.
  • nrows=5 means only five rows are read.
df = pd.read_table('data1.txt',sep = ',')
df
# real data
#num1 num2 num3 num4 message
# good data NaN NaN NaN NaN
# csv file NaN NaN NaN NaN
1 2 3 4 hello
5 6 7 8 world
9 10 11 12 hello
1 2 3 4 hello
5 6 7 8 good
9 10 11 12 fine
df = pd.read_table('data1.txt',sep = ',',skiprows = [0,1,2,3],header = None,names = ['n1','n2','n3','n4','message'], index_col = ['message'],nrows = 5)
df
         n1  n2  n3  n4
message
hello     1   2   3   4
world     5   6   7   8
hello     9  10  11  12
hello     1   2   3   4
good      5   6   7   8
df = pd.read_table('data1.txt',sep = ',',skiprows = [0,2,3], index_col = ['message'],nrows = 3)         #Keep line 1 as the header and read only the first three rows of data
df
         #num1  num2  num3  num4
message
hello        1     2     3     4
world        5     6     7     8
hello        9    10    11    12
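To try these parameters without data1.txt at hand, here is a self-contained sketch. The text below is a made-up file in the spirit of data1.txt (four comment lines followed by data), passed through io.StringIO; the actual contents of data1.txt are not shown in this article, so treat the layout as an assumption:

```python
import io
import pandas as pd

# Made-up contents in the spirit of data1.txt: comment lines, then data.
text = (
    '# real data\n'
    '#num1,num2,num3,num4,message\n'
    '# good data\n'
    '# csv file\n'
    '1,2,3,4,hello\n'
    '5,6,7,8,world\n'
    '9,10,11,12,hello\n'
)

df = pd.read_table(
    io.StringIO(text),
    sep=',',
    skiprows=[0, 1, 2, 3],                       # skip the four comment lines
    header=None,                                 # no header row remains
    names=['n1', 'n2', 'n3', 'n4', 'message'],   # supply column names
    nrows=2,                                     # read only two data rows
)
print(df)
```

Because nrows=2, the third data row is never read, even though it is present in the text.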

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_table.html

  • This page documents pandas.read_table. If you come across a function you are not familiar with, look it up in its documentation.

Keywords: Big Data Excel

Added by johnbrayn on Sun, 19 May 2019 11:04:43 +0300