Read files using open
- .read()
- open() is built into Python, so you can read files without installing or importing any packages.
- 'r' means read mode.
- If the file is in the same folder as the notebook, you can write just the file name instead of the full path.
- file.read() reads the entire contents of the file and returns them as one string; file.read(5) reads only the first five characters.
file = open('data.txt','r')
print(file.read())
,A,B,C,D
0,foo,one,small,1
1,foo,one,large,2
2,foo,one,large,8
3,foo,two,small,3
4,foo,two,small,3
5,bar,one,large,4
6,bar,one,small,5
7,bar,two,small,6
8,bar,two,large,7
file = open('data.txt','r')
print(file.read(5))
,A,B,
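A minimal sketch of the same reads inside a with block, which closes the file automatically when the block ends (an idiom the examples above do not use). The file name sample.txt and its two lines are hypothetical. It also shows that a later read() continues from wherever read(n) stopped.

```python
import os
import tempfile

# Hypothetical sample file, shaped like the first lines of data.txt
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as f:
    f.write(',A,B,C,D\n0,foo,one,small,1\n')

# 'with' closes the file automatically when the block ends
with open(path, 'r') as file:
    everything = file.read()   # the whole file as one string

with open(path, 'r') as file:
    first_five = file.read(5)  # only the first 5 characters
    rest = file.read()         # continues from the current position

print(repr(first_five))  # ',A,B,'
```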
- .readlines()
- Replacing .read() with .readlines() behaves differently:
- It returns a list in which each item is one line of the original file, including the trailing newline character \n. Because the result is a list, you can pick out a specific line by its index, which .read() cannot do.
file = open('data.txt','r')
print(file.readlines())
[',A,B,C,D\n', '0,foo,one,small,1\n', '1,foo,one,large,2\n', '2,foo,one,large,8\n', '3,foo,two,small,3\n', '4,foo,two,small,3\n', '5,bar,one,large,4\n', '6,bar,one,small,5\n', '7,bar,two,small,6\n', '8,bar,two,large,7\n']
file = open('data.txt','r')
print(file.readlines()[5])
4,foo,two,small,3
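A small sketch of readlines() on a hypothetical three-line file, showing 0-based indexing and how the trailing \n on each item can be removed with rstrip (the cleanup step is an addition, not part of the original example).

```python
import os
import tempfile

# Hypothetical three-line file in a temporary folder
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')
with open(path, 'w') as f:
    f.write(',A,B,C,D\n0,foo,one,small,1\n1,foo,one,large,2\n')

with open(path, 'r') as file:
    lines = file.readlines()   # one list item per line, '\n' included

print(lines[1])                # one line, picked by its 0-based index

# rstrip('\n') removes the trailing newline from every item
clean = [line.rstrip('\n') for line in lines]
print(clean)
```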
- Process the contents line by line while reading the data (using a for loop)
file = open('data.txt','r')
i = 1
for line in file:
    print('read line', i)
    i = i + 1
    print(line)
read line 1
,A,B,C,D
read line 2
0,foo,one,small,1
read line 3
1,foo,one,large,2
read line 4
2,foo,one,large,8
read line 5
3,foo,two,small,3
read line 6
4,foo,two,small,3
read line 7
5,bar,one,large,4
read line 8
6,bar,one,small,5
read line 9
7,bar,two,small,6
read line 10
8,bar,two,large,7
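The manual counter in the loop above can be replaced by enumerate(). This sketch also operates on each line by splitting it on commas; the file name and contents are hypothetical.

```python
import os
import tempfile

# Hypothetical sample file shaped like data.txt
path = os.path.join(tempfile.mkdtemp(), 'data_sample.txt')
with open(path, 'w') as f:
    f.write(',A,B,C,D\n0,foo,one,small,1\n1,foo,one,large,2\n')

rows = []
with open(path, 'r') as file:
    # enumerate(..., start=1) replaces the manual counter i = i + 1
    for i, line in enumerate(file, start=1):
        fields = line.rstrip('\n').split(',')  # split each line on commas
        print('read line', i, fields)
        rows.append(fields)
```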
- The same loop works for other plain-text formats, such as a CSV file:
file = open('data.csv','r')
i = 1
for line in file:
    print('read line', i)
    i = i + 1
    print(line)
read line 1
,A,B,C,D
read line 2
0,foo,one,small,1
read line 3
1,foo,one,large,2
read line 4
2,foo,one,large,8
read line 5
3,foo,two,small,3
read line 6
4,foo,two,small,3
read line 7
5,bar,one,large,4
read line 8
6,bar,one,small,5
read line 9
7,bar,two,small,6
read line 10
8,bar,two,large,7
Write to a file
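- 'w' means write mode: the file is created if it does not exist, and overwritten if it does.
- Call file.close() when you are done writing, so the data is saved and the file is released.
- \n denotes a newline; write() does not add one for you.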
file = open('hello.txt','w')
file.write('Hello World!')
file.close()
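A minimal sketch of writing with explicit newlines inside a with block (which closes the file for you), and of how reopening in 'w' mode discards the previous contents. The temporary path is an assumption made so the example runs anywhere.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'hello.txt')

with open(path, 'w') as file:
    file.write('Hello World!\n')  # write() adds no newline by itself
    file.write('Second line\n')

with open(path, 'r') as file:
    first = file.read()

# Opening again with 'w' truncates the file: the old contents are gone
with open(path, 'w') as file:
    file.write('Replaced\n')

with open(path, 'r') as file:
    second = file.read()

print(first)
print(second)
```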
Use pandas to read files
- open() is the most convenient way to read a plain txt file. For large or tabular data, pandas is usually used instead.
- pandas parses a CSV file back into the table it was written from (a DataFrame).
- The result below has an extra column, Unnamed: 0: pandas adds its own index, and the file also contains an index column whose header is empty. To use that first column as the index instead, add index_col=0.
import pandas as pd
df = pd.read_csv('data.csv')
df
   Unnamed: 0    A    B      C  D
0           0  foo  one  small  1
1           1  foo  one  large  2
2           2  foo  one  large  8
3           3  foo  two  small  3
4           4  foo  two  small  3
5           5  bar  one  large  4
6           6  bar  one  small  5
7           7  bar  two  small  6
8           8  bar  two  large  7
import pandas as pd
df = pd.read_csv('data.csv',index_col=0)
df
     A    B      C  D
0  foo  one  small  1
1  foo  one  large  2
2  foo  one  large  8
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7
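A minimal sketch of the index_col=0 difference, reading a short hypothetical CSV from an in-memory io.StringIO buffer instead of data.csv (an assumption so the example needs no file). The empty first header cell is what produces the Unnamed: 0 column.

```python
import io
import pandas as pd

# Hypothetical CSV shaped like data.csv: the first header cell is empty
text = ',A,B,C,D\n0,foo,one,small,1\n1,foo,one,large,2\n'

df_default = pd.read_csv(io.StringIO(text))
print(df_default.columns.tolist())  # ['Unnamed: 0', 'A', 'B', 'C', 'D']

# index_col=0 uses the first column as the row index
df_indexed = pd.read_csv(io.StringIO(text), index_col=0)
print(df_indexed.columns.tolist())  # ['A', 'B', 'C', 'D']
```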
import pandas as pd
df = pd.read_excel('data.xlsx')
df
     A    B      C  D
0  foo  one  small  1
1  foo  one  large  2
2  foo  one  large  8
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7
import pandas as pd
df = pd.read_table('data.txt')
df
            ,A,B,C,D
0  0,foo,one,small,1
1  1,foo,one,large,2
2  2,foo,one,large,8
3  3,foo,two,small,3
4  4,foo,two,small,3
5  5,bar,one,large,4
6  6,bar,one,small,5
7  7,bar,two,small,6
8  8,bar,two,large,7
import pandas as pd
df = pd.read_table('data.txt',index_col=0)
df
,A,B,C,D
0,foo,one,small,1
1,foo,one,large,2
2,foo,one,large,8
3,foo,two,small,3
4,foo,two,small,3
5,bar,one,large,4
6,bar,one,small,5
7,bar,two,small,6
8,bar,two,large,7
import pandas as pd
df = pd.read_table('data.txt',sep = ',',index_col=0)
df
     A    B      C  D
0  foo  one  small  1
1  foo  one  large  2
2  foo  one  large  8
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7
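The single-column results above come from read_table splitting on tabs by default, so a comma-separated line is never split. A minimal sketch with a hypothetical in-memory file (io.StringIO standing in for data.txt) shows this, and that sep=',' gives the same result as read_csv.

```python
import io
import pandas as pd

text = ',A,B,C,D\n0,foo,one,small,1\n1,foo,one,large,2\n'

# Default separator is a tab, so each whole line stays in one column
df_tab = pd.read_table(io.StringIO(text))
print(df_tab.shape)  # (2, 1)

# sep=',' splits on commas, which read_csv does by default
df_comma = pd.read_table(io.StringIO(text), sep=',', index_col=0)
df_csv = pd.read_csv(io.StringIO(text), index_col=0)
print(df_comma.equals(df_csv))  # True
```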
df.to_excel('dat.xlsx')
df.to_csv('dat.csv')
df.to_csv('dat.txt')
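to_csv writes comma-separated text whatever the file extension, and by default it also writes the index as an unnamed first column. A minimal round-trip sketch, using a small hypothetical DataFrame and an io.StringIO buffer in place of dat.csv:

```python
import io
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar'], 'D': [1, 7]})

buffer = io.StringIO()
df.to_csv(buffer)          # the index is written as an unnamed first column
csv_text = buffer.getvalue()
print(csv_text)

# index_col=0 turns that first column back into the index
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(restored.equals(df))
```

Passing index=False to to_csv skips writing the index, in which case index_col=0 is not needed when reading back.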
Read txt files with a complex format
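- Read with only sep=',', the comment lines at the top of data1.txt are parsed as data: the first line, # real data, becomes the header, and the extra leading fields of each later line become the row index.
- skiprows lists the lines to skip, so they are not read; [0,1,2,3] gives 0-based line numbers, and an integer n skips the first n lines.
- header=None means the data contains no row of column names; pandas then numbers the columns from 0.
- names supplies your own column names.
- index_col=['message'] uses the message column as the row index.
- nrows=5 reads only the first five rows of data.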
df = pd.read_table('data1.txt',sep = ',')
df
                               # real data
#num1        num2  num3  num4  message
# good data  NaN   NaN   NaN   NaN
# csv file   NaN   NaN   NaN   NaN
1            2     3     4     hello
5            6     7     8     world
9            10    11    12    hello
1            2     3     4     hello
5            6     7     8     good
9            10    11    12    fine
df = pd.read_table('data1.txt',sep = ',',skiprows = [0,1,2,3],header = None,names = ['n1','n2','n3','n4','message'], index_col = ['message'],nrows = 5)
df
         n1  n2  n3  n4
message
hello     1   2   3   4
world     5   6   7   8
hello     9  10  11  12
hello     1   2   3   4
good      5   6   7   8
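A runnable sketch of the call above. The actual data1.txt is not shown, so the text below is a hypothetical stand-in reconstructed from the outputs on this page (four lines of comments and header, then six data rows), read through io.StringIO.

```python
import io
import pandas as pd

# Hypothetical stand-in for data1.txt, reconstructed from the outputs above
text = (
    '# real data\n'
    '#num1,num2,num3,num4,message\n'
    '# good data\n'
    '# csv file\n'
    '1,2,3,4,hello\n'
    '5,6,7,8,world\n'
    '9,10,11,12,hello\n'
    '1,2,3,4,hello\n'
    '5,6,7,8,good\n'
    '9,10,11,12,fine\n'
)

df = pd.read_table(
    io.StringIO(text),
    sep=',',
    skiprows=[0, 1, 2, 3],        # skip lines 0-3 (0-based line numbers)
    header=None,                  # no header row left in the data
    names=['n1', 'n2', 'n3', 'n4', 'message'],  # our own column names
    index_col='message',          # use the message column as the index
    nrows=5,                      # read only the first 5 data rows
)
print(df)
```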
df = pd.read_table('data1.txt',sep = ',',skiprows = [0,2,3], index_col = ['message'],nrows = 3)
df
         #num1  num2  num3  num4
message
hello        1     2     3     4
world        5     6     7     8
hello        9    10    11    12
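Skipping lines 0, 2 and 3 keeps line 1 as the header, which is why the first column above is named #num1. A sketch on the same hypothetical stand-in for data1.txt (shortened), with one possible cleanup that strips the leftover # from the column names; the cleanup step is an addition, not part of the original example.

```python
import io
import pandas as pd

# Hypothetical stand-in for data1.txt (first data rows only)
text = (
    '# real data\n'
    '#num1,num2,num3,num4,message\n'
    '# good data\n'
    '# csv file\n'
    '1,2,3,4,hello\n'
    '5,6,7,8,world\n'
    '9,10,11,12,hello\n'
)

# Skipping lines 0, 2 and 3 leaves line 1 as the header row
df = pd.read_table(io.StringIO(text), sep=',', skiprows=[0, 2, 3],
                   index_col='message', nrows=3)
print(df.columns.tolist())  # ['#num1', 'num2', 'num3', 'num4']

# Strip the leftover '#' from the column names
df.columns = df.columns.str.lstrip('#')
print(df.columns.tolist())  # ['num1', 'num2', 'num3', 'num4']
```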
- This page only covers a few parameters of pandas.read_table. For any function or parameter you have not seen before, see its official documentation.