01 Python: reading files with open and with pandas

Read files using open

.read()
- With open, you can read files without installing any packages.
- 'r' means read mode.
- If the file and the notebook are in the same folder, the file name alone is enough; no full path is needed.
- file.read() reads the entire contents of the file.

file = open('data.txt','r')
print(file.read())             #Read everything in this file

,A,B,C,D
0,foo,one,small,1
1,foo,one,large,2
2,foo,one,large,8
3,foo,two,small,3
4,foo,two,small,3
5,bar,one,large,4
6,bar,one,small,5
7,bar,two,small,6
8,bar,two,large,7

file = open('data.txt','r')
print(file.read(5))           #Read the first five characters in this file

,A,B,
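The calls above never close the file. A minimal sketch of the same two reads using a with block, which closes the file automatically. The temporary folder and the shortened sample contents are assumptions made so the snippet runs on its own:

```python
import os
import tempfile

# Assumption: a throwaway copy of the sample file, so the sketch runs anywhere.
sample = ",A,B,C,D\n0,foo,one,small,1\n1,foo,one,large,2\n"
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write(sample)

# 'with' closes the file when the block ends, even if an error occurs.
with open(path, "r") as f:
    first_five = f.read(5)   # the first five characters
    rest = f.read()          # continues from where the previous read stopped

print(repr(first_five))
print(f.closed)
```

Note that the second read() does not start over: a file object remembers its position, which is why the examples above reopen the file each time.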

.readlines()
- Changing .read to .readlines gives a different result:
- Everything is put in a list, and each item is one line of the original file; \n marks the line break. All content is read, formatting included. You can pick out a specific line by indexing the list, which .read cannot do.

file = open('data.txt','r')
print(file.readlines())          #Read the entire content as a list

[',A,B,C,D\n', '0,foo,one,small,1\n', '1,foo,one,large,2\n', '2,foo,one,large,8\n', '3,foo,two,small,3\n', '4,foo,two,small,3\n', '5,bar,one,large,4\n', '6,bar,one,small,5\n', '7,bar,two,small,6\n', '8,bar,two,large,7\n']

file = open('data.txt','r')
print(file.readlines()[5])         #Read the line at index 5 (the sixth line, counting the header)

4,foo,two,small,3
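Indexing the list counts from 0, and each item keeps its trailing \n. A small sketch of both points, assuming the first six lines of the sample file written to a temporary folder:

```python
import os
import tempfile

# Assumption: the first six lines of the sample data, written to a temp file.
lines_in = [",A,B,C,D", "0,foo,one,small,1", "1,foo,one,large,2",
            "2,foo,one,large,8", "3,foo,two,small,3", "4,foo,two,small,3"]
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("\n".join(lines_in) + "\n")

with open(path, "r") as f:
    lines = f.readlines()

sixth = lines[5]             # index 5 is the sixth line, header included
clean = sixth.rstrip("\n")   # drop the trailing newline before further use

print(repr(sixth))
print(clean)
```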

Operate on the contents when reading the data (using the for loop)

file = open('data.txt','r') 
i =1
for line in file:
    print('read line',i) 
    i = i+1 
    print(line)              #Each line already ends with \n and print adds another, hence the blank lines

read line 1
,A,B,C,D

read line 2
0,foo,one,small,1

read line 3
1,foo,one,large,2

read line 4
2,foo,one,large,8

read line 5
3,foo,two,small,3

read line 6
4,foo,two,small,3

read line 7
5,bar,one,large,4

read line 8
6,bar,one,small,5

read line 9
7,bar,two,small,6

read line 10
8,bar,two,large,7
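The manual counter can be replaced by enumerate, and stripping the newline avoids the blank lines in the output. A sketch under the same assumption of a small sample file in a temporary folder:

```python
import os
import tempfile

# Assumption: a three-line sample file in a temp folder.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write(",A,B,C,D\n0,foo,one,small,1\n1,foo,one,large,2\n")

numbered = []
with open(path, "r") as f:
    # enumerate yields (counter, line) pairs; start=1 numbers lines from 1.
    for number, line in enumerate(f, start=1):
        text = line.rstrip("\n")   # remove the newline that print would double
        numbered.append((number, text))
        print("read line", number, text)
```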

Read other file types
- csv

file = open('data.csv','r') 
i =1
for line in file:
    print('read line',i) 
    i = i+1 
    print(line)

read line 1
,A,B,C,D

read line 2
0,foo,one,small,1

read line 3
1,foo,one,large,2

read line 4
2,foo,one,large,8

read line 5
3,foo,two,small,3

read line 6
4,foo,two,small,3

read line 7
5,bar,one,large,4

read line 8
6,bar,one,small,5

read line 9
7,bar,two,small,6

read line 10
8,bar,two,large,7
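A csv file is plain text, so the loop above works, but splitting each line into fields is left to you. One possible approach is the standard-library csv module, which the original does not use; the sample file below is an assumption:

```python
import csv
import os
import tempfile

# Assumption: a small csv file in a temp folder.
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w", newline="") as f:
    f.write(",A,B,C,D\n0,foo,one,small,1\n1,foo,one,large,2\n")

# newline="" is what the csv module recommends when opening files for it.
with open(path, "r", newline="") as f:
    rows = list(csv.reader(f))   # each row becomes a list of strings

header, records = rows[0], rows[1:]
print(header)
print(records[0])
```

Every field comes back as a string, including the numbers, so convert with int() where needed.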

Write to a file

'w' means write mode.
Always call file.close() when you are done, so the data is flushed to disk and the file is released.
\n denotes a newline.

file = open('hello.txt','w')
file.write('Hello World!')
file.close()
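write does not add a line break by itself, so \n has to be written explicitly. A minimal sketch, assuming a temporary folder for hello.txt and using a with block so that close happens automatically:

```python
import os
import tempfile

# Assumption: hello.txt lives in a temp folder for this sketch.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")

# 'w' creates the file, or overwrites it if it already exists.
with open(path, "w") as f:
    f.write("Hello World!\n")   # \n ends the first line
    f.write("Second line\n")

with open(path, "r") as f:
    content = f.read()

print(content)
```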

Use pandas to read files

open is the simplest way to read a txt file. For larger, tabular data, pandas is usually used instead.
pandas restores the tabular structure of data stored as csv.
There is an extra column named Unnamed: 0 because pandas adds its own index while the file already contains one. To get rid of it, pass index_col=0.

import pandas as pd
df = pd.read_csv('data.csv')
df

	Unnamed: 0	A	B	C	D
0	0	foo	one	small	1
1	1	foo	one	large	2
2	2	foo	one	large	8
3	3	foo	two	small	3
4	4	foo	two	small	3
5	5	bar	one	large	4
6	6	bar	one	small	5
7	7	bar	two	small	6
8	8	bar	two	large	7

import pandas as pd
df = pd.read_csv('data.csv',index_col=0)
df

	A	B	C	D
0	foo	one	small	1
1	foo	one	large	2
2	foo	one	large	8
3	foo	two	small	3
4	foo	two	small	3
5	bar	one	large	4
6	bar	one	small	5
7	bar	two	small	6
8	bar	two	large	7
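The effect of index_col=0 can be checked on the column names directly. A sketch that assumes a shortened copy of the data held in a string (io.StringIO stands in for the file on disk):

```python
import io

import pandas as pd

# Assumption: a shortened copy of data.csv, kept in memory instead of on disk.
csv_text = ",A,B,C,D\n0,foo,one,small,1\n1,foo,one,large,2\n"

plain = pd.read_csv(io.StringIO(csv_text))                 # pandas adds its own index
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)  # first column is the index

print(list(plain.columns))
print(list(indexed.columns))
```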

Read an Excel file

import pandas as pd
df = pd.read_excel('data.xlsx')
df

	A	B	C	D
0	foo	one	small	1
1	foo	one	large	2
2	foo	one	large	8
3	foo	two	small	3
4	foo	two	small	3
5	bar	one	large	4
6	bar	one	small	5
7	bar	two	small	6
8	bar	two	large	7

Read a txt file

import pandas as pd
df = pd.read_table('data.txt')
df

	,A,B,C,D
0	0,foo,one,small,1
1	1,foo,one,large,2
2	2,foo,one,large,8
3	3,foo,two,small,3
4	4,foo,two,small,3
5	5,bar,one,large,4
6	6,bar,one,small,5
7	7,bar,two,small,6
8	8,bar,two,large,7

import pandas as pd
df = pd.read_table('data.txt',index_col=0)          #Without sep=',' each line is a single column, which now becomes the index
df


,A,B,C,D
0,foo,one,small,1
1,foo,one,large,2
2,foo,one,large,8
3,foo,two,small,3
4,foo,two,small,3
5,bar,one,large,4
6,bar,one,small,5
7,bar,two,small,6
8,bar,two,large,7

import pandas as pd
df = pd.read_table('data.txt',sep = ',',index_col=0)          #Split on commas and use the first column as the index
df

	A	B	C	D
0	foo	one	small	1
1	foo	one	large	2
2	foo	one	large	8
3	foo	two	small	3
4	foo	two	small	3
5	bar	one	large	4
6	bar	one	small	5
7	bar	two	small	6
8	bar	two	large	7
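The difference comes from the separator: read_table splits on tabs by default, so a comma-separated file arrives as one wide column. A sketch comparing the shapes, again assuming a shortened in-memory copy of the data:

```python
import io

import pandas as pd

# Assumption: a shortened copy of data.txt, kept in memory.
text = ",A,B,C,D\n0,foo,one,small,1\n1,foo,one,large,2\n"

one_col = pd.read_table(io.StringIO(text))                       # default sep is a tab
split = pd.read_table(io.StringIO(text), sep=",", index_col=0)   # split on commas

print(one_col.shape)
print(split.shape)
```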

Save files

df.to_excel('dat.xlsx')
df.to_csv('dat.csv')
df.to_csv('dat.txt')
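By default to_csv also writes the index, which is what produces the Unnamed: 0 column when the file is read back without index_col=0. A sketch of the index=False alternative; the two-row DataFrame is made up for illustration, and to_csv is called without a path so that it returns the text instead of writing a file:

```python
import io

import pandas as pd

# Assumption: a tiny illustrative DataFrame, not the original data.
df = pd.DataFrame({"A": ["foo", "bar"], "D": [1, 4]})

with_index = df.to_csv()                # no path given: returns the csv text
without_index = df.to_csv(index=False)  # leave the index out of the output

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])

# Reading the index-free text back gives no extra column.
round_trip = pd.read_csv(io.StringIO(without_index))
print(list(round_trip.columns))
```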

Read txt in complex format

skiprows lists the line numbers (counting from 0) to skip instead of reading.
header=None means the file has no row of column names; the columns are then numbered from 0 unless names is given.
names supplies your own column names.
nrows=5 reads only five rows.

df = pd.read_table('data1.txt',sep = ',')
df

				# real data
#num1	num2	num3	num4	message
# good data	NaN	NaN	NaN	NaN
# csv file	NaN	NaN	NaN	NaN
1	2	3	4	hello
5	6	7	8	world
9	10	11	12	hello
1	2	3	4	hello
5	6	7	8	good
9	10	11	12	fine

df = pd.read_table('data1.txt',sep = ',',skiprows = [0,1,2,3],header = None,names = ['n1','n2','n3','n4','message'], index_col = ['message'],nrows = 5)
df

	n1	n2	n3	n4
message
hello	1	2	3	4
world	5	6	7	8
hello	9	10	11	12
hello	1	2	3	4
good	5	6	7	8

df = pd.read_table('data1.txt',sep = ',',skiprows = [0,2,3], index_col = ['message'],nrows = 3)         #Keep line 1 as the header and read only the first three data rows
df

	#num1	num2	num3	num4
message
hello	1	2	3	4
world	5	6	7	8
hello	9	10	11	12
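The four parameters can be tried without the file on disk. In the sketch below the contents of data1.txt are an assumption, reconstructed from the outputs shown above, and passed in through io.StringIO:

```python
import io

import pandas as pd

# Assumption: data1.txt looks roughly like this (reconstructed, not the real file).
text = (
    "# real data\n"
    "#num1,num2,num3,num4,message\n"
    "# good data\n"
    "# csv file\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
    "9,10,11,12,hello\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,good\n"
    "9,10,11,12,fine\n"
)

df = pd.read_table(
    io.StringIO(text),
    sep=",",
    skiprows=[0, 1, 2, 3],                      # skip the four comment lines
    header=None,                                # no header row is left to read
    names=["n1", "n2", "n3", "n4", "message"],  # supply column names ourselves
    index_col="message",                        # use the message column as the index
    nrows=5,                                    # stop after five data rows
)
print(df)
```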

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_table.html

This page documents pandas.read_table. If you come across functions you are not familiar with, look them up in their documentation.

Keywords: Big Data Excel

Added by johnbrayn on Sun, 19 May 2019 11:04:43 +0300
