Requirement
We have a file list. Querying the plain-text file directly is slow and inefficient, so we want to import it into Redis and work with it there. This post records how to handle that.
Functional requirements
The following operations need to be implemented:
- Import
- Traverse
- Delete
A point query is just a GET on the key, so no separate query function is needed; hence the traversal function here.
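For reference, a point query in redis-py is a single GET (the key shown is a made-up example):

```python
import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)
# Returns b'0' if the file is in the list, None otherwise
print(r.get('/backup/dir1/file001'))
```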
Implementation of import function
Using single-process import
Single-process import reads the file line by line and inserts each entry into the database. Let's see how long this takes.
Number of entries in the file:
```
[root@lab102 ssd]# cat /root/chuli/file.list | wc -l
983040
```
```
[root@lab102 ssd]# time python3 filetoredis-rclone.py /root/chuli/file.list 0
Start time is : Thu Nov 25 11:31:10 2021
End time is : Thu Nov 25 11:33:31 2021

real    2m20.554s
user    1m25.535s
sys     0m31.026s
```
The code is as follows:
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import sys
import redis
import time

starttime = time.asctime(time.localtime(time.time()))
inputfile = sys.argv[1]
r = redis.StrictRedis(host='localhost', port=6379, db=0)

def filekeytoredis(inputfile):
    # One SET per line: the filename becomes the key, with a dummy value of 0
    for line in open(inputfile):
        r.set(line.strip('\n'), 0)

filekeytoredis(inputfile)
endtime = time.asctime(time.localtime(time.time()))
print("Start time is :", starttime)
print("End time is :", endtime)
```
Importing the 980,000 entries took 2 minutes and 20 seconds.
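Before moving to multiple processes, note that most of the single-process time is network round trips, one per SET. A sketch (not part of the original test) of the same import using a redis-py pipeline to batch those round trips:

```python
import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)

def filekeytoredis_pipelined(inputfile, batch=10000):
    # Queue SET commands client-side and flush them in batches
    pipe = r.pipeline(transaction=False)
    pending = 0
    for line in open(inputfile):
        pipe.set(line.strip('\n'), 0)
        pending += 1
        if pending == batch:
            pipe.execute()
            pending = 0
    if pending:
        pipe.execute()
```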
Now let's look at the multi-process implementation.
Using multi-process import
```
[root@lab102 pget]# time python3 filetoredis.py /root/chuli/file.list
Start time is : Thu Nov 25 11:43:24 2021
End time is : Thu Nov 25 11:43:51 2021

real    0m27.962s
user    2m18.184s
sys     0m52.835s
```
Importing the same 980,000 entries took 27 seconds, much faster.
The contents of the filetoredis.py script are as follows:
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import sys
import redis
import time
import multiprocessing

starttime = time.asctime(time.localtime(time.time()))
inputfile = sys.argv[1]

def initialize():
    # Give each worker process its own Redis connection
    global r
    r = redis.StrictRedis(host='localhost', port=6379, db=0)

linekeys = []

def redisset(key):
    r.set(key, 0)

def filekeytoredis(inputfile):
    count = len(open(inputfile).readlines())
    for line in open(inputfile):
        line = line.strip('\n')
        linekeys.append(line)
        count = count - 1
        if count < 20000:
            # Less than one full batch remains: flush everything on the last line
            if count == 0:
                pool = multiprocessing.Pool(20, initialize)
                pool.map(redisset, linekeys)
                pool.close()
                pool.join()
        else:
            # Dispatch a full batch of 20000 keys to the worker pool
            if len(linekeys) == 20000:
                pool = multiprocessing.Pool(20, initialize)
                pool.map(redisset, linekeys)
                pool.close()
                pool.join()
                linekeys.clear()

filekeytoredis(inputfile)
endtime = time.asctime(time.localtime(time.time()))
print("Start time is :", starttime)
print("End time is :", endtime)
```
The multi-process version reads the file list, collects keys into batches of 20,000, and hands each batch to a pool of 20 worker processes. Creating a fresh pool for every batch works but is wasteful; a more compact variant is sketched below.
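For reference, the same batching idea can be written by creating the pool once and slicing the file into fixed-size chunks; a sketch under the same assumptions (20 workers, batches of 20,000):

```python
#!/usr/bin/env python3
import sys
import multiprocessing
from itertools import islice

import redis

def initialize():
    # Each worker process opens its own Redis connection
    global r
    r = redis.StrictRedis(host='localhost', port=6379, db=0)

def redisset(key):
    r.set(key, 0)

def filekeytoredis(inputfile, workers=20, batch=20000):
    pool = multiprocessing.Pool(workers, initialize)
    with open(inputfile) as f:
        while True:
            # Read the next chunk of lines without loading the whole file
            keys = [line.strip('\n') for line in islice(f, batch)]
            if not keys:
                break
            pool.map(redisset, keys)
    pool.close()
    pool.join()

if __name__ == '__main__':
    filekeytoredis(sys.argv[1])
```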
Implementation of traversal function
Single-process traversal
```
[root@lab102 pget]# time python3 checkredis.py 0

real    0m6.825s
user    0m5.334s
sys     0m0.135s
```
The script implementation is as follows:
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# This script traverses the database; pass the db number as the argument
import sys
import redis

dbnumber = int(sys.argv[1])
r = redis.StrictRedis(host='localhost', port=6379, db=dbnumber)

cursor = 0
while True:
    # SCAN returns the next cursor plus a batch of roughly `count` keys
    cursor, keys = r.scan(cursor, match="*", count=20000)
    for key in keys:
        pass  # process each key here
    if cursor == 0:
        break
```
Traversing the 980,000 keys completed in about 6 seconds.
It's not obvious how to parallelize this across processes, since most of the time goes into advancing the SCAN cursor, which is inherently sequential.
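For what it's worth, redis-py's scan_iter hides the cursor bookkeeping, so an equivalent single-process traversal can also be written as:

```python
import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)
# scan_iter advances the SCAN cursor internally and yields keys one by one
for key in r.scan_iter(match='*', count=20000):
    pass  # process the key here
```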
Implementation of deletion function
Single-process deletion
```
[root@lab102 pget]# time python3 cleanredis.py 0

real    2m14.760s
user    1m21.221s
sys     0m30.077s
```
The cleanredis.py deletion script is as follows:
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# This script cleans up a redis database; pass the db number as the argument
import sys
import redis

dbnumber = int(sys.argv[1])
r = redis.StrictRedis(host='localhost', port=6379, db=dbnumber)

cursor = 0
while True:
    cursor, keys = r.scan(cursor, match="*", count=10000)
    for key in keys:
        # One DELETE round trip per key
        r.delete(key)
    if cursor == 0:
        break
```
Deleting the 980,000 keys in a single process takes 2 minutes and 14 seconds.
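One likely speedup that keeps a single process (an assumption, not measured in the original test): DELETE accepts multiple keys, so each SCAN batch can be removed in one round trip instead of one per key:

```python
import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)
cursor = 0
while True:
    cursor, keys = r.scan(cursor, match='*', count=10000)
    if keys:
        # A single DELETE call clears the whole batch
        r.delete(*keys)
    if cursor == 0:
        break
```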
Multi-process deletion
```
[root@lab102 rclone]# time python3 cleanredis.py 0

real    0m38.574s
user    1m39.278s
sys     0m37.553s
```
The script is as follows:
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# This script cleans up a redis database; pass the db number as the argument
import sys
import redis
import multiprocessing

dbnumber = int(sys.argv[1])
r = redis.StrictRedis(host='localhost', port=6379, db=dbnumber)

def keydelete(key):
    # Each forked worker uses its own copy of the client connection
    global r
    r.delete(key)

cursor = 0
while True:
    cursor, keys = r.scan(cursor, match="*", count=10000)
    # Fan the batch of keys out to 10 worker processes
    pool = multiprocessing.Pool(10)
    pool.map(keydelete, keys)
    pool.close()
    pool.join()
    if cursor == 0:
        break
```
Deleting the 980,000 entries with multiple processes took 38 seconds.
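Of course, when the goal is to clear everything in the database (the match pattern here is `*`), FLUSHDB does it in a single command and should beat any scan-and-delete loop:

```python
import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)
# Removes every key in the selected db in one command
r.flushdb()
```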
Summary
Storing this kind of data in Redis really is fast. This post only covers import, traversal, and deletion. A point lookup is a direct GET on a key (the usual key/value pattern); finding keys by pattern requires a traversal. These are just notes on the basic operations.
Each operation was implemented both single-process and multi-process, and the multi-process versions deliver a clear speedup.