Scenario and symptom to be optimized
The project uses the Flask + gevent stack and periodically (every 5 seconds) writes a JSON file to the file system.
The file is about 40 MB.
The symptom: gevent does not switch to any other coroutine until the whole file has been written.
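Principle
On Linux, reading or writing a regular file blocks the calling thread until the call finishes, regardless of whether the file descriptor is in blocking or non-blocking mode: O_NONBLOCK has no effect on regular files.
The reason is that a regular file is always treated as ready for I/O (select and poll always report it as ready, and epoll refuses to watch it at all), so write simply runs to completion. Network I/O is different: a socket can be genuinely not ready, and that readiness can be monitored asynchronously.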
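The claim above can be checked with the standard library alone. This is a minimal sketch, assuming a Unix system (it relies on fcntl): it opens a temporary regular file, sets O_NONBLOCK on it the way write_file2 does, and records what the kernel reports. The helper name regular_file_readiness and the 4 MiB payload are illustrative choices, not part of the original test.

```python
import errno
import fcntl
import os
import select
import tempfile


def regular_file_readiness():
    """Return (nonblock_set, full_write, select_ready, epoll_error) for a regular file."""
    fd, path = tempfile.mkstemp()
    try:
        # Add O_NONBLOCK to the descriptor, as write_file2 does via os.open.
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
        nonblock_set = bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)

        # The write copies everything (into the page cache) instead of failing with EAGAIN.
        payload = b"x" * (4 * 1024 * 1024)
        written = os.write(fd, payload)

        # select() reports the regular file as writable immediately.
        _, writable, _ = select.select([], [fd], [], 0)

        # epoll (Linux only) refuses to watch a regular file at all.
        epoll_error = None
        if hasattr(select, "epoll"):
            ep = select.epoll()
            try:
                ep.register(fd, select.EPOLLOUT)
            except OSError as exc:
                epoll_error = errno.errorcode.get(exc.errno)
            finally:
                ep.close()
        return nonblock_set, written == len(payload), fd in writable, epoll_error
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(regular_file_readiness())
```

On Linux the expected pattern is: O_NONBLOCK is set, the write completes in full, select says "ready", and epoll registration fails with EPERM. An event loop therefore has no "not ready" signal to wait on for a file.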
If the program is single-threaded and relies on gevent coroutines for concurrency, then as soon as one coroutine writes a file, the only thread in the process is blocked. gevent never gets a chance to schedule another coroutine, so every coroutine stalls until the write returns.
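The same stall can be reproduced without gevent or Redis. This sketch swaps in the standard-library asyncio event loop (a different library that also runs all coroutines on a single thread) and uses time.sleep as a stand-in for the blocking file write. A ticker coroutine records how long it waits between wake-ups. Calling the blocking function directly freezes the ticker for the full duration, while asyncio.to_thread (Python 3.9+) moves the call to a worker thread so the ticker keeps running. The function names are illustrative. gevent offers a comparable escape hatch in its thread pool (gevent.get_hub().threadpool), which this sketch does not exercise.

```python
import asyncio
import time


def blocking_work(seconds):
    # Stand-in for a large synchronous write: time.sleep blocks the whole OS thread.
    time.sleep(seconds)


async def ticker(gaps, stop):
    # Record the time between consecutive wake-ups; on a free loop each gap is ~10 ms.
    last = time.monotonic()
    while not stop.is_set():
        await asyncio.sleep(0.01)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def run(offload):
    gaps = []
    stop = asyncio.Event()
    task = asyncio.create_task(ticker(gaps, stop))
    await asyncio.sleep(0.05)  # let the ticker start
    if offload:
        # The blocking call runs in a worker thread; the loop keeps scheduling the ticker.
        await asyncio.to_thread(blocking_work, 0.5)
    else:
        # Direct call: the single thread is stuck, so no coroutine can run.
        blocking_work(0.5)
    await asyncio.sleep(0.05)
    stop.set()
    await task
    return max(gaps)


def longest_stall(offload):
    return asyncio.run(run(offload))


if __name__ == "__main__":
    print("direct call, longest gap:", round(longest_stall(False), 2))
    print("to_thread,   longest gap:", round(longest_stall(True), 2))
```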
Test
# coding=utf-8
from gevent.monkey import patch_all
patch_all()  # patch sockets, time.sleep, etc. before anything else is imported

import gevent
import json
import os
import sys

import redis

AMOUNT = 300000
OUTPUT_FILE = 'test.json'

r = redis.Redis(host='127.0.0.1', port=6379, decode_responses=True)
data = {}  # renamed from "dict" so the builtin is not shadowed


# Generate a large dict to dump as JSON
def generate_dict():
    print('begin generate dict of {} subject'.format(AMOUNT))
    for i in range(AMOUNT):
        data[i] = {
            "avatar": "/static/upload/photo/2019-11-02/v2_0cc3325d6467d8ebadde2edc8f3c92aab409b87c.jpg",
            "birthday": None, "create_time": 1572669596, "department": "QA",
            "description": "", "end_time": None, "entry_date": None,
            "extra_id": None, "groups": [0], "id": 12268,
            "interviewee": "", "interviewee_pinyin": "", "inviter_id": None,
            "job_number": "", "name": "40735", "remark": "",
            "start_time": None, "subject_type": 0, "title": "", "wg_number": "",
        }
    print('complete generate dict')


# Another gevent coroutine: prints a number every 0.1 s whenever it gets scheduled
def foo():
    for i in range(30):
        gevent.sleep(0.1)
        print(i)
        sys.stdout.flush()


# Write the file through a blocking fd
def write_file1():
    print('begin write {}'.format(OUTPUT_FILE))
    with open(OUTPUT_FILE, 'w') as fp:
        json.dump(data, fp)
    print('complete write {}'.format(OUTPUT_FILE))


# Write the file through an fd opened with O_NONBLOCK
def write_file2():
    print('begin write {}'.format(OUTPUT_FILE))
    b = json.dumps(data).encode('utf-8')
    # O_TRUNC so a previous, longer file does not leave trailing bytes
    fd = os.open(OUTPUT_FILE, os.O_CREAT | os.O_WRONLY | os.O_TRUNC | os.O_NONBLOCK, 0o644)
    try:
        view = memoryview(b)
        while view:  # os.write may write fewer bytes than requested
            n = os.write(fd, view)
            view = view[n:]
    finally:
        os.close(fd)
    print('complete write {}'.format(OUTPUT_FILE))


# Write the same JSON to Redis over the monkey-patched socket
def write_redis():
    print('begin write redis')
    b = json.dumps(data)
    r.set('storage_test', b)
    print('complete write redis')


if __name__ == '__main__':
    generate_dict()
    g1 = gevent.spawn(foo)
    gevent.sleep(1)
    # g2 = gevent.spawn(write_file1)
    # g2 = gevent.spawn(write_file2)
    g2 = gevent.spawn(write_redis)
    gevent.joinall([g1, g2])
The test program runs two coroutines. The first, foo, prints numbers in a loop and calls gevent.sleep(0.1) before each print, which hands control back to gevent so it can schedule other coroutines.
The second coroutine does the I/O: it writes the JSON either to a file or to Redis.
Test results:
With either write_file1 (blocking fd) or write_file2 (non-blocking fd), no numbers appear in the log between "begin write" and "complete write". This shows that the thread is blocked for the whole write and the foo coroutine never gets scheduled.
With write_redis, numbers do appear between "begin write" and "complete write". Network I/O really can be non-blocking: gevent waits on the socket through its event loop (on Linux this is epoll), so when the socket is not ready the thread does not block and gevent switches to other coroutines. The json.dumps call before the send is CPU-bound and still cannot be interrupted; the interleaving happens while the data is being sent.
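For contrast with the regular-file case, a socket does report "not ready". This stdlib-only sketch, assuming a Unix socketpair (the helper name socket_readiness is illustrative), fills a non-blocking socket's send buffer until send raises BlockingIOError, checks with select that the socket is no longer writable, then drains the peer and checks that it is writable again. That "not ready, wake me when ready" signal is what gevent's event loop waits on while other coroutines run.

```python
import select
import socket


def socket_readiness():
    """Return (bytes_sent, writable_when_full, writable_after_drain) for a socket pair."""
    a, b = socket.socketpair()
    try:
        a.setblocking(False)
        chunk = b"x" * 65536
        sent = 0
        # Once the kernel buffer is full, a non-blocking socket fails with EAGAIN
        # (BlockingIOError) instead of blocking the thread.
        while True:
            try:
                sent += a.send(chunk)
            except BlockingIOError:
                break
        _, w, _ = select.select([], [a], [], 0)
        writable_when_full = a in w

        # Read everything on the peer so the sender's buffer empties again.
        b.settimeout(5)
        received = 0
        while received < sent:
            received += len(b.recv(65536))
        _, w, _ = select.select([], [a], [], 0)
        writable_after_drain = a in w
        return sent, writable_when_full, writable_after_drain
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(socket_readiness())
```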
Conclusion
Avoid writing large files from a gevent process, and especially avoid dumping them on a timer. Keep the data in Redis or MySQL instead.
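If a file really must be written from inside the gevent process, one mitigation not covered by the original test is to serialize and write in chunks and yield between them. Each write still blocks, but only for one chunk, so other coroutines get regular turns; the total write takes at least as long as before. This is a sketch under stated assumptions: the function name write_json_in_chunks, the 1 MiB default chunk size, the ".tmp" rename step and the yield_fn hook are all hypothetical. Under gevent you would pass gevent.sleep as yield_fn (called with no arguments it sleeps for 0 seconds, which yields to other coroutines); the code itself only uses the standard library. It relies on json.JSONEncoder.iterencode, which produces the JSON text piece by piece.

```python
import json
import os


def write_json_in_chunks(obj, path, yield_fn, chunk_size=1 << 20):
    """Write obj as JSON to path, calling yield_fn() after each ~chunk_size characters.

    Returns the number of times yield_fn was called.
    """
    tmp_path = path + ".tmp"  # hypothetical naming: readers never see a half-written file
    encoder = json.JSONEncoder()
    pieces, buffered, yields = [], 0, 0
    with open(tmp_path, "w", encoding="utf-8") as fp:
        # iterencode yields the JSON text incrementally instead of one huge string
        for piece in encoder.iterencode(obj):
            pieces.append(piece)
            buffered += len(piece)
            if buffered >= chunk_size:
                fp.write("".join(pieces))  # write this chunk; each call blocks only briefly
                pieces.clear()
                buffered = 0
                yield_fn()  # under gevent: gevent.sleep, so other coroutines can run
                yields += 1
        fp.write("".join(pieces))  # remainder (may be empty)
    os.replace(tmp_path, path)  # atomic rename on POSIX
    return yields
```

In the test program this would replace json.dump in write_file1 with something like write_json_in_chunks(data, OUTPUT_FILE, gevent.sleep), so foo could keep printing during the write.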
References
Why gevent's monkey patching does not automatically make blocking file descriptors non-blocking
https://github.com/gevent/gevent/issues/1070
How to write to a file using non-blocking I/O
https://stackoverflow.com/questions/9259380/how-to-write-to-a-file-using-non-blocking-io
Is the write() function in C blocking or non-blocking? It depends on the flags used when the file descriptor is created
https://stackoverflow.com/questions/42449987/is-the-write-function-in-c-blocking-or-non-blocking
If the fd refers to a regular file, reads and writes block even when it was opened as non-blocking
https://www.remlab.net/op/nonblock.shtm