Flask + gunicorn: letting web services call Python/PyTorch programs concurrently (solving the multithreading / multiprocessing problem)

Project scenario:

Project requirements: the web server forwards each client request to a Flask application instance, which calls the Python/PyTorch program.
Problem description: Flask's built-in server defaults to a blocking, single-process, single-thread mode (since Flask 1.0 the development server handles requests in threads, but it is still a single process and is not meant for production). To achieve concurrency, you can deploy the Flask service with gunicorn. Here, the deployment of Python applications is implemented with Flask + gunicorn. The same approach carries over to deploying machine learning and deep learning models built with frameworks such as PyTorch and TensorFlow.
Environment preparation: pip install gunicorn

Solution:

1. Using the Flask framework to build a web service that calls a Python program

Flask is a lightweight Web application framework written in Python.
For the installation and use of Flask, please refer to the reference listed below the code. This article only gives a simple example application.
main.py

# main.py
from flask import Flask

app = Flask(__name__)


@app.route('/predict/')
def index():
    # Any operation can be implemented in the index function,
    # such as running a machine learning / deep learning model.
    return 'this server is running on port:5000, url is predict'


if __name__ == '__main__':
    # Starts Flask's built-in development server.
    app.debug = True
    app.run(host="0.0.0.0", port=5000)

Flask implements Web services calling Python programs
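The comment in index says that any operation, such as model inference, can run inside the view function. The following is a minimal sketch of that idea, not the author's code: DummyModel is a hypothetical stand-in for a real model, and the POST method and the 'values' / 'prediction' JSON keys are assumptions made for illustration.

```python
from flask import Flask, jsonify, request


class DummyModel:
    """Hypothetical stand-in for a real model; it simply sums its inputs."""

    def predict(self, values):
        return sum(values)


# Created once at import time, so each server process holds a single copy
# instead of rebuilding the model on every request.
model = DummyModel()

app = Flask(__name__)


@app.route('/predict/', methods=['POST'])
def predict():
    # Read the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    values = payload.get('values', [])
    return jsonify(prediction=model.predict(values))


if __name__ == '__main__':
    app.run(host="0.0.0.0", port=5000)
```

Creating the model at module level rather than inside the view function matters once gunicorn is involved, because every worker process imports the module and therefore loads its own copy.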

Run main.py and send a request with Postman to http://127.0.0.1:5000/predict/ (the port must match the one passed to app.run). The response is: this server is running on port:5000, url is predict
As shown in the figure below:

Since the Flask framework by default handles tasks in a single process (and, in older versions, on a single blocking thread), you can deploy the Flask service with gunicorn to achieve concurrency.
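To see why a single blocking worker is a bottleneck, the sketch below times the same batch of slow requests with one worker and with four. It is only an analogy: threads from the standard library stand in for gunicorn workers, and handle_request and serve are hypothetical names, with time.sleep simulating a slow operation such as model inference.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(delay=0.1):
    """Stand-in for a blocking view function (for example, model inference)."""
    time.sleep(delay)
    return "done"


def serve(num_requests, num_workers, delay=0.1):
    """Handle num_requests using num_workers concurrent workers.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(lambda _: handle_request(delay), range(num_requests)))
    assert len(results) == num_requests
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"1 worker : {serve(4, 1):.2f}s")   # requests are handled one after another
    print(f"4 workers: {serve(4, 4):.2f}s")   # requests overlap
```

With one worker the total time grows with the number of requests; with four workers the four requests overlap and finish in roughly the time of one.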

2. gunicorn implements concurrency

gunicorn supports three ways of setting parameters; the two main ones are the command line and a configuration file.
(1) Command line: specify parameters directly on the command line:

gunicorn -w 5 --threads 4 -b 0.0.0.0:8000 main:app --reload

Concurrency is achieved once this command is running. Here, main refers to the Python file of the Flask application (main.py), and app refers to the Flask application object defined in it.
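The main:app argument follows a "module:attribute" pattern. The sketch below illustrates how such a string can be resolved to an object; load_app is a hypothetical helper written for this article, not gunicorn's actual loader, which handles more cases.

```python
import importlib


def load_app(spec):
    """Resolve a 'module:attribute' string to the object it names."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not attr:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    # Import the module (main.py would be imported as 'main') ...
    module = importlib.import_module(module_name)
    # ... then look up the named object inside it.
    return getattr(module, attr)


if __name__ == "__main__":
    # Demonstrated with a standard-library module so the example runs anywhere.
    sqrt = load_app("math:sqrt")
    print(sqrt(9.0))
```

This is why the command must be run from a directory where main.py is importable, and why the variable holding the Flask application must be named app at module level.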
The parameters of gunicorn are explained in detail as follows:

-c CONFIG, --config CONFIG: The path of the configuration file; gunicorn starts with the settings in that file. Recommended for production environments.

-b ADDRESS, --bind ADDRESS: The IP address and port to bind to on the host.

-w INT, --workers INT: The number of worker processes handling requests. A positive integer; the default value is 1.

-k STRING, --worker-class STRING: The worker type to use. The default is sync. For asynchronous workers, install eventlet or gevent and specify it here.

--threads INT: The number of threads each worker uses to handle requests. A positive integer; the default value is 1.

--worker-connections INT: The maximum number of simultaneous clients; 1000 by default.

--backlog INT: The maximum number of pending connections, that is, the number of clients waiting to be served. 2048 by default; generally left unchanged.

-p FILE, --pid FILE: The file name to use for the PID file. If it is not set, no PID file is created.

--access-logfile FILE: The file to write the access log to.

--access-logformat STRING: The format of access log entries.

--error-logfile FILE, --log-file FILE: The file to write the error log to.

--log-level LEVEL: The output level of the error log.

--limit-request-line INT: The maximum size of the HTTP request line, in bytes. By default, this value is 4094; it may range from 0 to 8190.

--limit-request-fields INT: Limits the number of header fields in an HTTP request, which helps prevent DDoS attacks. By default, this value is 100; it cannot exceed 32768.

--limit-request-field-size INT: Limits the size of a header field in an HTTP request. By default, this value is 8190 bytes. The value is a positive integer or 0; 0 means the size is not limited.

-t INT, --timeout INT: A worker that stays silent for more than this many seconds is killed and restarted. Generally set to 30 seconds, which is also the default.

--daemon: Whether to run as a daemon process; False by default.

--chdir: Change to the given directory before loading the application.

--graceful-timeout INT: By default, this value is 30. Workers that are still alive this many seconds after the restart signal is received are forcibly killed. Generally the default is used.

--keep-alive INT: The number of seconds to wait for requests on a keep-alive connection. By default, the value is 2. Generally set between 1 and 5 seconds.

--reload: False by default. This setting is intended for development and restarts the workers whenever the application code changes.

--spew: Prints every statement executed by the server. False by default. The option is all or nothing: either every statement is printed or none is.

--check-config: Checks the configuration and exits. The default value is False.

-e ENV, --env ENV: Sets environment variables for the application, in KEY=VALUE form.
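When the command line grows long, it can help to assemble it from a dictionary of options. The sketch below rebuilds the command from (1) using only flags listed above; build_gunicorn_command is a hypothetical helper written for this article, not part of gunicorn.

```python
import shlex


def build_gunicorn_command(app_spec, options):
    """Turn a dict of gunicorn flags into an argument list.

    A value of True emits the bare flag (for switches such as --reload);
    False or None skips the flag; anything else is passed as the flag's value.
    """
    argv = ["gunicorn"]
    for flag, value in options.items():
        if value is True:
            argv.append(flag)
        elif value is not False and value is not None:
            argv.extend([flag, str(value)])
    argv.append(app_spec)
    return argv


options = {"-w": 5, "--threads": 4, "-b": "0.0.0.0:8000", "--reload": True}
command = build_gunicorn_command("main:app", options)
print(shlex.join(command))
# prints: gunicorn -w 5 --threads 4 -b 0.0.0.0:8000 --reload main:app
```

The resulting list can be passed directly to subprocess.run(command) if gunicorn is installed; for anything beyond a handful of options, the configuration file described next is usually easier to maintain.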

(2) Start from a configuration file
The configuration file config.py is as follows:

# coding:utf-8
# config.py
import multiprocessing

bind = '127.0.0.1:8000'      # IP address and port to bind to
backlog = 512                # Maximum number of pending connections (listen queue)
# chdir = '/home/test/server/bin'    # Working directory gunicorn switches to before loading the app
timeout = 30                 # Seconds before a silent worker is killed and restarted
worker_class = 'gevent'      # Use gevent mode (requires the gevent package); sync is the default

workers = multiprocessing.cpu_count() * 2 + 1    # Number of worker processes
threads = 4     # Threads per worker process; applies to gthread workers and is ignored by gevent
daemon = True   # Run in the background as a daemon

Concurrency can be achieved by running the following command:

gunicorn -c config.py main:app

Here, main refers to the Python file of the Flask application, and app refers to the Flask application object. Note that the port used here is 8000, taken from the config.py configuration file, not the port set in main.py. The app.run call in main.py is only executed when the file is run directly, so you can still start the Flask application with python main.py.
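Because config.py is ordinary Python, values such as workers can be computed rather than hard-coded. The sketch below works through the arithmetic behind the formula used above; suggested_workers and max_concurrent_requests are hypothetical helper names, and the capacity estimate assumes the gthread worker class, where each worker handles up to threads requests at once.

```python
import multiprocessing


def suggested_workers(cpu_count):
    """Worker count from the formula used in config.py: 2 * cores + 1."""
    return cpu_count * 2 + 1


def max_concurrent_requests(workers, threads):
    """Rough upper bound on in-flight requests for threaded workers."""
    return workers * threads


if __name__ == "__main__":
    cores = multiprocessing.cpu_count()
    workers = suggested_workers(cores)
    print(f"{cores} cores -> {workers} workers, "
          f"up to {max_concurrent_requests(workers, 4)} concurrent requests with 4 threads each")
```

On a 4-core machine this gives 9 workers and, with 4 threads each, at most 36 requests in flight. Each worker is a separate process that loads its own copy of the application, so memory use grows with the worker count when a large model is involved.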

References:

Flask implements Web services calling Python programs

Configuration of gunicorn in Python

Deploy Python / PyTorch applications with Flask + gunicorn + nginx

A follow-up will cover a pitfall encountered when a web service makes multiple concurrent calls to a PyTorch model.
