Not enough PHP-FPM processes?



As PHP developers, the architecture we use most is LNMP (Linux, Nginx, MySQL, PHP). Whenever a traffic spike hits, response times jump from a few hundred milliseconds to several seconds. The usual suspects are slow SQL in MySQL, big keys in Redis, or too few PHP-FPM processes. The first two can be checked in business logs; this article demonstrates in practice what running out of PHP-FPM processes looks like and how to confirm it.

Reproducing the problem

  1. Set the local PHP-FPM pool to a fixed 2 processes

    #vim /etc/php-fpm.d/www.conf
    
    pm = static
    pm.max_children = 2
  2. Load-test the endpoint with ab (40 concurrent clients, 3000 requests)

    $ ab -c 40  -n 3000 http://127.0.0.1/group/check_groups
    
    Server Software:        nginx/1.16.0
    Server Hostname:        miner_platform.cn
    Server Port:            80
    
    Document Path:          /group/check_groups
    Document Length:        44 bytes
    
    Concurrency Level:      40
    Time taken for tests:   29.384 seconds
    Complete requests:      3000
    Failed requests:        0
    Write errors:           0
    Total transferred:      699000 bytes
    HTML transferred:       132000 bytes
    Requests per second:    102.10 [#/sec] (mean)
    Time per request:       391.788 [ms] (mean)
    Time per request:       9.795 [ms] (mean, across all concurrent requests)
    Transfer rate:          23.23 [Kbytes/sec] received
    
    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0   0.2      0       3
    Processing:   306  344  80.6    318    3558
    Waiting:      306  343  80.5    318    3555
    Total:        307  344  80.6    318    3558
    
    Percentage of the requests served within a certain time (ms)
      50%    318
      66%    322
      75%    333
      80%    369
      90%    428
      95%    461
      98%    508
      99%    553
     100%   3558 (longest request)
    

Troubleshooting

1. PHP-FPM status page

The endpoint's response time ranges from a 318 ms median up to 3.558 s for the slowest request. How do we know that a shortage of PHP-FPM processes is the cause? In other words, is there a way to see that PHP-FPM can no longer keep up internally? For that we enable PHP-FPM's built-in status page (the pm.status_path pool setting). For detailed steps, refer to: www.cnblogs.com/tinywan/p/6848269....

$ curl http://127.0.0.1/status.php

pool:                 www
process manager:      static
start time:           29/Nov/2021:18:27:38 +0800
start since:          6493
accepted conn:        3136
listen queue:         38
max listen queue:     39
listen queue len:     128
idle processes:       0
active processes:     2
total processes:      2
max active processes: 2
max children reached: 0
slow requests:        0

See the link above for the full list of fields. Here we focus on the following three:

  • listen queue: the number of connections currently waiting in the accept queue, i.e. connections the kernel has already established but no PHP-FPM worker has picked up yet.
  • max listen queue: the largest value listen queue has reached since the pool started (in other words, its high-water mark).
  • listen queue len: the size limit of that queue, i.e. the effective backlog. Anyone with socket programming experience will recognize it from int listen(int sockfd, int backlog);. It can be set in the pool config (listen.backlog), but the kernel caps it at net.core.somaxconn.
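To check this from a script rather than by eye, here is a minimal sketch that parses the plain-text status page into an array and flags saturation. The function names and the rule "no idle workers and a non-empty listen queue" are our own assumptions, not part of PHP-FPM. The max children reached counter is not used because it only moves under pm = dynamic or ondemand, which is why it reads 0 above.

```php
<?php
// Sketch: parse PHP-FPM's plain-text status page and flag worker saturation.

/** Turn "key:   value" lines into ['key' => 'value']. */
function parse_fpm_status(string $text): array
{
    $status = [];
    foreach (preg_split('/\R/', trim($text)) as $line) {
        // Split on the first colon only: "start time" contains more colons.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $status[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $status;
}

/** Hypothetical rule: every worker is busy and connections wait in the accept queue. */
function fpm_is_saturated(array $status): bool
{
    return isset($status['idle processes'], $status['listen queue'])
        && (int) $status['idle processes'] === 0
        && (int) $status['listen queue'] > 0;
}

// Live use would be e.g.: $raw = file_get_contents('http://127.0.0.1/status.php');
$raw = <<<'TXT'
pool:                 www
process manager:      static
start time:           29/Nov/2021:18:27:38 +0800
start since:          6493
accepted conn:        3136
listen queue:         38
max listen queue:     39
listen queue len:     128
idle processes:       0
active processes:     2
total processes:      2
max active processes: 2
max children reached: 0
slow requests:        0
TXT;

$status = parse_fpm_status($raw);
printf("listen queue %s/%s, saturated: %s\n",
    $status['listen queue'], $status['listen queue len'],
    fpm_is_saturated($status) ? 'yes' : 'no');
```

Run against the sample above, it reports a listen queue of 38 out of 128 and flags the pool as saturated.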

2. Checking connection state with netstat

The status page leads to our conclusion: when the PHP-FPM workers cannot keep up, incoming connections pile up in the accept queue. Knowing this, we do not even need the status page; netstat shows the same thing.

  • In the output below, the first line is the listening socket. For a socket in the LISTEN state, Recv-Q is the current length of its accept queue (38 here, the same as listen queue on the status page).
$ netstat -antp | grep php-fpm

tcp       38      0 127.0.0.1:9000          0.0.0.0:*               LISTEN      97/php-fpm: master  
tcp        8      0 127.0.0.1:9000          127.0.0.1:55540         ESTABLISHED 964/php-fpm: pool w 
tcp        8      0 127.0.0.1:9000          127.0.0.1:55536         ESTABLISHED 965/php-fpm: pool w
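netstat takes these numbers from /proc/net/tcp, where, for a socket in the LISTEN state, the rx_queue column holds the current accept-queue length. Below is a minimal sketch of reading it directly from PHP; it is Linux-only, covers IPv4 TCP listeners such as 127.0.0.1:9000 only, and the function name is our own. The same figure is also visible with ss -lnt, whose Recv-Q column for a listening socket is the current queue length and Send-Q is the backlog limit.

```php
<?php
// Sketch: read the accept-queue length of a TCP listener from /proc/net/tcp (Linux, IPv4 only).

/**
 * Return rx_queue for the LISTEN socket bound to $port, or null if none is found.
 * For listening sockets the kernel reports the accept-queue length in this column.
 */
function accept_queue_len(string $procNetTcp, int $port): ?int
{
    foreach (explode("\n", $procNetTcp) as $line) {
        // Columns: sl local_address rem_address st tx_queue:rx_queue ...
        $f = preg_split('/\s+/', trim($line));
        if (count($f) < 5 || $f[3] !== '0A') { // 0A = TCP_LISTEN; also skips the header row
            continue;
        }
        [, $portHex] = explode(':', $f[1]);      // local_address is HEXIP:HEXPORT
        if (hexdec($portHex) !== $port) {
            continue;
        }
        [, $rxQueueHex] = explode(':', $f[4]);
        return (int) hexdec($rxQueueHex);
    }
    return null;
}

// Live use on the FPM host: accept_queue_len(file_get_contents('/proc/net/tcp'), 9000)
// Hypothetical sample rows shaped like the netstat output above (0x2328 = 9000, 0x26 = 38).
$sample = <<<'TXT'
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:2328 00000000:0000 0A 00000000:00000026 00:00000000 00000000     0        0 23456 1 0000000000000000 100 0 0 10 0
   1: 0100007F:2328 0100007F:D8F4 01 00000000:00000008 00:00000000 00000000     0        0 23457 1 0000000000000000 20 4 30 10 -1
TXT;

echo accept_queue_len($sample, 9000), PHP_EOL;
```

The ESTABLISHED row (state 01) is skipped, so only the listener's queue length is reported.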

To sum up: when there are too few PHP-FPM processes, the accept queue holding nginx's connections grows. Is that the whole story? Not quite; we still need to understand why this happens.

How it works

How PHP-FPM works, briefly

First, a quick look at how PHP-FPM works. The following simplified model covers only the socket handling:

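<?php
// Simplified model of PHP-FPM's socket handling (CLI only; needs the sockets and pcntl extensions).
// Real workers speak FastCGI to nginx; this model answers plain HTTP to keep it short.

// 1. Create socket
$socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
// 2. Bind socket
socket_bind($socket, "0.0.0.0", 9000);
// 3. Listen on the socket (backlog = 5, the accept queue size requested from the kernel)
socket_listen($socket, 5);

// 4. Fork 2 worker processes; they all inherit the same listening socket
for ($i = 0; $i < 2; $i++) {
    $pid = pcntl_fork();
    if ($pid == 0) {
        // 5. Worker: block in accept() until the kernel has a queued connection
        while ($conn = socket_accept($socket)) {
            echo "worker " . getmypid() . " accepted a client" . PHP_EOL;
            $tmp = socket_read($conn, 1024);
            echo "client data: " . $tmp . PHP_EOL;
            $data = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi";
            socket_write($conn, $data, strlen($data));
            socket_close($conn);
        }
        exit(0);
    }
}

// 6. Master: wait for workers to exit
// Other TODO: respawn workers, handle signals
while (pcntl_wait($status) > 0) {
    // a worker exited
}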
  1. The master process creates the listening socket but does not handle requests itself.
  2. Each worker process blocks in accept(), takes one connection at a time, and handles that request synchronously.

Tracing the nginx -> PHP-FPM socket calls

Now that we know how PHP-FPM works, let's trace a single request to see how nginx and PHP-FPM interact.

$ curl http://miner_platform.cn/group/check_groups
{"code":10006,"message":"sign\u65e0\u6548."}
  1. nginx system calls

    This is a trace of the nginx worker process (pid 958); the key steps are annotated.

     $ strace -f -s 64400 -p 958
     strace: Process 958 attached
     epoll_wait(8, [{EPOLLIN, {u32=1226150064, u64=94773974503600}}], 512, -1) = 1
     accept4(6, {sa_family=AF_INET, sin_port=htons(46616), sin_addr=inet_addr("127.0.0.1")}, [112->16], SOCK_NONBLOCK) = 3
     epoll_ctl(8, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLRDHUP|EPOLLET, {u32=1226159737, u64=94773974513273}}) = 0
     epoll_wait(8, [{EPOLLIN, {u32=1226159737, u64=94773974513273}}], 512, 60000) = 1
     recvfrom(3, "GET /group/check_groups HTTP/1.1\r\nUser-Agent: curl/7.29.0\r\nHost: miner_platform.cn\r\nAccept: */*\r\n\r\n", 1024, 0, NULL, NULL) = 99
     stat("/data/miner_platform/src/public/group/check_groups", 0x7ffcb593d1b0) = -1 ENOENT (No such file or directory)
     stat("/data/miner_platform/src/public/group/check_groups", 0x7ffcb593d1b0) = -1 ENOENT (No such file or directory)
     epoll_ctl(8, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1226159737, u64=94773974513273}}) = 0
     lstat("/data", {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
     lstat("/data/miner_platform", {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
     lstat("/data/miner_platform/src", {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
     lstat("/data/miner_platform/src/public", {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
     getsockname(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, [112->16]) = 0
     // 1. Create socket    
     socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 11
     ioctl(11, FIONBIO, [1])                 = 0
     epoll_ctl(8, EPOLL_CTL_ADD, 11, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1226163953, u64=94773974517489}}) = 0
     // 2. Connect to 127.0.0.1:9000
     connect(11, {sa_family=AF_INET, sin_port=htons(9000), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)    
     epoll_wait(8, [{EPOLLOUT, {u32=1226159737, u64=94773974513273}}, {EPOLLOUT, {u32=1226163953, u64=94773974517489}}], 512, 60000) = 2
     getsockopt(11, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
     // 3. Send the request encoded as FastCGI records
     writev(11, [{iov_base="\1\1\0\1\0\10\0\0\0\1\0\0\0\0\0\0\1\4\0\1\2!\7\0\17)SCRIPT_FILENAME/data/miner_platform/src/public/index.php\f\0QUERY_STRING\16\3REQUEST_METHODGET\f\0CONTENT_TYPE\16\0CONTENT_LENGTH\v\nSCRIPT_NAME/index.php\v\23REQUEST_URI/group/check_groups\f\nDOCUMENT_URI/index.php\r\37DOCUMENT_ROOT/data/miner_platform/src/public\17\10SERVER_PROTOCOLHTTP/1.1\16\4REQUEST_SCHEMEhttp\21\7GATEWAY_INTERFACECGI/1.1\17\fSERVER_SOFTWAREnginx/1.16.0\v\tREMOTE_ADDR127.0.0.1\v\5REMOTE_PORT46616\v\tSERVER_ADDR127.0.0.1\v\2SERVER_PORT80\v\21SERVER_NAMEminer_platform.cn\17\3REDIRECT_STATUS200\17\vHTTP_USER_AGENTcurl/7.29.0\t\21HTTP_HOSTminer_platform.cn\v\3HTTP_ACCEPT*/*\0\0\0\0\0\0\0\1\4\0\1\0\0\0\0\1\5\0\1\0\0\0\0", iov_len=592}], 1) = 592
     epoll_wait(8, [{EPOLLIN|EPOLLOUT, {u32=1226163953, u64=94773974517489}}], 512, 60000) = 1
     // 4. Receive the PHP-FPM response
     recvfrom(11, "\1\6\0\1\0\257\1\0X-Powered-By: PHP/7.2.16\r\nCache-Control: no-cache, private\r\nDate: Wed, 01 Dec 2021 12:24:52 GMT\r\nContent-Type: application/json\r\n\r\n{\"code\":10006,\"message\":\"sign\\u65e0\\u6548.\"}\0\1\3\0\1\0\10\0\0\0\0\0\0\0\"}\0", 4096, 0, NULL, NULL) = 200
     epoll_wait(8, [{EPOLLIN|EPOLLOUT|EPOLLRDHUP, {u32=1226163953, u64=94773974517489}}], 512, 60000) = 1
     readv(11, [{iov_base="", iov_len=3896}], 1) = 0
     // 5. Close the socket connection    
     close(11)                               = 0
     // 6. Send the HTTP response to the client
     writev(3, [{iov_base="HTTP/1.1 200 OK\r\nServer: nginx/1.16.0\r\nContent-Type: application/json\r\nTransfer-Encoding: chunked\r\nConnection: keep-alive\r\nX-Powered-By: PHP/7.2.16\r\nCache-Control: no-cache, private\r\nDate: Wed, 01 Dec 2021 12:24:52 GMT\r\n\r\n", iov_len=222}, {iov_base="2c\r\n", iov_len=4}, {iov_base="{\"code\":10006,\"message\":\"sign\\u65e0\\u6548.\"}", iov_len=44}, {iov_base="\r\n", iov_len=2}, {iov_base="0\r\n\r\n", iov_len=5}], 5) = 277
     write(5, "127.0.0.1 - - [01/Dec/2021:20:24:52 +0800] \"GET /group/check_groups HTTP/1.1\" 200 55 \"-\" \"curl/7.29.0\" \"-\" 1.029 127.0.0.1:9000 200 1.030\n", 138) = 138
     setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
     epoll_wait(8, [{EPOLLIN|EPOLLOUT|EPOLLRDHUP, {u32=1226159737, u64=94773974513273}}], 512, 65000) = 1
     recvfrom(3, "", 1024, 0, NULL, NULL)    = 0
     close(3)                                = 0
     epoll_wait(8, 
  2. PHP-FPM system calls

    This is a trace of the PHP-FPM worker process (pid 965).

    // 1. accept() returns the connection from the nginx client (127.0.0.1:45512)
    965   accept(9, {sa_family=AF_INET, sin_port=htons(45512), sin_addr=inet_addr("127.0.0.1")}, [112->16]) = 4
    // (many calls omitted here)
    // 2. Respond to the client
    965   write(4, "\1\6\0\1\0\257\1\0X-Powered-By: PHP/7.2.16\r\nCache-Control: no-cache, private\r\nDate: Wed, 01 Dec 2021 12:37:18 GMT\r\nContent-Type: application/json\r\n\r\n{\"code\":10006,\"message\":\"sign\\u65e0\\u6548.\"}\0\1\3\0\1\0\10\0\0\0\0\0\0\0p\0\0", 200) = 200
    // 3. Shut down the write side: no more data will be sent on this socket
    965   shutdown(4, SHUT_WR)              = 0
    // 4. Read the remaining data from nginx(127.0.0.1:45512): an empty FCGI_STDIN record
    965   recvfrom(4, "\1\5\0\1\0\0\0\0", 8, 0, NULL, NULL) = 8
    // 5. recvfrom returns 0: nginx(127.0.0.1:45512) has closed its side (EOF)
    965   recvfrom(4, "", 8, 0, NULL, NULL) = 0
    // 6. Close this connection
    965   close(4)                          = 0
    965   lstat("/data/miner_platform/src/vendor/composer/../../app/Http/Middleware/BusinessHeaderCheck.php", {st_mode=S_IFREG|0777, st_size=989, ...}) = 0
    965   stat("/data/miner_platform/src/app/Http/Middleware/BusinessHeaderCheck.php", {st_mode=S_IFREG|0777, st_size=989, ...}) = 0
    965   chdir("/")                        = 0
    965   times({tms_utime=3583, tms_stime=1977, tms_cutime=0, tms_cstime=0}) = 4315309933
    965   setitimer(ITIMER_PROF, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=0, tv_usec=0}}, NULL) = 0
    965   fcntl(3, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0
    965   setitimer(ITIMER_PROF, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=0, tv_usec=0}}, NULL) = 0
    965   accept(9, 
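The binary prefixes in the writev, recvfrom and write calls above are FastCGI record headers. In the FastCGI 1.0 specification every record starts with an 8-byte header: version, type, request ID (2 bytes, big-endian), content length (2 bytes, big-endian), padding length and a reserved byte. For example, strace's "\1\6\0\1\0\257\1\0" is an FCGI_STDOUT record for request 1 carrying 175 bytes (octal 257) plus 1 padding byte. The small decoder below is a sketch (the function name is ours; unpack() with an offset needs PHP 7.1+), and its test buffer is synthetic, built with the same record sizes as the 200-byte response in the trace.

```php
<?php
// Sketch: split a raw FastCGI byte stream into records (FastCGI 1.0 record layout).

const FCGI_TYPES = [
    1 => 'BEGIN_REQUEST', 2 => 'ABORT_REQUEST', 3 => 'END_REQUEST', 4 => 'PARAMS',
    5 => 'STDIN', 6 => 'STDOUT', 7 => 'STDERR', 8 => 'DATA',
];

/** Decode every complete record in $buf: header fields plus the content bytes. */
function fcgi_records(string $buf): array
{
    $records = [];
    $offset = 0;
    while ($offset + 8 <= strlen($buf)) {
        // Header: version(1) type(1) requestId(2, big-endian) contentLength(2, big-endian)
        //         paddingLength(1) reserved(1)
        $h = unpack('Cversion/Ctype/nrequestId/ncontentLength/CpaddingLength/Creserved', $buf, $offset);
        $h['typeName'] = FCGI_TYPES[$h['type']] ?? 'UNKNOWN';
        $h['content'] = substr($buf, $offset + 8, $h['contentLength']);
        $records[] = $h;
        // Skip header, content and padding to reach the next record.
        $offset += 8 + $h['contentLength'] + $h['paddingLength'];
    }
    return $records;
}

// Synthetic buffer with the same layout as the 200-byte response in the trace:
// a STDOUT record (175 bytes + 1 padding byte) followed by END_REQUEST (8 bytes).
$buf = "\x01\x06\x00\x01\x00\xaf\x01\x00" . str_repeat('x', 175) . "\x00"
     . "\x01\x03\x00\x01\x00\x08\x00\x00" . str_repeat("\x00", 8);

foreach (fcgi_records($buf) as $r) {
    printf("%-13s id=%d content=%d padding=%d\n",
        $r['typeName'], $r['requestId'], $r['contentLength'], $r['paddingLength']);
}
```

The same walk explains nginx's 592-byte writev: FCGI_BEGIN_REQUEST (8 + 8 bytes), FCGI_PARAMS (8 + 545 + 7 padding), an empty FCGI_PARAMS (8) and an empty FCGI_STDIN (8), i.e. 16 + 560 + 8 + 8 = 592.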

The TCP three-way handshake

Under high concurrency the flow is exactly the same. The figure below shows this flow with the three-way handshake drawn in detail, which brings in two kernel queues: the SYN queue and the accept queue.

  1. When listen() is called (here by the PHP-FPM master process), the kernel sets up two queues for the socket: the SYN queue and the accept queue.
  2. When a client's SYN arrives, the server side replies with SYN+ACK and records the half-open connection in the SYN queue. This is done by the kernel on behalf of the listening socket, not by the master process code.
  3. When the client's ACK completes the three-way handshake, the connection moves into the accept queue and waits there in the ESTABLISHED state until the application (a PHP-FPM worker) takes it with accept(). Each accept() call removes the connection at the head of the queue; if the queue is empty, accept() normally blocks. The accept queue is also known as the full-connection (established) queue.
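The key point, that the kernel finishes the handshake and queues the connection even while no worker calls accept(), is easy to observe. Below is a minimal sketch using PHP's built-in stream sockets (no extensions needed): it listens on an ephemeral loopback port, connects a few clients without accepting them, then drains the queue the way a worker would. The backlog of 16 and the 4 clients are arbitrary demo values.

```php
<?php
// Sketch: connections complete the handshake and wait in the accept queue
// even though nobody calls accept() yet (think: every FPM worker is busy).

$ctx = stream_context_create(['socket' => ['backlog' => 16]]);
$server = stream_socket_server('tcp://127.0.0.1:0', $errno, $errstr,
    STREAM_SERVER_BIND | STREAM_SERVER_LISTEN, $ctx);
if ($server === false) {
    throw new RuntimeException("listen failed: $errstr ($errno)");
}
$addr = stream_socket_get_name($server, false); // "127.0.0.1:<ephemeral port>"

// 1. Connect clients while no process is accepting. Each connect still succeeds,
//    because the kernel (not the application) finishes the handshake.
$clients = [];
$clientNames = [];
for ($i = 0; $i < 4; $i++) {
    $c = stream_socket_client("tcp://$addr", $errno, $errstr, 1.0);
    if ($c === false) {
        throw new RuntimeException("connect failed: $errstr ($errno)");
    }
    $clients[] = $c;
    $clientNames[] = stream_socket_get_name($c, false); // client's own ip:port
}
echo count($clients), " connected, 0 accepted so far\n";

// 2. Act as a worker: each accept() takes the connection at the head of the queue.
$accepted = [];
for ($i = 0; $i < count($clients); $i++) {
    $conn = stream_socket_accept($server, 1.0, $peer);
    if ($conn === false) {
        break;
    }
    $accepted[] = $peer; // the client's ip:port as seen by the server
    fclose($conn);
}
echo "accepted: ", implode(', ', $accepted), "\n";
```

Between steps 1 and 2, running ss -lnt on the same host would show the listener's Recv-Q at 4, the same signal as listen queue on the FPM status page.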

Conclusion

We now know what the SYN queue and the accept queue are. The kernel, the accept queue and the application form a producer-consumer model: the kernel is the producer, the accept queue is the buffer, and the application (the PHP-FPM workers) is the consumer. Anyone who has worked with queues knows that under high load the queue fills up, and when consumers are slower than producers, each item waits longer and longer. The usual remedy is to add consumers or make each one faster. This matches what we observed: with only 2 workers, 38 connections sat in the accept queue.
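The ab numbers from the reproduction can be read through this model with Little's law (requests in flight = throughput × time in system). ab keeps 40 requests in flight, and 40 requests at 102.1 req/s means each spends about 391.8 ms in the system, which is ab's mean Time per request. With 2 workers, each request needs only about 2 × 391.788 / 40 ≈ 19.6 ms of worker time; the rest is spent waiting behind the other 38 in-flight requests, consistent with the listen queue of 38 on the status page. The sketch below does this arithmetic. The projection for other worker counts is a hypothetical what-if that assumes the per-request service time stays at ~19.6 ms, which stops holding once CPU, MySQL or Redis become the bottleneck.

```php
<?php
// Sketch: where the time goes under a closed-loop load test (ab -c N), via Little's law.

/** Break ab's mean latency into worker time and accept-queue wait. */
function queue_breakdown(int $concurrency, float $meanLatencyMs, int $workers): array
{
    $throughput = $concurrency / ($meanLatencyMs / 1000);  // req/s: in-flight / latency
    $serviceMs  = $workers * $meanLatencyMs / $concurrency; // worker time per request
    $queued     = max($concurrency - $workers, 0);          // requests waiting for accept()
    $waitMs     = $meanLatencyMs - $serviceMs;              // time spent in the accept queue
    return ['throughput' => $throughput, 'serviceMs' => $serviceMs,
            'queued' => $queued, 'waitMs' => $waitMs];
}

/**
 * Hypothetical projection of mean latency for another worker count, ASSUMING the
 * per-request service time stays constant and concurrency is fixed.
 */
function projected_latency_ms(int $concurrency, float $serviceMs, int $workers): float
{
    // With at least one worker per in-flight request, nothing queues.
    return $serviceMs * max(1.0, $concurrency / $workers);
}

$b = queue_breakdown(40, 391.788, 2); // numbers from the ab run above
printf("%.1f req/s, %.1f ms service, %d queued, %.1f ms waiting\n",
    $b['throughput'], $b['serviceMs'], $b['queued'], $b['waitMs']);
foreach ([2, 8, 40] as $w) {
    printf("%2d workers -> ~%.0f ms\n", $w, projected_latency_ms(40, $b['serviceMs'], $w));
}
```

This prints 102.1 req/s, 19.6 ms of service time, 38 queued and 372.2 ms of waiting, followed by projections of roughly 392 ms, 98 ms and 20 ms for 2, 8 and 40 workers.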

Keywords: Laravel php-fpm

Added by Soumen on Thu, 09 Dec 2021 02:21:58 +0200