Extending Nginx with Java, part 7: shared memory

Welcome to my GitHub

All of Xinchen's original articles, with supporting source code, are organized here: https://github.com/zq2599/blog_demos

Overview of this article

  • This is the seventh article in the "Extending Nginx with Java" series. It covers a handy tool: shared memory. Before we start, let's look at a problem.
  • On one computer, nginx starts multiple workers, as shown in the following figure. With nginx-clojure, that amounts to four independent JVM processes, and any of the four may handle a given request for the same URL:
  • Now there is a requirement: count the total number of times a URL is accessed. How? A global variable in Java heap memory will not work, because four JVM processes are answering requests and the total cannot live in any single one of them.
  • You probably thought of redis. redis does solve this kind of problem, but if only a single nginx instance is involved rather than multiple servers, nginx-clojure offers a simpler option: shared memory. As shown in the figure below, different processes on one computer operate on the same memory area, and the total access count can be kept there:
  • Compared with redis, shared memory has two clear benefits:
  1. redis is a separately deployed service; shared memory needs no extra deployment.
  2. redis requests travel over the network; shared memory access does not.
  • So when a standalone nginx needs to synchronize data among multiple workers, shared memory is worth considering. That is the subject of today's hands-on exercise: when developing in Java with nginx-clojure, use shared memory to synchronize data among multiple workers.
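To make "different processes operate on the same memory area" concrete, here is a minimal sketch in plain Java. It is only an illustration of the idea and not how nginx-clojure implements its shared map: it assumes a memory-mapped file as the shared area and a file lock to serialize updates, and the class name MappedFileCounter is made up for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustration only (hypothetical class): a counter stored in a memory-mapped file,
// so that several processes on one machine can update the same four bytes.
class MappedFileCounter {

    /** Adds one to the int stored at offset 0 of the file and returns the new value. */
    static int increment(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
             // The exclusive file lock serializes updates coming from different processes.
             // Within one JVM this sketch must be called from one thread at a time.
             FileLock lock = channel.lock()) {
            if (channel.size() < Integer.BYTES) {
                // New or empty file: start from an explicit zero.
                channel.write(ByteBuffer.allocate(Integer.BYTES), 0);
            }
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, Integer.BYTES);
            int next = buffer.getInt(0) + 1;
            buffer.putInt(0, next);
            buffer.force(); // flush the change to the underlying file
            return next;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("uri-access-counter", ".bin");
        file.toFile().deleteOnExit();
        for (int i = 0; i < 3; i++) {
            System.out.println("count = " + increment(file));
        }
    }
}
```

Run in a single process, the loop prints counts 1, 2 and 3. If several JVMs called increment on the same file path, they would all advance the same counter, which is the property the rest of this article gets from nginx-clojure's shared map without any file handling.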

  • This article has two parts:

  1. First keep the count in Java heap memory and run it with multiple workers to verify that the count really is inaccurate
  2. Then use the shared map provided by nginx-clojure to solve the problem

Save count with heap memory

  • Write a content handler with the following code. A UUID identifies the worker, and requestCount records the total number of requests, increasing by one for each request handled:
package com.bolingcavalry.sharedmap;

import nginx.clojure.java.ArrayMap;
import nginx.clojure.java.NginxJavaRingHandler;
import java.io.IOException;
import java.util.Map;
import java.util.UUID;
import static nginx.clojure.MiniConstants.CONTENT_TYPE;
import static nginx.clojure.MiniConstants.NGX_HTTP_OK;

public class HeapSaveCounter implements NginxJavaRingHandler {

    /**
     * The identity of the current jvm process is indicated by UUID
     */
    private String tag = UUID.randomUUID().toString();

    private int requestCount = 1;

    @Override
    public Object[] invoke(Map<String, Object> map) throws IOException {

        String body = "From "
                    + tag
                    + ", total request count [ "
                    + requestCount++
                    + "]";

        return new Object[] {
                NGX_HTTP_OK, //http status 200
                ArrayMap.create(CONTENT_TYPE, "text/plain"), //headers map
                body
        };
    }
}
  • Modify nginx.conf: set worker_processes to auto, so that the number of workers is set automatically according to the number of CPU cores of the computer:
worker_processes  auto;
  • Add a location block to the nginx configuration. The handler class is the HeapSaveCounter just written:
location /heapbasedcounter {
    content_handler_type 'java';
    content_handler_name 'com.bolingcavalry.sharedmap.HeapSaveCounter';
}
  • Compile, build and deploy, then start nginx. First check the number of JVM processes. As shown below, apart from jps itself there are eight JVM processes, which matches both the number of CPU cores of the computer and the worker_processes setting:
(base) willdeMBP:~ will$ jps
4944
4945
4946
4947
4948
4949
4950
4968 Jps
4943
  • First access /heapbasedcounter with the Safari browser. The first response is shown in the figure below. The total is 1:

  • After the page is refreshed, the UUID is unchanged and the total becomes 2, which means both requests reached the JVM of the same worker:

  • Now access the same address with the Chrome browser, as shown in the following figure. This time the UUID changes, proving that the request was handled by the JVM of another worker, and the total is back to 1:

  • This proves the problem: with multiple workers, a count kept in a member variable of a Java class reflects only what each individual worker has seen, not the total for the whole nginx

  • Next, let's look at how to solve this problem with shared memory
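The behaviour observed in the browser can be reproduced without nginx. The following is a small simulation, not nginx code: each Worker object stands in for one worker JVM with its own heap, and requests are handed out round-robin, which is an assumption made only to keep the result deterministic (nginx does not promise that distribution). All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simulation only: every Worker keeps a private counter, like HeapSaveCounter in each JVM.
class PerWorkerCountSimulation {

    static final class Worker {
        int requestCount = 0; // lives in this worker's "heap" only

        int handle() {
            return ++requestCount;
        }
    }

    /** Sends `requests` requests round-robin to `workerCount` workers and returns each worker's final count. */
    static int[] run(int workerCount, int requests) {
        List<Worker> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            workers.add(new Worker());
        }
        for (int r = 0; r < requests; r++) {
            workers.get(r % workerCount).handle();
        }
        int[] counts = new int[workerCount];
        for (int i = 0; i < workerCount; i++) {
            counts[i] = workers.get(i).requestCount;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = run(4, 10);
        System.out.println("per-worker counts: " + Arrays.toString(counts));
        System.out.println("true total: " + Arrays.stream(counts).sum());
    }
}
```

With four workers and ten requests, the per-worker counts are 3, 3, 2 and 2. No single worker ever reports 10, which is exactly why the heap-based counter cannot answer the question "how many times was this URL accessed".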

About shared memory

  • nginx-clojure provides two types of shared memory: Tiny Map and Hash Map. Both are key-value stores. Keys and values can be of four types: int, long, String and byte array
  • The differences between Tiny Map and Hash Map are shown in the following table. They differ mainly in quantity limits and in memory overhead:
Characteristic | Tiny Map | Hash Map
Number of keys | 2^31 (about 2.14 billion) | 64-bit system: 2^63; 32-bit system: 2^31
Maximum memory used | 64-bit system: 4G; 32-bit system: 2G | Limited by operating system
Size of a single key | 16M | Limited by operating system
Size of a single value | 64-bit system: 4G; 32-bit system: 2G | Limited by operating system
Memory used by the entry object itself | 24 bytes | 64-bit system: 40 bytes; 32-bit system: 28 bytes
  • Choose between Tiny Map and Hash Map based on these differences. For this article's exercise, Tiny Map is enough
  • Next, on to the hands-on part
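When choosing between the two map types, a rough size estimate can help. The sketch below only multiplies out the per-entry overhead from the table above. The 32-byte average key length is an assumption for illustration, and the estimate ignores whatever bookkeeping the allocator and the hash structure add, so treat the result as a lower bound rather than an exact figure.

```java
// Back-of-the-envelope sizing; class and method names are hypothetical.
class SharedMapSizing {

    static final int TINY_MAP_ENTRY_BYTES = 24;       // entry overhead, from the table above
    static final int HASH_MAP_ENTRY_BYTES_64BIT = 40; // entry overhead on a 64-bit system

    /** Lower-bound estimate: entries * (entry overhead + key bytes + value bytes). */
    static long estimateBytes(long entries, int entryOverhead, int avgKeyBytes, int valueBytes) {
        return entries * (entryOverhead + avgKeyBytes + valueBytes);
    }

    public static void main(String[] args) {
        long entries = 8096;            // same number as used in this article's configuration
        int avgKeyBytes = 32;           // assumption: average URI length in bytes
        int valueBytes = Integer.BYTES; // an int counter

        System.out.println("tiny map: "
                + estimateBytes(entries, TINY_MAP_ENTRY_BYTES, avgKeyBytes, valueBytes) + " bytes");
        System.out.println("hash map: "
                + estimateBytes(entries, HASH_MAP_ENTRY_BYTES_64BIT, avgKeyBytes, valueBytes) + " bytes");
    }
}
```

Under these assumptions the Tiny Map estimate is 485,760 bytes and the Hash Map estimate is 615,296 bytes. Both are below 1M (taken here as 1,048,576 bytes), so a small URL counter fits comfortably in a Tiny Map.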

Use shared memory

  • Using shared memory takes two steps, as shown in the following figure: configure it first, then use it:
  • First, add a shared_map directive to the http block of nginx.conf. It names the shared memory uri_access_counters:
# Allocate shared memory at initialization: type tinymap, 1M of space, up to 8096 entries (roughly 8K keys)
shared_map uri_access_counters  tinymap?space=1m&entries=8096;
  • Then write a new content handler that updates the request count in shared memory whenever it receives a request. The complete code is below. Several important points deserve attention and are discussed afterwards:
package com.bolingcavalry.sharedmap;

import nginx.clojure.java.ArrayMap;
import nginx.clojure.java.NginxJavaRingHandler;
import nginx.clojure.util.NginxSharedHashMap;
import java.io.IOException;
import java.util.Map;
import java.util.UUID;
import static nginx.clojure.MiniConstants.CONTENT_TYPE;
import static nginx.clojure.MiniConstants.NGX_HTTP_OK;

public class SharedMapSaveCounter implements NginxJavaRingHandler {

    /**
     * The identity of the current jvm process is indicated by UUID
     */
    private String tag = UUID.randomUUID().toString();

    private NginxSharedHashMap smap = NginxSharedHashMap.build("uri_access_counters");

    @Override
    public Object[] invoke(Map<String, Object> map) throws IOException {
        String uri = (String)map.get("uri");

        // Try to create the key in shared memory with an initial value of 1.
        // If the key was created, the return value is 0.
        // If the return value is not 0, the key already existed in shared memory.
        int rlt = smap.putIntIfAbsent(uri, 1);

        // If rlt is 0, the key was just created with value 1, so set rlt to 1: this is the first access.
        // If rlt is not 0, the key existed before putIntIfAbsent was called, so add one to the stored value.
        if (0==rlt) {
            rlt++;
        } else {
            // Atomic add, so that concurrent increments are applied one after another.
            // The returned value is treated as the count before the addition, hence the extra increment.
            rlt = smap.atomicAddInt(uri, 1);
            rlt++;
        }

        // The returned body content should reflect the identity of the JVM and the count in the share map
        String body = "From "
                + tag
                + ", total request count [ "
                + rlt
                + "]";

        return new Object[] {
                NGX_HTTP_OK, //http status 200
                ArrayMap.create(CONTENT_TYPE, "text/plain"), //headers map
                body
        };
    }
}
  • The code above is commented in detail and should be easy to follow. Here are the key points:
  1. Keep one thing in mind when writing such code: it may run under high concurrency, with different processes and different threads executing it at the same time
  2. NginxSharedHashMap implements the ConcurrentMap interface, so it is thread safe. What needs more attention is synchronizing reads and writes across processes. Points 3 and 4 below are about exactly that: what happens when multiple processes execute this code at the same time
  3. putIntIfAbsent is similar to setnx in redis and can serve as a cross-process lock. The value is set only when the key does not yet exist, in which case the method returns 0. A return value other than 0 means the key already existed in shared memory
  4. atomicAddInt guarantees atomicity. When multiple processes run concurrently, accumulating through this method keeps the count accurate (if we wrote the read, add and write steps ourselves, concurrent updates could overwrite each other)
  5. atomicAddInt brings to mind Java's AtomicInteger class, whose incrementAndGet method stays accurate when called from many threads at once because it relies on CAS. What about nginx-clojure? Out of curiosity I looked into the implementation of this method. It is a piece of C code, and in the end I found no CAS loop, only the simplest accumulation, as shown in the following figure:
  6. Clearly, if multiple processes ran the code in the figure above at the same time, updates would overwrite each other. So there are only two possibilities. The first: even with multiple workers, only one process at a time executes the underlying shared memory operation
  7. The second: my C is not good enough, and I simply do not understand how the JVM ends up calling this C code. I think this is quite likely. If my C were good, I would write nginx extensions in C and have no need to study nginx-clojure! (If you understand the calling logic of this code, please give me some advice. Thank you.)
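Points 4 and 5 can be illustrated in plain Java. The sketch below says nothing about how nginx-clojure implements atomicAddInt. It first replays the "read, add, write" race by hand so that the lost update is deterministic, then shows a CAS retry loop on an AtomicInteger. casAdd returns the value before the addition, which is how the handler above treats the result of atomicAddInt. All names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicAddSketch {

    /** Replays the race by hand: both workers read the value before either one writes. */
    static int lostUpdate(int start) {
        int[] shared = { start };
        int readByA = shared[0];
        int readByB = shared[0]; // B reads before A has written
        shared[0] = readByA + 1; // A writes start + 1
        shared[0] = readByB + 1; // B overwrites it with the same value
        return shared[0];        // start + 1: one of the two increments is lost
    }

    /** CAS retry loop: adds delta and returns the value seen before the addition. */
    static int casAdd(AtomicInteger cell, int delta) {
        while (true) {
            int current = cell.get();
            if (cell.compareAndSet(current, current + delta)) {
                return current;
            }
            // Another thread changed the value in between: read again and retry.
        }
    }

    /** Starts `threads` threads that each add 1 `perThread` times and returns the final value. */
    static int hammer(int threads, int perThread) throws InterruptedException {
        AtomicInteger cell = new AtomicInteger();
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    casAdd(cell, 1);
                }
            });
            pool[i].start();
        }
        for (Thread t : pool) {
            t.join();
        }
        return cell.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("two increments with the race: " + lostUpdate(5));
        System.out.println("4 threads x 10000 CAS adds: " + hammer(4, 10_000));
    }
}
```

Starting from 5, the replayed race ends at 6 although two increments were attempted, while the CAS version always ends at exactly 40000. A lock around a plain addition would achieve the same accuracy, which is consistent with the first possibility mentioned in point 6.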
  • With the coding complete, add a location to nginx.conf that uses SharedMapSaveCounter as the content handler:
location /sharedmapbasedcounter {
    content_handler_type 'java';
    content_handler_name 'com.bolingcavalry.sharedmap.SharedMapSaveCounter';
}
  • Compile, build and deploy, then restart nginx
  • First access /sharedmapbasedcounter with the Safari browser. The first response is shown in the figure below. The total is 1:
  • After the page is refreshed, the UUID changes, proving that the request went to another worker, yet the total becomes 2. Shared memory is working: different processes are updating the same value:
  • Access the same address with the Chrome browser. As shown in the figure below, the UUID changes again, proving that the request was handled by the JVM of a third worker, and the access count is still correct:
  • The hands-on part is complete. The code above uses only two APIs to operate on shared memory, so what we have learned so far is limited. Next, a little extended learning

A little extension

  • As mentioned earlier, NginxSharedHashMap implements ConcurrentMap. In ordinary ConcurrentMap implementations, the commonly used put and get methods operate on the heap memory of the current process. If NginxSharedHashMap behaved the same way, wouldn't those methods have nothing to do with shared memory?
  • With this question in mind, look at the source code of NginxSharedHashMap, shown in the following figure. The answer is clear: the common get and put methods have their own implementations. The get and nputNumber calls in the red box are both native methods that operate on shared memory:
  • That completes our study of shared memory in nginx-clojure. For cross-process data synchronization under high concurrency, there is now a more lightweight option. As for whether to use it or redis, I trust you have reached your own conclusion
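The idea that a class can expose the familiar Map methods while keeping its data somewhere other than ordinary heap objects can be shown with a toy example. The sketch below is a hypothetical stand-in, not the nginx-clojure source: it stores int values in a direct (off-heap) ByteBuffer, where NginxSharedHashMap goes through native methods to reach shared memory. Keys are simply slot numbers to keep the code short.

```java
import java.nio.ByteBuffer;
import java.util.AbstractMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in: a Map whose values live in a direct buffer instead of heap objects.
class OffHeapIntSlots extends AbstractMap<Integer, Integer> {

    private final ByteBuffer slots;
    private final int capacity;

    OffHeapIntSlots(int capacity) {
        this.capacity = capacity;
        this.slots = ByteBuffer.allocateDirect(capacity * Integer.BYTES); // zero-filled
    }

    /** Writes the value into the off-heap slot and returns the previous content. */
    @Override
    public Integer put(Integer slot, Integer value) {
        int previous = slots.getInt(slot * Integer.BYTES);
        slots.putInt(slot * Integer.BYTES, value);
        return previous;
    }

    /** Reads the value straight from the off-heap slot. */
    @Override
    public Integer get(Object key) {
        int slot = (Integer) key;
        return slots.getInt(slot * Integer.BYTES);
    }

    /** Snapshot view of all slots, required by AbstractMap. */
    @Override
    public Set<Map.Entry<Integer, Integer>> entrySet() {
        Set<Map.Entry<Integer, Integer>> view = new LinkedHashSet<>();
        for (int i = 0; i < capacity; i++) {
            view.add(new SimpleImmutableEntry<Integer, Integer>(i, get(i)));
        }
        return view;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counters = new OffHeapIntSlots(4);
        counters.put(2, 7);
        System.out.println("slot 2 = " + counters.get(2));
    }
}
```

Callers see an ordinary Map, yet every put and get goes to the buffer. In the same way, code that uses NginxSharedHashMap through the ConcurrentMap interface still ends up reading and writing shared memory.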

Source download

  • The complete source code of the "Extending Nginx with Java" series can be downloaded from GitHub. The addresses are listed in the table below (https://github.com/zq2599/blog_demos):
Name | Link | Remarks
Project home | https://github.com/zq2599/blog_demos | The project's home page on GitHub
git repository address (https) | https://github.com/zq2599/blog_demos.git | Repository address of the project source code, https protocol
git repository address (ssh) | git@github.com:zq2599/blog_demos.git | Repository address of the project source code, ssh protocol
  • This git project contains multiple folders. The source code for this article is in the shared-map-demo subproject under the nginx-clojure-tutorials folder, as shown in the red box below:
  • This article modifies nginx.conf. The complete file is here: https://raw.githubusercontent.com/zq2599/blog_demos/master/nginx-clojure-tutorials/files/nginx.conf

Keywords: Java Nginx

Added by hoyo on Sun, 20 Feb 2022 20:55:44 +0200