- JMX_Exporter
- monitor
CDH and HDP have merged and will be closed source in the future, so the company plans to spend time developing a similar platform. I have done some work on the cluster monitoring part.
Monitoring
For a cluster management platform, the first problem to solve is monitoring. After all, we often open the platform only because an alert landed in our mailbox :-). So how do we obtain metrics from Hadoop and other clusters? This is where a simple piece of knowledge comes in: JMX.
Let's briefly introduce Java's JMX. JMX stands for Java Management Extensions, a standard Java framework for managing and monitoring applications. Its most common use is monitoring and managing the JVM: memory, CPU utilization, thread count, garbage collection, and so on. It can also be used to change log levels at runtime. For example, log4j supports changing the log level of a running service through JMX.
{% asset_img JMX architecture. The second in png is the last hot blood%}
In short, it is a Java facility for monitoring JVM metrics, and it can be browsed with tools such as JConsole and VisualVM. For details, please refer to here!
Getting the cluster's JMX information
Both Hadoop and HBase provide a convenient way to obtain a cluster's JMX information: append /jmx to the web UI address. For example, if the HDFS NameNode page is at localhost:50070, then localhost:50070/jmx returns the following information. (50070 is the Hadoop 2.x default NameNode web port; Hadoop 3.x uses 9870 unless configured otherwise.)
{% asset_img jmx monitoring page. The second in png is the last hot blood%}
The meaning of each metric in the figure above can be found in the metrics chapter of the official Hadoop documentation, which covers both NameNode and DataNode metrics. Similarly, appending /jmx to port 8088 returns the metrics for YARN.
Now we have the monitoring data. To plot it as line charts we need a time series database, because a metric value is only meaningful together with its timestamp. A common combination today is Prometheus (time series database) + Grafana (dashboards), so the remaining problem is how to get the Hadoop cluster's JMX information into Prometheus. A quick investigation shows that Prometheus provides a plug-in, jmx_exporter, that exposes a Java program's JMX information so that it can be stored in its time series database: Plug in address.
Download the plug-in version that matches your Java environment and place it wherever you like. With the plug-in in hand, the remaining question is how to use it in the cluster, so we start modifying the cluster configuration.
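As a rough sketch of querying these endpoints from a shell, the helper below only builds the URL. The name jmx_url is hypothetical, and the host, ports, and bean name are assumptions for a local setup. The qry parameter, which filters the output by MBean name, is the one accepted by Hadoop's JMX JSON servlet.

```shell
#!/bin/sh
# jmx_url is a hypothetical helper: it builds the /jmx URL for a daemon's web UI.
# Usage: jmx_url HOST PORT [MBEAN_QUERY]
jmx_url() {
  if [ -n "${3:-}" ]; then
    # Restrict the output to MBeans matching the query.
    printf 'http://%s:%s/jmx?qry=%s\n' "$1" "$2" "$3"
  else
    printf 'http://%s:%s/jmx\n' "$1" "$2"
  fi
}

# Example calls (they need a running cluster, so they are left commented out):
# curl -s "$(jmx_url localhost 50070)"    # all NameNode beans
# curl -s "$(jmx_url localhost 50070 'Hadoop:service=NameNode,name=JvmMetrics')"
# curl -s "$(jmx_url localhost 8088)"     # YARN ResourceManager beans
```

Filtering by bean name keeps the response small when only one group of metrics is of interest.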
Configuring jmx_exporter for the Hadoop cluster
Add the following code to hadoop-env.sh, changing the paths to match your own system. The code attaches the jar we downloaded as a Java agent, passes the configuration file to it, and specifies the port the service will listen on. For testing, the configuration file prometheus_config.yml at the corresponding location can simply be an empty file.

    # NameNode: attach the exporter on port 27001, but only if it is not already present.
    if ! grep -q jmx_prometheus_javaagent <<<"$HDFS_NAMENODE_OPTS"; then
      HDFS_NAMENODE_OPTS="$HDFS_NAMENODE_OPTS -javaagent:/usr/local/Cellar/hadoop/3.3.1/jmx_prometheus_javaagent-0.16.1.jar=27001:/usr/local/Cellar/hadoop/3.3.1/libexec/etc/hadoop/prometheus_config.yml"
    fi
    # DataNode: same agent and config file, on a different port (27002).
    if ! grep -q jmx_prometheus_javaagent <<<"$HDFS_DATANODE_OPTS"; then
      HDFS_DATANODE_OPTS="$HDFS_DATANODE_OPTS -javaagent:/usr/local/Cellar/hadoop/3.3.1/jmx_prometheus_javaagent-0.16.1.jar=27002:/usr/local/Cellar/hadoop/3.3.1/libexec/etc/hadoop/prometheus_config.yml"
    fi

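An empty configuration file is enough for a first test. A slightly more explicit starting point might look like the sketch below, which writes a minimal file using two options described in the jmx_exporter README: lowercaseOutputName and a catch-all rules pattern. CONFIG_PATH is a placeholder (an assumption); point it at the path given in the -javaagent option.

```shell
#!/bin/sh
# Assumption: CONFIG_PATH is a placeholder; use the path given in -javaagent.
CONFIG_PATH="${CONFIG_PATH:-./prometheus_config.yml}"

# Write a minimal exporter configuration.
cat > "$CONFIG_PATH" <<'EOF'
# Lowercase all metric names.
lowercaseOutputName: true
# Match every MBean attribute and keep the default naming scheme.
rules:
  - pattern: ".*"
EOF
```

Narrower patterns can be added later to rename or drop metrics once it is clear which ones are needed.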
Notes:
1. The options cannot simply be appended unconditionally as in the snippet below, because -javaagent must not end up in $HADOOP_*_OPTS more than once. Wrap each -javaagent assignment in its own if check to avoid this problem. For details, please refer to this Stack Overflow answer.

    # Writing it this way will report an error
    export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9102:/home/ec2-user/jmx_exporter/prometheus_config.yml"
    export HADOOP_DATANODE_OPTS="$HADOOP_DATANODE_OPTS -javaagent:/home/ec2-user/jmx_exporter/jmx_prometheus_javaagent-0.10.jar=9102:/home/ec2-user/jmx_exporter/prometheus_config.yml"

2. Each JMX exporter on the same machine must use its own port
For example, the NameNode's exporter listens on port 27001 and the DataNode's on port 27002. If both use the same port, the following error is reported when starting the HDFS service (./start-dfs.sh).

    Starting namenodes on [localhost]
    Starting datanodes
    localhost: /usr/local/Cellar/hadoop/3.3.1/libexec/bin/../libexec/hadoop-functions.sh: line 1821: 11125 Abort trap: 6 hadoop_start_daemon "${daemonname}" "$class" "${pidfile}" "$@" >> "${outfile}" 2>&1 < /dev/null
    localhost: ERROR: Cannot set priority of datanode process 11125
    localhost: ERROR: Cannot disconnect datanode process 11125

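Since a port clash only shows up when the daemons start, it can help to check the planned port list beforehand. The following is a minimal sketch. The function name duplicate_ports is hypothetical, and the port list is simply the set of exporter ports used in this post.

```shell
#!/bin/sh
# duplicate_ports is a hypothetical helper: it prints every port that appears
# more than once among its arguments, one per line.
duplicate_ports() {
  printf '%s\n' "$@" | sort -n | uniq -d
}

# Exporter ports planned for this host (HBase, NameNode, DataNode).
dups="$(duplicate_ports 27000 27001 27002)"
if [ -n "$dups" ]; then
  echo "port(s) assigned more than once: $dups" >&2
else
  echo "all exporter ports are unique"
fi
```

This only checks the planned list against itself. It does not detect a port that is already occupied by an unrelated process.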
Once this is configured, Hadoop's JMX information is exposed on the specified ports. We can check the collected data in a browser at the port configured earlier, localhost:27001.
{% asset_img 27001.png second is the last hot blood%}
Similarly, add the same kind of code to yarn-env.sh to collect YARN metrics. Again, use distinct port numbers and make sure -javaagent does not appear twice for the same daemon: put each -javaagent in its own if block.
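A possible shape for the yarn-env.sh additions is sketched below. The ports 27003 and 27004 and the variable names AGENT_JAR and AGENT_CONFIG are assumptions chosen for illustration, and the paths reuse the ones from the HDFS example. The sketch uses a case pattern match in place of the grep check shown earlier. It serves the same purpose and also works in shells without here-strings.

```shell
# Sketch for yarn-env.sh. Assumptions: ports 27003/27004 are free on this host,
# and AGENT_JAR / AGENT_CONFIG (hypothetical names) point at your own files.
AGENT_JAR="/usr/local/Cellar/hadoop/3.3.1/jmx_prometheus_javaagent-0.16.1.jar"
AGENT_CONFIG="/usr/local/Cellar/hadoop/3.3.1/libexec/etc/hadoop/prometheus_config.yml"

# ResourceManager: append the agent only if it is not already in the options.
case "${YARN_RESOURCEMANAGER_OPTS:-}" in
  *jmx_prometheus_javaagent*) ;;  # already present, nothing to do
  *) YARN_RESOURCEMANAGER_OPTS="${YARN_RESOURCEMANAGER_OPTS:-} -javaagent:${AGENT_JAR}=27003:${AGENT_CONFIG}" ;;
esac

# NodeManager: same agent and config, on its own port.
case "${YARN_NODEMANAGER_OPTS:-}" in
  *jmx_prometheus_javaagent*) ;;  # already present, nothing to do
  *) YARN_NODEMANAGER_OPTS="${YARN_NODEMANAGER_OPTS:-} -javaagent:${AGENT_JAR}=27004:${AGENT_CONFIG}" ;;
esac
```

Because each block checks the variable before appending, evaluating the file a second time leaves the options unchanged.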
HBase cluster configuration
Because I run HBase in standalone mode, I only configure HBASE_MASTER_OPTS and HBASE_JMX_BASE. In cluster mode, HBASE_REGIONSERVER_OPTS may also need to be configured. Replace the paths below with your own and append the code to the end of hbase-env.sh.

    # Common JMX settings: SSL and authentication disabled.
    export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
    # HBase Master: JMX remote on port 20101, exporter on port 27000.
    export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=20101 -javaagent:$HBASE_HOME/lib/jmx_prometheus_javaagent-0.16.1.jar=27000:$HBASE_HOME/conf/hbase_jmx_config.yaml"

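For cluster mode, the RegionServer line could follow the same pattern. This is an untested sketch. The JMX remote port 20102 and the exporter port 27005 are assumptions, chosen only so they do not clash with the Master's ports if both run on one host. The fallback value for HBASE_HOME is a placeholder for trying the snippet outside HBase.

```shell
# Sketch for hbase-env.sh in cluster mode.
# Assumption: HBASE_HOME is normally set by the HBase launch scripts;
# /opt/hbase is only a placeholder fallback.
: "${HBASE_HOME:=/opt/hbase}"

# RegionServer: JMX remote on port 20102, exporter on port 27005 (both assumed free).
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS:-} ${HBASE_JMX_BASE:-} -Dcom.sun.management.jmxremote.port=20102 -javaagent:$HBASE_HOME/lib/jmx_prometheus_javaagent-0.16.1.jar=27005:$HBASE_HOME/conf/hbase_jmx_config.yaml"
```

A matching scrape target for the chosen exporter port would then be needed on the Prometheus side.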
In this way, we can access the JMX information of each cluster through the specified port. The next step is to configure Prometheus to import the data into its time series database.
Configure Prometheus
Open the Prometheus configuration file, add the following scrape jobs for the Hadoop NameNode, the DataNode, and HBase under scrape_configs, and restart the Prometheus service.

      - job_name: "hbase"
        static_configs:
          - targets: ["localhost:27000"]
            labels:
              instance: localhost
      - job_name: "hadoop namenode"
        static_configs:
          - targets: ["localhost:27001"]
            labels:
              instance: localhost
      - job_name: "hadoop datanode"
        static_configs:
          - targets: ["localhost:27002"]
            labels:
              instance: localhost

Open the Prometheus page to view the targets for the jobs we configured. If the following entries appear and are green, everything is working. Opening the collected results shows that the metric names in Prometheus are transformed versions of the names on the cluster's 50070/jmx page. For example, the metric called hadoop_namenode_memnonheapmaxm in Prometheus is named MemNonHeapMaxM in 50070/jmx, and Prometheus prefixes it with the service name and so on. The matching rules should be in the plug-in configuration file prometheus_config.yml. For details, see Plug in address.
{% asset_img prometheus.png second is the last hot blood%}
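To make the renaming concrete, here is a simplified sketch of how such a name could be derived. It assumes the exporter's default naming scheme as described in its README (domain, then the first bean property value, then the attribute name) together with lowercaseOutputName: true. The function default_metric_name is hypothetical and ignores nested keys and the replacement of invalid characters.

```shell
#!/bin/sh
# default_metric_name is a hypothetical helper that mimics, in simplified form,
# the name built when a rule gives no explicit name:
#   domain_firstBeanPropertyValue_attribute, lowercased.
# Usage: default_metric_name DOMAIN FIRST_PROPERTY_VALUE ATTRIBUTE
default_metric_name() {
  printf '%s_%s_%s\n' "$1" "$2" "$3" | tr '[:upper:]' '[:lower:]'
}

# MBean Hadoop:service=NameNode,name=JvmMetrics, attribute MemNonHeapMaxM
default_metric_name Hadoop NameNode MemNonHeapMaxM
# prints: hadoop_namenode_memnonheapmaxm
```

Custom rules in prometheus_config.yml can override this scheme, so the names seen in a given setup may differ.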
Finally, select the metrics we need and display them in Grafana. The specific steps are not shown here. You can refer to [this tutorial](