Background
While supporting customers, we found that some of them ran into problems with the BE service when deploying and using StarRocks, and were often unsure how to troubleshoot them: which indicators to watch, and what values point to an unhealthy state. Covering deployment through day-to-day use, this article shares common problems with the StarRocks BE service and their solutions, in the hope of helping you better understand and use StarRocks.
Common problems and solutions
Deployment
The following items should be checked at deployment time. You can also run the base_check.sh script in the appendix directly.
- Check whether the CPU supports the AVX2 instruction set (the StarRocks vectorization engine currently depends on it)
# If this command prints any output, the CPU supports AVX2
cat /proc/cpuinfo |grep avx2
- Port check
If either of the following commands prints any output, a default port is already in use, and you need to reconfigure the port in fe.conf or be.conf
FE
ss -antpl|grep -E '8030|9010|9020|9030'
BE
ss -antpl|grep -E '9060|9050|8040|8060'
- Configure the necessary kernel parameters
Turn off swapping (set swappiness to 0)
echo 0 | sudo tee /proc/sys/vm/swappiness
Let the kernel always grant memory allocation requests (overcommit)
echo 1 | sudo tee /proc/sys/vm/overcommit_memory
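The port checks above match any line containing the digits, so a process listening on 18030 would also be reported for 8030. A minimal sketch of a stricter check follows; the helper name ports_in_use is hypothetical, and it assumes the input is `ss -ltn`-style text in which the local address ends in `:<port>` followed by whitespace.

```shell
#!/bin/bash
# Hypothetical helper (not part of StarRocks): reads `ss -ltn`-style output on
# stdin and prints each port from the argument list that appears as a local
# listening port.
ports_in_use() {
    local input port
    input=$(cat)
    for port in "$@"; do
        # Require ":<port>" followed by whitespace, so 18030 is not counted as 8030.
        if printf '%s\n' "$input" | grep -Eq ":${port}[[:space:]]"; then
            echo "$port"
        fi
    done
}

# Usage (BE default ports from the text):
#   ss -ltn | ports_in_use 9060 9050 8040 8060
```

An empty result means none of the listed ports is currently listening.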
BE service downtime
StarRocks 2.0 and later have fixed the OOM problem. Users on older versions are advised to upgrade to 2.0 or later.
Frequent crashes
If the BE service goes down and keeps going down after being restarted, add the following configuration to be.conf on the affected BE node to disable compaction, then restart the BE service and check whether it still goes down.
max_compaction_concurrency=0
If the BE service no longer goes down due to OOM after this change, the cause is confirmed to be compaction, probably because a large amount of data was imported earlier. You can try setting max_compaction_concurrency=1 so that the previously imported data is merged first (as shown in the figure, if the monitoring item starrocks_fe_tablet_max_compaction_score is below 50, compaction is currently under no pressure). After that, adjust the parameter as appropriate based on the memory usage observed with max_compaction_concurrency=1, and keep watching the monitoring item after each adjustment.
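The threshold check described above can be sketched as a small script. This is only an illustration: the helper names are hypothetical, and it assumes the metric is available as Prometheus-style text (one `name value` or `name{labels} value` sample per line), for example from the FE metrics page.

```shell
#!/bin/bash
# Hypothetical helper: reads Prometheus-style metrics text on stdin and prints
# the largest starrocks_fe_tablet_max_compaction_score value it finds.
# Exits non-zero if the metric does not appear at all.
max_compaction_score() {
    awk '
        /^starrocks_fe_tablet_max_compaction_score/ {
            v = $NF + 0                      # last field is the sample value
            if (!seen || v > max) { max = v; seen = 1 }
        }
        END { if (seen) print max; else exit 1 }
    '
}

# Succeeds when the largest score is below the threshold
# (default 50, the value given in the text). Returns 2 if the metric is missing.
compaction_is_calm() {
    local threshold=${1:-50} score
    score=$(max_compaction_score) || return 2
    awk -v s="$score" -v t="$threshold" 'BEGIN { exit !(s < t) }'
}

# Usage (assumption: the FE serves metrics at /metrics on its HTTP port):
#   curl -s "http://<fe_host>:8030/metrics" | compaction_is_calm 50 && echo "no compaction pressure"
```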
Query or import error: there is no ScanNode backend
Run show backends to check the BE status
1) If the BE is reported as alive, the BE process may be hung. Try restarting the BE service to recover
2) If Alive is false, run dmesg -T to determine whether the BE service went down due to OOM (the log looks like this: Out of memory: Kill process XXX (starrocks_be))
a) If it is OOM
- First check whether /proc/sys/vm/overcommit_memory is set to 1. If not, set it to 1
- Set mem_limit=XX% in be.conf to limit the maximum proportion of memory the BE can use
- Check the current parallelism (show variables like '%parallel_fragment_exec_instance_num') and the per-SQL memory limit (show variables like '%exec_mem_limit'). Memory usage on a single BE is also bounded by parallelism * exec_mem_limit, so adjust these two variables according to the memory of the BE node
b) If it is not OOM, check the be.out log, then post on the official StarRocks forum or open an issue at https://github.com/StarRocks/starrocks/issues
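The OOM check and the parallelism * exec_mem_limit bound can be sketched as follows. Both helper names are hypothetical. The pattern accepts both "Kill process" and "Killed process", since the kernel wording differs between versions, and the arithmetic assumes exec_mem_limit is expressed in bytes.

```shell
#!/bin/bash
# Hypothetical helper: reads `dmesg -T`-style text on stdin and succeeds if the
# kernel OOM killer reported killing a starrocks_be process.
be_killed_by_oom() {
    grep -Eiq 'out of memory: kill(ed)? process [0-9]+ \(starrocks_be\)'
}

# Upper bound described in the text: parallelism * exec_mem_limit.
# Both arguments are plain integers; the result is in the same unit as exec_mem_limit.
query_mem_bound() {
    local parallelism=$1 exec_mem_limit=$2
    echo $(( parallelism * exec_mem_limit ))
}

# Usage:
#   dmesg -T | be_killed_by_oom && echo "BE was killed by the OOM killer"
#   query_mem_bound 8 2147483648   # 8 instances with a 2 GiB limit each
```

Comparing the computed bound with the physical memory of the BE node gives a rough idea of whether the two variables are set too high.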
Appendix
#!/bin/bash

function cpu_check(){
    echo ""
    echo "############################ CPU inspect #############################"
    # grep -q prints nothing and exits 0 only if avx2 is among the CPU flags
    if ! grep -q avx2 /proc/cpuinfo;then
        echo -e "\033[31mcpu not support vector\033[0m"
    else
        echo -e "\033[32msuccess\033[0m"
    fi
}

function jdk_check(){
    echo ""
    echo "############################ JDK inspect #############################"
    # Quote the variable so the test also works when JAVA_HOME is unset or contains spaces
    if [ -z "$JAVA_HOME" ];then
        echo -e "\033[31mJAVA_HOME not set\033[0m"
    else
        echo -e "\033[32msuccess\033[0m"
    fi
}

function swap_check(){
    echo ""
    echo "############################ swap inspect #############################"
    swap_number=$(cat /proc/sys/vm/swappiness)
    if [ "$swap_number" -ne 0 ];then
        echo -e "\033[31mswap not close,please \"echo 0 | sudo tee /proc/sys/vm/swappiness\"\033[0m"
    else
        echo -e "\033[32msuccess\033[0m"
    fi
}

function kernel_check(){
    echo ""
    echo "############################ Parameter check #############################"
    oom_number=$(cat /proc/sys/vm/overcommit_memory)
    if [ "$oom_number" -ne 1 ];then
        echo -e "\033[31mplease \"echo 1 | sudo tee /proc/sys/vm/overcommit_memory\",details in https://www.kernel.org/doc/Documentation/vm/overcommit-accounting\033[0m"
    else
        echo -e "\033[32msuccess\033[0m"
    fi
}

function fe_port_check(){
    echo ""
    echo "############################ FE Port check #############################"
    # default port 8030,9010,9020,9030
    ports=$(ss -antpl|grep -E '8030|9010|9020|9030'|wc -l)
    # -gt, not -ge: a count of 0 means no default port is in use
    if [ "$ports" -gt 0 ];then
        echo -e "\033[31mFe ports already used,please use \"ss -antpl|grep -E '8030|9010|9020|9030'\" check and reconfig.\033[0m"
    else
        echo -e "\033[32msuccess\033[0m"
    fi
}

function be_port_check(){
    echo ""
    echo "############################ BE Port check #############################"
    # default port 9060,9050,8040,8060
    ports=$(ss -antpl|grep -E '9060|9050|8040|8060'|wc -l)
    # -gt, not -ge: a count of 0 means no default port is in use
    if [ "$ports" -gt 0 ];then
        echo -e "\033[31mBe ports already used,please use \"ss -antpl|grep -E '9060|9050|8040|8060'\" check and reconfig.\033[0m"
    else
        echo -e "\033[32msuccess\033[0m"
    fi
}

function check(){
    cpu_check
    jdk_check
    swap_check
    kernel_check
    fe_port_check
    be_port_check
}

check