Handling common StarRocks BE service problems: this one article is enough

Background

While supporting customers, we found that some of them run into problems with the BE service when deploying and using StarRocks, and are often unsure how to troubleshoot them: which metrics to watch, what values indicate an unhealthy state, and so on. Covering everything from deployment to daily use, this article shares common StarRocks BE service problems and their solutions, hoping to help you better understand and use StarRocks.

Common problems and solutions

Deployment

The following items should be checked at deployment time. You can also directly run the base_check.sh script in the appendix.
  1. Check whether the CPU supports the AVX2 instruction set (the StarRocks vectorized engine currently depends on it)
# If this command prints any output, the CPU supports AVX2
cat /proc/cpuinfo |grep avx2
  2. Port check
If either of the following commands prints any output, a default port is already in use and you need to reconfigure that port in fe.conf or be.conf
FE
ss -antpl|grep -E '8030|9010|9020|9030'
BE
ss -antpl|grep -E '9060|9050|8040|8060'
  3. Configure the necessary kernel parameters
Keep the kernel from swapping (set swappiness to 0)
echo 0 | sudo tee /proc/sys/vm/swappiness
Allow the kernel to overcommit memory (allocation requests are always granted)
echo 1 | sudo tee /proc/sys/vm/overcommit_memory
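One caveat with the grep-based port check above: the pattern '8030|...' also matches longer numbers such as 18030, or a PID that happens to contain the same digits. Below is a minimal sketch of a stricter check that compares only the local port column. The helper name ports_in_use is hypothetical (it is not part of StarRocks), and it assumes the column layout of `ss -ltn`, where the fourth column is "Local Address:Port".

```shell
#!/bin/bash
# ports_in_use (hypothetical helper): read `ss -ltn` style output on stdin
# and print each port from the argument list that is a listening local port.
ports_in_use() {
    # Pad with spaces so that " 8030 " cannot match inside " 18030 ".
    local wanted=" $* "
    awk -v wanted="$wanted" '
        {
            # Column 4 is "Local Address:Port"; the port is the text after
            # the last colon, which covers 0.0.0.0:8030 and [::]:8030 alike.
            n = split($4, parts, ":")
            port = parts[n]
            # The header line yields a non-numeric value and is skipped here.
            if (port ~ /^[0-9]+$/ && index(wanted, " " port " ") && !(port in seen)) {
                seen[port] = 1
                print port
            }
        }'
}
```

Usage: `ss -ltn | ports_in_use 9060 9050 8040 8060` prints only the BE default ports that are really occupied, and prints nothing when all of them are free.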
 

BE service downtime

StarRocks 2.0 and later versions have fixed the OOM problem. Users of older versions are advised to upgrade to version 2.0 or later.

BE goes down frequently

If the BE service goes down and keeps going down after being restarted, add the following configuration to be.conf on the affected BE to disable compaction first, then restart the BE service and see whether it still goes down.
max_compaction_concurrency=0
If the BE service no longer goes down due to OOM after the change above, the cause is confirmed to be compaction, probably triggered by a large amount of data loaded earlier. You can try setting max_compaction_concurrency=1 so that the previously loaded data gets merged first (as shown in the figure, if the monitoring item starrocks_fe_tablet_max_compaction_score is below 50, compaction is currently under no pressure). After that, tune the parameter according to the memory usage observed with max_compaction_concurrency=1, and keep watching memory after each adjustment.
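The "below 50" rule of thumb can also be checked from the command line instead of a dashboard. The following is only a sketch under assumptions: the function names are hypothetical, the input is assumed to be Prometheus-style text metrics, and the metric may or may not carry a {label} suffix in your version (the code accepts both).

```shell
#!/bin/bash
# max_compaction_score (hypothetical helper): read Prometheus-style metrics
# text on stdin and print the largest value reported for
# starrocks_fe_tablet_max_compaction_score. Exits 1 if the metric is absent.
max_compaction_score() {
    awk '
        # Comment lines start with "#" and never match. The metric name may
        # be followed by a {label="..."} block or directly by the value.
        /^starrocks_fe_tablet_max_compaction_score([{ ]|$)/ {
            v = $NF + 0
            if (!found || v > max) { max = v; found = 1 }
        }
        END { if (found) print max; else exit 1 }'
}

# compaction_under_pressure (hypothetical helper): exit 0 when the score is
# at or above the threshold (default 50), exit 1 when it is below, and
# exit 2 when the metric was not found in the input.
compaction_under_pressure() {
    local threshold="${1:-50}" score
    score=$(max_compaction_score) || return 2
    awk -v s="$score" -v t="$threshold" 'BEGIN { exit !(s >= t) }'
}
```

Usage, assuming the FE serves its metrics at /metrics on the HTTP port (8030 by default in the port list above; verify this for your deployment): `curl -s http://<fe_host>:8030/metrics | compaction_under_pressure && echo "compaction backlog is high"`.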

Query or load reports an error: there is no ScanNode backend

Run show backends to view the BE status.
1) If Alive is true, the BE process may be hung. Try restarting the BE service to recover.
2) If Alive is false, check dmesg -T to determine whether the BE service was killed due to OOM (the log looks like this: Out of memory: Kill process XXX (starrocks_be))
a) If it was OOM
  • First check whether /proc/sys/vm/overcommit_memory is set to 1. If not, set it to 1 first
  • Set mem_limit=XX% in be.conf to limit the maximum proportion of memory that BE may use
  • Check the current parallelism (show variables like '%parallel_fragment_exec_instance_num') and the per-SQL memory limit (show variables like '%exec_mem_limit'). The memory used on a single BE is also bounded by parallelism * exec_mem_limit; both variables can be adjusted according to the memory of the BE node
b) If it was not OOM, check the be.out log, then post on the official StarRocks database forum or open an issue at https://github.com/StarRocks/starrocks/issues
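Two of the steps above are easy to script. The sketch below first looks for an OOM-killer line that names starrocks_be in kernel log text, then works out the parallelism * exec_mem_limit bound. Both function names are hypothetical, and the arithmetic assumes exec_mem_limit is reported in bytes; check the unit in your own show variables output.

```shell
#!/bin/bash
# be_oom_killed (hypothetical helper): read kernel log text on stdin (for
# example from `dmesg -T`) and exit 0 if the OOM killer hit starrocks_be.
# Older kernels log "Kill process", newer ones "Killed process".
be_oom_killed() {
    grep -qiE 'out of memory: kill(ed)? process [0-9]+ \(starrocks_be\)'
}

# query_mem_bound (hypothetical helper): print parallelism * exec_mem_limit.
#   $1 = parallel_fragment_exec_instance_num
#   $2 = exec_mem_limit, assumed to be in bytes
query_mem_bound() {
    local parallelism="$1" exec_mem_limit_bytes="$2"
    echo $(( parallelism * exec_mem_limit_bytes ))
}
```

Usage: `dmesg -T | be_oom_killed && echo "BE was killed by the OOM killer"`. For the bound, `query_mem_bound 8 2147483648` prints 17179869184, that is 16 GiB, which can then be compared with the physical memory of the BE node and its mem_limit.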
 

Appendix

#!/bin/bash


function cpu_check(){
    echo ""
    echo "############################ CPU inspect #############################"
    # grep -q prints nothing and exits 0 only if the avx2 flag is present
    if ! grep -q avx2 /proc/cpuinfo;then
        echo -e "\033[31mCPU does not support AVX2 (required by the vectorized engine)\033[0m"
    else
        echo -e "\033[32msuccess\033[0m"
    fi
}

function jdk_check(){
    echo ""
    echo "############################ JDK inspect #############################"
    # Quote the variable so the test stays valid when it is empty or contains spaces
    if [ -z "$JAVA_HOME" ];then
        echo -e "\033[31mJAVA_HOME not set\033[0m"
    else
        echo -e "\033[32msuccess\033[0m"
    fi
}

function swap_check(){
    echo ""
    echo "############################ swap inspect #############################"
    swap_number=$(cat /proc/sys/vm/swappiness)
    if [ "$swap_number" -ne 0 ];then
        echo -e "\033[31mswappiness is not 0, please run \"echo 0 | sudo tee /proc/sys/vm/swappiness\"\033[0m"
    else
        echo -e "\033[32msuccess\033[0m"
    fi
}

function kernel_check(){
    echo ""
    echo "############################ Parameter check #############################"
    oom_number=$(cat /proc/sys/vm/overcommit_memory)
    if [ "$oom_number" -ne 1 ];then
        echo -e "\033[31mplease run \"echo 1 | sudo tee /proc/sys/vm/overcommit_memory\", details in https://www.kernel.org/doc/Documentation/vm/overcommit-accounting\033[0m"
    else
        echo -e "\033[32msuccess\033[0m"
    fi
}

function fe_port_check(){
    echo ""
    echo "############################ FE Port check #############################"
    # default port 8030,9010,9020,9030
    ports=$(ss -antpl|grep -E '8030|9010|9020|9030'|wc -l)
    # -gt, not -ge: a count of 0 means no default port is in use
    if [ "$ports" -gt 0 ];then
        echo -e "\033[31mFE ports already in use, please check with \"ss -antpl|grep -E '8030|9010|9020|9030'\" and reconfigure.\033[0m"
    else
        echo -e "\033[32msuccess\033[0m"
    fi
}

function be_port_check(){
    echo ""
    echo "############################ BE Port check #############################"
    # default port 9060,9050,8040,8060
    ports=$(ss -antpl|grep -E '9060|9050|8040|8060'|wc -l)
    # -gt, not -ge: a count of 0 means no default port is in use
    if [ "$ports" -gt 0 ];then
        echo -e "\033[31mBE ports already in use, please check with \"ss -antpl|grep -E '9060|9050|8040|8060'\" and reconfigure.\033[0m"
    else
        echo -e "\033[32msuccess\033[0m"
    fi
}

function check(){
    cpu_check
    jdk_check
    swap_check
    kernel_check
    fe_port_check
    be_port_check
}

check

 

Keywords: OLAP

Added by deano2010 on Mon, 07 Mar 2022 09:55:53 +0200