Reproducing PointRCNN: Ubuntu 16.04 + 3080 graphics card + pytorch 1.7.1+cu110

Reproduce pointrcnn

         When reproducing PointRCNN, the step most likely to fail is compiling the CUDA code. Most of the GitHub issues focus on gcc and pytorch version problems, but I use a 3080 graphics card, which only supports CUDA >= 11.0, so installing a lower version of pytorch, as those issues suggest, still produces errors. The following is my reproduction process.

1. Configuration environment

  • ubuntu: 16.04
  • gcc: 5.4.0
  • Graphics card driver: 470.63.01
  • cuda: 11.0
  • cudnn: 8.0.1
  • pytorch: 1.7.1+cu110

ubuntu and gcc

         I could not get gcc 5.4.0 installed on Ubuntu 18.04, so in my initial confusion I reinstalled the system. After getting everything to run, I think reinstalling was unnecessary. So if your system is not Ubuntu 16.04, you can probably skip reinstalling the system and changing the gcc version (I haven't tried this myself; if you succeed, please write about it in the comment area).

        Check the gcc version:

gcc -v

Graphics card driver

        The graphics card driver is backward compatible with older CUDA versions, so the newer the driver, the better.

cuda

        Because the compute capability of the 3080 is 8.6, and CUDA 10.2 only supports up to 7.5, you must install CUDA 11.0 or above.

cudnn

        cuDNN rarely goes wrong, as long as its version matches CUDA.

pytorch

        When installing pytorch, pay attention to its cudatoolkit version. The cudatoolkit version should be <= the installed CUDA version:

conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
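        The version rule above can be sketched in Python. This is only an illustration: `major_minor` and `toolkit_ok` are hypothetical helper names, not part of pytorch or conda, and the version strings are the ones used in this article.

```python
def major_minor(version: str) -> tuple:
    """Turn '11.0' or '11.0.221' into (11, 0); a bare '11' becomes (11, 0)."""
    parts = (version.split(".") + ["0"])[:2]
    return tuple(int(p) for p in parts)


def toolkit_ok(cudatoolkit: str, system_cuda: str) -> bool:
    """Rule from the text: the cudatoolkit version must be <= the installed CUDA version."""
    return major_minor(cudatoolkit) <= major_minor(system_cuda)


# The combination used in this article: cudatoolkit 11.0 on top of CUDA 11.0.
print(toolkit_ok("11.0", "11.0"))
```

        With pytorch installed, `torch.version.cuda` returns the toolkit version the build was compiled against, and that string can be passed as the first argument.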

2. Change code

        The changes are mainly to the cpp code of three modules. To save trouble, you can download the modified files from gitee and directly replace the corresponding files.

                PointRCNN/pointnet2_lib/pointnet2/src

                PointRCNN/lib/utils/iou3d/src

                PointRCNN/lib/utils/roipool3d/src

        1. Replace the following code in all cpp files

#define CHECK_CUDA(x) AT_CHECK(x.type().is_cuda(), #x, " must be a CUDAtensor ")
#define CHECK_CONTIGUOUS(x) AT_CHECK(x.is_contiguous(), #x, " must be contiguous ")
#define CHECK_INPUT(x) CHECK_CUDA(x);CHECK_CONTIGUOUS(x)

        Replace with

#define CHECK_CUDA(x) AT_ASSERTM(x.type().is_cuda(), #x " must be a CUDA tensor")
#define CHECK_CONTIGUOUS(x) AT_ASSERTM(x.is_contiguous(), #x " must be contiguous")
#define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)

        These macros check that each input is a CUDA tensor and is contiguous; after the change they only produce warnings. I also added them to the other cpp files, but that step is optional.

        2. Comment out the following lines in all cpp files

extern THCState *state;
and
cudaStream_t stream = THCState_getCurrentStream(state);

        3. In all cpp files, replace every stream argument passed to a kernel launcher function with

c10::cuda::getCurrentCUDAStream()
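        If you prefer not to edit every file by hand, the three edits above can be sketched as a small Python function that rewrites the text of one cpp file. This is a rough sketch under assumptions: `patch_cpp` is a hypothetical name, the macro lines must match the originals character for character, and `stream` is assumed to be the last argument of each launcher call. Review the result before compiling.

```python
import re

# Edit 1: old macro bodies and their replacements, copied from the text above.
MACRO_EDITS = [
    ('AT_CHECK(x.type().is_cuda(), #x, " must be a CUDAtensor ")',
     'AT_ASSERTM(x.type().is_cuda(), #x " must be a CUDA tensor")'),
    ('AT_CHECK(x.is_contiguous(), #x, " must be contiguous ")',
     'AT_ASSERTM(x.is_contiguous(), #x " must be contiguous")'),
]

NEW_STREAM = "c10::cuda::getCurrentCUDAStream()"


def patch_cpp(source: str) -> str:
    """Apply the three edits to the text of one cpp file and return the new text."""
    for old, new in MACRO_EDITS:
        source = source.replace(old, new)

    patched = []
    for line in source.splitlines():
        stripped = line.strip()
        # Edit 2: comment out the THCState declaration and the stream lookup.
        if (stripped == "extern THCState *state;"
                or stripped.startswith("cudaStream_t stream = THCState_getCurrentStream(")):
            patched.append("// " + line)
            continue
        # Edit 3: swap a trailing `stream` argument (assumed to come last in the call).
        line = re.sub(r"\bstream\b(?=\s*\))", NEW_STREAM, line)
        patched.append(line)
    return "\n".join(patched) + "\n"
```

        To use it, read each file under the three src folders, pass the text through `patch_cpp`, and write the result back, keeping a backup of the originals.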

4. Compilation

         1. Direct compilation

sh build_and_install.sh

        This reports an error:
        nvcc fatal   : Unsupported gpu architecture 'compute_86'

        ninja: build stopped: subcommand failed.

        2. Lower the target compute capability

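        This error means that nvcc does not recognise the 8.6 compute capability of the current GPU: CUDA 11.0 only knows architectures up to 8.0 (8.6 was added in CUDA 11.1). Because an 8.6 GPU can run code compiled for 8.0, telling pytorch to build for 8.0 solves it: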

export TORCH_CUDA_ARCH_LIST="8.0"
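        The choice of "8.0" can be written out as a small Python sketch. `pick_arch` is a hypothetical helper, not a pytorch function, and the toolkit limit of 8.0 for CUDA 11.0 is taken from NVIDIA's release notes rather than from this article. The sketch only handles the case where the GPU and the toolkit limit share the same major version.

```python
def pick_arch(device_capability, toolkit_max):
    """Return the value to put in TORCH_CUDA_ARCH_LIST as 'major.minor'.

    device_capability: (major, minor) of the GPU, e.g. (8, 6) for the 3080.
    toolkit_max: highest (major, minor) the installed nvcc can target.
    """
    dev, top = tuple(device_capability), tuple(toolkit_max)
    if dev <= top:
        # The toolkit knows this GPU, so build for it directly.
        return f"{dev[0]}.{dev[1]}"
    if dev[0] != top[0]:
        # Falling back across major versions is outside this sketch.
        raise ValueError("toolkit is too old for this GPU generation")
    # Same major version: build for the newest architecture nvcc supports.
    return f"{top[0]}.{top[1]}"


# The case in this article: 3080 (8.6) with CUDA 11.0 (up to 8.0).
print(pick_arch((8, 6), (8, 0)))  # prints 8.0
```

        With pytorch installed, `torch.cuda.get_device_capability()` returns the first argument as a (major, minor) tuple.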

        But then I ran into another problem:

        ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.22' not found

        The fix follows an online reference that covers the problem more comprehensively; the following is what I actually did.

        First of all, the error message means that the version GLIBCXX_3.4.22 cannot be found in the system libstdc++.so.6.

        3. Supply the missing GLIBCXX_3.4.22 version

        (1) List the GLIBCXX versions that are currently available

strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX

        (2) Check whether a newer libstdc++.so exists on the machine. The search suggested in the reference is too time-consuming, so I looked for libstdc++.so directly in the anaconda/lib folder.

        Then find the location of the libstdc++.so.6.0.26 file.

        Mine is /home/ld/anaconda3/lib/libstdc++.so.6.0.26

        (3) Copy the file into /usr/lib/x86_64-linux-gnu/

sudo cp /home/ld/anaconda3/lib/libstdc++.so.6.0.26 /usr/lib/x86_64-linux-gnu/

        (4) Then delete the original soft link

sudo rm /usr/lib/x86_64-linux-gnu/libstdc++.so.6

        (5) Create a new soft link (sudo ln -s <copied file location> <soft link name>)

sudo ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.26 /usr/lib/x86_64-linux-gnu/libstdc++.so.6
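        Before recompiling, it may be worth confirming that the library behind the new link really provides GLIBCXX_3.4.22. The following is a minimal Python sketch of the same check as the strings | grep command in step (1); `glibcxx_versions` is a hypothetical helper that simply scans the file bytes for version tags.

```python
import re
from pathlib import Path

# Matches tags such as GLIBCXX_3.4.22 inside the binary.
_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(lib_path):
    """Return the GLIBCXX versions found in a libstdc++ file, sorted numerically."""
    data = Path(lib_path).read_bytes()
    found = {match.group(1).decode() for match in _TAG.finditer(data)}
    return sorted(found, key=lambda v: tuple(int(p) for p in v.split(".")))


def has_glibcxx(lib_path, wanted="3.4.22"):
    """True if the library at lib_path contains the wanted GLIBCXX version."""
    return wanted in glibcxx_versions(lib_path)
```

        For example, `has_glibcxx("/usr/lib/x86_64-linux-gnu/libstdc++.so.6")` should return True once the link points at the copied file.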

        4. Recompile

sh build_and_install.sh

5. Prepare data and weights

        (1) Arrange the dataset as follows

PointRCNN
├── data
│   ├── KITTI
│   │   ├── ImageSets
│   │   ├── object
│   │   │   ├──training
│   │   │      ├──calib & velodyne & label_2 & image_2 & (optional: planes)
│   │   │   ├──testing
│   │   │      ├──calib & velodyne & image_2
├── lib
├── pointnet2_lib
├── tools

        Put the weight file PointRCNN.pth under the tools folder
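        A mistake in the folder layout usually only shows up once evaluation starts, so a quick check can save time. The sketch below lists which of the paths from the tree above are missing; `missing_paths` and `EXPECTED` are hypothetical names, and the optional planes folder is left out.

```python
from pathlib import Path

# Paths relative to the PointRCNN folder, taken from the layout above.
EXPECTED = [
    "data/KITTI/ImageSets",
    "data/KITTI/object/training/calib",
    "data/KITTI/object/training/velodyne",
    "data/KITTI/object/training/label_2",
    "data/KITTI/object/training/image_2",
    "data/KITTI/object/testing/calib",
    "data/KITTI/object/testing/velodyne",
    "data/KITTI/object/testing/image_2",
    "tools/PointRCNN.pth",
]


def missing_paths(root):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

        Calling `missing_paths("PointRCNN")` from the parent folder returns an empty list when everything is in place.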

        (2) Run the evaluation (from the tools folder)

python eval_rcnn.py --cfg_file cfgs/default.yaml --ckpt PointRCNN.pth --batch_size 1 --eval_mode rcnn --set RPN.LOC_XZ_FINE False

        (3) Run training

python train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 16 --train_mode rpn --epochs 200

Keywords: Python Pytorch Deep Learning Object Detection Autonomous vehicles

Added by Mike-2003 on Fri, 01 Oct 2021 23:11:52 +0300