Initial use of Mask R-CNN (running the demo) and problems encountered

Source code: https://github.com/matterport/Mask_RCNN

Main problems:

Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

and

tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node conv1/convolution}}]]
     [[{{node mrcnn_detection/map/TensorArrayUnstack/range}}]]

 

I. Environment Configuration

I originally intended to use tensorflow-gpu 1.13.1 + CUDA 10.1 + cuDNN 7.5 + Keras, which were the latest versions at the time. After setting this up, I found that TensorFlow 1.13.1 did not yet support CUDA 10.1, so I switched to CUDA 10.0, which still produced errors. Note that cuDNN has to be replaced to match as well; I changed to the cuDNN 7.5 build for CUDA 10.0.

The demo can be run directly in Jupyter Notebook, or exported as demo.py and executed from a terminal or an IDE. I am used to the terminal; the messages printed in the terminal and in Jupyter Notebook may differ slightly. Note that the exported demo.py needs one line of code commented out: get_ipython().run_line_magic('matplotlib', 'inline').

Some dependencies also need to be installed; they are listed in requirements.txt. Two other files need to be downloaded: the pre-trained weight file mask_rcnn_coco.h5 (https://github.com/matterport/Mask_RCNN/releases) and the pycocotools source (https://github.com/waleedka/coco). In the latter's PythonAPI directory, if you use Python 3, you need to change python to python3 in the Makefile (if python already defaults to Python 3, no change is needed); in short, pay attention to matching versions. Running make on the Makefile generates a _mask file in the pycocotools directory; this is the pycocotools._mask module the program needs. Then copy the pycocotools directory to samples/coco. If this module is reported missing when running the demo, the problem lies here: either the directory was not copied to the right place, or make was not run, or the Python version the file was built for does not match (more on this later).
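For the weight file, the demo itself contains logic along these lines to fetch it when missing. A minimal sketch following the repo's demo; the ROOT_DIR path is an assumption about where the script runs from:

import os
from mrcnn import utils

# project root, assuming this runs from samples/coco
ROOT_DIR = os.path.abspath("../../")
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")

# download the pre-trained COCO weights from the GitHub releases page if absent
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)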

 

II. Running the Demo

In the environment above, running the demo gave the following:

Using TensorFlow backend.

Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     1
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.7
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 1
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                93
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024    3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [123.7 116.8 103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           coco
NUM_CLASSES                    81
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
PRE_NMS_LIMIT                  6000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.7
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                1000
TOP_DOWN_PYRAMID_SIZE          256
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           200
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STEPS               50
WEIGHT_DECAY                   0.0001


WARNING:tensorflow:From /home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/hhm/desktop/Mask_RCNN-master/mrcnn/model.py:772: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-04-10 19:11:31.584481: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-10 19:11:31.609012: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2496000000 Hz
2019-04-10 19:11:31.609640: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x559fc617c450 executing computations on platform Host. Devices:
2019-04-10 19:11:31.609677: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-04-10 19:11:31.752051: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-10 19:11:31.753130: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x559fc61401b0 executing computations on platform CUDA. Devices:
2019-04-10 19:11:31.753182: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce GTX 1050, Compute Capability 6.1
2019-04-10 19:11:31.753590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.65GiB
2019-04-10 19:11:31.753632: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-10 19:11:31.782047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-10 19:11:31.782119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-04-10 19:11:31.782138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-04-10 19:11:31.799722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1461 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
Processing 1 images
image                    shape: (375, 500, 3)         min:    0.00000  max:  255.00000  uint8
molded_images            shape: (1, 1024, 1024, 3)    min: -123.70000  max:  151.10000  float64
image_metas              shape: (1, 93)               min:    0.00000  max: 1024.00000  float64
anchors                  shape: (1, 261888, 4)        min:   -0.35390  max:    1.29134  float32
2019-04-10 19:11:43.651131: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-04-10 19:11:49.326969: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-04-10 19:11:49.341018: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "demo.py", line 130, in <module>
    results = model.detect([image], verbose=1)
  File "/home/hhm/desktop/Mask_RCNN-master/mrcnn/model.py", line 2540, in detect
    self.keras_model.predict([molded_images, image_metas, anchors], verbose=0)
  File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/keras/engine/training.py", line 1169, in predict
    steps=steps)
  File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/keras/engine/training_arrays.py", line 294, in predict_loop
    batch_outs = f(ins_batch)
  File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node conv1/convolution}}]]
	 [[{{node mrcnn_detection/map/TensorArrayUnstack/range}}]]

So the run failed with exactly the two errors quoted at the top: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR, followed by the UnknownError: Failed to get convolution algorithm.

I searched for a long time. Deleting the hidden .nv cache directory in my home directory did nothing, and no version of cuDNN matched either; I tried every cuDNN build the official site lists for CUDA 10.0. I had not yet downgraded TensorFlow, because downgrading sounded like real trouble: the corresponding CUDA, cuDNN and so on would all have to be replaced, and I feared leftovers would keep interfering. After several days the problem remained unsolved (honestly, a waste of time), so I finally tried downgrading TensorFlow.

At this point I had two tensorflow-gpu installs: the first via pip, and a second via conda that I added after the problems above. The two coexisted as two tensorflow-gpu packages, while the companion packages such as tensorboard and tensorflow-base had been replaced by the conda install; yet inside conda's virtual environment, pip's copy was somehow the one in use. I uninstalled the pip TensorFlow and tried to use conda's, but found that did not work: importing it reported that there was no tensorflow module. I did not keep experimenting; I uninstalled everything, both the conda-installed packages and everything else TensorFlow-related.

Then I installed tensorflow 1.12.0. One thing to note:

conda install tensorflow-gpu

installs the latest version (1.13.1 at the time) by default. To get 1.12, pin the version:

conda install tensorflow-gpu=1.12
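After the install, a quick sanity check confirms the version and that the GPU is visible. A minimal sketch for TF 1.x; run it inside the conda environment:

import tensorflow as tf

print(tf.__version__)              # expect 1.12.0
print(tf.test.is_gpu_available())  # True if CUDA/cuDNN loaded correctly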

With 1.12.0 installed, I tried the demo again, only to find some of the dependencies from requirements.txt missing and Keras in need of reinstalling. I do not know why uninstalling TensorFlow took these packages with it, but I reinstalled them; installing through conda worked, while installing directly with pip did not. Running the demo then failed with a new complaint: the pycocotools._mask module was missing. I found this strange, since that module had not been touched. I ran make again and copied the result over, and it still failed.

Then I noticed that Python in the conda environment had changed to 3.6 (previously 3.7, matching tensorflow 1.13.1; tensorflow 1.12.0 automatically switched it to 3.6), so the _mask built earlier with Python 3.7 no longer matched. I changed python3 to python3.6 in the Makefile, but make then failed with missing-module errors, so I changed it back to python3. Outside conda my system has Python 2.7, 3.6 and 3.7, with python3 defaulting to 3.7, so running make with python3 still built for 3.7, newer than the 3.6 inside conda, and the problem remained. The solution was to activate the conda TensorFlow environment first, conda activate tensorflow_3.7 (tensorflow_3.7 is simply the name I gave the environment when I created it; by now it actually holds Python 3.6), and run make inside it. The _mask generated this time was built for Python 3.6, and I copied it to the samples/coco directory.
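A quick way to catch this kind of mismatch is to compare the built extension against the running Python. A minimal sketch; run it with the conda environment's interpreter, from samples/coco:

import sys
print(sys.version_info[:2])   # e.g. (3, 6) after the tensorflow 1.12.0 downgrade

# fails if pycocotools was not copied here, not made, or built for another version
from pycocotools import _mask
print(_mask.__file__)         # the cpython-36m tag in the .so name must match

With the module importable again, running the demo gave the following output: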

Using TensorFlow backend.

Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     1
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.7
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 1
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                93
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024    3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [123.7 116.8 103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           coco
NUM_CLASSES                    81
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
PRE_NMS_LIMIT                  6000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.7
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                1000
TOP_DOWN_PYRAMID_SIZE          256
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           200
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STEPS               50
WEIGHT_DECAY                   0.0001


WARNING:tensorflow:From /home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.6/site-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
2019-04-13 15:34:51.692344: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-13 15:34:52.387094: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-13 15:34:52.388045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.62GiB
2019-04-13 15:34:52.388099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-04-13 15:34:59.753714: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-13 15:34:59.753792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-04-13 15:34:59.753815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-04-13 15:34:59.754226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1389 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
Processing 1 images
image                    shape: (640, 480, 3)         min:    0.00000  max:  255.00000  uint8
molded_images            shape: (1, 1024, 1024, 3)    min: -123.70000  max:  151.10000  float64
image_metas              shape: (1, 93)               min:    0.00000  max: 1024.00000  float64
anchors                  shape: (1, 261888, 4)        min:   -0.35390  max:    1.29134  float32
2019-04-13 15:35:11.103215: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:11.189938: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.12GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:12.401613: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:12.663684: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:12.697191: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:12.766519: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:12.859735: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:13.253020: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.13GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:13.627718: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.71GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:13.650576: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 845.38MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

I am not sure what these allocation messages above are about; it looks like GPU memory is tight. Still, the demo seems to have succeeded this time, without the earlier errors.
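Incidentally, a commonly suggested mitigation for both these allocator warnings and, on a small 2 GB card like this GTX 1050, for the earlier CUDNN_STATUS_INTERNAL_ERROR itself, is to let TensorFlow allocate GPU memory on demand instead of reserving it all up front. A minimal sketch for TF 1.x with the Keras backend (untested on my setup); it has to run before the model is built:

import tensorflow as tf
import keras.backend as K

# allocate GPU memory on demand instead of reserving it all at start-up
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))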

 

III. A Lingering Doubt

When conda installed tensorflow 1.12.0, it automatically downloaded a matching cudatoolkit and cudnn, versions 9.2 and 7.3.1 respectively. I never configured CUDA or cuDNN manually for tensorflow 1.12.0, and I did not even check whether 1.12.0 supports CUDA 10.0, yet the errors above were gone; it seems to be fine. Meanwhile the CUDA 10.1 and CUDA 10.0 I had installed manually were untouched, and nvcc --version still reports CUDA 10.0 in use, which puzzled me. The likely explanation is that conda's tensorflow-gpu bundles its own CUDA runtime and cuDNN inside the environment and loads those at run time, so the system-wide toolkit that nvcc reports never comes into play.
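One way to see which CUDA libraries a process actually loads is to inspect its memory maps. A minimal sketch, assuming Linux with /proc available (libcudnn may only appear after the first convolution runs):

import tensorflow as tf  # importing the GPU build pulls in libcudart and friends

# list the CUDA-related shared libraries mapped into this process
with open("/proc/self/maps") as f:
    libs = sorted({line.split()[-1] for line in f if "libcud" in line})
for path in libs:
    print(path)

On a setup like the one above, these paths would be expected to point into the conda environment rather than into a system-wide CUDA install.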


Added by siric on Sun, 19 May 2019 15:04:35 +0300