Source code: https://github.com/matterport/Mask_RCNN
Main problems: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR and
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv1/convolution}}]]
[[{{node mrcnn_detection/map/TensorArrayUnstack/range}}]]
I. Environment Configuration
I originally intended to use tensorflow-gpu 1.13.1 + CUDA 10.1 + cuDNN 7.5 + Keras, which were the latest versions at the time. After setting this up, I found that TensorFlow 1.13.1 does not yet support CUDA 10.1, so I switched to CUDA 10.0, which still produced errors. Note that cuDNN also has to be replaced to match: I changed to the cuDNN 7.5 build for CUDA 10.0.

The demo can be run directly in Jupyter Notebook, or exported as demo.py and run from a terminal or another IDE. I am used to the terminal; the terminal and Jupyter Notebook output may differ slightly, but note that the exported demo.py needs one line commented out: get_ipython().run_line_magic('matplotlib', 'inline').

Some dependencies listed in requirements.txt need to be installed, and two other files must be downloaded: the pre-trained weights mask_rcnn_coco.h5 (https://github.com/matterport/Mask_RCNN/releases) and pycocotools (https://github.com/waleedka/coco). In the latter's PythonAPI directory, if you use Python 3 you need to change python to python3 in the Makefile (if python already points to Python 3 by default, no change is needed); in short, make sure the versions correspond. Running make on the Makefile generates a _mask file in the pycocotools directory. This is the pycocotools._mask module the program needs; copy the pycocotools directory to samples/coco. If this module is reported missing when running the demo, the problem lies here: either the directory was not copied to the right location, or make was not run, or the Python version used for make does not match the one running the demo. Version mismatches in particular need attention (more on this later).
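A quick way to catch the module problem early (my own check, not from the repo) is to confirm, with the same interpreter that will run the demo, that the compiled extension actually imports:

import sys
print(sys.version)        # must be the same Python version that ran make
import pycocotools._mask  # ImportError here means the build or copy step failed
print("pycocotools._mask loaded OK")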
II. Running the Demo
With the environment configured as above, running the demo produced the following:
Using TensorFlow backend. Configurations: BACKBONE resnet101 BACKBONE_STRIDES [4, 8, 16, 32, 64] BATCH_SIZE 1 BBOX_STD_DEV [0.1 0.1 0.2 0.2] COMPUTE_BACKBONE_SHAPE None DETECTION_MAX_INSTANCES 100 DETECTION_MIN_CONFIDENCE 0.7 DETECTION_NMS_THRESHOLD 0.3 FPN_CLASSIF_FC_LAYERS_SIZE 1024 GPU_COUNT 1 GRADIENT_CLIP_NORM 5.0 IMAGES_PER_GPU 1 IMAGE_CHANNEL_COUNT 3 IMAGE_MAX_DIM 1024 IMAGE_META_SIZE 93 IMAGE_MIN_DIM 800 IMAGE_MIN_SCALE 0 IMAGE_RESIZE_MODE square IMAGE_SHAPE [1024 1024 3] LEARNING_MOMENTUM 0.9 LEARNING_RATE 0.001 LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0} MASK_POOL_SIZE 14 MASK_SHAPE [28, 28] MAX_GT_INSTANCES 100 MEAN_PIXEL [123.7 116.8 103.9] MINI_MASK_SHAPE (56, 56) NAME coco NUM_CLASSES 81 POOL_SIZE 7 POST_NMS_ROIS_INFERENCE 1000 POST_NMS_ROIS_TRAINING 2000 PRE_NMS_LIMIT 6000 ROI_POSITIVE_RATIO 0.33 RPN_ANCHOR_RATIOS [0.5, 1, 2] RPN_ANCHOR_SCALES (32, 64, 128, 256, 512) RPN_ANCHOR_STRIDE 1 RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2] RPN_NMS_THRESHOLD 0.7 RPN_TRAIN_ANCHORS_PER_IMAGE 256 STEPS_PER_EPOCH 1000 TOP_DOWN_PYRAMID_SIZE 256 TRAIN_BN False TRAIN_ROIS_PER_IMAGE 200 USE_MINI_MASK True USE_RPN_ROIS True VALIDATION_STEPS 50 WEIGHT_DECAY 0.0001 WARNING:tensorflow:From /home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer. WARNING:tensorflow:From /home/hhm/desktop/Mask_RCNN-master/mrcnn/model.py:772: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. 2019-04-10 19:11:31.584481: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-04-10 19:11:31.609012: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2496000000 Hz 2019-04-10 19:11:31.609640: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x559fc617c450 executing computations on platform Host. Devices: 2019-04-10 19:11:31.609677: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2019-04-10 19:11:31.752051: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2019-04-10 19:11:31.753130: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x559fc61401b0 executing computations on platform CUDA. 
Devices: 2019-04-10 19:11:31.753182: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1050, Compute Capability 6.1 2019-04-10 19:11:31.753590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493 pciBusID: 0000:01:00.0 totalMemory: 1.95GiB freeMemory: 1.65GiB 2019-04-10 19:11:31.753632: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-04-10 19:11:31.782047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-04-10 19:11:31.782119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 2019-04-10 19:11:31.782138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 2019-04-10 19:11:31.799722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1461 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1) Processing 1 images image shape: (375, 500, 3) min: 0.00000 max: 255.00000 uint8 molded_images shape: (1, 1024, 1024, 3) min: -123.70000 max: 151.10000 float64 image_metas shape: (1, 93) min: 0.00000 max: 1024.00000 float64 anchors shape: (1, 261888, 4) min: -0.35390 max: 1.29134 float32 2019-04-10 19:11:43.651131: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-04-10 19:11:49.326969: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR 2019-04-10 19:11:49.341018: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR Traceback (most recent call last): File "demo.py", line 130, in <module> results = model.detect([image], verbose=1) File "/home/hhm/desktop/Mask_RCNN-master/mrcnn/model.py", line 2540, in detect self.keras_model.predict([molded_images, image_metas, anchors], verbose=0) File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/keras/engine/training.py", line 1169, in predict steps=steps) File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/keras/engine/training_arrays.py", line 294, in predict_loop batch_outs = f(ins_batch) File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__ return self._call(inputs) File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call fetched = self._callable_fn(*array_vals) File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1439, in __call__ run_metadata_ptr) File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__ c_api.TF_GetCode(self.status.status)) tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[{{node conv1/convolution}}]] [[{{node mrcnn_detection/map/TensorArrayUnstack/range}}]]
The run ends with the same two errors: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR, and tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv1/convolution}}]]
[[{{node mrcnn_detection/map/TensorArrayUnstack/range}}]]
I searched for a solution for a long time. Deleting the hidden ~/.nv cache directory in the home directory was useless, and no cuDNN version made a difference: I tried every cuDNN release listed for CUDA 10.0 on the official site. I had not yet downgraded TensorFlow, because downgrading seemed like real trouble: the corresponding CUDA, cuDNN, and so on would all have to be replaced, and I was afraid of leftover files from the old versions interfering. After several days the problem remained unsolved (honestly, a waste of time), so I finally tried downgrading TensorFlow.

At this point I had two tensorflow-gpu installations: one from pip and one from conda (pip came first; after the problems above I installed again with conda, and the two coexisted). There were two tensorflow-gpu packages, although tensorboard, tensorflow-base, and the rest had been replaced by the conda install; yet for some reason the pip one was the one being used inside conda's virtual environment. I uninstalled the pip-installed tensorflow and tried to use conda's, but that did not work: import failed with a message that there was no tensorflow module. Rather than keep experimenting, I uninstalled everything, including all the TensorFlow-related packages installed under conda.
Then I installed tensorflow 1.12.0. Note that
conda install tensorflow-gpu
installs the latest version (1.13.1) by default, so the version has to be pinned explicitly:
conda install tensorflow-gpu=1.12
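After installing, a quick check (my own, not part of the original steps) confirms that the pinned version took effect and that TensorFlow can see the GPU through the conda-provided CUDA/cuDNN:

import tensorflow as tf
print(tf.__version__)              # expect 1.12.0
print(tf.test.is_gpu_available())  # True if the GPU stack is working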
After installation I ran the demo again. Some dependencies from requirements.txt were missing and Keras had to be reinstalled; I don't know why uninstalling TensorFlow caused these packages to disappear, but I reinstalled them. They installed fine with conda, though not directly with pip. After that, running the demo reported the pycocotools._mask module missing again, which struck me as strange, since I had not replaced that module. I ran make again and copied the result, but the error persisted.

Then I found that the Python in the conda environment had changed to 3.6 (it was 3.7 before, matching tensorflow 1.13.1; installing tensorflow 1.12.0 automatically switched it to Python 3.6), while I had previously run make through python3.7. I changed python3 in the Makefile to python3.6, but make still reported missing modules, so I changed it back to python3. The catch is that outside conda the system has Python 2.7, 3.6, and 3.7, with python3 defaulting to 3.7, so running make there builds against Python 3.7, which is higher than the 3.6 in conda and still a problem. The fix was to activate the conda TensorFlow environment first, conda activate tensorflow_3.7 (tensorflow_3.7 is just the name I gave the environment when I created it earlier), and run make again. The _mask file generated this time matched Python 3.6. I copied it to samples/coco, and the demo results were as follows:
Using TensorFlow backend. Configurations: BACKBONE resnet101 BACKBONE_STRIDES [4, 8, 16, 32, 64] BATCH_SIZE 1 BBOX_STD_DEV [0.1 0.1 0.2 0.2] COMPUTE_BACKBONE_SHAPE None DETECTION_MAX_INSTANCES 100 DETECTION_MIN_CONFIDENCE 0.7 DETECTION_NMS_THRESHOLD 0.3 FPN_CLASSIF_FC_LAYERS_SIZE 1024 GPU_COUNT 1 GRADIENT_CLIP_NORM 5.0 IMAGES_PER_GPU 1 IMAGE_CHANNEL_COUNT 3 IMAGE_MAX_DIM 1024 IMAGE_META_SIZE 93 IMAGE_MIN_DIM 800 IMAGE_MIN_SCALE 0 IMAGE_RESIZE_MODE square IMAGE_SHAPE [1024 1024 3] LEARNING_MOMENTUM 0.9 LEARNING_RATE 0.001 LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0} MASK_POOL_SIZE 14 MASK_SHAPE [28, 28] MAX_GT_INSTANCES 100 MEAN_PIXEL [123.7 116.8 103.9] MINI_MASK_SHAPE (56, 56) NAME coco NUM_CLASSES 81 POOL_SIZE 7 POST_NMS_ROIS_INFERENCE 1000 POST_NMS_ROIS_TRAINING 2000 PRE_NMS_LIMIT 6000 ROI_POSITIVE_RATIO 0.33 RPN_ANCHOR_RATIOS [0.5, 1, 2] RPN_ANCHOR_SCALES (32, 64, 128, 256, 512) RPN_ANCHOR_STRIDE 1 RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2] RPN_NMS_THRESHOLD 0.7 RPN_TRAIN_ANCHORS_PER_IMAGE 256 STEPS_PER_EPOCH 1000 TOP_DOWN_PYRAMID_SIZE 256 TRAIN_BN False TRAIN_ROIS_PER_IMAGE 200 USE_MINI_MASK True USE_RPN_ROIS True VALIDATION_STEPS 50 WEIGHT_DECAY 0.0001 WARNING:tensorflow:From /home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.6/site-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version. Instructions for updating: Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead. 2019-04-13 15:34:51.692344: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA 2019-04-13 15:34:52.387094: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2019-04-13 15:34:52.388045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493 pciBusID: 0000:01:00.0 totalMemory: 1.95GiB freeMemory: 1.62GiB 2019-04-13 15:34:52.388099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0 2019-04-13 15:34:59.753714: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-04-13 15:34:59.753792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 2019-04-13 15:34:59.753815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N 2019-04-13 15:34:59.754226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1389 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1) Processing 1 images image shape: (640, 480, 3) min: 0.00000 max: 255.00000 uint8 molded_images shape: (1, 1024, 1024, 3) min: -123.70000 max: 151.10000 float64 image_metas shape: (1, 93) min: 0.00000 max: 1024.00000 float64 anchors shape: (1, 261888, 4) min: -0.35390 max: 1.29134 float32 2019-04-13 15:35:11.103215: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 
2019-04-13 15:35:11.189938: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.12GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-04-13 15:35:12.401613: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-04-13 15:35:12.663684: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-04-13 15:35:12.697191: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-04-13 15:35:12.766519: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-04-13 15:35:12.859735: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-04-13 15:35:13.253020: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.13GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-04-13 15:35:13.627718: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.71GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2019-04-13 15:35:13.650576: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 845.38MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
I don't know exactly what these allocation messages mean; they appear to say that GPU memory is insufficient. Even so, the demo seems to have run successfully this time, without the earlier errors.
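The warnings come from TensorFlow's GPU memory allocator: the GTX 1050 has only about 2 GB, so requests larger than that fall back to slower paths without actually failing. Not something I did here, but a common TF 1.x mitigation for tight GPU memory (and one often suggested for the CUDNN_STATUS_INTERNAL_ERROR above) is to let the session allocate GPU memory on demand rather than reserving it all up front. A minimal sketch, assuming Keras is the backend in use as in demo.py:

import tensorflow as tf
import keras.backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True   # allocate GPU memory on demand
# or cap the fraction of GPU memory TensorFlow may use:
# config.gpu_options.per_process_gpu_memory_fraction = 0.8
K.set_session(tf.Session(config=config))
# build the MaskRCNN model and call model.detect(...) after this point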
III. A Lingering Doubt
When conda installed tensorflow 1.12.0, it automatically downloaded a matching CUDA and cuDNN: CUDA 9.2 and cuDNN 7.3.1, respectively. I did not manually configure CUDA or cuDNN for tensorflow 1.12.0, and I never checked whether tensorflow 1.12.0 supports CUDA 10.0, yet the errors above are gone and everything seems fine. Meanwhile, my manually installed CUDA 10.1 and CUDA 10.0 are untouched, and nvcc --version still reports CUDA 10.0 as the version in use, which leaves me with questions. Presumably the explanation is that conda ships its own cudatoolkit and cudnn packages inside the environment, which TensorFlow loads at runtime, while nvcc --version reports the system-wide CUDA toolkit installed separately.
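One way to settle this (my own check, Linux-only, not something I ran at the time) is to look at which CUDA libraries the process actually maps once a session starts; the paths show whether they come from the conda environment or from /usr/local/cuda:

import tensorflow as tf
sess = tf.Session()   # forces libcudart / libcudnn to be loaded
with open('/proc/self/maps') as f:
    paths = {line.split()[-1] for line in f
             if 'libcudart' in line or 'libcudnn' in line}
for p in sorted(paths):
    print(p)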