Some optimizations on improving the accuracy of OCR recognition

Preface

1. This article builds on the previous one, Some optimizations on improving the accuracy of OCR recognition (2), and further optimizes image orientation recognition, raising its accuracy to 96%.
2. Before reading this article, it is recommended to read the previous one first.

1, Optimization ideas

1. In the previous article, we applied the PaddleOCR direction classifier directly to the whole image to identify its orientation. The results were poor and the speed was low: recognizing one image took 2 s on average.
2. To address these problems, we propose a new optimization scheme:

  • Use PaddleOCR text detection to get the coordinates of all text rectangles
  • Keep the rectangles whose aspect ratio (width / height) is between 5 and 25 or between 0.04 and 0.2
  • Pick one of the kept rectangles, either at random or by sorting them by aspect ratio and taking the one in the middle (for convenience, the code below simply takes the 0th rectangle)
  • Crop the chosen rectangle from the original image
  • Use the cropped image as the input of the PaddleOCR direction classifier
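The selection and cropping steps above can be sketched in plain Python/NumPy, independent of PaddleOCR. This is a minimal sketch, not the author's code: it assumes each box is four [x, y] points ordered top-left, top-right, bottom-right, bottom-left (the order the full code below also relies on) and that the image is a NumPy array of shape (H, W) or (H, W, C). The helper names `aspect_ratio`, `select_text_box` and `crop_box` are hypothetical. Unlike the full code, it takes the box with the median aspect ratio instead of the 0th one, and it crops the bounding rectangle of all four points, clipped to the image.

```python
import numpy as np


def aspect_ratio(box):
    """Width / height of a box given as four [x, y] points (TL, TR, BR, BL).

    Returns None for a zero-height box.
    """
    width = abs(box[1][0] - box[0][0])
    height = abs(box[3][1] - box[0][1])
    return width / height if height > 0 else None


def select_text_box(boxes, wide=(5, 25), tall=(0.04, 0.2)):
    """Keep boxes whose aspect ratio lies in `wide` or `tall`; return the median one.

    Returns None when no box passes the filter.
    """
    candidates = []
    for box in boxes:
        ratio = aspect_ratio(box)
        if ratio is None:
            continue
        if wide[0] <= ratio <= wide[1] or tall[0] <= ratio <= tall[1]:
            candidates.append((ratio, box))
    if not candidates:
        return None
    # Sort by aspect ratio and take the middle element (upper median for even counts).
    candidates.sort(key=lambda item: item[0])
    return candidates[len(candidates) // 2][1]


def crop_box(image, box):
    """Crop the axis-aligned bounding rectangle of `box`, clipped to the image."""
    pts = np.asarray(box, dtype=float)
    h, w = image.shape[:2]
    x0, y0 = np.floor(pts.min(axis=0)).astype(int)
    x1, y1 = np.ceil(pts.max(axis=0)).astype(int)
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, w), min(y1, h)
    return image[y0:y1, x0:x1]


boxes = [
    [[0, 0], [100, 0], [100, 10], [0, 10]],    # w/h = 10, kept
    [[0, 20], [30, 20], [30, 50], [0, 50]],    # w/h = 1, dropped
    [[0, 60], [120, 60], [120, 70], [0, 70]],  # w/h = 12, kept
]
best = select_text_box(boxes)  # boxes[2]: upper median of the kept ratios 10 and 12
image = np.zeros((100, 200, 3), dtype=np.uint8)
print(crop_box(image, best).shape)  # (10, 120, 3)
```

Taking the median rather than the first box is one way to avoid picking an outlier, but whether it improves accuracy over the 0th box has not been measured here.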

2, Complete code

import os

import cv2
import numpy as np
from paddleocr import PaddleOCR


class GetImageRotation(object):
    def __init__(self):
        # Instance used for text detection (text box coordinates)
        self.ocr = PaddleOCR(use_angle_cls=True)
        # Second instance used only for the direction classifier (see the Q&A below)
        self.ocr_angle = PaddleOCR(use_angle_cls=True)

    def get_real_rotation_when_null_rect(self, rect_list):
        """Fallback when no box passes the aspect-ratio filter.

        Ignores nearly square boxes (|w/h - 1| < 0.5) and returns 1 if the
        remaining boxes are wide on average (mean w/h >= 1.5), else 0.
        """
        w_div_h_sum = 0
        count = 0
        for rect in rect_list:
            p0, p1, _, p3 = rect
            width = abs(p1[0] - p0[0])
            height = abs(p3[1] - p0[1])
            if height == 0:
                # Degenerate box: skip it to avoid division by zero
                count += 1
                continue
            w_div_h = width / height
            if abs(w_div_h - 1.0) < 0.5:
                count += 1
                continue
            w_div_h_sum += w_div_h
        length = len(rect_list) - count
        if length == 0:
            length = 1
        if w_div_h_sum / length >= 1.5:
            return 1
        else:
            return 0

    def get_real_rotation_flag(self, rect_list):
        """Keep long, thin text boxes and decide whether the text is horizontal.

        A box is kept when its aspect ratio w/h is in [5, 25] (wide) or
        [0.04, 0.2] (tall). Returns (flag, kept_boxes): flag is 1 if the kept
        boxes are wide on average (mean w/h >= 1.5), else 0.
        Returns (0, []) when no box is kept.
        """
        ret_rect = []
        w_div_h_sum = 0
        for rect in rect_list:
            p0, p1, _, p3 = rect
            width = abs(p1[0] - p0[0])
            height = abs(p3[1] - p0[1])
            if height == 0:
                continue
            w_div_h = width / height
            if 5 <= w_div_h <= 25 or 0.04 <= w_div_h <= 0.2:
                ret_rect.append(rect)
                w_div_h_sum += w_div_h
        if not ret_rect:
            return 0, ret_rect
        if w_div_h_sum / len(ret_rect) >= 1.5:
            return 1, ret_rect
        else:
            return 0, ret_rect

    def crop_image(self, rect, image):
        """Crop the region between the top-left (p0) and bottom-right (p2) corners."""
        p0, _, p2, _ = rect
        crop = image[int(p0[1]):int(p2[1]), int(p0[0]):int(p2[0])]
        return crop

    def get_img_real_angle(self, img_path):
        ret_angle = 0
        image = cv2.imread(img_path)
        if image is None:
            # Unreadable file or not an image
            return 0

        # Detection only. This code expects a flat list of 4-point boxes;
        # some PaddleOCR releases wrap the result in an extra per-image list,
        # in which case use rect_list[0].
        rect_list = self.ocr.ocr(image, rec=False)
        if not rect_list or rect_list == [[]]:
            return 0

        real_angle_flag, rect_good = self.get_real_rotation_flag(rect_list)
        if rect_good:
            # For convenience take the 0th kept box (a random or median box also works)
            rect_crop = rect_good[0]
            image_crop = self.crop_image(rect_crop, image)
            angle_cls = self.ocr_angle.ocr(image_crop, det=False, rec=False, cls=True)
        else:
            # No box passed the filter: use the coarse shape test
            # and classify the whole image
            real_angle_flag = self.get_real_rotation_when_null_rect(rect_list)
            angle_cls = self.ocr_angle.ocr(image, det=False, rec=False, cls=True)
        print(angle_cls)
        print('real_angle_flag:  {}'.format(real_angle_flag))

        # angle_cls[0][0] is the classifier label ('0' or '180')
        if angle_cls[0][0] == '0':
            ret_angle = 0 if real_angle_flag else 270
        elif angle_cls[0][0] == '180':
            ret_angle = 180 if real_angle_flag else 90
        return ret_angle


def get_files_path_2(file_dir):
    '''Return the paths of all files under file_dir (searched recursively).'''
    files_path = []
    for root, dirs, files in os.walk(file_dir):
        for file in files:
            path = os.path.join(root, file)
            files_path.append(path)
    return files_path

Q: Why instantiate PaddleOCR twice?
A: When only one PaddleOCR instance is used, the following warning appears and the orientation cannot be detected:

[2021/07/03 12:51:32] root WARNING: Since the angle classifier is not initialized, the angle classifier will not be uesd during the forward process

This looks like an internal issue in PaddleOCR; it may be worth digging into when you have time.
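The last step of `get_img_real_angle` combines two signals: whether the kept text boxes are mostly wide (`real_angle_flag`) and the classifier label `'0'` or `'180'`. As a hedged sketch, that decision table can be isolated in a pure function (the name `combine_orientation` is hypothetical, not part of PaddleOCR or the original code), which makes it testable without loading any model:

```python
def combine_orientation(cls_label, is_horizontal):
    """Map (classifier label, box-shape flag) to a rotation angle in degrees.

    Mirrors the branches at the end of get_img_real_angle:
      '0'   + wide boxes -> 0      '0'   + tall boxes -> 270
      '180' + wide boxes -> 180    '180' + tall boxes -> 90
    Any other label falls back to 0, as in the original code.
    """
    table = {
        ('0', True): 0,
        ('0', False): 270,
        ('180', True): 180,
        ('180', False): 90,
    }
    # bool() lets callers pass the 0/1 flag returned by get_real_rotation_flag
    return table.get((cls_label, bool(is_horizontal)), 0)
```

With such a helper, the final branches could be written as `ret_angle = combine_orientation(angle_cls[0][0], real_angle_flag)`.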

Test

import os
from time import time

import numpy as np

get_image_rotation = GetImageRotation()
image_path = get_files_path_2('/Users/zhangzc/Desktop/workplace/ocrtest/test')
count = 0  # number of images whose predicted angle is not 0
time_list = []
for path in image_path:
    # Skip macOS metadata files in any folder
    if os.path.basename(path) == '.DS_Store':
        continue
    t1 = time()
    angle = get_image_rotation.get_img_real_angle(path)
    t2 = time()
    print('----' * 10)
    print(angle)
    print('cost time: {} s'.format(t2 - t1))
    time_list.append(t2 - t1)
    print('----' * 10)
    # All test images are upright, so any non-zero angle is a misdetection
    if angle != 0:
        print('****' * 10)
        print(path)
        print('****' * 10)
        count += 1
print('misdetected: {} / {}'.format(count, len(time_list)))
print('average cost time: {} s'.format(np.mean(time_list)))

Test results: of 200 upright (0-degree) images, only 8 were misdetected, an accuracy of 96%.
Average time: 1.25 s per image.
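The reported accuracy is simple arithmetic on the counts: (200 - 8) / 200 = 0.96. The test loop above collects per-image angles and times; as a hedged sketch, a helper like the hypothetical `summarize` below could compute both reported metrics from them. It assumes, as in this test, that every image is upright, so any non-zero predicted angle counts as an error. The inputs in the example are illustrative values chosen to reproduce the reported totals, not the author's measurements.

```python
def summarize(angles, times, expected_angle=0):
    """Return (accuracy, mean_time) for a batch of predictions.

    angles: predicted rotation per image; times: seconds per image.
    An image counts as correct when its angle equals expected_angle.
    """
    if not angles or len(angles) != len(times):
        raise ValueError("angles and times must be non-empty and of equal length")
    correct = sum(1 for a in angles if a == expected_angle)
    return correct / len(angles), sum(times) / len(times)


# Illustrative input: 200 upright images, 8 of them predicted as rotated
angles = [0] * 192 + [90] * 8
times = [1.25] * 200
accuracy, mean_time = summarize(angles, times)
print(accuracy, mean_time)  # 0.96 1.25
```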

Summary

1. Accuracy rose from 60% in the previous version to 96%.
2. The average time per image dropped from 2 s to 1.25 s.
3. So far, the method has only been tested on 0-degree (upright) images; interested readers can test it on rotated images themselves.
4. If you have a better optimization scheme, feel free to send me a private message at any time. Thank you very much!

Related articles:

Some optimizations on improving the accuracy of OCR recognition (1)
Some optimizations on improving the accuracy of OCR recognition (2)

Keywords: NLP OCR

Added by vronsky on Sat, 22 Jan 2022 10:35:05 +0200