In previous articles, we used Python for OCR text recognition several times.

Today we go one step further and build a complete desktop OCR tool for recognizing text in images!


Recently, a need for recognizing text in images came up in the technical exchange group. It appears often in work and daily life, for example when extracting text from bills, comics, scanned documents, and photos.

I wrote a desktop OCR tool based on PyQt + labelme + PaddleOCR that automatically detects the text regions in an image and recognizes the text in each of them.

The recognition effect is shown in the figure below:

All boxed regions are detected automatically by the OCR algorithm, and the text for each box is listed on the right. Select a text record under "recognition result" on the right, then click "copy to clipboard" to copy its text.

Function list

  • Text area detection + text recognition
  • Text area visualization
  • Text content list
  • Image and folder loading
  • Image scroll wheel zoom view
  • Drawing area and editing area
  • Copy the selected text recognition result
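
As an illustration of the folder-loading feature, here is a minimal sketch of how image files in a directory might be collected. The function name list_images and the set of accepted extensions are assumptions made for this example, not the tool's actual code.

```python
from pathlib import Path

# Assumed set of extensions; the actual tool may accept others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(folder):
    """Return the image files directly inside folder, sorted by path."""
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

The suffix is lowercased before the comparison so that files such as photo.JPG are picked up as well.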

OCR part

Text detection and character recognition are implemented mainly with PaddleOCR.

Create a new virtual environment, or select an existing one, and install the required third-party libraries in it.

conda create -n ocr
conda activate ocr

Install the framework

If you do not have an NVIDIA GPU, or your GPU does not support CUDA, install the CPU version:

# CPU version
pip install paddlepaddle==2.1.0 -i 

If your machine has CUDA 9 or CUDA 10 with cuDNN 7.6+ installed, you can install the GPU version instead:

# GPU version
python3 -m pip install paddlepaddle-gpu==2.1.0 -i

Install PaddleOCR

To install paddleocr:

pip install "paddleocr>=2.0.1" # Version 2.0.1+ is recommended

Layout analysis additionally requires Layout Parser:

pip3 install -U

Test for successful installation

After installation, test on a sample image by passing --image_dir ./imgs/11.jpg. The command below runs the whole pipeline: Chinese and English text detection + direction classifier + recognition:

paddleocr --image_dir ./imgs/11.jpg --use_angle_cls true --use_gpu false

Output a list:

Calling from Python

from paddleocr import PaddleOCR, draw_ocr

# PaddleOCR supports multiple languages; switch between them with the lang parameter,
# for example 'ch', 'en', 'fr', 'german', 'korean', 'japan'
ocr = PaddleOCR(use_angle_cls=True, lang="ch")  # only needs to run once: downloads the model and loads it into memory
img_path = './imgs/11.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)  # each line holds a text box, the recognized text, and a confidence

The result is a list. Each item contains the four corner points of a text box, the recognized text, and the recognition confidence:

[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['Pure nutritional conditioner', 0.964739]]
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['Product information/parameter', 0.98069626]]
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45 element/Per kilogram (from 100kg)', 0.9676722]]
......
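
Because each item pairs a four-point box with a text and a confidence, the list is easy to post-process. The following is a minimal sketch, assuming the flat result structure shown above (other PaddleOCR releases may wrap the results in an extra per-image list). The helper name extract_text and the 0.965 threshold are illustrative and are not part of PaddleOCR.

```python
# Sample data copied from the output above; in practice this would be
# the value returned by ocr.ocr(img_path, cls=True).
result = [
    [[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]],
     ['Pure nutritional conditioner', 0.964739]],
    [[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]],
     ['Product information/parameter', 0.98069626]],
    [[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]],
     ['(45 element/Per kilogram (from 100kg)', 0.9676722]],
]


def extract_text(result, min_confidence=0.0):
    """Return (text, confidence, bounding_rect) for each confident item.

    bounding_rect is the axis-aligned (left, top, right, bottom) rectangle
    enclosing the four corner points. Hypothetical helper for illustration.
    """
    rows = []
    for box, (text, confidence) in result:
        if confidence < min_confidence:
            continue  # drop items below the confidence threshold
        xs = [point[0] for point in box]
        ys = [point[1] for point in box]
        rows.append((text, confidence, (min(xs), min(ys), max(xs), max(ys))))
    return rows


for text, confidence, rect in extract_text(result, min_confidence=0.965):
    print(f"{confidence:.3f}  {rect}  {text}")
```

A GUI such as the one in this article could feed the text column into its result list and use the rectangle to draw the box on the image.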

Interface part

The interface is built with PyQt5. For an introduction to PyQt GUI development and environment setup, see the blog post referenced at the end of the article.

Main steps:

Interface layout design

Drag and drop controls in Qt Designer to lay out the program interface, then save the layout as a *.ui file.

Automatic generation of interface code by pyuic

In PyCharm's project file tree, find the *.ui file, right-click it, and choose External Tools - pyuic. The Python code for the UI is generated automatically in the same directory as the .ui file.
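
The same conversion can also be scripted instead of being run from PyCharm. The sketch below is an alternative to the External Tools route, not the method used in this article: it calls PyQt5's uic.compileUi to turn a .ui file into Python source. The tiny .ui document and the names MainWindow and startButton are made up for the demonstration.

```python
import io
import os
import tempfile

from PyQt5 import uic

# A made-up minimal .ui document standing in for a file saved from Qt Designer.
UI_XML = """<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
 <class>MainWindow</class>
 <widget class="QMainWindow" name="MainWindow">
  <widget class="QWidget" name="centralwidget">
   <widget class="QPushButton" name="startButton"/>
  </widget>
 </widget>
</ui>
"""


def compile_ui_to_source(ui_path):
    """Compile the .ui file at ui_path and return the generated Python code."""
    buffer = io.StringIO()
    uic.compileUi(ui_path, buffer)
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as tmp_dir:
    ui_path = os.path.join(tmp_dir, "main_window.ui")
    with open(ui_path, "w", encoding="utf-8") as ui_file:
        ui_file.write(UI_XML)
    source = compile_ui_to_source(ui_path)

print(source)
```

The generated source defines a Ui_ class with a setupUi method, which is what the business class calls to build the interface.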

Write interface business class

The business class MainWindow implements the program logic and algorithm functions. It is decoupled from the UI code generated by pyuic in the previous step, so that modifying the .ui file does not affect the business code. Controls on the UI are accessed as self._ui.xxxObjectName, where xxxObjectName is the control's object name.

from PyQt5.QtWidgets import QMainWindow, QButtonGroup

# Ui_MainWindow (generated by pyuic) and get_config come from the project's own modules


class MainWindow(QMainWindow):

    def __init__(self):
        super().__init__()  # Call the parent constructor to create the window
        self._ui = Ui_MainWindow()  # Create the ui object
        self._ui.setupUi(self)  # Build the ui

        # Load default configuration
        config = get_config()
        self._config = config
        # Radio button group
        self.checkBtnGroup = QButtonGroup(self)

Implement interface business logic

Connect signals and slots for the buttons, lists, and drawing controls on the main interface. Custom slot functions need no special declaration. A custom signal, however, must be declared in the class body before __init__(), in the form yourSignal = pyqtSignal(args).
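
To illustrate the declaration rule, here is a minimal sketch that uses only PyQt5.QtCore. The class name Recognizer, the signal name textRecognized, and its str payload are hypothetical and are not taken from the tool's source.

```python
from PyQt5.QtCore import QObject, pyqtSignal


class Recognizer(QObject):
    # A custom signal is declared as a class attribute, outside __init__().
    textRecognized = pyqtSignal(str)

    def __init__(self):
        super().__init__()
        self.received = []
        # Connect the signal to an ordinary method used as the slot.
        self.textRecognized.connect(self.on_text_recognized)

    def on_text_recognized(self, text):
        # A custom slot needs no special declaration.
        self.received.append(text)


recognizer = Recognizer()
recognizer.textRecognized.emit("hello")
print(recognizer.received)
```

Declaring the signal inside __init__() instead would not work, because PyQt binds signals when the class is defined.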

Take the button handler and the list handler as examples. A button emits the clicked signal when it is pressed, and a listWidget emits itemSelectionChanged when its selection changes.
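
A minimal sketch of connecting these two built-in signals is given here. The DemoPanel class, its widget names, and its handler names are hypothetical; the tool's real handlers run OCR and update the result views instead of recording events.

```python
from PyQt5.QtWidgets import QListWidget, QPushButton, QVBoxLayout, QWidget


class DemoPanel(QWidget):
    """Hypothetical panel with one button and one list, for illustration only."""

    def __init__(self):
        super().__init__()
        self.events = []

        self.start_button = QPushButton("Start")
        self.result_list = QListWidget()
        self.result_list.addItems(["first line", "second line"])

        layout = QVBoxLayout(self)
        layout.addWidget(self.start_button)
        layout.addWidget(self.result_list)

        # clicked is emitted when the button is pressed.
        self.start_button.clicked.connect(self.on_start_clicked)
        # itemSelectionChanged is emitted when the selected list item changes.
        self.result_list.itemSelectionChanged.connect(self.on_selection_changed)

    def on_start_clicked(self):
        self.events.append("start clicked")

    def on_selection_changed(self):
        selected = [item.text() for item in self.result_list.selectedItems()]
        self.events.append(selected)
```

A QApplication must exist before DemoPanel is instantiated; after that, calling show() on the panel and starting the application's event loop displays it.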

# Button response function
Run to see the effect

Run python main.py to start the GUI program.

Open an image → select the language model ch (Chinese) → select text detection + recognition → click Start. The detected text regions are framed automatically, and the recognized text is listed on the Text tab of the recognition results panel on the right.

All detected text regions are listed on the Area tab of the recognition results panel:

Software code

Because of limited time, the software's detailed features still need further refinement. The code is open source on Gitee, and anyone interested is welcome to submit pull requests and help improve it.

Source code address:

