I spent an hour writing an image text recognition (OCR) tool in Python

Life is short, learn Python!

In previous articles, we used Python for OCR text recognition several times.

Today we are going to make an upgraded version: a complete OCR tool for recognizing text in images!

Introduction

Recently, the need for image text recognition came up in the technical exchange group. It is common in work and daily life, for example when extracting text from bills, comics, scanned copies and photos.

I have written a desktop OCR tool based on PyQt + labelme + PaddleOCR. It automatically detects the text areas in a picture and recognizes the text in them.

The recognition effect is shown in the figure below:

All boxed areas are detected automatically by the OCR algorithm, and the text for each box is listed on the right. Click a text record under "recognition result" on the right, then click "copy to clipboard" to copy its text.

Function list

Text area detection + text recognition
Text area visualization
Text content list
Image and folder loading
Image scroll wheel zoom view
Drawing area and editing area
Copy the selected text recognition result

OCR part

Image text detection and character recognition are implemented mainly with PaddleOCR.

Create or select a virtual environment to install the required third-party libraries.

conda create -n ocr python=3.8  # Python 3.8 is an assumption; use any version your paddlepaddle release supports
conda activate ocr

Install the framework

If you do not have an NVIDIA GPU, or your GPU does not support CUDA, you can install the CPU version:

# CPU version
pip install paddlepaddle==2.1.0 -i https://mirror.baidu.com/pypi/simple

If your machine has CUDA 9 or CUDA 10 and cuDNN 7.6+ installed, you can choose the following GPU version:

# GPU version
python3 -m pip install paddlepaddle-gpu==2.1.0 -i https://mirror.baidu.com/pypi/simple

Install PaddleOCR

To install paddleocr:

pip install "paddleocr>=2.0.1" # Version 2.0.1+ is recommended

layoutparser needs to be installed for layout analysis:

pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl

Test for successful installation

After installation, test it on a picture (--image_dir ./imgs/11.jpg) with the full pipeline of Chinese and English detection + direction classifier + recognition:

paddleocr --image_dir ./imgs/11.jpg --use_angle_cls true --use_gpu false

Output a list:

Call from Python

from paddleocr import PaddleOCR, draw_ocr

# PaddleOCR supports multiple languages; switch between them by changing the lang parameter
# For example: 'ch', 'en', 'fr', 'german', 'korean', 'japan'
ocr = PaddleOCR(use_angle_cls=True, lang="ch")  # need to run only once to download and load model into memory
img_path = './imgs/11.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

The output result is a list. Each item contains a text box, text and recognition confidence:

[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['Pure nutritional conditioner', 0.964739]]
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['Product information/parameter', 0.98069626]]
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45 element/Per kilogram (from 100kg)', 0.9676722]]
......
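
Each item can be unpacked into its box, its text and its confidence. The snippet below is a minimal sketch that works on the sample output shown above, so it runs without PaddleOCR installed. The helper name extract_texts and the 0.97 threshold are illustrative assumptions and are not part of the tool; the code also assumes the flat item format printed above.

```python
# Sample items copied from the output above: [box, [text, confidence]]
sample_result = [
    [[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]],
     ['Pure nutritional conditioner', 0.964739]],
    [[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]],
     ['Product information/parameter', 0.98069626]],
    [[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]],
     ['(45 element/Per kilogram (from 100kg)', 0.9676722]],
]


def extract_texts(result, min_confidence=0.0):
    """Return the recognized strings whose confidence is at least min_confidence."""
    texts = []
    for box, (text, confidence) in result:
        if confidence >= min_confidence:
            texts.append(text)
    return texts


all_texts = extract_texts(sample_result)
confident_texts = extract_texts(sample_result, min_confidence=0.97)
print("\n".join(all_texts))
```

Joining the strings with newlines, as in the last line, is one simple way to prepare the text for copying or saving.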

Interface part

The interface part is implemented with PyQt5. For an introduction to PyQt GUI development and environment configuration, see the blog post referenced at the end of the article.

Main steps:

Interface layout design

Drag and drop controls in Qt Designer to lay out the program interface, then save it as a *.ui file.

Generate interface code automatically with pyuic

In PyCharm's project file tree, find the *.ui file, right-click it and choose External Tools - pyuic. The Python code for the interface is generated automatically in the same directory as the .ui file.

Write interface business class

The business class MainWindow implements the program logic and algorithm functions. It is decoupled from the UI code generated by pyuic in the previous step, so modifying the UI file does not affect the business code. Controls on the interface can be accessed through self._ui.XXXObjectName.

from PyQt5.QtWidgets import QButtonGroup, QMainWindow

# Ui_MainWindow (generated by pyuic), get_config and __appname__ come from the project itself.


class MainWindow(QMainWindow):
    FIT_WINDOW, FIT_WIDTH, MANUAL_ZOOM = 0, 1, 2

    def __init__(self):
        super().__init__()  # Call the parent constructor to create the QMainWindow
        self._ui = Ui_MainWindow()  # Create the UI object
        self._ui.setupUi(self)  # Build the UI
        self.setWindowTitle(__appname__)

        # Load the default configuration
        config = get_config()
        self._config = config

        # Mutually exclusive button group: only one processing mode can be checked
        self.checkBtnGroup = QButtonGroup(self)
        self.checkBtnGroup.addButton(self._ui.checkBox_ocr)
        self.checkBtnGroup.addButton(self._ui.checkBox_det)
        self.checkBtnGroup.addButton(self._ui.checkBox_recog)
        self.checkBtnGroup.addButton(self._ui.checkBox_layoutparser)
        self.checkBtnGroup.setExclusive(True)
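
The four mutually exclusive check boxes suggest one processing mode per run. As a rough sketch, the selected mode could be translated into the det and rec keyword arguments of ocr.ocr(). The mode names and the helper ocr_kwargs_for_mode below are hypothetical and are not taken from the project's source; layout analysis is left out because it does not go through ocr.ocr().

```python
# Hypothetical mode names; the real project may use different identifiers.
MODE_OCR = "ocr"      # detection + recognition
MODE_DET = "det"      # detection only
MODE_RECOG = "recog"  # recognition only


def ocr_kwargs_for_mode(mode):
    """Map a processing mode to keyword arguments for PaddleOCR's ocr() call."""
    table = {
        MODE_OCR: {"det": True, "rec": True},
        MODE_DET: {"det": True, "rec": False},
        MODE_RECOG: {"det": False, "rec": True},
    }
    if mode not in table:
        raise ValueError(f"unsupported mode: {mode!r}")
    # Return a copy so callers can add further options without side effects.
    return dict(table[mode])


kwargs = ocr_kwargs_for_mode(MODE_DET)
print(kwargs)
# With PaddleOCR installed, the call would then look like:
# result = ocr.ocr(img_path, **kwargs)
```

Keeping the mapping in one table means a new mode only needs one new entry, and the slot that starts processing stays free of if/else chains.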

Implement interface business logic

Connect signals to slots for the buttons, lists and drawing controls on the main interface. A custom slot function needs no special declaration. A custom signal, however, must be declared as a class attribute before __init__(), in the form yourSignal = pyqtSignal(args).

Here we take the button and list response functions as examples. The signal for a button click is clicked, and the signal for a selection change in a listWidget is itemSelectionChanged.

# Button response function
self._ui.btnOpenImg.clicked.connect(self.openFile)
self._ui.btnOpenDir.clicked.connect(self.openDirDialog)
self._ui.btnNext.clicked.connect(self.openNextImg)
self._ui.btnPrev.clicked.connect(self.openPrevImg)
self._ui.btnStartProcess.clicked.connect(self.startProcess)
self._ui.btnCopyAll.clicked.connect(self.copyToClipboard)
self._ui.btnSaveAll.clicked.connect(self.saveToFile)
# List response function
self._ui.listWidgetResults.itemSelectionChanged.connect(self.onItemResultClicked)
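
The connections above all use built-in signals. For the custom signal case, the following is a minimal sketch assuming PyQt5 is installed. The class OcrWorker, its finished signal and the slot on_finished are hypothetical names made up for illustration and do not come from the project.

```python
from PyQt5.QtCore import QObject, pyqtSignal


class OcrWorker(QObject):
    # A custom signal is declared as a class attribute, before __init__().
    finished = pyqtSignal(list)

    def __init__(self):
        super().__init__()

    def run(self, fake_result):
        # A real worker would call the OCR engine here; this sketch just
        # forwards the data it is given.
        self.finished.emit(fake_result)


received = []


def on_finished(result):
    # A custom slot is a plain Python callable; no declaration is needed.
    received.append(result)


worker = OcrWorker()
worker.finished.connect(on_finished)
worker.run(["line 1", "line 2"])
print(received)
```

Because the signal and the slot live in the same thread, the slot runs immediately when emit() is called, so no event loop is needed for this small demonstration.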

Run to see the effect

Run python main.py to start the GUI program.

Open a picture → select the language model ch (Chinese) → select text detection + recognition → click start. The detected text areas are framed automatically, and the text is listed on the right, on the Text tab of the recognition results.

All detected text areas are listed on the Area tab of the recognition results.
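
Each detected area is a four-point box like the ones in the OCR output earlier. As a small sketch of how such a box could be summarized for an area list or for drawing, the hypothetical helper bounding_rect below (not part of the project) computes the axis-aligned rectangle that encloses it.

```python
def bounding_rect(box):
    """Return (x, y, width, height) of the axis-aligned rectangle around a 4-point box."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    x_min, y_min = min(xs), min(ys)
    return x_min, y_min, max(xs) - x_min, max(ys) - y_min


# First box from the sample OCR output
box = [[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]]
rect = bounding_rect(box)
print(rect)
```

Taking the minimum and maximum over all four corners keeps the rectangle correct even when the detected box is slightly rotated, as it is here.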

Software code

Because time was limited, the software's detailed functions still need improvement. The code is open source on Gitee; interested friends are welcome to submit pull requests and improve it together.

Source code address: https://gitee.com/signal926/ocr-gui-demo

Keywords: Python Computer Vision Deep Learning

Added by sharke on Thu, 17 Feb 2022 18:19:48 +0200