Automation tool! Use Python to read ID card information in batches and write it to Excel

Today I'm sharing a practical technique: using Python to read ID card information in batches and write it to Excel.

Read

Taking ID cards in image form as an example, we use Baidu's text recognition (OCR) service to read the information. The Baidu API includes a free quota that is generally enough for everyday use. Let's look at how to use it.

SDK installation

The Baidu Cloud SDK supports Python, Java and other languages. The Python SDK is easy to install: run pip install baidu-aip. It supports Python 2.7+ and 3.x.
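As a quick check after installation, the following sketch tests whether the SDK can be imported. It assumes the package exposes a top-level module named aip, as the import used later in this article suggests; the helper name sdk_installed is made up for illustration.

```python
import importlib.util


def sdk_installed(module_name="aip"):
    """Return True if a top-level module with this name can be found."""
    # find_spec returns None when no such top-level module is installed
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if sdk_installed():
        print("Baidu AIP SDK found")
    else:
        print("SDK not found; try: pip install baidu-aip")
```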

Create application

You need a Baidu or Baidu Cloud account to create an application. The registration and login address is: https://login.bce.baidu.com/?redirect=http%3A%2F%2Fcloud.baidu.com%2Fcampaign%2Fcampus-2018%2Findex.html

After logging in, hover over your avatar and click User Center in the pop-up menu, as shown in the figure:

On your first visit, select the information that applies to you, as shown in the figure:

Click Save when you are done.

Then hover over the > symbol on the left, select Artificial Intelligence, and click Text Recognition, as shown in the figure:

This opens the page shown in the following figure:

Now click Create Application, which brings up the page shown in the following figure:

As the figure shows, Baidu OCR can recognize many kinds of documents, so it can handle not only ID cards but other recognition needs as well.

Here we fill in the application name and description, then click Create Now.

After creation, return to the application list, as shown in the following figure:

We need to note down three values: AppID, API Key and Secret Key.
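Since these three values are credentials, one option is to keep them out of the source file. The sketch below reads them from environment variables; the variable names BAIDU_APP_ID, BAIDU_API_KEY and BAIDU_SECRET_KEY and the helper load_credentials are assumptions chosen for illustration, not something the SDK requires.

```python
import os


def load_credentials(env=os.environ):
    """Return (APP_ID, API_KEY, SECRET_KEY) read from environment variables."""
    # Hypothetical variable names; pick whatever fits your setup
    names = ("BAIDU_APP_ID", "BAIDU_API_KEY", "BAIDU_SECRET_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)
```

The returned tuple can then be unpacked into the three arguments used when creating the client.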

Code implementation

The code is simple and takes only a few lines of Python, as shown below:

from aip import AipOcr

# Replace these with the values from your own application
APP_ID = 'own APP_ID'
API_KEY = 'own API_KEY'
SECRET_KEY = 'own SECRET_KEY'
# Create the client object
client = AipOcr(APP_ID, API_KEY, SECRET_KEY)
# Read the image as bytes; the with block closes the file afterwards
with open("idcard.jpg", "rb") as f:
    image = f.read()
# res = client.basicGeneral(image)  # ordinary mode
res = client.basicAccurate(image)  # high-precision mode

As the code shows, recognition comes in two modes: ordinary and high-precision. To improve the recognition rate, we use high-precision mode here.
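To make the choice between the two modes explicit, the calls can be wrapped in a small helper. This is a minimal sketch: the function name recognize and its high_precision flag are hypothetical, and the check for an error_code key is an assumption about how a failed call is reported, so adjust it to what your responses actually contain.

```python
def recognize(client, image_path, high_precision=True):
    """Run OCR on one image file and return the result dictionary."""
    # Read the image as raw bytes and close the file afterwards
    with open(image_path, "rb") as f:
        image = f.read()
    # Pick the high-precision or the ordinary recognition call
    call = client.basicAccurate if high_precision else client.basicGeneral
    res = call(image)
    # Assumption: a failed call returns a dict containing an error_code key
    if "error_code" in res:
        raise RuntimeError(
            f"OCR failed: {res.get('error_code')} {res.get('error_msg')}"
        )
    return res
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before spending any of the free quota.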

Take the following three fake ID cards I found online as an example:

Because there are several ID card images, we need a function to traverse them. The code is as follows:

import os

def findAllFile(base):
    # Walk the directory tree and yield the full path of every file
    for root, ds, fs in os.walk(base):
        for f in fs:
            yield os.path.join(root, f)
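A folder of scans often contains other files as well. As a variation on the traversal, the sketch below yields only files whose extension looks like an image. The function name find_images and the extension list are assumptions for illustration; extend the list to match the formats you actually use.

```python
import os

# Assumed set of image extensions; adjust as needed
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def find_images(base):
    """Yield the full path of every image file under base."""
    for root, _dirs, files in os.walk(base):
        for name in sorted(files):
            # Compare in lower case so IMG.JPG is matched too
            if name.lower().endswith(IMAGE_EXTENSIONS):
                yield os.path.join(root, name)
```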

The raw ID card information returned by the recognition call has the following format:

{'words_result': [{'words': 'Name: Wei Xiaobao'}, {'words': 'Gender male nationality Han'}, {'words': 'Born on December 20, 1654'}, {'words': 'Address: No. 4, jingshanqian street, Dongcheng District, Beijing'}, {'words': 'Forbidden City Jingshi room'}, {'words': 'Citizenship ID number 11204416541220243 X'}], 'log_id': 1411522933129289151, 'words_result_num': 6}

Write

The ID card information is written with pandas. First we need to preprocess the raw recognition result so that it can be written to Excel: each field of the card, from the name to the address and ID number, is stored in its own list. The processing code is as follows:

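names, genders, nations, births, address, ids = [], [], [], [], [], []

# res is the recognition result for one ID card image.
# The slice offsets assume a fixed-length field label at the start of each
# line; adjust the labels and offsets to the text your result actually contains.
addr = ""
for tex in res["words_result"]:
    row = tex["words"]
    if "full name" in row:
        names.append(row[2:])
    elif "Gender" in row:
        genders.append(row[2:3])
        nations.append(row[5:])
    elif "birth" in row:
        births.append(row[2:])
    elif "address" in row:
        addr += row[2:]
    elif "Citizenship ID number" in row:
        ids.append(row[7:])
    else:
        # A line without a label is a continuation of the address
        addr += row
address.append(addr)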
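The fixed slice offsets above depend on the exact label text. As an alternative, here is a minimal sketch that matches on label prefixes instead, written against the sample result shown earlier in this article. The function name parse_card and the dictionary keys are hypothetical, and real OCR output carries the labels printed on the card, so the prefixes would need adjusting.

```python
SAMPLE = {'words_result': [
    {'words': 'Name: Wei Xiaobao'},
    {'words': 'Gender male nationality Han'},
    {'words': 'Born on December 20, 1654'},
    {'words': 'Address: No. 4, jingshanqian street, Dongcheng District, Beijing'},
    {'words': 'Forbidden City Jingshi room'},
    {'words': 'Citizenship ID number 11204416541220243 X'},
], 'log_id': 1411522933129289151, 'words_result_num': 6}


def parse_card(res):
    """Turn one recognition result into a dictionary of card fields."""
    card = {"name": "", "gender": "", "nation": "",
            "birth": "", "address": "", "id": ""}
    for item in res["words_result"]:
        row = item["words"]
        if row.startswith("Name:"):
            card["name"] = row[len("Name:"):].strip()
        elif row.startswith("Gender"):
            # e.g. 'Gender male nationality Han'
            parts = row.split()
            card["gender"] = parts[1]
            card["nation"] = parts[-1]
        elif row.startswith("Born on"):
            card["birth"] = row[len("Born on"):].strip()
        elif row.startswith("Address:"):
            card["address"] = row[len("Address:"):].strip()
        elif row.startswith("Citizenship ID number"):
            # Drop the spaces the OCR inserted inside the number
            card["id"] = row[len("Citizenship ID number"):].replace(" ", "")
        else:
            # A line without a label continues the address
            card["address"] = (card["address"] + " " + row).strip()
    return card
```

Returning one dictionary per card also means a list of such dictionaries can be passed to pd.DataFrame directly, instead of maintaining six parallel lists.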

After that, you can easily write the information directly into Excel. The writing code is as follows:

import pandas as pd

df = pd.DataFrame({"full name": names, "Gender": genders, "nation": nations,
                   "birth": births, "address": address, "ID card No.": ids})
# Writing .xlsx files requires an Excel engine such as openpyxl
df.to_excel('idcards.xlsx', index=False)
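Building a DataFrame from a dictionary of lists fails if the lists have different lengths, which can happen when the OCR misses a field on one card. The following sketch checks the lengths first and reports which column is off. The helper name check_columns is hypothetical, and it uses only the standard library so it can run before pandas is involved.

```python
def check_columns(columns):
    """Return columns unchanged if every list has the same length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        # Report every column length so the short one is easy to spot
        raise ValueError(f"column lengths differ: {lengths}")
    return columns
```

For example, wrapping the dictionary as pd.DataFrame(check_columns({...})) surfaces a missing field with the column names attached.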

Here is the result:

With that, we have implemented batch reading of ID card information and writing it to Excel.

To get the source code, reply "ID card" to the official account "Python small two".

Keywords: Python

Added by MikeTyler on Sat, 22 Jan 2022 11:31:05 +0200