Python reads pictures in Excel

Follow the previous article Use Python to read pictures in Excel and map them to records , after practice, problems were found, and today's article came into being.

After practicing the code, we found that there were still problems. Some pictures still didn't correspond to the corresponding records, so we started a wave of debugger s. We found that it was not the pot of code, but the pot of drawing1.xml decompressed by Excel. Let's take a look at the code that parses XML:

def _f(subElementObj):
    for anchor in subElementObj:
        xdr_from = anchor.getElementsByTagName('xdr:from')[0]
        col = xdr_from.childNodes[0].firstChild.data  # Get data between tags
        row = xdr_from.childNodes[2].firstChild.data
        embed = anchor.getElementsByTagName('xdr:pic')[0].getElementsByTagName('xdr:blipFill')[0].getElementsByTagName('a:blip')[0].getAttribute('r:embed')  # get attribute
        image_info[(int(row), int(col))] = img_dict.get(int(embed.replace('rId', '')), {}).get(img_feature)

Part of the xml document to parse:

<xdr:pic>
  <xdr:blipFill>
    ...
    <a:blip r:embed="rId1" cstate="print">
      ...
    </a:blip>
  </xdr:blipFill>
</xdr:pic>

Get the R: embedded attribute of the < A: blip > element, that is, the serial number of the corresponding group. In fact, if the excel content is copied from other places, its serial number does not correspond to the serial number of the picture, which leads to the problem. Unfortunately, no reason is found. I don't know how to correspond in Excel. Interested students can study it.

6. Implementation of another way

Another way is to use openpyxl and openpyxl_image_loader library, read by line, and save the loader image. For the complete code, see: new_read_data.py.

7. New requirements

  • Loop reading multiple files in a directory)

    for root, dirs, files in os.walk(source_root):
      for file in files:
        print(os.path.join(root, file))
  • The leader requires that photos larger than 200K are not stored, so the function of compressing pictures is added. I separated the code of compressing pictures compress_image.py.

  • Record the problematic data and write it into Excel, so there is a code to write it into Excel.

    wb = Workbook()
    ws = wb.create_sheet("Problematic data", 0)
    index = 1
    for i in range(len(error_data)):
      index = index + 1
      arr_list = error_data[i].split("|")
      for j in range(len(arr_list)):
        ws.cell(row = index, column= j+1, value = arr_list[j])
    wb.save(target_root + 'Problematic data.xlsx')
  • Photos are named with phone numbers, and logs are generated and written to files.

8. Existing problems

Because there are records in the original data that have not been collected, but these records in the extracted data have corresponding photos, the original image_loader = SheetImageLoader(ws) will not empty the dictionary after reading each time, so the photo of the corresponding line in the previous file will be read to this line of the current file. After searching, it is found that it is a problem with openpyxl image loader. The relevant issues address is: images should not be static variable of SheetImageLoader . So at the end of each cycle_ Just clear the loader and add this line of code:

image_loader._images.clear()

9. Export pictures through VB

In fact, the extraction of pictures in Excel can be realized by VB. Right click [View Code] on the excel sheet directly, and then paste the code. The execution of the code will export the pictures and can be named by the value of any column.

Sub Export picture()
    On Error Resume Next
    MkDir ThisWorkbook.Path & "\picture"
    For Each pic In ActiveSheet.Shapes
        If pic.Type = 13 Then
            RN = pic.TopLeftCell.Offset(0, -3).Value
            pic.Copy
            With ActiveSheet.ChartObjects.Add(0, 0, pic.Width, pic.Height).Chart    'create picture
                .Parent.Select
                .Paste
                .Export ThisWorkbook.Path & "\picture\" & RN & ".jpg"
                .Parent.Delete
            End With
        End If
    Next
    MsgBox "Export picture completed! "
End Sub

10. Summary

After constant tossing, it is the truth to find that all roads lead to Rome. No matter how you realize it, finding and solving problems is the most important experience.

More articles please pay attention to WeChat official account: Zhi Heng.

>This article is written by the blog one article multi posting platform [openwrite]( https://openwrite.cn?from=article_bottom )Release!

Keywords: github

Added by shadow-x on Fri, 03 Dec 2021 12:43:41 +0200