The xml module (Python Full-Stack Road, Standard Library Series)

Python's interfaces for processing XML are grouped in the xml package.

A delimited file has only two dimensions of data: rows and columns. If you want to exchange data structures between programs, you need a way to encode hierarchies, sequences, collections, and other structures into text.

XML is the most prominent markup format for this purpose. It uses tags to delimit data, as in the example Atom feed below:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Ansheng&#39;s Blog</title>
  <subtitle>Good time!</subtitle>
  <link href="/atom.xml" rel="self"/>

  <link href="https://blog.ansheng.me/"/>
  <updated>2016-05-24T15:29:19.000Z</updated>
  <id>https://blog.ansheng.me/</id>

  <author>
    <name>Ansheng</name>
  </author>
</feed>

Some important features of XML

  1. Tags begin with a < character, such as feed, title, subtitle, and author in the example.
  2. Whitespace between tags is generally ignored.
  3. Usually a start tag is followed by other content and then a matching end tag, such as <subtitle>Good time!</subtitle>.
  4. Tags can be nested inside other tags to any depth.
  5. Optional attributes can appear in the start tag, such as href in <link href="/atom.xml" rel="self"/>.
  6. Tags can contain values, such as the text Ansheng inside <name>.
  7. If a tag named thing has no content and no child tags, it can be written as a single tag with a slash just before the closing angle bracket, for example <thing/>, instead of a start tag and an end tag, for example <thing></thing>.
  8. Where the data goes is somewhat arbitrary: attributes, values, or child tags.
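
The features listed above can also be observed from Python. The following is a minimal sketch that parses a small made-up snippet (the tag names menu, item, and thing and the price attribute are assumptions for illustration, not part of the feed above) and shows where attributes, values, and child tags end up.

```python
from xml.etree import ElementTree as ET

# A made-up snippet for illustration only
snippet = """
<menu>
  <item price="$6.00">breakfast burritos</item>
  <thing/>
</menu>
""".strip()

menu = ET.fromstring(snippet)
item = menu.find("item")
thing = menu.find("thing")

print(menu.tag)                        # tag name of the root element
print(item.attrib)                     # attributes from the start tag
print(item.text)                       # value between the start and end tags
print(thing.text)                      # an empty tag has no text (None)
print([child.tag for child in menu])   # nested child tags, in document order
```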

XML is commonly used for data transmission and messaging. It has several sub-formats, such as RSS and Atom; https://blog.ansheng.me/atom.xml is an example of an Atom feed.

The easiest way to parse XML in Python is to use ElementTree.
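
As a first taste, here is a minimal sketch that parses the Atom feed shown earlier. The feed declares a default namespace (xmlns), so ElementTree stores every tag as {namespace}name. The prefix atom used in the lookups below is an arbitrary name chosen for this sketch.

```python
from xml.etree import ElementTree as ET

feed_xml = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Ansheng&#39;s Blog</title>
  <subtitle>Good time!</subtitle>
  <link href="/atom.xml" rel="self"/>
  <link href="https://blog.ansheng.me/"/>
  <updated>2016-05-24T15:29:19.000Z</updated>
  <id>https://blog.ansheng.me/</id>
  <author>
    <name>Ansheng</name>
  </author>
</feed>"""

# Map a prefix of our choosing to the namespace URI declared in the feed
ns = {"atom": "http://www.w3.org/2005/Atom"}

# Parse from bytes so the encoding declaration in the first line is honored
feed = ET.fromstring(feed_xml.encode("utf-8"))

print(feed.tag)  # the namespace is part of the tag name
title = feed.find("atom:title", ns).text
links = [link.get("href") for link in feed.findall("atom:link", ns)]
author = feed.find("atom:author/atom:name", ns).text
print(title, links, author)
```

Without the ns mapping, feed.find("title") would return None, because the stored tag name includes the namespace.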

Module                  Description
xml.etree.ElementTree   The ElementTree API, a simple and lightweight XML processor

Create xml files

Import the ElementTree module under the alias ET

>>> from xml.etree import ElementTree as ET

Create the top-level element

>>> level_1 = ET.Element("famliy")

Create a second-level element; the arguments are the parent element, the tag name, and attrib, a dictionary of the tag's attributes

>>> level_2 = ET.SubElement(level_1, "name", attrib={"enrolled":"yes"}) 

Create a third-level element

>>> level_3 = ET.SubElement(level_2, "age", attrib={"checked":"no"})

Generate documents

>>> tree = ET.ElementTree(level_1)

Write to a file

>>> tree.write('oooo.xml',encoding='utf-8', short_empty_elements=False)

Import the os module and use its system function to run a shell command that displays the oooo.xml file just created

>>> import os
>>> os.system("cat oooo.xml")
# The generated document contains no line breaks; the trailing 0 is the return value of os.system
<famliy><name enrolled="yes"><age checked="no"></age></name></famliy>0

Copy the generated file to your local machine and open it in a browser to see the hierarchy.
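
The cat command is only available on Unix-like systems. As a portable alternative, the sketch below serializes the same tree to a string with ET.tostring and prints it, without touching the file system.

```python
from xml.etree import ElementTree as ET

# Rebuild the same three-level tree as above
level_1 = ET.Element("famliy")
level_2 = ET.SubElement(level_1, "name", attrib={"enrolled": "yes"})
level_3 = ET.SubElement(level_2, "age", attrib={"checked": "no"})

# encoding="unicode" makes tostring return a str instead of bytes
text = ET.tostring(level_1, encoding="unicode", short_empty_elements=False)
print(text)
```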

Create an XML file with line breaks and indentation

Code

from xml.etree import ElementTree as ET
from xml.dom import minidom

root = ET.Element('level1',{"age":"1"})
son = ET.SubElement(root,"level2",{"age":"2"})
ET.SubElement(son, "level3", {"age":"3"})

# tree = ET.ElementTree(root)
# tree.write("abc.xml", encoding="utf-8",xml_declaration=True,short_empty_elements=False)

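def prettify(root):
    # Serialize the tree to bytes, then re-parse it with minidom,
    # which can emit line breaks and indentation
    rough_string = ET.tostring(root, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")

new_str = prettify(root)
# The with statement closes the file even if writing fails
with open("new_out.xml", "w", encoding="utf-8") as f:
    f.write(new_str)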

Generated xml file

<?xml version="1.0" ?>
<level1 age="1">
    <level2 age="2">
        <level3 age="3"/>
    </level2>
</level1>
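
On Python 3.9 and later, ElementTree can do the indentation itself, so the round trip through minidom is not needed. The following sketch assumes such a Python version and uses four spaces per level, which is an arbitrary choice.

```python
from xml.etree import ElementTree as ET

root = ET.Element("level1", {"age": "1"})
son = ET.SubElement(root, "level2", {"age": "2"})
ET.SubElement(son, "level3", {"age": "3"})

# ET.indent (Python 3.9+) inserts line breaks and indentation in place
ET.indent(root, space="    ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Unlike toprettyxml, this does not add an XML declaration; pass xml_declaration=True to ElementTree.write if one is needed in the output file.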

Parsing XML

The contents of the first.xml file are as follows:

<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2025</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2028</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2028</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>

The first.xml file is stored in the root directory (/).

Using ElementTree.XML to parse strings into xml objects

>>> from xml.etree import ElementTree as ET
>>> # Read the file and parse its contents from a string; root is the root element of the document
>>> with open('first.xml', 'r') as f:
...     root = ET.XML(f.read())
...
>>> root.tag
'data'
>>> for node in root:
...     print(node.tag, node.attrib)
...
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}
>>> # node still refers to the last element of the loop (Panama)
>>> print(node.find('rank').text)
69

Using ElementTree.parse to parse files directly into xml objects

>>> from xml.etree import ElementTree as ET
# Direct parsing of xml files
>>> tree = ET.parse("first.xml")
# Get the root node of the xml file
>>> root = tree.getroot()
>>> root.tag
'data'
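
The two approaches return different kinds of objects: ET.parse returns an ElementTree that wraps the whole document, while ET.XML returns the root Element directly. The sketch below illustrates this with a one-country sample written to a temporary file (the file name sample.xml is an assumption for the sketch).

```python
import os
import tempfile
from xml.etree import ElementTree as ET

xml_text = '<data><country name="Panama"><rank>69</rank></country></data>'

# Write the sample to a temporary file so the sketch is self-contained
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(xml_text)

    tree = ET.parse(path)            # ElementTree object wrapping the document
    root_from_file = tree.getroot()  # the root Element inside it

root_from_string = ET.XML(xml_text)  # the root Element, with no wrapper

print(type(tree).__name__, type(root_from_string).__name__)
print(root_from_file.tag == root_from_string.tag)
```

The ElementTree wrapper is what provides write, which is why the examples that save changes back to a file start from ET.parse.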

Traversing the specified nodes in XML

>>> from xml.etree import ElementTree as ET
>>> tree = ET.parse("first.xml")
>>> root = tree.getroot()
>>> for node in root.iter('year'):
...     # Output the tag and the content of the node
...     print(node.tag, node.text)
...
year 2025
year 2028
year 2028
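
Besides iter, findall accepts a limited subset of XPath, which is handy when nodes should be selected by attribute or by the text of a child. The following is a minimal sketch on a shortened, inlined copy of the data.

```python
from xml.etree import ElementTree as ET

# A shortened copy of first.xml, inlined so the sketch is self-contained
root = ET.fromstring("""
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>
""".strip())

# Direct children of the root that are named country
names = [c.get("name") for c in root.findall("country")]
# Any neighbor element, at any depth, whose direction attribute is E
east = [n.get("name") for n in root.findall(".//neighbor[@direction='E']")]
# Countries that have a rank child whose text is 69
rank_69 = [c.get("name") for c in root.findall("country[rank='69']")]

print(names, east, rank_69)
```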

Adding, deleting, and modifying XML

Adding attributes to nodes

>>> from xml.etree import ElementTree as ET
>>> tree = ET.parse("first.xml")
>>> root = tree.getroot()
>>> for node in root.iter("year"):
...     # View the original attributes
...     print(node.attrib)
...
{}
{}
{}
>>> for node in root.iter("year"):
...     # Add an attribute
...     node.set("OS", "Linux")
...
>>> for node in root.iter("year"):
...     # View the attributes after the change
...     print(node.attrib)
...
{'OS': 'Linux'}
{'OS': 'Linux'}
{'OS': 'Linux'}
>>> # Write the modified tree back to the file
>>> tree.write("first.xml")

Delete node attributes

>>> from xml.etree import ElementTree as ET
>>> tree = ET.parse("first.xml")
>>> root = tree.getroot()
>>> for node in root.iter("year"):
...     # Delete the OS attribute of the node
...     del node.attrib['OS']
...
>>> # Write to a file
>>> tree.write("first.xml")
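
Note that del raises KeyError when a node does not carry the attribute. The sketch below shows a more forgiving variant using dict.pop with a default, on a made-up two-element sample in which only the first year has an OS attribute.

```python
from xml.etree import ElementTree as ET

# Made-up sample: only the first year element carries an OS attribute
root = ET.fromstring('<data><year OS="Linux">2025</year><year>2028</year></data>')

for node in root.iter("year"):
    # pop with a default does nothing when the attribute is absent,
    # whereas del node.attrib["OS"] would raise KeyError on the second year
    node.attrib.pop("OS", None)

attribs = [node.attrib for node in root.iter("year")]
print(attribs)
```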

View the attributes

>>> from xml.etree import ElementTree as ET
>>> tree = ET.parse("first.xml")
>>> root = tree.getroot()
>>> for node in root.iter("year"):
...     print(node.attrib)
...
# The attribute dictionaries are empty again
{}
{}
{}

Modify node content

Increase the number in each year element by 1

>>> from xml.etree import ElementTree as ET
>>> tree = ET.parse("first.xml")
>>> root = tree.getroot() 
>>> for node in root.iter("year"):
...     # Output the original content of year
...     print(node.text)
...     # Increase the original value by 1
...     new_year = int(node.text) + 1
...     node.text = str(new_year)
...
2025
2028
2028
>>> # Write to a file
>>> tree.write("first.xml")
>>> for node in root.iter("year"):
...     # Output the content of year after the change
...     print(node.text)
...
2026
2029
2029
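
Whole elements can be deleted as well, using the remove method of the parent. The following is a minimal sketch on an inlined copy of the data; the threshold of 50 is an arbitrary value chosen for illustration.

```python
from xml.etree import ElementTree as ET

root = ET.fromstring("""
<data>
    <country name="Liechtenstein"><rank updated="yes">2</rank></country>
    <country name="Singapore"><rank updated="yes">5</rank></country>
    <country name="Panama"><rank updated="yes">69</rank></country>
</data>
""".strip())

# findall returns a list, so removing from root inside the loop is safe
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

remaining = [c.get("name") for c in root.findall("country")]
print(remaining)
```

Iterating over root directly while removing its children could skip elements, which is why the loop runs over the list returned by findall.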

Methods for operating on nodes

Listing the methods of a node

>>> from xml.etree import ElementTree as ET
>>> tree = ET.parse("first.xml")
>>> root = tree.getroot()
>>> print(dir(root))
['__class__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'extend', 'find', 'findall', 'findtext', 'get', 'getchildren', 'getiterator', 'insert', 'items', 'iter', 'iterfind', 'itertext', 'keys', 'makeelement', 'remove', 'set']  

There are many methods; the ones used most often are listed below.

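Name     Description
tag      The tag name of the node (an attribute, not a method)
attrib   The node's attributes as a dictionary (an attribute, not a method)
find     Find the first child element matching a tag name or path
iter     Iterate over the node and all of its descendants, optionally filtered by tag
set      Set an attribute
get      Get the value of an attribute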
Example

Check whether a QQ account is online

Tencent provides an API for checking whether a QQ number is online. It returns Y = online; N = offline; E = invalid QQ number; A = commercial user authentication failed; V = free usage quota exceeded

>>> import requests
>>> from xml.etree import ElementTree as ET
>>> r = requests.get("http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=6087414")
>>> result = r.text
>>> node = ET.XML(result)
>>> if node.text == "Y":
...     print("online")
... else:
...     print("offline")
...
online

Getting the arrival and departure times of a train

Code

import requests
from xml.etree import ElementTree as ET

r = requests.get("http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=K234&UserID=")
result = r.text
root = ET.XML(result)
# Each TrainDetailInfo element describes one stop on the route
for node in root.iter('TrainDetailInfo'):
    print(node.find('TrainStation').text, node.find('ArriveTime').text, node.find("StartTime").text)


Execution result

C:\Python35\python.exe F:/Python_code/sublime/Week5/Day01/xml_mod.py
 Shanghai (train number: K234_K235) None 11:12:00
 # Station, arrival time, departure time
 Kunshan 11:45:00 11:48:00
 Suzhou 12:12:00 12:16:00
 Wuxi 12:44:00 12:55:00
 Changzhou 13:22:00 13:26:00
 Zhenjiang 14:13:00 14:16:00
 Nanjing 15:04:00 15:16:00
 Bengbu 17:27:00 17:50:00
 Xuzhou 19:38:00 19:58:00
 Shangqiu 22:12:00 22:17:00
 Kaifeng 23:49:00 23:53:00
 Zhengzhou 00:37:00 01:14:00
 Xinxiang 02:20:00 02:22:00
 Hebi 03:01:00 03:03:00
 Anyang 03:33:00 03:36:00
 Handan 04:11:00 04:16:00
 Xingtai 04:47:00 04:51:00
 Shijiazhuang 06:05:00 None

Process finished with exit code 0
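
The None values in the first and last rows appear because the text attribute of an element with no content is None. The sketch below shows one way to print a placeholder instead, using findtext. The XML fragment is hypothetical: it only mimics the three fields read by the code above, and the real service response may be structured differently.

```python
from xml.etree import ElementTree as ET

# Hypothetical fragment shaped like the fields the code above reads;
# the real service response may be structured differently
sample = """
<root>
  <TrainDetailInfo>
    <TrainStation>Shanghai</TrainStation>
    <ArriveTime />
    <StartTime>11:12:00</StartTime>
  </TrainDetailInfo>
  <TrainDetailInfo>
    <TrainStation>Kunshan</TrainStation>
    <ArriveTime>11:45:00</ArriveTime>
    <StartTime>11:48:00</StartTime>
  </TrainDetailInfo>
</root>
""".strip()

rows = []
for node in ET.fromstring(sample).iter("TrainDetailInfo"):
    station = node.findtext("TrainStation", default="")
    # findtext returns "" for an empty element, so fall back to a dash
    arrive = node.findtext("ArriveTime") or "-"
    start = node.findtext("StartTime") or "-"
    rows.append((station, arrive, start))

for row in rows:
    print(*row)
```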

Keywords: xml Python Linux encoding

Added by jabbaonthedais on Mon, 19 Aug 2019 09:12:05 +0300