Understand XML technology in one article

Learning objectives

I XML overview

1 Concept
XML (eXtensible Markup Language): an extensible markup language.
Extensible: all tags are user-defined.
2 History
Both HTML and XML are standards defined by the W3C (World Wide Web Consortium). Early HTML syntax was very loose, so the W3C created the stricter XML syntax standard in the hope of replacing HTML. However, programmers and browser vendors did not adopt XML as a replacement, so today XML is used mostly for configuration files, data transmission and similar purposes.
Configuration files: in later development we will use frameworks frequently (a framework is semi-finished software). To make a framework meet our needs, we write configuration files that set its parameters, and XML is one of the file types used for these configuration files.
Data transmission: a Java object cannot be sent over the network as-is, so it must first be converted into a string. One way to do this is to convert the object into an XML string.
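To make the data-transmission idea concrete, here is a minimal sketch that turns an object into an XML string using only the JDK's built-in JAXP DOM and Transformer APIs, with no framework involved. The Student class and the toXml method are hypothetical, chosen to mirror the student examples later in this article.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class StudentToXml {
    // Hypothetical data class, used only for this illustration
    static class Student {
        final String number;
        final String name;
        final int age;

        Student(String number, String name, int age) {
            this.number = number;
            this.name = name;
            this.age = age;
        }
    }

    /** Converts a Student into a string like <student number="..."><name>...</name><age>...</age></student>. */
    static String toXml(Student s) throws Exception {
        // Build an in-memory DOM tree for the object
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element student = doc.createElement("student");
        student.setAttribute("number", s.number);
        Element name = doc.createElement("name");
        name.setTextContent(s.name); // characters such as & are escaped when the tree is written out
        Element age = doc.createElement("age");
        age.setTextContent(String.valueOf(s.age));
        student.appendChild(name);
        student.appendChild(age);
        doc.appendChild(student);

        // Serialize the tree into a string (without the <?xml ...?> declaration)
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(new Student("baizhan_0001", "Pleasant Sheep", 10)));
    }
}
```

The receiving side can parse such a string back into a DOM tree, which is the parsing topic of part III.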
3 Differences between XML and HTML
  • XML syntax is strict; HTML syntax is loose
  • XML tags are user-defined; HTML tags are predefined
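4 Basic XML syntax
  • The file extension is .xml
  • The first line should be the document declaration (the declaration itself is optional, but when present nothing, not even whitespace, may come before it)
  • There is exactly one root element
  • Attribute values must be enclosed in quotation marks (single or double)
  • Every tag must be closed correctly, and tags must be properly nested
  • Tag names are case-sensitive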
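The JDK's built-in JAXP DOM parser enforces these rules, so a quick way to see them in action is to let it judge a few small strings. This is a minimal sketch: the class and method names are our own, and any string the parser rejects is treated as not well-formed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    /** Returns true if the text follows XML's basic syntax rules, i.e. is well-formed. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings, rethrows fatal (syntax) errors,
            // and stops the parser from printing "[Fatal Error]" lines to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a syntax rule was broken
        } catch (Exception e) {
            throw new IllegalStateException(e); // configuration or I/O problem, not a syntax error
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><students><student number=\"1\"/></students>",
            "<students/>",                          // the declaration may be omitted
            " <?xml version=\"1.0\"?><students/>",  // but nothing may come before it
            "<student/><student/>",                 // two root elements
            "<student number=1/>",                  // attribute value without quotes
            "<students><student></students>",       // <student> is never closed
            "<Name>Pleasant Sheep</name>"           // start and end tag differ in case
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```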
5 XML components
5.1 Document declaration
The document declaration, when present, must be on the first line, in the format: <?xml attribute-list ?>
Attribute list:
version: the version number (required)
encoding: the character encoding (UTF-8 is used by default when it is omitted)
For example: <?xml version="1.0" encoding="utf-8" ?>
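With the JDK's DOM API, the values written in the declaration can be read back after parsing through org.w3c.dom.Document's getXmlVersion() and getXmlEncoding(). A small sketch, parsing inline strings instead of files; the parse helper is our own:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DeclarationDemo {
    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document withDecl = parse("<?xml version=\"1.0\" encoding=\"UTF-8\"?><students/>");
        System.out.println(withDecl.getXmlVersion());  // version taken from the declaration
        System.out.println(withDecl.getXmlEncoding()); // encoding taken from the declaration

        Document withoutDecl = parse("<students/>");
        System.out.println(withoutDecl.getXmlVersion());  // DOM reports "1.0" when there is no declaration
        System.out.println(withoutDecl.getXmlEncoding()); // null: no encoding was declared
    }
}
```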
5.2 Tags
Tag names in XML are user-defined, with the following requirements:
  • They may contain letters, digits and a few other characters such as hyphens (-), underscores (_) and periods (.)
  • They cannot start with a digit or a punctuation character (the underscore is allowed; $ is not allowed anywhere in a name)
  • They cannot contain spaces
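These rules come from the XML specification's definition of a name, and the JDK parser checks them. The sketch below (class and method names are our own) wraps a candidate name in an empty element and lets the parser decide whether it is acceptable:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class TagNameCheck {
    /** Returns true if the parser accepts name as an element name in the document "<name/>". */
    static boolean isLegalTagName(String name) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors without printing them
            builder.parse(new InputSource(new StringReader("<" + name + "/>")));
            return true;
        } catch (SAXException e) {
            return false; // the name broke one of the naming rules
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] names = {"student", "_student", "student-2.name", "2student", "-student", "$student", "my student"};
        for (String name : names) {
            System.out.println(isLegalTagName(name) + "  " + name);
        }
    }
}
```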
5.3 Processing instructions (for awareness only)
Processing instructions were mainly used to attach CSS to an XML document, but XML is rarely styled with CSS today. The syntax is:
<?xml-stylesheet type="text/css" href="a.css" ?>
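To a parser, a processing instruction is just a node with a target and some data; acting on it is up to the application (historically, the browser). A minimal JAXP DOM sketch that reads the instruction above back out; the firstInstruction helper is our own:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class InstructionDemo {
    /** Returns the first top-level processing instruction in the document, or null if there is none. */
    static ProcessingInstruction firstInstruction(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Top-level nodes of the document; the <?xml ...?> declaration itself is not a node
        NodeList children = doc.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                return (ProcessingInstruction) node;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        ProcessingInstruction pi = firstInstruction("<?xml version=\"1.0\"?>\n"
                + "<?xml-stylesheet type=\"text/css\" href=\"a.css\" ?>\n"
                + "<students/>");
        System.out.println(pi.getTarget());      // the name right after "<?"
        System.out.println(pi.getData().trim()); // the rest, up to "?>"
    }
}
```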
5.4 Attributes
Attribute values must be enclosed in quotation marks (either single or double)
5.5 Text
Text that contains special characters such as < and & can be shown as is by placing it in a CDATA section, which the parser does not interpret as markup. The format is: <![CDATA[text you want to show as is]]>
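A short JAXP sketch (the rootText helper is our own) showing that text inside a CDATA section comes back unchanged, while the same characters outside CDATA have to be written as entity references:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CdataDemo {
    /** Parses xml and returns the text content of its root element. */
    static String rootText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors without printing them
        return builder.parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Inside CDATA, < and & are ordinary characters
        System.out.println(rootText("<code><![CDATA[if (a < b && c > d) {}]]></code>"));
        // Outside CDATA, the same text must use entity references
        System.out.println(rootText("<code>if (a &lt; b &amp;&amp; c &gt; d) {}</code>"));
    }
}
```

Both lines print the same text; writing a bare < outside CDATA would make the document fail to parse.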

 

II Constraints

1 What are constraints

A constraint is a file that specifies the rules for writing an XML document. As users of a framework, we do not need to write constraint files ourselves; it is enough to be able to import a constraint document into our XML and read it at a basic level. XML has two kinds of constraint files: DTD and Schema.

2 DTD constraints

DTD is a relatively simple constraint technology.
Importing a DTD:
Local: <!DOCTYPE root-tag-name SYSTEM "dtd file location">
Network: <!DOCTYPE root-tag-name PUBLIC "dtd public identifier" "dtd file URL">
Example:
student.dtd:
<!ELEMENT students (student*) >
<!ELEMENT student (name,age,sex)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT sex (#PCDATA)>
<!ATTLIST student number ID #REQUIRED>

Explanation:

<!ELEMENT students (student*) > 
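The * means zero or more: the students element may contain any number of student child elements. students is the root element; what makes it the root is the name given in the DOCTYPE declaration (<!DOCTYPE students ...>), not the fact that its ELEMENT declaration comes first.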

<!ELEMENT student (name,age,sex)>
A student element contains one name, one age and one sex element, and they must appear in exactly this order.

<!ELEMENT name (#PCDATA)>
#PCDATA (parsed character data) means text: the name element contains text. The same applies to age and sex, declared on the next two lines.
<!ELEMENT age (#PCDATA)>
<!ELEMENT sex (#PCDATA)>

<!ATTLIST student number ID #REQUIRED>

The student element has a number attribute of type ID. ID values must be unique within the document, and #REQUIRED means the attribute is mandatory.

A student XML document written according to the constraints above:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE students SYSTEM "student.dtd">
<students>
    <student number="baizhan001">
        <name>lazy sheep</name>
        <age>10</age>
        <sex>male</sex>
    </student>
    <student number="baizhan002">
        <name>Beautiful sheep</name>
        <age>8</age>
        <sex>female</sex>
    </student>
</students>
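The JDK's JAXP DOM parser can check a document against a DTD when validation is switched on. The sketch below is only an illustration: it embeds the rules of student.dtd as an internal DTD subset (<!DOCTYPE students [ ... ]>) so that it runs without an external file, and the class and method names are our own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdValidationDemo {
    // The rules of student.dtd, embedded as an internal subset so no file is needed
    static final String DOCTYPE =
            "<!DOCTYPE students [\n"
            + "<!ELEMENT students (student*)>\n"
            + "<!ELEMENT student (name,age,sex)>\n"
            + "<!ELEMENT name (#PCDATA)>\n"
            + "<!ELEMENT age (#PCDATA)>\n"
            + "<!ELEMENT sex (#PCDATA)>\n"
            + "<!ATTLIST student number ID #REQUIRED>\n"
            + "]>\n";

    /** Returns true if body (the <students> element) is valid against the DTD above. */
    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are only reported to the handler, so rethrow them to stop parsing
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader("<?xml version=\"1.0\"?>\n" + DOCTYPE + body)));
            return true;
        } catch (SAXException e) {
            return false; // the document broke a DTD rule (or was not well-formed)
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String valid = "<students><student number=\"baizhan001\"><name>lazy sheep</name>"
                + "<age>10</age><sex>male</sex></student></students>";
        String wrongOrder = "<students><student number=\"baizhan001\"><age>10</age>"
                + "<name>lazy sheep</name><sex>male</sex></student></students>";
        String missingNumber = "<students><student><name>lazy sheep</name>"
                + "<age>10</age><sex>male</sex></student></students>";
        System.out.println(isValid(valid));         // follows every rule
        System.out.println(isValid(wrongOrder));    // name, age, sex must appear in this order
        System.out.println(isValid(missingNumber)); // number is #REQUIRED
    }
}
```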

3 Schema constraints

DTD constraints are not used much nowadays; Schema (XSD) is more common.
Importing a Schema:
(1) Write the root element of the XML document.
(2) Declare the xsi prefix, which refers to the XML Schema instance namespace (this identifies the Schema version):
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
(3) Import the Schema file (the namespace and the path are separated by whitespace):
xsi:schemaLocation="namespace-defined-by-the-Schema-file path-of-the-Schema-file"
(4) Declare a prefix for the tags constrained by the Schema:
xmlns:prefix="namespace defined by the Schema file"
For example:
An xsd file is itself an XML file; in other words, Schema constraints use one XML file to constrain another.

enumeration defines an enumerated type: the value must be one of the listed options.

In a pattern, \d matches a single digit, so \d{4} matches exactly four digits.

targetNamespace declares the namespace to which the elements defined in this Schema belong.

student.xsd:

<?xml version="1.0"?>
<xsd:schema xmlns="http://www.itbaizhan.cn/xml"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.itbaizhan.cn/xml" elementFormDefault="qualified">
    <!--students element-->
    <xsd:element name="students" type="studentsType"/>
    <xsd:complexType name="studentsType">
        <xsd:sequence>
            <!--student elements under students: at least 0, no upper limit-->
            <xsd:element name="student" type="studentType" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>

    <xsd:complexType name="studentType">
        <xsd:sequence>
            <!--student contains three elements: name, age and sex-->
            <xsd:element name="name" type="xsd:string"/>
            <xsd:element name="age" type="ageType" />
            <xsd:element name="sex" type="sexType" />
        </xsd:sequence>
        <!--student also has a required number attribute-->
        <xsd:attribute name="number" type="numberType" use="required"/>
    </xsd:complexType>
    <!--Constraints for sexType-->
    <xsd:simpleType name="sexType">
        <!--Is a string-->
        <xsd:restriction base="xsd:string">
            <!--Enumeration: either male or female-->
            <xsd:enumeration value="male"/>
            <xsd:enumeration value="female"/>
        </xsd:restriction>
    </xsd:simpleType>
    <!--Constraints for ageType-->
    <xsd:simpleType name="ageType">
        <!--Is an integer-->
        <xsd:restriction base="xsd:integer">
            <!--The minimum value is 0 and the maximum value is 256-->
            <xsd:minInclusive value="0"/>
            <xsd:maxInclusive value="256"/>
        </xsd:restriction>
    </xsd:simpleType>
    <!--Constraints for numberType-->
    <xsd:simpleType name="numberType">
        <!--Is a string-->
        <xsd:restriction base="xsd:string">
            <!--baizhan_ followed by four digits-->
            <xsd:pattern value="baizhan_\d{4}"/>
        </xsd:restriction>
    </xsd:simpleType>
</xsd:schema> 

A student XML document written according to the constraints above:

<?xml version="1.0"?>
<a:students
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.itbaizhan.cn/xml student.xsd"
        xmlns:a="http://www.itbaizhan.cn/xml">
    <!--The prefix avoids tag-name clashes when many configuration files are used together-->
    <a:student number="baizhan_0001">
        <a:name>Pleasant Sheep</a:name>
        <a:age>10</a:age>
        <a:sex>male</a:sex>
    </a:student>

</a:students>
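The JDK also ships a Schema validator (javax.xml.validation). A minimal sketch, assuming the schema text is the student.xsd above, condensed into a Java string so the example needs no files; the helper names are our own. Because the Schema object is handed to the validator directly, the test documents leave out xsi:schemaLocation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdValidationDemo {
    // The rules of student.xsd, condensed into one string
    static final String XSD = String.join("\n",
        "<xsd:schema xmlns=\"http://www.itbaizhan.cn/xml\"",
        "    xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\"",
        "    targetNamespace=\"http://www.itbaizhan.cn/xml\" elementFormDefault=\"qualified\">",
        "  <xsd:element name=\"students\" type=\"studentsType\"/>",
        "  <xsd:complexType name=\"studentsType\"><xsd:sequence>",
        "    <xsd:element name=\"student\" type=\"studentType\" minOccurs=\"0\" maxOccurs=\"unbounded\"/>",
        "  </xsd:sequence></xsd:complexType>",
        "  <xsd:complexType name=\"studentType\">",
        "    <xsd:sequence>",
        "      <xsd:element name=\"name\" type=\"xsd:string\"/>",
        "      <xsd:element name=\"age\" type=\"ageType\"/>",
        "      <xsd:element name=\"sex\" type=\"sexType\"/>",
        "    </xsd:sequence>",
        "    <xsd:attribute name=\"number\" type=\"numberType\" use=\"required\"/>",
        "  </xsd:complexType>",
        "  <xsd:simpleType name=\"sexType\"><xsd:restriction base=\"xsd:string\">",
        "    <xsd:enumeration value=\"male\"/><xsd:enumeration value=\"female\"/>",
        "  </xsd:restriction></xsd:simpleType>",
        "  <xsd:simpleType name=\"ageType\"><xsd:restriction base=\"xsd:integer\">",
        "    <xsd:minInclusive value=\"0\"/><xsd:maxInclusive value=\"256\"/>",
        "  </xsd:restriction></xsd:simpleType>",
        "  <xsd:simpleType name=\"numberType\"><xsd:restriction base=\"xsd:string\">",
        "    <xsd:pattern value=\"baizhan_\\d{4}\"/>",
        "  </xsd:restriction></xsd:simpleType>",
        "</xsd:schema>");

    /** Builds a one-student document in the http://www.itbaizhan.cn/xml namespace. */
    static String studentsXml(String number, String age, String sex) {
        return "<a:students xmlns:a=\"http://www.itbaizhan.cn/xml\">"
                + "<a:student number=\"" + number + "\">"
                + "<a:name>Pleasant Sheep</a:name>"
                + "<a:age>" + age + "</a:age>"
                + "<a:sex>" + sex + "</a:sex>"
                + "</a:student></a:students>";
    }

    /** Returns true if xml is valid against the schema. */
    static boolean isValid(String xml) throws SAXException, IOException {
        // Compile the schema; an exception here means the schema itself is broken.
        // (Real code would compile it once and reuse the thread-safe Schema object.)
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml))); // throws on the first error
            return true;
        } catch (SAXException e) {
            return false; // the document broke one of the schema's rules
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(studentsXml("baizhan_0001", "10", "male")));    // all rules met
        System.out.println(isValid(studentsXml("baizhan_0001", "300", "male")));   // age above 256
        System.out.println(isValid(studentsXml("baizhan_0001", "10", "unknown"))); // not in the enumeration
        System.out.println(isValid(studentsXml("baizhan_01", "10", "male")));      // pattern needs four digits
    }
}
```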

III XML parsing

1 XML parsing approaches

XML parsing means reading (and writing) the data we need from an XML document. Framework developers use XML parsing to read the parameters configured by framework users, and developers can also use it to read data transmitted over the network.
DOM: loads the whole markup document into memory at once and builds a DOM tree in memory
  • Advantages: easy to work with; supports all CRUD operations (create, read, update, delete) on the document
  • Disadvantages: uses a lot of memory
SAX: reads the document sequentially and is event-driven (the parser reports each start tag, piece of text and end tag as it reaches it)
  • Advantages: uses very little memory; commonly used to read XML in mobile app development
  • Disadvantages: read-only; it cannot add, delete or modify content

DOM is used more in Java development, while SAX is used more in mobile apps.

DOM uses more memory because a Java object is created for every node; SAX processes the document piece by piece and keeps only the current data, so its memory footprint is small.
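Both approaches are available in the JDK through JAXP. The sketch below (class and method names are our own; the XML is a trimmed version of the student file used later) collects the student names once with DOM, where the whole tree is in memory and can be modified, and once with SAX, where a handler reacts to start-tag, text and end-tag events and keeps only what it chooses to store.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class DomVsSaxDemo {
    static final String XML = "<students>"
            + "<student number=\"baizhan_0001\"><name>Pleasant Sheep</name><age>10</age></student>"
            + "<student number=\"baizhan_0002\"><name>Beautiful Sheep</name><age>8</age></student>"
            + "</students>";

    static InputStream input() {
        return new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
    }

    /** DOM: parse the whole document into an in-memory tree. */
    static Document loadDom() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input());
    }

    /** Reads the text of every <name> element from a DOM tree. */
    static List<String> names(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("name");
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getTextContent());
        }
        return names;
    }

    /** SAX: react to parse events; only the text of the current <name> is held in memory. */
    static List<String> namesWithSax() throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside a <name> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equals("name")) current = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) current.append(ch, start, length); // text may arrive in chunks
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("name")) {
                    names.add(current.toString());
                    current = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(input(), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = loadDom();
        System.out.println("DOM: " + names(doc));
        // The in-memory tree can be modified, which SAX cannot do
        doc.getElementsByTagName("name").item(0).setTextContent("Lazy Sheep");
        System.out.println("DOM after update: " + names(doc));
        System.out.println("SAX: " + namesWithSax());
    }
}
```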

2 Common parsers
JAXP: the parser provided by Sun (included in the JDK); supports both DOM and SAX
DOM4J: an excellent parser, based mainly on the DOM approach
Jsoup:
Jsoup is a Java HTML parser that can parse a URL or HTML text directly. It provides a convenient API for extracting and manipulating data with DOM methods, CSS selectors and jQuery-like operations.
PULL: the parser built into the Android operating system; like SAX, it is a streaming (event-based) parser

3 Jsoup parser

(1) Quick start

Steps:
(1) Import the jar package
Create a lib directory under the project, add jsoup-1.11.2.jar to it, right-click the jar, choose Add as Library, and then select Module Library as the level.
(2) Load the XML document into memory and get the DOM tree object, Document
(3) Get the Element objects for the corresponding tags
(4) Get the data

A pitfall:

When getting the absolute path, I got an error saying the system could not find the specified file. One directory in my path was named "Idea 2019", and the space in that name had been encoded as %20.

Solution: turn %20 back into a space, as done with replace("%20"," ") in the code below.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;

public class Demo1 {
    //Print the names of all students in the XML file
    public static void main(String[] args) throws IOException {
        /*
        (2) Load the XML document into memory and get the DOM tree object, Document
        2.1 Find the absolute path of the XML document:
        the class loader turns the file's path relative to the classpath into an absolute path on disk
        */
        //Class loader
        ClassLoader classLoader=Demo1.class.getClassLoader();
        //Get the absolute path
        //getPath() encodes spaces in the path as %20, so turn them back into real spaces
        String path=classLoader.getResource("com/baizhan/xml/xsd/student.xml").getPath().replace("%20"," ");
        //System.out.println(path);

        //2.2 Load the XML document into memory from its path and parse it into a DOM tree object
        Document document= Jsoup.parse(new File(path),"utf-8");
        //Printing it shows an HTML document: Jsoup is an HTML parser, so it wraps the parsed content in <html>, <head> and <body>
        //System.out.println(document);

        /*
        (3) Get the Element objects for the tags
        The methods are similar to those in JavaScript
        */
        //Elements is a collection of Element objects
        //When getting tags by name, include the prefix (a:name); don't forget it
        Elements name=document.getElementsByTag("a:name");
        //(4) Get the data
        for (Element element:name){
            String text=element.text();
            System.out.println(text);
        }
    }
}
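The replace("%20"," ") trick in Demo1 only undoes encoded spaces; any other character that ends up percent-encoded in the URL would still be wrong. A more general option, sketched below with only the JDK, is to convert the URL to a URI and then to a Path, which decodes every escape. The class and method names are our own, and the temporary folder named "Idea 2019" simply mimics the situation described above.

```java
import java.io.File;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ResourcePathDemo {
    /**
     * Converts a file: URL (for example one returned by ClassLoader.getResource) into a File.
     * Going URL -> URI -> Path decodes every percent-escape, not only %20.
     * This does not work for resources packed inside a jar (jar: URLs).
     */
    static File toFile(URL url) throws Exception {
        return Paths.get(url.toURI()).toFile();
    }

    public static void main(String[] args) throws Exception {
        // A temporary folder whose name contains a space, like "Idea 2019"
        Path dir = Files.createTempDirectory("Idea 2019");
        Path xml = Files.createFile(dir.resolve("student.xml"));
        URL url = xml.toUri().toURL();

        System.out.println(url.getPath());                    // the space appears as %20
        System.out.println(new File(url.getPath()).exists()); // the raw path does not point to the file
        System.out.println(toFile(url).exists());             // the decoded path does

        Files.delete(xml);
        Files.delete(dir);
    }
}
```

In Demo1 this would read: String path = ResourcePathDemo.toFile(classLoader.getResource("com/baizhan/xml/xsd/student.xml")).getPath(); with the helper copied into the project.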

(2) Common objects

Jsoup: parses XML or HTML into a DOM tree object.
Common methods:
parse() is overloaded; three commonly used overloads are:
static Document parse(File in, String charsetName): parses a local file
static Document parse(String html): parses an HTML or XML string
static Document parse(URL url, int timeoutMillis): fetches and parses a web page
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.File;
import java.io.IOException;
import java.net.URL;

public class Demo2 {
    public static void main(String[] args) throws IOException {
        //Parsing local files
        //Get absolute path
//        String path=Demo2.class.getClassLoader().getResource(
//                "com/baizhan/xml/xsd/student.xsd").getPath().replace("%20"," ");
//        Document document= Jsoup.parse(new File(path),"utf-8");
//        System.out.println(document);

        //Parse string
//        Document document1=Jsoup.parse("<a:student number=\"baizhan_0001\">\n" +
//                "        <a:name>Pleasant Sheep</a:name>\n" +
//                "        <a:age>10</a:age>\n" +
//                "        <a:sex>male</a:sex>\n" +
//                "    </a:student>\n" +
//                "    <a:student number=\"baizhan_0002\">\n" +
//                "        <a:name>Beautiful sheep</a:name>\n" +
//                "        <a:age>15</a:age>\n" +
//                "        <a:sex>female</a:sex>\n" +
//                "    </a:student>");
//        System.out.println(document1);

        //Parsing network resources
        //The first argument is the URL to fetch; the second is the timeout in milliseconds. If the page does not load within that time, the request fails with a timeout
        Document document=Jsoup.parse(new URL("http://www.baidu.com"),5000);//5s
        System.out.println(document);
    }
}
Document: the DOM tree object of the XML document
Common methods:
Element getElementById(String id): gets the element with the given id
Elements getElementsByTag(String tagName): gets elements by tag name
Elements getElementsByAttribute(String key): gets elements that have the given attribute
Elements getElementsByAttributeValue(String key, String value): gets elements whose attribute key has the given value
Elements select(String cssQuery): selects elements with a CSS selector

student.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<students>
    <student number="baizhan_0001">
        <name>Pleasant Sheep</name>
        <age id="a1">10</age>
        <sex class="hh">male</sex>
    </student>
    <student number="baizhan_0002">
        <name>Beautiful sheep</name>
        <age>8</age>
        <sex>female</sex>
    </student>
</students>

Demo3.java:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;

public class Demo3 {
    public static void main(String[] args) throws IOException {
        String path=Demo3.class.getClassLoader().getResource(
                "com/baizhan/xml/jsoup/student.xml").getPath().replace("%20"," ");
        Document document= Jsoup.parse(new File(path),"utf-8");

        //Get element by id
        Element element1=document.getElementById("a1");
        System.out.println(element1.text());
        System.out.println("-----------------------");
        //Get element by tag name
        Elements elements=document.getElementsByTag("age");
        for (Element element:elements){
            System.out.println(element.text());
        }
        System.out.println("-----------------------");
        //Get elements based on attributes
        Elements elements1=document.getElementsByAttribute("number");
        for (Element element:elements1){
            System.out.println(element);
        }
        System.out.println("-----------------------");
        //Get element according to attribute name = attribute value
        Elements elements2=document.getElementsByAttributeValue("number","baizhan_0001");
        for (Element element:elements2){
            System.out.println(element);
        }
        System.out.println("------------------------------------------");
        //Select elements with a selector (the most convenient way)
        //CSS selector by id
        Elements elements3=document.select("#a1");
        //text() returns the text inside the tag
        System.out.println(elements3.text());
        System.out.println("-----------------------");
        //CSS selector by class
        Elements elements4=document.select(".hh");
        System.out.println(elements4);
        System.out.println("-----------------------");
        //CSS selector by tag name
        Elements elements5=document.select("name");
        System.out.println(elements5);
    }
}
Element: the element object
Common methods:
String text(): gets the plain text contained in the element
String html(): gets the inner HTML of the element (its content, including tags)
String attr(String attributeKey): gets the value of the given attribute of the element
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;

public class Demo4 {
    public static void main(String[] args) throws IOException {
        String path=Demo4.class.getClassLoader().getResource(
                "com/baizhan/xml/jsoup/student.xml").getPath().replace("%20"," ");
        Document document= Jsoup.parse(new File(path),"utf-8");
        Elements elements=document.getElementsByAttributeValue("number","baizhan_0001");
        for (Element element:elements){
            //Gets the plain text contained in the element
            System.out.println(element.text());
            System.out.println("------------------------");
            //Gets the tagged text contained in the element
            System.out.println(element.html());
            System.out.println("------------------------");
            //Gets the attribute value of the element.
            System.out.println(element.attr("number"));
        }
    }
}
4 XPath parsing
XPath (XML Path Language) is a language for locating parts of a markup language document.
Usage:
1. Import the XPath jar package
Add JsoupXpath-0.3.2.jar, right-click it, choose Add as Library, and then select Module Library as the level
2. Get the Document object
3. Convert the Document object to a JXDocument object
4. Call selN(String xpath) on the JXDocument to get a List<JXNode>
5. Traverse the List<JXNode> and call getElement() on each JXNode to turn it into an Element object
6. Process the Element objects
 
import cn.wanghaomiao.xpath.exception.XpathSyntaxErrorException;
import cn.wanghaomiao.xpath.model.JXDocument;
import cn.wanghaomiao.xpath.model.JXNode;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.File;
import java.io.IOException;
import java.util.List;

public class XPathDemo {
    public static void main(String[] args) throws IOException, XpathSyntaxErrorException {
        //2. Get Document object
        String path=XPathDemo.class.getClassLoader().getResource(
                "com/baizhan/xml/jsoup/student.xml").getPath().replace("%20"," ");
        Document document= Jsoup.parse(new File(path),"utf-8");
        // 3. Convert Document object to JXDocument object
        JXDocument jxDocument=new JXDocument(document);
        // 4. Call selN(String xpath) on the JXDocument to get a List<JXNode>
        //Get all name tags (see the W3C documentation for the full XPath syntax)
        //List<JXNode> jxNodes=jxDocument.selN("//name");
        //Find the student element whose number attribute is baizhan_0002
        //List<JXNode> jxNodes=jxDocument.selN("//student[@number='baizhan_0002']");
        //Get the age of baizhan_0001: the age element under the student whose number attribute is baizhan_0001
        List<JXNode> jxNodes=jxDocument.selN("//student[@number='baizhan_0001']/age");
        // 5. Traverse the List<JXNode> and call getElement() on each JXNode to turn it into an Element object
        for (JXNode jxNode:jxNodes){
            Element element=jxNode.getElement();
            // 6. Process the Element object.
            System.out.println(element.text());
        }
    }
}
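The JDK also has a built-in XPath engine (javax.xml.xpath), which can be handy when adding the JsoupXpath jar is not an option. The sketch below is an alternative to the JsoupXpath approach above, not a replacement for it: it runs the same kind of expressions against the student.xml content, inlined as a string, and the helper name select is our own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class JdkXPathDemo {
    // Same content as student.xml above
    static final String XML = "<students>"
            + "<student number=\"baizhan_0001\"><name>Pleasant Sheep</name>"
            + "<age id=\"a1\">10</age><sex class=\"hh\">male</sex></student>"
            + "<student number=\"baizhan_0002\"><name>Beautiful sheep</name>"
            + "<age>8</age><sex>female</sex></student>"
            + "</students>";

    /** Evaluates an XPath expression against XML and returns the text of every matching node. */
    static List<String> select(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(select("//name"));                                 // all name elements
        System.out.println(select("//student[@number='baizhan_0002']/name")); // name of one student
        System.out.println(select("//student[@number='baizhan_0001']/age"));  // age of baizhan_0001
        System.out.println(select("//student/@number"));                      // attribute values
    }
}
```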

 

Keywords: JavaEE xml xpath

Added by pnj on Fri, 04 Feb 2022 14:27:07 +0200