I XML overview
- XML syntax is strict, whereas HTML syntax is loose
- XML tags are user-defined, whereas HTML tags are predefined
- The file suffix is .xml
- The XML declaration is optional in XML 1.0, but when present it must be the first line of the document
- There is exactly one root tag
- Attribute values must be enclosed in quotation marks (either single or double)
- Tags must be closed correctly
- Tag names are case sensitive
- Tag names may contain letters, digits, and a few other characters (hyphens, underscores, periods)
- Tag names cannot start with a digit or punctuation (an underscore is allowed)
- Tag names cannot contain spaces
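The rules above can be checked mechanically, because any XML parser rejects a document that breaks them. The following is a minimal sketch that uses the DOM parser bundled with the JDK; the class name `WellFormedCheck` and the sample strings are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML.
class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // Any parse failure means one of the syntax rules was broken.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<students><student number='1'/></students>", // follows all rules
            "<a><b></a>",                                 // b is never closed
            "<a x=1/>",                                   // attribute value not quoted
            "<a/><b/>",                                   // two root tags
            "<Name></name>",                              // tag names are case sensitive
            "<1a/>"                                       // name starts with a digit
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```

Only the first sample should be accepted; each of the others violates exactly one rule from the list.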
II Constraints
1. What are constraints
2. DTD constraints
student.dtd:
    <!ELEMENT students (student*)>
    <!ELEMENT student (name,age,sex)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT age (#PCDATA)>
    <!ELEMENT sex (#PCDATA)>
    <!ATTLIST student number ID #REQUIRED>
Explanation:
- <!ELEMENT students (student*)>: the * means zero or more, so a students node can contain any number of student nodes. students is the root node because the DOCTYPE declaration of the XML document names it, not because it comes first in the DTD.
- <!ELEMENT student (name,age,sex)>: each student node contains one name node, one age node and one sex node, and they must appear in exactly this order.
- <!ELEMENT name (#PCDATA)>: PCDATA means parsed character data, that is, text. The name node contains text; the same applies to age and sex.
- <!ATTLIST student number ID #REQUIRED>: the student node has a number attribute of type ID. ID values cannot be repeated within a document, and #REQUIRED means the attribute must be present.
A students XML document written according to the above constraints:
    <?xml version="1.0" encoding="utf-8" ?>
    <!DOCTYPE students SYSTEM "student.dtd">
    <students>
        <student number="baizhan001">
            <name>lazy sheep</name>
            <age>10</age>
            <sex>male</sex>
        </student>
        <student number="baizhan002">
            <name>Beautiful sheep</name>
            <age>8</age>
            <sex>female</sex>
        </student>
    </students>
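To see the DTD rules being enforced, a document can be run through a validating parser. The sketch below uses the JDK's built-in parser and, as an assumption made only to keep the example self-contained, places the same declarations in an internal DOCTYPE subset instead of the external student.dtd file. The class name `DtdValidation` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper: validates a students document against the DTD rules.
class DtdValidation {

    // Same declarations as student.dtd, embedded as an internal subset.
    static final String DOCTYPE =
          "<!DOCTYPE students [\n"
        + "  <!ELEMENT students (student*)>\n"
        + "  <!ELEMENT student (name,age,sex)>\n"
        + "  <!ELEMENT name (#PCDATA)>\n"
        + "  <!ELEMENT age (#PCDATA)>\n"
        + "  <!ELEMENT sex (#PCDATA)>\n"
        + "  <!ATTLIST student number ID #REQUIRED>\n"
        + "]>\n";

    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validation problems are reported through error(); rethrow them.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String ok = "<students><student number='baizhan001'>"
                  + "<name>lazy sheep</name><age>10</age><sex>male</sex></student></students>";
        String wrongOrder = "<students><student number='baizhan001'>"
                  + "<age>10</age><name>lazy sheep</name><sex>male</sex></student></students>";
        String noNumber = "<students><student>"
                  + "<name>lazy sheep</name><age>10</age><sex>male</sex></student></students>";
        System.out.println("valid document:   " + isValid(ok));
        System.out.println("wrong order:      " + isValid(wrongOrder));
        System.out.println("missing number:   " + isValid(noNumber));
    }
}
```

Without `setValidating(true)` the parser only checks that the document is well formed and would accept all three samples.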
3. Schema constraints
xsd:enumeration defines an enumeration type: only one of the listed values can be used.
\d matches a digit, so \d{4} matches exactly four digits.
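The pattern used for the number attribute can be tried out with Java's regular expressions. This is only a rough sketch: `NumberPatternDemo` is a made-up class, and Java's `\d` matches ASCII digits by default, whereas the XSD `\d` also accepts other Unicode digits. `matches()` is used because an xsd:pattern must match the whole value.

```java
import java.util.regex.Pattern;

// Hypothetical helper: mirrors the xsd:pattern "baizhan_\d{4}" in Java.
class NumberPatternDemo {

    static final Pattern NUMBER = Pattern.compile("baizhan_\\d{4}");

    static boolean isValidNumber(String value) {
        // matches() requires the entire string to fit the pattern.
        return NUMBER.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidNumber("baizhan_0001"));  // four digits
        System.out.println(isValidNumber("baizhan_001"));   // only three digits
        System.out.println(isValidNumber("baizhan_00001")); // five digits
    }
}
```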
targetNamespace is the namespace that the elements and types declared in this schema belong to.
student.xsd:
    <?xml version="1.0"?>
    <xsd:schema xmlns="http://www.itbaizhan.cn/xml"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.itbaizhan.cn/xml"
                elementFormDefault="qualified">
        <!-- The students tag -->
        <xsd:element name="students" type="studentsType"/>
        <xsd:complexType name="studentsType">
            <xsd:sequence>
                <!-- The student tag: students contains at least 0 student tags, with no upper limit -->
                <xsd:element name="student" type="studentType" minOccurs="0" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
        <xsd:complexType name="studentType">
            <xsd:sequence>
                <!-- student contains three tags: name, age, sex -->
                <xsd:element name="name" type="xsd:string"/>
                <xsd:element name="age" type="ageType"/>
                <xsd:element name="sex" type="sexType"/>
            </xsd:sequence>
            <!-- student also has a number attribute, which is required -->
            <xsd:attribute name="number" type="numberType" use="required"/>
        </xsd:complexType>
        <!-- Constraints for sexType -->
        <xsd:simpleType name="sexType">
            <!-- It is a string -->
            <xsd:restriction base="xsd:string">
                <!-- Enumeration type: either male or female -->
                <xsd:enumeration value="male"/>
                <xsd:enumeration value="female"/>
            </xsd:restriction>
        </xsd:simpleType>
        <!-- Constraints for ageType -->
        <xsd:simpleType name="ageType">
            <!-- It is an integer -->
            <xsd:restriction base="xsd:integer">
                <!-- The minimum value is 0 and the maximum value is 256 -->
                <xsd:minInclusive value="0"/>
                <xsd:maxInclusive value="256"/>
            </xsd:restriction>
        </xsd:simpleType>
        <!-- Constraints for numberType -->
        <xsd:simpleType name="numberType">
            <!-- It is a string -->
            <xsd:restriction base="xsd:string">
                <!-- The prefix baizhan_ followed by four digits -->
                <xsd:pattern value="baizhan_\d{4}"/>
            </xsd:restriction>
        </xsd:simpleType>
    </xsd:schema>
A students XML document written according to the above constraints:
    <?xml version="1.0"?>
    <a:students xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://www.itbaizhan.cn/xml student.xsd"
                xmlns:a="http://www.itbaizhan.cn/xml">
        <!-- The prefix avoids tag name clashes when several schemas are used together -->
        <a:student number="baizhan_0001">
            <a:name>Pleasant Sheep</a:name>
            <a:age>10</a:age>
            <a:sex>male</a:sex>
        </a:student>
    </a:students>
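Schema validation can also be run from Java with the `javax.xml.validation` API that ships with the JDK. The sketch below is an illustration under assumptions: it embeds a trimmed-down copy of student.xsd as a string (the age element is left out for brevity), and the class `XsdValidation` and helper `studentsDoc` are made-up names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper: validates a document against a reduced student.xsd.
class XsdValidation {

    static final String NS = "http://www.itbaizhan.cn/xml";

    // Reduced schema: students > student* > (name, sex), plus the number attribute.
    static final String XSD =
          "<xsd:schema xmlns='" + NS + "'"
        + " xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
        + "<xsd:element name='students'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='student' type='studentType' minOccurs='0' maxOccurs='unbounded'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "<xsd:complexType name='studentType'><xsd:sequence>"
        + "<xsd:element name='name' type='xsd:string'/>"
        + "<xsd:element name='sex' type='sexType'/>"
        + "</xsd:sequence>"
        + "<xsd:attribute name='number' type='numberType' use='required'/>"
        + "</xsd:complexType>"
        + "<xsd:simpleType name='sexType'><xsd:restriction base='xsd:string'>"
        + "<xsd:enumeration value='male'/><xsd:enumeration value='female'/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "<xsd:simpleType name='numberType'><xsd:restriction base='xsd:string'>"
        + "<xsd:pattern value='baizhan_\\d{4}'/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "</xsd:schema>";

    // Builds a one-student document using the a: prefix for the target namespace.
    static String studentsDoc(String number, String sex) {
        return "<a:students xmlns:a='" + NS + "'>"
             + "<a:student number='" + number + "'>"
             + "<a:name>Pleasant Sheep</a:name>"
             + "<a:sex>" + sex + "</a:sex>"
             + "</a:student></a:students>";
    }

    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // validate() throws SAXException on the first constraint violation.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(studentsDoc("baizhan_0001", "male")));    // satisfies the schema
        System.out.println(isValid(studentsDoc("baizhan_0001", "unknown"))); // sex not in the enumeration
        System.out.println(isValid(studentsDoc("baizhan_1", "male")));       // number does not fit the pattern
    }
}
```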
III XML parsing
1. XML parsing approaches
DOM: load the whole document into memory and build a tree of objects.
- Advantages: it is easy to use and supports all CRUD operations (create, read, update, delete) on the document
- Disadvantages: it uses a lot of memory
SAX: read the document sequentially and fire an event for each piece as it is read.
- Advantages: it uses very little memory and is commonly used to read XML in mobile app development
- Disadvantages: it can only read the document; it cannot add, delete or modify content
DOM is used more in Java development, and SAX is used more in mobile apps.
DOM uses more memory because every node becomes a Java object. SAX reads the document sequentially and keeps only the piece currently being read, so its memory use stays small.
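The difference between the two approaches is easier to see side by side. The sketch below reads the student names from the same XML string twice, once with the JDK's DOM parser and once with its SAX parser. The class `DomVersusSax`, its method names and the sample document are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical comparison of DOM and SAX on the same input.
class DomVersusSax {

    static final String XML =
          "<students>"
        + "<student number='baizhan_0001'><name>Pleasant Sheep</name><age>10</age></student>"
        + "<student number='baizhan_0002'><name>Beautiful sheep</name><age>8</age></student>"
        + "</students>";

    // DOM: the whole tree is built first, then queried.
    static List<String> namesWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("name");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // SAX: callbacks fire while the input is read; no tree is kept.
    static List<String> namesWithSax(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a name element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("name".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName)) {
                    names.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("DOM: " + namesWithDom(XML));
        System.out.println("SAX: " + namesWithSax(XML));
    }
}
```

Both methods return the same names. The DOM version could go on to modify the tree, while the SAX version only ever sees one event at a time.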
2. Jsoup parser
(1) Quick start
Bug:
When obtaining the absolute path, the program reported that the system could not find the specified file. One directory in my path was named "Idea 2019", and the space in the middle had been encoded as %20.
Solution: replace %20 in the path with a space, as the code below does.
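Replacing %20 by hand fixes spaces only. An alternative that may be worth considering is to convert the URL to a URI and let `java.io.File` decode every escaped character. The sketch below reproduces the problem with a temporary directory whose name contains a space; `ResourcePathDemo`, `toFile` and `sampleUrlWithSpace` are hypothetical names, and the temporary file stands in for a classpath resource.

```java
import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper: turns a file: URL into a File without manual %20 handling.
class ResourcePathDemo {

    // new File(URI) decodes %20 and any other percent-escapes in the path.
    static File toFile(URL url) {
        try {
            return new File(url.toURI());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + url, e);
        }
    }

    // Creates student.xml inside a temp directory whose name contains spaces
    // and returns its file: URL, similar to what getResource() would return.
    static URL sampleUrlWithSpace() {
        try {
            File dir = Files.createTempDirectory("Idea 2019 ").toFile();
            File xml = new File(dir, "student.xml");
            Files.write(xml.toPath(), "<students/>".getBytes(StandardCharsets.UTF_8));
            return xml.toURI().toURL();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        URL url = sampleUrlWithSpace();
        System.out.println("URL path:        " + url.getPath());
        System.out.println("raw path exists: " + new File(url.getPath()).exists());
        System.out.println("decoded exists:  " + toFile(url).exists());
    }
}
```

In the demos, the equivalent call would be `new File(classLoader.getResource(...).toURI())`, assuming the resource is a plain file on disk and not an entry inside a jar.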
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    import java.io.File;
    import java.io.IOException;

    public class Demo1 {
        // Get all student names in the XML document
        public static void main(String[] args) throws IOException {
            /*
              (2) Load the XML document into memory and get the DOM tree object, Document
              2.1 Find the absolute path of the XML document:
                  use the class loader to turn the file's path relative to the
                  classpath into an absolute path on disk
             */
            // Class loader
            ClassLoader classLoader = Demo1.class.getClassLoader();
            // Get the absolute path.
            // Spaces in the path come back as %20, so replace() restores them.
            String path = classLoader.getResource("com/baizhan/xml/xsd/student.xml")
                    .getPath().replace("%20", " ");
            // System.out.println(path);

            // 2.2 Load the XML document at that path into memory and parse it into a DOM tree
            Document document = Jsoup.parse(new File(path), "utf-8");
            // Printing the document shows it wrapped in an HTML skeleton: Jsoup was
            // built for HTML, so by default it parses the input as an HTML document.
            // System.out.println(document);

            /*
              (3) Get the Element objects for the tags of interest.
                  The methods are similar to the DOM methods in JavaScript.
             */
            // Elements is a collection of Element objects.
            // The tags carry a prefix, so the prefix must be part of the tag name.
            Elements name = document.getElementsByTag("a:name");

            // (4) Get the data
            for (Element element : name) {
                String text = element.text();
                System.out.println(text);
            }
        }
    }
(2) Common objects
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    import java.io.File;
    import java.io.IOException;
    import java.net.URL;

    public class Demo2 {
        public static void main(String[] args) throws IOException {
            // Parse a local file
            // Get the absolute path
            // String path = Demo2.class.getClassLoader().getResource(
            //         "com/baizhan/xml/xsd/student.xsd").getPath().replace("%20", " ");
            // Document document = Jsoup.parse(new File(path), "utf-8");
            // System.out.println(document);

            // Parse a string
            // Document document1 = Jsoup.parse("<a:student number=\"baizhan_0001\">\n" +
            //         "    <a:name>Pleasant Goat</a:name>\n" +
            //         "    <a:age>10</a:age>\n" +
            //         "    <a:sex>male</a:sex>\n" +
            //         "  </a:student>\n" +
            //         "  <a:student number=\"baizhan_0002\">\n" +
            //         "    <a:name>meiyang</a:name>\n" +
            //         "    <a:age>15</a:age>\n" +
            //         "    <a:sex>female</a:sex>\n" +
            //         "  </a:student>");
            // System.out.println(document1);

            // Parse a network resource.
            // The first parameter is the URL to fetch; the second is the maximum
            // waiting time in milliseconds. If it is exceeded, the request times out.
            Document document = Jsoup.parse(new URL("http://www.baidu.com"), 5000); // 5 s
            System.out.println(document);
        }
    }
student.xml:
    <?xml version="1.0" encoding="UTF-8" ?>
    <students>
        <student number="baizhan_0001">
            <name>Pleasant Sheep</name>
            <age id="a1">10</age>
            <sex class="hh">male</sex>
        </student>
        <student number="baizhan_0002">
            <name>Beautiful sheep</name>
            <age>8</age>
            <sex>female</sex>
        </student>
    </students>
Demo3.java:
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    import java.io.File;
    import java.io.IOException;

    public class Demo3 {
        public static void main(String[] args) throws IOException {
            String path = Demo3.class.getClassLoader().getResource(
                    "com/baizhan/xml/jsoup/student.xml").getPath().replace("%20", " ");
            Document document = Jsoup.parse(new File(path), "utf-8");

            // Get an element by id
            Element element1 = document.getElementById("a1");
            System.out.println(element1.text());
            System.out.println("-----------------------");

            // Get elements by tag name
            Elements elements = document.getElementsByTag("age");
            for (Element element : elements) {
                System.out.println(element.text());
            }
            System.out.println("-----------------------");

            // Get elements that have a given attribute
            Elements elements1 = document.getElementsByAttribute("number");
            for (Element element : elements1) {
                System.out.println(element);
            }
            System.out.println("-----------------------");

            // Get elements whose attribute has a given value
            Elements elements2 = document.getElementsByAttributeValue("number", "baizhan_0001");
            for (Element element : elements2) {
                System.out.println(element);
            }
            System.out.println("------------------------------------------");

            // Select elements with a CSS selector (the quickest way to write a query)
            // Select by id
            Elements elements3 = document.select("#a1");
            // text() returns the text inside the tag
            System.out.println(elements3.text());
            System.out.println("-----------------------");

            // Select by class
            Elements elements4 = document.select(".hh");
            System.out.println(elements4);
            System.out.println("-----------------------");

            // Select by tag name
            Elements elements5 = document.select("name");
            System.out.println(elements5);
        }
    }
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    import java.io.File;
    import java.io.IOException;

    public class Demo4 {
        public static void main(String[] args) throws IOException {
            String path = Demo4.class.getClassLoader().getResource(
                    "com/baizhan/xml/jsoup/student.xml").getPath().replace("%20", " ");
            Document document = Jsoup.parse(new File(path), "utf-8");
            Elements elements = document.getElementsByAttributeValue("number", "baizhan_0001");
            for (Element element : elements) {
                // Get the plain text contained in the element
                System.out.println(element.text());
                System.out.println("------------------------");
                // Get the element's content including the nested tags
                System.out.println(element.html());
                System.out.println("------------------------");
                // Get the value of one of the element's attributes
                System.out.println(element.attr("number"));
            }
        }
    }
    import cn.wanghaomiao.xpath.exception.XpathSyntaxErrorException;
    import cn.wanghaomiao.xpath.model.JXDocument;
    import cn.wanghaomiao.xpath.model.JXNode;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    import java.io.File;
    import java.io.IOException;
    import java.util.List;

    public class XPathDemo {
        public static void main(String[] args) throws IOException, XpathSyntaxErrorException {
            // 2. Get the Document object
            String path = XPathDemo.class.getClassLoader().getResource(
                    "com/baizhan/xml/jsoup/student.xml").getPath().replace("%20", " ");
            Document document = Jsoup.parse(new File(path), "utf-8");

            // 3. Convert the Document object to a JXDocument object
            JXDocument jxDocument = new JXDocument(document);

            // 4. Call selN(String xpath) on the JXDocument to get a List<JXNode>.
            // Get all name tags; see the W3C documentation for the XPath rules.
            // List<JXNode> jxNodes = jxDocument.selN("//name");
            // Find the student element whose number attribute is baizhan_0002
            // List<JXNode> jxNodes = jxDocument.selN("//student[@number='baizhan_0002']");
            // Find the age tag under the student element whose number attribute is baizhan_0001
            List<JXNode> jxNodes = jxDocument.selN("//student[@number='baizhan_0001']/age");

            // 5. Traverse the List<JXNode> and call getElement() on each JXNode to get an Element object
            for (JXNode jxNode : jxNodes) {
                Element element = jxNode.getElement();
                // 6. Process the Element object
                System.out.println(element.text());
            }
        }
    }
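The same XPath expressions can be tried without any third-party library, because the JDK includes an XPath engine in `javax.xml.xpath`. This is a different technique from the JsoupXpath approach used above, shown only as a sketch for experimenting with expressions. The class `JdkXPathDemo` and its `select` method are made-up names, and the document is embedded as a string so that the example is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates XPath expressions with the JDK's own engine.
class JdkXPathDemo {

    // Same content as student.xml, embedded for the sake of the example.
    static final String XML =
          "<students>"
        + "<student number='baizhan_0001'><name>Pleasant Sheep</name>"
        + "<age id='a1'>10</age><sex class='hh'>male</sex></student>"
        + "<student number='baizhan_0002'><name>Beautiful sheep</name>"
        + "<age>8</age><sex>female</sex></student>"
        + "</students>";

    // Returns the text content of every node matched by the expression.
    static List<String> select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("//name"));
        System.out.println(select("//student[@number='baizhan_0002']/name"));
        System.out.println(select("//student[@number='baizhan_0001']/age"));
    }
}
```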