Four common ways of parsing XML

1. Introduction to XML

XML (eXtensible Markup Language) is a meta-markup language. It does not provide a fixed set of tags. Instead, users define the semantic tags they need. HTML (HyperText Markup Language), by contrast, can only use a predefined set of tags. Both are markup languages, but HTML describes how content is displayed, while XML describes the data itself. An XML document is organized as a tree of elements (the tree model).
Reasons for using XML:
- exchanging data between different applications (for example, a booking system and a payment system)
- exchanging data between different platforms (Mac and Windows)
- sharing data between different clients (a website and a mobile app)
In each case, different systems are linked through the same XML file.
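The article never shows web/books.xml itself. The sketch below assumes a small, hypothetical books.xml with the shape the later code expects. It has a bookstore root and book elements with an id attribute. Each book has name, author, year, price and language children, all of which are user-defined tags. The sketch prints the element tree using the JDK's DOM parser. The class name BookTreeDemo and the sample values are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BookTreeDemo {
    // Hypothetical sample data; the real web/books.xml is not shown in the article.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<bookstore>"
        + "<book id=\"1\"><name>Book A</name><author>Author A</author><year>2014</year><price>89</price></book>"
        + "<book id=\"2\"><name>Book B</name><year>2004</year><price>77</price><language>English</language></book>"
        + "</bookstore>";

    // Parses XML text into a DOM tree with the JDK parser (no extra jars).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Appends one line per element, indented two spaces per level of depth.
    static void describe(Element e, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(e.getTagName()).append('\n');
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                describe((Element) child, depth + 1, out);
            }
        }
    }

    // Returns the indented element tree of the given XML text.
    static String tree(String xml) throws Exception {
        StringBuilder out = new StringBuilder();
        describe(parse(xml).getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(tree(SAMPLE));
    }
}
```

The output lists bookstore first. Each book appears under it, indented one level. The children of each book appear one level deeper.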
2. Four ways of parsing XML files

2.1 DOM parsing

DOM (Document Object Model) is a W3C programming interface specification for HTML and XML documents. It is independent of platform and language. Using the DOM, a program can convert between an XML document and a DOM document, and it can traverse and manipulate the content of the DOM document. The core of the DOM specification is the tree model. The whole document is read into memory and built into a tree before it is processed.
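The following is a minimal sketch of the XML-to-DOM-and-back conversion described above, using only the JDK (javax.xml.parsers and javax.xml.transform). It loads the XML text into an in-memory tree, modifies the tree, and serializes it back to XML. The class name, the method name modify(), and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class DomRoundTrip {
    // XML text -> DOM tree (the whole document is loaded into memory).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // DOM tree -> XML text, using the JDK's identity Transformer.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Changes the first book's id attribute and appends a <price> child, then serializes.
    static String modify(String xml, String newId, String price) throws Exception {
        Document doc = parse(xml);
        Element book = (Element) doc.getElementsByTagName("book").item(0);
        book.setAttribute("id", newId);
        Element priceEl = doc.createElement("price");
        priceEl.setTextContent(price);
        book.appendChild(priceEl);
        return serialize(doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<bookstore><book id=\"1\"><name>Book A</name></book></bookstore>";
        System.out.println(modify(xml, "42", "89"));
    }
}
```

In this sketch, every change is applied to the in-memory tree, and the XML text is regenerated only at the end. The round trip is cheap for small files but holds the full document in memory.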

2.2 JDOM parsing

JDOM combines Java and DOM. It is an open-source Java API for reading, writing and manipulating XML. Its goal is to give Java code a complete, Java-based way to access, manipulate and output XML data. It is designed to be simple, efficient and optimized for Java.

2.3 SAX parsing

SAX (Simple API for XML) is not an official W3C standard. It is a de facto, community-developed standard. Conceptually, SAX is completely different from DOM. Instead of building a document tree, it is event-driven. Event-driven means the program works through callbacks. The parser reads the document sequentially and calls a handler method for each start tag, piece of text, and end tag it encounters. As a result, an outer element's start tag is reported before the elements nested inside it.
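To make the event-driven model concrete, here is a sketch of a DefaultHandler that simply logs each callback. The class name and the input are hypothetical. For a nested document, the log shows the outer start tag first, then the inner elements and their text, and then the end tags in reverse order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEventOrder {
    // Records each callback the parser makes, in the order it makes them.
    static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Ignore whitespace-only text between tags
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text " + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        events("<bookstore><book id=\"1\"><name>Book A</name></book></bookstore>")
                .forEach(System.out::println);
    }
}
```

For the input in main, the log is: start bookstore, start book, start name, text Book A, end name, end book, end bookstore. No tree is ever built, and the handler only sees each event once.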

2.4 DOM4j parsing

dom4j is an open-source Java XML API for reading and writing XML files, similar to JDOM. It is known for good performance, rich functionality and ease of use.

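2.5 Goal

The goal of parsing is to get all the data in the XML file. In the DOM, every piece of that data is a node. The three node types used most often are:

Node type   nodeType (constant)   getNodeName() returns   getNodeValue() returns
Element     1 (ELEMENT_NODE)      element name            null
Attr        2 (ATTRIBUTE_NODE)    attribute name          attribute value
Text        3 (TEXT_NODE)         #text                   text content of the node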
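The table can be checked with a short sketch. It prints nodeType, getNodeName() and getNodeValue() for an element, one of its attributes, and its text child. The class name and the one-element document are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NodeTypeDemo {
    // nodeType, getNodeName() and getNodeValue() of one node, separated by spaces
    static String describe(Node n) {
        return n.getNodeType() + " " + n.getNodeName() + " " + n.getNodeValue();
    }

    // Describes the root element, its id attribute, and its first (text) child
    static String[] describeNodes(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element book = doc.getDocumentElement();
        return new String[] {
            describe(book),                         // Element
            describe(book.getAttributeNode("id")),  // Attr
            describe(book.getFirstChild())          // Text
        };
    }

    public static void main(String[] args) throws Exception {
        for (String line : describeNodes("<book id=\"1\">Book A</book>")) {
            System.out.println(line);
        }
    }
}
```

The sketch prints one line per node: 1 book null, 2 id 1, and 3 #text Book A. These lines match the three rows of the table.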

3. XML parsing examples

3.1 DOM parsing example

1. Create a DocumentBuilderFactory object
2. Create a DocumentBuilder object (newDocumentBuilder() throws ParserConfigurationException, which must be handled)
3. Parse the XML file with DocumentBuilder's parse(String uri) method
4. parse() returns an org.w3c.dom.Document object
5. Get all book nodes in the XML with getElementsByTagName("book"), which returns a NodeList
6. Get the number of book nodes with NodeList's getLength() method
7. Traverse each book node {
Get each node with NodeList's item() method
Get all attributes of the node with getAttributes()
Traverse all the attributes
}
8. Alternatively, if you know the attribute name, cast the node to Element and read the value directly with getAttribute()
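The steps above can be condensed into two methods. This sketch rests on some assumptions: the XML comes from an in-memory string instead of web/books.xml, and the hypothetical methods bookAttributes() and bookIds() return the data instead of printing it. bookAttributes() follows steps 1 to 7. bookIds() follows step 8.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomSteps {
    // Steps 1-7: one name -> value map per book element
    static List<Map<String, String>> bookAttributes(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); // step 1
        DocumentBuilder builder = factory.newDocumentBuilder();                // step 2
        Document doc = builder.parse(new InputSource(new StringReader(xml)));  // steps 3-4
        NodeList books = doc.getElementsByTagName("book");                     // step 5
        List<Map<String, String>> result = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {                          // steps 6-7
            NamedNodeMap attrs = books.item(i).getAttributes();
            Map<String, String> map = new LinkedHashMap<>();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node attr = attrs.item(j);
                map.put(attr.getNodeName(), attr.getNodeValue());
            }
            result.add(map);
        }
        return result;
    }

    // Step 8: when the attribute name is known, read it directly from the Element
    static List<String> bookIds(String xml) throws Exception {
        NodeList books = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getElementsByTagName("book");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            // getAttribute() returns "" if the attribute is absent
            ids.add(((Element) books.item(i)).getAttribute("id"));
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<bookstore><book id=\"1\"/><book id=\"2\"/></bookstore>";
        System.out.println(bookAttributes(xml));
        System.out.println(bookIds(xml));
    }
}
```

The DOM specification does not guarantee any order for the nodes in a NamedNodeMap. It is therefore safer to look attributes up by name than by position.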

Getting attribute values
See Code-1
Getting child nodes
See Code-2
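For an indented file, getChildNodes() also returns the whitespace text nodes between the tags. The child-node count printed in Code-2 is therefore larger than the number of child elements. The sketch below adds a small helper, childElements() (a hypothetical name), that keeps only element nodes. The sample input is illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ChildElements {
    // Hypothetical indented input: whitespace text sits between the tags
    static final String INDENTED = "<book>\n  <name>Book A</name>\n  <year>2014</year>\n</book>";

    // Keeps only element children; getChildNodes() also returns text and comment nodes
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    // Returns {number of child nodes, number of child elements} of the first book
    static int[] counts(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node book = doc.getElementsByTagName("book").item(0);
        return new int[] { book.getChildNodes().getLength(), childElements(book).size() };
    }

    public static void main(String[] args) throws Exception {
        int[] c = counts(INDENTED);
        System.out.println("child nodes: " + c[0] + ", child elements: " + c[1]);
    }
}
```

For this input, getChildNodes() returns 5 nodes: three whitespace text nodes and two elements. childElements() returns 2.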

3.2 JDOM parsing example (JDOM is not part of the official Java API)
1. JDOM is a third-party library, so its jar must be added to the project
2. Create a SAXBuilder object
3. Create an input stream and load the books.xml file into it
4. Pass the input stream to saxBuilder's build() method, which returns a Document
5. Get the root node of the XML file with the Document's getRootElement() method
6. Get the list of the root node's child nodes with the root's getChildren() method

Getting attributes and child nodes
See Code-4

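Garbled characters when parsing with JDOM:
Option 1: make the encoding declaration in the XML file match the encoding the file is actually saved in, for example encoding="UTF-8" for a UTF-8 file.
Option 2: fix the encoding without modifying the XML file. Wrap the input stream in an InputStreamReader with an explicit charset, and pass the reader to build():
InputStreamReader isr = new InputStreamReader(in, "UTF-8");
Document doc = saxBuilder.build(isr);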
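JDOM's SAXBuilder has a build(Reader) overload, so a reader such as isr can be passed to it directly. JDOM is not part of the JDK, so the sketch below shows the same idea with the JDK's DOM parser. An InputStreamReader decodes the bytes with an explicit charset. When a parser is given a character stream, it ignores the encoding declared in the file. The input is hypothetical: UTF-8 bytes whose declaration wrongly says ISO-8859-1. The class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ExplicitCharset {
    // Hypothetical input: UTF-8 bytes with a mislabeled encoding declaration
    static byte[] mislabeledSample() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><book><name>Caf\u00e9</name></book>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes the stream as UTF-8 before the parser sees it, then reads the first <name>
    static String firstName(InputStream in) throws Exception {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(reader));
            return doc.getElementsByTagName("name").item(0).getTextContent();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstName(new ByteArrayInputStream(mislabeledSample())));
    }
}
```

Passing the raw byte stream instead would make the parser trust the declaration and misread the two UTF-8 bytes of the accented letter as two ISO-8859-1 characters. Passing a Reader keeps the charset under the program's control.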

3.3 SAX parsing example
Key handler callbacks: startElement(), characters(), endElement()
1. Create a factory instance with SAXParserFactory's static newInstance() method
2. Create a parser instance with the factory's newSAXParser() method
3. Write a class that extends DefaultHandler and overrides its callback methods with your processing logic, then create an instance of it
4. Pass the XML file and the handler instance to the parser's parse() method

Getting attributes and child nodes
See Code-3
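The sketch below follows the four steps above and handles one detail explicitly. The parser may split an element's text across several characters() calls. The text is therefore accumulated in a StringBuilder and read in endElement(). The class names and the in-memory input are hypothetical. This is a sketch, not a drop-in replacement for SAXParseHandler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxSteps {
    // Step 3: a handler that collects the text of every <name> element
    static final class NameHandler extends DefaultHandler {
        final List<String> names = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("name")) {
                names.add(text.toString().trim());
            }
            text.setLength(0);
        }
    }

    static List<String> bookNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();      // step 1
        SAXParser parser = factory.newSAXParser();                      // step 2
        NameHandler handler = new NameHandler();                        // step 3
        parser.parse(new InputSource(new StringReader(xml)), handler);  // step 4
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(bookNames(
                "<bookstore><book><name>Book A</name></book><book><name>Book B</name></book></bookstore>"));
    }
}
```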

3.4 DOM4j parsing example
1. DOM4j is a third-party library, so its jar must be added to the project
2. Create a SAXReader object
3. Load the books.xml file with the reader's read() method, which returns a Document
4. Get the root node
5. Get the child nodes
6. Get the attributes and attribute values of each child node
7. Get the sub-nodes of each child node and their values

See Code-5 for the code

4. Comparison of the four XML parsing methods
Built-in methods: DOM and SAX are part of the JDK and need no extra jar.
DOM: platform-independent; the entire XML file is read into memory when parsing begins.
SAX: event-driven; parsing is triggered by the content as the XML file is read.
Extension methods: JDOM and DOM4j need an extra jar and are specific to the Java platform.

DOM advantages: the tree structure is intuitive and easy to understand, which makes the code easy to write.
The tree is kept in memory during parsing, so it is easy to modify.
DOM disadvantage: for large XML files, DOM consumes a lot of memory. This can hurt parsing performance or even cause a memory overflow (OutOfMemoryError).
SAX advantages: event-driven, with low memory consumption; suitable when you only need to read the data in the XML.
SAX disadvantages: the code is harder to write, and it is difficult to access several pieces of data in the same XML file at the same time.

Code-1: DOM parsing: getting attribute values (two methods)
DOMTest.java
package com;

import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * @author Stella
 */
public class DOMTest {
    public static void main(String[] args) {
        // Create a DocumentBuilderFactory
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            // Create a DocumentBuilder (throws ParserConfigurationException)
            DocumentBuilder db = dbf.newDocumentBuilder();
            // Parse the XML file with DocumentBuilder.parse(String uri);
            // it returns an org.w3c.dom.Document
            Document doc = db.parse("web/books.xml");
            // Get all book nodes with getElementsByTagName
            NodeList booklist = doc.getElementsByTagName("book");
            // Get the number of book nodes with NodeList.getLength()
            System.out.println("There are " + booklist.getLength() + " books in total");
            // Traverse each book node
            for (int i = 0; i < booklist.getLength(); i++) {
                // Get each node with NodeList.item()
                Node node = booklist.item(i);
                // Method 1: get all attributes of the node when their names are unknown
                NamedNodeMap attrs = node.getAttributes();
                System.out.println("Book " + (i + 1) + " has " + attrs.getLength() + " attribute(s)");
                for (int j = 0; j < attrs.getLength(); j++) {
                    // item() also returns a Node: Element, Attr and Text are all Nodes
                    Node attr = attrs.item(j);
                    String name = attr.getNodeName();
                    String value = attr.getNodeValue();
                    System.out.println("Attribute name: " + name + " ---- attribute value: " + value);
                }
                // Method 2: if the attribute name is known, cast to Element and read it directly
                // (getAttribute returns "" when the attribute is absent)
                Element book = (Element) node;
                String id = book.getAttribute("id");
                System.out.println("Value of attribute id: " + id);
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}


Code-2: DOM parsing: getting child node names and values
DOMTest.java
package com;

import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DOMTest {
    public static void main(String[] args) {
        // Create a DocumentBuilderFactory
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse("web/books.xml");
            NodeList booklist = doc.getElementsByTagName("book");
            // Traverse the child nodes of each book
            for (int i = 0; i < booklist.getLength(); i++) {
                Node book = booklist.item(i);
                NodeList childNodes = book.getChildNodes();
                // This count includes the whitespace text nodes between child elements
                System.out.println("Book " + (i + 1) + " has " + childNodes.getLength() + " child nodes");
                for (int j = 0; j < childNodes.getLength(); j++) {
                    Node child = childNodes.item(j);
                    // Keep only element nodes; skip text nodes
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        // Name of the element
                        String name = child.getNodeName();
                        // getNodeValue() returns null for an Element, so use getTextContent()
                        String value = child.getTextContent();
                        System.out.println("Child node name: " + name + "    value: " + value);
                        // Alternatively, read the value of the element's first child (its text node);
                        // an empty element has no first child, so check for null
                        Node first = child.getFirstChild();
                        if (first != null) {
                            System.out.println("Child node name: " + name + "    value: " + first.getNodeValue());
                        }
                    }
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}


Code-3: SAX parsing: getting attribute names, node names and node values
SAXTest.java
package com;

import java.io.IOException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;

import com.sun.handler.SAXParseHandler;

public class SAXTest {
    public static void main(String[] args) {
        // 1. Get a SAXParserFactory instance
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // 2. Create an instance of the custom handler
        SAXParseHandler handler = new SAXParseHandler();
        try {
            // 3. Get a SAXParser from the factory
            SAXParser parser = factory.newSAXParser();
            // 4. Parse the file; the parser calls back into the handler
            parser.parse("web/books.xml", handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}


SAXParseHandler.java

package com.sun.handler;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParseHandler extends DefaultHandler {
    int bookIndex = 0;

    // Called at every start tag: handles the element and its attributes
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        super.startElement(uri, localName, qName, attributes);
        if (qName.equals("book")) {
            /*
             * If the attribute name is known, get its value by name:
             * String value = attributes.getValue("id");
             * System.out.println("book id attribute value: " + value);
             */
            bookIndex++;
            System.out.println("Start of book " + bookIndex);
            // If the attribute names and count are unknown, iterate over all attributes
            for (int i = 0; i < attributes.getLength(); i++) {
                String name = attributes.getQName(i);
                String value = attributes.getValue(i);
                System.out.println("Attribute " + (i + 1) + " name: " + name + "  value: " + value);
            }
        } else if (!qName.equals("bookstore")) {
            System.out.print("Node name: " + qName);
        }
    }

    // Called at every end tag
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName);
        // Check whether a book element has ended
        if (qName.equals("book")) {
            System.out.println("End of book " + bookIndex);
        }
    }

    // Called once when parsing starts
    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        System.out.println("SAX parsing started");
    }

    // Called once when parsing ends
    @Override
    public void endDocument() throws SAXException {
        super.endDocument();
        System.out.println("SAX parsing finished");
    }

    // Called with character data (node values)
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        super.characters(ch, start, length);
        // ch is the parser's buffer; only ch[start .. start + length) belongs to this event
        String value = new String(ch, start, length);
        // Skip the whitespace-only text between tags
        if (!value.trim().isEmpty()) {
            System.out.println("  Node value: " + value.trim());
        }
    }
}


// To keep the data of each book after parsing (instead of only printing it), store it in a Java object.
// Create Book.java
package com.sun.handler;

public class Book {
    private String id;
    private String name;
    private String author;
    private String year;
    private String price;
    private String language;
    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getAuthor() {
        return author;
    }
    public void setAuthor(String author) {
        this.author = author;
    }
    public String getYear() {
        return year;
    }
    public void setYear(String year) {
        this.year = year;
    }
    public String getPrice() {
        return price;
    }
    public void setPrice(String price) {
        this.price = price;
    }
    public String getLanguage() {
        return language;
    }
    public void setLanguage(String language) {
        this.language = language;
    }
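    @Override
    public String toString() {
        return "Book{id=" + id + ", name=" + name + ", author=" + author + ", year=" + year
                + ", price=" + price + ", language=" + language + "}";
    }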
}
// Extend SAXParseHandler so that it stores each book in a list
SAXParseHandler.java
package com.sun.handler;

import java.util.ArrayList;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParseHandler extends DefaultHandler {
    int bookIndex = 0;
    // Text of the current element; characters() may deliver it in several chunks
    private final StringBuilder value = new StringBuilder();
    private Book book = null;
    private ArrayList<Book> booklist = new ArrayList<Book>();

    public ArrayList<Book> getBooklist() {
        return booklist;
    }

    // Called at every start tag
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        super.startElement(uri, localName, qName, attributes);
        // Start collecting text for this element
        value.setLength(0);
        if (qName.equals("book")) {
            // A new book begins: create the object that will hold its data
            book = new Book();
            bookIndex++;
            System.out.println("Start of book " + bookIndex);
            // Iterate over the attributes when their names and count are unknown
            for (int i = 0; i < attributes.getLength(); i++) {
                String name = attributes.getQName(i);
                String attrValue = attributes.getValue(i);
                System.out.println("Attribute " + (i + 1) + " name: " + name + "  value: " + attrValue);
                if (name.equals("id")) {
                    book.setId(attrValue);
                }
            }
        } else if (!qName.equals("bookstore")) {
            System.out.print("Node name: " + qName);
        }
    }

    // Called at every end tag
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName);
        String text = value.toString().trim();
        if (qName.equals("book")) {
            // The book has ended: save it in the list, then clear the field for the next book
            booklist.add(book);
            book = null;
            System.out.println("End of book " + bookIndex);
        } else if (book != null) {
            // Copy the element's text into the matching Book field
            if (qName.equals("name")) {
                book.setName(text);
            } else if (qName.equals("author")) {
                book.setAuthor(text);
            } else if (qName.equals("year")) {
                book.setYear(text);
            } else if (qName.equals("language")) {
                book.setLanguage(text);
            } else if (qName.equals("price")) {
                book.setPrice(text);
            } else if (qName.equals("id")) {
                book.setId(text);
            }
        }
    }

    // Called once when parsing starts
    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        System.out.println("SAX parsing started");
    }

    // Called once when parsing ends
    @Override
    public void endDocument() throws SAXException {
        super.endDocument();
        System.out.println("SAX parsing finished");
    }

    // Called with character data; accumulate it for endElement()
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        super.characters(ch, start, length);
        value.append(ch, start, length);
        String chunk = new String(ch, start, length).trim();
        if (!chunk.isEmpty()) {
            System.out.println("  ---- Node value: " + chunk);
        }
    }
}


Code-4: JDOM parsing: getting attributes, child node names and values
JDOMTest.java
package com.JDOMtest;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

import org.jdom2.Attribute;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

public class JDOMTest {
    public static void main(String[] args) {
        // 1. Create a SAXBuilder object
        SAXBuilder saxBuilder = new SAXBuilder();
        File file = new File("web/books.xml");
        // Stop if the file is missing (creating an empty file would only make build() fail)
        if (!file.exists()) {
            System.out.println("File not found: " + file.getAbsolutePath());
            return;
        }
        // 2. Create an input stream for books.xml; try-with-resources closes it afterwards
        try (InputStream in = new FileInputStream(file)) {
            // 3. Load the input stream with saxBuilder's build() method
            Document doc = saxBuilder.build(in);
            // 4. Get the root node of the XML file with getRootElement()
            Element root = doc.getRootElement();
            // 5. Get the child nodes of the root node with getChildren()
            List<Element> bookList = root.getChildren();
            System.out.println("There are " + bookList.size() + " books in total");
            // Get the attributes of each book
            for (int i = 0; i < bookList.size(); i++) {
                Element book = bookList.get(i);
                List<Attribute> attrs = book.getAttributes();
                System.out.println("Book " + (i + 1) + " has " + attrs.size() + " attribute(s)");
                for (int j = 0; j < attrs.size(); j++) {
                    System.out.print("Attribute name: " + attrs.get(j).getName());
                    System.out.println(" ---- attribute value: " + attrs.get(j).getValue());
                }
            }
            // Get the child nodes of each book and their values
            for (int i = 0; i < bookList.size(); i++) {
                Element book = bookList.get(i);
                List<Element> bookElements = book.getChildren();
                System.out.println("Book " + (i + 1) + " has " + bookElements.size() + " child node(s)");
                for (int j = 0; j < bookElements.size(); j++) {
                    String name = bookElements.get(j).getName();
                    String value = bookElements.get(j).getValue();
                    System.out.print("  Node " + (j + 1) + " ---- name: " + name);
                    System.out.println(" ---- value: " + value);
                }
            }
            // The same traversal with for-each loops
            int bookNo = 0;
            for (Element book : bookList) {
                bookNo++;
                System.out.println("\n---- Start parsing book " + bookNo);
                int attrNo = 0;
                for (Attribute attr : book.getAttributes()) {
                    attrNo++;
                    // Get the attribute name and value
                    System.out.print("  Attribute " + attrNo + " ---- name: " + attr.getName());
                    System.out.println(" ---- value: " + attr.getValue());
                }
                int nodeNo = 0;
                for (Element child : book.getChildren()) {
                    nodeNo++;
                    System.out.print("  Node " + nodeNo + " ---- name: " + child.getName());
                    System.out.println(" ---- value: " + child.getValue());
                }
                System.out.println("---- End parsing book " + bookNo);
            }
        } catch (JDOMException | IOException e) {
            e.printStackTrace();
        }
    }
}


Code-5: DOM4j parsing: getting attributes, child node names and values
DOM4jTest.java
package com.DOM4j;

import java.io.File;
import java.util.Iterator;
import java.util.List;

import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class DOM4jTest {
    public static void main(String[] args) {
        // 1. DOM4j is a third-party library: add its jar to the project first
        // 2. Create a SAXReader object
        SAXReader reader = new SAXReader();
        try {
            // 3. Load books.xml with the reader's read() method
            Document doc = reader.read(new File("web/books.xml"));
            // 4. Get the root node
            Element root = doc.getRootElement();
            // 5. Get the child nodes (the books)
            List<Element> books = root.elements();
            System.out.println("There are " + books.size() + " books in total");
            int bookNo = 0;
            for (Element book : books) {
                bookNo++;
                System.out.println("Start of book " + bookNo);
                // 6. Get all attributes of the book
                List<Attribute> attrs = book.attributes();
                System.out.println("\t" + attrs.size() + " attribute(s):");
                int attrNo = 0;
                for (Attribute attr : attrs) {
                    attrNo++;
                    System.out.println("\tAttribute " + attrNo + " name: " + attr.getName() + " ----- value: " + attr.getValue());
                }
                // 7. Get all child nodes and their text values
                List<Element> bookEles = book.elements();
                System.out.println("\t" + bookEles.size() + " child node(s):");
                int eleNo = 0;
                for (Element ele : bookEles) {
                    eleNo++;
                    System.out.println("\tNode " + eleNo + " name: " + ele.getName() + " ----- value: " + ele.getStringValue());
                }
                System.out.println("End of book " + bookNo);
            }
            // Alternatively, iterate over all books with an Iterator;
            // next() must be called in the loop, otherwise hasNext() never becomes false
            Iterator<Element> it = root.elementIterator();
            while (it.hasNext()) {
                Element book = it.next();
                System.out.println("Iterator: " + book.getName() + " id=" + book.attributeValue("id"));
            }
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }
}




Added by kiltannen on Sun, 30 Jun 2019 02:13:42 +0300