XML is widely used to represent arbitrary data structures, such as those exchanged by web services. XML and its extensions are often criticized for verbosity and complexity, and alternatives such as JSON, YAML, and S-expressions concentrate on representing highly structured data. Here, I briefly explain XML and demonstrate how to use Java to validate an XML document against an XML schema. Code can be retrieved at xml and java code
XML and XML schema in brief
XML (Extensible Markup Language) is a markup language, that is, a system for annotating a document in a way that is syntactically distinguishable from the text. It defines rules for encoding documents in a format that is both human-readable and machine-readable. The markup can be transformed into HTML, PDF, and Rich Text Format using a programming language or XSL.
The XML specification defines an XML document as well-formed text when the following conditions are met:
It contains only properly encoded Unicode characters.
The characters < and & appear only as part of markup.
Elements are correctly nested without overlapping.
Start and end tags match exactly, because tag names are case-sensitive.
A single root element contains all the other elements.
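To make the conditions above concrete, here is a minimal sketch that asks the JDK's built-in parser whether a string is well-formed. The class name WellFormedCheck and the sample strings are made up for illustration and are not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of the original post.
class WellFormedCheck {

    /** Returns true if the given text parses as a well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler, which prints fatal errors to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e; // a well-formedness violation is always fatal
                }
            });
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<note><to>Fiona</to></note>")); // correctly nested
        System.out.println(isWellFormed("<note><to>Fiona</note></to>")); // overlapping elements
        System.out.println(isWellFormed("<note><To>Fiona</to></note>")); // tag case mismatch
        System.out.println(isWellFormed("<a/><b/>"));                    // two root elements
        System.out.println(isWellFormed("<a>1 < 2</a>"));                // bare < in text
    }
}
```

Only the first sample passes; each of the others breaks one of the listed conditions.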
Here's a simple XML document called note.xml
<?xml version="1.0"?>
<note
xmlns="http://www.w3schools.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Fiona</to>
  <from>Ernie</from>
  <heading>Reminder</heading>
  <body>Don't forget to exercise!</body>
</note>
A valid XML document is a well-formed XML document that also conforms to a DTD (Document Type Definition) or an XML schema, which it usually references. Here's an example of an XML schema called note.xsd that defines the elements of the XML document above.
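The schema listing itself is not reproduced here, so the following is only a sketch of what note.xsd might contain, assuming it declares the four child elements of note as strings in the http://www.w3schools.com namespace. The schema is embedded in a Java string so it can be tried with the JDK's javax.xml.validation API alone; the class name NoteSchemaSketch and the method isValid are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch, not part of the original post.
class NoteSchemaSketch {

    // ASSUMED content of note.xsd: a note is a sequence of four string elements.
    static final String NOTE_XSD =
          "<?xml version=\"1.0\"?>\n"
        + "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\"\n"
        + "    targetNamespace=\"http://www.w3schools.com\"\n"
        + "    xmlns=\"http://www.w3schools.com\"\n"
        + "    elementFormDefault=\"qualified\">\n"
        + "  <xs:element name=\"note\">\n"
        + "    <xs:complexType>\n"
        + "      <xs:sequence>\n"
        + "        <xs:element name=\"to\" type=\"xs:string\"/>\n"
        + "        <xs:element name=\"from\" type=\"xs:string\"/>\n"
        + "        <xs:element name=\"heading\" type=\"xs:string\"/>\n"
        + "        <xs:element name=\"body\" type=\"xs:string\"/>\n"
        + "      </xs:sequence>\n"
        + "    </xs:complexType>\n"
        + "  </xs:element>\n"
        + "</xs:schema>\n";

    static final String VALID_NOTE =
          "<note xmlns=\"http://www.w3schools.com\">"
        + "<to>Fiona</to><from>Ernie</from>"
        + "<heading>Reminder</heading><body>Don't forget to exercise!</body>"
        + "</note>";

    /** Returns true if the document conforms to the assumed schema. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(NOTE_XSD)));
            // With no error handler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(VALID_NOTE));
        // Renaming "to" to "for" breaks the sequence the schema expects.
        System.out.println(isValid(VALID_NOTE.replace("to>", "for>")));
    }
}
```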
Here, I use dom4j-1.6.1 for its SAXReader, which builds a dom4j tree from SAX parsing events. SAX (Simple API for XML) is an event-driven API for parsing XML. It operates on each piece of the XML document sequentially, which makes it an alternative to DOM (Document Object Model), which operates on the document as a whole.
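The difference between the two models can be sketched with the JDK's own parsers, without dom4j. The class SaxVersusDom and its methods are illustrative names only: the SAX method receives one callback per start tag as the parser walks the text, while the DOM method waits for the whole tree and then navigates it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical comparison, not part of the original post.
class SaxVersusDom {

    static final String NOTE = "<note><to>Fiona</to><from>Ernie</from></note>";

    /** SAX: the parser pushes events, and we record each element name as it starts. */
    static List<String> saxElementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes attributes) {
                            names.add(qName);
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    /** DOM: the whole tree is built first, then we navigate it. */
    static String domFirstChildName(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return document.getDocumentElement().getFirstChild().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(saxElementNames(NOTE));
        System.out.println(domFirstChildName(NOTE));
    }
}
```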
Specify the XML document and XML schema.
// specify the system id (URL or path) of the schema source
static Source noteSchema = new StreamSource("note.xsd");
// XML file to be validated
static File file = new File("note.xml");
Create a SAX parser factory to obtain and configure a SAX parser, and a schema factory to obtain and configure an XML schema parser.
// a factory to acquire and configure a SAX based parser to parse XML
SAXParserFactory factory = SAXParserFactory.newInstance();
// a factory to acquire and configure a parser to parse XML schema based on the specified schema language
SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
// an array of schema sources may be used to generate a Schema object
Source[] schemaSources = new Source[] { noteSchema };
Set the schema for the SAX parser and create the SAX reader.
// use the source array to generate a Schema object
Schema schema = schemaFactory.newSchema(schemaSources);
// set the schema to be used by SAX based parser created by this factory
factory.setSchema(schema);
// create a new object of SAX parser
SAXParser parser = factory.newSAXParser();
// use interface for reading XML document callbacks to initialize the SAX reader
// that creates a DOM4J tree from SAX parsing events
SAXReader reader = new SAXReader(parser.getXMLReader());
// turn off the validation mode
reader.setValidation(false);
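Because newSchema accepts an array of sources, one Schema object can combine several schema documents. The following is a small sketch of that idea using two made-up schemas (the namespaces urn:example:a and urn:example:b and the class MultiSchemaSketch are assumptions for illustration); it validates through the JDK's Validator rather than through dom4j.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch, not part of the original post.
class MultiSchemaSketch {

    // Two made-up schemas, each declaring one root element in its own namespace.
    static final String XSD_A =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
        + " targetNamespace=\"urn:example:a\">"
        + "<xs:element name=\"a\" type=\"xs:string\"/></xs:schema>";
    static final String XSD_B =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
        + " targetNamespace=\"urn:example:b\">"
        + "<xs:element name=\"b\" type=\"xs:int\"/></xs:schema>";

    /** Builds one Schema from both sources and validates the document against it. */
    static boolean isValid(String xml) {
        try {
            Source[] sources = {
                new StreamSource(new StringReader(XSD_A)),
                new StreamSource(new StringReader(XSD_B))
            };
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(sources);
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<a xmlns=\"urn:example:a\">hello</a>"));
        System.out.println(isValid("<b xmlns=\"urn:example:b\">42</b>"));
        System.out.println(isValid("<b xmlns=\"urn:example:b\">not a number</b>"));
    }
}
```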
Create the callback functions for the SAX reader so that every time a SAX error occurs, the error is printed.
// Callback functions for SAX errors that print the line number and the error message
class LineNumberErrorHandler implements ErrorHandler {
    @Override
    public void warning(SAXParseException exception) throws SAXException {
        System.out.println("Line: " + exception.getLineNumber() + ") " + exception.getMessage());
    }

    @Override
    public void error(SAXParseException exception) throws SAXException {
        System.out.println("Line: " + exception.getLineNumber() + ") " + exception.getMessage());
    }

    @Override
    public void fatalError(SAXParseException exception) throws SAXException {
        System.out.println("Line: " + exception.getLineNumber() + ") " + exception.getMessage());
    }
}
Set the callback functions and read an XML file.
// set the callback functions as an error handler that prints the line number
// and error message
reader.setErrorHandler(new LineNumberErrorHandler());
reader.read(file);
System.out.println("Finish validation");
The full code is shown below.
package xmlschema;

import java.io.File;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

import org.dom4j.io.SAXReader;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/**
 * @author ernie
 *
 */
public class XMLValidation {
    // specify the system id (URL or path) of the schema source
    static Source noteSchema = new StreamSource("note.xsd");
    // XML file to be validated
    static File file = new File("note.xml");

    /**
     * @param args
     */
    public static void main(String[] args) {
        // a factory to acquire and configure a SAX based parser to parse XML
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // a factory to acquire and configure a parser to parse XML schema based on the specified schema language
        SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
        // an array of schema sources may be used to generate a Schema object
        Source[] schemaSources = new Source[] { noteSchema };
        try {
            // use the source array to generate a Schema object
            Schema schema = schemaFactory.newSchema(schemaSources);
            // set the schema to be used by SAX based parser created by this factory
            factory.setSchema(schema);
            // create a new object of SAX parser
            SAXParser parser = factory.newSAXParser();
            // use interface for reading XML document callbacks to initialize the SAX reader
            // that creates a DOM4J tree from SAX parsing events
            SAXReader reader = new SAXReader(parser.getXMLReader());
            // turn off the validation mode
            reader.setValidation(false);
            // set the callback functions as an error handler that prints the line number
            // and error message
            reader.setErrorHandler(new LineNumberErrorHandler());
            reader.read(file);
            System.out.println("Finish validation");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

// Callback functions for SAX errors that print the line number and the error message
class LineNumberErrorHandler implements ErrorHandler {
    @Override
    public void warning(SAXParseException exception) throws SAXException {
        System.out.println("Line: " + exception.getLineNumber() + ") " + exception.getMessage());
    }

    @Override
    public void error(SAXParseException exception) throws SAXException {
        System.out.println("Line: " + exception.getLineNumber() + ") " + exception.getMessage());
    }

    @Override
    public void fatalError(SAXParseException exception) throws SAXException {
        System.out.println("Line: " + exception.getLineNumber() + ") " + exception.getMessage());
    }
}
Demonstration
Download the dom4j jar, or use Maven to manage it.
Compile the Java code with the specified class path (the downloaded dom4j jar).
Replace the content of note.xml with an invalid document that changes the "to" tag to a "for" tag.
<?xml version="1.0"?>
<note
xmlns="http://www.w3schools.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <for>Fiona</for>
  <from>Ernie</from>
  <heading>Reminder</heading>
  <body>Don't forget to exercise!</body>
</note>
Run the program again and the error is shown.
$ java -cp ../Documents/OneDrive/Jar/dom4j-1.6.1/dom4j-1.6.1.jar: xmlschema/XMLValidation
Line: 6) cvc-complex-type.2.4.a: Invalid content was found starting with element 'for'. One of '{"http://www.w3schools.com":to}' is expected.
Finish validation
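The run above still reaches "Finish validation" because the handler only prints each error and lets parsing continue. If the caller needs to act on the result, one possible variation is a handler that collects the messages instead. The sketch below assumes a cut-down schema with a single "to" element and uses the JDK's Validator so it runs without dom4j; the names CollectingValidation and CollectingHandler are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical sketch, not part of the original post.
class CollectingValidation {

    // ASSUMED cut-down schema: a note that holds a single "to" element.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
        + " targetNamespace=\"http://www.w3schools.com\""
        + " elementFormDefault=\"qualified\">"
        + "<xs:element name=\"note\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"to\" type=\"xs:string\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Error handler that stores messages instead of printing them. */
    static class CollectingHandler implements ErrorHandler {
        final List<String> messages = new ArrayList<>();

        private void add(String level, SAXParseException e) {
            messages.add(level + " at line " + e.getLineNumber() + ": " + e.getMessage());
        }

        @Override public void warning(SAXParseException e) { add("warning", e); }
        @Override public void error(SAXParseException e) { add("error", e); }
        @Override public void fatalError(SAXParseException e) throws SAXException {
            add("fatal", e);
            throw e; // the document is not well-formed, so stop here
        }
    }

    /** Returns every problem found; an empty list means the document is valid. */
    static List<String> validate(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
        CollectingHandler handler = new CollectingHandler();
        Validator validator = schema.newValidator();
        validator.setErrorHandler(handler);
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Only fatal errors are rethrown, and the handler has already stored them.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        String ns = "xmlns=\"http://www.w3schools.com\"";
        System.out.println(validate("<note " + ns + "><to>Fiona</to></note>"));
        System.out.println(validate("<note " + ns + "><for>Fiona</for></note>"));
    }
}
```

Returning a list keeps the decision about what to do with the errors (log, reject the file, or report to a user) in the calling code.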
Conclusion
There are many possible variations on this code, such as multiple schema sources, a different schema language, and customized error handlers. To make full use of javax.xml and dom4j, please refer to the links below to explore the full power of validating XML using Java.