This package supports in-memory XML documents in the form of a parse tree compliant with the W3C DOM Level 1 Core Recommendation, with extensions that include support for XML Namespaces as defined by the current proposed recommendation. (XML uses only the DOM Core APIs; the DOM also defines additional HTML-specific features, which are optional.) The package also adds support for printing XML, and for customizing the DOM documents used as parse trees by supplying DOM element subclasses.

The normal navigational metaphor for these documents is that of a tree, with array-like accessors available for child nodes. Documents are factories for the nodes that may be stored within them, for use by programs that construct documents node by node rather than parsing them.

DOM methods are not specified to be safe for multithreaded use unless the application supplies its own synchronization policy. For example, if an application treats a document as read-only, no synchronization problems arise; alternatively, multiple threads could synchronize on each node's ownerDocument while accessing or modifying a given document.
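A minimal sketch of the ownerDocument locking policy described above, written against the standard JDK DOM (`org.w3c.dom`, `javax.xml.parsers`) rather than this package's classes. The `OwnerDocumentLocking` class and its methods are hypothetical; the point is only that every thread, reader or writer, takes the same document-wide monitor.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: applies the "synchronize on the owner document"
// policy from the text to the JDK's DOM implementation.
class OwnerDocumentLocking {

    // The monitor for any node is its owner document; a Document owns itself.
    static Document ownerOf(Node node) {
        return node.getNodeType() == Node.DOCUMENT_NODE
                ? (Document) node
                : node.getOwnerDocument();
    }

    // Writers create and attach nodes only while holding the document lock.
    static Element appendChildSafely(Node parent, String tagName) {
        Document owner = ownerOf(parent);
        synchronized (owner) {
            Element child = owner.createElement(tagName);
            parent.appendChild(child);
            return child;
        }
    }

    // Readers take the same lock, so they never observe a half-applied edit
    // from a writer that follows this policy.
    static int countChildrenSafely(Node parent) {
        synchronized (ownerOf(parent)) {
            return parent.getChildNodes().getLength();
        }
    }

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The policy only works if every thread that touches the document follows it; a single unsynchronized reader can still observe inconsistent state.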

Note that not all implementation classes are exposed here. You must use DOM methods (typically a factory method on an XmlDocument instance) to create nodes such as Text and Comment nodes. Only node types that need to be public for subclassing, or for access to extended functionality, are currently exposed.

Note that this package supports various extensions to the DOM Level 1 core specification, as required for "real" applications. Look at the specifications of the interfaces in this package to get the best overview of that functionality.

Reading an XML Document

The XmlDocument class may be thought of as the root of a tree of XML data. It's easy to get one of these either by parsing an XML document, or by instantiating an empty one directly. The document has a single ElementNode, optionally preceded and followed by CommentNode and PINode values. Documents may also have a Document Type Definition (DTD), and may optionally be validated as they are parsed.
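    import com.sun.xml.tree.ElementNode;
    import com.sun.xml.tree.XmlDocument;

    XmlDocument    document;
    ElementNode    rootElement;

    // Parse without validation; may throw IOException or SAXException.
    document = XmlDocument.createXmlDocument (
	    "http://www.w3.org/TR/1998/REC-xml-19980210.xml",
	    false);
    rootElement = (ElementNode) document.getDocumentElement ();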

The most flexible way to create an XML document is to use the XmlDocumentBuilder class directly with a SAX parser. XmlDocumentBuilder is a SAX document handler that constructs documents from the parser's callbacks.
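To illustrate what constructing a document from parser callbacks involves, here is a deliberately small SAX handler that builds a DOM tree. It is a hypothetical stand-in, not XmlDocumentBuilder: it uses the JDK's `SAXParserFactory` and a JDK `Document` as the node factory, handles only elements, attributes, text, and processing instructions, and ignores namespaces, comments, and DTD events (which would need SAX's `LexicalHandler` and `DeclHandler`).

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical builder: turns SAX callbacks into DOM nodes.
class CallbackTreeBuilder extends DefaultHandler {
    private final Document document;
    // Stack of currently open nodes; the document itself sits at the bottom.
    private final Deque<Node> open = new ArrayDeque<>();

    CallbackTreeBuilder(Document document) {
        this.document = document;
        open.push(document);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Element element = document.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            element.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        open.peek().appendChild(element);
        open.push(element);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may split one run of text across calls; that yields adjacent
        // Text nodes, which Node.normalize() can merge later if needed.
        open.peek().appendChild(document.createTextNode(new String(ch, start, length)));
    }

    @Override
    public void processingInstruction(String target, String data) {
        open.peek().appendChild(document.createProcessingInstruction(target, data));
    }

    static Document parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new CallbackTreeBuilder(doc));
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```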

Writing XML Documents and Nodes

To save a document or node, get a Writer, preferably one using an efficient loss-free encoding such as UTF-8. Then just use the write(Writer) method to save that document; all the node types in this package support such methods, so you can write each node and any of its children with a single method call.

The XmlDocument class has two additional write methods. If you write to an OutputStream, the output is automatically encoded in UTF-8. Alternatively, you may name the character encoding your Writer uses, so that the XML declaration written out declares that encoding.
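The distinction between byte output and character output can be sketched with the JDK's `javax.xml.transform` serializer, standing in here for XmlDocument's own write methods. The `EncodingAwareWriter` class is hypothetical; the assumption it illustrates is that a Writer has already fixed the real encoding, so the caller must name it for the declaration to be truthful.

```java
import java.io.OutputStream;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper contrasting byte and character output.
class EncodingAwareWriter {

    // Byte output: the serializer encodes the characters itself, as UTF-8.
    static void write(Document doc, OutputStream out) {
        transform(doc, new StreamResult(out), "UTF-8");
    }

    // Character output: the Writer determines the actual encoding, so the
    // caller names it and the XML declaration is written to match.
    static void write(Document doc, Writer out, String writerEncoding) {
        transform(doc, new StreamResult(out), writerEncoding);
    }

    private static void transform(Document doc, StreamResult result, String encoding) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            t.transform(new DOMSource(doc), result);
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // A one-element document containing a non-ASCII character.
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element greeting = doc.createElement("greeting");
            greeting.appendChild(doc.createTextNode("h\u00e9llo"));
            doc.appendChild(greeting);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```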

If you want to write a document or node using some output format other than XML, you can override its write(Writer) method. The implementation of such methods involves calls to writeXml methods. You can customize your tree to include only nodes that know how to write themselves as HTML, or some other output format.
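As a sketch of producing a non-XML format, the following hypothetical `OutlineWriter` walks a JDK DOM tree and renders its element structure as nested HTML lists. Standard DOM nodes cannot override a write method the way this package's nodes can, so a separate recursive function plays that role here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical renderer: emits element structure as nested HTML <ul>/<li>
// lists instead of XML. Text, comments, and attributes are ignored.
class OutlineWriter {

    static String toHtmlOutline(Element root) {
        StringBuilder out = new StringBuilder("<ul>");
        appendItem(root, out);
        return out.append("</ul>").toString();
    }

    // One <li> per element, with a nested <ul> only when it has element
    // children. XML names cannot contain '<' or '&', so no escaping is needed.
    private static void appendItem(Element element, StringBuilder out) {
        out.append("<li>").append(element.getTagName());
        boolean listOpen = false;
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                if (!listOpen) {
                    out.append("<ul>");
                    listOpen = true;
                }
                appendItem((Element) child, out);
            }
        }
        if (listOpen) {
            out.append("</ul>");
        }
        out.append("</li>");
    }

    // Convenience parser for the example; wraps checked exceptions.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```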

XML text is normally pretty-printed. This makes the text easier for people to work with, for example when diagnosing problems that would be hard to spot in a document written as a single line of text. To avoid pretty-printing, use writeXml with a write context that is set up not to pretty-print.
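With the JDK's `Transformer`, the equivalent switch is the `OutputKeys.INDENT` property; this is a stand-in for this package's write-context setting, not its API, and `PrettyPrinting` is a hypothetical name. The exact indentation produced for `"yes"` varies between JDK versions, so only the presence of line breaks should be relied on.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper toggling indentation on the JDK serializer.
class PrettyPrinting {

    // Serializes without an XML declaration; 'pretty' turns indentation on.
    static String serialize(Node node, boolean pretty) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.INDENT, pretty ? "yes" : "no");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // A tiny tree that contains no whitespace Text nodes of its own.
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element list = doc.createElement("list");
            doc.appendChild(list);
            list.appendChild(doc.createElement("item"));
            list.appendChild(doc.createElement("item"));
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```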

Navigating a Document

A number of classes are used to represent nodes in a document. These conform to the current DOM APIs, in some cases providing additional methods. Many applications will use only the XmlDocument, ElementNode, and TextNode classes. The class which represents an XML "Processing Instruction" (PINode) is also used by some XML applications to control their processing.

All nodes support the notion of siblings and parents. In addition, element nodes (as well as document nodes and the editor-oriented document fragment nodes) support children. You can access children using an array-like model, or by calling getFirstChild and then traversing its siblings with getNextSibling. The array-like model is not stable while you are editing the tree, because the indices are subject to change. However, it is efficient, and very convenient when that is not an issue.
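The two traversal styles, and why index-based access is fragile during edits, can be sketched with the standard `org.w3c.dom` `NodeList` and sibling accessors in place of this package's classes. The `TreeNavigation` helper and its method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helpers contrasting array-like and sibling-based traversal.
class TreeNavigation {

    // Array-like access: convenient, but indices shift when the tree is edited.
    static List<String> namesByIndex(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    // Sibling traversal: start at the first child and follow getNextSibling.
    static List<String> namesBySibling(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            names.add(child.getNodeName());
        }
        return names;
    }

    // Removal while walking: remember the next sibling before removing.
    // An index loop that removed item(i) and then incremented i would skip
    // the node that slides into position i.
    static void removeComments(Node parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.COMMENT_NODE) {
                parent.removeChild(child);
            }
            child = next;
        }
    }

    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```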

Constructing a Document Programmatically

Once you create an XML Document, you use it as a factory to create the nodes a parser would, such as DOM ProcessingInstruction, Text, Comment, CDATASection, and of course Element. As described below, you may configure the XmlDocument class (or potentially a subclass) to return element nodes which add application-specific behaviors.

After you create nodes, you will normally use the DOM Element.appendChild method to append the node to some element. Other primitives also exist, and you may delete nodes from the tree, or insert them before other nodes.

If you want the document to be written out in a form that is relatively readable by humans, you may wish to insert Text nodes containing whitespace to perform simple formatting; for example, a Text node containing a newline after each element.
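A minimal sketch of building a small document node by node, using the JDK's `Document` factory methods in place of XmlDocument. It adds whitespace Text nodes for readability and uses `insertBefore` to place a comment ahead of existing content; `ContactListBuilder` and the element and attribute names are made up for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example: the document acts as the factory for every node.
class ContactListBuilder {

    static Document buildContacts(String... names) {
        Document doc = newDocument();
        Element root = doc.createElement("contacts");
        doc.appendChild(root);
        for (String name : names) {
            // A whitespace-only Text node before each element indents it.
            root.appendChild(doc.createTextNode("\n  "));
            Element person = doc.createElement("person");
            person.setAttribute("name", name);
            root.appendChild(person);
        }
        // Final newline so the closing tag starts on its own line.
        root.appendChild(doc.createTextNode("\n"));
        // insertBefore places a node ahead of an existing child
        // instead of appending it at the end.
        root.insertBefore(doc.createComment(" contact list "), root.getFirstChild());
        return doc;
    }

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```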

Custom Element Classes in Document

You can configure XmlDocumentBuilder (and also XmlDocument instances) with factories returning element classes that are specialized for a given element type. This lets you easily transform between externalized XML data formats and in-memory data structures which:

For example, the classes could support the HTML DOM methods, or provide methods used to drive an XSL implementation (using the namespace-aware factory infrastructure). Your classes could implement interfaces used to integrate with frameworks for your server-side web application, implement a model to be viewed with Swing, or automatically convert from older external formats to the most up-to-date internal one. They could also be used to bind XML nodes to existing components, including "legacy" business data and objects in an existing application kernel. Calling such objects from Java might require the Java Native Interface (JNI).

Subclassing ElementNode

Since DOM does not support mixing classes from different implementations, custom element classes must be tied to a particular implementation of the DOM core classes. For this implementation, that means ElementNode is the only permitted base class for custom element nodes. (Your subclass must provide a publicly accessible default constructor.)

Customized Element classes can intercept parsing events reported through the XmlReader interface. For example, a node might normalize whitespace, or might convert some attributes or elements to object properties. In general, such nodes can transform a data model exposed in XML into one that better matches the application's modeling requirements, and vice versa.

Customized Element classes may need to change how they handle the writeXml method, perhaps writing out their XML start and end tags specially or controlling how ElementNode.writeChildrenXml presents child nodes.

Element Factories

Element factories create new elements based on element tag names, optionally considering the XML Namespace associated with the element. There are two basic ways to use element factories:

In the future, a declarative syntax for configuring the standard factory could be supported. Such a syntax would be embeddable in XML documents, so that documents themselves may optionally be the source of such bindings.

Since the mappings between XML element types and classes are not necessarily part of the document, you can use different mappings in different environments or when different roles are required. The behaviors of a message sender, for example, will usually differ from those of the recipient. Clients often need to support interactions based on graphical user interfaces, which aren't appropriate on servers. Such differences can be controlled by using different mappings in different environments.
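The idea of swapping mappings per environment can be sketched as a lookup table from a (namespace URI, tag name) pair to a constructor. This is a hypothetical illustration in plain Java, not this package's element factory interface: `ElementBehavior` and `BindingFactory` are invented names, and the objects created are ordinary Java objects rather than ElementNode subclasses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical application behavior attached to an element type.
@FunctionalInterface
interface ElementBehavior {
    String describe();
}

// Hypothetical factory: maps (namespace URI, tag) to a constructor,
// falling back to a default for unmapped element types.
class BindingFactory {
    private final Map<String, Supplier<? extends ElementBehavior>> byName = new HashMap<>();
    private final Supplier<? extends ElementBehavior> fallback;

    BindingFactory(Supplier<? extends ElementBehavior> fallback) {
        this.fallback = fallback;
    }

    // namespaceUri may be null for names that are not namespace-qualified.
    BindingFactory map(String namespaceUri, String tag, Supplier<? extends ElementBehavior> ctor) {
        byName.put(key(namespaceUri, tag), ctor);
        return this;
    }

    ElementBehavior create(String namespaceUri, String tag) {
        return byName.getOrDefault(key(namespaceUri, tag), fallback).get();
    }

    // "{uri}tag" for qualified names, plain "tag" otherwise.
    private static String key(String namespaceUri, String tag) {
        return (namespaceUri == null ? "" : "{" + namespaceUri + "}") + tag;
    }
}
```

A sender and a recipient can share the same document format while registering different classes for the same element type; only the factory configuration differs between the two environments.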

Subclassing XmlDocument

You may also wish to subclass XmlDocument in order to provide specialized behaviors. Such behaviors could include using some particular factory configuration by default (e.g., to support all of the HTML DOM interfaces), interpreting particular processing instructions (albeit without access to any current element), and more.

If you do this, you should also define a subclass of XmlDocumentBuilder that returns an instance of your class when it parses documents. To do so, override its createDocument method.

XML Namespaces

By default, an XmlDocumentBuilder supports the November 1998 version of XML namespaces during parsing. Element and attribute names may be explicitly or implicitly qualified according to the naming context (here called "namespace") in which they are bound. A natural model is lexical scoping, as declared in a document's DTD.

Responsibility for enforcing namespace constraints rests entirely with this builder. Accordingly, you should use a Sun XML parser, which reports the additional DTD events needed to enforce the extra requirements of the XML Namespaces specification. Use the setParser method to establish the bidirectional link between the parser and the builder.

You can disable namespace error checking during parsing if you wish, through the disableNamespaces property on the builder. You only need to do this if you are working with documents which use colons in their names ("reserved for namespace experiments") but do not conform to the syntax defined in the XML namespace draft.
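The effect of namespace processing, and of turning it off, can be sketched with the JDK's `DocumentBuilderFactory.setNamespaceAware`, standing in for the builder's disableNamespaces property. The `NamespaceParsing` helper is hypothetical. With namespace processing on, an undeclared prefix such as `x:` is rejected; with it off, the colon is simply part of the element name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses the same text with or without namespace rules.
class NamespaceParsing {

    static Element parseRoot(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and ignores warnings and
            // recoverable errors, instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse: " + e.getMessage(), e);
        }
    }
}
```

In the namespace-aware case the element reports a namespace URI and a local name; without namespace processing those accessors return null and only the full, colon-containing name is available.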