LSParser

public interface LSParser

An interface to an object that is able to build, or augment, a DOM tree from various input sources.

LSParser provides an API for parsing XML and building the corresponding DOM document structure. An LSParser instance can be obtained by invoking the DOMImplementationLS.createLSParser() method.
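As a minimal sketch of the lookup described above, the following obtains a DOMImplementationLS through DOMImplementationRegistry and parses a string synchronously. The class name LSParserQuickStart and its helper methods are illustrative only, and the sketch assumes the runtime registers a DOM implementation that supports the "LS" feature.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

// Hypothetical helper class; not part of the org.w3c.dom.ls API.
final class LSParserQuickStart {

    /** Looks up a DOM implementation that advertises the "LS" feature. */
    static DOMImplementationLS loadAndSave() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementation impl = registry.getDOMImplementation("LS");
            if (impl == null) {
                throw new IllegalStateException("No DOM implementation supports the LS feature");
            }
            return (DOMImplementationLS) impl;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not create the DOM implementation registry", e);
        }
    }

    /** Parses the given XML text with a synchronous LSParser. */
    static Document parseString(String xml) {
        DOMImplementationLS ls = loadAndSave();
        // A null schema type leaves the choice of schema language to the implementation.
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }
}
```

Calling LSParserQuickStart.parseString("<greeting>hello</greeting>") should return a Document whose document element is greeting.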

As specified in [DOM Level 3 Core], when a document is first made available via the LSParser:

  • there will never be two adjacent nodes of type TEXT_NODE, and there will never be empty text nodes.
  • it is expected that the value and nodeValue attributes of an Attr node initially return the XML 1.0 normalized value. However, if the parameters "validate-if-schema" and "datatype-normalization" are set to true, then depending on the attribute normalization used, the attribute values may differ from those obtained by XML 1.0 attribute normalization. If the parameter "datatype-normalization" is set to false, XML 1.0 attribute normalization is guaranteed to occur, and if the attribute list does not contain namespace declarations, the attributes attribute on an Element node represents the property [attributes] defined in [XML Information Set].

Asynchronous LSParser objects are expected to also implement the events::EventTarget interface so that event listeners can be registered on asynchronous LSParser objects.

Events supported by asynchronous LSParser objects are:

load
The LSParser has finished loading the document. See also the definition of the LSLoadEvent interface.
progress
The LSParser signals progress as data is parsed. This specification does not attempt to define exactly when progress events should be dispatched; that is intentionally left implementation-dependent. Here is one example of how an implementation might dispatch progress events: once the parser starts receiving data, a progress event is dispatched to indicate that parsing has started. From then on, a progress event is dispatched for every 4096 bytes of data received and processed. This is only one example, though, and implementations can choose to dispatch progress events at any time while parsing, or not dispatch them at all. See also the definition of the LSProgressEvent interface.

Note: All events defined in this specification use the namespace URI "http://www.w3.org/2002/DOMLS".
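The event registration described above might be sketched as follows. Asynchronous parsing is optional, so the sketch asks for an asynchronous parser and falls back to a synchronous one when createLSParser raises a DOMException. The class AsyncParserSketch and its method names are hypothetical, and using the DOM Level 2 addEventListener call with the plain event type "load" is an assumption; an implementation may require a namespace-aware registration instead.

```java
import org.w3c.dom.DOMException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.events.EventTarget;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;

// Hypothetical helper class; not part of the org.w3c.dom.ls API.
final class AsyncParserSketch {

    /** Looks up a DOM implementation that advertises the "LS" feature. */
    static DOMImplementationLS loadAndSave() {
        try {
            DOMImplementation impl =
                    DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
            if (impl == null) {
                throw new IllegalStateException("No DOM implementation supports the LS feature");
            }
            return (DOMImplementationLS) impl;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not create the DOM implementation registry", e);
        }
    }

    /** Requests an asynchronous parser, falling back to a synchronous one. */
    static LSParser createPreferAsync(DOMImplementationLS ls) {
        try {
            return ls.createLSParser(DOMImplementationLS.MODE_ASYNCHRONOUS, null);
        } catch (DOMException e) {
            // The requested mode is not supported by this implementation.
            return ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        }
    }

    /**
     * Registers a "load" listener when the parser is asynchronous and is an EventTarget.
     * Returns true if the listener was registered.
     */
    static boolean listenForLoad(LSParser parser, Runnable onLoad) {
        if (parser.getAsync() && parser instanceof EventTarget) {
            ((EventTarget) parser).addEventListener("load", event -> onLoad.run(), false);
            return true;
        }
        return false;
    }
}
```

With a synchronous-only implementation, listenForLoad returns false and the caller simply uses the Document returned by parse or parseURI.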

While parsing an input source, errors are reported to the application through the error handler (LSParser.domConfig's "error-handler" parameter). This specification does not attempt to define all possible errors that can occur while parsing XML, or any other markup, but some common error cases are defined. The types (DOMError.type) of errors and warnings defined by this specification are:

"check-character-normalization-failure" [error]
Raised if the parameter "check-character-normalization" is set to true and a string is encountered that fails normalization checking.
"doctype-not-allowed" [fatal]
Raised if the configuration parameter "disallow-doctype" is set to true and a doctype is encountered.
"no-input-specified" [fatal]
Raised when loading a document and no input is specified in the LSInput object.
"pi-base-uri-not-preserved" [warning]
Raised if a processing instruction is encountered in a location where the base URI of the processing instruction cannot be preserved. One example of a case where this warning will be raised is if the configuration parameter "entities" is set to false and the following XML file is parsed:
<!DOCTYPE root [ <!ENTITY e SYSTEM 'subdir/myentity.ent' ]>
<root> &e; </root>
And subdir/myentity.ent contains:
<one> <two/> </one> <?pi 3.14159?>
<more/>
"unbound-prefix-in-entity" [warning]
An implementation-dependent warning that may be raised if the configuration parameter "namespaces" is set to true and an unbound namespace prefix is encountered in an entity's replacement text. Raising this warning is not enforced, since some existing parsers may not recognize unbound namespace prefixes in the replacement text of entities.
"unknown-character-denormalization" [fatal]
Raised if the configuration parameter "ignore-unknown-character-denormalizations" is set to false and a character is encountered for which the processor cannot determine the normalization properties.
"unsupported-encoding" [fatal]
Raised if an unsupported encoding is encountered.
"unsupported-media-type" [fatal]
Raised if the configuration parameter "supported-media-types-only" is set to true and an unsupported media type is encountered.

In addition to raising the defined errors and warnings, implementations are expected to raise implementation-specific errors and warnings for any other error and warning cases, such as I/O errors (file not found, permission denied, ...), XML well-formedness errors, and so on.
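The error reporting described above can be sketched by installing a DOMErrorHandler through the "error-handler" parameter. The class ErrorCollectingParse, its method names, and the message format are illustrative assumptions. Which DOMError types and messages are actually reported, and whether parse() also throws an LSException after a fatal error, depends on the implementation.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

// Hypothetical helper class; not part of the org.w3c.dom.ls API.
final class ErrorCollectingParse {

    /** Looks up a DOM implementation that advertises the "LS" feature. */
    static DOMImplementationLS loadAndSave() {
        try {
            DOMImplementation impl =
                    DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
            if (impl == null) {
                throw new IllegalStateException("No DOM implementation supports the LS feature");
            }
            return (DOMImplementationLS) impl;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not create the DOM implementation registry", e);
        }
    }

    /** Parses the text and returns a description of every problem that was reported. */
    static List<String> parseAndCollect(String xml) {
        List<String> problems = new ArrayList<>();
        DOMImplementationLS ls = loadAndSave();
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

        DOMErrorHandler handler = error -> {
            problems.add("severity " + error.getSeverity()
                    + ", type " + error.getType()
                    + ": " + error.getMessage());
            // Ask the parser to continue unless the error is fatal.
            return error.getSeverity() != DOMError.SEVERITY_FATAL_ERROR;
        };
        parser.getDomConfig().setParameter("error-handler", handler);

        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        try {
            parser.parse(input);
        } catch (LSException e) {
            // Record the failure so callers see it even if the handler was not invoked.
            problems.add("parse failed: " + e.getMessage());
        }
        return problems;
    }
}
```

A well-formed input such as "<a><b/></a>" is expected to produce an empty list, while an input with mismatched tags should produce at least one entry.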

See also the Document Object Model (DOM) Level 3 Load and Save Specification.

Constant Summary

short ACTION_APPEND_AS_CHILDREN Append the result of the parse operation as children of the context node.
short ACTION_INSERT_AFTER Insert the result of the parse operation as the immediately following sibling of the context node.
short ACTION_INSERT_BEFORE Insert the result of the parse operation as the immediately preceding sibling of the context node.
short ACTION_REPLACE Replace the context node with the result of the parse operation.
short ACTION_REPLACE_CHILDREN Replace all the children of the context node with the result of the parse operation.

Public Method Summary

abstract void
abort()
Abort the loading of the document that is currently being loaded by the LSParser.
abstract boolean
getAsync()
true if the LSParser is asynchronous, false if it is synchronous.
abstract boolean
getBusy()
true if the LSParser is currently busy loading a document, otherwise false.
abstract DOMConfiguration
getDomConfig()
The DOMConfiguration object used when parsing an input source.
abstract LSParserFilter
getFilter()
When a filter is provided, the implementation will call out to the filter as it is constructing the DOM tree structure.
abstract Document
parse(LSInput input)
Parse an XML document from a resource identified by an LSInput.
abstract Document
parseURI(String uri)
Parse an XML document from a location identified by a URI reference [IETF RFC 2396].
abstract Node
parseWithContext(LSInput input, Node contextArg, short action)
Parse an XML fragment from a resource identified by an LSInput and insert the content into an existing document at the position specified with the context and action arguments.
abstract void
setFilter(LSParserFilter filter)
When a filter is provided, the implementation will call out to the filter as it is constructing the DOM tree structure.
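As an illustration of the filter hook exposed by getFilter() and setFilter(), here is a minimal LSParserFilter sketch that rejects every element with a given name. The class DropElementFilter is hypothetical. How a parser reacts to FILTER_REJECT is governed by the LSParserFilter documentation, not by this sketch.

```java
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

// Hypothetical filter; not part of the org.w3c.dom.ls API.
final class DropElementFilter implements LSParserFilter {
    private final String tagName;

    DropElementFilter(String tagName) {
        this.tagName = tagName;
    }

    /** Called when an element start tag has been scanned; keep everything at this stage. */
    @Override
    public short startElement(Element elementArg) {
        return FILTER_ACCEPT;
    }

    /** Called once a node is complete; reject elements whose name matches tagName. */
    @Override
    public short acceptNode(Node nodeArg) {
        boolean drop = nodeArg.getNodeType() == Node.ELEMENT_NODE
                && tagName.equals(nodeArg.getNodeName());
        return drop ? FILTER_REJECT : FILTER_ACCEPT;
    }

    /** Only element nodes are passed to acceptNode. */
    @Override
    public int getWhatToShow() {
        return NodeFilter.SHOW_ELEMENT;
    }
}
```

The filter would be installed with parser.setFilter(new DropElementFilter("drop")) before calling parse.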

Constants

public static final short ACTION_APPEND_AS_CHILDREN

Append the result of the parse operation as children of the context node. For this action to work, the context node must be an Element or a DocumentFragment.

Constant Value: 1

public static final short ACTION_INSERT_AFTER

Insert the result of the parse operation as the immediately following sibling of the context node. For this action to work the context node's parent must be an Element or a DocumentFragment.

Constant Value: 4

public static final short ACTION_INSERT_BEFORE

Insert the result of the parse operation as the immediately preceding sibling of the context node. For this action to work the context node's parent must be an Element or a DocumentFragment.

Constant Value: 3

public static final short ACTION_REPLACE

Replace the context node with the result of the parse operation. For this action to work, the context node must have a parent, and the parent must be an Element or a DocumentFragment.

Constant Value: 5

public static final short ACTION_REPLACE_CHILDREN

Replace all the children of the context node with the result of the parse operation. For this action to work, the context node must be an Element, a Document, or a DocumentFragment.

Constant Value: 2
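The context-node requirements stated for the five actions above can be expressed as a small pre-check before calling parseWithContext. This is only a sketch of those documented preconditions; the class ContextActionCheck and its methods are hypothetical, and an implementation may impose further restrictions or may not support parseWithContext at all.

```java
import org.w3c.dom.Node;
import org.w3c.dom.ls.LSParser;

// Hypothetical helper class; not part of the org.w3c.dom.ls API.
final class ContextActionCheck {

    private static boolean isElementOrFragment(Node node) {
        if (node == null) {
            return false;
        }
        short type = node.getNodeType();
        return type == Node.ELEMENT_NODE || type == Node.DOCUMENT_FRAGMENT_NODE;
    }

    /** Returns true if the context node meets the documented requirement for the action. */
    static boolean isAllowed(Node context, short action) {
        switch (action) {
            case LSParser.ACTION_APPEND_AS_CHILDREN:
                // The context node itself must be an Element or a DocumentFragment.
                return isElementOrFragment(context);
            case LSParser.ACTION_REPLACE_CHILDREN:
                // A Document is also allowed as the context node.
                return isElementOrFragment(context)
                        || context.getNodeType() == Node.DOCUMENT_NODE;
            case LSParser.ACTION_INSERT_BEFORE:
            case LSParser.ACTION_INSERT_AFTER:
            case LSParser.ACTION_REPLACE:
                // The parent must exist and be an Element or a DocumentFragment.
                return isElementOrFragment(context.getParentNode());
            default:
                throw new IllegalArgumentException("Unknown action: " + action);
        }
    }
}
```

For example, the document element fails the check for ACTION_INSERT_AFTER because its parent is a Document, while any of its child elements passes.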

Public Methods

public abstract void abort ()

Abort the loading of the document that is currently being loaded by the LSParser. If the LSParser is currently not busy, a call to this method does nothing.

public abstract boolean getAsync ()

true if the LSParser is asynchronous, false if it is synchronous.

public abstract boolean getBusy ()

true if the LSParser is currently busy loading a document, otherwise false.

public abstract DOMConfiguration getDomConfig ()

The DOMConfiguration object used when parsing an input source. This DOMConfiguration is specific to the parse operation. No parameter values from this DOMConfiguration object are passed automatically to the DOMConfiguration object on the Document that is created, or used, by the parse operation. The DOM application is responsible for passing any needed parameter values from this DOMConfiguration object to the DOMConfiguration object referenced by the Document object.
In addition to the parameters recognized on the DOMConfiguration interface defined in [DOM Level 3 Core], the DOMConfiguration objects for LSParser add or modify the following parameters:

"charset-overrides-xml-encoding"
true
[optional] (default) If a higher level protocol such as HTTP [IETF RFC 2616] provides an indication of the character encoding of the input stream being processed, that will override any encoding specified in the XML declaration or the Text declaration (see also section 4.3.3, "Character Encoding in Entities", in [XML 1.0]). Explicitly setting an encoding in the LSInput overrides any encoding from the protocol.
false
[required] The parser ignores any character set encoding information from higher-level protocols.
"disallow-doctype"
true
[optional] Throw a fatal "doctype-not-allowed" error if a doctype node is found while parsing the document. This is useful when dealing with things like SOAP envelopes where doctype nodes are not allowed.
false
[required] (default) Allow doctype nodes in the document.
"ignore-unknown-character-denormalizations"
true
[required] (default) If, while verifying full normalization when [XML 1.1] is supported, a processor encounters characters for which it cannot determine the normalization properties, then the processor will ignore any possible denormalizations caused by these characters. This parameter is ignored for [XML 1.0].
false
[optional] Report a fatal "unknown-character-denormalization" error if a character is encountered for which the processor cannot determine the normalization properties.
"infoset"
See the definition of DOMConfiguration for a description of this parameter. Unlike in [DOM Level 3 Core], this parameter will default to true for LSParser.
"namespaces"
true
[required] (default) Perform the namespace processing as defined in [XML Namespaces] and [XML Namespaces 1.1].
false
[optional] Do not perform the namespace processing.
"resource-resolver"
[required] A reference to an LSResourceResolver object, or null. If the value of this parameter is not null when an external resource (such as an external XML entity or an XML schema location) is encountered, the implementation will request that the LSResourceResolver referenced in this parameter resolves the resource.
"supported-media-types-only"
true
[optional] Check that the media type of the parsed resource is a supported media type. If an unsupported media type is encountered, a fatal error of type "unsupported-media-type" will be raised. The media types defined in [IETF RFC 3023] must always be accepted.
false
[required] (default) Accept any media type.
"validate"
See the definition of DOMConfiguration for a description of this parameter. Unlike in [DOM Level 3 Core], the processing of the internal subset is always accomplished, even if this parameter is set to false.
"validate-if-schema"
See the definition of DOMConfiguration for a description of this parameter. Unlike in [DOM Level 3 Core], the processing of the internal subset is always accomplished, even if this parameter is set to false.
"well-formed"
See the definition of DOMConfiguration for a description of this parameter. Unlike in [DOM Level 3 Core], this parameter cannot be set to false.
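Because several of the parameters above are marked [optional], it is safer to ask canSetParameter before calling setParameter. The following sketch shows that pattern, along with one way an application might copy a parameter value from the parser's DOMConfiguration to a Document's DOMConfiguration, since that transfer is not automatic. The class ParserConfigSketch and its method names are hypothetical.

```java
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;

// Hypothetical helper class; not part of the org.w3c.dom.ls API.
final class ParserConfigSketch {

    /** Creates a synchronous LSParser from whichever implementation supports "LS". */
    static LSParser newSynchronousParser() {
        try {
            DOMImplementation impl =
                    DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
            if (impl == null) {
                throw new IllegalStateException("No DOM implementation supports the LS feature");
            }
            return ((DOMImplementationLS) impl)
                    .createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not create the DOM implementation registry", e);
        }
    }

    /** Sets the parameter only if the configuration supports that value. */
    static boolean trySet(DOMConfiguration config, String name, Object value) {
        if (!config.canSetParameter(name, value)) {
            return false;
        }
        config.setParameter(name, value);
        return true;
    }

    /** Copies one parameter value, for example from the parser's config to a Document's. */
    static boolean copyParameter(DOMConfiguration from, DOMConfiguration to, String name) {
        return trySet(to, name, from.getParameter(name));
    }

    /** Example configuration: reject doctypes where supported, keep namespace processing. */
    static LSParser newParserWithoutDoctype() {
        LSParser parser = newSynchronousParser();
        DOMConfiguration config = parser.getDomConfig();
        trySet(config, "disallow-doctype", Boolean.TRUE); // [optional], may be unsupported
        trySet(config, "namespaces", Boolean.TRUE);       // [required] value
        return parser;
    }
}
```

Since "well-formed" cannot be set to false on an LSParser, trySet(parser.getDomConfig(), "well-formed", Boolean.FALSE) is expected to return false and leave the configuration unchanged.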