
API for XML

An XML document normally begins with the XML prolog, which is the first line in the previous code, <?xml version="1.0"?>. This line alone tells parsers and browsers that this file should be parsed based on the XML rules discussed earlier. The second line, <books>, is the document element, which is the outermost start tag in the file (an element consists of a start tag, its matching end tag, and everything between them). All other tags must be contained within this one in order to constitute a well-formed XML file. The second line of the XML file need not always contain the document element; it can come later if comments or other markup, such as processing instructions, appear first.
The third line in this sample file is a comment, which you may recognize as the same comment style used in HTML. This is one of the syntax elements XML inherited from SGML.
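The rules above can be checked with a short sketch. It is written in Python with the standard-library xml.dom.minidom module, which is an assumption of this example and not the tool used in the book. The sample document and the is_well_formed helper are hypothetical. The sketch shows that the document element is found even when a comment precedes it, and that a second top-level element makes the file fail to parse.

```python
from xml.dom import minidom, Node
from xml.parsers.expat import ExpatError

# Hypothetical sample: prolog, then a comment, then the document element.
SAMPLE = """<?xml version="1.0"?>
<!-- a short list of books -->
<books>
    <book>
        <title>Example Title</title>
    </book>
</books>"""

doc = minidom.parseString(SAMPLE)

# The document element is the outermost element, wherever it appears.
root_name = doc.documentElement.tagName

# Top-level nodes in document order: the comment precedes the document element.
top_level_types = [node.nodeType for node in doc.childNodes]
comment_comes_first = top_level_types == [Node.COMMENT_NODE, Node.ELEMENT_NODE]


def is_well_formed(text):
    """Return True if the parser accepts the text (hypothetical helper)."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


# A second top-level element sits outside the document element, so parsing fails.
two_roots = '<?xml version="1.0"?><books></books><extra></extra>'
```

Calling is_well_formed(SAMPLE) and is_well_formed(two_roots) contrasts the two cases: only the first keeps every tag inside a single document element.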
A little bit farther down the page you find a <desc> tag with some special syntax inside it. The <![CDATA[ ]]> code is used to indicate text that should not be parsed, allowing special characters such as less-than and greater-than to be included without fear of breaking the XML syntax. The text must appear between <![CDATA[ and ]]> to be properly shielded from parsing. This is called a Character Data Section or CData Section for short.
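As a minimal sketch of this shielding, the following Python code (using the standard-library xml.dom.minidom module, an assumption of this example) parses a hypothetical <desc> element twice: once with its text wrapped in a CDATA section and once without. The sample strings and the parses helper are made up for illustration.

```python
from xml.dom import minidom, Node
from xml.parsers.expat import ExpatError

# Hypothetical description containing characters that would otherwise be markup.
SHIELDED = """<?xml version="1.0"?>
<desc><![CDATA[Use <b> for bold when 1 < 2]]></desc>"""

# The same text without the CDATA wrapper.
UNSHIELDED = """<?xml version="1.0"?>
<desc>Use <b> for bold when 1 < 2</desc>"""

doc = minidom.parseString(SHIELDED)
desc = doc.documentElement

# The parser reports the shielded text as a CDATA section node.
is_cdata = desc.firstChild.nodeType == Node.CDATA_SECTION_NODE

# Join the child data in case the parser delivers the text in several pieces.
text = "".join(child.data for child in desc.childNodes)


def parses(source):
    """Return True if the source parses without error (hypothetical helper)."""
    try:
        minidom.parseString(source)
        return True
    except ExpatError:
        return False
```

With the wrapper in place the text comes back unchanged, angle brackets included. Without it, the parser treats <b> as a start tag that is never closed and rejects the document.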
The following line is just before the second book definition:
<?page render multiple authors ?>
Even though this looks like the XML prolog, it is actually considered a different type of syntax called a processing instruction. The purpose of processing instructions (or PIs for short) is to provide extra information to programs that are processing the page, such as XML parsers. PIs are generally free form. Their main requirement is that a name (the target), which typically starts with a letter, must immediately follow the first question mark. After that point, a PI can contain any sequence of characters other than ?>, which ends the instruction.
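To illustrate how a program might read a PI, here is a small Python sketch using the standard-library xml.dom.minidom module (an assumption of this example). The surrounding <books> document is a hypothetical stand-in for the sample file. The name right after the question mark is exposed as the target, and the rest as free-form data.

```python
from xml.dom import minidom, Node

# Hypothetical stand-in for the sample file, with the PI before the second book.
SAMPLE = """<?xml version="1.0"?>
<books>
    <book><title>First</title></book>
    <?page render multiple authors ?>
    <book><title>Second</title></book>
</books>"""

doc = minidom.parseString(SAMPLE)

# Collect the processing instructions found among the children of <books>.
pis = [
    node
    for node in doc.documentElement.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]

pi = pis[0]
target = pi.target        # the name that follows the first question mark
data = pi.data.strip()    # the free-form remainder, trimmed of whitespace

# The prolog is not reported as a processing instruction node.
top_level_pis = [
    node
    for node in doc.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
```

What to do with the data is left entirely to the program. The parser only hands over the target and the raw text.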
The most common PI is used to specify a style sheet for an XML file:
<?xml-stylesheet type="text/css" href="MyStyles.css" ?>
This PI is typically placed immediately after the XML prolog and is used by Web browsers to display the
XML data using particular styles.
If you’re interested in learning more about XML and its many uses, consider picking up Beginning
XML, 3rd Edition (Wiley Publishing, Inc., ISBN 0-7645-7077-3).
An API for XML
After XML was defined as a language, the need arose for a way to both represent and manipulate XML
code using common programming languages such as Java.
First came the Simple API for XML (SAX) project for Java. SAX provides an event-based API to parse XML. Essentially, SAX parsers start at the beginning of the file and parse their way through the code in one straight pass, firing an event every time they encounter a start tag (with its attributes), end tag, text, or other XML syntax. It is up to the developer, then, to determine what to do when each of these events occurs. SAX parsers are lightweight and fast because they just parse the text and continue on their way. Their main downside is the inability to stop, go backward, or access a specific part of the XML structure without starting from the beginning of the file.
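The event-based model can be sketched with Python's standard-library xml.sax module, which follows the same design as the Java SAX project. Using Python here is an assumption of this example. The EventLogger class and the sample document are hypothetical. The handler simply records each event in the order the parser fires it.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records events in the order they fire."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Attributes arrive together with the start tag.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Skip whitespace-only text; the parser may deliver text in pieces.
        if content.strip():
            self.events.append(("text", content.strip()))


# Hypothetical sample document.
SAMPLE = b"""<?xml version="1.0"?>
<books><book id="1"><title>First</title></book></books>"""

handler = EventLogger()
xml.sax.parseString(SAMPLE, handler)

# Element events in document order, with text events filtered out.
tag_events = [event[:2] for event in handler.events if event[0] != "text"]
all_text = "".join(event[1] for event in handler.events if event[0] == "text")
```

Once parsing finishes, handler.events holds the whole pass in order. Nothing else is kept, so revisiting an earlier element means parsing the document again, which is the trade-off described above.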