May 00 Online
Volume Number: 16 (2000)
Issue Number: 5
Column Tag: MacTech Online
By Jeff Clites <email@example.com>
A few months ago, we briefly touched on the subject of XML, the biggest hype magnet since Java. A quick glance at any IT-related publication or website will give you a pretty good idea of how much interest there is in anything related to XML, and companies are churning out press releases about their plans to utilize XML faster than the press can cover them. XML isn't the Next Big Thing; it's the Current Big Thing. No matter where you stand on the hype-versus-substance issue, as a developer you'll need a working understanding of the technology to meet the demands of new standards and new modes of application interoperability. Beneath the static there is great potential in XML, and everyone will benefit from the attention it is drawing to issues surrounding communication and standardization. This month we'll cover a few of the resources you can turn to when working with this rapidly growing technology.
Although there are many choices available, it is difficult to find the perfect printed reference; many of the obvious choices turn out not to be very good, and the field is evolving so rapidly that even the high-quality books quickly become out of date. For the moment, though, there are a couple of winners. The XML Pocket Reference, from O'Reilly & Associates, serves as both a beginner's introduction and a quick reference. For an extensive survey of current (and developing) technologies and APIs, try Professional XML from Wrox Press.
XML Pocket Reference
There is also no end to web-based coverage. An indispensable source is the XML FAQ, which covers a wide range of topics and is well written. At the other end of the difficulty spectrum is the XML specification itself. Although you'll want to avoid it as long as possible, at some point you'll need to go back to the source to get the definitive word on something. It's a difficult read, but there is a cleverly constructed annotated version available, which makes deciphering the formal language a bit easier. The annotated version is provided by XML.com, which is an excellent source of references and ongoing coverage of the industry, with timely articles on emerging technologies. Next, the XML Cover Pages are an exhaustive and neutral reference for all things XML (and SGML) related. Also of general interest is the original "mission statement" of the XML standard, which sets forth the design goals for its development. Finally, those planning on developing or working with XML parsers should take a look at the Lark parser, developed by Tim Bray, co-editor of the XML specification. It does not appear to be under active development, but it does provide an interesting case study in parser development in Java.
The XML FAQ
The Annotated XML Specification
The XML Cover Pages
Design Principles for XML
An Introduction to XML Processing with Lark and Larval
XML was designed to be easy to parse, and XML documents are often characterized as self-describing, but if you need to develop an XML-processing application you'll quickly find that this is nonetheless not a trivial job. Fortunately, there are a number of parsers out there to help you with this task, and simple XML documents can in fact be created and parsed simply. One of the first XML parsers, and perhaps the most widely used, is James Clark's expat. It's a non-validating parser written in C, and recently an Objective-C wrapper has been created for it, so it should be straightforward to use it from within Cocoa applications. (I believe that expat is also being used internally within Mac OS X, but it isn't clear what sort of API Apple will provide to access it.) Moving forward, it is likely that the Xerces parsers, which are part of the Apache XML project, will be widely used. They are validating parsers, and are available in Java, C++, and Perl. Much of their code originated from IBM's alphaWorks project, and IBM continues to provide its own versions, XML4J and XML4C for the Java and C++ versions respectively, which combine Xerces with IBM's own Unicode classes, providing support for an expanded range of encodings. Apple is in fact using XML4J in its recently released version 4.5 of WebObjects, and again it isn't clear to what extent the parser will be accessible from other parts of Apple's frameworks. Xerces-C contains a fair amount of code which must be customized when porting it to new platforms, but a classic Mac OS port has been developed, and a BSD version is in development which is likely to compile under Mac OS X. Also of note is the gnome-xml parser, which is under active development. It falls under the umbrella of the Linux Gnome project, but it is independent of the rest of Gnome and should be portable to other environments and platforms. (For example, it is known to work under Windows.) It is also a validating parser, but appears to be simpler than Xerces and may be worth a look if you need to access a validating parser from C and related languages. The gnome-xml web page also has links to several other articles to get you started, and in particular you should take a look at the article on IBM's developerWorks site.
expat - XML Parser Toolkit
Objective C wrapper for the expat XML Parser
The Apache XML Project
IBM's XML Parser for Java (XML4J)
IBM's XML for C++ parser (XML4C)
The XML library for Gnome
Making application programming easy with GNOME libraries, Part 3
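To give a feel for what working with a non-validating parser like expat looks like, here is a minimal sketch in Python, whose standard xml.parsers.expat module is built on expat. Python is used here purely for illustration and is not one of the toolkits covered above; the sample document and the collect_events helper are invented for the example. The parser reports each start tag, end tag, and run of text to a callback as it reads, and it rejects documents that are not well-formed, but it never checks a document against a DTD.

```python
import xml.parsers.expat

# Hypothetical sample document, made up for this sketch.
SAMPLE = '<issue volume="16" number="5"><column>MacTech Online</column></issue>'


def collect_events(xml_text):
    """Parse xml_text with expat and return the callbacks it fired, in order."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Ask expat to deliver each run of character data in a single callback.
    parser.buffer_text = True

    def on_start(name, attrs):
        events.append(("start", name, dict(attrs)))

    def on_end(name):
        events.append(("end", name))

    def on_text(data):
        # Ignore whitespace-only text between elements.
        if data.strip():
            events.append(("text", data.strip()))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    # The second argument marks this as the final (here, only) chunk of input.
    parser.Parse(xml_text, True)
    return events


if __name__ == "__main__":
    for event in collect_events(SAMPLE):
        print(event)
```

Feeding the same function a document that is not well-formed, such as one with mismatched tags, raises xml.parsers.expat.ExpatError. That well-formedness check is all a non-validating parser promises; checking a document against its DTD is the job of a validating parser such as Xerces.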
If you're experimenting with XML for the first time, or you need to process XML in a web-related context, you should check out the numerous Perl modules available. A good place to start is the libxml-perl package. Of particular interest are the XML-Grove modules, which let you manipulate an XML document as a tree of objects and access various parts of it using a path-like syntax. This is analogous to the DOM and XPath APIs, and the grove interface will likely become obsolete after these have matured, but in the short term it provides a convenient and powerful approach to XML processing. There are a number of other APIs available for working with XML documents from Perl, including DOM-based and SAX-based parser APIs (the former allows you to access XML documents as a tree of objects, and the latter as a stream of events), as well as support for various approaches to XML querying. IBM has two excellent articles which are not to be missed; one gives a brief run-through of all of the XML-related Perl tools available, and the other gives a detailed tutorial on manipulating XML using Perl, including conversion of XML to HTML and XML-driven database access.
Essential tools and libraries for using XML with Perl
Manipulating XML documents with Perl and other scripting languages
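The difference between the tree-of-objects and stream-of-events approaches is easier to see side by side. The sketch below uses Python's standard library rather than the Perl modules discussed above, so treat it as an analogy and not as the XML-Grove, DOM, or Perl SAX APIs themselves: xml.etree.ElementTree stands in for the tree approach with a path-like query, and xml.sax for the event approach. The sample document and the function names are invented for the example.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this sketch.
SAMPLE = """<resources>
  <parser lang="C" validating="no">expat</parser>
  <parser lang="Java" validating="yes">Xerces</parser>
  <parser lang="C" validating="yes">gnome-xml</parser>
</resources>"""


def validating_parsers_tree(xml_text):
    """Tree approach: load the whole document, then query it with a path."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited, XPath-like path syntax.
    return [p.text for p in root.findall("./parser[@validating='yes']")]


class ValidatingParserHandler(xml.sax.ContentHandler):
    """Event approach: remember just enough state to pick out what we want."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None  # a list while inside a matching element

    def startElement(self, name, attrs):
        if name == "parser" and attrs.get("validating") == "yes":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "parser" and self._buffer is not None:
            self.names.append("".join(self._buffer))
            self._buffer = None


def validating_parsers_stream(xml_text):
    handler = ValidatingParserHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.names


if __name__ == "__main__":
    print(validating_parsers_tree(SAMPLE))
    print(validating_parsers_stream(SAMPLE))
```

Both functions return the same list of names. The tree version is shorter and reads like the question being asked, at the cost of holding the entire document in memory; the event version has to track its own state, but it never needs more than the current element, which matters for large documents.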
When you grow weary of mucking around with the innards of the future web, cruise on over to the MacTech Online web pages at <www.mactech.com/online/>, and let your browser worry about the parsing for a while.