TweetFollow Us on Twitter

Sep 00 Online

Volume Number: 16 (2000)
Issue Number: 9
Column Tag: MacTech Online

PDF and XML

by Jeff Clites <online@mactech.com>

Last month we covered Adobe's Portable Document Format (PDF), focusing on how it relates to Quartz, Apple's new imaging model. In brief, PDF originated as a simplification of PostScript, retaining PostScript's primitive graphics operators while discarding its programming-language constructs and adding file and document structure specifications. Quartz (specifically, the Core Graphics Rendering API) is again based on this same set of operators, making it natural to "record" graphics operations into a PDF file, and just as natural to "play back" a PDF into a series of native drawing instructions. At its simplest, PDF is the new PICT; more interestingly, the Quartz imaging model is at the center of all 2-D graphics on Mac OS X, providing a centralized facility for rending drawing commands from different APIs (such as QuickDraw) into different output formats, be they destined for the screen, a printer, or a file.

Of course, part of the beauty of Quartz is that it frees the programmer from having to worry about the details of this process. At the same time, Quartz is certain to increase the popularity of PDF, and in particular expand its use beyond just a format for traditional documents. Accordingly, it will be to a programmer's advantage to know as much as possible about PDF, and to be aware of its strengths and its weaknesses.

As touched on above, PDF defines a file format in addition to a graphics model. In the abstract, a PDF file describes a tree of objects, with a significant separation between document content and document layout. This should send off bells in a developer's head, because it sounds similar to XML, and it's natural to wonder how deep this connection is-to ask questions like, "can a PDF document be represented in XML." The short answer is "probably not", but it's interesting to investigate the parallels between the two formats.

Intersections with XML

PDF and XML are similar in that they define a file structure which is designed to encapsulate a wide range of data in a fairly generalized, hierarchical fashion. Although PDF is designed to be extensible, it does define an interpretation for the information it contains, and it's not clear how well current PDF-rendering applications would handle PDF documents with content which they don't recognize. XML is at the other extreme. At its core, it says nothing about the semantics of the data which it can contain, and it's often used as a format for information which isn't naturally thought of as a "document." But given its generality, it would certainly be possible to devise an XML-based format to encapsulate page-descriptions in a manner similar to PDF. On the other hand, there are several facilities of PDF which are not easily mimicked using XML-features dealing more with practical performance issues than with conceptual structure.

PDF was designed to be a final format, so that PDFs represent finished documents, rather than in-progress works (such as word-processing documents) which will be extensively changed. Still, it is possible to make limited modifications to PDFs, and interestingly this can be done by appending the "change" information to the end of a PDF, without requiring the entire document to be rewritten. This makes it convenient to prepare an initial document and at a later stage add annotations or hyperlinks. This approach also provides a measure of safety, as previous versions of a document can be recovered simply by truncating the changes off the end, and modifications cannot cause complete corruption of the base document. This also means that it is possible to modify large documents without large resource requirements.

Despite XML's flexibility, it isn't possible to create a well-formed XML document by appending information directly to another document, because of the requirement that there be a single root element. (It is possible to work around this limitation, but only by splitting the document into multiple files.) Additionally, PDF documents frequently encapsulate binary data (such as images or compressed text), and it is not convenient to embed such data into XML documents directly-XML is a text-based format, and binary data could be interpreted as markup, or mangled if the document is converted to a different character encoding. XML-based formats traditionally handle this by storing the data in a separate file which is then referenced from the base document, just as images are included in HTML files. This is less convenient than PDF's single-file approach. (It would be possible to include binary data in XML documents by converting it into a text-based representation, such as Base-64 encoding, but this tends to offset the benefits of compression.) Finally, PDF has a higher structural flexibility, in that logical containment is not always represented by physical containment. In other words, structures which logically contain other objects may do so by referencing the objects by name, whereas in XML such containment is almost always represented by physically nesting elements. This flexibility allows the same PDF to be represented in different ways, so that for example a PDF file may be optimized for page-at-a-time delivery over the internet, or alternatively it could be created in a single-pass by a printer driver.

FOP

So despite the current popularity of XML, it isn't likely that PDF is going to be superceded any time soon. So where do PDF and XML intersect? Well, as we observed before, it's natural to think of XML as unformatted data, and to think of PDF as an output format. The preferred way to get from XML to something with formatting is by way of XSL Transformations (XSLT). In the case of XML-to-PDF transformation, there's a tool to help with the process, FOP. (It's part of the Apache XML project.) To use FOP, you first use an XSLT processor to convert your XML document into a tree of formatting objects, which may itself be represented as an XML document. This is where you determine the form of your final document. Since, as mentioned above, XML documents are traditionally devoid of formatting information and are often viewed as pure data, any decisions about how this information will be presented must be encapsulated in the style sheet. Once this is done, and you have your tree of formatting objects, you feed this into FOP, which produces your final PDF. FOP is very much a work in progress, and does not yet support all of the formatting objects defined in the XSL specification, but even as-is it appears quite useful. IBM has an informative tutorial on transforming XML documents. (A free registration is required to access the tutorial.) It discusses using FOP to create PDF documents, and in addition shows you how to generate SVG (Scalable Vector Graphics), which is useful for creating things like charts and graphs from XML-encapsulated data.

OmniPDF

Finally, while you're playing with PDF, be sure to check out OmniPDF if you are running Mac OS X. It's a very cool PDF viewer. It's still under development, but it's Cocoa-native (and hence Mac-OS-X-native), and it really shows off the power of Quartz, as it uses Core Graphics Rendering to do its magic. (OmniPDF is from the Omni Group, who also created OmniWeb, which is currently the only Cocoa-native web browser available. You should check it out also-it's a refreshing alternative, and it has many fun features which set it apart from your usual browser choices.)

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

How to get coins faster in Rodeo Stamped...
There comes a time in a cowboy or cowgirl's life when all the riding and lassoing skills in the world aren't enough. You're going to need some cold, hard cash to keep your sky zoo expanding in Rodeo Stampede. [Read more] | Read more »
How to out-do Cam Newton in Can You Dab?
The thing about dance crazes is that you're never really sure when they've run their course. Take the Dab, for instance. Propelled by its adoption as the touchdown celebration of choice for Carolina Panthers quarterback Cam Newton, the Dab seemed... | Read more »
Artik Games releases Splashy Cats for An...
Splashy Cats had us hooked from the title alone, and when we found out the game was literally just zig-zagging one of our favourite pop-culture references, guised as a playable cat character, down a river – our appetites were whetted to say the... | Read more »
Battle Cars (Games)
Battle Cars 1.1 Device: iOS Universal Category: Games Price: $1.99, Version: 1.1 (iTunes) Description: Welcome to the world of Battle Cars. Battle Cars is a classic arcade top-down racing game with fast mini cars and funny weapons to... | Read more »
How to get started with live.ly
One could be forgiven for thinking that there are already plenty of streaming video apps out there. It's just that the App Store charts would insist that you're mistaken. [Read more] | Read more »
Rodeo Stampede: Guide to all Savannah an...
A "gotta catch 'em all" joke seems appropriate here, even though we're talking animals in Rodeo Stampede and not pocket monsters. By now you've probably had plenty of rides, tamed some animals and built yourself a pretty nice zoo | Read more »
Is there cross-platform play in slither....
So you've sunken plenty of hours into crawling around in slither.io on your iPhone or iPad. You've got your stories of tragedy and triumph, the times you coiled four snakes at one time balanced out by the others when you had a length of more than... | Read more »
Rodeo Stampede guide to running a better...
In Rodeo Stampede, honing your skills so you can jump from animal to animal and outrun the herd as long as possible is only half the fun. Once you've tamed a few animals, you can bring them home with you. [Read more] | Read more »
VoxSyn (Music)
VoxSyn 1.0 Device: iOS Universal Category: Music Price: $6.99, Version: 1.0 (iTunes) Description: VoxSyn turns your voice into the most flexible vocal sound generator ever. Instantly following even subtle modulations of pitch and... | Read more »
Catch Battleplans on Google Play from Ju...
Real-time strategy title Battleplans is due for release on Google Play on June 30th, following its release for iOS systems last month. With its simple interface and pretty graphics, the crowd-pleaser brings a formerly overlooked genre out for the... | Read more »

Price Scanner via MacPrices.net

MacBook Airs on sale for up to $50-$100 off M...
B&H Photo has 13″ and 11″ MacBook Airs on sale for up to $100 off MSRP. Shipping is free, and B&H charges NY sales tax only: - 11″ 1.6GHz/128GB MacBook Air: $849 $50 off list price - 11″ 1.... Read more
Brexit Vote Result Forecast To Slash UK 2016...
Uncertainty and economic volatility can be expected to increase over the next nine months, as the Brexit and concerns over the future of the EU hit IT investment, say Canalys market analysts, with... Read more
13-inch 256GB MacBook Air on sale for $1149,...
Amazon has the 2016 13″ 1.6GHz/256GB MacBook Air (model MMGG2LL/A) on sale for $1149.99 including free shipping. Their price is also $50 off MSRP. Read more
Haven App Launches New Age Of Wirless 911 Eme...
Haven from RapidSOS represents a transformation in access to emergency services from a phone call solely dependent on voice to a robust data connection for voice, text, medical/demographic data.... Read more
Cu Parachute 1.1 Retirement Success PLanning...
Tucson, Arizona based Indie developer Bradley McCarthy has announce the release of Cu (Copper) Parachute 1.1 for iPhone, iPad, and iPod touch devices — a tool with which users can continuously... Read more
Research and Markets Releases iPhone 6s Plus...
A new analysis report from Dublin-based Research and Markets observes that with the iPhone 6s Plus, Apple introduced a new rear camera module. The new device has similar structure and technology than... Read more
Apple refurbished Retina MacBook Pros availab...
Apple has Certified Refurbished 2015 13″ and 15″ Retina MacBook Pros available for up to $380 off the cost of new models. An Apple one-year warranty is included with each model, and shipping is free... Read more
Apple refurbished 11-inch MacBook Airs availa...
Apple has Certified Refurbished 11″ MacBook Airs (the latest models), available for up to $170 off the cost of new models. An Apple one-year warranty is included with each MacBook, and shipping is... Read more
Apple price trackers, updated continuously
Scan our Apple Price Trackers for the latest information on sales, bundles, and availability on systems from Apple’s authorized internet/catalog resellers. We update the trackers continuously: - 15″... Read more
12-inch 32GB and 128GB WiFi iPad Pros on sale...
B&H Photo has 12″ 32GB & 128GB WiFi iPad Pros on sale for up to $80 off MSRP, each including free shipping. B&H charges sales tax in NY only: - 12″ Space Gray 32GB WiFi iPad Pro: $749 $50... Read more

Jobs Board

*Apple* iPhone 6s and New Products Tester Ne...
…we therefore look forward to put out products to quality test for durability. Apple leads the digital music revolution with its iPods and iTunes online store, Read more
*Apple* iPhone 6s and New Products Tester Ne...
…we therefore look forward to put out products to quality test for durability. Apple leads the digital music revolution with its iPods and iTunes online store, Read more
*Apple* Retail - Multiple Positions, Willow...
Job Description: Sales Specialist - Retail Customer Service and Sales Transform Apple Store visitors into loyal Apple customers. When customers enter the store, Read more
*Apple* iPhone 6s and New Products Tester Ne...
…we therefore look forward to put out products to quality test for durability. Apple leads the digital music revolution with its iPods and iTunes online store, Read more
*Apple* iPhone 6s and New Products Tester Ne...
…we therefore look forward to put out products to quality test for durability. Apple leads the digital music revolution with its iPods and iTunes online store, Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.