
Sep 00 Online

Volume Number: 16 (2000)
Issue Number: 9
Column Tag: MacTech Online

PDF and XML

by Jeff Clites <online@mactech.com>

Last month we covered Adobe's Portable Document Format (PDF), focusing on how it relates to Quartz, Apple's new imaging model. In brief, PDF originated as a simplification of PostScript. It retains PostScript's primitive graphics operators, discards its programming-language constructs, and adds specifications for file and document structure. Quartz (specifically, the Core Graphics Rendering API) is in turn based on this same set of operators. That makes it natural to "record" graphics operations into a PDF file, and just as natural to "play back" a PDF as a series of native drawing instructions. At its simplest, PDF is the new PICT. More interestingly, the Quartz imaging model sits at the center of all 2-D graphics on Mac OS X. It provides a central facility for rendering drawing commands from different APIs (such as QuickDraw) into different output formats, whether they are destined for the screen, a printer, or a file.
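
To make the record/play-back idea concrete, here is a minimal Python sketch. It is not the Core Graphics API. The class `PDFRecorder` and the function `play_back` are hypothetical, and only five path operators are handled. Each drawing call is written out as a PDF content-stream operator. As in PostScript, the operands come before the operator, so `10 10 m` is a moveto. The player parses those operators back into the same calls.

```python
class PDFRecorder:
    """Hypothetical recorder: each drawing call becomes one PDF content-stream operator."""

    def __init__(self):
        self.ops = []

    def move_to(self, x, y):
        self.ops.append(f"{x} {y} m")            # begin a new subpath

    def line_to(self, x, y):
        self.ops.append(f"{x} {y} l")            # straight segment to (x, y)

    def rect(self, x, y, w, h):
        self.ops.append(f"{x} {y} {w} {h} re")   # rectangle as a complete subpath

    def stroke(self):
        self.ops.append("S")                     # stroke the current path

    def fill(self):
        self.ops.append("f")                     # fill the current path (nonzero winding rule)

    def content_stream(self):
        return "\n".join(self.ops)


# Operator name -> recorder method; operands precede operators, as in PostScript.
DISPATCH = {"m": "move_to", "l": "line_to", "re": "rect", "S": "stroke", "f": "fill"}


def play_back(stream, target):
    """Replay a content stream (numeric operands plus the operators above) on target."""
    operands = []
    for token in stream.split():
        if token in DISPATCH:
            getattr(target, DISPATCH[token])(*operands)
            operands = []
        else:
            operands.append(float(token) if "." in token else int(token))
    return target


recorder = PDFRecorder()
recorder.move_to(10, 10)
recorder.line_to(100, 100)
recorder.stroke()
recorder.rect(20, 20, 50, 30)
recorder.fill()
stream = recorder.content_stream()
replayed = play_back(stream, PDFRecorder())
```

Because the recording is just the operator sequence, the replayed recorder produces exactly the same stream. A real renderer would draw at each dispatched call instead of recording it again.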

Of course, part of the beauty of Quartz is that it frees the programmer from having to worry about the details of this process. At the same time, Quartz is certain to increase the popularity of PDF, and in particular to expand its use beyond traditional documents. Accordingly, it will be to a programmer's advantage to know as much as possible about PDF, and to be aware of its strengths and weaknesses.

As touched on above, PDF defines a file format in addition to a graphics model. In the abstract, a PDF file describes a tree of objects, with a significant separation between document content and document layout. This should set off bells in a developer's head, because it sounds similar to XML. It's natural to wonder how deep the connection goes, and to ask questions like "Can a PDF document be represented in XML?" The short answer is "probably not," but it's interesting to investigate the parallels between the two formats.

Intersections with XML

PDF and XML are similar in that each defines a file structure designed to encapsulate a wide range of data in a fairly generalized, hierarchical fashion. Although PDF is designed to be extensible, it does define an interpretation for the information it contains. It's not clear how well current PDF-rendering applications would handle PDF documents containing content they don't recognize. XML is at the other extreme. At its core, it says nothing about the semantics of the data it contains, and it's often used as a format for information that isn't naturally thought of as a "document." Given its generality, it would certainly be possible to devise an XML-based format that encapsulates page descriptions in a manner similar to PDF. On the other hand, several facilities of PDF are not easily mimicked in XML. These features deal more with practical performance issues than with conceptual structure.
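
As a rough illustration of how easily the conceptual part carries over, this Python sketch translates a small XML page description into PDF content-stream operators. The XML vocabulary (`<page>`, `<path>`, `<moveto>`, `<rect paint="fill">`, and so on) is invented for the example and is not any real standard. The sketch also says nothing about the practical facilities, such as binary data or incremental updates, that XML does not mimic well.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML page-description vocabulary (not a real standard).
PAGE_XML = """<page>
  <path>
    <moveto x="72" y="72"/>
    <lineto x="540" y="72"/>
    <stroke/>
  </path>
  <rect x="72" y="100" w="200" h="50" paint="fill"/>
</page>"""

# Element name -> (PDF operator, attributes that become its operands, in order).
OPERATORS = {
    "moveto": ("m", ["x", "y"]),
    "lineto": ("l", ["x", "y"]),
    "stroke": ("S", []),
    "rect": ("re", ["x", "y", "w", "h"]),
}


def to_content_stream(xml_text):
    """Translate the hypothetical XML into PDF content-stream operators."""
    out = []
    for elem in ET.fromstring(xml_text).iter():   # visits elements in document order
        if elem.tag not in OPERATORS:
            continue                              # <page>, <path>: structure only
        op, attrs = OPERATORS[elem.tag]
        out.append(" ".join([elem.get(a) for a in attrs] + [op]))
        if elem.tag == "rect":
            # In this vocabulary a rect carries its own painting operator.
            out.append("f" if elem.get("paint") == "fill" else "S")
    return "\n".join(out)


stream = to_content_stream(PAGE_XML)
```

The nesting of `<path>` mirrors the way a PDF path groups its segments. The hierarchy maps naturally, which is the similarity the column describes.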

PDF was designed to be a final format: PDFs represent finished documents rather than works in progress (such as word-processing documents) that will be extensively changed. Still, it is possible to make limited modifications to a PDF. Interestingly, this can be done by appending the "change" information to the end of the file, without rewriting the entire document. This makes it convenient to prepare an initial document and add annotations or hyperlinks at a later stage. The approach also provides a measure of safety. Previous versions of a document can be recovered simply by truncating the changes off the end, and a modification cannot completely corrupt the base document. It also means that large documents can be modified without large resource requirements.
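
The append-only update can be sketched in Python by building a tiny PDF by hand. This is a minimal sketch that uses the classic uncompressed cross-reference ("xref") table. The object contents, the note annotation, and the helper names are hypothetical, the page has no drawing content, and the output is meant to show file structure rather than to be a tested, viewer-ready file. The update rewrites page 3, adds annotation object 4, and writes a new xref section listing only those two objects. It then adds a trailer whose `/Prev` entry points back at the original xref.

```python
def write_objects(out, objects):
    """Append numbered indirect objects to out; return {object number: byte offset}."""
    offsets = {}
    for num, body in objects.items():
        offsets[num] = len(out)
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    return offsets


def xref_section(offsets, include_free_head=False):
    """Build a classic xref table; runs of consecutive numbers share a subsection."""
    nums = ([0] if include_free_head else []) + sorted(offsets)
    runs = []
    for n in nums:
        if runs and n == runs[-1][-1] + 1:
            runs[-1].append(n)
        else:
            runs.append([n])
    parts = [b"xref\n"]
    for run in runs:
        parts.append(b"%d %d\n" % (run[0], len(run)))
        for n in run:
            if n == 0:
                parts.append(b"0000000000 65535 f \n")           # head of the free list
            else:
                parts.append(b"%010d 00000 n \n" % offsets[n])   # fixed 20-byte entry
    return b"".join(parts)


def build_base_pdf():
    """Return (pdf_bytes, xref_offset) for a tiny, empty one-page document."""
    out = bytearray(b"%PDF-1.3\n")
    offsets = write_objects(out, {
        1: b"<< /Type /Catalog /Pages 2 0 R >>",
        2: b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        3: b"<< /Type /Page /Parent 2 0 R /Resources << >> /MediaBox [0 0 612 792] >>",
    })
    xref_pos = len(out)
    out += xref_section(offsets, include_free_head=True)
    out += b"trailer\n<< /Size 4 /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out), xref_pos


def append_update(base, prev_xref):
    """Append a new revision of page 3 plus a new annotation (object 4)."""
    out = bytearray(base)                 # the original bytes are copied, not altered
    offsets = write_objects(out, {
        3: b"<< /Type /Page /Parent 2 0 R /Resources << >> "
           b"/MediaBox [0 0 612 792] /Annots [4 0 R] >>",
        4: b"<< /Type /Annot /Subtype /Text /Rect [72 700 92 720] /Contents (Note) >>",
    })
    xref_pos = len(out)
    out += xref_section(offsets)          # lists only the changed and added objects
    out += (b"trailer\n<< /Size 5 /Root 1 0 R /Prev %d >>\nstartxref\n%d\n%%%%EOF\n"
            % (prev_xref, xref_pos))
    return bytes(out)


base, base_xref = build_base_pdf()
updated = append_update(base, base_xref)
```

Truncating `updated` to `len(base)` bytes gives back the original document exactly. That is the recovery property described above. The page tree also shows the indirect references (`2 0 R`, `3 0 R`) that let PDF express containment without physical nesting.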

Despite XML's flexibility, it isn't possible to create a well-formed XML document by appending information directly to another document, because XML requires a single root element. (It is possible to work around this limitation, but only by splitting the document into multiple files.) Additionally, PDF documents frequently encapsulate binary data (such as images or compressed text), and it is not convenient to embed such data in XML documents directly. XML is a text-based format, so binary data could be misinterpreted as markup, or mangled if the document is converted to a different character encoding. XML-based formats traditionally handle this by storing the data in a separate file that is referenced from the base document, just as images are included in HTML files. This is less convenient than PDF's single-file approach. (Binary data can be included in an XML document by converting it to a text representation such as Base64. However, Base64 turns every 3 bytes into 4 characters, inflating the data by a third and offsetting much of the benefit of compression.) Finally, PDF offers greater structural flexibility, in that logical containment is not always represented by physical containment. In other words, an object that logically contains other objects may do so by referring to them indirectly, by object number. In XML, by contrast, such containment is almost always represented by physically nesting elements. This flexibility allows the same PDF to be laid out in different ways. For example, a PDF file may be optimized for page-at-a-time delivery over the Internet, or it could instead be created in a single pass by a printer driver.
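
Both XML limitations can be checked with Python's standard library. In this sketch, appending a second top-level element PDF-style yields text that ElementTree refuses to parse. Binary data has to be Base64-encoded before it can sit inside an element. The `<doc>`, `<change>`, and `<image encoding='base64'>` names are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the text parses as one XML document (a parse error means False)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


original = "<doc><para>First draft</para></doc>"
# Appending change information PDF-style produces a second root element.
appended = original + "<change><para>An annotation</para></change>"

# Arbitrary bytes stand in for compressed image data; many of them (NUL, for
# instance) are not legal XML characters, so they must be text-encoded first.
payload = bytes(range(256)) * 12                  # 3072 bytes
encoded = base64.b64encode(payload).decode("ascii")
with_image = f"<doc><image encoding='base64'>{encoded}</image></doc>"
```

The 3072-byte payload becomes 4096 characters of Base64, the one-third growth mentioned above. It decodes back to the original bytes intact.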

FOP

So despite the current popularity of XML, it isn't likely that PDF is going to be superseded any time soon. Where, then, do PDF and XML intersect? Well, as we observed before, it's natural to think of XML as unformatted data, and to think of PDF as an output format. The preferred way to get from XML to something with formatting is by way of XSL Transformations (XSLT). For XML-to-PDF conversion there's a tool to help with the process: FOP, part of the Apache XML project. To use FOP, you first use an XSLT processor to convert your XML document into a tree of formatting objects, which may itself be represented as an XML document. This is where you determine the form of your final document. XML documents are traditionally devoid of formatting information and are often viewed as pure data, so any decisions about how the information will be presented must be encapsulated in the style sheet. Once you have your tree of formatting objects, you feed it into FOP, which produces the final PDF. FOP is very much a work in progress and does not yet support all of the formatting objects defined in the XSL specification, but even as-is it appears quite useful. IBM has an informative tutorial on transforming XML documents (free registration is required to access it). It discusses using FOP to create PDF documents. It also shows how to generate SVG (Scalable Vector Graphics), which is useful for creating things like charts and graphs from XML-encapsulated data.
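
To show what the intermediate "tree of formatting objects" looks like, here is a Python sketch. Python's standard library has no XSLT processor, so ElementTree stands in for the style-sheet step. The sketch maps a hypothetical `<inventory>` document to an XSL-FO tree of the kind FOP consumes. The input vocabulary and the helper names `fo` and `to_formatting_objects` are invented. The formatting-object names and properties (`fo:simple-page-master`, `master-reference`, `flow-name="xsl-region-body"`) follow the XSL 1.0 Recommendation. FOP releases that track earlier working drafts may differ in details. In practice this mapping would live in an XSLT stylesheet, and the serialized result would be handed to FOP to produce the PDF.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"     # the XSL-FO namespace
ET.register_namespace("fo", FO)              # serialize with the usual fo: prefix


def fo(tag, parent=None, **attrs):
    """Create an XSL-FO element; underscores in keyword names become hyphens."""
    attrs = {k.replace("_", "-"): v for k, v in attrs.items()}
    name = f"{{{FO}}}{tag}"
    return ET.Element(name, attrs) if parent is None else ET.SubElement(parent, name, attrs)


def to_formatting_objects(xml_text):
    """Stand-in for an XSLT stylesheet: map hypothetical <item> data to fo:blocks."""
    data = ET.fromstring(xml_text)
    root = fo("root")
    masters = fo("layout-master-set", root)
    page = fo("simple-page-master", masters, master_name="letter",
              page_width="8.5in", page_height="11in", margin="1in")
    fo("region-body", page)
    seq = fo("page-sequence", root, master_reference="letter")
    flow = fo("flow", seq, flow_name="xsl-region-body")
    title = fo("block", flow, font_size="18pt", font_weight="bold")
    title.text = data.get("title")
    for item in data.iter("item"):           # every presentation decision lives here
        block = fo("block", flow, font_size="12pt")
        block.text = f"{item.get('name')}: {item.text}"
    return ET.tostring(root, encoding="unicode")


SAMPLE = ('<inventory title="Parts List">'
          '<item name="Widget">12</item><item name="Gadget">7</item></inventory>')
result = to_formatting_objects(SAMPLE)
```

Note that the input carries no formatting at all. Page size, fonts, and layout come entirely from the transformation, which is the division of labor the column describes.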

OmniPDF

Finally, while you're playing with PDF, be sure to check out OmniPDF if you are running Mac OS X. It's a very cool PDF viewer. It's still under development, but it's Cocoa-native (and hence Mac OS X-native), and it really shows off the power of Quartz, since it uses Core Graphics Rendering to do its magic. (OmniPDF is from the Omni Group, which also created OmniWeb, currently the only Cocoa-native web browser available. You should check it out too. It's a refreshing alternative, with many fun features that set it apart from the usual browser choices.)

 
