
Sep 00 Online

Volume Number: 16 (2000)
Issue Number: 9
Column Tag: MacTech Online

PDF and XML

by Jeff Clites <online@mactech.com>

Last month we covered Adobe's Portable Document Format (PDF), focusing on how it relates to Quartz, Apple's new imaging model. In brief, PDF originated as a simplification of PostScript, retaining PostScript's primitive graphics operators while discarding its programming-language constructs and adding file and document structure specifications. Quartz (specifically, the Core Graphics Rendering API) is in turn based on this same set of operators, which makes it natural to "record" graphics operations into a PDF file, and just as natural to "play back" a PDF as a series of native drawing instructions. At its simplest, PDF is the new PICT; more interestingly, the Quartz imaging model is at the center of all 2-D graphics on Mac OS X, providing a centralized facility for rendering drawing commands from different APIs (such as QuickDraw) into different output formats, whether they are destined for the screen, a printer, or a file.
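To make the "record and play back" idea concrete, here is a minimal sketch of a recorder that turns drawing calls into PDF content-stream text, plus a parser that turns that text back into a list of operations. The class and method names (`PDFRecorder`, `set_fill_rgb`, `play_back`, and so on) are hypothetical and are not the Quartz or Core Graphics API; only the operator spellings (`rg`, `re`, `m`, `l`, `h`, `f`, `S`) and their postfix operand order come from the PDF format. The parser handles only numeric operands and this small operator set.

```python
def fmt(n: float) -> str:
    # Write numbers compactly, e.g. 10.0 -> "10", 0.5 -> "0.5"
    return "%g" % n


class PDFRecorder:
    """Hypothetical recorder: turns drawing calls into PDF content-stream text."""

    def __init__(self) -> None:
        self.lines = []

    def _op(self, operator: str, *operands: float) -> None:
        # PDF uses postfix notation: operands first, then the operator name
        self.lines.append(" ".join([fmt(x) for x in operands] + [operator]))

    def set_fill_rgb(self, r, g, b):
        self._op("rg", r, g, b)      # set nonstroking (fill) color

    def rect(self, x, y, w, h):
        self._op("re", x, y, w, h)   # append a rectangle to the current path

    def move_to(self, x, y):
        self._op("m", x, y)

    def line_to(self, x, y):
        self._op("l", x, y)

    def close_path(self):
        self._op("h")

    def fill(self):
        self._op("f")                # fill the path (nonzero winding rule)

    def stroke(self):
        self._op("S")

    def content(self) -> str:
        return "\n".join(self.lines)


# Number of operands each supported operator expects
ARITY = {"rg": 3, "re": 4, "m": 2, "l": 2, "h": 0, "f": 0, "S": 0}


def play_back(stream: str):
    """Parse content-stream text back into (operator, operands) pairs."""
    ops, operands = [], []
    for token in stream.split():
        if token in ARITY:
            if len(operands) != ARITY[token]:
                raise ValueError(f"{token} expects {ARITY[token]} operands")
            ops.append((token, operands))
            operands = []
        else:
            operands.append(float(token))
    return ops
```

Recording a red 100 by 50 rectangle with `set_fill_rgb(1, 0, 0)`, `rect(10, 10, 100, 50)` and `fill()` yields the three lines `1 0 0 rg`, `10 10 100 50 re` and `f`; `play_back` recovers the same three operations. A real renderer would dispatch each pair to native drawing calls instead of collecting them in a list.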

Of course, part of the beauty of Quartz is that it frees the programmer from having to worry about the details of this process. At the same time, Quartz is certain to increase the popularity of PDF, and in particular expand its use beyond just a format for traditional documents. Accordingly, it will be to a programmer's advantage to know as much as possible about PDF, and to be aware of its strengths and its weaknesses.

As touched on above, PDF defines a file format in addition to a graphics model. In the abstract, a PDF file describes a tree of objects, with a significant separation between document content and document layout. This should set off bells in a developer's head, because it sounds similar to XML, and it's natural to wonder how deep the connection goes, and to ask questions like, "Can a PDF document be represented in XML?" The short answer is "probably not," but it's interesting to investigate the parallels between the two formats.

Intersections with XML

PDF and XML are similar in that both define a file structure designed to encapsulate a wide range of data in a fairly generalized, hierarchical fashion. Although PDF is designed to be extensible, it does define an interpretation for the information it contains, and it's not clear how well current PDF-rendering applications would handle PDF documents with content they don't recognize. XML is at the other extreme. At its core, it says nothing about the semantics of the data it can contain, and it's often used as a format for information which isn't naturally thought of as a "document." Given this generality, it would certainly be possible to devise an XML-based format that encapsulates page descriptions in a manner similar to PDF. On the other hand, several facilities of PDF are not easily mimicked using XML; these deal more with practical performance issues than with conceptual structure.

PDF was designed to be a final format, so that PDFs represent finished documents, rather than in-progress works (such as word-processing documents) which will be extensively changed. Still, it is possible to make limited modifications to PDFs, and interestingly this can be done by appending the "change" information to the end of a PDF, without requiring the entire document to be rewritten. This makes it convenient to prepare an initial document and at a later stage add annotations or hyperlinks. This approach also provides a measure of safety, as previous versions of a document can be recovered simply by truncating the changes off the end, and modifications cannot cause complete corruption of the base document. This also means that it is possible to modify large documents without large resource requirements.
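The append-only update can be sketched in a few lines. The helper names below (`last_startxref`, `append_update`, `previous_revision`) are hypothetical. `append_update` writes a replacement version of one object after the existing end of file, followed by a small cross-reference section for just that object and a trailer whose `/Prev` entry points back at the previous cross-reference table. Nothing before the old end of file is touched. `previous_revision` recovers the earlier version by truncating after the second-to-last `%%EOF` marker; this is naive, since it assumes `%%EOF` never occurs inside stream data, and a real tool would walk the `/Prev` chain instead.

```python
def last_startxref(pdf: bytes) -> int:
    # The final "startxref" keyword gives the offset of the newest xref section
    idx = pdf.rindex(b"startxref")
    return int(pdf[idx:].split()[1])


def append_update(pdf: bytes, obj_num: int, new_body: bytes,
                  size: int, root: bytes = b"1 0 R") -> bytes:
    """Append a new version of one object without rewriting the original bytes.

    `size` must be one more than the highest object number in the file.
    """
    prev = last_startxref(pdf)
    out = bytearray(pdf)
    if not out.endswith(b"\n"):
        out += b"\n"
    obj_offset = len(out)
    out += b"%d 0 obj\n" % obj_num + new_body + b"\nendobj\n"

    # A one-entry xref subsection covering only the replaced object
    xref_offset = len(out)
    out += b"xref\n%d 1\n" % obj_num
    out += b"%010d 00000 n \n" % obj_offset
    # /Prev links this revision to the one before it
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (size, root, prev)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


def previous_revision(pdf: bytes) -> bytes:
    """Naively undo the last update by truncating after the prior %%EOF."""
    body = pdf.rstrip()
    if not body.endswith(b"%%EOF"):
        raise ValueError("input does not end at a revision boundary")
    prev_eof = body.rindex(b"%%EOF", 0, len(body) - len(b"%%EOF"))
    end = prev_eof + len(b"%%EOF")
    if pdf[end:end + 1] == b"\n":  # keep the end-of-line after the old marker
        end += 1
    return pdf[:end]
```

Because the update only adds bytes, the original revision is still a prefix of the updated file, which is exactly what makes truncation a safe way to roll back.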

Despite XML's flexibility, it isn't possible to create a well-formed XML document by appending information directly to another document, because XML requires a single root element. (It is possible to work around this limitation, but only by splitting the document into multiple files.) Additionally, PDF documents frequently encapsulate binary data (such as images or compressed text), and it is not convenient to embed such data directly in XML documents: XML is a text-based format, so binary data could be interpreted as markup, or mangled if the document is converted to a different character encoding. XML-based formats traditionally handle this by storing the data in a separate file which is then referenced from the base document, just as images are included in HTML files. This is less convenient than PDF's single-file approach. (It would be possible to include binary data in XML documents by converting it into a text-based representation, such as Base64 encoding, but this tends to offset the benefits of compression.) Finally, PDF has greater structural flexibility, in that logical containment is not always represented by physical containment. In other words, structures which logically contain other objects may do so by referring to those objects indirectly (by object number), whereas in XML such containment is almost always represented by physically nesting elements. This flexibility allows the same PDF to be laid out in different ways: for example, a PDF file may be optimized for page-at-a-time delivery over the Internet, or it could be created in a single pass by a printer driver.
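Two of these limitations are easy to check with Python's standard XML parser. The sketch below (helper names are hypothetical) shows that appending a second top-level element to a well-formed document makes it ill-formed, that raw control bytes are not legal XML characters, and that Base64 encoding, the usual workaround for binary data, inflates data by a factor of 4/3.

```python
import base64
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def append_annotation(document: str, annotation: str) -> str:
    """Naive PDF-style update: tack the new data onto the end of the file."""
    return document + annotation


def base64_ratio(data: bytes) -> float:
    """Size of the Base64 text relative to the raw binary data."""
    return len(base64.b64encode(data)) / len(data)


original = "<doc><page n='1'/></doc>"
updated = append_annotation(original, "<note page='1'>Reviewed</note>")
print(is_well_formed(original))               # True
print(is_well_formed(updated))                # False: a second element follows the root
print(is_well_formed("<img>\x01\x02</img>"))  # False: control bytes aren't legal XML chars
print(round(base64_ratio(bytes(3000)), 3))    # 1.333
```

The one-third growth applies to already-compressed data too, which is why Base64 "tends to offset the benefits of compression."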

FOP

So despite the current popularity of XML, it isn't likely that PDF is going to be superseded any time soon. Where, then, do PDF and XML intersect? Well, as we observed before, it's natural to think of XML as unformatted data, and to think of PDF as an output format. The preferred way to get from XML to something with formatting is by way of XSL Transformations (XSLT). For XML-to-PDF transformation, there's a tool to help with the process: FOP, which is part of the Apache XML project. To use FOP, you first use an XSLT processor to convert your XML document into a tree of formatting objects, which may itself be represented as an XML document. This is where you determine the form of your final document. Since, as mentioned above, XML documents are traditionally devoid of formatting information and are often viewed as pure data, any decisions about how this information will be presented must be encapsulated in the style sheet. Once this is done, and you have your tree of formatting objects, you feed it into FOP, which produces your final PDF. FOP is very much a work in progress, and does not yet support all of the formatting objects defined in the XSL specification, but even as-is it appears quite useful. IBM has an informative tutorial on transforming XML documents. (A free registration is required to access the tutorial.) It discusses using FOP to create PDF documents, and also shows you how to generate SVG (Scalable Vector Graphics), which is useful for creating things like charts and graphs from XML-encapsulated data.
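The first stage of that pipeline, turning pure data into a formatting-object tree, can be sketched without an XSLT processor. In the example below, Python's ElementTree stands in for the XSLT style sheet (this is a swapped-in technique, not how FOP or the IBM tutorial does it), and the input vocabulary (`catalog`, `title`, `item`) and the function name `to_formatting_objects` are hypothetical. The output uses element names from the XSL 1.0 formatting-objects namespace; early FOP releases tracked earlier XSL drafts, so some names may differ for the version you have.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag: str) -> str:
    # ElementTree spells namespaced names as "{uri}local-name"
    return "{%s}%s" % (FO_NS, tag)


def to_formatting_objects(data_xml: str) -> ET.Element:
    """Map a hypothetical <catalog> document onto an XSL-FO tree."""
    data = ET.fromstring(data_xml)
    root = ET.Element(fo("root"))

    # Page geometry: one US Letter page master with 1-inch margins
    masters = ET.SubElement(root, fo("layout-master-set"))
    master = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "letter", "page-width": "8.5in",
        "page-height": "11in", "margin": "1in"})
    ET.SubElement(master, fo("region-body"))

    # Content: every presentation decision lives here, not in the source data
    sequence = ET.SubElement(root, fo("page-sequence"),
                             {"master-reference": "letter"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    title = data.find("title")
    if title is not None:
        heading = ET.SubElement(flow, fo("block"), {
            "font-size": "18pt", "font-weight": "bold", "space-after": "12pt"})
        heading.text = title.text
    for item in data.findall("item"):
        block = ET.SubElement(flow, fo("block"), {"font-size": "12pt"})
        block.text = item.text
    return root


if __name__ == "__main__":
    tree = to_formatting_objects(
        "<catalog><title>Parts</title><item>Widget</item><item>Gadget</item></catalog>")
    print(ET.tostring(tree, encoding="unicode"))
```

The serialized tree is what you would hand to FOP to render as PDF. Keeping the font sizes and page geometry in the transformation, rather than in the catalog data, mirrors the division of labor described above.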

OmniPDF

Finally, while you're playing with PDF, be sure to check out OmniPDF if you are running Mac OS X. It's a very cool PDF viewer. It's still under development, but it's Cocoa-native (and hence Mac OS X-native), and it really shows off the power of Quartz, since it uses Core Graphics Rendering to do its magic. (OmniPDF is from the Omni Group, which also created OmniWeb, currently the only Cocoa-native web browser available. You should check it out as well: it's a refreshing alternative, with many fun features that set it apart from the usual browser choices.)
