
Sep 00 Online

Volume Number: 16 (2000)
Issue Number: 9
Column Tag: MacTech Online

PDF and XML

by Jeff Clites <online@mactech.com>

Last month we covered Adobe's Portable Document Format (PDF), focusing on how it relates to Quartz, Apple's new imaging model. In brief, PDF originated as a simplification of PostScript, retaining PostScript's primitive graphics operators while discarding its programming-language constructs and adding file and document structure specifications. Quartz (specifically, the Core Graphics Rendering API) is in turn based on this same set of operators, making it natural to "record" graphics operations into a PDF file, and just as natural to "play back" a PDF into a series of native drawing instructions. At its simplest, PDF is the new PICT; more interestingly, the Quartz imaging model is at the center of all 2-D graphics on Mac OS X, providing a centralized facility for rendering drawing commands from different APIs (such as QuickDraw) into different output formats, whether they are destined for the screen, a printer, or a file.
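To make the "record and play back" idea concrete, here is a minimal Python sketch. The `PDFRecorder` class and `play_back` function are hypothetical illustrations, not Quartz's API: each drawing call appends one real PDF content-stream operator (`w` sets line width, `RG` sets the RGB stroke color, `m` is moveto, `l` is lineto, `S` strokes the path), and playback tokenizes the stream back into calls. It assumes numeric operands only; real content streams also contain strings, names, and arrays.

```python
class PDFRecorder:
    """Hypothetical recorder: each drawing call appends one PDF content-stream operator."""

    def __init__(self):
        self.ops = []

    def _emit(self, *operands_and_op):
        # Postfix order, as in PostScript and PDF: operands first, operator last.
        *operands, op = operands_and_op
        self.ops.append(" ".join(["%g" % v for v in operands] + [op]))

    def set_line_width(self, width):
        self._emit(width, "w")

    def set_stroke_rgb(self, r, g, b):
        self._emit(r, g, b, "RG")

    def move_to(self, x, y):
        self._emit(x, y, "m")

    def line_to(self, x, y):
        self._emit(x, y, "l")

    def stroke(self):
        self._emit("S")

    def content_stream(self):
        return "\n".join(self.ops)


def play_back(stream):
    """Turn a numbers-only content stream back into (operator, operands) pairs."""
    calls, operands = [], []
    for token in stream.split():
        try:
            operands.append(float(token))  # numbers accumulate on the operand stack
        except ValueError:
            calls.append((token, operands))  # anything else is treated as an operator
            operands = []
    return calls


rec = PDFRecorder()
rec.set_line_width(2)
rec.set_stroke_rgb(1, 0, 0)
rec.move_to(10, 10)
rec.line_to(100, 100)
rec.stroke()
print(rec.content_stream())
print(play_back(rec.content_stream()))
```

Because the recorded stream and the drawing calls map one-to-one, the same operator list can drive a screen, a printer, or a file, which is the property the column attributes to Quartz.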

Of course, part of the beauty of Quartz is that it frees the programmer from having to worry about the details of this process. At the same time, Quartz is certain to increase the popularity of PDF, and in particular expand its use beyond just a format for traditional documents. Accordingly, it will be to a programmer's advantage to know as much as possible about PDF, and to be aware of its strengths and its weaknesses.

As touched on above, PDF defines a file format in addition to a graphics model. In the abstract, a PDF file describes a tree of objects, with a significant separation between document content and document layout. This should set off bells in a developer's head, because it sounds similar to XML, and it's natural to wonder how deep the connection goes, and to ask questions like "Can a PDF document be represented in XML?" The short answer is "probably not," but it's interesting to investigate the parallels between the two formats.

Intersections with XML

PDF and XML are similar in that they define a file structure designed to encapsulate a wide range of data in a fairly generalized, hierarchical fashion. Although PDF is designed to be extensible, it does define an interpretation for the information it contains, and it's not clear how well current PDF-rendering applications would handle PDF documents with content they don't recognize. XML is at the other extreme. At its core, it says nothing about the semantics of the data it can contain, and it's often used as a format for information that isn't naturally thought of as a "document." But given its generality, it would certainly be possible to devise an XML-based format to encapsulate page descriptions in a manner similar to PDF. On the other hand, several facilities of PDF are not easily mimicked using XML: features dealing more with practical performance issues than with conceptual structure.
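As a sketch of how such an XML-based format might look, assume a toy object table in which containers refer to their children by number (in the spirit of PDF's `3 0 R` indirect references). The following Python resolves each reference by nesting the child element inside its parent. The element names (`catalog`, `pages`, `page`), the `obj` attribute, and the `("ref", n)` tuples are all hypothetical, not part of PDF or any XML standard.

```python
import xml.etree.ElementTree as ET

# Toy PDF-style object table (hypothetical): objects are addressed by number,
# and ("ref", n) stands in for a PDF indirect reference such as "3 0 R".
OBJECTS = {
    1: {"Type": "Catalog", "Pages": ("ref", 2)},
    2: {"Type": "Pages", "Kids": [("ref", 3), ("ref", 4)]},
    3: {"Type": "Page", "MediaBox": [0, 0, 612, 792]},
    4: {"Type": "Page", "MediaBox": [0, 0, 612, 792]},
}


def is_ref(value):
    return isinstance(value, tuple) and len(value) == 2 and value[0] == "ref"


def to_xml(num, objects):
    """Build an element for object `num`, nesting every object it references."""
    obj = objects[num]
    elem = ET.Element(obj["Type"].lower(), {"obj": str(num)})
    for key, value in obj.items():
        if key == "Type":
            continue
        items = value if isinstance(value, list) else [value]
        if items and all(is_ref(v) for v in items):
            # Logical containment becomes physical containment.
            for _, child in items:
                elem.append(to_xml(child, objects))
        else:
            # Plain values become a space-separated attribute.
            elem.set(key, " ".join(str(v) for v in items))
    return elem


print(ET.tostring(to_xml(1, OBJECTS), encoding="unicode"))
```

Nesting gives every object exactly one physical home, so an object referenced from two places would have to be copied; PDF's numbered references avoid that, which is part of the structural flexibility discussed below.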

PDF was designed to be a final format, so PDFs represent finished documents rather than in-progress works (such as word-processing documents) which will be extensively changed. Still, it is possible to make limited modifications to PDFs, and, interestingly, this can be done by appending the "change" information to the end of a PDF, without requiring the entire document to be rewritten. This makes it convenient to prepare an initial document and add annotations or hyperlinks at a later stage. This approach also provides a measure of safety: previous versions of a document can be recovered simply by truncating the changes off the end, and modifications cannot completely corrupt the base document. It also means that large documents can be modified without large resource requirements.
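The append-only update can be sketched in a few lines of Python. This is a simplified illustration, assuming a classic (uncompressed) cross-reference table: `build_base_pdf` writes a tiny one-page file, and `append_update` appends a new version of one object, a one-entry cross-reference section, and a trailer whose `/Prev` key points back at the previous cross-reference section. The function names are hypothetical, and the output is a teaching example rather than a production-quality PDF.

```python
def build_base_pdf() -> bytes:
    """Write a tiny one-page PDF with a classic cross-reference (xref) table."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /Resources << >> /MediaBox [0 0 200 200] >>",
    ]
    out = bytearray(b"%PDF-1.3\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj"
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    xref_pos = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def last_startxref(pdf: bytes) -> int:
    """Offset of the most recent xref section, read after the final 'startxref'."""
    idx = pdf.rindex(b"startxref")
    return int(pdf[idx + len(b"startxref"):].split()[0])


def append_update(pdf: bytes, obj_num: int, new_body: bytes, size: int) -> bytes:
    """Append a new version of one object; the earlier bytes are never touched."""
    out = bytearray(pdf)
    prev = last_startxref(pdf)
    obj_pos = len(out)
    out += b"%d 0 obj\n%s\nendobj\n" % (obj_num, new_body)
    xref_pos = len(out)
    out += b"xref\n%d 1\n%010d 00000 n \n" % (obj_num, obj_pos)
    # /Prev chains this section back to the previous cross-reference table.
    out += b"trailer\n<< /Size %d /Root 1 0 R /Prev %d >>\n" % (size, prev)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


base = build_base_pdf()
new_page = b"<< /Type /Page /Parent 2 0 R /Resources << >> /MediaBox [0 0 400 400] >>"
updated = append_update(base, 3, new_page, size=4)
restored = updated[:len(base)]  # "undo" the edit by truncating the appended bytes
print(restored == base)
```

A reader starts from the last `startxref`, so it sees the enlarged page, while truncating back to the original length recovers the earlier version byte for byte.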

Despite XML's flexibility, it isn't possible to create a well-formed XML document by appending information directly to another document, because of the requirement that there be a single root element. (It is possible to work around this limitation, but only by splitting the document into multiple files.) Additionally, PDF documents frequently encapsulate binary data (such as images or compressed text), and it is not convenient to embed such data into XML documents directly: XML is a text-based format, and binary data could contain byte sequences that look like markup or characters XML does not allow at all, or it could be mangled if the document is converted to a different character encoding. XML-based formats traditionally handle this by storing the data in a separate file which is then referenced from the base document, just as images are included in HTML files. This is less convenient than PDF's single-file approach. (It would be possible to include binary data in XML documents by converting it into a text-based representation, such as Base64 encoding, but this adds roughly a third to the size of the data and so tends to offset the benefits of compression.) Finally, PDF has greater structural flexibility, in that logical containment is not always represented by physical containment. In other words, structures which logically contain other objects may do so by referring to those objects indirectly (in PDF, by object number), whereas in XML such containment is almost always represented by physically nesting elements. This flexibility allows the same PDF to be represented in different ways: for example, a PDF file may be optimized for page-at-a-time delivery over the internet, or it could be created in a single pass by a printer driver.
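Two of these points are easy to check with Python's standard library. This sketch uses made-up sample data: appending an element to a finished XML document makes the parser reject it (a second top-level element follows the root), and Base64-encoding compressed bytes grows them by about a third (4 output characters for every 3 input bytes, rounded up), which eats into what compression saved.

```python
import base64
import xml.etree.ElementTree as ET
import zlib


def is_well_formed(text):
    """True if the parser accepts `text` as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Appending "change" data after the root element breaks well-formedness.
base_doc = "<document><page n='1'/></document>"
appended = base_doc + "<annotation page='1'>Reviewed</annotation>"
print(is_well_formed(base_doc), is_well_formed(appended))

# Base64 turns every 3 bytes into 4 ASCII characters (padded at the end).
raw = b"The quick brown fox jumps over the lazy dog. " * 200
compressed = zlib.compress(raw)
encoded = base64.b64encode(compressed)
print(len(raw), len(compressed), len(encoded))
```

The encoded text is safe to place inside an XML element, but it is always larger than the compressed bytes it carries.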

FOP

Despite the current popularity of XML, then, it isn't likely that PDF is going to be superseded any time soon. So where do PDF and XML intersect? Well, as we observed before, it's natural to think of XML as unformatted data, and to think of PDF as an output format. The preferred way to get from XML to something with formatting is by way of XSL Transformations (XSLT). For XML-to-PDF transformation, there's a tool to help with the process: FOP (Formatting Objects Processor), part of the Apache XML project. To use FOP, you first use an XSLT processor to convert your XML document into a tree of formatting objects, which may itself be represented as an XML document. This is where you determine the form of your final document: since XML documents are traditionally devoid of formatting information and are often viewed as pure data, any decisions about how this information will be presented must be encapsulated in the style sheet. Once you have your tree of formatting objects, you feed it into FOP, which produces your final PDF. FOP is very much a work in progress and does not yet support all of the formatting objects defined in the XSL specification, but even as-is it appears quite useful. IBM has an informative tutorial on transforming XML documents (a free registration is required to access it). It discusses using FOP to create PDF documents, and also shows you how to generate SVG (Scalable Vector Graphics), which is useful for creating things like charts and graphs from XML-encapsulated data.
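The XSLT stage normally lives in a stylesheet, but the shape of its output is easier to see in code. The following Python sketch stands in for that stylesheet. It is not XSLT and not part of FOP. It maps a hypothetical `<article>` document with `<title>` and `<para>` children onto XSL formatting objects (`fo:root`, `fo:layout-master-set`, `fo:simple-page-master`, `fo:page-sequence`, `fo:flow`, `fo:block`). Element and attribute names follow the XSL 1.0 vocabulary; the XSL specification was still a draft when this column was written, so names may differ in older FOP releases.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO)


def fo(tag, parent=None, **attrs):
    """Create an element in the XSL-FO namespace ('_' in keyword names becomes '-')."""
    name = "{%s}%s" % (FO, tag)
    attrib = {k.replace("_", "-"): v for k, v in attrs.items()}
    if parent is None:
        return ET.Element(name, attrib)
    return ET.SubElement(parent, name, attrib)


def to_fo(data_xml):
    """Map unformatted <article> data onto a tree of formatting objects."""
    data = ET.fromstring(data_xml)
    root = fo("root")
    masters = fo("layout-master-set", root)
    page = fo("simple-page-master", masters, master_name="letter",
              page_height="11in", page_width="8.5in", margin="1in")
    fo("region-body", page)
    seq = fo("page-sequence", root, master_reference="letter")
    flow = fo("flow", seq, flow_name="xsl-region-body")
    # Every presentation decision lives here, not in the source data.
    title = fo("block", flow, font_size="18pt", font_weight="bold", space_after="12pt")
    title.text = data.findtext("title")
    for para in data.iter("para"):
        fo("block", flow, font_size="11pt", space_after="6pt").text = para.text
    return root


SAMPLE = ("<article><title>PDF and XML</title>"
          "<para>First paragraph.</para><para>Second paragraph.</para></article>")
doc = to_fo(SAMPLE)
print(ET.tostring(doc, encoding="unicode"))
```

Writing the tree out with `ET.ElementTree(doc).write("article.fo", encoding="utf-8", xml_declaration=True)` would produce a file that could then be handed to FOP; consult FOP's documentation for the exact command line.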

OmniPDF

Finally, while you're playing with PDF, be sure to check out OmniPDF if you are running Mac OS X. It's a very cool PDF viewer. It's still under development, but it's Cocoa-native (and hence Mac OS X-native), and it really shows off the power of Quartz, as it uses Core Graphics Rendering to do its magic. (OmniPDF is from the Omni Group, who also created OmniWeb, which is currently the only Cocoa-native web browser available. You should check it out as well: it's a refreshing alternative, and it has many fun features that set it apart from your usual browser choices.)

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.