Oct 98 Online
Volume Number: 14 (1998)
Issue Number: 10
Column Tag: MacTech Online
Getting It In Print
by Jeff Clites, email@example.com
Not too many years ago a document was, almost by definition, something which lived on paper. Although the print-based heritage remains, today we think of digital documents just as often. Interestingly, at the same time the distinction between a document and the information it contains has blurred: where it was once clear (if possibly arbitrary) where one document ended and another began, that boundary is now much harder to define. This month we're going to focus on the concept of the document, from several different perspectives.
PostScript and the Mac OS have had an interesting relationship, and one that is still evolving. Apple's LaserWriters were among the first PostScript printers to hit the market, and they helped spark the desktop publishing revolution. The first generation of Apple's new operating system, OS X Server (formerly Rhapsody), will inherit NeXT's use of Display PostScript as its imaging model, later to be replaced by Apple's own resolution-independent graphics subsystem, Extended QuickDraw. Even so, PostScript is likely to remain the lingua franca of high-end printing for some time to come, and as a developer it helps to have at least a passing acquaintance with the language and its concepts. Under the current Mac OS it is almost essential if you want to print rotated text, for instance.
In fact, PostScript is a programming language, and a PostScript document is really a program: one that happens to focus on producing graphics on a page (hence the term page-description language). The official home of PostScript is, of course, Adobe's site, where programmers wishing to learn the language can find a wealth of technotes, along with a bibliography and links to other resources on the web. Beginners can start at A First Guide to PostScript, which includes an operator reference; you can also download an archive containing the entire site for easy off-line viewing. The next place to stop is the RightBrain Software site, where the author of Thinking in PostScript has generously provided a PDF version of the now out-of-print book for free download. And no developer should be without Ghostscript, a free interpreter that lets you view PostScript files, print them to non-PostScript printers, or convert them to PDF.
- Adobe's Technical Notes for Developers
- A First Guide to PostScript
- RightBrain Software
- Ghostscript, Ghostview and Gsview
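Because a PostScript document is simply a program, generating one from your own code can be as easy as writing out text. The following Python sketch emits a minimal program that prints rotated text, the case mentioned above. The function name and its defaults (position, Helvetica at 24 points) are illustrative assumptions, not part of any Adobe or Apple API. The technique is standard PostScript: save the graphics state with `gsave`, move the origin with `translate`, turn the coordinate system with `rotate`, draw, then undo it all with `grestore`.

```python
def rotated_text_ps(text: str, angle: float, x: float = 300, y: float = 400,
                    font: str = "Helvetica", size: float = 24) -> str:
    """Return a minimal PostScript program that shows `text` rotated by `angle` degrees.

    Hypothetical helper for illustration; names and defaults are our own.
    """
    # Backslashes and parentheses are special inside a PostScript (...) string literal.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    lines = [
        "%!PS",
        f"/{font} findfont {size} scalefont setfont",
        "gsave",                 # save the current graphics state
        f"{x} {y} translate",    # move the origin to the text's position
        f"{angle} rotate",       # rotate the coordinate system counterclockwise
        "0 0 moveto",            # current point = the new (rotated) origin
        f"({escaped}) show",     # paint the string along the rotated baseline
        "grestore",              # restore the unrotated coordinate system
        "showpage",              # emit the page
    ]
    return "\n".join(lines) + "\n"
```

Writing the result of `rotated_text_ps("Hello, LaserWriter", 45)` to a `.ps` file gives something you can preview in Ghostscript or send straight to a PostScript printer.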
Adobe's Portable Document Format (PDF) has become the standard for platform-independent digital documentation that prints beautifully and is also easy to view on-screen. Like PostScript, PDF is a final format rather than a revisable one: it is intended for a finished document, not for a file that will be edited further. (It is possible to modify both kinds of document after they are generated, but this is easier with PDF than with PostScript.) While PostScript documents are really programs, PDFs are more naturally thought of as object databases. Apple has indicated that PDF will play an important role as a graphics format in OS X, so developers may soon need some familiarity with its inner workings.
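To make the "object database" description concrete, here is a Python sketch that writes a one-page PDF by hand. The function name is a hypothetical illustration, not an Adobe API, and the file is deliberately bare: no compression, no embedded fonts, none of the extras real PDF producers add. Each numbered object (catalog, page tree, page, content stream, font) is a record; objects point to one another with references such as `2 0 R`; and the cross-reference (xref) table at the end records the byte offset of every object, so a viewer can fetch any object directly instead of executing the file from top to bottom as a PostScript interpreter would.

```python
def minimal_pdf(text: str) -> bytes:
    """Build a one-page PDF showing `text` (ASCII/Latin-1 only), to illustrate PDF structure."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # The page's content stream: begin text, select font F1 at 24 pt, position, show, end.
    content = f"BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")
    # Objects 1..5; they refer to each other with indirect references like "2 0 R".
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))          # byte offset where "num 0 obj" begins
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    # Cross-reference table: one fixed-width 20-byte entry per object number.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"       # object 0 is always the free-list head
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos   # tells readers where the xref starts
    return bytes(out)
```

A reader starts at the end of the file, follows `startxref` to the table, and from there can jump to the catalog (`/Root 1 0 R`) and on to any page.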
For those interested in looking under the hood of PDF now, there are a number of independent web sites devoted to the topic, and Adobe's site has the freely available PDF specification and links to other sources of information. The two best sites to get you started are Acrobuddies and the PDF Research Companion. They have a number of documents which explain the structure of PDF files, as well as how to use them or produce them from within an application.
- Adobe PDF
- PDF Research Companion
TeX is a typesetting system invented by Donald Knuth. In his words, it is "intended for the creation of beautiful books - and especially for books that contain a lot of mathematics." It is widely used today to produce technical publications, most often using the LaTeX macro package. In brief, an author prepares a plain text document in the TeX language, and later compiles it into final form (for instance, as PostScript).
TeX lessens much of the hassle of preparing large documents: not only can you quickly set complicated equations, but it is also very easy to build an index or to reformat for different paper sizes or fonts, for example. Its chief strength may be that it gives you fine control over the appearance of your document when you want it, and handles things automatically when you don't. There are also numerous add-on packages that extend the language, for instance allowing you to create complex graphics or display chemical structures.
TeX is somewhat complex to learn, but many swear by it. It does show its Unix heritage, and the chief hurdle for a Macintosh user may be getting used to a TeX system which usually consists of several different applications and associated files rather than a single application. The place to start is unquestionably the Mac TeX/LaTeX Software Page, with links and descriptions of most of the TeX resources available to Mac users. Among other things, this site has current information on the two most popular shareware packages, CMacTeX and OzTeX, and recommendations on references for beginners.
- Macintosh TeX/LaTeX Software Page
- TeX Frequently Asked Questions
The Standard Generalized Markup Language (SGML) takes the abstraction of a document to its final step, completely separating content from presentation. As in HTML, stretches of plain text are tagged with markup enclosed in angle brackets, but the idea is to label the logical structure of the document rather than its appearance: content rather than form. How an SGML document is displayed, if it is displayed at all, is decided separately. (Note that HTML 4.0 is explicitly defined as an SGML application.)
The upshot of this rigid separation is that the same document can be retargeted to different methods of output (or different uses) without modification. So, the exact same source can be used to produce an HTML version of a document for publication on the web, a PDF version for distribution on CD, a TeX version for producing printed documentation, and a plain-text version for posting to a newsgroup. In some cases, the target may not be formatted output at all: the markup may be used to tag information for entry into a database.
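To see what that separation buys in practice, the following Python sketch keeps one marked-up source and renders it two ways, as HTML and as plain text. It uses XML rather than full SGML, because XML (a simplified subset of SGML) can be parsed with Python's standard `xml.etree.ElementTree` module, whereas full SGML generally needs a DTD-aware parser. The element names `doc`, `title`, and `para` and the helper functions are invented for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document: tags name logical parts, not fonts or sizes.
SOURCE = """<doc>
  <title>Getting It In Print</title>
  <para>PostScript is a page-description language.</para>
  <para>SGML separates content from presentation.</para>
</doc>"""

def text_of(elem: ET.Element) -> str:
    """All character data inside an element, including any nested markup."""
    return "".join(elem.itertext()).strip()

def to_html(root: ET.Element) -> str:
    """One presentation: map logical elements onto HTML tags."""
    out = [f"<h1>{escape(text_of(root.find('title')))}</h1>"]
    out += [f"<p>{escape(text_of(p))}</p>" for p in root.iter("para")]
    return "\n".join(out)

def to_text(root: ET.Element) -> str:
    """Another presentation of the same source: plain text for a newsgroup post."""
    title = text_of(root.find("title"))
    out = [title.upper(), "=" * len(title), ""]
    out += [text_of(p) for p in root.iter("para")]
    return "\n".join(out)

if __name__ == "__main__":
    root = ET.fromstring(SOURCE)
    print(to_html(root))
    print()
    print(to_text(root))
```

Adding another output format means writing one more rendering function; the source document itself never changes.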
ArborText has an excellent introduction to SGML and much of its surrounding terminology, and A Gentle Introduction to SGML provides a more detailed description. XML for Managers describes XML, a subset of SGML developed for deployment on the web. To delve deeper, the SGML/XML Web Page is the definitive source of information on the web. There are many books on SGML technology (see the last-mentioned site for a large bibliography), but Parseme.1st: SGML for Software Developers by Sean McGrath (ISBN: 0134889673) is a clear, well-illustrated introduction aimed at developers.
- SGML: Getting Started
- A Gentle Introduction to SGML
- XML for Managers
- The SGML/XML Web Page
Finally, I want to mention the Simple Document Format (SDF). It is simpler and less general than SGML; its goal is to provide an author-friendly markup language for producing documents that can be output in multiple formats from the same source. (Supported output formats include HTML, PostScript, PDF, LaTeX, SGML, POD, RTF, and plain text.) It is Perl-based, and the author has indicated that MacPerl-specific instructions are forthcoming. If your goal is a simple way to produce documentation with a minimum of duplicated effort, this may be a good place to start.
- SDF - The Author-friendly Markup Language
These and oodles of other links are available from the MacTech Online web pages at www.mactech.com/online/.