TweetFollow Us on Twitter

PDFLib

Volume Number: 15 (1999)
Issue Number: 12
Column Tag: Programming Techniques

Lasso 3.5

by Kas Thomas

A great freeware library makes adding PDF support to an app easy

Adobe's Portable Document Format (PDF) has become a de facto standard for electronic document interchange, based on its ability to deliver graphically rich, structured content in a consistent manner across multiple operating environments. Almost every large web site offers at least some PDF-based content, making the Acrobat Reader one of the most popular downloads on the web. (Incredibly, Adobe claims to average some 100,000 downloads of the Reader from its web site per day.)

Because of its support for vector graphics, font embedding, hypertext links, and other advanced features, PDF is a powerful, far-reaching document standard. But that also means it's a relatively complex standard (for details, see the September 1999 MacTech) - and therefore far from trivial to support in an application.

From a programming standpoint, one can talk about two types of PDF support: support for PDF reading (import), and support for PDF writing (export). As with TIFF, QuickTime, and many other complex formats, it's much easier to provide write support than read support, because a comprehensive PDF-read capability means implementing the entire rather ponderous PDF specification (see http://partners.adobe.com/asn/developer/PDFS/TN/PDFSPEC.PDF), whereas a write-only facility may mean implementing only a tiny subset of the PDF spec - the subset of particular interest to your application. For example, if your application primarily outputs ASCII text, there is no need to implement graphics-embedding, halftoning, transfer functions, etc., in order to support PDF output.

Adding a well-defined PDF-output capability to an application can be surprisingly quick and easy, if you make full use of existing tools. For this article, I decided to add PDF export capability to BBEdit (the popular text editor), with the aid of a third-party freeware PDF library called PDFLib. Source code for the BBEdit plug-in accompanies this article. (The complete CW Pro 5 project, including PDFLib and its source files, can be found online at ftp://www.mactech.com.) But before we start talking code, let's take a moment to review the basics of the PDF format, then look at what kinds of development paths one might take to arrive at a PDF-export capability, and what sorts of tools are currently available to make the programmer's life easier.

PDF Fundamentals

Adobe's Portable Document Format is a kind of gigantic, special-purpose markup language, based largely on Postscript (the postfix-notation page description language) but lacking Postscript's control-flow constructs. PDF is a sort of "unrolled" version of Postscript, in which all graphics operations are inline (rather than relying on loops) and therefore speedy. Lookups and indexing operations are likewise fast because of PDF's extensive use of associative arrays (or "dictionaries," in Adobe parlance), organized into treelike structures in which all nodes have forward and/or back-pointers to other nodes; plus, every leaf (of every kind) has an entry in a giant 'xref' table, so that the offset of any object can be looked up instantly.

Pages are organized into sets of objects that describe a page's resources and content. The objects are human-readable ASCII and look like:

4 0 obj
<</Type /Page
/Parent 1 0 R
/Resources 8 0 R
/MediaBox [0 0 612 792]
/Contents [5 0 R ]
>>
endobj

In this case, the top line tells us we're dealing with Object No. 4, revision zero. The object is a dictionary object, as indicated by the double angle brackets, << and >>, enclosing the object. The first entry in the dictionary is a label telling the type of dictionary (in this case, a Page). The next label/value pair is a backpointer to the parent of this object, namely Object No. 1. (A reference ending in 'R', such as 1 0 R, is a pointer to an object.) The next entry tells where the page's resources can be found (namely, in Object No. 8.) The MediaBox entry gives the page's dimensions, in points (72 points to the inch); here, 612 by 792 means that we're dealing with a standard U.S. Letter-size page (8.5 by 11 inches). The final entry, in the above example, shows where the page's Contents (probably a stream object) can be found, namely in Object No. 5.

When text needs to be displayed on a page, it is packaged inside a stream object. The stream object will contain the actual ASCII or Unicode strings that need to be displayed, along with various Postscript-like operators, such as m for "moveto" and TL for "set leading," that control the stroking, filling, and positioning of the individual letters or glyphs.

When all of the objects in a PDF file have been written, a cross-reference ('xref') table must be inserted into the file. The entries in the 'xref' table must conform to a fixed format (see my article in MacTech for September 1999) and they must contain the exact byte offset from the start of the file to the object referenced by the entry in question. The integrity of the PDF document rests on the accuracy of the byte offsets stored in the 'xref' table. Since most of these offsets aren't known until the objects are written, the 'xref' table usually goes at the end of the file. (This isn't always the case, however. So-called "linearized" or "optimized" PDF files have an 'xref' table at the beginning of the file.)

Several things should be obvious by now. First, there is nothing freeform about a PDF file. Unlike HTML, a PDF file is highly structured, with many pointers between objects. Byte offsets matter a great deal and must be accounted for when the file is written. Secondly, PDF files are largely self-contained, bringing with them their own font resources and embedded graphics (rather than linking external resources). Thirdly, to write a PDF file means lots of string manipulations - something that, frankly, ANSI C is a little weak at (compared to, say, Perl). Beyond that, the PDF specification itself (currently contained in a 518-page, 160,000-word document) can be difficult to read and interpret. Supporting PDF export in an application written in C can be a bit tedious, to say the least.

Third-Party Libraries

It helps, in a situation like this, to be able to call on help from third parties, rather than reinvent the wheel yourself. Fortunately, some excellent tools are available to make your life easier. Among the general-purpose libraries are available for adding PDF handling capabilities to applications are:

  1. Adobe's PDF Library, also known as PDFL40; for use with Code Warrior on the Mac, Visual C++ 5.0 on Windows platforms, and gcc 2.8 on Sparc Solaris.
  2. The CLibPDF Library, by FastIO Systems (http://www.fastio.com); an ANSI C library, compilable on just about any platform.
  3. PDFLib, by Thomas Merz (http://www.pdflib.com); a C library, with bindings for C++, Java, Perl, Python, Tcl, and Visual BASIC.

If you're a Perl user, you'll want to check out PDF-on-the-Fly, a Perl library available from the University of Nottingham (http://www.ep.cs.nott.ac.uk/pdf-pl/download/manual.pdf), as well as txt2pdf, a library from Sanface Software (sanface@sanface.com).

Adobe's PDFL40 is without question the most powerful and robust library available, relying as it does on the Acrobat 4.0 codebase. With PDFL40, you can read, display, and write PDF from your own application. But unfortunately, PDFL40 isn't free - and even if you can afford the licensing fees, you may not be allowed to use the library. As stated in Adobe's literature, PDFL40 is selectively licensed to developers who are creating "products that are strategic to Adobe's marketing plans." In other words, Adobe will review your development plans carefully, and if they like what you're doing and if you agree to play by Adobe's rules, you may be allowed to pay to use the library.

Outside of Adobe, the two best-known C libraries for PDF support are CLibPDF, by FastIO Systems, and Thomas Merz's PDFLib. Both come with full source code and can be used without restriction (or virtually without restriction) by individual developers who are creating freeware or personal-use software. (Corporate users and commercial developers must take out a license, at significant cost.) The main restriction of these libraries is that they support PDF output only. They will not help you read PDF or display a PDF document on the screen. The same is true for the two Perl libraries: PDF-on-the-Fly and txt2pdf are basically write-only. If you need to put PDF up on the screen, you'll probably want to look into an open-source program called Ghostscript (http://www.cs.wisc.edu/~ghost/index.html), which started as a freeware PostScript interpreter, written in 1988 by L. Peter Deutsch, founder of Aladdin Systems. Starting with version 3.3, Ghostscript has been able to read and display PDF files in addition to PostScript documents. With version 4.0, Ghostscript added Postscript-to-PDF conversion (i.e., Distiller functionality). Because the code is generic C, Ghostscript has been successfully ported to most platforms, including Win32, OS/2, MacOS, Unix, Amiga, VAX, etc. (An excellent PDF-based manual for Ghostscript is available from Thomas Merz; see http://www.muc.de/~tm.)

CLibPDF and PDFLib are similar in their capabilities. Their differences are summed up in Table 1. Both are extremely easy to set up and use. Of the two, CLibPDF is the more advanced package in terms of the number of features and overall performance. CLibPDF has roughly 170 library routines to PDFLib's 88. Many of CLibPDF's routines provide advanced graphics capabilities involving setting up Cartesian coordinate axes (linear or logarithmic) and plotting data (including data stored in external files). CLibPDF was designed to make it easy for people who need to generate 2D plots to create attractive graphs on-the-fly in PDF, without passing the data through an intermediary application such as Matlab. In this, it excels.

PDFLib ClibPDF
Full source code available? Yes Yes
Documentation 64 pp. 75 pp.
API calls, total 88 170
Image formats supported Gif,Tiff.JPEG,CCITT JPEG
Font metrics formats AFM/PFA PFM/PFB
Thread safe? Yes Yes
Bindings for scripting languages? Yes No
Font embedding? Yes Yes
Font subsetting? No No
Compression? No Flate only
Text-justification option? No Yes
Vector graphics functions? Yes Yes
Custom graph plotting? No Yes
Annotations? Yes Yes
Bookmarks? Yes Yes
Hypertext links? Yes Yes
Form widgets? No No
Reenter pages after writing? No Yes

Table 1. Comparison of PDFLib and CLibPDF.

CLibPDF is also the clear winner in terms of benchmark scores. In a test (conducted by a corporate user) involving the construction of an intricate 156-page document filled with engineering information, CLibPDF produced a 257,027-byte PDF file in just 15 seconds. By comparison, Adobe's Distiller took over three minutes to produce a 197,548-byte file; Adobe's PDF Library took 54 seconds to create a 284,365-byte file; and PDFLib took 84 seconds to yield a 1,314,084-byte finished document. The filesize disparity is due to the fact that PDFLib uses no text compression, whereas the others do. (Adobe uses a combination of LZW and Flate compression. To avoid patent infringement issues, CLibPDF uses only Flate compression.)

Wherefore PDFLib?

Why would anybody use PDFLib? For one thing, it's the only library that comes with ready-made bindings for Perl, Python, Tcl, Java, and (on Win32) Visual BASIC. This is incredibly important if you're a web developer who needs to be able to serve dynamic PDF - PDF pages generated automatically, on the fly - for web clients. Dynamic PDF pages (via Perl, say) are easily possible using PDFLib. All you have to do is link the PDFLib shared library with the PerlStub file (which is part of the MacPerl distribution suite) and follow the calling conventions given in Thomas Merz's excellent documentation (which has example code listings for all the different bindings).

But what about the big file sizes? you ask. It's true that, as of yet, PDFLib does not have any compression support - for text. For imagery, PDFLib supports JPEG, GIF, TIFF and CCITT bitmaps, all of which are compressed. (Acrobat Reader handles the decompression automatically.) CLibPDF, on the other hand, only handles JPEG embedding, unless you pay the license fee ($1,000), in which case you can get TIFF support (among other features).

PDFLib's lack of text compression can result in big files if you're mainly outputting big gobs of text. But if you will be serving dynamic PDF web pages (or creating other fairly small text files), you won't suffer for not having compression, since small text streams often compress poorly - or even grow, rather than shrink - at pack-down time.

It turns out PDFLib is ideal for generating small to medium-sized text-based PDF documents, because - unlike Adobe's own products - PDFLib won't automatically embed fonts or font subsets for any of the standard 14 core Type 1 fonts that are included with Acrobat Reader (the Helvetica, Times, and Courier families, plus Zapf Dingbats and Symbol). This can be important, because although a small PDF file may or may not shrink significantly with compression turned on, it will definitely grow when fonts are embedded unnecessarily.

Another reason to use PDFLib is that it's nominally smaller and easier to learn than CLibPDF (although the latter is by no means hard to work with). And should you later need to port your code to a scripting language, you can reuse your code with very little work.

Adding PDF Export to BBEdit

BBEdit (by Bare Bones Software) is one of the most popular ASCII editors on the Mac. Features like regex-based (regular expression) search-and-replace, robust HTML tools, and neck-snapping performance have endeared BBEdit to thousands of loyal users. But when it comes to producing eyepleasing output, BBEdit isn't exactly a killer app. Wouldn't it be nice to be able to save BBEdit documents as PDF files now and then? PDF is easier to look at (and print out) than raw ASCII, any day.

It's not hard to add PDF export to BBEdit, because like so many software products these days, BBEdit supports a plug-in API that allows third-party programmers access to the main program's data. The BBEdit plug-in API is well documented and has hooks to many utility functions for retrieving the text from documents, manipulating user selections, etc. Space doesn't permit a full tutorial on writing BBEdit plug-ins here. However, we will have space to run through the 200 or so lines of C required for a short plug-in that lets the user save an open BBEdit document as a PDF file.

The Code

The BBEdit plug-in interface requires that we compile an old-fashioned Code Resource of type 'BBXT' and creator 'R*ch'. (The creator type can be anything you want, but if you stay with 'R*ch', your plug-in will have the icon associated with BBEdit extensions.) Note that the name of your 'BBXT' resource (not the filename of your plug-in) is the name that will appear in the BBEdit "Tools" menu at runtime.

The main() routine for our PDF-Output plug-in, shown in Listing 1, is typical of most BBEdit extensions. It shows that our resource is called with a pointer to a BBEdit structure called the ExternalCallbackBlock; a WindowPtr associated with the frontmost user window; a long int containing various flag values to convey information about the state in which BBEdit is in; and pointers to AppleEvents. All we do in main() is call EnterCodeResource(), check our flags (and the WindowPtr, for validity), then call bbxtGetWindowContents() - which retrieves a Handle to all the text in the frontmost (active) document - before handing the text off to our filtering routine. When we're done, we call ExitCodeResource() and that's all she wrote. Easy as pi.


Listing 1: main( )

main( )
pascal OSErr main(ExternalCallbackBlock *callbacks, 
			WindowPtr w, 
			long flags, 
			AppleEvent *event, AppleEvent *reply)
{
	OSErr	err = noErr;

	EnterCodeResource();

	{
		Handle text;
		WindowPtr newWindow;

		if (!w || (xfWindowOpen & flags == 0) 
			return err;

		text = bbxtGetWindowContents(callbacks,w);

		err = pdfTranslate(callbacks,text,w); // write pdf

	} 

	ExitCodeResource();

	return err;
}

We don't actually do anything with the AppleEvent pointers in this example. In a real plug-in, these pointers would be the mechanism by which your plug-in could be controlled through OS-level scripts. Most of the time, though, these pointers will be nil. In all versions of BBEdit Lite, for example, the pointers are always nil.

The real heavy lifting occurs in Listing 2, where our PDFLib routines get called. Before using any other PDFLib routines, we call PDF_new() to initialize the library. (This results in a number of large data structures being allocated and filled out for us, behind the scenes. The principal data structure is something called, appropriately, a PDF. A pointer to this data structure must be passed to every library routine so that PDFLib can keep track of the PDF document's state.) At the end of the routine, before exiting, we call PDF_close() to close the connection to the PDFLib library, freeing all resources that were allocated earlier.


Listing 2: pdfTranslate( )

pdfTranslate( )

OSErr pdfTranslate( ExternalCallbackBlock *callbacks, 
			Handle theText,
			WindowPtr w ) {


	PDF *p = nil;
	int font,j;
	long i,linecount,textLength;
	OSErr err = noErr;
	Boolean timeForNewPage = false; // sentinel
	char *input,
			filename[32],
			buf[TAB_VALUE *CHARS_WIDE];
	unsigned char *out = buf;
	char okLineEnders[] = "- ;:>";

	p = PDF_new();  
	if (p == nil) return -1;

	HLockHi(theText);

	FudgeName(callbacks,filename,w); // create outfile name

	// open the new PDF file 
	if (PDF_open_file(p,(char *)filename )==-1){
		fprintf(stderr,"Error:cannot open temp.pdf file.\n");
		exit(2);
	}

  // these lines are optional:
	PDF_set_info(p,"Creator","BBEdit PDF Exporter plug-in");
	PDF_set_info(p,"Author","Kas Thomas");
	PDF_set_info(p,"Title","Hello world!");

	PDF_begin_page(p,letter_width,letter_height); // start a page

	// find a base-14 font
	font = PDF_findfont(p,"Times-Roman","default",0);
		if (font ==-1){
		fprintf(stderr,"Couldn't set font!\n");
		HUnlock(theText);
		exit(3);
	}

	PDF_setfont(p,font,FONTSIZE); // set font & size
	PDF_set_leading(p, LEADING);  // set line spacing 

	PDF_set_text_pos(p,TEXT_STARTX,TEXT_STARTY);

	PDF_show(p," ");

	input = *(unsigned char **)theText;

	textLength = GetHandleSize(theText); // how long is our text?

   // for every character...
	for (i = 0,linecount = 1; i < textLength - 1 ; )
	{
	  // fetch the current line...
	 	for (j = 0, out = buf; 
				 j < CHARS_WIDE - 1 && i < textLength - 1;
				 j++) 
	 		{	 		
	 			*out++ = input[i++];

				if (input[i-1] == TAB) { // we must handle Tabs ourselves
					int k;

					for (k = 0; k < TAB_VALUE; k++)
				  	*out++ = SPACE;
				  }

				if (input[i-1] == CARRIAGE_RETURN)   // break on CR
					goto TerminateLine;
			}

		// get to next word ending
		while (strchr(okLineEnders,input[i])==NULL)
			*out++ = input[i++];

		TerminateLine:

		*out = 0x00; // make it a C string

		PDF_continue_text(p,buf); // write to PDF file

		if (linecount++ % LINES_PER_PAGE == 0) { // end of page?
			PDF_end_page(p);
			timeForNewPage = true;
			}

		if (timeForNewPage && i < textLength - 1) { // more to do
			timeForNewPage = false;
			PDF_begin_page(p,letter_width,letter_height); // new page
			PDF_setfont(p,font,FONTSIZE);
			PDF_set_leading(p, LEADING);
			PDF_set_text_pos(p,TEXT_STARTX,TEXT_STARTY);
			PDF_show(p," ");
			}

	}   // for i

	PDF_end_page(p);	// close page
	PDF_close(p);		// close PDF obj

	HUnlock(theText);

	return err;
}

The PDFLib routine PDF_open_file() will create a new file for us (in the current directory) if we pass it a pointer to a PDF struct along with a pointer to a filename string. Note that the filename string must be a C string. We create the necessary string (consisting of the original file's name, plus the extension ".pdf") in a custom utility routine, FudgeName(). See Listing 3.

After creating our (empty) output file, we make three calls to PDF_set_info(), to set the file's Creator, Author, and Title. These strings will show up when the user does a Get Info on the PDF document while viewing it in Acrobat Reader. It is not strictly necessary to call PDF_set_info(), since PDF files are not required to have "Get Info" info; but PDFLib makes creating these tags easy. (Again, though, note the use of C strings rather than Pascal strings.)


Listing 3: FudgeName()

FudgeName( )
// Get the current BBEdit file's name, add ".pdf" to it, put it in 'str' as a C string.

void FudgeName(ExternalCallbackBlock *cb, 
					unsigned char *str, WindowPtr w) 
{
 	Str255 fName;
	short v;
	long d;
	long length;
	char ending[] = { '.','p','d','f', 0x00 };

	bbxtGetDocInfo(cb,w,fName,&v,&d); 
	length = *fName;	// Pascal string

   // now we create a C string:
	BlockMove(fName+1,str, length);
	BlockMove(ending,str+length,5);	
}

To begin a PDF page, we call - what else? - PDF_begin_page() with, in this case, the predefined values letter_width and letter_height, which correspond to the dimensions of a standard U.S. letter-sized page. (PDFLib also has predefined constants for A4, legal, and many other page sizes. Or you can use your own custom dimensions.)

Next, we come to one of the most important calls in this or any routine that uses the PDFLib library. Namely, we do:

 font = PDF_findfont(p,"Times-Roman","default",0);

The purpose of this call is to locate font resources for our document and specify an encoding for the font. (Here, we're guaranteed to get a valid return value, since Times-Roman is one of the base-14 fonts that Acrobat Reader can always use.) Allowable encoding values are built-in, pdfdoc, macroman, macexpert, winansi or default (see Section 3.4.2 of Thomas Merz's excellent PDFLib manual). In our case, we're content to let PDFLib determine the most suitable encoding based on the environment, so we indicate this by setting the third argument to "default." (The encoding must be specified as a C string.)

The return value from PDF_findfont(), if not equal to -1 (an error), will be needed in subsequent calls involving typesetting parameters, such as PDF_setfont() and PDF_set_leading(). It's important to understand that the value returned by PDF_findfont() is not an enumerated value or an index into a fixed lookup table. Rather, it's an index into the font cache of one particular PDF document. If you're working with two documents, one may store Times-Roman in its cache at a different index than the other; hence, PDF_findfont() may return two different values for the same font, based on the font's use in two different files. Don't just assume that if PDF_findfont() returns '1' for Helvetica, that therefore Helvetica will always be referenced by a font value of '1'. It may only be '1' for one file, in one particular context.

Having gotten a valid return value from PDF_findfont(), we use that value in a call to PDF_setfont(), which attaches the font resource to the PDF file and also lets us specify the point size of the font. The point size can be any floating-point value: 24.0 for a small headline, say, or 10.0 to 12.0 for regular body copy, etc. (Fractional values like 13.4 are fine, too.) We can similarly set the line spacing with PDF_set_leading(). Typically, the leading is close in value to the point size of the text. If you specify the leading as 1.2 times the point size, you won't go far wrong. (For double-spaced text, try 3.0 or 4.0 times the point size.)

The library function PDF_set_text_pos() lets us position our "pen" or insertion point at any x-y position on the page. Here, you have to remember that in the PDF coordinate space, (0,0) corresponds to the lower left corner of the page, with 'y' increasing in the up direction. Also, recall that in the PDF world, the default unit of space is the typesetter's point, which is 1/72-inch. Thus, if you want to begin writing at a distance of one inch from the left edge of the page and ten inches up from the bottom, you would specify coordinates of (72, 720).

To write text on a PDF page, you can either make repeated calls to PDF_set_text_pos() and PDF_show(), specifying new line-start coordinates every time, or else make one call to PDF_show(), followed by repeated calls to PDF_continue_text(). The latter function automatically repositions the insertion point to the start of a new line, using the left-margin and leading parameters that you've already specified. This can be more convenient than keeping track of line depths yourself. To keep our main loop from having to be a do-while loop, we make a dummy call to PDF_show() with a value of "  " before entering the loop. Then, inside the loop, we just make repeated calls to PDF_continue_text().

The Main Loop

Our main loop, which is actually a double nested loop, deserves comment. The outer loop counts individual characters and makes sure that we loop over all the characters in the source document, stopping only when we've gotten to the end of the file. The inner loop fetches one line of text at a time, writing to a line buffer, 'buf', which is conservatively sized at a fixed size of TAB_VALUE * CHARS_WIDE. In a real application, you'd determine CHARS_WIDE dynamically, based (perhaps) on the point size of the text or some other metric. For this short demo, we've hard-coded the type size at 9.0 points and the line width, CHARS_WIDE, at 80 via #defines. The reason our line buffer has to be sized at TAB_VALUE * CHARS_WIDE is that it's conceivable that we could encounter a pathological line of "text" where every character is a Tab. If a Tab is equal to five spaces, our line buffer had better be 400 bytes in capacity rather than just 80, or else we'll overwrite the buffer.

Inside the inner loop, as we gather characters into a "line" of text, we have to handle Tabs ourselves, converting ASCII 0x09 (the Tab character - which is a non-printing ASCII value) to spaces. We also check for end-of-line characters ourselves. In true Mac-centric manner, we ignore linefeeds and consider every newline to be equal to ASCII 0x0D (carriage return). Of course, text files created on a Unix machine won't conform to this assumption, since in the Unix world newlines tend to be ASCII 0x0A (linefeed). In the DOS and Windows worlds, lines end with both a linefeed and a carriage return: 0x0D0A.

Our inner loop is constructed in such a way that when the number of characters read equals CHARS_WIDE, we bail out and write the line to the PDF file, but in addition, we bail out any time a hard return (carriage return) is encountered. This lets us handle both traditional Mac text files (in which lines are soft-wrapped to the screen, with carriage returns coming only once per paragraph) as well as DOS-style documents in which every single line (not just the paragraph) ends with a hard return.

The fact that there are two ways to fall out of the inner loop has interesting consequences. Obviously, if we encounter a hard return, there's no question about what to do: we immediately write the line out to the file. But if we fall out of the main loop because our line has begun to exceed CHARS_WIDE characters, it's possible (likely, in fact) that we've bailed out in the middle of a word! Hence, we have to insert some contingency code to read to the end of the current word. The code that does this looks like:

		// get to next word ending
		while (strchr(okLineEnders,input[i])==NULL)
			*out++ = input[i++];

The standard C function strchr() checks to see if the second argument (a character) occurs anywhere in the first argument (a string). It returns NULL on a miss and non-NULL on a match.

If we fall out of the loop because of a hard return, we don't need the above code. Therefore we can skip around it with (ugh) a goto. There are probably better ways (stylistically) to handle this situation, but in the interest of clarity, I decided to keep the goto, for now.

Once we're out of the loop, we have to remember to make our line a C string (i.e., we must null-terminate it); then we can call PDF_continue_text(p,buf) to write the line. All that remains is to check the number of lines written, to see if it's time for a new page, and if so, start a new page. Here, it's important to note that every call to PDF_begin_page() results in PDFLib resetting its graphic state, which means we need to specify our type size, leading, and cursor-position values all over again. If you forget to do this, you'll be wondering where all the text went on the second and subsequent pages of your PDF document.

When we're done, we call PDF_close(), unlock our text handle, and return to the calling routine. Using PDF_close() actually not only frees up our library-invoked resources but also closes any working files we've left open. So at this point, we can consider our work done, and control can return to the host process, in this case BBEdit.

Enhancements

In a real-world BBEdit extension, it would be a good idea not only to get serious about error-checking but also consider such things as a user preferences dialog and support for Apple Events (which should include a mechanism for suppressing dialogs, so that scripted operations aren't hung up in midstream by unattended dialogs). Also, the main loop should be wrapped with the BBEdit API's bbxtStartProgress() and bbxtDoneProgress() calls, and the inner loop should contain one call to bbxtDoProgress() for every line of text processed, so that the user knows how things are progressing. BBEdit will display a progress thermometer automatically, suppressing it for short-duration events, if you use these calls.

BBEdit's plug-in API also has some handy convenience routines for dealing with Apple Events. For example, consider what you can do with the following three lines:

	bbxtFindApplication(cb,'CARO', &appFSS);
	err = bbxtLaunchApplication(cb,'CARO',&appFSS,&psn);
	bbxtSendOpenDoc(cb,'CARO', nil, &fss,true);

With the arguments shown, the first call has the effect of searching the BBEdit default disk for the application whose signature is 'CARO' - namely, Acrobat Reader. The second function launches that application, and the third function sends it an 'odoc' event, instructing the app to open the document specified by the FSSpec pointed to by &fss. In other words, with three lines of code you can make BBEdit launch Acrobat Reader and display your just-created PDF file in a Reader window. To accomplish this with our own custom-written code (properly error-checked) would require at least 200 lines of additional code, doubling the size of our plug-in!

In terms of the PDF-writing portions of the code, there are many possible further enhancements. For example, it would be nice to let the user specify page margins, text size, leading, etc. by means of a setup dialog. Also, you could try justifying the user's text. PDFLib offers functions for controlling character spacing, word spacing, and character widths, with accuracy of a thousandth of an em. (An em is a typesetter's unit, roughly equivalent to the point size of the type.) You can use the PDFLib routine PDF_stringwidth() to find out how wide a given text string is. As an exercise, you might try developing a justification routine that preferentially adjusts word spacing, followed by character spacing, followed by character width, each with its own weighting factor. (For some interesting algorithms here, seek out Don Lancaster's excellent article on "Picojustification" at http://www.tinaja.com/glib/picojust.pdf.)

A PDF-outputting BBEdit plug-in that incorporates some of these features (and others, such as rudimentary HTML tag interpretation) can be found at http://www.acroforms.com.

Conclusion

Thomas Merz's PDFLib library offers an excellent way to get started in PDF programming, combining ease of use with cross-platform and even cross-language portability. It's the only PDF library that can easily be adapted for use with Perl, Python, Tcl, Visual BASIC, and Java, as well as C/C++. It comes with outstanding documentation, plenty of sample code (for all language bindings), and the price - for non-commercial users - can't be beat, since it's free.

Look at it this way: Now you don't have any excuse for not putting PDF support in your applications!


Kas Thomas is a frequent contributor to MacTech and author of a forthcoming O'Reilly book on PDF-based web programming. You can reach him at kt@acroforms.com.

 
AAPL
$102.06
Apple Inc.
+0.48
MSFT
$46.58
Microsoft Corpora
+0.06
GOOG
$588.08
Google Inc.
+3.31

MacTech Search:
Community Search:

Software Updates via MacUpdate

Delivery Status 6.1.2 - Check delivery s...
Delivery Status displays delivery status of packages for a variety of shipment services. Can't wait for your packages to arrive? Don't waste your time checking the site constantly, just open this all... Read more
Mavericks Cache Cleaner 8.0.9 - Clear ca...
Mavericks Cache Cleaner is an award-winning general purpose tool for OS X. MCC makes system maintenance simple with an easy point-and-click interface to many OS X functions. Novice and expert users... Read more
OneNote 15.2.2 - Free digital notebook f...
OneNote is your very own digital notebook. With OneNote, you can capture that flash of genius, that moment of inspiration, or that list of errands that's too important to forget. Whether you're at... Read more
Apple Configurator 1.6 - Configure and d...
Apple Configurator makes it easy for anyone to mass configure and deploy iPhone, iPad, and iPod touch in a school, business, or institution. Three simple workflows let you prepare new iOS devices... Read more
SpamSieve 2.9.16 - Robust spam filter fo...
SpamSieve is a robust spam filter for major email clients that uses powerful Bayesian spam filtering. SpamSieve understands what your spam looks like in order to block it all, but also learns what... Read more
OS X Server 3.2.1 - For OS X 10.9.5 Mave...
OS X Server is the next generation of Apple's award winning server software. Designed for OS X and iOS devices, OS X Server makes it easy to share files, schedule meetings, synchronize contacts, host... Read more
Apple Security Update 2014-004 - For OS...
Apple Security Update is recommended for all users and improves the security of Mac OS X. For information on the security content of this update, please visit this website: http://support.apple.com/... Read more
OS X Mavericks 10.9.5 - The latest versi...
Apple OS X Mavericks is the latest release of the world's most advanced desktop operating system. Now free! With more than 200 new features, OS X Mavericks brings Maps and iBooks to the Mac,... Read more
iExplorer 3.5.0.0 - View and transfer al...
iExplorer is an iPhone browser for Mac lets you view the files on your iOS device. By using a drag and drop interface, you can quickly copy files and folders between your Mac and your iPhone or... Read more
BusyCal 2.6 - Powerful calendar app with...
BusyCal is an award-winning desktop calendar that combines personal productivity features for individuals with powerful calendar sharing capabilities for families and workgroups. BusyCal's unique... Read more

Latest Forum Discussions

See All

TwistedRun Review
TwistedRun Review By Rob Thomas on September 18th, 2014 Our Rating: :: DON'T TWIST YOUR ANKLE!Universal App - Designed for iPhone and iPad TwistedRun is kind of like running up a giant curly fry into the sky. Or maybe that was just... | Read more »
Scope Review
Scope Review By Jennifer Allen on September 18th, 2014 Our Rating: :: LOCATION AWAREiPhone App - Designed for the iPhone, compatible with the iPad Want to easily find photos from around the world based on their location? Scope is a... | Read more »
HipstaFox Review
HipstaFox Review By Jordan Minor on September 18th, 2014 Our Rating: :: FANTASTIC MR. FOXUniversal App - Designed for iPhone and iPad HipstaFox is a great single that makes players long for the whole album.   | Read more »
Ninja Raft (Games)
Ninja Raft 1.0 Device: iOS Universal Category: Games Price: $.99, Version: 1.0 (iTunes) Description: ** Special Launch Price ** "Ninja Raft is definitely the game to play if you’re into Tower Defense games and want to play something... | Read more »
Paste+ | Clipboard Action Widget (Produ...
Paste+ | Clipboard Action Widget 1.0.0 Device: iOS iPhone Category: Productivity Price: $3.99, Version: 1.0.0 (iTunes) Description: Powerful clipboard widget for iOS 8. Reimagine what you can do with your clipboard! Paste+ is a... | Read more »
Agenda+ | Calendar & Reminder Widget...
Agenda+ | Calendar & Reminder Widget 1.0.0 Device: iOS iPhone Category: Productivity Price: $1.99, Version: 1.0.0 (iTunes) Description: Best Calendar and Reminder Widget Ever for iOS 8 | Read more »
Leg·end·ar·y (Games)
Leg·end·ar·y 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: Simple and yet complex. Leg•end•dar•y is a grid-based puzzle game based on numbers. Discover the adventure of Leg•end•dar•y and... | Read more »
KuaiBoard (formerly QuickBoard) (Utilit...
KuaiBoard (formerly QuickBoard) 1.0 Device: iOS Universal Category: Utilities Price: $1.99, Version: 1.0 (iTunes) Description: KuaiBoard is currently 50% off for launch! Billing info. Signatures. Locations. KuaiBoard allows you to... | Read more »
Treasure Fetch - Adventure Time (Games)
Treasure Fetch - Adventure Time 1.0.1 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0.1 (iTunes) Description: Adventure Time Treasure Fetch is a fresh take on the classic Snake game! STRETCHY JAKE Snake your way... | Read more »
Light in the Dark (Games)
Light in the Dark 1.1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.1.0 (iTunes) Description: Be enlightened by this delightful puzzle game that has never before seen the light of day! "A physics and color-blending... | Read more »

Price Scanner via MacPrices.net

Save up to $300 on the price of a new Mac wit...
Purchase a new Mac or iPad at The Apple Store for Education and take up to $300 off MSRP. All teachers, students, and staff of any educational institution qualify for the discount. Shipping is free,... Read more
13-inch 2.8GHz Retina MacBook Pro available f...
B&H Photo has the new 2014 13″ 2.8GHz Retina MacBook Pro on sale for $1699.99 including free shipping plus NY sales tax only. They’ll also include free copies of Parallels Desktop and LoJack for... Read more
16GB iPad Air on sale for $449, save $50
Walmart has the 16GB iPad Air WiFi on sale for $449 on their online store for a limited time. Choose free home shipping or free local store pickup. Their price represents a $50 savings over standard... Read more
13-inch 256GB MacBook Air on sale for $1099,...
B&H Photo has the 2014 13″ 1.4GHz 256GB MacBook Air on sale for $1099.99. Shipping is free, and B&H charges NY sales tax only. Their price is $100 off MSRP. Read more
Toshiba Introduces TransMemory ID High-Speed...
Toshiba’s Digital Products Division (DPD), a division of Toshiba America Information Systems, Inc., today introduced the TransMemory ID USB 3.0 Flash Drive, a simpler storage solution for people who... Read more
New iPads and OS X Yosemite Release Coming Oc...
The DailyDot’s Micah Singleton reports that Apple is planning to hold its next product announcement event on Oct. 21, at which it will unveil the iPad Air 2 and iPad mini 3 and release a final build... Read more
Logitech Bluetooth Multi-Device Cross-Platfor...
Logitech has an enviable track record of making some of the best computer keyboards and mice. At least in my estimation, the best freestanding keyboards I’ve ever used have been Logitech units,... Read more
Roundup of Apple refurbished iPad Airs and iP...
Apple is offering Certified Refurbished iPad Airs for up to $140 off MSRP. Apple’s one-year warranty is included with each model, and shipping is free. Stock tends to come and go with some of these... Read more
Sprint offers 16GB iPad mini for $199.99 with...
Sprint is offering 1st generation 16GB iPad minis for $199.99 with a 2-year service agreement. Standard MSRP for this iPad is $429. Their price is the lowest available for this model. Read more
2.5GHz Mac mini remains on sale for $549, sav...
B&H Photo has the 2.5GHz Mac mini on sale for $549.99 including free shipping. That’s $50 off MSRP, and B&H will also include a free copy of Parallels Desktop software. NY sales tax only. Read more

Jobs Board

Project Manager, *Apple* Financial Services...
**Job Summary** Apple Financial Services (AFS) offers consumers, businesses and educational institutions ways to finance Apple purchases. We work with national and Read more
*Apple* Retail - Multiple Positions (US) - A...
Sales Specialist - Retail Customer Service and Sales Transform Apple Store visitors into loyal Apple customers. When customers enter the store, you're also the Read more
*Apple* Retail - Multiple Positions (US) - A...
Sales Specialist - Retail Customer Service and Sales Transform Apple Store visitors into loyal Apple customers. When customers enter the store, you're also the Read more
*Apple* Retail - Multiple Positions (US) - A...
Sales Specialist - Retail Customer Service and Sales Transform Apple Store visitors into loyal Apple customers. When customers enter the store, you're also the Read more
*Apple* Retail - Multiple Positions (US) - A...
Sales Specialist - Retail Customer Service and Sales Transform Apple Store visitors into loyal Apple customers. When customers enter the store, you're also the Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.