
PDFLib

Volume Number: 15 (1999)
Issue Number: 12
Column Tag: Programming Techniques


by Kas Thomas

A great freeware library makes adding PDF support to an app easy

Adobe's Portable Document Format (PDF) has become a de facto standard for electronic document interchange, based on its ability to deliver graphically rich, structured content in a consistent manner across multiple operating environments. Almost every large web site offers at least some PDF-based content, making the Acrobat Reader one of the most popular downloads on the web. (Incredibly, Adobe claims to average some 100,000 downloads of the Reader from its web site per day.)

Because of its support for vector graphics, font embedding, hypertext links, and other advanced features, PDF is a powerful, far-reaching document standard. But that also means it's a relatively complex standard (for details, see the September 1999 MacTech) - and therefore far from trivial to support in an application.

From a programming standpoint, one can talk about two types of PDF support: support for PDF reading (import), and support for PDF writing (export). As with TIFF, QuickTime, and many other complex formats, it's much easier to provide write support than read support, because a comprehensive PDF-read capability means implementing the entire rather ponderous PDF specification (see http://partners.adobe.com/asn/developer/PDFS/TN/PDFSPEC.PDF), whereas a write-only facility may mean implementing only a tiny subset of the PDF spec - the subset of particular interest to your application. For example, if your application primarily outputs ASCII text, there is no need to implement graphics-embedding, halftoning, transfer functions, etc., in order to support PDF output.

Adding a well-defined PDF-output capability to an application can be surprisingly quick and easy, if you make full use of existing tools. For this article, I decided to add PDF export capability to BBEdit (the popular text editor), with the aid of a third-party freeware PDF library called PDFLib. Source code for the BBEdit plug-in accompanies this article. (The complete CW Pro 5 project, including PDFLib and its source files, can be found online at ftp://www.mactech.com.) But before we start talking code, let's take a moment to review the basics of the PDF format, then look at what kinds of development paths one might take to arrive at a PDF-export capability, and what sorts of tools are currently available to make the programmer's life easier.

PDF Fundamentals

Adobe's Portable Document Format is a kind of gigantic, special-purpose markup language, based largely on PostScript (the postfix-notation page description language) but lacking PostScript's control-flow constructs. PDF is a sort of "unrolled" version of PostScript, in which all graphics operations are inline (rather than relying on loops) and therefore speedy. Lookups and indexing operations are likewise fast because of PDF's extensive use of associative arrays (or "dictionaries," in Adobe parlance), organized into treelike structures in which all nodes have forward and/or back-pointers to other nodes; plus, every object (of every kind) has an entry in a giant 'xref' table, so that the offset of any object can be looked up instantly.

Pages are organized into sets of objects that describe a page's resources and content. The objects are human-readable ASCII and look like:

4 0 obj
<</Type /Page
/Parent 1 0 R
/Resources 8 0 R
/MediaBox [0 0 612 792]
/Contents [5 0 R ]
>>
endobj

In this case, the top line tells us we're dealing with Object No. 4, generation (revision) zero. The object is a dictionary object, as indicated by the double angle brackets, << and >>, enclosing the object. The first entry in the dictionary is a label telling the type of dictionary (in this case, a Page). The next label/value pair is a backpointer to the parent of this object, namely Object No. 1. (A reference ending in 'R', such as 1 0 R, is a pointer to an object.) The next entry tells where the page's resources can be found (namely, in Object No. 8). The MediaBox entry gives the page's dimensions, in points (72 points to the inch); here, 612 by 792 means that we're dealing with a standard U.S. Letter-size page (8.5 by 11 inches). The final entry, in the above example, shows where the page's Contents (probably a stream object) can be found, namely in Object No. 5.
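To give a feel for how little machinery an object like this needs, here is a minimal sketch in plain C that formats a Page dictionary of the kind shown above into a caller-supplied buffer. The function name format_page_object() and its parameters are invented for this illustration; they are not part of PDFLib or any other library.

```c
#include <stdio.h>

/* Hypothetical helper (not a PDFLib call): format a Page object like the
   one shown above. Returns the number of bytes written, not counting the
   terminating NUL, or -1 if the buffer is too small. */
int format_page_object(char *buf, size_t cap,
                       int objnum, int parent, int resources, int contents,
                       int width, int height)
{
    int n = snprintf(buf, cap,
        "%d 0 obj\n"
        "<</Type /Page\n"
        "/Parent %d 0 R\n"
        "/Resources %d 0 R\n"
        "/MediaBox [0 0 %d %d]\n"
        "/Contents [%d 0 R ]\n"
        ">>\n"
        "endobj\n",
        objnum, parent, resources, width, height, contents);

    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

Calling format_page_object(buf, sizeof buf, 4, 1, 8, 5, 612, 792) reproduces the example object. The byte count it returns is the kind of figure a writer would add to a running file offset.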

When text needs to be displayed on a page, it is packaged inside a stream object. The stream object will contain the actual ASCII or Unicode strings that need to be displayed, along with various PostScript-like operators, such as m for "moveto" and TL for "set leading," that control the stroking, filling, and positioning of the individual letters or glyphs.
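Strings inside a content stream are normally written between parentheses, so any parenthesis or backslash in the text itself has to be escaped with a backslash. The sketch below shows one way to do that in plain C. The helper name pdf_literal_string() is made up for this example; a library such as PDFLib takes care of this step for you.

```c
#include <stddef.h>

/* Hypothetical helper: wrap 'in' in parentheses as a PDF literal string,
   escaping the three characters that are special inside one.
   Returns the length written (excluding the NUL), or -1 if 'out' is too small. */
int pdf_literal_string(const char *in, char *out, size_t cap)
{
    size_t n = 0;

    if (cap < 3)                      /* need room for "()" plus NUL */
        return -1;
    out[n++] = '(';
    for (; *in != '\0'; in++) {
        int special = (*in == '(' || *in == ')' || *in == '\\');
        /* keep room for this character, the closing ')' and the NUL */
        if (n + (special ? 2 : 1) + 2 > cap)
            return -1;
        if (special)
            out[n++] = '\\';
        out[n++] = *in;
    }
    out[n++] = ')';
    out[n] = '\0';
    return (int)n;
}
```

The result can be followed by a text-showing operator in the stream, as in (Hello) Tj.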

When all of the objects in a PDF file have been written, a cross-reference ('xref') table must be inserted into the file. The entries in the 'xref' table must conform to a fixed format (see my article in MacTech for September 1999) and they must contain the exact byte offset from the start of the file to the object referenced by the entry in question. The integrity of the PDF document rests on the accuracy of the byte offsets stored in the 'xref' table. Since most of these offsets aren't known until the objects are written, the 'xref' table usually goes at the end of the file. (This isn't always the case, however. So-called "linearized" or "optimized" PDF files have an 'xref' table at the beginning of the file.)
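As a small illustration of that fixed format: each entry in a classic 'xref' table is exactly 20 bytes long, made up of a 10-digit byte offset, a 5-digit generation number, the letter n (in use) or f (free), and a two-character line ending. The following sketch formats one such entry. The function name is hypothetical, and in a real writer the offset would typically be captured (with ftell(), for example) just before each object is written.

```c
#include <stdio.h>

/* Hypothetical helper: format one 20-byte 'xref' entry into 'out',
   which must hold at least 21 bytes (entry plus NUL).
   Returns 20 on success, -1 on bad input or a too-small buffer. */
int format_xref_entry(char *out, size_t cap,
                      long offset, int generation, int in_use)
{
    int n;

    if (offset < 0 || generation < 0 || generation > 65535)
        return -1;
    n = snprintf(out, cap, "%010ld %05d %c \n",
                 offset, generation, in_use ? 'n' : 'f');
    /* anything other than 20 bytes means the offset had too many digits */
    return (n == 20 && (size_t)n < cap) ? n : -1;
}
```

An object that starts 17 bytes into the file would thus be recorded as "0000000017 00000 n" followed by a space and a newline.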

Several things should be obvious by now. First, there is nothing freeform about a PDF file. Unlike HTML, a PDF file is highly structured, with many pointers between objects. Byte offsets matter a great deal and must be accounted for when the file is written. Secondly, PDF files are largely self-contained, bringing with them their own font resources and embedded graphics (rather than linking external resources). Thirdly, to write a PDF file means lots of string manipulations - something that, frankly, ANSI C is a little weak at (compared to, say, Perl). Beyond that, the PDF specification itself (currently contained in a 518-page, 160,000-word document) can be difficult to read and interpret. Supporting PDF export in an application written in C can be a bit tedious, to say the least.

Third-Party Libraries

It helps, in a situation like this, to be able to call on help from third parties, rather than reinvent the wheel yourself. Fortunately, some excellent tools are available to make your life easier. Among the general-purpose libraries available for adding PDF-handling capabilities to applications are:

  1. Adobe's PDF Library, also known as PDFL40; for use with CodeWarrior on the Mac, Visual C++ 5.0 on Windows platforms, and gcc 2.8 on Sparc Solaris.
  2. The CLibPDF Library, by FastIO Systems (http://www.fastio.com); an ANSI C library, compilable on just about any platform.
  3. PDFLib, by Thomas Merz (http://www.pdflib.com); a C library, with bindings for C++, Java, Perl, Python, Tcl, and Visual BASIC.

If you're a Perl user, you'll want to check out PDF-on-the-Fly, a Perl library available from the University of Nottingham (http://www.ep.cs.nott.ac.uk/pdf-pl/download/manual.pdf), as well as txt2pdf, a library from Sanface Software (sanface@sanface.com).

Adobe's PDFL40 is without question the most powerful and robust library available, relying as it does on the Acrobat 4.0 codebase. With PDFL40, you can read, display, and write PDF from your own application. But unfortunately, PDFL40 isn't free - and even if you can afford the licensing fees, you may not be allowed to use the library. As stated in Adobe's literature, PDFL40 is selectively licensed to developers who are creating "products that are strategic to Adobe's marketing plans." In other words, Adobe will review your development plans carefully, and if they like what you're doing and if you agree to play by Adobe's rules, you may be allowed to pay to use the library.

Outside of Adobe, the two best-known C libraries for PDF support are CLibPDF, by FastIO Systems, and Thomas Merz's PDFLib. Both come with full source code and can be used without restriction (or virtually without restriction) by individual developers who are creating freeware or personal-use software. (Corporate users and commercial developers must take out a license, at significant cost.) The main restriction of these libraries is that they support PDF output only. They will not help you read PDF or display a PDF document on the screen. The same is true for the two Perl libraries: PDF-on-the-Fly and txt2pdf are basically write-only.

If you need to put PDF up on the screen, you'll probably want to look into an open-source program called Ghostscript (http://www.cs.wisc.edu/~ghost/index.html), which started as a freeware PostScript interpreter, written in 1988 by L. Peter Deutsch, founder of Aladdin Enterprises. Starting with version 3.3, Ghostscript has been able to read and display PDF files in addition to PostScript documents. With version 4.0, Ghostscript added PostScript-to-PDF conversion (i.e., Distiller functionality). Because the code is generic C, Ghostscript has been successfully ported to most platforms, including Win32, OS/2, MacOS, Unix, Amiga, VAX, etc. (An excellent PDF-based manual for Ghostscript is available from Thomas Merz; see http://www.muc.de/~tm.)

CLibPDF and PDFLib are similar in their capabilities. Their differences are summed up in Table 1. Both are extremely easy to set up and use. Of the two, CLibPDF is the more advanced package in terms of the number of features and overall performance. CLibPDF has roughly 170 library routines to PDFLib's 88. Many of CLibPDF's routines provide advanced graphics capabilities involving setting up Cartesian coordinate axes (linear or logarithmic) and plotting data (including data stored in external files). CLibPDF was designed to make it easy for people who need to generate 2D plots to create attractive graphs on-the-fly in PDF, without passing the data through an intermediary application such as Matlab. In this, it excels.

Feature                              PDFLib                  CLibPDF
Full source code available?          Yes                     Yes
Documentation                        64 pp.                  75 pp.
API calls, total                     88                      170
Image formats supported              GIF, TIFF, JPEG, CCITT  JPEG
Font metrics formats                 AFM/PFA                 PFM/PFB
Thread safe?                         Yes                     Yes
Bindings for scripting languages?    Yes                     No
Font embedding?                      Yes                     Yes
Font subsetting?                     No                      No
Compression?                         No                      Flate only
Text-justification option?           No                      Yes
Vector graphics functions?           Yes                     Yes
Custom graph plotting?               No                      Yes
Annotations?                         Yes                     Yes
Bookmarks?                           Yes                     Yes
Hypertext links?                     Yes                     Yes
Form widgets?                        No                      No
Reenter pages after writing?         No                      Yes

Table 1. Comparison of PDFLib and CLibPDF.

CLibPDF is also the clear winner in terms of benchmark scores. In a test (conducted by a corporate user) involving the construction of an intricate 156-page document filled with engineering information, CLibPDF produced a 257,027-byte PDF file in just 15 seconds. By comparison, Adobe's Distiller took over three minutes to produce a 197,548-byte file; Adobe's PDF Library took 54 seconds to create a 284,365-byte file; and PDFLib took 84 seconds to yield a 1,314,084-byte finished document. The filesize disparity is due to the fact that PDFLib uses no text compression, whereas the others do. (Adobe uses a combination of LZW and Flate compression. To avoid patent infringement issues, CLibPDF uses only Flate compression.)

Wherefore PDFLib?

Why would anybody use PDFLib? For one thing, it's the only library that comes with ready-made bindings for Perl, Python, Tcl, Java, and (on Win32) Visual BASIC. This is incredibly important if you're a web developer who needs to be able to serve dynamic PDF - PDF pages generated automatically, on the fly - for web clients. Dynamic PDF pages (via Perl, say) are easily possible using PDFLib. All you have to do is link the PDFLib shared library with the PerlStub file (which is part of the MacPerl distribution suite) and follow the calling conventions given in Thomas Merz's excellent documentation (which has example code listings for all the different bindings).

But what about the big file sizes? you ask. It's true that, as of yet, PDFLib does not have any compression support - for text. For imagery, PDFLib supports JPEG, GIF, TIFF and CCITT bitmaps, all of which are compressed. (Acrobat Reader handles the decompression automatically.) CLibPDF, on the other hand, only handles JPEG embedding, unless you pay the license fee ($1,000), in which case you can get TIFF support (among other features).

PDFLib's lack of text compression can result in big files if you're mainly outputting big gobs of text. But if you will be serving dynamic PDF web pages (or creating other fairly small text files), you won't suffer for not having compression, since small text streams often compress poorly - or even grow, rather than shrink - at pack-down time.

It turns out PDFLib is ideal for generating small to medium-sized text-based PDF documents, because - unlike Adobe's own products - PDFLib won't automatically embed fonts or font subsets for any of the standard 14 core Type 1 fonts that are included with Acrobat Reader (the Helvetica, Times, and Courier families, plus Zapf Dingbats and Symbol). This can be important, because although a small PDF file may or may not shrink significantly with compression turned on, it will definitely grow when fonts are embedded unnecessarily.

Another reason to use PDFLib is that it's nominally smaller and easier to learn than CLibPDF (although the latter is by no means hard to work with). And should you later need to port your code to a scripting language, you can reuse your code with very little work.

Adding PDF Export to BBEdit

BBEdit (by Bare Bones Software) is one of the most popular ASCII editors on the Mac. Features like regex-based (regular expression) search-and-replace, robust HTML tools, and neck-snapping performance have endeared BBEdit to thousands of loyal users. But when it comes to producing eye-pleasing output, BBEdit isn't exactly a killer app. Wouldn't it be nice to be able to save BBEdit documents as PDF files now and then? PDF is easier to look at (and print out) than raw ASCII, any day.

It's not hard to add PDF export to BBEdit, because like so many software products these days, BBEdit supports a plug-in API that allows third-party programmers access to the main program's data. The BBEdit plug-in API is well documented and has hooks to many utility functions for retrieving the text from documents, manipulating user selections, etc. Space doesn't permit a full tutorial on writing BBEdit plug-ins here. However, we will have space to run through the 200 or so lines of C required for a short plug-in that lets the user save an open BBEdit document as a PDF file.

The Code

The BBEdit plug-in interface requires that we compile an old-fashioned Code Resource of type 'BBXT' and creator 'R*ch'. (The creator type can be anything you want, but if you stay with 'R*ch', your plug-in will have the icon associated with BBEdit extensions.) Note that the name of your 'BBXT' resource (not the filename of your plug-in) is the name that will appear in the BBEdit "Tools" menu at runtime.

The main() routine for our PDF-Output plug-in, shown in Listing 1, is typical of most BBEdit extensions. It shows that our resource is called with a pointer to a BBEdit structure called the ExternalCallbackBlock; a WindowPtr associated with the frontmost user window; a long int containing various flag values that convey information about the state BBEdit is in; and pointers to AppleEvents. All we do in main() is call EnterCodeResource(), check our flags (and the WindowPtr, for validity), then call bbxtGetWindowContents() - which retrieves a Handle to all the text in the frontmost (active) document - before handing the text off to our filtering routine. When we're done, we call ExitCodeResource() and that's all she wrote. Easy as pi.


Listing 1: main( )

main( )
pascal OSErr main(ExternalCallbackBlock *callbacks, 
			WindowPtr w, 
			long flags, 
			AppleEvent *event, AppleEvent *reply)
{
	OSErr	err = noErr;

	EnterCodeResource();

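	{
		Handle text;

		// Proceed only if we have a window and it is open.
		// Note the parentheses: == binds more tightly than &.
		// Testing this way (rather than returning early) also
		// guarantees that ExitCodeResource() is always reached.
		if (w && (flags & xfWindowOpen) != 0) {
			text = bbxtGetWindowContents(callbacks,w);

			err = pdfTranslate(callbacks,text,w); // write pdf
		}

	} 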

	ExitCodeResource();

	return err;
}

We don't actually do anything with the AppleEvent pointers in this example. In a real plug-in, these pointers would be the mechanism by which your plug-in could be controlled through OS-level scripts. Most of the time, though, these pointers will be nil. In all versions of BBEdit Lite, for example, the pointers are always nil.
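Returning for a moment to the flag test in main(): when checking a single bit in a flags word, it pays to be careful about C's operator precedence, because == binds more tightly than &. The stand-alone sketch below shows the safe form. The mask value DEMO_WINDOW_OPEN is a made-up stand-in; the real xfWindowOpen constant comes from the BBEdit plug-in headers.

```c
/* Hypothetical bit value, for illustration only. The real xfWindowOpen
   constant is defined in the BBEdit plug-in headers. */
#define DEMO_WINDOW_OPEN 0x00000001L

/* Returns 1 if the "window open" bit is set in 'flags', else 0.
   Without the parentheses, "flags & DEMO_WINDOW_OPEN == 0" would be
   parsed as "flags & (DEMO_WINDOW_OPEN == 0)", which tests the wrong thing. */
int window_is_open(long flags)
{
    return (flags & DEMO_WINDOW_OPEN) != 0;
}
```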

The real heavy lifting occurs in Listing 2, where our PDFLib routines get called. Before using any other PDFLib routines, we call PDF_new() to initialize the library. (This results in a number of large data structures being allocated and filled out for us, behind the scenes. The principal data structure is something called, appropriately, a PDF. A pointer to this data structure must be passed to every library routine so that PDFLib can keep track of the PDF document's state.) At the end of the routine, before exiting, we call PDF_close() to close the connection to the PDFLib library, freeing all resources that were allocated earlier.


Listing 2: pdfTranslate( )

pdfTranslate( )

OSErr pdfTranslate( ExternalCallbackBlock *callbacks, 
			Handle theText,
			WindowPtr w ) {


	PDF *p = nil;
	int font,j;
	long i,linecount,textLength;
	OSErr err = noErr;
	Boolean timeForNewPage = false; // sentinel
	char *input,
			filename[64],	// room for a 31-char Mac file name + ".pdf" + NUL
			buf[TAB_VALUE *CHARS_WIDE];
	char *out = buf;
	char okLineEnders[] = "- ;:>";

	p = PDF_new();  
	if (p == nil) return -1;

	HLockHi(theText);

	FudgeName(callbacks,(unsigned char *)filename,w); // create outfile name

	// open the new PDF file 
	if (PDF_open_file(p,filename)==-1){
		HUnlock(theText);	// a plug-in must not call exit(); report failure instead
		return -1;
	}

  // these lines are optional:
	PDF_set_info(p,"Creator","BBEdit PDF Exporter plug-in");
	PDF_set_info(p,"Author","Kas Thomas");
	PDF_set_info(p,"Title","Hello world!");

	PDF_begin_page(p,letter_width,letter_height); // start a page

	// find a base-14 font
	font = PDF_findfont(p,"Times-Roman","default",0);
	if (font == -1) {
		PDF_end_page(p);	// a plug-in must not call exit(); clean up and report failure
		PDF_close(p);
		HUnlock(theText);
		return -1;
	}

	PDF_setfont(p,font,FONTSIZE); // set font & size
	PDF_set_leading(p, LEADING);  // set line spacing 

	PDF_set_text_pos(p,TEXT_STARTX,TEXT_STARTY);

	PDF_show(p," ");

	input = (char *)*theText;	// dereference the (locked) Handle

	textLength = GetHandleSize(theText); // how long is our text?

   // for every character...
	for (i = 0,linecount = 1; i < textLength - 1 ; )
	{
	  // fetch the current line...
	 	for (j = 0, out = buf; 
				 j < CHARS_WIDE - 1 && i < textLength - 1;
				 j++) 
	 		{
	 			char c = input[i++];

				if (c == CARRIAGE_RETURN)   // break on CR (without copying it)
					goto TerminateLine;

				if (c == TAB) { // we must handle Tabs ourselves
					int k;

					for (k = 0; k < TAB_VALUE; k++)
				  	*out++ = SPACE;
				  }
				else
					*out++ = c;
			}

		// get to next word ending, without running past the text or the buffer
		while (i < textLength - 1
				&& out < buf + sizeof(buf) - 1
				&& input[i] != CARRIAGE_RETURN
				&& strchr(okLineEnders,input[i])==NULL)
			*out++ = input[i++];

		TerminateLine:

		*out = 0x00; // make it a C string

		PDF_continue_text(p,buf); // write to PDF file

		if (linecount++ % LINES_PER_PAGE == 0) { // end of page?
			PDF_end_page(p);
			timeForNewPage = true;
			}

		if (timeForNewPage && i < textLength - 1) { // more to do
			timeForNewPage = false;
			PDF_begin_page(p,letter_width,letter_height); // new page
			PDF_setfont(p,font,FONTSIZE);
			PDF_set_leading(p, LEADING);
			PDF_set_text_pos(p,TEXT_STARTX,TEXT_STARTY);
			PDF_show(p," ");
			}

	}   // for i

	PDF_end_page(p);	// close page
	PDF_close(p);		// close PDF obj

	HUnlock(theText);

	return err;
}

The PDFLib routine PDF_open_file() will create a new file for us (in the current directory) if we pass it a pointer to a PDF struct along with a pointer to a filename string. Note that the filename string must be a C string. We create the necessary string (consisting of the original file's name, plus the extension ".pdf") in a custom utility routine, FudgeName(). See Listing 3.

After creating our (empty) output file, we make three calls to PDF_set_info(), to set the file's Creator, Author, and Title. These strings will show up when the user does a Get Info on the PDF document while viewing it in Acrobat Reader. It is not strictly necessary to call PDF_set_info(), since PDF files are not required to have "Get Info" info; but PDFLib makes creating these tags easy. (Again, though, note the use of C strings rather than Pascal strings.)


Listing 3: FudgeName()

FudgeName( )
// Get the current BBEdit file's name, add ".pdf" to it, put it in 'str' as a C string.

void FudgeName(ExternalCallbackBlock *cb, 
					unsigned char *str, WindowPtr w) 
{
 	Str255 fName;
	short v;
	long d;
	long length;
	char ending[] = { '.','p','d','f', 0x00 };

	bbxtGetDocInfo(cb,w,fName,&v,&d); 
	length = *fName;	// Pascal string

   // now we create a C string:
	BlockMove(fName+1,str, length);
	BlockMove(ending,str+length,5);	
}
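FudgeName() trusts its caller to supply a big enough buffer. For comparison, here is a bounds-checked sketch of the same Pascal-to-C conversion in portable C. The function name and signature are invented for this example, and memcpy() stands in for the Toolbox's BlockMove().

```c
#include <string.h>

/* Hypothetical, bounds-checked variant of the idea in FudgeName():
   copy a Pascal string (length byte first) into 'out' as a C string
   and append 'ext'. Returns 0 on success, -1 if 'out' is too small. */
int pascal_to_c_with_ext(const unsigned char *pstr, const char *ext,
                         char *out, size_t cap)
{
    size_t len = pstr[0];                 /* first byte holds the length */
    size_t extlen = strlen(ext);

    if (len + extlen + 1 > cap)           /* +1 for the terminating NUL */
        return -1;
    memcpy(out, pstr + 1, len);
    memcpy(out + len, ext, extlen + 1);   /* copies the NUL as well */
    return 0;
}
```

Since a Mac file name can run to 31 characters, the output buffer needs at least 36 bytes to hold the longest possible name plus ".pdf" and the terminator.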

To begin a PDF page, we call - what else? - PDF_begin_page() with, in this case, the predefined values letter_width and letter_height, which correspond to the dimensions of a standard U.S. letter-sized page. (PDFLib also has predefined constants for A4, legal, and many other page sizes. Or you can use your own custom dimensions.)

Next, we come to one of the most important calls in this or any routine that uses the PDFLib library. Namely, we do:

 font = PDF_findfont(p,"Times-Roman","default",0);

The purpose of this call is to locate font resources for our document and specify an encoding for the font. (Here, we're guaranteed to get a valid return value, since Times-Roman is one of the base-14 fonts that Acrobat Reader can always use.) Allowable encoding values are built-in, pdfdoc, macroman, macexpert, winansi or default (see Section 3.4.2 of Thomas Merz's excellent PDFLib manual). In our case, we're content to let PDFLib determine the most suitable encoding based on the environment, so we indicate this by setting the third argument to "default." (The encoding must be specified as a C string.)

The return value from PDF_findfont(), if not equal to -1 (an error), will be needed in subsequent calls that select the font, such as PDF_setfont(). It's important to understand that the value returned by PDF_findfont() is not an enumerated value or an index into a fixed lookup table. Rather, it's an index into the font cache of one particular PDF document. If you're working with two documents, one may store Times-Roman in its cache at a different index than the other; hence, PDF_findfont() may return two different values for the same font, based on the font's use in two different files. Don't just assume that if PDF_findfont() returns '1' for Helvetica, that therefore Helvetica will always be referenced by a font value of '1'. It may only be '1' for one file, in one particular context.

Having gotten a valid return value from PDF_findfont(), we use that value in a call to PDF_setfont(), which attaches the font resource to the PDF file and also lets us specify the point size of the font. The point size can be any floating-point value: 24.0 for a small headline, say, or 10.0 to 12.0 for regular body copy, etc. (Fractional values like 13.4 are fine, too.) We can similarly set the line spacing with PDF_set_leading(). Typically, the leading is close in value to the point size of the text. If you specify the leading as 1.2 times the point size, you won't go far wrong. (For double-spaced text, try roughly 2.4 times the point size.)
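The arithmetic behind these rules of thumb is simple enough to sketch in a few lines of C. Neither function below belongs to PDFLib; the names are invented for illustration, and the 1.2 factor is just the rule of thumb mentioned above.

```c
/* Rule of thumb from the text: leading of about 1.2 times the point size. */
double default_leading(double point_size)
{
    return 1.2 * point_size;
}

/* How many lines fit between the top and bottom margins.
   All arguments are in points. Returns 0 for nonsensical input. */
int lines_that_fit(double page_height, double top_margin,
                   double bottom_margin, double leading)
{
    double usable = page_height - top_margin - bottom_margin;

    if (leading <= 0.0 || usable <= 0.0)
        return 0;
    return (int)(usable / leading);   /* truncates toward zero */
}
```

On a Letter page (792 points tall) with one-inch margins top and bottom, 648 points remain for text, so 11-point leading leaves room for 58 lines.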

The library function PDF_set_text_pos() lets us position our "pen" or insertion point at any x-y position on the page. Here, you have to remember that in the PDF coordinate space, (0,0) corresponds to the lower left corner of the page, with 'y' increasing in the up direction. Also, recall that in the PDF world, the default unit of space is the typesetter's point, which is 1/72-inch. Thus, if you want to begin writing at a distance of one inch from the left edge of the page and ten inches up from the bottom, you would specify coordinates of (72, 720).
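If you're used to thinking of page positions as measured down from the top edge (as in QuickDraw), a pair of tiny conversion helpers can save some head-scratching. This is only a sketch; the names are made up and nothing here is a PDFLib call.

```c
#define POINTS_PER_INCH 72.0

/* Convert inches to points (the default PDF unit). */
double inches_to_points(double inches)
{
    return inches * POINTS_PER_INCH;
}

/* Convert a distance measured down from the top edge of the page into a
   PDF y coordinate, which is measured up from the bottom edge. */
double y_from_top(double page_height, double distance_from_top)
{
    return page_height - distance_from_top;
}
```

For a Letter page, y_from_top(792.0, inches_to_points(1.0)) gives 720.0, the same y value as in the example above.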

To write text on a PDF page, you can either make repeated calls to PDF_set_text_pos() and PDF_show(), specifying new line-start coordinates every time, or else make one call to PDF_show(), followed by repeated calls to PDF_continue_text(). The latter function automatically repositions the insertion point to the start of a new line, using the left-margin and leading parameters that you've already specified. This can be more convenient than keeping track of line depths yourself. To keep our main loop from having to be a do-while loop, we make a dummy call to PDF_show() with a value of " " before entering the loop. Then, inside the loop, we just make repeated calls to PDF_continue_text().

The Main Loop

Our main loop, which is actually a doubly nested loop, deserves comment. The outer loop counts individual characters and makes sure that we loop over all the characters in the source document, stopping only when we've gotten to the end of the file. The inner loop fetches one line of text at a time, writing to a line buffer, 'buf', which is conservatively sized at TAB_VALUE * CHARS_WIDE bytes. In a real application, you'd determine CHARS_WIDE dynamically, based (perhaps) on the point size of the text or some other metric. For this short demo, we've hard-coded the type size at 9.0 points and the line width, CHARS_WIDE, at 80 via #defines. The reason our line buffer has to be sized at TAB_VALUE * CHARS_WIDE is that it's conceivable that we could encounter a pathological line of "text" where every character is a Tab. If a Tab is equal to five spaces, our line buffer had better be 400 bytes in capacity rather than just 80, or else we'll overwrite the buffer.
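Sizing the buffer for the worst case is one defense; another is to check the remaining space as you go. The following sketch shows a stand-alone tab expander that stops rather than overrun its output buffer. The function expand_tabs() is hypothetical and is not the code used in the plug-in.

```c
#include <stddef.h>

/* Hypothetical helper: copy 'len' bytes of 'in' to 'out', replacing each
   Tab with 'tab_width' spaces. Stops early rather than overflow 'out'.
   'out' is always NUL-terminated (if cap > 0).
   Returns the number of input bytes consumed. */
size_t expand_tabs(const char *in, size_t len, int tab_width,
                   char *out, size_t cap)
{
    size_t i = 0, n = 0;

    if (cap == 0)
        return 0;
    if (tab_width < 0)
        tab_width = 0;
    while (i < len) {
        size_t need = (in[i] == '\t') ? (size_t)tab_width : 1;

        if (n + need + 1 > cap)       /* leave room for the NUL */
            break;
        if (in[i] == '\t') {
            int k;
            for (k = 0; k < tab_width; k++)
                out[n++] = ' ';
        } else {
            out[n++] = in[i];
        }
        i++;
    }
    out[n] = '\0';
    return i;
}
```

Because the return value says how much input was consumed, the caller can resume from that point on the next line.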

Inside the inner loop, as we gather characters into a "line" of text, we have to handle Tabs ourselves, converting ASCII 0x09 (the Tab character - which is a non-printing ASCII value) to spaces. We also check for end-of-line characters ourselves. In true Mac-centric manner, we ignore linefeeds and consider every newline to be equal to ASCII 0x0D (carriage return). Of course, text files created on a Unix machine won't conform to this assumption, since in the Unix world newlines tend to be ASCII 0x0A (linefeed). In the DOS and Windows worlds, lines end with both a carriage return and a linefeed, in that order: 0x0D 0x0A.
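One way to cope with all three conventions would be to normalize the text before the main loop ever sees it. The sketch below rewrites a buffer in place so that CR, LF, and CR+LF all become a single carriage return. This is not something the plug-in does; the function name is invented for the example.

```c
#include <stddef.h>

/* Hypothetical helper: rewrite 'buf' in place so that every line ending
   (CR, LF, or CR+LF) becomes a single CR. Returns the new length.
   The result is never longer than the input. */
size_t normalize_to_cr(char *buf, size_t len)
{
    size_t r = 0, w = 0;              /* read and write positions */

    while (r < len) {
        char c = buf[r++];

        if (c == '\r') {
            if (r < len && buf[r] == '\n')
                r++;                  /* swallow the LF of a CR+LF pair */
            buf[w++] = '\r';
        } else if (c == '\n') {
            buf[w++] = '\r';          /* bare LF (Unix style) */
        } else {
            buf[w++] = c;
        }
    }
    return w;
}
```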

Our inner loop is constructed in such a way that when the number of characters read equals CHARS_WIDE, we bail out and write the line to the PDF file, but in addition, we bail out any time a hard return (carriage return) is encountered. This lets us handle both traditional Mac text files (in which lines are soft-wrapped to the screen, with carriage returns coming only once per paragraph) and DOS-style documents in which every single line (not just the paragraph) ends with a hard return.

The fact that there are two ways to fall out of the inner loop has interesting consequences. Obviously, if we encounter a hard return, there's no question about what to do: we immediately write the line out to the file. But if we fall out of the inner loop because our line has begun to exceed CHARS_WIDE characters, it's possible (likely, in fact) that we've bailed out in the middle of a word! Hence, we have to insert some contingency code to read to the end of the current word. The code that does this looks like:

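		// get to next word ending, without running past the text or the buffer
		while (i < textLength - 1
				&& out < buf + sizeof(buf) - 1
				&& input[i] != CARRIAGE_RETURN
				&& strchr(okLineEnders,input[i])==NULL)
			*out++ = input[i++];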

The standard C function strchr() checks to see if the second argument (a character) occurs anywhere in the first argument (a string). It returns NULL on a miss and non-NULL on a match.
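One subtlety with strchr() is that it also "finds" the terminating NUL of the string being searched, so a zero byte in the input would count as a match. The sketch below wraps the word-ending search in a small function that guards against that and never reads beyond the end of the text. The name next_break() is invented for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Starting at 'start', return the index of the first character that may
   end a line (one of 'enders', or a carriage return), or 'len' if there
   is none. Never reads at or beyond text[len]. */
size_t next_break(const char *text, size_t len, size_t start,
                  const char *enders)
{
    size_t i = start;

    while (i < len) {
        char c = text[i];

        /* strchr() matches the terminating NUL too, so rule that out first */
        if (c == '\r' || (c != '\0' && strchr(enders, c) != NULL))
            break;
        i++;
    }
    return i;
}
```

With the line-ender set used in the plug-in ("- ;:>"), a search that begins inside the word "hello" in "hello world" stops at index 5, the space.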

If we fall out of the loop because of a hard return, we don't need the above code. Therefore we can skip around it with (ugh) a goto. There are probably better ways (stylistically) to handle this situation, but in the interest of clarity, I decided to keep the goto, for now.

Once we're out of the loop, we have to remember to make our line a C string (i.e., we must null-terminate it); then we can call PDF_continue_text(p,buf) to write the line. All that remains is to check the number of lines written, to see if it's time for a new page, and if so, start a new page. Here, it's important to note that every call to PDF_begin_page() results in PDFLib resetting its graphic state, which means we need to specify our type size, leading, and cursor-position values all over again. If you forget to do this, you'll be wondering where all the text went on the second and subsequent pages of your PDF document.
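The page-break bookkeeping boils down to a little integer arithmetic, sketched here in stand-alone form. Both function names are made up for the example; the plug-in itself does the equivalent test inline with the % operator.

```c
/* Number of pages needed for 'lines' lines at 'per_page' lines per page. */
long pages_needed(long lines, long per_page)
{
    if (lines <= 0 || per_page <= 0)
        return 0;
    return (lines + per_page - 1) / per_page;   /* integer round-up */
}

/* True when the line just written (counting from 1) filled its page. */
int page_is_full(long line_number, long per_page)
{
    return line_number > 0 && per_page > 0
        && line_number % per_page == 0;
}
```

A count like this would also come in handy for setting up a progress indicator before the first page is written.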

When we're done, we call PDF_close(), unlock our text handle, and return to the calling routine. Using PDF_close() actually not only frees up our library-invoked resources but also closes any working files we've left open. So at this point, we can consider our work done, and control can return to the host process, in this case BBEdit.

Enhancements

In a real-world BBEdit extension, it would be a good idea not only to get serious about error-checking but also consider such things as a user preferences dialog and support for Apple Events (which should include a mechanism for suppressing dialogs, so that scripted operations aren't hung up in midstream by unattended dialogs). Also, the main loop should be wrapped with the BBEdit API's bbxtStartProgress() and bbxtDoneProgress() calls, and the inner loop should contain one call to bbxtDoProgress() for every line of text processed, so that the user knows how things are progressing. BBEdit will display a progress thermometer automatically, suppressing it for short-duration events, if you use these calls.

BBEdit's plug-in API also has some handy convenience routines for dealing with Apple Events. For example, consider what you can do with the following three lines:

	bbxtFindApplication(cb,'CARO', &appFSS);
	err = bbxtLaunchApplication(cb,'CARO',&appFSS,&psn);
	bbxtSendOpenDoc(cb,'CARO', nil, &fss,true);

With the arguments shown, the first call has the effect of searching the BBEdit default disk for the application whose signature is 'CARO' - namely, Acrobat Reader. The second function launches that application, and the third function sends it an 'odoc' event, instructing the app to open the document specified by the FSSpec pointed to by &fss. In other words, with three lines of code you can make BBEdit launch Acrobat Reader and display your just-created PDF file in a Reader window. To accomplish this with our own custom-written code (properly error-checked) would require at least 200 lines of additional code, doubling the size of our plug-in!

In terms of the PDF-writing portions of the code, there are many possible further enhancements. For example, it would be nice to let the user specify page margins, text size, leading, etc. by means of a setup dialog. Also, you could try justifying the user's text. PDFLib offers functions for controlling character spacing, word spacing, and character widths, with accuracy of a thousandth of an em. (An em is a typesetter's unit, roughly equivalent to the point size of the type.) You can use the PDFLib routine PDF_stringwidth() to find out how wide a given text string is. As an exercise, you might try developing a justification routine that preferentially adjusts word spacing, followed by character spacing, followed by character width, each with its own weighting factor. (For some interesting algorithms here, seek out Don Lancaster's excellent article on "Picojustification" at http://www.tinaja.com/glib/picojust.pdf.)
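As a starting point for the simplest case (adjusting word spacing only), the sketch below computes how much extra space each word gap needs in order to stretch a line to a target width. It is only an assumption-laden outline: the function names are invented, and the line's natural width is passed in as a parameter, where a real routine would obtain it from PDF_stringwidth().

```c
/* Extra space, in points, to add to each word gap so that a line whose
   natural width is 'text_width' fills 'target_width'. Returns 0 when the
   line has no gaps or is already at least as wide as the target. */
double extra_word_spacing(double target_width, double text_width, int gaps)
{
    if (gaps <= 0 || text_width >= target_width)
        return 0.0;
    return (target_width - text_width) / gaps;
}

/* Count the word gaps (space characters) in a C string. */
int count_gaps(const char *line)
{
    int n = 0;

    for (; *line != '\0'; line++)
        if (*line == ' ')
            n++;
    return n;
}
```

For a four-word line that falls 12 points short of the measure, each of the three gaps would grow by 4 points. A fuller routine would cap that value and hand any remainder to character spacing.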

A PDF-outputting BBEdit plug-in that incorporates some of these features (and others, such as rudimentary HTML tag interpretation) can be found at http://www.acroforms.com.

Conclusion

Thomas Merz's PDFLib library offers an excellent way to get started in PDF programming, combining ease of use with cross-platform and even cross-language portability. It's the only PDF library that can easily be adapted for use with Perl, Python, Tcl, Visual BASIC, and Java, as well as C/C++. It comes with outstanding documentation and plenty of sample code (for all language bindings), and the price - for non-commercial users - can't be beat: it's free.

Look at it this way: Now you don't have any excuse for not putting PDF support in your applications!


Kas Thomas is a frequent contributor to MacTech and author of a forthcoming O'Reilly book on PDF-based web programming. You can reach him at kt@acroforms.com.

 
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.