TweetFollow Us on Twitter

PDFLib

Volume Number: 15 (1999)
Issue Number: 12
Column Tag: Programming Techniques

Lasso 3.5

by Kas Thomas

A great freeware library makes adding PDF support to an app easy

Adobe's Portable Document Format (PDF) has become a de facto standard for electronic document interchange, based on its ability to deliver graphically rich, structured content in a consistent manner across multiple operating environments. Almost every large web site offers at least some PDF-based content, making the Acrobat Reader one of the most popular downloads on the web. (Incredibly, Adobe claims to average some 100,000 downloads of the Reader from its web site per day.)

Because of its support for vector graphics, font embedding, hypertext links, and other advanced features, PDF is a powerful, far-reaching document standard. But that also means it's a relatively complex standard (for details, see the September 1999 MacTech) - and therefore far from trivial to support in an application.

From a programming standpoint, one can talk about two types of PDF support: support for PDF reading (import), and support for PDF writing (export). As with TIFF, QuickTime, and many other complex formats, it's much easier to provide write support than read support, because a comprehensive PDF-read capability means implementing the entire rather ponderous PDF specification (see http://partners.adobe.com/asn/developer/PDFS/TN/PDFSPEC.PDF), whereas a write-only facility may mean implementing only a tiny subset of the PDF spec - the subset of particular interest to your application. For example, if your application primarily outputs ASCII text, there is no need to implement graphics-embedding, halftoning, transfer functions, etc., in order to support PDF output.

Adding a well-defined PDF-output capability to an application can be surprisingly quick and easy, if you make full use of existing tools. For this article, I decided to add PDF export capability to BBEdit (the popular text editor), with the aid of a third-party freeware PDF library called PDFLib. Source code for the BBEdit plug-in accompanies this article. (The complete CW Pro 5 project, including PDFLib and its source files, can be found online at ftp://www.mactech.com.) But before we start talking code, let's take a moment to review the basics of the PDF format, then look at what kinds of development paths one might take to arrive at a PDF-export capability, and what sorts of tools are currently available to make the programmer's life easier.

PDF Fundamentals

Adobe's Portable Document Format is a kind of gigantic, special-purpose markup language, based largely on Postscript (the postfix-notation page description language) but lacking Postscript's control-flow constructs. PDF is a sort of "unrolled" version of Postscript, in which all graphics operations are inline (rather than relying on loops) and therefore speedy. Lookups and indexing operations are likewise fast because of PDF's extensive use of associative arrays (or "dictionaries," in Adobe parlance), organized into treelike structures in which all nodes have forward and/or back-pointers to other nodes; plus, every leaf (of every kind) has an entry in a giant 'xref' table, so that the offset of any object can be looked up instantly.

Pages are organized into sets of objects that describe a page's resources and content. The objects are human-readable ASCII and look like:

4 0 obj
<</Type /Page
/Parent 1 0 R
/Resources 8 0 R
/MediaBox [0 0 612 792]
/Contents [5 0 R ]
>>
endobj

In this case, the top line tells us we're dealing with Object No. 4, revision zero. The object is a dictionary object, as indicated by the double angle brackets, << and >>, enclosing the object. The first entry in the dictionary is a label telling the type of dictionary (in this case, a Page). The next label/value pair is a backpointer to the parent of this object, namely Object No. 1. (A reference ending in 'R', such as 1 0 R, is a pointer to an object.) The next entry tells where the page's resources can be found (namely, in Object No. 8.) The MediaBox entry gives the page's dimensions, in points (72 points to the inch); here, 612 by 792 means that we're dealing with a standard U.S. Letter-size page (8.5 by 11 inches). The final entry, in the above example, shows where the page's Contents (probably a stream object) can be found, namely in Object No. 5.

When text needs to be displayed on a page, it is packaged inside a stream object. The stream object will contain the actual ASCII or Unicode strings that need to be displayed, along with various Postscript-like operators, such as m for "moveto" and TL for "set leading," that control the stroking, filling, and positioning of the individual letters or glyphs.

When all of the objects in a PDF file have been written, a cross-reference ('xref') table must be inserted into the file. The entries in the 'xref' table must conform to a fixed format (see my article in MacTech for September 1999) and they must contain the exact byte offset from the start of the file to the object referenced by the entry in question. The integrity of the PDF document rests on the accuracy of the byte offsets stored in the 'xref' table. Since most of these offsets aren't known until the objects are written, the 'xref' table usually goes at the end of the file. (This isn't always the case, however. So-called "linearized" or "optimized" PDF files have an 'xref' table at the beginning of the file.)

Several things should be obvious by now. First, there is nothing freeform about a PDF file. Unlike HTML, a PDF file is highly structured, with many pointers between objects. Byte offsets matter a great deal and must be accounted for when the file is written. Secondly, PDF files are largely self-contained, bringing with them their own font resources and embedded graphics (rather than linking external resources). Thirdly, to write a PDF file means lots of string manipulations - something that, frankly, ANSI C is a little weak at (compared to, say, Perl). Beyond that, the PDF specification itself (currently contained in a 518-page, 160,000-word document) can be difficult to read and interpret. Supporting PDF export in an application written in C can be a bit tedious, to say the least.

Third-Party Libraries

It helps, in a situation like this, to be able to call on help from third parties, rather than reinvent the wheel yourself. Fortunately, some excellent tools are available to make your life easier. Among the general-purpose libraries are available for adding PDF handling capabilities to applications are:

  1. Adobe's PDF Library, also known as PDFL40; for use with Code Warrior on the Mac, Visual C++ 5.0 on Windows platforms, and gcc 2.8 on Sparc Solaris.
  2. The CLibPDF Library, by FastIO Systems (http://www.fastio.com); an ANSI C library, compilable on just about any platform.
  3. PDFLib, by Thomas Merz (http://www.pdflib.com); a C library, with bindings for C++, Java, Perl, Python, Tcl, and Visual BASIC.

If you're a Perl user, you'll want to check out PDF-on-the-Fly, a Perl library available from the University of Nottingham (http://www.ep.cs.nott.ac.uk/pdf-pl/download/manual.pdf), as well as txt2pdf, a library from Sanface Software (sanface@sanface.com).

Adobe's PDFL40 is without question the most powerful and robust library available, relying as it does on the Acrobat 4.0 codebase. With PDFL40, you can read, display, and write PDF from your own application. But unfortunately, PDFL40 isn't free - and even if you can afford the licensing fees, you may not be allowed to use the library. As stated in Adobe's literature, PDFL40 is selectively licensed to developers who are creating "products that are strategic to Adobe's marketing plans." In other words, Adobe will review your development plans carefully, and if they like what you're doing and if you agree to play by Adobe's rules, you may be allowed to pay to use the library.

Outside of Adobe, the two best-known C libraries for PDF support are CLibPDF, by FastIO Systems, and Thomas Merz's PDFLib. Both come with full source code and can be used without restriction (or virtually without restriction) by individual developers who are creating freeware or personal-use software. (Corporate users and commercial developers must take out a license, at significant cost.) The main restriction of these libraries is that they support PDF output only. They will not help you read PDF or display a PDF document on the screen. The same is true for the two Perl libraries: PDF-on-the-Fly and txt2pdf are basically write-only. If you need to put PDF up on the screen, you'll probably want to look into an open-source program called Ghostscript (http://www.cs.wisc.edu/~ghost/index.html), which started as a freeware PostScript interpreter, written in 1988 by L. Peter Deutsch, founder of Aladdin Systems. Starting with version 3.3, Ghostscript has been able to read and display PDF files in addition to PostScript documents. With version 4.0, Ghostscript added Postscript-to-PDF conversion (i.e., Distiller functionality). Because the code is generic C, Ghostscript has been successfully ported to most platforms, including Win32, OS/2, MacOS, Unix, Amiga, VAX, etc. (An excellent PDF-based manual for Ghostscript is available from Thomas Merz; see http://www.muc.de/~tm.)

CLibPDF and PDFLib are similar in their capabilities. Their differences are summed up in Table 1. Both are extremely easy to set up and use. Of the two, CLibPDF is the more advanced package in terms of the number of features and overall performance. CLibPDF has roughly 170 library routines to PDFLib's 88. Many of CLibPDF's routines provide advanced graphics capabilities involving setting up Cartesian coordinate axes (linear or logarithmic) and plotting data (including data stored in external files). CLibPDF was designed to make it easy for people who need to generate 2D plots to create attractive graphs on-the-fly in PDF, without passing the data through an intermediary application such as Matlab. In this, it excels.

PDFLib ClibPDF
Full source code available? Yes Yes
Documentation 64 pp. 75 pp.
API calls, total 88 170
Image formats supported Gif,Tiff.JPEG,CCITT JPEG
Font metrics formats AFM/PFA PFM/PFB
Thread safe? Yes Yes
Bindings for scripting languages? Yes No
Font embedding? Yes Yes
Font subsetting? No No
Compression? No Flate only
Text-justification option? No Yes
Vector graphics functions? Yes Yes
Custom graph plotting? No Yes
Annotations? Yes Yes
Bookmarks? Yes Yes
Hypertext links? Yes Yes
Form widgets? No No
Reenter pages after writing? No Yes

Table 1. Comparison of PDFLib and CLibPDF.

CLibPDF is also the clear winner in terms of benchmark scores. In a test (conducted by a corporate user) involving the construction of an intricate 156-page document filled with engineering information, CLibPDF produced a 257,027-byte PDF file in just 15 seconds. By comparison, Adobe's Distiller took over three minutes to produce a 197,548-byte file; Adobe's PDF Library took 54 seconds to create a 284,365-byte file; and PDFLib took 84 seconds to yield a 1,314,084-byte finished document. The filesize disparity is due to the fact that PDFLib uses no text compression, whereas the others do. (Adobe uses a combination of LZW and Flate compression. To avoid patent infringement issues, CLibPDF uses only Flate compression.)

Wherefore PDFLib?

Why would anybody use PDFLib? For one thing, it's the only library that comes with ready-made bindings for Perl, Python, Tcl, Java, and (on Win32) Visual BASIC. This is incredibly important if you're a web developer who needs to be able to serve dynamic PDF - PDF pages generated automatically, on the fly - for web clients. Dynamic PDF pages (via Perl, say) are easily possible using PDFLib. All you have to do is link the PDFLib shared library with the PerlStub file (which is part of the MacPerl distribution suite) and follow the calling conventions given in Thomas Merz's excellent documentation (which has example code listings for all the different bindings).

But what about the big file sizes? you ask. It's true that, as of yet, PDFLib does not have any compression support - for text. For imagery, PDFLib supports JPEG, GIF, TIFF and CCITT bitmaps, all of which are compressed. (Acrobat Reader handles the decompression automatically.) CLibPDF, on the other hand, only handles JPEG embedding, unless you pay the license fee ($1,000), in which case you can get TIFF support (among other features).

PDFLib's lack of text compression can result in big files if you're mainly outputting big gobs of text. But if you will be serving dynamic PDF web pages (or creating other fairly small text files), you won't suffer for not having compression, since small text streams often compress poorly - or even grow, rather than shrink - at pack-down time.

It turns out PDFLib is ideal for generating small to medium-sized text-based PDF documents, because - unlike Adobe's own products - PDFLib won't automatically embed fonts or font subsets for any of the standard 14 core Type 1 fonts that are included with Acrobat Reader (the Helvetica, Times, and Courier families, plus Zapf Dingbats and Symbol). This can be important, because although a small PDF file may or may not shrink significantly with compression turned on, it will definitely grow when fonts are embedded unnecessarily.

Another reason to use PDFLib is that it's nominally smaller and easier to learn than CLibPDF (although the latter is by no means hard to work with). And should you later need to port your code to a scripting language, you can reuse your code with very little work.

Adding PDF Export to BBEdit

BBEdit (by Bare Bones Software) is one of the most popular ASCII editors on the Mac. Features like regex-based (regular expression) search-and-replace, robust HTML tools, and neck-snapping performance have endeared BBEdit to thousands of loyal users. But when it comes to producing eyepleasing output, BBEdit isn't exactly a killer app. Wouldn't it be nice to be able to save BBEdit documents as PDF files now and then? PDF is easier to look at (and print out) than raw ASCII, any day.

It's not hard to add PDF export to BBEdit, because like so many software products these days, BBEdit supports a plug-in API that allows third-party programmers access to the main program's data. The BBEdit plug-in API is well documented and has hooks to many utility functions for retrieving the text from documents, manipulating user selections, etc. Space doesn't permit a full tutorial on writing BBEdit plug-ins here. However, we will have space to run through the 200 or so lines of C required for a short plug-in that lets the user save an open BBEdit document as a PDF file.

The Code

The BBEdit plug-in interface requires that we compile an old-fashioned Code Resource of type 'BBXT' and creator 'R*ch'. (The creator type can be anything you want, but if you stay with 'R*ch', your plug-in will have the icon associated with BBEdit extensions.) Note that the name of your 'BBXT' resource (not the filename of your plug-in) is the name that will appear in the BBEdit "Tools" menu at runtime.

The main() routine for our PDF-Output plug-in, shown in Listing 1, is typical of most BBEdit extensions. It shows that our resource is called with a pointer to a BBEdit structure called the ExternalCallbackBlock; a WindowPtr associated with the frontmost user window; a long int containing various flag values to convey information about the state in which BBEdit is in; and pointers to AppleEvents. All we do in main() is call EnterCodeResource(), check our flags (and the WindowPtr, for validity), then call bbxtGetWindowContents() - which retrieves a Handle to all the text in the frontmost (active) document - before handing the text off to our filtering routine. When we're done, we call ExitCodeResource() and that's all she wrote. Easy as pi.


Listing 1: main( )

main( )
pascal OSErr main(ExternalCallbackBlock *callbacks, 
			WindowPtr w, 
			long flags, 
			AppleEvent *event, AppleEvent *reply)
{
	OSErr	err = noErr;

	EnterCodeResource();

	{
		Handle text;
		WindowPtr newWindow;

		if (!w || (xfWindowOpen & flags == 0) 
			return err;

		text = bbxtGetWindowContents(callbacks,w);

		err = pdfTranslate(callbacks,text,w); // write pdf

	} 

	ExitCodeResource();

	return err;
}

We don't actually do anything with the AppleEvent pointers in this example. In a real plug-in, these pointers would be the mechanism by which your plug-in could be controlled through OS-level scripts. Most of the time, though, these pointers will be nil. In all versions of BBEdit Lite, for example, the pointers are always nil.

The real heavy lifting occurs in Listing 2, where our PDFLib routines get called. Before using any other PDFLib routines, we call PDF_new() to initialize the library. (This results in a number of large data structures being allocated and filled out for us, behind the scenes. The principal data structure is something called, appropriately, a PDF. A pointer to this data structure must be passed to every library routine so that PDFLib can keep track of the PDF document's state.) At the end of the routine, before exiting, we call PDF_close() to close the connection to the PDFLib library, freeing all resources that were allocated earlier.


Listing 2: pdfTranslate( )

pdfTranslate( )

OSErr pdfTranslate( ExternalCallbackBlock *callbacks, 
			Handle theText,
			WindowPtr w ) {


	PDF *p = nil;
	int font,j;
	long i,linecount,textLength;
	OSErr err = noErr;
	Boolean timeForNewPage = false; // sentinel
	char *input,
			filename[32],
			buf[TAB_VALUE *CHARS_WIDE];
	unsigned char *out = buf;
	char okLineEnders[] = "- ;:>";

	p = PDF_new();  
	if (p == nil) return -1;

	HLockHi(theText);

	FudgeName(callbacks,filename,w); // create outfile name

	// open the new PDF file 
	if (PDF_open_file(p,(char *)filename )==-1){
		fprintf(stderr,"Error:cannot open temp.pdf file.\n");
		exit(2);
	}

  // these lines are optional:
	PDF_set_info(p,"Creator","BBEdit PDF Exporter plug-in");
	PDF_set_info(p,"Author","Kas Thomas");
	PDF_set_info(p,"Title","Hello world!");

	PDF_begin_page(p,letter_width,letter_height); // start a page

	// find a base-14 font
	font = PDF_findfont(p,"Times-Roman","default",0);
		if (font ==-1){
		fprintf(stderr,"Couldn't set font!\n");
		HUnlock(theText);
		exit(3);
	}

	PDF_setfont(p,font,FONTSIZE); // set font & size
	PDF_set_leading(p, LEADING);  // set line spacing 

	PDF_set_text_pos(p,TEXT_STARTX,TEXT_STARTY);

	PDF_show(p," ");

	input = *(unsigned char **)theText;

	textLength = GetHandleSize(theText); // how long is our text?

   // for every character...
	for (i = 0,linecount = 1; i < textLength - 1 ; )
	{
	  // fetch the current line...
	 	for (j = 0, out = buf; 
				 j < CHARS_WIDE - 1 && i < textLength - 1;
				 j++) 
	 		{	 		
	 			*out++ = input[i++];

				if (input[i-1] == TAB) { // we must handle Tabs ourselves
					int k;

					for (k = 0; k < TAB_VALUE; k++)
				  	*out++ = SPACE;
				  }

				if (input[i-1] == CARRIAGE_RETURN)   // break on CR
					goto TerminateLine;
			}

		// get to next word ending
		while (strchr(okLineEnders,input[i])==NULL)
			*out++ = input[i++];

		TerminateLine:

		*out = 0x00; // make it a C string

		PDF_continue_text(p,buf); // write to PDF file

		if (linecount++ % LINES_PER_PAGE == 0) { // end of page?
			PDF_end_page(p);
			timeForNewPage = true;
			}

		if (timeForNewPage && i < textLength - 1) { // more to do
			timeForNewPage = false;
			PDF_begin_page(p,letter_width,letter_height); // new page
			PDF_setfont(p,font,FONTSIZE);
			PDF_set_leading(p, LEADING);
			PDF_set_text_pos(p,TEXT_STARTX,TEXT_STARTY);
			PDF_show(p," ");
			}

	}   // for i

	PDF_end_page(p);	// close page
	PDF_close(p);		// close PDF obj

	HUnlock(theText);

	return err;
}

The PDFLib routine PDF_open_file() will create a new file for us (in the current directory) if we pass it a pointer to a PDF struct along with a pointer to a filename string. Note that the filename string must be a C string. We create the necessary string (consisting of the original file's name, plus the extension ".pdf") in a custom utility routine, FudgeName(). See Listing 3.

After creating our (empty) output file, we make three calls to PDF_set_info(), to set the file's Creator, Author, and Title. These strings will show up when the user does a Get Info on the PDF document while viewing it in Acrobat Reader. It is not strictly necessary to call PDF_set_info(), since PDF files are not required to have "Get Info" info; but PDFLib makes creating these tags easy. (Again, though, note the use of C strings rather than Pascal strings.)


Listing 3: FudgeName()

FudgeName( )
// Get the current BBEdit file's name, add ".pdf" to it, put it in 'str' as a C string.

void FudgeName(ExternalCallbackBlock *cb, 
					unsigned char *str, WindowPtr w) 
{
 	Str255 fName;
	short v;
	long d;
	long length;
	char ending[] = { '.','p','d','f', 0x00 };

	bbxtGetDocInfo(cb,w,fName,&v,&d); 
	length = *fName;	// Pascal string

   // now we create a C string:
	BlockMove(fName+1,str, length);
	BlockMove(ending,str+length,5);	
}

To begin a PDF page, we call - what else? - PDF_begin_page() with, in this case, the predefined values letter_width and letter_height, which correspond to the dimensions of a standard U.S. letter-sized page. (PDFLib also has predefined constants for A4, legal, and many other page sizes. Or you can use your own custom dimensions.)

Next, we come to one of the most important calls in this or any routine that uses the PDFLib library. Namely, we do:

 font = PDF_findfont(p,"Times-Roman","default",0);

The purpose of this call is to locate font resources for our document and specify an encoding for the font. (Here, we're guaranteed to get a valid return value, since Times-Roman is one of the base-14 fonts that Acrobat Reader can always use.) Allowable encoding values are built-in, pdfdoc, macroman, macexpert, winansi or default (see Section 3.4.2 of Thomas Merz's excellent PDFLib manual). In our case, we're content to let PDFLib determine the most suitable encoding based on the environment, so we indicate this by setting the third argument to "default." (The encoding must be specified as a C string.)

The return value from PDF_findfont(), if not equal to -1 (an error), will be needed in subsequent calls involving typesetting parameters, such as PDF_setfont() and PDF_set_leading(). It's important to understand that the value returned by PDF_findfont() is not an enumerated value or an index into a fixed lookup table. Rather, it's an index into the font cache of one particular PDF document. If you're working with two documents, one may store Times-Roman in its cache at a different index than the other; hence, PDF_findfont() may return two different values for the same font, based on the font's use in two different files. Don't just assume that if PDF_findfont() returns '1' for Helvetica, that therefore Helvetica will always be referenced by a font value of '1'. It may only be '1' for one file, in one particular context.

Having gotten a valid return value from PDF_findfont(), we use that value in a call to PDF_setfont(), which attaches the font resource to the PDF file and also lets us specify the point size of the font. The point size can be any floating-point value: 24.0 for a small headline, say, or 10.0 to 12.0 for regular body copy, etc. (Fractional values like 13.4 are fine, too.) We can similarly set the line spacing with PDF_set_leading(). Typically, the leading is close in value to the point size of the text. If you specify the leading as 1.2 times the point size, you won't go far wrong. (For double-spaced text, try 3.0 or 4.0 times the point size.)

The library function PDF_set_text_pos() lets us position our "pen" or insertion point at any x-y position on the page. Here, you have to remember that in the PDF coordinate space, (0,0) corresponds to the lower left corner of the page, with 'y' increasing in the up direction. Also, recall that in the PDF world, the default unit of space is the typesetter's point, which is 1/72-inch. Thus, if you want to begin writing at a distance of one inch from the left edge of the page and ten inches up from the bottom, you would specify coordinates of (72, 720).

To write text on a PDF page, you can either make repeated calls to PDF_set_text_pos() and PDF_show(), specifying new line-start coordinates every time, or else make one call to PDF_show(), followed by repeated calls to PDF_continue_text(). The latter function automatically repositions the insertion point to the start of a new line, using the left-margin and leading parameters that you've already specified. This can be more convenient than keeping track of line depths yourself. To keep our main loop from having to be a do-while loop, we make a dummy call to PDF_show() with a value of "  " before entering the loop. Then, inside the loop, we just make repeated calls to PDF_continue_text().

The Main Loop

Our main loop, which is actually a double nested loop, deserves comment. The outer loop counts individual characters and makes sure that we loop over all the characters in the source document, stopping only when we've gotten to the end of the file. The inner loop fetches one line of text at a time, writing to a line buffer, 'buf', which is conservatively sized at a fixed size of TAB_VALUE * CHARS_WIDE. In a real application, you'd determine CHARS_WIDE dynamically, based (perhaps) on the point size of the text or some other metric. For this short demo, we've hard-coded the type size at 9.0 points and the line width, CHARS_WIDE, at 80 via #defines. The reason our line buffer has to be sized at TAB_VALUE * CHARS_WIDE is that it's conceivable that we could encounter a pathological line of "text" where every character is a Tab. If a Tab is equal to five spaces, our line buffer had better be 400 bytes in capacity rather than just 80, or else we'll overwrite the buffer.

Inside the inner loop, as we gather characters into a "line" of text, we have to handle Tabs ourselves, converting ASCII 0x09 (the Tab character - which is a non-printing ASCII value) to spaces. We also check for end-of-line characters ourselves. In true Mac-centric manner, we ignore linefeeds and consider every newline to be equal to ASCII 0x0D (carriage return). Of course, text files created on a Unix machine won't conform to this assumption, since in the Unix world newlines tend to be ASCII 0x0A (linefeed). In the DOS and Windows worlds, lines end with both a linefeed and a carriage return: 0x0D0A.

Our inner loop is constructed in such a way that when the number of characters read equals CHARS_WIDE, we bail out and write the line to the PDF file, but in addition, we bail out any time a hard return (carriage return) is encountered. This lets us handle both traditional Mac text files (in which lines are soft-wrapped to the screen, with carriage returns coming only once per paragraph) as well as DOS-style documents in which every single line (not just the paragraph) ends with a hard return.

The fact that there are two ways to fall out of the inner loop has interesting consequences. Obviously, if we encounter a hard return, there's no question about what to do: we immediately write the line out to the file. But if we fall out of the main loop because our line has begun to exceed CHARS_WIDE characters, it's possible (likely, in fact) that we've bailed out in the middle of a word! Hence, we have to insert some contingency code to read to the end of the current word. The code that does this looks like:

		// get to next word ending
		while (strchr(okLineEnders,input[i])==NULL)
			*out++ = input[i++];

The standard C function strchr() checks to see if the second argument (a character) occurs anywhere in the first argument (a string). It returns NULL on a miss and non-NULL on a match.

If we fall out of the loop because of a hard return, we don't need the above code. Therefore we can skip around it with (ugh) a goto. There are probably better ways (stylistically) to handle this situation, but in the interest of clarity, I decided to keep the goto, for now.

Once we're out of the loop, we have to remember to make our line a C string (i.e., we must null-terminate it); then we can call PDF_continue_text(p,buf) to write the line. All that remains is to check the number of lines written, to see if it's time for a new page, and if so, start a new page. Here, it's important to note that every call to PDF_begin_page() results in PDFLib resetting its graphic state, which means we need to specify our type size, leading, and cursor-position values all over again. If you forget to do this, you'll be wondering where all the text went on the second and subsequent pages of your PDF document.

When we're done, we call PDF_close(), unlock our text handle, and return to the calling routine. Using PDF_close() actually not only frees up our library-invoked resources but also closes any working files we've left open. So at this point, we can consider our work done, and control can return to the host process, in this case BBEdit.

Enhancements

In a real-world BBEdit extension, it would be a good idea not only to get serious about error-checking but also consider such things as a user preferences dialog and support for Apple Events (which should include a mechanism for suppressing dialogs, so that scripted operations aren't hung up in midstream by unattended dialogs). Also, the main loop should be wrapped with the BBEdit API's bbxtStartProgress() and bbxtDoneProgress() calls, and the inner loop should contain one call to bbxtDoProgress() for every line of text processed, so that the user knows how things are progressing. BBEdit will display a progress thermometer automatically, suppressing it for short-duration events, if you use these calls.

BBEdit's plug-in API also has some handy convenience routines for dealing with Apple Events. For example, consider what you can do with the following three lines:

	bbxtFindApplication(cb,'CARO', &appFSS);
	err = bbxtLaunchApplication(cb,'CARO',&appFSS,&psn);
	bbxtSendOpenDoc(cb,'CARO', nil, &fss,true);

With the arguments shown, the first call has the effect of searching the BBEdit default disk for the application whose signature is 'CARO' - namely, Acrobat Reader. The second function launches that application, and the third function sends it an 'odoc' event, instructing the app to open the document specified by the FSSpec pointed to by &fss. In other words, with three lines of code you can make BBEdit launch Acrobat Reader and display your just-created PDF file in a Reader window. To accomplish this with our own custom-written code (properly error-checked) would require at least 200 lines of additional code, doubling the size of our plug-in!

In terms of the PDF-writing portions of the code, there are many possible further enhancements. For example, it would be nice to let the user specify page margins, text size, leading, etc. by means of a setup dialog. Also, you could try justifying the user's text. PDFLib offers functions for controlling character spacing, word spacing, and character widths, with accuracy of a thousandth of an em. (An em is a typesetter's unit, roughly equivalent to the point size of the type.) You can use the PDFLib routine PDF_stringwidth() to find out how wide a given text string is. As an exercise, you might try developing a justification routine that preferentially adjusts word spacing, followed by character spacing, followed by character width, each with its own weighting factor. (For some interesting algorithms here, seek out Don Lancaster's excellent article on "Picojustification" at http://www.tinaja.com/glib/picojust.pdf.)

A PDF-outputting BBEdit plug-in that incorporates some of these features (and others, such as rudimentary HTML tag interpretation) can be found at http://www.acroforms.com.

Conclusion

Thomas Merz's PDFLib library offers an excellent way to get started in PDF programming, combining ease of use with cross-platform and even cross-language portability. It's the only PDF library that can easily be adapted for use with Perl, Python, Tcl, Visual BASIC, and Java, as well as C/C++. It comes with outstanding documentation, plenty of sample code (for all language bindings), and the price - for non-commercial users - can't be beat, since it's free.

Look at it this way: Now you don't have any excuse for not putting PDF support in your applications!


Kas Thomas is a frequent contributor to MacTech and author of a forthcoming O'Reilly book on PDF-based web programming. You can reach him at kt@acroforms.com.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

iWatermark Pro 2.0.0fc4 - Easily add wat...
iWatermark Pro is the essential watermarking app for professional, business, and personal use. Easily secure and protect your photos with text, a graphic, a signature, or a QR watermark. Once added... Read more
Amadeus Pro 2.4 - Multitrack sound recor...
Amadeus Pro lets you use your Mac for any audio-related task, such as live audio recording, digitizing tapes and records, converting between a variety of sound formats, etc. Thanks to its outstanding... Read more
iFFmpeg 6.4.2 - Convert multimedia files...
iFFmpeg is a comprehensive media tool to convert movie, audio and media files between formats. The FFmpeg command line instructions can be very hard to master/understand, so iFFmpeg does all the hard... Read more
EtreCheck 3.4.2 - For troubleshooting yo...
EtreCheck is an app that displays the important details of your system configuration and allow you to copy that information to the Clipboard. It is meant to be used with Apple Support Communities to... Read more
Carbon Copy Cloner 4.1.17 - Easy-to-use...
Carbon Copy Cloner backups are better than ordinary backups. Suppose the unthinkable happens while you're under deadline to finish a project: your Mac is unresponsive and all you hear is an ominous,... Read more
VueScan 9.5.81 - Scanner software with a...
VueScan is a scanning program that works with most high-quality flatbed and film scanners to produce scans that have excellent color fidelity and color balance. VueScan is easy to use, and has... Read more
Hopper Disassembler 4.2.10- - Binary dis...
Hopper Disassembler is a binary disassembler, decompiler, and debugger for 32- and 64-bit executables. It will let you disassemble any binary you want, and provide you all the information about its... Read more
Viber 6.8.6 - Send messages and make cal...
Viber lets you send free messages and make free calls to other Viber users, on any device and network, in any country! Viber syncs your contacts, messages and call history with your mobile device, so... Read more
Viber 6.8.6 - Send messages and make cal...
Viber lets you send free messages and make free calls to other Viber users, on any device and network, in any country! Viber syncs your contacts, messages and call history with your mobile device, so... Read more
Carbon Copy Cloner 4.1.17 - Easy-to-use...
Carbon Copy Cloner backups are better than ordinary backups. Suppose the unthinkable happens while you're under deadline to finish a project: your Mac is unresponsive and all you hear is an ominous,... Read more

Latest Forum Discussions

See All

Eden: Renaissance (Games)
Eden: Renaissance 1.0 Device: iOS Universal Category: Games Price: $4.99, Version: 1.0 (iTunes) Description: Eden: Renaissance is a thrilling turn-based puzzle adventure set in a luxurious world, offering a deep and moving... | Read more »
Glyph Quest Chronicles guide - how to ma...
Glyph Quest returns with a new free-to-play game, Glyph Quest Chronicles. Chronicles offers up more of the light-hearted, good humored fantasy fun that previous games featured, but with a few more refined tricks up its sleeve. It's a clever mix of... | Read more »
Catch yourself a Lugia and Articuno in P...
Pokémon Go Fest may have been a bit of a disaster, with Niantic offering fans full refunds and $100 worth of in-game curency to apologize for the failed event, but that hasn't ruined trainers' chances of catching new legendary Pokémon. Lugia nad... | Read more »
The best deals on the App Store this wee...
There are quite a few truly superb games on sale on the App Store this week. If you haven't played some of these, many of which are true classics, now's the time to jump on the bandwagon. Here are the deals you need to know about. [Read more] | Read more »
Realpolitiks Mobile (Games)
Realpolitiks Mobile 1.0 Device: iOS Universal Category: Games Price: $5.99, Version: 1.0 (iTunes) Description: PLEASE NOTE: The game might not work properly on discontinued 1GB of RAM devices (iPhone 5s, iPhone 6, iPhone 6 Plus, iPad... | Read more »
Layton’s Mystery Journey (Games)
Layton’s Mystery Journey 1.0.0 Device: iOS Universal Category: Games Price: $15.99, Version: 1.0.0 (iTunes) Description: THE MUCH-LOVED LAYTON SERIES IS BACK WITH A 10TH ANNIVERSARY INSTALLMENT! Developed by LEVEL-5, LAYTON’S... | Read more »
Full Throttle Remastered (Games)
Full Throttle Remastered 1.0 Device: iOS Universal Category: Games Price: $4.99, Version: 1.0 (iTunes) Description: Originally released by LucasArts in 1995, Full Throttle is a classic graphic adventure game from industry legend Tim... | Read more »
Stunning shooter Morphite gets a new tra...
Morphite is officially landing on iOS in September. The game looks like the space shooter we've been needing on mobile, and we're going to see if it fits the bill quite shortly. The game's a collaborative effort between Blowfish Studios, We're Five... | Read more »
Layton's Mystery Journey arrives to...
As you might recall, Layton's Mystery Journey is headed to iOS and Android -- tomorrow! To celebrate the impending launch, Level-5's released a new trailer, complete with an adorable hamster. [Read more] | Read more »
Sidewords (Games)
Sidewords 1.0 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0 (iTunes) Description: Grab a cup of coffee and relax with Sidewords. Sidewords is part logic puzzle, part word game, all original. No timers. No... | Read more »

Price Scanner via MacPrices.net

Apple Move Away from White Label Event Apps C...
DoubleDutch, Inc., a global provider of Live Engagement Marketing (LEM) solutions, has made a statement in the light of a game-changing announcement from Apple at this year’s WWDC conference.... Read more
70 Year Old Artist Creates Art Tools for the...
New Hampshire-based developer Pirate’s Moon has announced MyArtTools 1.1.3, the update to their precision drawing app, designed by artist Richard Hoeper exclusively for use with the 12.9-inch iPad... Read more
Sale! New 2017 13-inch 2.3GHz MacBook Pros fo...
Amazon has new 2017 13″ 2.3GHz/128GB MacBook Pros on sale today for $150 off MSRP including free shipping. Their prices are the lowest available for these models from any reseller: – 13″ 2.3GHz/128GB... Read more
13″ 2.3GHz/128GB Space Gray MacBook Pro on sa...
MacMall has the 13″ 2.3GHz/128GB Space Gray MacBook Pro (MPXQ2LL/A) on sale for $1219 including free shipping. Their price is $80 off MSRP. Read more
Clearance 2016 12-inch Retina MacBooks, Apple...
Apple recently dropped prices on Certified Refurbished 2016 12″ Retina MacBooks, with models now available starting at $1019. Apple will include a standard one-year warranty with each MacBook, and... Read more
Save or Share
FotoJet Designer, is a simple but powerful new graphic design apps available on both Mac and Windows. With FotoJet Designer’s 900+ templates, thousands of resources, and powerful editing tools you... Read more
Logo Maker Shop iOS App Lets Businesses Get C...
A newly released app is designed to help business owners to get creative with their branding by designing their own logos. With more than 1,000 editable templates, Logo Maker Shop 1.0 provides the... Read more
Sale! New 15-inch MacBook Pros for up to $150...
Amazon has the new 2017 15″ MacBook Pros on sale for up to $150 off MSRP including free shipping: – 15″ 2.8GHz MacBook Pro Space Gray: $2249 $150 off MSRP – 15″ 2.89Hz MacBook Pro Space Gray: $2779 $... Read more
DEVONthink To Go 2.1.7 For iOS Brings Usabili...
DEVONtechnologies has updated DEVONthink To Go, the iOS companion to DEVONthink for Mac, with enhancements and bug fixes. Version 2.1.7 adds an option to clear the Global Inbox and makes the grid... Read more
15-inch 2.2GHz Retina MacBook Pro, Apple refu...
Apple has Certified Refurbished 2015 15″ 2.2GHz Retina MacBook Pros available for $1699. That’s $300 off MSRP, and it’s the lowest price available for a 15″ MacBook Pro. An Apple one-year warranty is... Read more

Jobs Board

*Apple* Solutions Consultant (ASC) - Poole -...
Job Summary The people here at Apple don't just create products - they create the kind of wonder that's revolutionised entire industries. It's the diversity of those Read more
SW Engineer *Apple* TV - Apple Inc. (United...
Changing the world is all in a day's work at Apple . If you love innovation, here's your chance to make a career of it. You'll work hard. But the job comes with more Read more
Frameworks Engineering Manager, *Apple* Wat...
Frameworks Engineering Manager, Apple Watch Job Number: 41632321 Santa Clara Valley, California, United States Posted: Jun. 15, 2017 Weekly Hours: 40.00 Job Summary Read more
Product Manager - *Apple* Pay on the *Appl...
Job Summary Apple is looking for a talented product manager to drive the expansion of Apple Pay on the Apple Online Store. This position includes a unique Read more
*Apple* Retail - Multiple Positions - Apple...
SalesSpecialist - Retail Customer Service and SalesTransform Apple Store visitors into loyal Apple customers. When customers enter the store, you're also the Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.