Improve PDF Search by Making HTML Clones

Volume Number: 21 (2005)
Issue Number: 4
Column Tag: Programming

Improve PDF Search by Making HTML Clones

by Sid Steward

Help ensure that search engine users find your PDF content. Then, help them navigate it.

If a document isn't indexed by an internet search engine such as Google, then that document effectively does not exist to most web users. Even when a PDF document has been indexed, its long length poses further obstacles to the people you are trying to reach. For after finding your PDF via Google, the user must now run their query again using their PDF viewer. To accomplish this, they must first completely download the PDF to their computer. Ugh! All of this is like cruising at 65mph and then having to stop for a rush-hour traffic jam. With thousands of information sources clamoring for the reader's attention, it would be no surprise if they simply zipped around that PDF like a commuter in the car pool lane.

This article will explain how to create a convenient interface between your rich PDF documents and impatient search engine users. How? We will create an HTML clone of your PDF that is especially designed for search engines. Search engines can index your clone instead of your PDF. Search engine users can then drop into this HTML clone and navigate your document at that level until they decide to plunge into the original PDF (which is linked from every page).

To accomplish this, we'll wander over to the UNIX side of OS X. We'll use some command-line free software called pdftohtml to create the HTML pages. We'll then use sed to add navigation features to these HTML pages. Finally, we'll use bash to script the entire process. Both sed and bash come with OS X. We'll need to install pdftohtml. When our bash script is finished, we'll use DropScript to give it a simple, GUI interface.

I say "HTML clone" because we aren't talking about an "HTML conversion" or an "HTML edition." The HTML we'll create lacks the fidelity that those terms imply. Rather, the HTML we'll create is much as you would expect when you click on the "View as HTML" link you see on Google's PDF hits. Indeed, Google's use of these loose HTML knockoffs has adjusted users to the idea that lightweight, low-fidelity HTML can sometimes be more helpful than cumbersome, high-fidelity PDF. To see an online example of an HTML clone, visit http://www.AccessPDF.com/pdf_html_clone/.

"If Google already provides its own HTML clone of my PDF, why should I bother?" Here are a few reasons. First, Google's clone might not include your entire PDF. In a recent test of mine, Google converted only the first 62 pages of my 216-page PDF. Second, Google prefers indexing HTML over PDF. Where your online PDF might not get fully indexed (my same test PDF was only partially indexed), your HTML clone has a better chance. Third, our clone uses one HTML file per PDF page. So search engine results will link users directly to the relevant pages in your document. Fourth, you own the clone. You can configure it to include document images, your company logo, Google Ads ... anything. Finally, your online document will appear more attractive to search engine users as links to HTML pages instead of links to PDF. Sound good? Let's get started.

Install and Run PdfToHtml

Pdftohtml is a free-software command-line program that does a fair job of converting PDF into HTML. It is based on the Xpdf project (http://www.foolabs.com/xpdf/), which gives us other handy PDF tools such as pdftotext and pdfinfo. You could compile pdftohtml using DarwinPorts (http://darwinports.opendarwin.com), but you'll probably prefer to simply grab the installer I created.

Download my pdftohtml installer from http://www.AccessPDF.com/pdf_html_clone/. It should work on both Jaguar and Panther. The resulting binary appears in the /opt/local/bin directory. You can see a brief description of pdftohtml's options by opening a Terminal and running it with no input:

/opt/local/bin/pdftohtml

To run pdftohtml on a PDF, open the Terminal and locate a PDF document (e.g., cd ~/Desktop to get to your desktop). Create a test directory, copy your PDF into this test directory, and then invoke pdftohtml using its full pathname:

mkdir mydoc_html_test
cp mydoc.pdf mydoc_html_test
cd mydoc_html_test
/opt/local/bin/pdftohtml -c -i mydoc.pdf
cd -

This will create several HTML pages in the same directory as the PDF. To view the result, open mydoc.html in your browser. "Mac OS X Hacks" (O'Reilly) shows us how to open mydoc.html from the Terminal:

open mydoc_html_test/mydoc.html

When you are done, simply delete the test directory and its contents. (Warning: rm -r permanently deletes entire directories and all of their contents; it is an easy way to accidentally lose lots of data.):

rm -r mydoc_html_test

Pdftohtml creates one HTML page for every page in your PDF (e.g., mydoc-1.html, mydoc-2.html, etc.). It also creates a navigation page that simply links to every HTML page in the document (e.g., mydoc_ind.html). An HTML frameset (mydoc.html in our example) has one frame that shows the current page, and one frame that always shows the navigation page. If your PDF has bookmarks, they are used to create an outline (e.g., mydoc-outline.html). If your PDF has navigation links on page text, then those are converted into similar HTML links. That's a nice touch.

In our above example, you will find that images and other graphic elements were dropped; only the text appears. If you have GhostScript on your machine, you can omit the -i option and pdftohtml will convert PDF page graphics into background images for the HTML pages. This effect is nice, but it consumes considerable disk space.

The -c option tells pdftohtml to create "complex" output. This means it will position text on the HTML pages using CSS styles, so HTML page layout resembles the original PDF page layout. If your input PDF page has two columns or a table of data, the output HTML will preserve that formatting. For example, the HTML code for a row of tabular data might look like this:

<div style="position:absolute;top:181;left:363"><nobr><span class="ft1">2001</span></nobr></DIV>
<div style="position:absolute;top:181;left:463"><nobr><span class="ft1">2002</span></nobr></DIV>
<div style="position:absolute;top:181;left:562"><nobr><span class="ft1">2003</span></nobr></DIV>

It looks like a table without actually using HTML's table tags. This highlights an important difference between PDF and HTML. PDF was designed for high-fidelity presentation, not structure. Most PDFs have no concept of a paragraph or table. To the PDF, letters are simply marks on the page. HTML, however, was designed for structuring data; it knows all about paragraphs and tables. When you run pdftohtml, it tries to convert the PDF's alphabet soup into sensible paragraphs, but that is the most markup you will get. You won't get tagged HTML tables or lists from pdftohtml. Luckily for us, we don't need rigorous HTML tagging.

Sometimes, text that looks fine in the PDF comes out as gibberish (or not at all) in the HTML. You might be quick to blame pdftohtml, but it turns out that some PDFs do not know how their text is encoded (again, think marks on a page). To test, open one of these PDFs in Adobe Reader and try to copy/paste this trouble text. You'll probably get the same gibberish. If you can't extract the PDF's text, neither can Google nor any other search engine. When you simply must have a PDF's text, consider rasterizing the PDF's pages using GhostScript and then running the resulting TIFF through an OCR program.

Editing the HTML using Sed

Creating the HTML pages was pretty easy. Now we have a few things to change. First of all, let's eliminate the frames, because we don't want them confusing the search engine. That means we'll need to add navigation links to each HTML page. Next, we'll need to add links on each HTML page to the original PDF. Finally, looking at the HTML page output from pdftohtml version 0.36, I found that the opening DIV tag accidentally appears in the head section; we'll need to move that to just after the BODY tag.

When you must make these kinds of changes to several (or several hundred) text files, consider using sed, the Stream Editor. Sed is a handy command-line tool used on UNIX systems for decades. Like many command-line tools, its terse syntax appears cryptic. A handy reference (e.g., man sed, or http://www.hmug.org/man/1/sed.html or http://www.gnu.org/software/sed/manual/html_node/sed_toc.html) and some explanation makes it clearer. I'll offer a couple guide-posts that will make our final script easier to read.

Sed reads an input file line by line. Each line is copied into a "pattern space" where commands can operate on it. Commands are successively applied to this space. Depending on the commands, the data in this pattern space might be sent to standard output. From there the output can be piped into another program or redirected to a file. The original input file remains unchanged. Here is a basic example showing how you can create mydoc-new.html from mydoc-old.html by replacing all instances of 2004 with 2005:

sed 's/2004/2005/g' mydoc-old.html > mydoc-new.html

The script is: 's/2004/2005/g'. The command in the script is: s. The command's arguments are: /2004/2005/g.

You can combine successive commands using semicolons. For example, this sed script would replace 2004 with 2005 and then even with odd:

's/2004/2005/g; s/even/odd/g'

When you want sed to execute a command only on specific lines, then you can prefix the command with an address. For example, we'll address one of our commands with: /^<HEAD/,/^<BODY/ so that it will be applied only to the lines between the HEAD and BODY tags of our HTML file, inclusive.

You can nest addresses using braces ({}). For example, this sed script will execute the command only on lines between the HEAD and BODY tags (inclusive) that also have the string <DIV in them:

'/^<HEAD/,/^<BODY/{ /<DIV/{ h; d; }; }'

This last example is actually part of our sed script that moves that errant, opening DIV tag from the head section to the body section. The h command copies the line into the "hold." The d command "deletes" it from the output; it isn't output to standard output. Here is the entire sed script:

'/^<HEAD/,/^<BODY/{ /<DIV/{ h; d; }; }; /^<BODY/{ G; };'

The last bit, /^<BODY/{ G; }; is executed only on lines that begin with <BODY. The G command "gets" the data from the hold and pastes it after the addressed line--in this case, after the file's BODY tag. So, this is one way to copy and paste a line of text using sed. As you can see, this script relies on some of the input file's peculiarities. For example, the output from pdftohtml uses capital letters for its HEAD, BODY and DIV tags.

You can test this sed script on one of your output files from pdftohtml like so:

sed '/^<HEAD/,/^<BODY/{ /<DIV/{ h; d; }; }; /^<BODY/{ G; };' mydoc-1.html > mydoc-1.fixed.html

Now let's insert a navigation bar at the top of each page. I want each page to link to the previous and next HTML pages, and to the first and final HTML pages. We also want it to link to the original PDF document. To accomplish this, we will create a file called navbar.html that uses placeholders for page numbers (e.g., _FIRST_, _PREV_, _NEXT_) and other things we'll need. We'll insert navbar.html just after the BODY tag using sed's r command:

'^<BODY/ r navbar.html'

Then we'll make another pass through sed to replace the placeholders with real values. This script would simply involve several substitutions. For example, on page 5 (mydoc-5.html) we would run:

's/_FIRST_/1/g; s/_PREV_/4/g; s/_NEXT_/6/g; s/_FINAL_/216/g'

Of course, this sed script would be unique for each HTML page. How will we do that? It is time we begin thinking about how we are going to apply these sed scripts to our folder of HTML pages. Our larger script will also be responsible for tailoring this final sed script to each page. Many scripting languages are available, and I suspect you have your favorite. In the spirit of sed, I am going to use another UNIX tool: bash.

Scripting with Bash

Bash is the Bourne-Again Shell. It descends from--and effectively succeeds--the original Bourne shell. A "shell" is program that runs other programs. When you open the Terminal, you enter an interactive shell. There are many shells available. By default, the Terminal uses a shell called tcsh. You can enter an interactive bash session from the Terminal by simply running bash.

A shell script is a text file that directs a shell to run commands. Bash also supports variables (e.g., base_name=mydoc), conditionals (if ... then), looping (while ...), and much more. Bash is quite featureful. Visit the bash online reference, http://www.gnu.org/software/bash/manual/bash.html, to learn more. I hope that the bash scripts I describe here will give you an idea of how to use it for basic tasks. Before we get into that, let's talk about how you actually create a bash script.

Creating Bash Scripts

You can download our scripts from http://www.AccessPDF.com/pdf_html_clone/, or you can create them using your own plain text editor. You can use TextEdit for writing a plain text script, but you must select Format > Make Plain Text before saving it. When you save your script, add the extension .sh to the filename. When TextEdit asks whether to append the .txt extension, click "Don't append."

Bash scripts begin with this first line:

#!/bin/sh

The hash mark makes the line a comment, but by convention this is a special comment. It tells the calling program to use the program /bin/sh to run this script. This is bash's full filename. Bash goes by either /bin/bash or /bin/sh. If you were creating a Python script, the first line would be: #!/usr/bin/python.

There's one more thing you must do. You must make shell scripts "executable." You can do this from the Terminal like so:

chmod 744 pdf_html_clone.sh

Otherwise the system thinks it is just another text file.

You can learn about chmod (or bash or sed) by running man chmod or man bash or man sed from the Terminal. "Man" is short for manual. It is a handy documentation tool inherited from UNIX.

Scripting for Convenience

Our bash script will loop over all of these HTML pages so we can automatically edit them using sed. Our bash script will also perform tasks to simply make our work easier. Let's begin with something simple. Earlier, we created a subdirectory to hold all of the HTML files created by pdftohtml. Our bash script can do this for us, instead. We will call our script pdf_html_clone.sh, and it will take one PDF filename as an argument. Bash gives us access to a script's first argument with the variable $1.

When testing scripts, it is best to use disposable copies of files and directories for input. One script error could result in your input disappearing forever. In fact, this happened to me as I developed this script. So, be careful.

#!/bin/sh
# pdf_html_clone.sh
# pass your PDF as the first argument, e.g.:
#   pdf_html_clone.sh mydoc.pdf

# the input filename, $1, might be full or partial;
# get the partial name by dropping the path
pdf_fn=`basename $1`

# this is where we'll store output from pdftohtml
html_src_dir=$1-html_pages-src

# first, create $html_src_dir and then fill it with output
# from pdftohtml; if the subdirectory exists, do nothing
if [ ! -d $html_src_dir ]
then mkdir $html_src_dir
   if [ ! $? -eq 0 ] # if return code was not zero...
   then exit # error creating target directory, so exit
   fi

   echo Calling pdftohtml to Create HTML Pages from PDF...

   # copy the PDF into our new directory and
   # call pdftohtml; remove PDF copy when done
   cp $1 $html_src_dir
   cd $html_src_dir
   /opt/local/bin/pdftohtml -c -i $pdf_fn
   rm $pdf_fn
   cd -
fi

If we frequently use pdftohtml, then just this little script could make our lives a lot easier. Instead of cluttering the PDF's folder with dozens of HTML files, they get neatly packed into a sub-folder. After making this script executable:

chmod 744 pdf_html_clone.sh

then we can call it from the Terminal like so:

./pdf_html_clone.sh mydoc.pdf

This creates a subdirectory named mydoc.pdf-html_pages_src and fills it with HTML. You must invoke this script using its full pathname, which is why it begins with ./ in this example. If you file it in your home folder, then you can refer to it as ~/pdf_html_clone.sh. If you create a sub-folder in your home named bin and store your scripts in there, then you can refer to it as ~/bin/pdf_html_clone.sh.

Maybe this UNIX stuff is getting a little thick. For a breather, let's flip back over to the Mac side to talk about making a simple GUI for our script. You'll be able to simply drag-n-drop a PDF onto your finished script GUI, and leave the Terminal alone.

Convert Our Script into a GUI

From a Terminal script into a simple GUI? Brilliant! This marvelous transformation is performed by DropScript, another goodie brought to my attention by Mac OS X Hacks (O'Reilly). Download DropScript from http://www.wsanchez.net/software/, open the disk image, and copy DropScript to a convenient folder, such as Applications. Next, drag-n-drop pdf_html_clone.sh onto DropScript. It will create an application in the same folder as the original script. In our case, the application is called Droppdf_html_clone.

Now select your test PDF in the Finder, and drag-n-drop it onto Droppdf_html_clone. A sub-folder will appear next to your input PDF, containing the output from pdftohtml. It's that simple.

DropScript lets you run shell scripts without having to fool with the Terminal.

More Bash Script Highlights

Before we roll out our entire bash script, let me address a few more things you might notice.

The back-tick character (`), not to be confused with an apostrophe ('), denotes inline execution of a command. For example, at one point we want to know the base name of the input PDF without its extension. The UNIX function basename is exactly what we need. So, we assign the output from running basename to the variable base_name, like so:

base_name=`basename $1 .pdf`

If $1 (the first argument passed to the script) is /usr/mydoc.pdf, then this line sets base_name to mydoc. Learn more about basename by running man basename. You can do this for other commands you see, too, such as: ls, wc, expr, printf.

After setting a variable, you can access its value by prepending a dollar sign. From the above example, access the value of base_name by using $base_name.

Apostrophes are used to quote literal text. Backslashes are used to break single statements over multiple lines and must not be omitted. Variables usually act like strings, so we use expr to perform arithmetic or logical operations on variables.

Pipes (|) pass the output from one command into the input of the next, e.g.: ls | wc -w. The greater-than symbol (>) sends the output to a file, which is named after the symbol, e.g.: ls > dir_listing.txt.

Finally, the Bash Script

Enough already! Here are the goods. The script is pretty basic. First, it creates a directory and fills it with the output from pdftohtml (as we covered, above). Next, it creates a directory for our final HTML clone. Finally, it iterates over the HTML pages in reading order, applying a few sed scripts to each one.

The first sed script repairs the HTML error left by pdftohtml (as we discussed, above). The second one inserts our navigation bar template. The third one replaces the placeholders in navbar.html with values suitable to the current page. For example, if it is processing page five, then _NEXT_ is 6.

When the script is done, the output directory will hold your HTML clone. For example, if you ran:

./pdf_html_clone.sh mydoc.pdf

then the output will be in the subdirectory named mydoc-html_pages. You can discard the mydoc-html_pages-src subdirectory when you're done.

Visit http://www.AccessPDF.com/pdf_html_clone/ to download this code.

pdf_html_clone.sh

#!/bin/sh
# pdf_html_clone.sh, version 1.0
# visit: http://www.AccessPDF.com/pdf_html_clone/
#
# pass in PDF a filename, e.g.:
#
#   pdf_html_clone.sh mydoc.pdf
#
# and this script will create an HTML clone in
# the subdirectory named mydoc-html_pages
#
# requires pdftohtml (http://pdftohtml.sf.net);
#

# this is where we'll store output from pdftohtml
html_src_dir=$1-html_pages-src

# run pdftohtml in $html_src_dir, if it doesn't already exist
if [ ! -d $html_src_dir ]
then mkdir $html_src_dir # create target subdirectory
   if [ ! $? -eq 0 ] # if return code was not zero
   then exit # error creating target subdirectory, so exit
   fi

   # the input filename, $1, might be full or partial;
   # make it partial
   pdf_fn=`basename $1`

   echo Calling pdftohtml to Create HTML Pages from PDF...
   # copy the PDF into our new directory and
   # call pdftohtml; remove PDF copy when done
   cp $1 $html_src_dir
   cd $html_src_dir
   /opt/local/bin/pdftohtml -c -i $pdf_fn
   rm $pdf_fn # remove our copy, not the original
   cd - # change to previous working directory
fi

# create a directory for our refined html edition
html_dest_dir=$1-html_pages
if [ ! -d $html_dest_dir ]
then mkdir $html_dest_dir # create target subdirectory
   if [ ! $? -eq 0 ] # if return code was not zero
   then exit # error creating target subdirectory, so exit
   fi
fi

base_name=`basename $1 .pdf`

# iterate over pdftohtml output pages;
# the naming convetion used by pdftohtml yields HTML pages
# that are not in the correct order when alphabetized,
# so use a c-styled loop
#
num_pages=`ls $html_src_dir/$base_name-[0-9]*.html | wc -l`;
num_pages=`expr $num_pages`; # do this to drop initial whitespace

echo Copying HTML Pages and Adding Navigation Bar...
for (( ii=1; ii<= num_pages; ii++ ))
do
echo Running sed scripts on page: $ii

# this is the sed script we'll use later to replace
# navbar.html placeholders with real values
#
# global substitutions
navbar_sub_script=\
's/_DOCBASE_/'$base_name'/g;'\
's/_FIRST_/1/g;'\
's/_FINAL_/'$num_pages'/g;'\
's/_THIS_/'$ii'/g;'\
's/_0THIS_/'`printf %03d $ii`'/g;'
#
# these depend on our current page
if [ $ii -eq 1 ] # first page
then navbar_sub_script=$navbar_sub_script\
's/_PREV_/'$ii'/g;'\
's/_NEXT_/'`expr $ii + 1`'/g;'
elif [ $ii -eq $num_pages ] # final page
then navbar_sub_script=$navbar_sub_script\
's/_PREV_/'`expr $ii - 1`'/g;'\
's/_NEXT_/'$ii'/g;'
else navbar_sub_script=$navbar_sub_script\
's/_PREV_/'`expr $ii - 1`'/g;'\
's/_NEXT_/'`expr $ii + 1`'/g;'
fi
 
# make three passes through sed:
#  1. fix pdftohtml output error,
#  2. insert our navbar template right after <BODY>,
#  3. perform variable substitution
#
# note that the r command doesn't
# read the file into the pattern space where we could 
# make substitutions; that's why we have pass #3
#
sed \
'/^<HEAD/,/^<BODY/{ /<DIV/{ h; d; }; }
/^<BODY/{ G; }' \
$html_src_dir/$base_name-$ii.html | \
sed \
'# insert our navbar
/^<BODY/a\
<center>\
<div style="font-family: sans-serif; font-size: 10px">\
These HTML pages loosely resemble\
the original PDF document.<br>\
Some of these HTML pages might not resemble\
the original PDF pages at all.<br>\
Please consult the original PDF document (link below)\
for an accurate version of this page.<br>\
<br>\
</div>\
<div style="font-family: sans-serif; font-size: 12px">\
HTML Document Navigation:<br>\
<a href="_DOCBASE_-_FIRST_.html">First Page: _FIRST_</a> ||\
<a href="_DOCBASE_-_PREV_.html">Prev Page: _PREV_</a> ||\
<b><< Page: _THIS_ >></b> ||\
<a href="_DOCBASE_-_NEXT_.html">Next Page: _NEXT_</a> ||\
<a href="_DOCBASE_-_FINAL_.html">Final Page:\
_FINAL_</a><br>\
<a href="../_DOCBASE_.pdf">Source PDF Document</a>\
</div>\
<hr>\
</center>' | \
sed \
$navbar_sub_script > $html_dest_dir/$base_name-$ii.html

done

I ended up simply packing the contents of navbar.html into the script code. This makes the script more portable, and you can easily convert the script into an application using DropScript

How Else Can I Improve Online PDF?

This article covers one way to improve your PDF's online experience. It addresses the problems of users who are looking for your information via Google or some other popular search engine. This HTML clone of your PDF interfaces well with Google's page-based design. And, it gives users an easy way to skim your document before deciding to download the PDF.

You can also use our HTML clone to improve the reading experience for users visiting your web site. For example, you could adapt my pdfportal PHP script so that it links to these HTML clone pages instead of the PDF's pages. Pdfportal gives users a nice jumping-off point into your PDF. It creates a hyperlinked table of contents from your PDF's bookmarks, and it sports a full text search feature. Visit http://www.pdfhacks.com/pdfportal/ to see an online demo and download PHP code.

Sid Steward is a longtime PDF service provider and software developer. He developed the free PDF Toolkit (http://www.AccessPDF.com/pdftk/) and wrote the book PDF Hacks (O'Reilly Media). You can reach him at sid@accesspdf.com.

Software Updates via MacUpdate

Latest Forum Discussions

The Legend of Heroes: Trails of Cold Ste...

I adore game series that have connecting lore and stories, which of course means the Legend of Heroes is very dear to me, Trails lore has been building for two decades. Excitedly, the next stage is upon us as Userjoy has announced the upcoming... | Read more »

Go from lowly lizard to wicked Wyvern in...

Do you like questing, and do you like dragons? If not then boy is this not the announcement for you, as Loongcheer Game has unveiled Quest Dragon: Idle Mobile Game. Yes, it is amazing Square Enix hasn’t sued them for copyright infringement, but... | Read more »

Aether Gazer unveils Chapter 16 of its m...

After a bit of maintenance, Aether Gazer has released Chapter 16 of its main storyline, titled Night Parade of the Beasts. This big update brings a new character, a special outfit, some special limited-time events, and, of course, an engaging... | Read more »

Challenge those pesky wyverns to a dance...

After recently having you do battle against your foes by wildly flailing Hello Kitty and friends at them, GungHo Online has whipped out another surprising collaboration for Puzzle & Dragons. It is now time to beat your opponents by cha-cha... | Read more »

Pack a magnifying glass and practice you...

Somehow it has already been a year since Torchlight: Infinite launched, and XD Games is celebrating by blending in what sounds like a truly fantastic new update. Fans of Cthulhu rejoice, as Whispering Mist brings some horror elements, and tests... | Read more »

Summon your guild and prepare for war in...

Netmarble is making some pretty big moves with their latest update for Seven Knights Idle Adventure, with a bunch of interesting additions. Two new heroes enter the battle, there are events and bosses abound, and perhaps most interesting, a huge... | Read more »

Make the passage of time your plaything...

While some of us are still waiting for a chance to get our hands on Ash Prime - yes, don’t remind me I could currently buy him this month I’m barely hanging on - Digital Extremes has announced its next anticipated Prime Form for Warframe. Starting... | Read more »

If you can find it and fit through the d...

The holy trinity of amazing company names have come together, to release their equally amazing and adorable mobile game, Hamster Inn. Published by HyperBeard Games, and co-developed by Mum Not Proud and Little Sasquatch Studios, it's time to... | Read more »

Amikin Survival opens for pre-orders on...

Join me on the wonderful trip down the inspiration rabbit hole; much as Palworld seemingly “borrowed” many aspects from the hit Pokemon franchise, it is time for the heavily armed animal survival to also spawn some illegitimate children as Helio... | Read more »

PUBG Mobile teams up with global phenome...

Since launching in 2019, SpyxFamily has exploded to damn near catastrophic popularity, so it was only a matter of time before a mobile game snapped up a collaboration. Enter PUBG Mobile. Until May 12th, players will be able to collect a host of... | Read more »

Price Scanner via MacPrices.net

Apple is offering significant discounts on 16...

Apple has a full line of 16″ M3 Pro and M3 Max MacBook Pros available, Certified Refurbished, starting at $2119 and ranging up to $600 off MSRP. Each model features a new outer case, shipping is free... Read more

Apple HomePods on sale for $30-$50 off MSRP t...

Best Buy is offering a $30-$50 discount on Apple HomePods this weekend on their online store. The HomePod mini is on sale for $69.99, $30 off MSRP, while Best Buy has the full-size HomePod on sale... Read more

Limited-time sale: 13-inch M3 MacBook Airs fo...

Amazon has the base 13″ M3 MacBook Air (8GB/256GB) in stock and on sale for a limited time for $989 shipped. That’s $110 off MSRP, and it’s the lowest price we’ve seen so far for an M3-powered... Read more

13-inch M2 MacBook Airs in stock today at App...

Apple has 13″ M2 MacBook Airs available for only $849 today in their Certified Refurbished store. These are the cheapest M2-powered MacBooks for sale at Apple. Apple’s one-year warranty is included,... Read more

New today at Apple: Series 9 Watches availabl...

Apple is now offering Certified Refurbished Apple Watch Series 9 models on their online store for up to $80 off MSRP, starting at $339. Each Watch includes Apple’s standard one-year warranty, a new... Read more

The latest Apple iPhone deals from wireless c...

We’ve updated our iPhone Price Tracker with the latest carrier deals on Apple’s iPhone 15 family of smartphones as well as previous models including the iPhone 14, 13, 12, 11, and SE. Use our price... Read more

Boost Mobile will sell you an iPhone 11 for $...

Boost Mobile, an MVNO using AT&T and T-Mobile’s networks, is offering an iPhone 11 for $149.99 when purchased with their $40 Unlimited service plan (12GB of premium data). No trade-in is required... Read more

Free iPhone 15 plus Unlimited service for $60...

Boost Infinite, part of MVNO Boost Mobile using AT&T and T-Mobile’s networks, is offering a free 128GB iPhone 15 for $60 per month including their Unlimited service plan (30GB of premium data).... Read more

$300 off any new iPhone with service at Red P...

Red Pocket Mobile has new Apple iPhones on sale for $300 off MSRP when you switch and open up a new line of service. Red Pocket Mobile is a nationwide MVNO using all the major wireless carrier... Read more

Clearance 13-inch M1 MacBook Airs available a...

Apple has clearance 13″ M1 MacBook Airs, Certified Refurbished, available for $759 for 8-Core CPU/7-Core GPU/256GB models and $929 for 8-Core CPU/8-Core GPU/512GB models. Apple’s one-year warranty is... Read more

Jobs Board

DMR Technician - *Apple* /iOS Systems - Haml...

…relevant point-of-need technology self-help aids are available as appropriate. ** Apple Systems Administration** **:** Develops solutions for supporting, deploying, Read more

Operating Room Assistant - *Apple* Hill Sur...

Operating Room Assistant - Apple Hill Surgical Center - Day Location: WellSpan Health, York, PA Schedule: Full Time Sign-On Bonus Eligible Remote/Hybrid Regular Read more

Solutions Engineer - *Apple* - SHI (United...

**Job Summary** An Apple Solution Engineer's primary role is tosupport SHI customers in their efforts to select, deploy, and manage Apple operating systems and Read more

DMR Technician - *Apple* /iOS Systems - Haml...

…relevant point-of-need technology self-help aids are available as appropriate. ** Apple Systems Administration** **:** Develops solutions for supporting, deploying, Read more

Omnichannel Associate - *Apple* Blossom Mal...

Omnichannel Associate - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Apple Read more

SPREAD THE WORD:
Slashdot
Digg
Del.icio.us
Reddit
Newsvine

MacTech

Improve PDF Search by Making HTML Clones

Help ensure that search engine users find your PDF content. Then, help them navigate it.

Install and Run PdfToHtml

Editing the HTML using Sed

Scripting with Bash

More Bash Script Highlights

Finally, the Bash Script

How Else Can I Improve Online PDF?

Software Updates via MacUpdate

Latest Forum Discussions

Price Scanner via MacPrices.net

Jobs Board