
Aug 94 Challenge
Volume Number:10
Issue Number:8
Column Tag:Programmer’s Challenge

Programmer’s Challenge

By Mike Scanlin, Mountain View, CA

Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.


When writing programmer utilities such as disassemblers, disk editors and memory viewers, it’s useful to have a very fast “dump” routine that takes a run of bytes and displays them in hex and ascii. The MPW tool DumpFile provides most of the desired functionality. This month’s challenge is to write a fast version of a subset of the DumpFile functionality.

The prototype of the function you write is:

/* 1 */
unsigned short
DumpBytes(inputBytes, outputText,
 numInputBytes, maxOutputBytes,
 width, grouping)
unsigned short numInputBytes;
unsigned short maxOutputBytes;
unsigned short width;
unsigned short grouping;

inputBytes and outputText are the pointers to the input bytes (which you’re trying to display) and the output text (which is all printable ascii, ready to display). numInputBytes is the number of input bytes you have to work with (more than zero) and maxOutputBytes is the size of the buffer that outputText points to. The return value of the function is the actual number of output bytes created by DumpBytes and will always be less than or equal to maxOutputBytes (or zero if the output buffer overflows). As in the DumpFile tool, the width parameter is the number of input bytes to display on each output line (it will be from 1 to 64, with 16 being given more weight than the other values) and grouping is the number of bytes to group together without intervening spaces (also from 1 to 64, with 1, 2 and 4 being given more weight than the other values). The width parameter will always be a multiple of the grouping parameter.

Here are a few examples (the comments describe the parameters but are not part of the actual output):

/* 2 */
/* width = 8, grouping = 1 */
 0: 23 09 53 74 61 72 74 75 #.Startu
 8: 70 20 2D 20 4D 50 57 20 p.-.MPW.
 10: 53 68 65 6C 6C 20 53 74 Shell.St

/* width = 8, grouping = 8 */
 0: 2309537461727475 #.Startu
 8: 70202D204D505720 p.-.MPW.
 10: 5368656C6C205374 Shell.St

/* width = 9, grouping = 3 */
 0: 230953 746172 747570 #.Startup
 9: 202D20 4D5057 205368 .-.MPW.Sh
 12: 656C6C 205374 617274 ell.Start

Non-printable characters should be represented by a dot ‘.’ in the ascii section of the output. You can print a space character as a space or a dot (your choice). When in doubt on how to handle a certain situation, check the MPW DumpFile tool and do what it does (or something very similar). As always, I’m available for questions in case something is not clear (see the e-mail addresses section).

You should be careful about parameters that would cause you to output more bytes than maxOutputBytes allows. If you run out of output buffer space, fill the buffer as far as you can and return 0. I won’t be testing the output overflow cases much because the goal of this exercise is to have a very fast hex and ascii displayer. Anyone who actually uses the code is assumed to know the context and to provide an output buffer that is always large enough (and to assert that the return value is not zero).
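
To make the line layout concrete, here is a minimal sketch that formats a single output line. It is not a contest entry and is not tuned for speed. The helper name dump_line and its parameters are hypothetical, and the layout (hex offset, a colon, space-separated hex groups, then the ascii column with spaces shown as dots) is an assumption read off the examples above, with any column padding omitted. Unlike the rule above, it writes nothing at all when the line does not fit.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the required DumpBytes signature.
 * Formats `count` bytes that start at file offset `offset` as
 *   <hex offset>: <hex groups> <ascii>
 * Returns the number of characters written (not counting the NUL),
 * or 0 if the line would not fit in `cap` bytes. */
static size_t dump_line(const unsigned char *in, size_t offset, size_t count,
                        size_t grouping, char *out, size_t cap)
{
    static const char hex[] = "0123456789ABCDEF";
    char head[24];
    size_t head_len, groups, need, pos, i;

    if (count == 0 || grouping == 0) return 0;
    head_len = (size_t)snprintf(head, sizeof head, "%lX: ",
                                (unsigned long)offset);
    groups = (count + grouping - 1) / grouping;
    /* offset prefix + two hex digits per byte + one space per group
       + one ascii character per byte */
    need = head_len + 2 * count + groups + count;
    if (need + 1 > cap) return 0;

    memcpy(out, head, head_len);
    pos = head_len;
    for (i = 0; i < count; i++) {
        out[pos++] = hex[in[i] >> 4];
        out[pos++] = hex[in[i] & 0x0F];
        /* a space closes every group; the last one also separates
           the hex column from the ascii column */
        if ((i + 1) % grouping == 0 || i + 1 == count)
            out[pos++] = ' ';
    }
    for (i = 0; i < count; i++)
        out[pos++] = (in[i] > 0x20 && in[i] < 0x7F) ? (char)in[i] : '.';
    out[pos] = '\0';
    return pos;
}
```

A full DumpBytes would call something like this once per width bytes, advancing the offset and appending a line terminator each time. A fast version would more likely replace the per-byte work with table lookups.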

Two Months Ago Winner

Congratulations to Bob Boonstra (Westford, MA) for reclaiming the title of the Programmer Challenge Champion this month. This month’s win brings his 1st place totals to four, which is more than anyone else. Like Bob, second place winner Allen Stenger (Gardena, CA) also based his solution on Fermat’s algorithm but ended up with an implementation that was not quite as fast as Bob’s. Third place winner Ernst Munter (Kanata, ON, Canada) chose a different route and first implemented his solution in 386 assembly (!) and then wrote some graphics routines to illustrate the behavior of his code in order to help him optimize further. But in the end he says he didn’t have enough time to do as much as he would have liked to his C version.

Here are the code sizes and times. The time1 numbers are the times to factor some 64 bit numbers while the time2 numbers are the times to factor some 32 bit numbers (where highHalf is zero), which was not given much weight when ranking (but it’s interesting to see how some people optimized for this case). Numbers in parens after a person’s name indicate how many times that person has finished in the top 5 places of all previous Programmer Challenges, not including this one:

Name                       time1   time2   code
Bob Boonstra (9)               5       7    820
Allen Stenger (6)             11      24    896
Ernst Munter                  15       2   1190
John Raley                    25     186    520
Liber, Anspach, Phillips     436      14    620
Clement Goebel (3)          1094       1   1026
Jim Lloyd (1)               3920      20   4279
Alex Novickis              18800      53   9542

Bob’s code is well commented so I won’t go over it here. Also, for a discussion of Fermat’s factoring algorithm you can check out The Art of Computer Programming, v.2, by Donald Knuth.

One thing that made this problem slightly harder than normal was that you had to work with 64 bit integers. Allen Stenger ended up creating his own set of double-long macros which I’ll give here because they might come in handy some day if you ever have to work with 64 bit integers:

/* 3 */
#define OVERFLOW(x) (0 != (0x80000000 & (x)))

#define DOCARRY(x) { x ## high++; x ## low &= 0x7FFFFFFF;}
#define DOBORROW(x) { x ## high--; x ## low &= 0x7FFFFFFF;}
#define GT_ZERO(x) ((x ## high >= 0) && (x ## low != 0)) 
#define EQ_ZERO(x) ((x ## high == 0) && (x ## low == 0)) 
#define LT_ZERO(x) ((x ## high < 0)) 

#define INCR(x,a) {if (OVERFLOW(x ## low += a)) DOCARRY(x);}
#define DECR(x,a) {if (OVERFLOW(x ## low -= a)) DOBORROW(x);}
#define PLUS_EQUALS(x, y) { \
 x ## high += y ## high;  \
 if (OVERFLOW(x ## low += y ## low))\
  DOCARRY(x); \
}

#define MINUS_EQUALS(x, y) { \
 x ## high -= y ## high;  \
 if (OVERFLOW(x ## low -= y ## low))\
  DOBORROW(x); \
}
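
These macros rely on token pasting: a double-long called n is really two variables, nhigh and nlow. The low word holds only 31 bits, and bit 31 serves as the carry or borrow flag. The sketch below repeats the INCR and DECR macros so that it stands alone and walks one value across the 31-bit boundary and back. The fixed-width types, pair_value and demo_carry_then_borrow are assumptions made for illustration; they are not part of Allen's code, which would have used plain longs.

```c
#include <stdint.h>

#define OVERFLOW(x) (0 != (0x80000000 & (x)))
#define DOCARRY(x)  { x ## high++; x ## low &= 0x7FFFFFFF; }
#define DOBORROW(x) { x ## high--; x ## low &= 0x7FFFFFFF; }
#define INCR(x,a)   { if (OVERFLOW(x ## low += a)) DOCARRY(x); }
#define DECR(x,a)   { if (OVERFLOW(x ## low -= a)) DOBORROW(x); }

/* Hypothetical helper: the pair represents high * 2^31 + low,
 * with low always kept below 2^31. */
static int64_t pair_value(int32_t high, uint32_t low)
{
    return (int64_t)high * 2147483648LL + (int64_t)low;
}

/* Start at 2^31 - 1, add 1 (forces a carry), subtract 2 (forces a borrow). */
static int64_t demo_carry_then_borrow(void)
{
    int32_t  nhigh = 0;
    uint32_t nlow  = 0x7FFFFFFFu;

    INCR(n, 1);   /* nlow reaches bit 31: carry, so nhigh = 1, nlow = 0 */
    DECR(n, 2);   /* nlow wraps below zero: borrow, so nhigh = 0,
                     nlow = 0x7FFFFFFE */
    return pair_value(nhigh, nlow);
}
```

The scheme only works while each increment or decrement is itself smaller than 2^31, since a larger step could wrap the low word without setting bit 31.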

Here’s Bob’s winning solution:

Solution strategy

Factoring is a field which has been the subject of a great deal of research because of the implications for cryptography, especially techniques that depend on the difficulty of factoring very large numbers. Therefore, it is possible that some of these algorithms could be applied to the challenge.

However, in the event that no mathematician specializing in the field chooses to enter the Challenge, this relatively simple solution takes advantage of some of the simplifying conditions in the problem statement:

1) the numbers are relatively small (64 bits, or ~<20 digits)

2) the prime factors are even smaller (32 bits, or ~<10 digits)

This solution depends on no precomputed information. It is based on Fermat's algorithm, described in Knuth Vol II, which is especially well suited to the problem because it is most efficient when the two p, [sorry, the rest of the sentence was missing - Ed stb]

Fermat's algorithm requires ~(p-1)sqrt(n) iterations, where n=u*v and u~=p*sqrt(n), v~=sqrt(n)/p. Other algorithms require half as many iterations, but require more calculation per iteration.

Fermat's algorithm works as follows:

1) Let n = u*v, u and v odd primes.

2) Set a = (u+v)/2 and b = (u-v)/2.

3) Then n = uv = a**2 - b**2

4) Initialize a = trunc(sqrt(n)), b=0, r=a**2-b**2-n

5) Iterate looking for r==0, with an inner loop that keeps a=(u+v)/2 constant and increases b=(u-v)/2 by 1 each iteration until r becomes negative. When this happens, the halfsum a is increased by 1, and the difference loop is repeated.

The algorithm in Knuth uses auxiliary variables x,y for efficiency, where x = 2*a+1 and y = 2*b+1
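
As a reference point, here is a minimal sketch of Knuth's Algorithm C (steps C1 to C5) using native 64-bit integers rather than the two-word arithmetic of the winning entry. It assumes n is odd and smaller than 2^62 so that every intermediate fits in an int64_t. The names fermat_factor and isqrt_newton are hypothetical, and the Newton square root is only a convenient stand-in, not the routine used in the solution.

```c
#include <assert.h>
#include <stdint.h>

/* Truncated integer square root by Newton's iteration. */
static uint64_t isqrt_newton(uint64_t n)
{
    uint64_t x = n, y = (x + 1) / 2;
    while (y < x) {
        x = y;
        y = (x + n / x) / 2;
    }
    return x;
}

/* Algorithm C for odd n < 2^62 (assumed). Stores factors with *p <= *q.
 * Invariant: r == a*a - b*b - n, where x == 2a+1 and y == 2b+1. */
static void fermat_factor(uint64_t n, uint64_t *p, uint64_t *q)
{
    uint64_t s = isqrt_newton(n);
    int64_t x = (int64_t)(2 * s + 1);            /* C1: a = s          */
    int64_t y = 1;                               /*     b = 0          */
    int64_t r = (int64_t)(s * s) - (int64_t)n;   /*     r = s*s - n    */

    assert(n % 2 == 1);
    for (;;) {
        if (r <= 0) {                /* C2 */
            if (r == 0) break;       /* C4: found a*a - b*b == n */
            r += x;  x += 2;         /* C5: a = a + 1 */
        }
        r -= y;  y += 2;             /* C3: b = b + 1 */
    }
    *p = (uint64_t)(x - y) / 2;      /* a - b */
    *q = (uint64_t)(x + y - 2) / 2;  /* a + b */
}
```

For n = 8051 the loop stops with x = 181 and y = 15, giving the factors 83 and 97. Using x and y means each step costs one addition and one subtraction, with no multiplication inside the loop.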

This works fine in most cases, but causes overflow of a longword when x,y are the full 32-bits in size. So we have augmented the algorithm to deal with this case.
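
The extended arithmetic keeps a 64-bit quantity in two 32-bit words and detects carries and borrows by watching for unsigned wraparound, which is the idea behind the Add32To64 and Sub32From64 macros in the listing. A minimal sketch of that idiom follows. The pair64 type and the function names are hypothetical; the listing itself uses separate low and high variables and a shared temp.

```c
#include <stdint.h>

/* A 64-bit unsigned value held as two 32-bit halves
 * (hypothetical type, for illustration only). */
typedef struct { uint32_t lo, hi; } pair64;

/* Add a 32-bit value. If the low word wrapped, it is now smaller
 * than it was before the addition, so carry into the high word. */
static pair64 add32(pair64 r, uint32_t a)
{
    uint32_t before = r.lo;
    r.lo += a;
    if (r.lo < before) r.hi++;
    return r;
}

/* Subtract a 32-bit value. If the low word wrapped, it is now larger
 * than it was before, so borrow from the high word. */
static pair64 sub32(pair64 r, uint32_t s)
{
    uint32_t before = r.lo;
    r.lo -= s;
    if (r.lo > before) r.hi--;
    return r;
}

/* Reassemble the halves so results can be checked against a native type. */
static uint64_t to_u64(pair64 r)
{
    return ((uint64_t)r.hi << 32) | r.lo;
}
```

Adding 1 to the pair (lo = 0xFFFFFFFF, hi = 0) wraps the low word to 0 and bumps the high word to 1, which is 2^32 as expected.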

This solution also uses an efficient integer sqrt algorithm due to Ron Hunsinger, and extends that algorithm to 64 bits.
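
For comparison, here is a common shift-and-subtract integer square root written for a native 64-bit type. It belongs to the same family of methods (work down through powers of 4, subtracting whenever the residue allows), but it is a generic sketch and not Hunsinger's code. The name isqrt64 is hypothetical.

```c
#include <stdint.h>

/* Truncated square root of a 64-bit unsigned value. */
static uint64_t isqrt64(uint64_t n)
{
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;   /* largest power of 4 in 64 bits */

    /* Start at the largest power of 4 that does not exceed n. */
    while (bit > n) bit >>= 2;

    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;            /* this bit belongs in the root */
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;                      /* move to the next bit pair */
    }
    return root;
}
```

Each pass settles one bit of the result, so a 64-bit input needs at most 32 passes and no division or multiplication.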

/* 4 */
#pragma options(assign_registers,honor_register)

#define ulong unsigned long
#define ushort unsigned short

#define kLo16Bits 0xFFFF
#define kHiBit 0x80000000UL
#define kLo2Bits 3
#define kLo1Bit 1

//Macros RightShift32By2 and RightShift32By1 shift a 64-bit value right by 2 and 
//1 bits, respectively.
#define RightShift32By2(xL,xH)                            \
{                                                         \
  xL >>= 2;                                               \
  xL |= (xH & kLo2Bits)<<30;                              \
  xH >>= 2;                                               \
}

#define RightShift32By1(xL,xH)                            \
{                                                         \
  xL >>= 1;                                               \
  xL |= (xH & kLo1Bit)<<31;                               \
  xH >>= 1;                                               \
}

//Macros Add32To64 (Sub32From64) add (subtract) a 32-bit value to (from) 
//a 64-bit value.
#define Add32To64(rL,rH, a)                               \
  temp = rL;                                              \
  if ((rL += a) < temp) ++rH;

#define Add2NPlus1To64(lowR,highR,a)                      \
  Add32To64(lowR,highR,a);                                \
  Add32To64(lowR,highR,a);                                \
  Add32To64(lowR,highR,1);

#define Sub32From64(rL,rH, s)                             \
  temp = rL;                                              \
  if ((rL -= s) > temp) --rH;

#define Sub2NPlus1From64(lowR,highR,s)                    \
  Sub32From64(lowR,highR,s);                              \
  Sub32From64(lowR,highR,s);                              \
  Sub32From64(lowR,highR,1);

//Macros Add64 (Sub64) add (subtract) two 64-bit values.
#define Add64(qL,qH, eL,eH)                               \
  Add32To64(qL,qH,eL);                                    \
  qH += eH;

#define Sub64(qL,qH, eL,eH)                               \
  Sub32From64(qL,qH, eL);                                 \
  qH -= eH;

//Macro Square64 multiplies a 32-bit value by itself to produce the square 
//as a 64-bit value.  For this solution, we only need to execute this macro 
//expansion once.
#define Square64(a,rL,rH,temp)                            \
{                                                         \
  ulong lohi,lolo,hihi;                                   \
  ushort aHi,aLo;                                         \
  aHi = a>>16;                                            \
  aLo = a;                                                \
  rL = (lolo = (ulong)aLo*aLo)&kLo16Bits;                 \
  lohi = (ulong)aLo*aHi;                                  \
  temp = ((lohi&kLo16Bits)<<1) + (lolo>>16);              \
  rL |= temp<<16;                                         \
  temp>>=16;                                              \
  temp += ((hihi = (ulong)aHi*aHi)&kLo16Bits) +           \
                             (lohi>>(16-1));              \
  rH = temp&kLo16Bits;                                    \
  temp>>=16;                                              \
  temp += hihi>>16;                                       \
  rH |= temp<<16;                                         \
}

//Macros LessEqualZero64 and EqualZero64 determine if 64-bit (signed) values 
//are <= 0 or == 0, respectively.
#define LessEqualZero64(vL,vH)                            \
    ( (0>(long)vH) || ((0==vH) && (0==vL)) )

#define EqualZero64(vL,vH)                                \
     ((0==vL) && (0==vH))

//Macro LessEqual64 determines if one 64-bit quantity is less than or 
//equal to another.
#define LessEqual64(uL,uH, vL,vH)                         \
    ( (uH< vH) || ((uH==vH) && (uL<=vL)))

//Function prototypes.
ulong sqrt64 (ulong nLo,ulong nHi);
void Factor64(ulong lowHalf,ulong highHalf,
              ulong *prime1Ptr,ulong *prime2Ptr);
//The solution ...
void Factor64(lowHalf,highHalf,prime1Ptr,prime2Ptr)
unsigned long lowHalf,highHalf;
unsigned long *prime1Ptr,*prime2Ptr;
register ulong x,y,lowR,highR;
register ulong temp;
ulong sqrtN;

/*
Fermat's algorithm (Knuth)

Assume n=u*v, u<v, n odd, u,v odd
Let a=(u+v)/2  b=(v-u)/2  n=a**2-b**2  0<=b<a<=n
Search for a,b that satisfy a**2-b**2-n==0

NOTE:  u,v given as being < 2**32 (fit in one word).  Therefore a,b also 
are < 2**32 (and fit in one word).

C1: Set x=2*floor(sqrt(n))+1, y=1, r=floor(sqrt(n))**2-n
     x corresponds to 2a+1, y to 2b+1, r to a**2-b**2-n
C2: if r<=0 goto C4
       (algorithm modified to keep r positive)
C3: r=r-y, y=y+2,
     goto C2
C4: if (r==0) terminate with n = p*q,
         p=(x-y)/2, q=(x+y-2)/2
C5: r=r+x, x=x+2,
     goto C3

This solution modifies the algorithm to:
(1) reorder arithmetic on r to keep it positive
(2) extend r to 64 bits when necessary
(3) handle the trivial case where one of the primes is 2.
*/
//Handle the trivial case with an even prime factor.
  if (0 == (lowHalf&1)) {
    *prime1Ptr = 2;
    *prime2Ptr = (lowHalf>>1) | ((highHalf&kLo1Bit)<<31);
//Compute truncated square root of input 64-bit number.
  sqrtN = temp = sqrt64(lowHalf,highHalf);
//Initialize r to s*s - n, but calculate n-s*s to keep r positive, and 
//fix later when it is time to add x to r by calculating r = x - (n-s*s).
  Sub64(lowHalf,highHalf, lowR,highR);
//Handle perfect square case.
  if ((0==highHalf) && (0==lowHalf)) {
    *prime1Ptr = *prime2Ptr = sqrtN;
  y = 1;
  highR = 0;
//Separate out the overflow case where x=2a+1 does not fit into a long 
  if ((temp=sqrtN) >= kHiBit-1) goto doLargeX;
//If sqrt(n) < 0x80000000, then 2*sqrt(n)+1 fits in one long word.  
//Also, n-trunc(sqrt(n))**2 < 2*trunc(sqrt(n)) also fits in a long word.
  x = 1+2*temp;
  lowR = x - lowHalf;
  x += 2;
  do {
    if (lowR<=y) break;
    lowR -= y;
    y += 2;
  } while (true); /* exit when r<=y */
  if (y==lowR) {
    *prime1Ptr = (x-y-2)>>1;
    *prime2Ptr = (x+y)>>1;
  lowR += (x-y);
  y += 2;
//Fall through to modified algorithm if x overflows a long word.
  if ((x += 2) < (ulong)0xFFFFFFFF-2) goto C2;
//Adjust x and y to guarantee they will not overflow.  This requires some 
//extra arithmetic to add 2*a+1 and subtract 2*b+1, but that is preferable 
//to using two longs to represent each of x and y.
  goto C3L;

//x=2*a+1 no longer fits in 32 bits, so we sacrifice a little loop efficiency 
//and let x=a. Likewise, we let y=b instead of 2*b+1.
  lowR = x = temp;
  Sub64(lowR,highR, lowHalf,highHalf);
  do {
    if ( LessEqualZero64(lowR,highR) ) break;
  } while (true); /* exit when lowR,highR<=0 */
  if ( EqualZero64(lowR,highR)) {
    *prime1Ptr = x-y;
    *prime2Ptr = x+y;
  goto C3L;

//sqrt_max4pow is the largest power of 4 that can be represented in an 
//unsigned long.
#define sqrt_max4pow (1UL << 30)
//undef sqrt_truncate if rounded sqrts are desired; for the factoring 
//problem we want truncated sqrts.
#define sqrt_truncate

//sqrt64 is based on code posted by Ron Hunsinger to comp.sys.mac.programmer. 
//Modified to handle 64-bit values.
ulong sqrt64 (ulong lowN, ulong highN) {
/*
Compute the integer square root of the integer argument n.  Method is 
to divide n by x computing the quotient x and remainder r.  Notice that 
the divisor x is changing as the quotient x changes.
 Instead of shifting the dividend/remainder left, we shift the quotient/divisor 
right.  The binary point starts at the extreme left, and shifts two bits 
at a time to the extreme right.
 The residue contains n-x**2.  Since (x + 1/2)**2 == x**2 + x + 1/4, 
   n - (x + 1/2)**2 == (n - x**2) - (x + 1/4)
 Thus, we can increase x by 1/2 if we decrease (n-x**2) by (x+1/4)
*/
  register ulong lowResidue,highResidue; /* n - x**2 */
  register ulong lowRoot,highRoot;       /* x + 1/4 */
  register ulong half;                   /* 1/2     */
  ulong highhalf,lowhalf,temp;

  lowResidue = lowN;
  if (0 != (highResidue = highN)) {
//This code extends the original algorithm from 32 bits to 64 bits. 
// It parallels the 32-bit code; see below for comments.
    highRoot = sqrt_max4pow; lowRoot = 0;
    while (highRoot>highResidue)
    Sub64(lowResidue,highResidue, lowRoot,highRoot);
//The binary point for half is now in the high order of two 32-bit words 
//representing the 64-bit value.
    lowhalf = lowRoot; highhalf = highRoot;
    Add64(lowRoot,highRoot, lowhalf,highhalf);
    if (0==highhalf) goto sqrt2;
    half = highhalf<<1;
    do {
      if (LessEqual64(lowRoot,highRoot,lowResidue,highResidue))
        highResidue -= highRoot;
        highRoot += half;
      if (0 == (half>>=2)) {
        half = sqrt_max4pow<<1;
        goto sqrt2a;
      highRoot -= half;
      highRoot >>= 1;
    } while (true);
//The binary point for half is now in the lower of the two 32-bit words 
//representing the 64-bit value.
    half = lowhalf<<1;
    do {
      if ((0==highResidue) && (0==highRoot)) goto sqrt3;
      if (LessEqual64(lowRoot,highRoot,lowResidue,
                            highResidue)) {
        Sub64(lowResidue,highResidue, lowRoot,highRoot);
      half >>= 2;
    } while (half);
  } else /* if (0 == highResidue) */ {
#ifndef sqrt_truncate
    if (lowResidue <= 12)
      return (0x03FFEA94 >> (lowResidue *= 2)) & 3;
#else
    if (lowResidue <= 15)
      return (0xFFFEAA54 >> (lowResidue *= 2)) & 3;
#endif
    lowRoot = sqrt_max4pow;
    while (lowRoot>lowResidue) lowRoot>>=2;

//Decrease (n-x**2) by (0+1/4)
    lowResidue -= lowRoot;
//1/4, with binary point shifted right 2
    half = lowRoot >> 2;
//x=1.  (lowRoot is now (x=1)+1/4.)
    lowRoot += half;
//1/2, properly aligned
    half <<= 1;

//Normal loop (there is at least one iteration remaining)
    do {
      if (lowRoot <= lowResidue) {
// Whenever we can, decrease (n-x**2) by (x+1/4)
        lowResidue -= lowRoot;
        lowRoot += half;
//Shift binary point 2 places right
      half >>= 2;
//x{+1/2}+1/4 - 1/8 == x{+1/2}+1/8
      lowRoot -= half;
//2x{+1}+1/4, shifted right 2 places
      lowRoot >>= 1;
//When 1/2 == 0, bin point is at far right
    } while (half);
#ifndef sqrt_truncate
  if (lowRoot < lowResidue) ++lowRoot;
#endif

//Return value guaranteed to be correctly rounded (or truncated)
    return lowRoot;


All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.