TweetFollow Us on Twitter

Aug 94 Challenge
Volume Number:10
Issue Number:8
Column Tag:Programmer’s Challenge

Programmer’s Challenge

By Mike Scanlin, Mountain View, CA

Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.


When writing programmer utilities like disassemblers, disk editors and memory viewers it’s useful to have around a very fast “dump” routine that takes a bunch of bytes and displays them in hex and ascii. The MPW tool DumpFile encompasses most of the desired functionality. This month’s challenge is to write a fast version of some of the DumpFile functionality.

The prototype of the function you write is:

/* 1 */
unsigned short
DumpBytes(inputBytes, outputText,
 numInputBytes, maxOutputBytes,
 width, grouping)
unsigned short numInputBytes;
unsigned short maxOutputBytes;
unsigned short width;
unsigned short grouping;

inputBytes and outputText are the pointers to the input bytes (which you’re trying to display) and the output text (which is all printable ascii, ready to display). numInputBytes is the number of input bytes you have to work with (more than zero) and maxOutputBytes is the size of the buffer that outputText points to. The return value of the function is the actual number of output bytes created by DumpBytes and will always be less than or equal to maxOutputBytes (or zero if there’s output buffer overflow). Like the DumpFile tool, the width parameter is the number of input bytes to display on each output line (it will be from 1 to 64 with 16 being given more weight than the other values) and grouping is the number of output bytes to group together without intervening spaces (also from 1 to 64 with 1, 2 and 4 being given more wight than the other values). The width parameter will always be a multiple of the grouping parameter.

Here are a few examples (the comments describe the parameters but are not part of the actual output):

/* 2 */
/* width = 8, grouping = 1 */
 0: 23 09 53 74 61 72 74 75 #.Startu
 8: 70 20 2D 20 4D 50 57 20 p.-.MPW.
 10: 53 68 65 6C 6C 20 53 74 Shell.St

/* width = 8, grouping = 8 */
 0: 2309537461727475 #.Startu
 8: 70202D204D505720 p.-.MPW.
 10: 5368656C6C205374 Shell.St

/* width = 9, grouping = 3 */
 0: 230953 746172 747570 #.Startup
 9: 202D20 4D5057 205368 .-.MPW.Sh
 12: 656C6C 205374 617274 ell.Start

Non-printable characters should be represented by a dot ‘.’ in the ascii section of the output. You can print a space character as a space or a dot (your choice). When in doubt on how to handle a certain situation, check the MPW DumpFile tool and do what it does (or something very similar). As always, I’m available for questions in case something is not clear (see the e-mail addresses section).

You should be careful about parameters that will cause you to output more bytes than the maxOutputBytes will allow. If you run out of output buffer space then just fill it up as much as you can and return 0. I won’t be testing the output overflow cases much because the goal of this exercise it to have a very fast hex and ascii displayer. If someone were to actually use the code it is assumed that they would know the context and provide an output buffer that was always large enough (and assert that the return value was not zero).

Two Months Ago Winner

Congratulations to Bob Boonstra (Westford, MA) for reclaiming the title of the Programmer Challenge Champion this month. This month’s win brings his 1st place totals to four, which is more than anyone else. Like Bob, second place winner Allen Stenger (Gardena, CA) also based his solution on Fermat’s algorithm but ended up with an implementation that was not quite as fast as Bob’s. Third place winner Ernst Munter (Kanata, ON, Canada) chose a different route and first implemented his solution in 386 assembly (!) and then wrote some graphics routines to illustrate the behavior of his code in order to help him optimize further. But in the end he says he didn’t have enough time to do as much as he would have liked to his C version.

Here are the code sizes and times. The time1 numbers are the times to factor some 64 bit numbers while the time2 numbers are the times to factor some 32 bit numbers (where highHalf is zero), which was not given much weight when ranking (but it’s interesting to see how some people optimized for this case). Numbers in parens after a person’s name indicate how many times that person has finished in the top 5 places of all previous Programmer Challenges, not including this one:

Name time1 time2 code

Bob Boonstra (9) 5 7 820

Allen Stenger (6) 11 24 896

Ernst Munter 15 2 1190

John Raley 25 186 520

Liber, Anspach, Phillips 436 14 620

Clement Goebel (3) 1094 1 1026

Jim Lloyd (1) 3920 20 4279

Alex Novickis 18800 53 9542

Bob’s code is well commented so I won’t go over it here. Also, for a discussion of Fermat’s factoring algorithm you can check out The Art of Computer Programming, v.2, by Donald Knuth.

One thing that made this problem slightly harder than normal was that you had to work with 64 bit integers. Allen Stenger ended up creating his own set of double-long macros which I’ll give here because they might come in handy some day if you ever have to work with 64 bit integers:

/* 3 */
#define OVERFLOW(x) (0 != (0x80000000 & (x)))

#define DOCARRY(x) { x ## high++; x ## low &= 0x7FFFFFFF;}
#define DOBORROW(x) { x ## high--; x ## low &= 0x7FFFFFFF;}
#define GT_ZERO(x) ((x ## high >= 0) && (x ## low != 0)) 
#define EQ_ZERO(x) ((x ## high == 0) && (x ## low == 0)) 
#define LT_ZERO(x) ((x ## high < 0)) 

#define INCR(x,a) {if (OVERFLOW(x ## low += a)) DOCARRY(x);}
#define DECR(x,a) {if (OVERFLOW(x ## low -= a)) DOBORROW(x);}
#define PLUS_EQUALS(x, y) { \
 x ## high += y ## high;  \
 if (OVERFLOW(x ## low += y ## low))\

#define MINUS_EQUALS(x, y) { \
 x ## high -= y ## high;  \
 if (OVERFLOW(x ## low -= y ## low))\

Here’s Bob’s winning solution:

Solution strategy

Factoring is a field which has been the subject of a great deal of research because of the implications for cryptography, especially techniques that depend on the difficulty of factoring very large numbers. Therefore, it is possible that some of these algorithms could be applied to the challenge.

However, in the event that no mathematician specializing in the field chooses to enter the Challenge, this relatively simple solution takes advantage of some of the simplifying conditions in the problem statement:

1) the numbers are relatively small (64 bits, or ~<20 digits)

2) the prime factors are even smaller (32 bits, or ~<10 digits)

This solution depends on no precomputed information. It is based on Fermat's algorithm, described in Knuth Vol II, which is especially well suited to the problem because it is most efficient when the two p, [sorry, the rest of the sentence was missing - Ed stb]

Fermat's algorithm requires ~(p-1)sqrt(n) iterations, where n=u*v and u~=p*sqrt(n), v~=sqrt(n)/p. Other algorithms require half as many iterations, but require more calculation per iteration.

Fermat's algorithm works as follows:

1) Let n - u*v, u and v odd primes.

2) Set a = (u+v)/2 and b = (u-v)/2.

3) Then n = uv = a**2 - b**2

4) Initialize a = trunc(sqrt(n)), b=0, r=a**2-b**2-n

5) Iterate looking for r==0, with an inner loop that keeps a=(u+v)/2 constant and increases b=(u-v)/2 by 1 each iteration until r becomes negative. When this happens, the halfsum a is increased by 1, and the difference loop is repeated.

The algorithm in Knuth uses auxiliary variables x,y for efficiency, where x = 2*a+1 and y = 2*b+1

This works fine in most cases, but causes overflow of a longword when x,y are the full 32-bits in size. So we have augmented the algorithm to deal with this case.

This solution also uses an efficient integer sqrt algorithm due to Ron Hunsinger, and extends that algorithm to 64 bits.

/* 4 */
#pragma options(assign_registers,honor_register)

#define ulong unsigned long
#define ushort unsigned short

#define kLo16Bits 0xFFFF
#define kHiBit 0x80000000UL
#define kLo2Bits 3
#define kLo1Bit 1

Macros RightShift2 and RightShift1 shift a 64-bit value right by 2 and 
1 bits, respectively.
#define RightShift32By2(xL,xH)                            \
{                                                         \
  xL >>= 2;                                               \
  xL |= (xH & kLo2Bits)<<30;                              \
  xH >>= 2;                                               \

#define RightShift32By1(xL,xH)                            \
{                                                         \
  xL >>= 1;                                               \
  xL |= (xH & kLo1Bit)<<31;                               \
  xH >>= 1;                                               \

Macros Add32To64 (Sub32From64) add (subtract) a 32-bit value to (from) 
a 64-bit value.
#define Add32To64(rL,rH, a)                               \
  temp = rL;                                              \
  if ((rL += a) < temp) ++rH;

#define Add2NPlus1To64(lowR,highR,a)                      \
  Add32To64(lowR,highR,a);                                \
  Add32To64(lowR,highR,a);                                \

#define Sub32From64(rL,rH, s)                             \
  temp = rL;                                              \
  if ((rL -= s) > temp) --rH;

#define Sub2NPlus1From64(lowR,highR,s)                    \
  Sub32From64(lowR,highR,s);                              \
  Sub32From64(lowR,highR,s);                              \

//Macros Add64 (Sub64) add (subtract) two 64-bit values.
#define Add64(qL,qH, eL,eH)                               \
  Add32To64(qL,qH,eL);                                    \
  qH += eH;

#define Sub64(qL,qH, eL,eH)                               \
  Sub32From64(qL,qH, eL);                                 \
  qH -= eH;

Macro Square64 multiplies a 32-bit value by itself to produce the square 
as a 64-bit value.  For this solution, we only need to execute this macro 
expansion once.
#define Square64(a,rL,rH,temp)                            \
{                                                         \
  ulong lohi,lolo,hihi;                                   \
  ushort aHi,aLo;                                         \
  aHi = a>>16;                                            \
  aLo = a;                                                \
  rL = (lolo = (ulong)aLo*aLo)&kLo16Bits;                 \
  lohi = (ulong)aLo*aHi;                                  \
  temp = ((lohi&kLo16Bits)<<1) + (lolo>>16);              \
  rL |= temp<<16;                                         \
  temp>>=16;                                              \
  temp += ((hihi = (ulong)aHi*aHi)&kLo16Bits) +           \
                             (lohi>>(16-1));              \
  rH = temp&kLo16Bits;                                    \
  temp>>=16;                                              \
  temp += hihi>>16;                                       \
  rH |= temp<<16;                                         \

Macros LessEqualZero64 and EqualZero64 determine if 64-bit (signed) values 
are <= 0 or == 0, respectively.
#define LessEqualZero64(vL,vH)                            \
    ( (0>(long)vH) || ((0==vH) && (0==vL)) )

#define EqualZero64(vL,vH)                                \
     ((0==vL) && (0==vH))

//Macro LessEqual64 determines if one 64-bit quantity is less than or 
equal to another.
#define LessEqual64(uL,uH, vL,vH)                         \
    ( (uH< vH) || ((uH==vH) && (uL<=vL)))

//Function prototypes.
ulong sqrt64 (ulong nLo,ulong nHi);
void Factor64(ulong lowHalf,ulong highHalf,
              ulong *prime1Ptr,ulong *prime2Ptr);
The solution ...
void Factor64(lowHalf,highHalf,prime1Ptr,prime2Ptr)
unsigned long lowHalf,highHalf;
unsigned long *prime1Ptr,*prime2Ptr;
register ulong x,y,lowR,highR;
register ulong temp;
ulong sqrtN;

Fermat's algorithm (Knuth)

Assume n=u*v, u<v, n odd, u,v odd
Let a=(u+v)/2  b=(u-v)/2  n=a**2-b**2  0<=y<x<=n
Search for a,b that satisfy x**2-y**2-n==0

NOTE:  u,v given as being < 2**32 (fit in one word).  Therefore a,b also 
are < 2**32 (and fit in one word).

C1: Set x=2*floor(srt(n))+1,
     x corresponds to 2a+1, y to 2b+1, r to a**2-b**2-n
C2: if r<=0 goto C4
       (algorithm modified to keep r positive)
C3: r=r-y, y=y+2,
     goto C2
C4: if (r==0) terminate with n = p*q,
         p=(x-y)/2, q=(x+y-2)/2
C5: r=r+x, x=x+2,
     goto C3

This solution modifies the algorithm to:
(1) reorder arithmetic on r to keep it positive
(2) extend r to 64 bits when necessary
(3) handle the trivial case where one of the primes is 2.
//Handle the trivial case with an even prime factor.
  if (0 == (lowHalf&1)) {
    *prime1Ptr = 2;
    *prime2Ptr = (lowHalf>>1) | ((highHalf&kLo1Bit)<<31);
//Compute truncated square root of input 64-bit number.
  sqrtN = temp = sqrt64(lowHalf,highHalf);
//Initialize r to s*s - n, but calculate n-s*s to keep r positive, and 
fix later when it is time
//to add x to r by calculating r = x - (n-s*s).
  Sub64(lowHalf,highHalf, lowR,highR);
//Handle perfect square case.
  if ((0==highHalf) && (0==lowHalf)) {
    *prime1Ptr = *prime2Ptr = sqrtN;
  y = 1;
  highR = 0;
//Separate out the overflow case where x=2a+1 does not fit into a long 
  if ((temp=sqrtN) >= kHiBit-1) goto doLargeX;
//If sqrt(n) < 0x80000000, then 2*sqrt(n)+1 fits in one long word.  
//Also, n-trunc(sqrt(n))**2 < 2*trunc(sqrt(n)) also fits in a long word.
  x = 1+2*temp;
  lowR = x - lowHalf;
  x += 2;
  do {
    if (lowR<=y) break;
    lowR -= y;
    y += 2;
  } while (true); /* exit when r<=y */
  if (y==lowR) {
    *prime1Ptr = (x-y-2)>>1;
    *prime2Ptr = (x+y)>>1;
  lowR += (x-y);
  y += 2;
//Fall through to modified algorithm if x overflows a long word.
  if ((x += 2) < (ulong)0xFFFFFFFF-2) goto C2;
Adjust x and y to guarantee they will not overflow.  This requires some 
extra arithmetic to add 2*a+1 and subtract 2*b+1, but that is preferable 
to using two longs to represent each of x and y.
  goto C3L;

//x=2*a+1 no longer fits in 32 bits, so we sacrifice a little loop efficiency 
and let x=a. //Likewise, we let y=b instead of 2*b+1.
  lowR = x = temp;
  Sub64(lowR,highR, lowHalf,highHalf);
  do {
    if ( LessEqualZero64(lowR,highR) ) break;
  } while (true); /* exit when lowR,highR<=0 */
  if ( EqualZero64(lowR,highR)) {
    *prime1Ptr = x-y;
    *prime2Ptr = x+y;
  goto C3L;

//sqrt_max4pow is the largest power of 4 that can be represented in an 
unsigned long.
#define sqrt_max4pow (1UL << 30)
//undef sqrt_truncate if rounded sqrts are desired; for the factoring 
problem we want
//truncated sqrts.
#define sqrt_truncate

//sqrt64 is based on code posted by Ron Hunsinger to comp.sys.mac.programmer. 
//Modified to handle 64-bit values.
ulong sqrt64 (ulong lowN, ulong highN) {
Compute the integer square root of the integer argument n.  Method is 
to divide n by x computing the quotient x and remainder r.  Notice that 
the divisor x is changing as the quotient x changes.
 Instead of shifting the dividend/remainder left, we shift the quotient/divisor 
right.  The binary point starts at the extreme left, and shifts two bits 
at a time to the extreme right.
 The residue contains n-x**2.  Since (x + 1/2)**2 == x**2 + x + 1/4, 

n - (x + 1/2)**2 == (n - x**2) - (x + 1/4)
 Thus, we can increase x by 1/2 if we decrease (n-x**2) by (x+1/4)
  register ulong lowResidue,highResidue; /* n - x**2 */
  register ulong lowRoot,highRoot;       /* x + 1/4 */
  register ulong half;                   /* 1/2     */
  ulong highhalf,lowhalf,temp;

  lowResidue = lowN;
  if (0 != (highResidue = highN)) {
//This code extends the original algorithm from 32 bits to 64 bits. 
// It parallels the 32-bit code; see below for comments.
    highRoot = sqrt_max4pow; lowRoot = 0;
    while (highRoot>highResidue)
    Sub64(lowResidue,highResidue, lowRoot,highRoot);
//The binary point for half is now in the high order of two 32-bit words 

//representing the 64-bit value.
    lowhalf = lowRoot; highhalf = highRoot;
    Add64(lowRoot,highRoot, lowhalf,highhalf);
    if (0==highhalf) goto sqrt2;
    half = highhalf<<1;
    do {
      if (LessEqual64(lowRoot,highRoot,lowResidue,highResidue))
        highResidue -= highRoot;
        highRoot += half;
      if (0 == (half>>=2)) {
        half = sqrt_max4pow<<1;
        goto sqrt2a;
      highRoot -= half;
      highRoot >>= 1;
    } while (true);
//The binary point for half is now in the lower of the two 32-bit words 

//representing the 64-bit value.
    half = lowhalf<<1;
    do {
      if ((0==highResidue) && (0==highRoot)) goto sqrt3;
      if (LessEqual64(lowRoot,highRoot,lowResidue,
                            highResidue)) {
        Sub64(lowResidue,highResidue, lowRoot,highRoot);
      half >>= 2;
    } while (half);
  } else /* if (0 == highResidue) */ {
#ifndef sqrt_truncate
    if (lowResidue <= 12)
      return (0x03FFEA94 >> (lowResidue *= 2)) & 3;
    if (lowResidue <= 15)
      return (0xFFFEAA54 >> (lowResidue *= 2)) & 3;
    lowRoot = sqrt_max4pow;
    while (lowRoot>lowResidue) lowRoot>>=2;

//Decrease (n-x**2) by (0+1/4)
    lowResidue -= lowRoot;
//1/4, with binary point shifted right 2
    half = lowRoot >> 2;
//x=1.  (lowRoot is now (x=1)+1/4.)
    lowRoot += half;
//1/2, properly aligned
    half <<= 1;

//Normal loop (there is at least one iteration remaining)
    do {
      if (lowRoot <= lowResidue) {
// Whenever we can, decrease (n-x**2) by (x+1/4)
        lowResidue -= lowRoot;
        lowRoot += half;
//Shift binary point 2 places right
      half >>= 2;
//x{+1/2}+1/4 - 1/8 == x{+1/2}+1/8
      lowRoot -= half;
//2x{+1}+1/4, shifted right 2 places
      lowRoot >>= 1;
//When 1/2 == 0, bin point is at far right
    } while (half);
#ifndef sqrt_truncate
  if (lowRoot < lowResidue) ++lowRoot;

//Return value guaranteed to be correctly rounded (or truncated)
    return lowRoot;


Community Search:
MacTech Search:

Software Updates via MacUpdate

Latest Forum Discussions

See All

How to evolve Eevee in Pokemon GO
By now, almost everyone should be hip to how to evolve Pokemon in Pokemon GO (and if not, there's a guide for that). Just gather enough candy of the appropriate type, feed them all to the Pokemon, and evolution happens. It's a miracle that would... | Read more »
CSR Racing 2: Guide to all game modes
It might not seem like there are all that many ways to go fast in a straight line, but CSR Racing 2 begs to differ. [Read more] | Read more »
Bulb Boy (Games)
Bulb Boy 1.0 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0 (iTunes) Description: Multi-award winning 2D point & click horror adventure about a boy with a glowing head. | Read more »
5 top free emoji keyboard apps
If we're not at peak emoji yet as a society, it feels like we definitely should be. The emoji concept has gone far beyond what anyone in Japan could have envisioned when the people there unleashed it on an unsuspecting world, but the West has... | Read more »
How to unlock more characters in Disney...
One of the big charms of Disney Emoji Blitz is seeing a wide variety of beloved Disney and Pixar characters transformed into smiling emojis. Even someone like the sneaky Randall from Monsters Inc., who probably never cracked a smile on film, is... | Read more »
Cubway (Games)
Cubway 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: Cubway is a journey with an abstract story of lifecycle of rebirth, called Samsara. Guide the cube through the long way full of dangers... | Read more »
Colorcube (Games)
Colorcube 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: Turn pieces and blend colours in this minimal yet visually stunning puzzler.Over 200 handcrafted and challenging levels. Features... | Read more »
Doodle God Griddlers (Games)
Doodle God Griddlers 1.0 Device: iOS Universal Category: Games Price: $4.99, Version: 1.0 (iTunes) Description: | Read more »
Crusader Kings: Chronicles (Games)
Crusader Kings: Chronicles 1.0 Device: iOS Universal Category: Games Price: $4.99, Version: 1.0 (iTunes) Description: Crusader Kings: Chronicles is an interactive text based game that puts you in the shoes of Guy de Rose as you make... | Read more »
Roads of Rome: New Generation (Games)
Roads of Rome: New Generation 1.0 Device: iOS Universal Category: Games Price: $5.99, Version: 1.0 (iTunes) Description: | Read more »

Price Scanner via

Apple price trackers, updated continuously
Scan our Apple Price Trackers for the latest information on sales, bundles, and availability on systems from Apple’s authorized internet/catalog resellers. We update the trackers continuously: - 15″... Read more
13-inch 2.5GHz MacBook Pro (Apple refurbished...
Apple has Certified Refurbished 13″ 2.5GHz MacBook Pros available for $829, or $270 off the cost of new models. Apple’s one-year warranty is standard, and shipping is free: - 13″ 2.5GHz MacBook Pros... Read more
21-inch iMacs on sale for up to $120 off MSRP
B&H Photo has 21″ iMacs on sale for up to $120 off MSRP including free shipping plus NY sales tax only: - 21″ 3.1GHz iMac 4K: $1379 $120 off MSRP - 21″ 2.8GHz iMac: $1199.99 $100 off MSRP - 21″ 1... Read more
Charitybuzz Set to Auction Unique Apple-1 Com...
Offering an opportunity to own the computer that sparked a revolution, on Monday, July 25, leading online charity auction platform Charitybuzz will auction what is claimed to be the world’s most... Read more
MacBook Airs on sale for up to $150 off MSRP
Amazon has 11″ and 13″ MacBook Airs on sale for up to $150 off MSRP for a limited time. Shipping is free: - 13″ 1.6GHz/128GB MacBook Air (sku MMGF2LL/A): $899.99 $100 off MSRP - 13″ 1.6GHz/256GB... Read more
Apple refurbished 13-inch Retina MacBook Pros...
Apple has Certified Refurbished 13″ Retina MacBook Pros available for up to $270 off the cost of new models. An Apple one-year warranty is included with each model, and shipping is free: - 13″ 2.7GHz... Read more
Apple refurbished 11-inch MacBook Airs availa...
Apple has Certified Refurbished 11″ MacBook Airs (the latest models), available for up to $170 off the cost of new models. An Apple one-year warranty is included with each MacBook, and shipping is... Read more
Apple iPad Pro Sales Far Outpacing Microsoft...
A report on Appleinsider notes that despite Microsoft Surface tablet PC sales growing by 9 percent year over year, revenues remained below $1 billion, and are down sequentially from the $1.1 billion... Read more
DEVONthink 2.9 Features Ultra-fast, Robust, A...
DEVONthink 2.9 allows users to keep databases synchronized using many means of transport. It transmits them between Macs on the local network or stores them in a syncable form on removable hard... Read more
12-inch WiFi Apple iPad Pros on sale for up t...
B&H Photo has 12″ WiFi iPad Pros on sale for up to $100 off MSRP, each including free shipping. B&H charges sales tax in NY only: - 12″ Space Gray 32GB WiFi iPad Pro: $749 $50 off MSRP - 12″... Read more

Jobs Board

*Apple* Solutions Consultant - APPLE (United...
Job Summary As an Apple Solutions Consultant, you'll be the link between our future customers and our products. You'll showcase your entrepreneurial spirit as you Read more
*Apple* Professional Learning Specialist - A...
Job Summary The Apple Professional Learning Specialist is a full-time position for one year with Apple in the Phoenix, AZ area. This position requires a high Read more
*Apple* Picker - Apple Hill Orchard (United...
Apple Hill Orchard, Co. Rte. 21,Whitehall, NY 9/7/16-10/228/16. Pick fresh market or processing apples Productivity of 60 boxes and 80 boxes processing fruit per Read more
*Apple* Solutions Consultant - APPLE (United...
Job Summary As an Apple Solutions Consultant, you'll be the link between our future customers and our products. You'll showcase your entrepreneurial spirit as you Read more
*Apple* Retail - Multiple Positions - Apple,...
Job Description:SalesSpecialist - Retail Customer Service and SalesTransform Apple Store visitors into loyal Apple customers. When customers enter the store, Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.