TweetFollow Us on Twitter

Aug 00 Challenge Volume Number: 16 (2000)
Issue Number: 8
Column Tag: Programmer's Challenge

Programmer's Challenge

By Bob Boonstra, Westford, MA

Longest Word Sort

This month's problem puts a twist on a well-known computing problem, the common text sort. Your Challenge is to sort a block of text into ascending or descending order. Yawn.... Except that the comparison function used to do the sort is a little unusual....

The prototype for the code you should write is:

void LongestWordSort(
   char *textToSort,            /* the text to be sorted */
   long numChars,               /* the length of the text in bytes */
   Boolean descending,         /* sort in descending order if true */
   Boolean caseSensitive      /* sort is case sensitive if true */
);

The textToSort provided to your LongestWordSort routine consists of a sequence of lines of text, each terminated by a return character (0x0D). You are to sort the lines of text in place into either ascending or descending order, based on the value of the descending flag, considering the text as either case sensitive or not case sensitive, based on the caseSensitive flag. The twist is that the primary sort criterion is the length of the words in the line, regardless of the position of the word in the line.

Words are a sequence of alphanumeric characters, separated by anything else (spaces, punctuation, etc.). A line with a longest word of N characters is considered to be "greater" than any line with a longest word of fewer characters. Among lines with longest words of the same length, the sort order is based on the length of the next-longest word, and so on. Finally, among lines with words of exactly the same length, the sort order is based on text order of the individual words, compared in order of length, and then in order of occurrence.

As an example, the following two lines have the same length pattern, with each containing two 5-letter words and one 3-letter word:

            the quick foxes
            early birds win

Sorted in ascending order, the second line is smaller, based on a comparison of "early" and "quick".

The text may include non-alphanumeric characters, which should ge ignored for purposes of comparison.

This will be a native PowerPC Challenge, using the CodeWarrior Pro 5 environment. Solutions may be coded in C, C++, or Pascal. This month, we'll also allow solutions that are completely or partially coded in assembly language. The winner will be the solution that correctly sorts a number of blocks of text in the least amount of execution time.

This problem was suggested by Ernst Munter, who earns two Challenge points for the suggestion.

Three Months Ago Winner

The May BigNum Math Challenge required contestants to write a collection of numerical routines for very large integers, numbers with hundreds of digits or more, well beyond what might fit in the memory normally allocated for an integer. The problem required design of an internal representation for these big numbers, and creation of routines to add, subtract, multiply, and divide them. It also required code to raise one such number to the power of another, and code to calculate the square root of such numbers. And it required routines to convert the internal representation back and forth from a decimal representation. Congratulations to Ernst Munter (Kanata, Ontario) for taking first place in the BigNum Math Challenge.

Ernst chose to represent his BigNums as a sequence of unsigned 16-bit terms, to ensure that intermediate results fit into a 32-bit long word. The multiplication algorithm is inspired by an algorithm from Knuth, optimized to maximize the use of registers. The exponentiation tests were the most computationally intensive; Ernst's solution scans the bits of the exponent, squaring the intermediate result with each step, and optionally multiplying by the base. For conversion back to decimal, Ernst saves some time by converting to an intermediate representation in base 10000. These and other functions are well commented in the published code.

The second place solution by Mark Day was actually faster than the winning solution in many of the test cases. However, Mark's solution was significantly slower in the exponentiation test case, putting it in second place overall.

The table below lists, for each of the solutions submitted, and the cumulative execution time in milliseconds. It also provides the code size, data size, and programming language used for each entry. As usual, the number in parentheses after the entrant's name is the total number of Challenge points earned in all Challenges prior to this one.

NameTime (secs)Code SizeData SizeLang
Ernst Munter (607)1.9549012452C++
Mark Day (20)4.2223800162C
Tom Saxton (158)65.111700196C++
Willeke Rieken (88)98.6313044280C++
Steve Lobosso749.06416082813C++
Jonny Taylor*162163868C

Top Contestants

Listed here are the Top Contestants for the Programmer's Challenge, including everyone who has accumulated 10 or more points during the past two years. The numbers below include points awarded over the 24 most recent contests, including points earned by this month's entrants.

Rank Name Points
1. Munter, Ernst 245
2. Saxton, Tom 146
3. Maurer, Sebastian 78
4. Boring, Randy 52
5. Shearer, Rob 47
6. Rieken, Willeke 45
7. Heathcock, JG 23
8. Taylor, Jonathan 26
9. Brown, Pat 20
10. Downs, Andrew 12
11. Jones, Dennis 12
12. Day, Mark 10
13. Duga, Brady 10
14. Fazekas, Miklos 10
15. Hewett, Kevin 10
16. Murphy, ACC 10
17. Selengut, Jared 10
18. Strout, Joe 10

There are three ways to earn points: (1) scoring in the top 5 of any Challenge, (2) being the first person to find a bug in a published winning solution or, (3) being the first person to suggest a Challenge that I use. The points you can win are:

1st place 20 points
2nd place 10 points
3rd place 7 points
4th place 4 points
5th place 2 points
finding bug 2 points
suggesting Challenge 2 points

Here is Ernst's winning BigNum Math solution:

Bignum.cp
Copyright © 2000
Ernst Munter

/*
               "BigNum Math"

  Version 3.1

  Task
  --
  Implement a set of arithmetic function for multi-digit numbers.
  The design of the big number data structure is unspecified.
  
  Data Structure
  -------
  BigNum contains a length field (number of decimal digits required to represent the
  number) and an opaque (void) pointer to a data structure.  I have defined a simple 
  structure for the internal representation of big numbers:
  struct MyNum {
      unsigned short term[]; is the binary representation of the absolute value, 
         stored least significant first in unsigned 16-bit chunks.
         The size of term[] is allocated as required by the magnitude of the big number.
      unsigned long numTermsAllocated; records the number of terms allocated.
         a sign bit is hidden as the MSB of numTermsAllocated.
      long lastTerm; records the index of the highest term actually in use. 
  };

  Two special values are defined for BigNum:
     - the numeric value of 0 is represented by setting the bigNumData pointer to 0,
     - an invalid result (e.g. divide by 0) is represented by BigNum.lengthInDigits = -1.
     
  Program Structure
  ---------
  The externally available C functions deal directly with most special cases, and then
  call a function in a layer which allocates MyNum objects and returns a BigNum.
  
  The allocation layer invokes the appropriate arithmetic method on the MyNum 
  object to obtain the result which is passed back to the allocation layer, for conversion 
  into a BigNum. 
     
  Conversion Algorithms
  -----------
  Conversions between the decimal digit representation and the binary representation 
  of MyNum are done in two stages.  The intermediate quasi-decimal representation, 
  uses MyNum formats, but with terms on a base of 10,000 (instead of 65,536).  The 
  routines with the fragment "10k" in their names participate in the conversion 
  process.
  
  Conversion from decimal to 10k representation is fairly simple: each four decimal 
  digits  make up one 10k term.  Conversion from 10k to binary is based on the 
  expansion
        ...(((x[n]*10k + x[n-1])*10k + x[n-2])*10k ....  + x[0], 
        evaluated in normal binary arithmetic.
     
  From binary we convert back to the 10k representation in a similar way:
        ...(((y[n]*64K + y[n-1])*64K + y[n-2])*64K ....  + y[0], 
        but evaluated with base 10k arithmetic.
        
  The splitting of each 10k term into 4 decimal digits is then trivial.
  
  Binary Arithmetic Algorithms
  --------------
  These are commented in the program.  
  The choice of 16-bit instead of 32-bit terms was primarily made to have a simple
  way of dealing with carries and with partial products.
  
  Floating point arithmetic is used in the square root function to obtain a quick good 
  first estimate from the first few bytes of the input value.
  
  Limitations
  ------
  The Power function is limited to providing results within the memory limitations of 
  a practical machine.  The smallest possible base (2) to the power of the largest 
  unsigned long exponent (about 4 billion) would result in a number of 4 billion bits 
  (about 1.3 billion digits).  Since this alone is beyond the storage capacity (RAM) of 
  the target machines, I have limited the power function to exponents with at most 32 
  bits (binary).
   
  Memory for temporary values is allocated using new[].  In the default C++ compiler 
  mode, an exception will be thrown if memory cannot be allocated, and the program 
  will just quit.
  
  If you expect to run up to your memory limits, and want to recover from failed new[] 
  calls, the program should be modified to either use new with the no_throw option 
  and test for null, or employ try {} catch() {} blocks.  
  
  Version 2 Changes
  ---------
  A better initial estimate for square roots is obtained by exploiting the full
  power of sqrt(double) to operate on up to 52 integer bits (instead of just 48).
  
  A faster multiplier is implemented which grows at less than the square law with
  increasing number size.  It is based on the observation that 2 two-digit 
  numbers can be multiplied with just 3 digit multiplications instead of four.
  Used recursively, this turns out to be much faster overall, for large numbers even 
  though temporary memory must be allocated and initialized frequently, and some 
  adds and subtracts are needed to combine the terms.
  
  Version 3 Changes
  ---------
  The fast multiply function was moved to a separate file "BigMult.cp".  The final
  recursion step is fanned out to a serious of completely unrolled matrix multipli-
  cation routines.  This technique alone improved speed again by almost a factor of 3.
  
  Analysis of the disassembled output led to a few small improvements for v.3.1.
  Specifically, the use of the struct member "lastTerm" as a loop control resulted
  in unnecessary reloads from memory, even though the parameter is declared const.  
  This inefficiency is avoided by making a local copy of the value, which in turn
  leads to the unrolling of the loop by the compiler.
    
*/ 
#define DBG 0

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#if DBG==0
#define NDEBUG
#endif
#include <assert.h>
#include "bignum.h"
#include "BigMult.h"
 
typedef unsigned char uchar;
typedef unsigned short ushort;
typedef unsigned long ulong;

typedef uchar* ucharPtr;

// HI and LO are used frequently to separate 32-bit values into 16-bit halves
#define HI(chunk) ((chunk)>>16)
#define LO(chunk) ((chunk)&0xFFFF)

enum {
   kAddMode         = 0,
   kSubMode         = 1,
   kDivideMode         = 0,
   kModuloMode         = 1,
   kPos            = 0,
   kNeg            = 0x80000000,   // sign bit constant
   kBase             = 0x10000UL,
   k10k            = 10000,
   kDefaultNumTerms   = 8   // controls minimum size of allocated MyNum objects 
};

// constant for a fairly accurate estimate of decimal digit lengths
const double kDigitsPerBit=log10(2.0);

Binary
static ulong Binary(uchar* digits,int num)
// converts up to 4 decimal digits to binary
// digits are assumed to be binary digits (0-9), NOT digit characters ('0'-'9')
{
   assert((num>=1) && (num<=4));
   ulong v=*digits;
   switch (num)
   {
case 4:v = v * 10 + *++digits;
case 3:v = v * 10 + *++digits;
case 2:v = v * 10 + *++digits;
   }
   return v;
}

// Forward declarations of static functions only as needed
class MyNum;
typedef MyNum* MyNumPtr; 
static MyNumPtr MultiplyMyNum(const MyNum & A,const MyNum & B,int sign);

/******************************************************************
/
/                        Class MyNum
/
/       The class MyNum represents the real data underlying BigNum.bigNumData
/      Most computation and arithmetic work, is done in methods of this class.
/
/******************************************************************/
class MyNum {
private:
    ulong   numTermsAllocated;      // hides sign bit in MSB of this member
public:
    long   lastTerm;            // should be a signed type, initialized to -1
    ushort   term[kDefaultNumTerms];   // we store binary terms LSB first
    
// const functions return computed values without changing MyNum:
    ulong NumTermsAllocated() const {return numTermsAllocated & ~0x80000000;}
    ulong SignBit() const {return numTermsAllocated & 0x80000000;}
    ulong IsNegative() const {return numTermsAllocated>>31;}
    ulong MSB() const {return term[lastTerm];}
    bool  IsSmall() const {return (0==(lastTerm & ~1));}
         // lastTerm == 0 or 1
    ulong Ulong() const { 
         // returns the value of a small MyNum as a 32-bit ulong
       ulong x=term[0];
       if (lastTerm) return x+(term[1] << 16);
       return x;
    }
    ulong IsOdd() const {return term[0] & 1;}    
    ulong Log2() const   
    {
      ulong lastTermValue=term[lastTerm];
      ulong numBits=lastTerm * 16;
      while (lastTermValue) 
      {
         numBits++;
         lastTermValue>>=1;
      }
      return numBits;
    }
    
    void  SetNumTermsAllocated(ulong numTerms,ulong sign)
       {   numTermsAllocated=numTerms | (kNeg & sign);}
       
    void ClearMemory(){memset(term,0,sizeof(ushort)*NumTermsAllocated());}
    
    void Init(uchar* digits,int numDigits)
// Initializes MyNum from the decimal digits
    {
       int numHighDigits=numDigits % 4;
       uchar* endDigits=digits+numDigits;
       if (numHighDigits)   // Convert first few (<4) digits
       {
           term[0]=Binary(digits,numHighDigits);
          digits+=numHighDigits;
       } else term[0]=0;
       lastTerm=0;
       while (digits<endDigits) // Convert remaining digits in groups of 4
       {
          ulong fourDigits=Binary(digits,4);
          MultiplyAddBy10k(fourDigits);
          digits+=4;
       }
    }
    
    void MultiplyAddBy10k(ulong chunk)
// Multiplies the current number (base 64K) by 10k and adds chunk to it
    {
       long carry=chunk,i=0,lt=lastTerm;
       for (;i<=lt;i++)
       {
          ulong L=(term[i])*k10k + carry;
          carry=HI(L);
          term[i]=L;
       }
       if (carry)
       {
          lastTerm=i;
          term[i]=carry;
       }
       assert(0== HI(carry));
    }
    
    void BinaryTo10k(const MyNum & A)
// Converts MyNum from 64K (i.e. binary) base to 10k (i.e. pseudo decimal) base
    {
       const ushort* lastAterm=A.term+A.lastTerm;
       ulong L=*lastAterm,carry=L/k10k;
       term[0]=L%k10k;
       int j=0;
       while (carry)
       {
          L=carry%k10k;
          carry/=k10k;
          term[++j]=L;
       } 
       lastTerm=j;
       
       while (lastAterm>A.term)
          MultiplyAddMod10k(*-lastAterm);
    }
    
    void MultiplyAddMod10k(ulong chunk)
// Used in BinaryTo10k.  Multiplies MyNum by 64K, using 10k base arithmetic. 
    {
       long carry=chunk,i=0,lt=lastTerm;
       for (;i<=lt;i++)
       {
          ulong L=(term[i]<<16) + carry;
          carry=L/k10k;
          term[i]=L%k10k;
       }
       while (carry)
       {
          lt=i;
          term[i++]=carry%k10k;
          carry /= k10k;
       }
       lastTerm=lt;
    }
    
    long k10ToDigits(uchar* digits) const
    {
// Converts 10k based terms to simple decimal digits.
// Note: terms are stored LSB first, digits are stored MSB first
       ulong D=term[lastTerm];
       int i=
          (D>=1000)?4:(D>=100)?3:(D>=10)?2:1;
       switch (i)
       {
 case 4:   digits[3]=D%10;D/=10;
 case 3:   digits[2]=D%10;D/=10;
 case 2:   digits[1]=D%10;D/=10;
 case 1:   digits[0]=D;
       }  
       
       int lt=lastTerm;
       for (int j=lt-1;j>=0;j-,i+=4)
       {
          D=term[j];
         digits[i+3]=D%10;D/=10;
         digits[i+2]=D%10;D/=10;
         digits[i+1]=D%10;D/=10;
         digits[i]=D;
       }
       return i;
    }
    
    int CompAbsolute(const MyNum & A) const
// Compares the absolute values of MyNum and A
// Assumes both numbers are normalized (no leading zeros)
    {
       assert(term[lastTerm]);
       assert(A.term[A.lastTerm]);
       int lt=lastTerm;
       if (lt > A.lastTerm) return 1;
       if (lt < A.lastTerm) return -1;
       do {
          if (term[lt] > A.term[lt]) return 1;
          if (term[lt] < A.term[lt]) return -1;
      } while (-lt >= 0);
//       for (int i=lt;i>=0;i-)
//       {
//          if (term[i] > A.term[i]) return 1;
//          if (term[i] < A.term[i]) return -1;
//       }
       return 0;
    }
    
    void Sum(const MyNum & A,const MyNum & B)
// MyNum = A + B (absolute values)
// Assumes A has at least as many terms as B
// MyNum has enough terms allocated to handle the result
    {
       assert(A.lastTerm>=B.lastTerm);
       ulong L=A.term[0]+B.term[0],carry=HI(L);
       term[0]=L;
       
       int i=1,blt=B.lastTerm;
       for (;i<=blt;i++)
       {
          L=A.term[i]+B.term[i]+carry;
          carry=HI(L);
          term[i]=L;
       }
       
       int lt=A.lastTerm;
       for (;i<=lt;i++)
       {
          L=A.term[i]+carry;
          carry=HI(L);
          term[i]=L;
       }
       
       if (carry)
       {
          assert(i < NumTermsAllocated());
          lt=i;
          term[i]=carry;   
       }
       lastTerm=lt;
    }
    
    void Difference(const MyNum & A,const MyNum & B)
// MyNum = A - B (absolute values)
// Assumes A has at least as many terms as B, and A >= B
// MyNum has enough terms allocated to handle A
    {
       assert(A.lastTerm>=B.lastTerm);
       
       long L=A.term[0]-B.term[0],borrow=HI(L);
       term[0]=L;
       int i=1,blt=B.lastTerm;
       for (;i<=blt;i++)
       {
          L=A.term[i]-B.term[i]+borrow;
          borrow=HI(L);
          term[i]=L;
       }
       
       int lt=A.lastTerm;
       for (;i<=lt;i++)
       {
          L=A.term[i]+borrow;
          borrow=HI(L);
          term[i]=L;
       }
       // remove leading zeros
       while ((lt>0) && (term[lt]==0)) lt-;
       lastTerm=lt;
    }
    
    void SmallProduct(const MyNum & B,const MyNum & A,ulong hiTerm)
// MyNum = A * B (absolute values)
// MyNum has enough terms allocated to handle A*B
    {
       assert(NumTermsAllocated()>=1+hiTerm);
       memset(term,0,(1+hiTerm)*sizeof(ushort));
       int alt=A.lastTerm,blt=B.lastTerm;
       for (int i=0;i<=alt;i++)
       {
          ulong a=A.term[i];
          for (int j=0;j<=blt;j++)
          {
             int k=i+j;
             ulong x=term[k]+a*B.term[j];            
             term[k]=x;
             ulong carry=HI(x);
             while (carry)
             {
                assert(k+1<NumTermsAllocated());
                x=term[++k]+carry;
                term[k]=x;
                carry=HI(x);
             }
          }
       }
       // remove leading zeros
       while (hiTerm && (0==term[hiTerm])) hiTerm-;
       lastTerm=hiTerm;                        
    }
    
    void Product(const MyNum & A,const MyNum & B);
// MyNum = A * B (absolute values)

    void SmallQuotient(MyNum & R,const ulong divisor)
// Divides R by the small divisor, a single ulong.
// MyNum = R / divisor (absolute values).
// MyNum will be the quotient.  R will contain the remainder.
    {
       int ir=R.lastTerm;
       ulong upperDigit=0;
       for (int iq=ir;iq>=0;iq-)
       {
          upperDigit<<=16;
          ulong x=R.term[iq] + upperDigit;
          if (upperDigit) 
          {
             R.term[iq+1]=0;
             R.lastTerm=iq;
          }
          if (x < divisor)
          {
             term[iq]=0;
             upperDigit=x;
          } else
          {
             ulong y=x/divisor;
             term[iq]=y;
             if (lastTerm<iq)
                lastTerm=iq;
             R.term[iq]=upperDigit=x - y*divisor;
          }
       }
    }
    
    void Quotient(MyNum & U,const MyNum & V);
    
    void DeNormalize(int shift)
// Reverse shift operation for the remainder after (big)Quotient division 
    {
       int i=0,lt=lastTerm;
       for (;i<lt;i++)
       {
          ulong v=((term[i+1]<<16) + term[i]) >> shift;
          term[i]=v;
       }
       term[i] >>= shift;
    }
    
    void Power(const MyNum & A,const ulong e)
// MyNum = A ** e, all values must be positive
// The power is evaluated by scanning the exponent, MSB first.
// For each except the first bit, the current value of MyNum is squared,
// if the bit is a 1, the current value of MyNum is then multiplied by A.
// Multiplication requires a distinct product destination, so a temporary MyNum T 
// is allocated.  Each multiplication uses either MyNum or T as the source, and 
// the other as the destination.
    {
       assert(numTermsAllocated>A.lastTerm);
       assert(e != 0);
       
       // copy A to this
       lastTerm=A.lastTerm;
       memcpy(term,A.term,(1+lastTerm)*sizeof(ushort));       
       if (e==1) return;
       
       // find MSB of the exponent
       ulong mask=0x80000000;
       while ((e & mask) == 0) mask >>= 1;
       
       mask >>= 1;
       
       // make a second MyNum to hold intermediate products
       int allocatedSize =   (numTermsAllocated-kDefaultNumTerms) * 
            sizeof(ushort) + sizeof(MyNum);
       MyNumPtr T=MyNumPtr(new uchar[allocatedSize]);
       T->numTermsAllocated=numTermsAllocated;
       
       // we'll multiply by ping-ponging between MyNum and *T 
       // but the result should end up in MyNum
       
      MyNumPtr temp[2] = {this,T};
      
      int k=0;
       while (mask)   // for each additional bit in e
       {
          // always square
          temp[1-k]->Product(*temp[k],*temp[k]);   k=1-k;
          if (e & mask)
          {
             // multiply by A if bit is set
             temp[1-k]->Product(*temp[k],A);      k=1-k;
          }
          mask >>= 1;
       }
       
       if (k==1)   // result ended up in *T: must copy to *this
       {
          lastTerm=T->lastTerm;
          memcpy(term,T->term,(1+lastTerm)*sizeof(ushort));
       }
       delete [] T;
    }
    
    void SquareRootEstimate(const MyNum & A)
// Returns the square root of the three or four most significant terms of MyNum
    {
       assert(A.lastTerm>=2);
       int lt=lastTerm=A.lastTerm/2;
       double top3Digits = A.Ulong()*65536.0 + A.term[A.lastTerm-2];
       
       if (A.lastTerm==2)   // just use the three digits
       {
          ulong root=sqrt(top3Digits);
          // the top 3 digits of A are converted to a 24 bit root
          // the result is loaded into the top 2 digits of MyNum
          
          if (A.lastTerm & 1) // even number of digits in A
          {
             term[lt]   = root >> 8;
             term[lt-1]= root << 8;
          } else
          {
             term[lt]   = HI(root);
             term[lt-1]= root;
          }
       } else   // use top 51 or 52 bits to get the most from sqrt(double)
       {      // resulting in a 26-bit estimate
         ulong lastTermValue=A.MSB();
         ulong shift=0;
         while (lastTermValue>0xF)    // get minimum even size of MSB word
         {                     // keep at least 4 bits
            shift+=2;
            lastTermValue>>=2;
         }
         ulong factor = 1<<(16-shift);
          double top52Bits=
             top3Digits * factor + (A.term[A.lastTerm-3]>>shift);
             
          ulong root=sqrt(top52Bits);
          // the top 4 digits of A (minus shifted bits) are converted to a 26 bit root
          // the result is loaded into the top 2 or 3 digits of MyNum
          shift /= 2;
          if (A.lastTerm & 1) // even number of digits in A
          {
             term[lt]   = root >> (16-shift);
             term[lt-1]= root << shift;
          } else
          {
             term[lt]  = root >> (24-shift);
             term[lt-1]= root >> (8-shift);
             term[lt-2]= root << (shift+8);
          }
       }
    }
    
    ulong Average(MyNum & A)
// MyNum becomes the average of MyNum and A.
// Returns true if the original MyNum and the new average are different.
    {
       assert(lastTerm==A.lastTerm);
       ulong carry=0;
       ulong compare=0;   // detect and track changes in compare
       for (int i=lastTerm;i>=0;i-)
       {
          ulong oldTerm=term[i];
          ulong x = (carry + oldTerm + A.term[i]);
          carry = (1 & x) << 16;
          x >>= 1;
          term[i]=x;
          compare |= oldTerm ^ x;
       }                              
       return compare & 0xFFFF;
    }
    
   long EstimateLength() const
// The length in decimal digits is estimated from the number of bits used.
// The result should always be within 1 digit of the true count.
   {
      double numDigits=Log2()*kDigitsPerBit;
      return 1+long(numDigits);
   }
   
    void Normalize()
// Removes leading zeros
    {
       int k=lastTerm;
       while ((k>0) && (term[k]==0)) k-;
       lastTerm=k;   
    }
    
    BigNum Convert2BigNum();
// Returns the BigNum representation of MyNum
};

// The following MyNum functions are not inlined (reduced size, no loss of speed):

BigNum MyNum::Convert2BigNum() 
// Returns the BigNum representation of MyNum
    {
       Normalize();
       BigNum result;
       result.bigNumData=this;
       result.lengthInDigits=EstimateLength();
       return result; 
    }
       
void MyNum::Product(const MyNum & A,const MyNum & B)
// MyNum = A * B (absolute values)
// MyNum has enough terms allocated to handle A*B
    {
       ulong alt=A.lastTerm,blt=B.lastTerm,hiTerm=alt+blt;
       assert(NumTermsAllocated()>=2+hiTerm);
       assert(alt >= blt);
       ulong aSize=alt+1,bSize=blt+1;
      if (bSize >= kMultiplyThreshold)
      {
// Big numbers use fast multiply which requires N to be a small 
// integer (<=15) times a power of two.  RoundUp ensures this.   
          ulong N=A.lastTerm+1;
          N=RoundUp(N);
// Allocate memory for:
//      factor  U   N terms    (copy and pad A to U)
//      factor  V   N terms  (copy and pad B to V)
//      product P   2N terms (compute P = U*V and copy result back to MyNum)
//      scratch      4N terms needed for intermediate products 
          ushort* U=new ushort[8*N];                  
          ushort* V=U + N;
          ushort* P=V + N;
          memcpy(U,A.term,aSize*sizeof(ushort));
          memset(U+aSize,0,(N-aSize)*sizeof(ushort));
          memcpy(V,B.term,bSize*sizeof(ushort));
          memset(V+bSize,0,(N-bSize)*sizeof(ushort));
          
          Multiply(U,V,P,N,bSize);
          
         // remove leading zeros from P
          N += N-1;                     
          while (N && (0==P[N])) N-;   
          
          assert(N<NumTermsAllocated());
          memcpy(term,P,(1+N)*sizeof(ushort));
          lastTerm=N;
          delete [] U;
       } else 
// The smallest factor is not so large, we use a simple multiplication loop       
       SmallProduct(A,B,1+hiTerm);
    }
    
void MyNum::Quotient(MyNum & U,const MyNum & V)
// Divides U by the MyNum divisor V.
// MyNum will contain the quotient,  U the remainder.
// The implementation closely follows Donald Knuth's algorithm 'D' (Vol.2, page 272)
    {
       // U and V already normalized by caller, i.e. V.MSB >= kBase/2
       int n=1+V.lastTerm,   // size of the divisor
          m=U.lastTerm-n,   // msb index of the quotient 
          k=U.lastTerm,   // msb dividend index
          j=m;         // qotient index
          
       assert(n>=2);      // for smaller divisors, should use SmallQuotient()
       
      ulong divisorMSB = V.term[V.lastTerm];
      for (;j>=0;j-,k-)
      {
         ulong x= ulong(U.term[k])*kBase + U.term[k-1];               
         ulong q= x / divisorMSB;
         ulong r= x - q * divisorMSB;
         if ((q==kBase) || (q*V.term[n-2] > (kBase*r + U.term[k-2])))
         {
            q-; r+=divisorMSB;                                 
         }
         
         ulong p,carry=0,borrow=0;
         int i=0;
         for (;i<n;i++)
         {
            p = carry + q * V.term[i];   
            x=U.term[j+i] - LO(p) + borrow;
            borrow=HI(long(x));            
            carry=HI(p);
            U.term[j+i]=x;            
         }
         
         x=U.term[j+i] - LO(carry) + borrow;
         borrow=HI(long(x));
         U.term[j+i]=x;
         
         term[j]=q;
         if (lastTerm<j) lastTerm=j;
         if (borrow)
         {
            term[j]-;                                       
            carry=0;
            for (int i=0;i<n;i++)
            {
               x=carry + U.term[j+i] + V.term[i];
               U.term[j+i] = x; 
               carry=HI(x);                  
            }
            U.term[j+i] = 0;
         }
      }
// Denormalization to be done by the caller only if the remainder is needed.
    }
/******************************************************************
/
/                  Utility functions returning a MyNumPtr
/
/******************************************************************/
 
static MyNumPtr New(int minNumTerms,int sign)
// returns a fresh MyNum, with at least minNumTerms terms
{
    int numTermsAllocated=
       (minNumTerms+kDefaultNumTerms-1) & ~(kDefaultNumTerms-1); 
    assert(numTermsAllocated>=minNumTerms);
    int allocatedSize = (numTermsAllocated-kDefaultNumTerms) * 
            sizeof(ushort) + sizeof(MyNum);
    MyNumPtr B=MyNumPtr(new uchar[allocatedSize]);
    B->SetNumTermsAllocated(numTermsAllocated,sign);
    B->lastTerm=-1;
    return B;
} 

static MyNumPtr NewCopy(const MyNum & A,int sign)
// returns a fresh exact copy of A, (the sign may differ from the sign of A)
{
    MyNumPtr B=New(A.NumTermsAllocated(),sign);
    B->lastTerm=A.lastTerm;
    memcpy(B->term,A.term,A.NumTermsAllocated()*sizeof(ushort));
    return B;
}

static MyNumPtr NewCopyShifted(const MyNum & A,
         int sign,int shift,int extraDigit)
// returns a fresh shifted copy of A, possibly expanded by 1 term
{
   int requestedSize=1+extraDigit+A.lastTerm;// allow for extra "digit"
    MyNumPtr B=New(requestedSize,sign);
    B->lastTerm=A.lastTerm+extraDigit;
   if (shift==0)      // straight copy, set extra digit to 0
   {
       memcpy(B->term,A.term,(1+A.lastTerm)*sizeof(ushort));
       if (extraDigit) B->term[B->lastTerm]=0;
    } else         
    {               // copy shifted.
       const ushort* at=A.term;
       const ushort* lastAterm=at+A.lastTerm;
      ushort* bt=B->term;
      ulong x=(ulong(*at) << shift);
      *bt=x;
      ulong carry=HI(x);
       while (at<lastAterm)
       {
          x=carry + (ulong(*++at) << shift);
          *++bt=x;   
          carry=HI(x);
       }       
       if (extraDigit) *++bt=carry;
    }
    return B;
}

static MyNumPtr BinaryTo10k(BigNum bigNumA)
// converts a BigNum to a base-10k MyNum
{
    const MyNumPtr A=MyNumPtr(bigNumA.bigNumData);
    MyNumPtr B=New((bigNumA.lengthInDigits+3)/4,A->SignBit());
 // allocate 1 ushort for every 4 digits
    B->BinaryTo10k(*A);
    return B;
}

/******************************************************************
/
/            Layer of the arithmetic functions returning a MyNumPtr:
/
/   Functions determine the size of the result and allocate a new MyNum object 
/  accordingly.  
/
/   Then they call a method of the allocated object for most of the actual arithmetic.
/
/******************************************************************/
 
static MyNumPtr AddMyNum(const MyNum & A,
         const MyNum & B,int sign)
// returns C = A + B
{      
   ulong numTermsC=2+A.lastTerm;
   MyNumPtr C=New(numTermsC,sign);
   C->Sum(A,B);
   return C;
 }
    
static MyNumPtr SubMyNum(const MyNum & A,
         const MyNum & B,int sign)
// returns C = A - B
{      
   ulong numTermsC=1+(A.lastTerm);
   MyNumPtr C=New(numTermsC,sign);
   C->Difference(A,B);
   return C;
}

static MyNumPtr MultiplyMyNum(const MyNum & A,
         const MyNum & B,int sign)
// returns C = A * B, 
{   
   ulong numTermsC=2+A.lastTerm+B.lastTerm;
   MyNumPtr C=New(numTermsC,sign);
   if (A.lastTerm >= B.lastTerm) 
      C->Product(A,B);
   else
      C->Product(B,A);
   return C; 
}
    
static int ShiftNeeded(const MyNum & D)
// returns bit shift needed to ensure MSB of a divisor is >= kBase/2 
{
   ulong msb=D.MSB(),shift=0;
   while (msb < (kBase/2)) {msb <<= 1;shift++;}
   return shift;
}
    
static MyNumPtr DivideMyNum(const MyNum & P,
         const MyNum & D,int sign)
// returns the result of P/D, either 
// the quotient Q = P/D, 
// or the remainder R = P%D, according to the mode parameter
{   
   assert(P.lastTerm >= D.lastTerm);
   long numTermsQ=1+P.lastTerm-D.lastTerm;
   MyNumPtr Q=New(numTermsQ,sign),R;
   
   if (D.lastTerm==0)   // single "digit" divider
   {
      R=NewCopy(P,sign);
      Q->SmallQuotient(*R,D.term[0]);
   } else            // general case
   {
      int shift=ShiftNeeded(D);
      // Create shifted copies of dividend and divisor 
      R=NewCopyShifted(P,sign,shift,1);   
      MyNumPtr V=NewCopyShifted(D,sign,shift,0);
      Q->Quotient(*R,*V);
      delete [] V;
   }
   delete [] R;
   return Q;
}

static MyNumPtr ModMyNum(const MyNum & P,const MyNum & D,int sign)
// returns the result of P/D, either 
// the quotient Q = P/D, 
// or the remainder R = P%D, according to the mode parameter
{   
   assert(P.lastTerm >= D.lastTerm);
   long numTermsQ=1+P.lastTerm-D.lastTerm;
   MyNumPtr Q=New(numTermsQ,sign),R;
   
   if (D.lastTerm==0)   // single "digit" divider
   {
      R=NewCopy(P,sign);
      Q->SmallQuotient(*R,D.term[0]);
   } else            // general case
   {
      int shift=ShiftNeeded(D);
      // Create shifted copies of dividend and divisor 
      R=NewCopyShifted(P,sign,shift,1);   
      MyNumPtr V=NewCopyShifted(D,sign,shift,0);
      Q->Quotient(*R,*V);
            
      delete [] V;
      R->DeNormalize(shift);// undo bit shift
   }
   delete [] Q;
   return R; 
}


static MyNumPtr PowerMyNum(const MyNum & A,
         const ulong exponent,int sign)
// returns C = A ** B
{   
   ulong numTermsC=2+A.Log2()*exponent/16;
   MyNumPtr C=New(numTermsC,sign);
   C->Power(A,exponent);
   return C; 
}
    
static MyNumPtr SquareRootMyNum(const MyNum & A)
// returns C = sqrt(A)
// The algorithm starts with an estimate and refines the estimate successively
// according to the equation
//
//      C = (C + A/C) / 2
//
// until C no longer changes.
{   
   ulong numTermsC=(2+A.lastTerm)/2;
   MyNumPtr C=New(numTermsC,kPos);
   C->ClearMemory();
   C->SquareRootEstimate(A);
   if (A.lastTerm<=2)
      return C;
   int n=0;
   for (;n<=A.lastTerm;n++)
   {                              
      MyNumPtr D=DivideMyNum(A,*C,kPos);
      D->Normalize();
      if (0==C->Average(*D))
      {
         delete [] D;
         break;
      }
      delete [] D;            
   }               
   return C; 
}
       
static MyNumPtr AddSubtract(const MyNum & A,
         const MyNum & B,int selection)
// Addition and subtraction cases are classified according to the signs
// and relative sizes of the operands.  The resulting 24 possible cases
// can then be treated as add/subtracts of positive numbers only.
{
   switch (selection) {
case 0:case 3:   
case 8:case 11:   return AddMyNum(B,A,kPos);
case 1:case 2:   return SubMyNum(B,A,kNeg);
case 5:case 6:   return AddMyNum(B,A,kNeg);
case 4:case 7:   return SubMyNum(B,A,kPos);
case 9:case 10:case 12:case 15: return 0;
      // null-pointer represents the value 0.
case 13:case 14:
case 21:case 22:return AddMyNum(A,B,kNeg);
case 16:case 19:return AddMyNum(A,B,kPos);
case 17:case 18:return SubMyNum(A,B,kPos);
case 20:case 23:return SubMyNum(A,B,kNeg);
   };
   return 0; 
}

/******************************************************************
/
/         Utility functions returning a BigNum for special cases
/
/******************************************************************/

static BigNum Zero(int length)   
// returns the BigNum version of 0 (length = 1)
// or the error code (length = -1)
{
   BigNum result;
   result.lengthInDigits=length;
   result.bigNumData=0;
   return result;
}

static BigNum Copy(BigNum bigNumA,ulong sign)
// returns a fresh copy of bigNumA, sign may change
{
   BigNum result;
    MyNumPtr A=MyNumPtr(bigNumA.bigNumData);
   result.lengthInDigits=bigNumA.lengthInDigits;
   result.bigNumData=NewCopy(*A,sign);
   return result;
}

static BigNum SmallValue(ulong value,ulong sign)
// returns the signed BigNum version of value, 32 bits max.
{
   MyNumPtr C=New(2,sign);
   C->lastTerm=(value>0xFFFF);
   C->term[0]=value;
   C->term[1]=HI(value);
   return C->Convert2BigNum();
}

/******************************************************************
/
/                     External C Functions
/      deal with special cases, then call arithmetic if and as needed.
/
/******************************************************************/
 
BigNum NewBigNum (            /* create a BigNum */
    char sign,             /* +1 or -1 */
    char digits[],         /* digits to be made into a BigNum */
    long numDigits         /* number of digits */
)
{   
   MyNumPtr B;
    if ((numDigits==0) || (*digits == 0))   
    {
       // a leading zero will be interpreted as: the whole number == 0
       // numDigits is ignored
       B=0;
    } else
    {
       B=New((numDigits+3)/4,sign);
       B->Init(ucharPtr(digits),numDigits);
    }
    
    BigNum result;   
    result.bigNumData=B;
    result.lengthInDigits=numDigits; 
    return result; 
}
 
long /* numDigits */ BigNumToDigits( /* convert a bigNum to decimal digits */
    BigNum bigNumA,        /* bigNum to be converted to decimal digits 0-9 */
    char *sign,            /* return +1 or -1 */
    char digits[]          /* decimal digits of bigNumA */
        /* storage for digits preallocated based on bigNumA.lengthInDigits */
)
{
    const MyNumPtr A=MyNumPtr(bigNumA.bigNumData);
    if (A==0)// BigNum has the special value 0
    {
       *sign=+1;
       digits[0]=0;
       return 1;
    } else
    {
       const MyNumPtr B=BinaryTo10k(bigNumA);// temporary number 
       *sign = (B->SignBit()?-1:+1);
       long numDigits=B->k10ToDigits(ucharPtr(digits));
       delete [] B;
       return numDigits;
    }
}
 
void DisposeBigNum (     /* dispose of a BigNum */
    BigNum theBigNum       /* the BigNum to be disposed of */
)
{
    if (theBigNum.bigNumData)
    {
       delete [] theBigNum.bigNumData;
       theBigNum.bigNumData=0;
    }// else don't delete if null (BigNum had the special value of 0)
}
 
BigNum AddBigNums (      /* sum two BigNums, returning a new one */
    BigNum bigNumA,        /* return the sum A+B */
    BigNum bigNumB
)
{
   const MyNumPtr A=MyNumPtr(bigNumA.bigNumData);
   const MyNumPtr B=MyNumPtr(bigNumB.bigNumData);
   
   if (A==0)
   {
      if (B)
         return Copy(bigNumB,B->SignBit());   // 0 + B = B
      else
         return Zero(1);                  // 0 + 0 = 0
   } else if (B==0)
      return Copy(bigNumA,A->SignBit());      // A + 0 = A
   
   int selection=
      8*(1+A->CompAbsolute(*B)) +
      4*A->IsNegative() +
      2*B->IsNegative() + kAddMode;
   
   MyNumPtr C=AddSubtract(*A,*B,selection);   // general case
   if (C) return C->Convert2BigNum();
   else return Zero(1);
}
 
BigNum SubtractBigNums ( /* subtract two BigNums, returning a new one */
    BigNum bigNumA,        /* return the difference A-B */
    BigNum bigNumB
)
{
   const MyNumPtr A=MyNumPtr(bigNumA.bigNumData);
   const MyNumPtr B=MyNumPtr(bigNumB.bigNumData);
   
   if (A==0)
   {
      if (B)
         return Copy(bigNumB,kNeg ^ B->SignBit());// 0 - B = -B
      else
         return Zero(1);                      // 0 - 0 = 0
   } else if (B==0)
      return Copy(bigNumA,A->SignBit());          // A - 0 = A
   
   int selection=
      8*(1+A->CompAbsolute(*B)) +
      4*A->IsNegative() +
      2*B->IsNegative() + kSubMode;
   
   MyNumPtr C=AddSubtract(*A,*B,selection);       // general case
   if (C) return C->Convert2BigNum();
   else return Zero(1);
}

BigNum MultiplyBigNums ( /* multiply two BigNums, returning a new one */
    BigNum bigNumA,        /* return the product A*B */
    BigNum bigNumB
)
{
   const MyNumPtr A=MyNumPtr(bigNumA.bigNumData);
   const MyNumPtr B=MyNumPtr(bigNumB.bigNumData);
   
   if ((A==0) || (B==0))      // A*B = 0 if either A or B == 0
      return Zero(1);
      
   const ulong sign =         // A*B is negative if sign(A) != sign(B)
      A->SignBit() ^ B->SignBit();
   
   MyNumPtr C=MultiplyMyNum(*A,*B,sign);   // general case
   
   return C->Convert2BigNum();
}
 
BigNum DivideBigNums (   /* divide two BigNums, returning a new one */
    BigNum bigNumA,        /* return the quotient A/B, discarding the remainder */
    BigNum bigNumB
)
{
   const MyNumPtr A=MyNumPtr(bigNumA.bigNumData);
   const MyNumPtr B=MyNumPtr(bigNumB.bigNumData);
   
   if (B==0)                // A/B is an error if B==0
      return Zero(-1);
   if (A==0)                // A/B = 0 if A==0 
      return Zero(1);
      
   int comparison=A->CompAbsolute(*B);
   if (comparison < 0)         // A/B = 0 if (A < B) 
      return Zero(1);
      
   const ulong sign =         // A/B is negative if sign(A) != sign(B)
      A->SignBit() ^ B->SignBit();
   
   if (comparison == 0)      // A/B = +-1 if (A == B) 
      return SmallValue(1,sign);
      
   MyNumPtr C=DivideMyNum(*A,*B,sign);   // general case
      
   return C->Convert2BigNum();
}
 
BigNum ModBigNums (      /* divide two BigNums, returning a new one */
    BigNum bigNumA,        /* return the remainder A%B, discarding the quotient */
    BigNum bigNumB
)
{   
   const MyNumPtr A=MyNumPtr(bigNumA.bigNumData);
   const MyNumPtr B=MyNumPtr(bigNumB.bigNumData);
   
   if (A==0)               // A mod B = 0 if (A==0) 
      return Zero(1);
   
   const ulong sign=A->SignBit();
      // sign(A mod B) = sign(A), sign(B) does not matter
   
   if (B==0)                // A mod B = A if (B==0)
      return Copy(bigNumA,sign);
   
   const int comparison=A->CompAbsolute(*B);
   if (comparison == 0)      // A mod B = 0 if (A==B)
      return Zero(1);
   
   MyNumPtr C=(comparison < 0)   ?      
      NewCopy(*A,sign) :      // A mod B = A if (A<B)
      ModMyNum(*A,*B,sign);   // general case
      
   return C->Convert2BigNum();
}
 
BigNum PowerBigNums (    
   /* calculate one Bignum to the power of another, returning a new one */
    BigNum bigNumA,        /* return A raised to the power B */
    BigNum bigNumB
)
{
   const MyNumPtr A=MyNumPtr(bigNumA.bigNumData);
   const MyNumPtr B=MyNumPtr(bigNumB.bigNumData);
   
   // special cases:
   if (A==0) return Zero(1);            // 0 ** N = 0 
   if (B==0) return SmallValue(1,kPos);   // A ** 0 = 1

   //  Need to take care of the sign:
   //   sign of result is negative only if A is negative, and B is odd   
   const ulong sign=(A->SignBit() & (B->IsOdd()<<31));
   
   if (A->IsSmall() && (A->Ulong()==1))
      return SmallValue(1,sign);         // +1 ** +-B = +1
                                 // -1 ** +-B = +1 (B even) or -1 (B odd)
   
   // all other cases of B negative:
   if (B->IsNegative())                // A ** -B = 1/(A ** B)
      return Zero(1);                  // A ** -B is <1, hence truncated to 0
   
   if (!B->IsSmall())   // this means, we have an exponent that is too large
      return Zero(-1);// large powers are not supported (too much memory implied)
   
   const ulong exponent=B->Ulong();
   MyNumPtr C=PowerMyNum(*A,exponent,sign);// general case
   
   return C->Convert2BigNum();
}
 
BigNum SqrtBigNum (      /* find the sqrt of a BigNum, returning a new one */
    BigNum bigNumA         /* return the square root of A */
)
{
   const MyNumPtr A=MyNumPtr(bigNumA.bigNumData);
   
   if (A==0) return Zero(1);            // sqrt(0) = 0
   if (A->IsNegative()) return Zero(-1);   // sqrt(-x) is caller's error
   
   if (A->IsSmall())                  // sqrt(small value) handled directly
   {
      ulong x=A->Ulong();
      return SmallValue(sqrt(x),kPos);
   }
   
   MyNumPtr C=SquareRootMyNum(*A);         // general case
      
   return C->Convert2BigNum();
}

// Additional function, provided for testing only:
int CompareBigNum(
    BigNum bigNumA,      /* returns + 0 - for bigNumA > == < bigNumB */
    BigNum bigNumB
)
{
   const MyNumPtr A=MyNumPtr(bigNumA.bigNumData);
   const MyNumPtr B=MyNumPtr(bigNumB.bigNumData);
   
    int result=A->CompAbsolute(*B);
    if (result==0)
       return 0;
    else
    {
       ulong asign=A->SignBit();
       ulong bsign=B->SignBit();
       if (asign==bsign)
       {
          if (asign)// both negative
          return -result;
          else return result;
       } else
       {
          if (asign) // a only negative
             return -1;
          else return 1;   
       }
   }
}

Bigmult.cp

#include <string.h>
#include "BigMult.h"

/******************************************************************
/
/   Fast multiple precision multiplier
/   Copyright © 2000, Ernst Munter, Kanata, ON, Canada.
/
/
/   Inspired by "How fast can we multiply" in The Art of Computer Programming
/   by Donald Knuth (vol.2, page 295): Split and combine terms to reduce
/   running time from order N-squared to N**(lg 3) for large BigNums.
/
/   The routine Multiply() calls itself recursively until the size of the
/   factors is reduced to a range below 240 bits. 
/
/   A seperate unrolled function takes care of the final matrix multiplication
/   for each size (minimum size is 8 by 8 terms).
/
/******************************************************************/

#define HI(chunk) ((chunk)>>16)
#define LO(chunk) ((chunk)&0xFFFF)


// We try to work the final multiplication within registers.  To conserve registers,
// pairs of short factors, e.g. U[i] and U[i+1] are read into a single long register. 
// Note: this technique was not implemented to be portable to little-endian CPUs.

// The strategy works perfectly for N=8 and N=9.  For N>9, the compiler (CW-5.3) 
// fails to accomplish this, and does save some values on the stack.

// For N >= 12, a more straight forward technique is actually faster: we read the 
// factors U[i] and V[j] from memory (cache) directly, each time they are used.

#define U0 (HI(U01))
#define U1 (LO(U01))
#define U2 (HI(U23))
#define U3 (LO(U23))
#define U4 (HI(U45))
#define U5 (LO(U45))
#define U6 (HI(U67))
#define U7 (LO(U67))
#define U8 (HI(U89))
#define U9 (LO(U89))
#define U10 (HI(U1011))
#define U11 (LO(U1011))
#define U12 (HI(U1213))
#define U13 (LO(U1213))
#define U14 (HI(U1415))
#define U15 (LO(U1415))

#define V0 (HI(V01))
#define V1 (LO(V01))
#define V2 (HI(V23))
#define V3 (LO(V23))
#define V4 (HI(V45))
#define V5 (LO(V45))
#define V6 (HI(V67))
#define V7 (LO(V67))
#define V8 (HI(V89))
#define V9 (LO(V89))
#define V10 (HI(V1011))
#define V11 (LO(V1011))
#define V12 (HI(V1213))
#define V13 (LO(V1213))
#define V14 (HI(V1415))
#define V15 (LO(V1415))
#define F(a,b) {x=LO(y) + (U##a) * (V##b);   y=HI(y)+HI(x);}
#define G(a,b) {x=LO(x) + (U##a) * (V##b);   y+=HI(x);}

static void FinalMultiply8(const ushort* U,
         const ushort* V,ushort* P)
{
   ulong U01=*((ulong*)(U+0));   
   ulong U23=*((ulong*)(U+2));   
   ulong U45=*((ulong*)(U+4));   
   ulong U67=*((ulong*)(U+6));      
   
   ulong V01=*((ulong*)(V+0));   
   ulong V23=*((ulong*)(V+2));   
   ulong V45=*((ulong*)(V+4));   
   ulong V67=*((ulong*)(V+6));   
         
   ulong x=U0*V0,   y=HI(x);   
   P[0]=x;
   F(0,1); G(1,0);   P[1]=x;
   F(0,2); G(1,1); G(2,0); P[2]=x;
   F(0,3); G(1,2); G(2,1); G(3,0); P[3]=x;
   F(0,4); G(1,3); G(2,2); G(3,1); G(4,0); P[4]=x;
   F(0,5); G(1,4); G(2,3); G(3,2); G(4,1); G(5,0); P[5]=x;
   F(0,6); G(1,5); G(2,4); G(3,3); G(4,2); G(5,1); G(6,0); P[6]=x;
   F(0,7); G(1,6); G(2,5); G(3,4); G(4,3); G(5,2); G(6,1); G(7,0); 
            P[7]=x;
   F(1,7); G(2,6); G(3,5); G(4,4); G(5,3); G(6,2); G(7,1); P[8]=x;
   F(2,7); G(3,6); G(4,5); G(5,4); G(6,3); G(7,2); P[9]=x;
   F(3,7); G(4,6); G(5,5); G(6,4); G(7,3); P[10]=x;
   F(4,7); G(5,6); G(6,5); G(7,4); P[11]=x;
   F(5,7); G(6,6); G(7,5); P[12]=x;
   F(6,7); G(7,6); P[13]=x;
   F(7,7); P[14]=x;
   P[15]=y;
}

static void FinalMultiply9(const ushort* U,
         const ushort* V,ushort* P)
{
   ulong U01=*((ulong*)(U+0));   
   ulong U23=*((ulong*)(U+2));   
   ulong U45=*((ulong*)(U+4));   
   ulong U67=*((ulong*)(U+6));   
   ulong U89=*((ulong*)(U+8));      
   
   ulong V01=*((ulong*)(V+0));   
   ulong V23=*((ulong*)(V+2));   
   ulong V45=*((ulong*)(V+4));   
   ulong V67=*((ulong*)(V+6));   
   ulong V89=*((ulong*)(V+8));   
         
   ulong x=U0*V0,   y=HI(x);   
   P[0]=x;
   F(0,1); G(1,0);   P[1]=x;
   F(0,2); G(1,1); G(2,0); P[2]=x;
   F(0,3); G(1,2); G(2,1); G(3,0); P[3]=x;
   F(0,4); G(1,3); G(2,2); G(3,1); G(4,0); P[4]=x;
   F(0,5); G(1,4); G(2,3); G(3,2); G(4,1); G(5,0); P[5]=x;
   F(0,6); G(1,5); G(2,4); G(3,3); G(4,2); G(5,1); G(6,0); P[6]=x;
   F(0,7); G(1,6); G(2,5); G(3,4); G(4,3); G(5,2); G(6,1); G(7,0); 
         P[7]=x;   
   F(0,8); G(1,7); G(2,6); G(3,5); G(4,4); G(5,3); G(6,2); G(7,1); 
         G(8,0); P[8]=x;
   F(1,8); G(2,7); G(3,6); G(4,5); G(5,4); G(6,3); G(7,2); G(8,1); 
         P[9]=x;
   F(2,8); G(3,7); G(4,6); G(5,5); G(6,4); G(7,3); G(8,2); 
         P[10]=x;
   F(3,8); G(4,7); G(5,6); G(6,5); G(7,4); G(8,3); P[11]=x;
   F(4,8); G(5,7); G(6,6); G(7,5); G(8,4); P[12]=x;
   F(5,8); G(6,7); G(7,6); G(8,5); P[13]=x;
   F(6,8); G(7,7); G(8,6); P[14]=x;
   F(7,8); G(8,7); P[15]=x;
   F(8,8); P[16]=x;
   P[17]=y;
}

static void FinalMultiply10(const ushort* U,
         const ushort* V,ushort* P)
{
   ulong U01=*((ulong*)(U+0));   
   ulong U23=*((ulong*)(U+2));   
   ulong U45=*((ulong*)(U+4));   
   ulong U67=*((ulong*)(U+6));   
   ulong U89=*((ulong*)(U+8));      
   
   ulong V01=*((ulong*)(V+0));   
   ulong V23=*((ulong*)(V+2));   
   ulong V45=*((ulong*)(V+4));   
   ulong V67=*((ulong*)(V+6));   
   ulong V89=*((ulong*)(V+8));   
         
   ulong x=U0*V0,   y=HI(x);   
   P[0]=x;
   F(0,1); G(1,0);   P[1]=x;
   F(0,2); G(1,1); G(2,0); P[2]=x;
   F(0,3); G(1,2); G(2,1); G(3,0); P[3]=x;
   F(0,4); G(1,3); G(2,2); G(3,1); G(4,0); P[4]=x;
   F(0,5); G(1,4); G(2,3); G(3,2); G(4,1); G(5,0); P[5]=x;
   F(0,6); G(1,5); G(2,4); G(3,3); G(4,2); G(5,1); G(6,0); P[6]=x;
   F(0,7); G(1,6); G(2,5); G(3,4); G(4,3); G(5,2); G(6,1); G(7,0); 
         P[7]=x;   
   F(0,8); G(1,7); G(2,6); G(3,5); G(4,4); G(5,3); G(6,2); G(7,1); 
         G(8,0); P[8]=x;
   F(0,9); G(1,8); G(2,7); G(3,6); G(4,5); G(5,4); G(6,3); G(7,2); 
         G(8,1); G(9,0); P[9]=x;
   F(1,9); G(2,8); G(3,7); G(4,6); G(5,5); G(6,4); G(7,3); G(8,2); 
         G(9,1); P[10]=x;
   F(2,9); G(3,8); G(4,7); G(5,6); G(6,5); G(7,4); G(8,3); G(9,2); 
         P[11]=x;
   F(3,9); G(4,8); G(5,7); G(6,6); G(7,5); G(8,4); G(9,3); 
         P[12]=x;
   F(4,9); G(5,8); G(6,7); G(7,6); G(8,5); G(9,4); P[13]=x;
   F(5,9); G(6,8); G(7,7); G(8,6); G(9,5); P[14]=x;
   F(6,9); G(7,8); G(8,7); G(9,6); P[15]=x;
   F(7,9); G(8,8); G(9,7); P[16]=x;
   F(8,9); G(9,8); P[17]=x;
   F(9,9); P[18]=x;
   P[19]=y;
}

static void FinalMultiply11(const ushort* U,
         const ushort* V,ushort* P)
{
   ulong U01=*((ulong*)(U+0));   
   ulong U23=*((ulong*)(U+2));   
   ulong U45=*((ulong*)(U+4));   
   ulong U67=*((ulong*)(U+6));   
   ulong U89=*((ulong*)(U+8));   
   ulong U1011=*((ulong*)(U+10));      
   
   ulong V01=*((ulong*)(V+0));   
   ulong V23=*((ulong*)(V+2));   
   ulong V45=*((ulong*)(V+4));   
   ulong V67=*((ulong*)(V+6));   
   ulong V89=*((ulong*)(V+8));   
   ulong V1011=*((ulong*)(V+10));   
         
   ulong x=U0*V0,   y=HI(x);   
   P[0]=x;
   F(0,1); G(1,0);   P[1]=x;
   F(0,2); G(1,1); G(2,0); P[2]=x;
   F(0,3); G(1,2); G(2,1); G(3,0); P[3]=x;
   F(0,4); G(1,3); G(2,2); G(3,1); G(4,0); P[4]=x;
   F(0,5); G(1,4); G(2,3); G(3,2); G(4,1); G(5,0); P[5]=x;
   F(0,6); G(1,5); G(2,4); G(3,3); G(4,2); G(5,1); G(6,0); P[6]=x;
   F(0,7); G(1,6); G(2,5); G(3,4); G(4,3); G(5,2); G(6,1); G(7,0); 
         P[7]=x;   
   F(0,8); G(1,7); G(2,6); G(3,5); G(4,4); G(5,3); G(6,2); G(7,1); 
         G(8,0); P[8]=x;
   F(0,9); G(1,8); G(2,7); G(3,6); G(4,5); G(5,4); G(6,3); G(7,2); 
         G(8,1); G(9,0); P[9]=x;
   F(0,10); G(1,9); G(2,8); G(3,7); G(4,6); G(5,5); G(6,4); 
         G(7,3); G(8,2); G(9,1); G(10,0); P[10]=x;
   F(1,10); G(2,9); G(3,8); G(4,7); G(5,6); G(6,5); G(7,4); 
         G(8,3); G(9,2); G(10,1); P[11]=x;
   F(2,10); G(3,9); G(4,8); G(5,7); G(6,6); G(7,5); G(8,4); 
         G(9,3); G(10,2); P[12]=x;
   F(3,10); G(4,9); G(5,8); G(6,7); G(7,6); G(8,5); G(9,4); 
         G(10,3); P[13]=x;
   F(4,10); G(5,9); G(6,8); G(7,7); G(8,6); G(9,5); G(10,4);
          P[14]=x;
   F(5,10); G(6,9); G(7,8); G(8,7); G(9,6); G(10,5); P[15]=x;
   F(6,10); G(7,9); G(8,8); G(9,7); G(10,6); P[16]=x;
   F(7,10); G(8,9); G(9,8); G(10,7); P[17]=x;
   F(8,10); G(9,9); G(10,8); P[18]=x;
   F(9,10); G(10,9); P[19]=x;
   F(10,10); P[20]=x;
   P[21]=y;
}

// Macros redefined to read factors directly from memory
#undef F
#undef G
#define F(a,b) {x=LO(y) + (U[a]) * (V[b]);   y=HI(y)+HI(x);}
#define G(a,b) {x=LO(x) + (U[a]) * (V[b]);   y+=HI(x);}

static void FinalMultiply12(const ushort* U,
         const ushort* V,ushort* P)
{      
   ulong x=U[0]*V[0],   y=HI(x);   
   P[0]=x;
   F(0,1); G(1,0);   P[1]=x;
   F(0,2); G(1,1); G(2,0); P[2]=x;
   F(0,3); G(1,2); G(2,1); G(3,0); P[3]=x;
   F(0,4); G(1,3); G(2,2); G(3,1); G(4,0); P[4]=x;
   F(0,5); G(1,4); G(2,3); G(3,2); G(4,1); G(5,0); P[5]=x;
   F(0,6); G(1,5); G(2,4); G(3,3); G(4,2); G(5,1); G(6,0); P[6]=x;
   F(0,7); G(1,6); G(2,5); G(3,4); G(4,3); G(5,2); G(6,1); G(7,0); 
         P[7]=x;   
   F(0,8); G(1,7); G(2,6); G(3,5); G(4,4); G(5,3); G(6,2); G(7,1); 
         G(8,0); P[8]=x;
   F(0,9); G(1,8); G(2,7); G(3,6); G(4,5); G(5,4); G(6,3); G(7,2); 
         G(8,1); G(9,0); P[9]=x;
   F(0,10); G(1,9); G(2,8); G(3,7); G(4,6); G(5,5); G(6,4); 
         G(7,3); G(8,2); G(9,1); G(10,0); P[10]=x;
   F(0,11); G(1,10); G(2,9); G(3,8); G(4,7); G(5,6); G(6,5); 
         G(7,4); G(8,3); G(9,2); G(10,1); G(11,0); P[11]=x;
   F(1,11); G(2,10); G(3,9); G(4,8); G(5,7); G(6,6); G(7,5); 
         G(8,4); G(9,3); G(10,2); G(11,1); P[12]=x;
   F(2,11); G(3,10); G(4,9); G(5,8); G(6,7); G(7,6); G(8,5); 
         G(9,4); G(10,3); G(11,2); P[13]=x;
   F(3,11); G(4,10); G(5,9); G(6,8); G(7,7); G(8,6); G(9,5); 
         G(10,4); G(11,3); P[14]=x;
   F(4,11); G(5,10); G(6,9); G(7,8); G(8,7); G(9,6); G(10,5); 
         G(11,4); P[15]=x;
   F(5,11); G(6,10); G(7,9); G(8,8); G(9,7); G(10,6); G(11,5); 
         P[16]=x;
   F(6,11); G(7,10); G(8,9); G(9,8); G(10,7); G(11,6); P[17]=x;
   F(7,11); G(8,10); G(9,9); G(10,8); G(11,7); P[18]=x;
   F(8,11); G(9,10); G(10,9); G(11,8); P[19]=x;
   F(9,11); G(10,10); G(11,9); P[20]=x;
   F(10,11); G(11,10); P[21]=x;
   F(11,11); P[22]=x;
   P[23]=y;
}

static void FinalMultiply13(const ushort* U,
         const ushort* V,ushort* P)
{
   ulong x=U[0]*V[0],   y=HI(x);
   P[0]=x;
   F(0,1); G(1,0);   P[1]=x;
   F(0,2); G(1,1); G(2,0); P[2]=x;
   F(0,3); G(1,2); G(2,1); G(3,0); P[3]=x;
   F(0,4); G(1,3); G(2,2); G(3,1); G(4,0); P[4]=x;
   F(0,5); G(1,4); G(2,3); G(3,2); G(4,1); G(5,0); P[5]=x;
   F(0,6); G(1,5); G(2,4); G(3,3); G(4,2); G(5,1); G(6,0); P[6]=x;
   F(0,7); G(1,6); G(2,5); G(3,4); G(4,3); G(5,2); G(6,1); G(7,0); 
         P[7]=x;   
   F(0,8); G(1,7); G(2,6); G(3,5); G(4,4); G(5,3); G(6,2); G(7,1); 
         G(8,0); P[8]=x;
   F(0,9); G(1,8); G(2,7); G(3,6); G(4,5); G(5,4); G(6,3); G(7,2); 
         G(8,1); G(9,0); P[9]=x;
   F(0,10); G(1,9); G(2,8); G(3,7); G(4,6); G(5,5); G(6,4); 
         G(7,3); G(8,2); G(9,1); G(10,0); P[10]=x;
   F(0,11); G(1,10); G(2,9); G(3,8); G(4,7); G(5,6); G(6,5); 
         G(7,4); G(8,3); G(9,2); G(10,1); G(11,0); P[11]=x;   
   F(0,12); G(1,11); G(2,10); G(3,9); G(4,8); G(5,7); G(6,6); 
         G(7,5); G(8,4); G(9,3); G(10,2); G(11,1); G(12,0); P[12]=x;
   F(1,12); G(2,11); G(3,10); G(4,9); G(5,8); G(6,7); G(7,6); 
         G(8,5); G(9,4); G(10,3); G(11,2); G(12,1); P[13]=x;
   F(2,12); G(3,11); G(4,10); G(5,9); G(6,8); G(7,7); G(8,6); 
         G(9,5); G(10,4); G(11,3); G(12,2); P[14]=x;
   F(3,12); G(4,11); G(5,10); G(6,9); G(7,8); G(8,7); G(9,6); 
         G(10,5); G(11,4); G(12,3); P[15]=x;
   F(4,12); G(5,11); G(6,10); G(7,9); G(8,8); G(9,7); G(10,6); 
         G(11,5); G(12,4); P[16]=x;
   F(5,12); G(6,11); G(7,10); G(8,9); G(9,8); G(10,7); G(11,6); 
         G(12,5); P[17]=x;
   F(6,12); G(7,11); G(8,10); G(9,9); G(10,8); G(11,7); G(12,6); 
         P[18]=x;
   F(7,12); G(8,11); G(9,10); G(10,9); G(11,8); G(12,7); P[19]=x;
   F(8,12); G(9,11); G(10,10); G(11,9); G(12,8); P[20]=x;
   F(9,12); G(10,11); G(11,10); G(12,9); P[21]=x;
   F(10,12); G(11,11); G(12,10); P[22]=x;
   F(11,12); G(12,11); P[23]=x;
   F(12,12); P[24]=x;
   P[25]=y;
}

static void FinalMultiply14(const ushort* U,
         const ushort* V,ushort* P)
{
   ulong x=U[0]*V[0],   y=HI(x);
   P[0]=x;
   F(0,1); G(1,0);   P[1]=x;
   F(0,2); G(1,1); G(2,0); P[2]=x;
   F(0,3); G(1,2); G(2,1); G(3,0); P[3]=x;
   F(0,4); G(1,3); G(2,2); G(3,1); G(4,0); P[4]=x;
   F(0,5); G(1,4); G(2,3); G(3,2); G(4,1); G(5,0); P[5]=x;
   F(0,6); G(1,5); G(2,4); G(3,3); G(4,2); G(5,1); G(6,0); P[6]=x;
   F(0,7); G(1,6); G(2,5); G(3,4); G(4,3); G(5,2); G(6,1); G(7,0); 
         P[7]=x;   
   F(0,8); G(1,7); G(2,6); G(3,5); G(4,4); G(5,3); G(6,2); G(7,1); 
         G(8,0); P[8]=x;
   F(0,9); G(1,8); G(2,7); G(3,6); G(4,5); G(5,4); G(6,3); G(7,2); 
         G(8,1); G(9,0); P[9]=x;
   F(0,10); G(1,9); G(2,8); G(3,7); G(4,6); G(5,5); G(6,4); 
         G(7,3); G(8,2); G(9,1); G(10,0); P[10]=x;
   F(0,11); G(1,10); G(2,9); G(3,8); G(4,7); G(5,6); G(6,5); 
         G(7,4); G(8,3); G(9,2); G(10,1); G(11,0); P[11]=x;   
   F(0,12); G(1,11); G(2,10); G(3,9); G(4,8); G(5,7); G(6,6); 
         G(7,5); G(8,4); G(9,3); G(10,2); G(11,1); G(12,0); P[12]=x;
   F(0,13); G(1,12); G(2,11); G(3,10); G(4,9); G(5,8); G(6,7); 
         G(7,6); G(8,5); G(9,4); G(10,3); G(11,2); G(12,1); G(13,0); 
         P[13]=x;
   F(1,13); G(2,12); G(3,11); G(4,10); G(5,9); G(6,8); G(7,7); 
         G(8,6); G(9,5); G(10,4); G(11,3); G(12,2); G(13,1); P[14]=x;
   F(2,13); G(3,12); G(4,11); G(5,10); G(6,9); G(7,8); G(8,7); 
         G(9,6); G(10,5); G(11,4); G(12,3); G(13,2); P[15]=x;
   F(3,13); G(4,12); G(5,11); G(6,10); G(7,9); G(8,8); G(9,7); 
         G(10,6); G(11,5); G(12,4); G(13,3); P[16]=x;
   F(4,13); G(5,12); G(6,11); G(7,10); G(8,9); G(9,8); G(10,7); 
         G(11,6); G(12,5); G(13,4); P[17]=x;
   F(5,13); G(6,12); G(7,11); G(8,10); G(9,9); G(10,8); G(11,7); 
         G(12,6); G(13,5); P[18]=x;
   F(6,13); G(7,12); G(8,11); G(9,10); G(10,9); G(11,8); G(12,7); 
         G(13,6); P[19]=x;
   F(7,13); G(8,12); G(9,11); G(10,10); G(11,9); G(12,8); G(13,7); 
         P[20]=x;
   F(8,13); G(9,12); G(10,11); G(11,10); G(12,9); G(13,8); 
         P[21]=x;
   F(9,13); G(10,12); G(11,11); G(12,10); G(13,9); P[22]=x;
   F(10,13); G(11,12); G(12,11); G(13,10); P[23]=x;
   F(11,13); G(12,12); G(13,11); P[24]=x;
   F(12,13); G(13,12); P[25]=x;
   F(13,13); P[26]=x;
   P[27]=y;
}

static void FinalMultiply15(const ushort* U,
         const ushort* V,ushort* P)
{
   ulong x=U[0]*V[0],   y=HI(x);
   P[0]=x;
   F(0,1); G(1,0);   P[1]=x;
   F(0,2); G(1,1); G(2,0); P[2]=x;
   F(0,3); G(1,2); G(2,1); G(3,0); P[3]=x;
   F(0,4); G(1,3); G(2,2); G(3,1); G(4,0); P[4]=x;
   F(0,5); G(1,4); G(2,3); G(3,2); G(4,1); G(5,0); P[5]=x;
   F(0,6); G(1,5); G(2,4); G(3,3); G(4,2); G(5,1); G(6,0); P[6]=x;
   F(0,7); G(1,6); G(2,5); G(3,4); G(4,3); G(5,2); G(6,1); G(7,0); 
         P[7]=x;   
   F(0,8); G(1,7); G(2,6); G(3,5); G(4,4); G(5,3); G(6,2); G(7,1); 
         G(8,0); P[8]=x;
   F(0,9); G(1,8); G(2,7); G(3,6); G(4,5); G(5,4); G(6,3); G(7,2); 
         G(8,1); G(9,0); P[9]=x;
   F(0,10); G(1,9); G(2,8); G(3,7); G(4,6); G(5,5); G(6,4); 
         G(7,3); G(8,2); G(9,1); G(10,0); P[10]=x;
   F(0,11); G(1,10); G(2,9); G(3,8); G(4,7); G(5,6); G(6,5); 
         G(7,4); G(8,3); G(9,2); G(10,1); G(11,0); P[11]=x;   
   F(0,12); G(1,11); G(2,10); G(3,9); G(4,8); G(5,7); G(6,6); 
         G(7,5); G(8,4); G(9,3); G(10,2); G(11,1); G(12,0); P[12]=x;
   F(0,13); G(1,12); G(2,11); G(3,10); G(4,9); G(5,8); G(6,7); 
         G(7,6); G(8,5); G(9,4); G(10,3); G(11,2); G(12,1); G(13,0);
         P[13]=x;
   F(0,14); G(1,13); G(2,12); G(3,11); G(4,10); G(5,9); G(6,8); 
         G(7,7); G(8,6); G(9,5); G(10,4); G(11,3); G(12,2); G(13,1); 
            G(14,0); P[14]=x;
   F(1,14); G(2,13); G(3,12); G(4,11); G(5,10); G(6,9); G(7,8); 
         G(8,7); G(9,6); G(10,5); G(11,4); G(12,3); G(13,2); G(14,1); 
            P[15]=x;
   F(2,14); G(3,13); G(4,12); G(5,11); G(6,10); G(7,9); G(8,8); 
         G(9,7); G(10,6); G(11,5); G(12,4); G(13,3); G(14,2); 
            P[16]=x;
   F(3,14); G(4,13); G(5,12); G(6,11); G(7,10); G(8,9); G(9,8); 
         G(10,7); G(11,6); G(12,5); G(13,4); G(14,3); P[17]=x;
   F(4,14); G(5,13); G(6,12); G(7,11); G(8,10); G(9,9); G(10,8); 
         G(11,7); G(12,6); G(13,5); G(14,4); P[18]=x;
   F(5,14); G(6,13); G(7,12); G(8,11); G(9,10); G(10,9); G(11,8); 
         G(12,7); G(13,6); G(14,5); P[19]=x;
   F(6,14); G(7,13); G(8,12); G(9,11); G(10,10); G(11,9); G(12,8); 
         G(13,7); G(14,6); P[20]=x;
   F(7,14); G(8,13); G(9,12); G(10,11); G(11,10); G(12,9); 
         G(13,8); G(14,7); P[21]=x;
   F(8,14); G(9,13); G(10,12); G(11,11); G(12,10); G(13,9); 
         G(14,8); P[22]=x;
   F(9,14); G(10,13); G(11,12); G(12,11); G(13,10); G(14,9); 
         P[23]=x;
   F(10,14); G(11,13); G(12,12); G(13,11); G(14,10); P[24]=x;
   F(11,14); G(12,13); G(13,12); G(14,11); P[25]=x;
   F(12,14); G(13,13); G(14,12); P[26]=x;
   F(13,14); G(14,13); P[27]=x;
   F(14,14); P[28]=x;
   P[29]=y;
}

static int Subtract(const ushort* A,const ushort* B,
         ushort* D,int N)
// returns 1 if A < B
// computes difference of |A - B| into space at D
{
   int sign=0;
   for (int i=N;i>0;)
   {
      i-;
      if (A[i] < B[i]) 
      {
         sign=1;break;
      } else if (A[i] > B[i])
      {
         sign=0;break;
      }
   }
   if (sign) {const ushort* T=A;A=B;B=T;}
       
    long L=*A-*B,borrow=HI(L);
    *D=L;
   for (int i=1;i<N;i++)
   {
      L = *++A - *++B + borrow;
      borrow=HI(L);
      *++D = L;
   }
   return sign;
}

static void Accumulate(ushort* dest,const ushort* A,int N)
// accumulates the sum of dest + A into the space at dest
{
    long L=*dest+*A,carry=HI(L);
    *dest=L;
   for (int i=1;i<N;i++)
   {
      L = *++dest + *++A + carry;
      carry=HI(L);
      *dest = L;
   }
   while (carry)
   {
      L = *++dest + carry;
      carry=HI(L);
      *dest = L;
   }
}

static void NegAccumulate(ushort* dest,const ushort* A,int N)
// accumulates the difference of dest - A into the space at dest
{
    long L=*dest-*A,borrow=HI(L);
    *dest=L;
   for (int i=1;i<N;i++)
   {
      L = *++dest - *++A + borrow;
      borrow=HI(L);
      *dest = L;
   }
   while (borrow)
   {
      L = *++dest + borrow;
      borrow=HI(L);
      *dest = L;
   }
}

typedef void FinalMultiplyFun(const ushort* ,
      const ushort*, ushort*);

static void Dummy(const ushort*,const ushort*,ushort*){}

static FinalMultiplyFun *FinalMultiply[16] = {
   Dummy,Dummy,Dummy,Dummy,Dummy,Dummy,Dummy,Dummy,
   FinalMultiply8,
   FinalMultiply9,
   FinalMultiply10,
   FinalMultiply11,
   FinalMultiply12,
   FinalMultiply13,
   FinalMultiply14,
   FinalMultiply15
};

void Multiply(const ushort* U,const ushort* V,ushort* P,const int N,const long nv)
// Recursive function to multiply arrays U and V, where V may have many leading 0s
// N is the size of U and V.  P is expected to have size 2N
// nv is the actually used size of V.  
// If nv is less than 1/2N, computation of 0-value higher terms is avoided
{
   if (N<=15)   // actually, N should always be >= 8
   {
      FinalMultiply[N](U,V,P);
      return;
   }
      
    memset(P,0,2*N*sizeof(ushort));
   if (nv>N/2)   
   {                                 
      const ushort* u0=U;                        
      const ushort* u1=U+N/2;                  
      const ushort* v0=V;                     
      const ushort* v1=V+N/2;                     
      ushort* p0=P;
      ushort* p1=P+N/2;
      ushort* p2=P+N;
      ushort* temp0=P+2*N;
      ushort* temp1=temp0+N/2;
      ushort* temp2=temp0+N;   
// Main level of recursion;  each level results in 3 calls to Multiply at a deeper level.         
// the idea is:
// P = U * V = (u1,u0) * (v1,v0) 
//       where U is split into high order terms (u1) and low order terms (u0):
//       P = u1*v1, u1*v0 + u0*v1, u0*v0.
// The terms can be rearranged, so that only 3 instead of 4 multiplications are needed:
//      P = u1*v1, u1*v1 + (u1-u0) * (v0-v1) + u0*v0, u0*v0
// The downside is that products must be computed into temporary space and copied
//       but the upside is that, with recursion, much fewer operations are carried out.
//      Running time is proportional to N ** 1.58 for square problems 
//          (size(U)=size(V)=N)
      Multiply(u0,v0,temp2,N/2,nv);         
   
      Accumulate(p0,temp2,N);            
      Accumulate(p1,temp2,N);   
      
      Multiply(u1,v1,temp2,N/2,nv);
      Accumulate(p1,temp2,N);
      Accumulate(p2,temp2,N);
      
      int negative=Subtract(u1,u0,temp0,N/2);   
      negative ^= Subtract(v0,v1,temp1,N/2);
      
      Multiply(temp0,temp1,temp2,N/2,nv);
      
      if (negative) 
         NegAccumulate(p1,temp2,N);
      else
         Accumulate(p1,temp2,N);

   } else   // v1=0: the terms involving v1 are not needed
   {
      const ushort* u0=U;      // pointers into the allocated space
      const ushort* u1=U+N/2;
      const ushort* v0=V;
      ushort* p0=P;
      ushort* p1=P+N/2;
      ushort* temp0=P+2*N;
// Optimizes the case of asymmetrical problems (V much smaller than U).
// The first or first few recursions may need only 2 multiplications, but deeper levels
// may not be asymmetrical and the main scheme (above) will kick in then.
      Multiply(u0,v0,temp0,N/2,nv);
      Accumulate(p0,temp0,N);
      
      Multiply(u1,v0,temp0,N/2,nv);
      Accumulate(p1,temp0,N);
   }
}

ulong RoundUp(ulong N)
// Rounds N up to a value X * 2**Y, where X <= kMultiplyThreshold.
// This ensures that terms can be split into equal halves recursively
// until there are <= kMultiplyThreshold terms.
{
   ulong mask=-1,m2;
   while ((m2=mask>>1)>N) mask=m2;
   mask -= mask>>kMTbits;
   if ((N & mask) < N)
      N = (N & mask) + (mask & (mask>>(kMTbits-1)));
   return N;  
}

BigMult.h

typedef unsigned short ushort;
typedef unsigned long ulong;

// Constants and prototypes for the faster multiplier.
enum {
   kMTbits            = 4, 
   kMultiplyThreshold    = (1<<kMTbits)-1 
};
void  Multiply(const ushort* U,const ushort* V,ushort* P,const int N,const long nv);
ulong RoundUp(ulong N);
 
AAPL
$99.76
Apple Inc.
+2.09
MSFT
$44.08
Microsoft Corpora
+0.45
GOOG
$520.84
Google Inc.
+9.67

MacTech Search:
Community Search:

Software Updates via MacUpdate

TechTool Pro 7.0.5 - Hard drive and syst...
TechTool Pro is now 7, and this is the most advanced version of the acclaimed Macintosh troubleshooting utility created in its 20-year history. Micromat has redeveloped TechTool Pro 7 to be fully 64... Read more
PDFKey Pro 4.0.2 - Edit and print passwo...
PDFKey Pro can unlock PDF documents protected for printing and copying when you've forgotten your password. It can now also protect your PDF files with a password to prevent unauthorized access and/... Read more
Yasu 2.9.1 - System maintenance app; per...
Yasu was originally created with System Administrators who service large groups of workstations in mind, Yasu (Yet Another System Utility) was made to do a specific group of maintenance tasks... Read more
Hazel 3.3 - Create rules for organizing...
Hazel is your personal housekeeper, organizing and cleaning folders based on rules you define. Hazel can also manage your trash and uninstall your applications. Organize your files using a... Read more
Autopano Giga 3.7 - Stitch multiple imag...
Autopano Giga allows you to stitch 2, 20, or 2,000 images. Version 3.0 integrates impressive new features that will definitely make you adopt Autopano Pro or Autopano Giga: Choose between 9... Read more
MenuMeters 1.8 - CPU, memory, disk, and...
MenuMeters is a set of CPU, memory, disk, and network monitoring tools for Mac OS X. Although there are numerous other programs which do the same thing, none had quite the feature set I was looking... Read more
Coda 2.5 - One-window Web development su...
Coda is a powerful Web editor that puts everything in one place. An editor. Terminal. CSS. Files. With Coda 2, we went beyond expectations. With loads of new, much-requested features, a few... Read more
Arq 4.6.1 - Online backup to Google Driv...
Arq is super-easy online backup for the Mac. Back up to your own Google Drive storage (15GB free storage), your own Amazon Glacier ($.01/GB per month storage) or S3, or any SFTP server. Arq backs up... Read more
Airfoil 4.8.10 - Send audio from any app...
Airfoil allows you to send any audio to AirPort Express units, Apple TVs, and even other Macs and PCs, all in sync! It's your audio - everywhere. With Airfoil you can take audio from any... Read more
Apple iMovie 10.0.6 - Edit personal vide...
With an all-new design, Apple iMovie lets you enjoy your videos like never before. Browse your clips more easily, instantly share your favorite moments, and create beautiful HD movies and Hollywood-... Read more

Latest Forum Discussions

See All

Angry Birds Transformers Review
Angry Birds Transformers Review By Jennifer Allen on October 20th, 2014 Our Rating: :: TRANSFORMED BIRDSUniversal App - Designed for iPhone and iPad Transformed in a way you wouldn’t expect, Angry Birds Transformers is a quite... | Read more »
GAMEVIL Announces the Upcoming Launch of...
GAMEVIL Announces the Upcoming Launch of Mark of the Dragon Posted by Jessica Fisher on October 20th, 2014 [ permalink ] Mark of the Dragon, by GAMEVIL, put | Read more »
Find Free Food on Campus with Ypay
Find Free Food on Campus with Ypay Posted by Jessica Fisher on October 20th, 2014 [ permalink ] iPhone App - Designed for the iPhone, compatible with the iPad | Read more »
Strung Along Review
Strung Along Review By Jordan Minor on October 20th, 2014 Our Rating: :: GOT NO STRINGSUniversal App - Designed for iPhone and iPad A cool gimmick and a great art style keep Strung Along from completely falling apart.   | Read more »
P2P file transferring app Send Anywhere...
File sharing services like Dropbox have security issues. Email attachments can be problematic when it comes to sharing large files. USB dongles don’t fit into your phone. Send Anywhere, a peer-to-peer file transferring application, solves all of... | Read more »
Zero Age Review
Zero Age Review By Jordan Minor on October 20th, 2014 Our Rating: :: MORE THAN ZEROiPad Only App - Designed for the iPad With its mind-bending puzzles and spellbinding visuals, Zero Age has it all.   | Read more »
Hay Ewe Review
Hay Ewe Review By Campbell Bird on October 20th, 2014 Our Rating: :: SAVE YOUR SHEEPLEUniversal App - Designed for iPhone and iPad Pave the way for your flock in this line drawing puzzle game from the creators of Worms.   | Read more »
My Very Hungry Caterpillar (Education)
My Very Hungry Caterpillar 1.0.0 Device: iOS Universal Category: Education Price: $3.99, Version: 1.0.0 (iTunes) Description: Care for your very own Very Hungry Caterpillar! My Very Hungry Caterpillar will captivate you as he crawls... | Read more »
Dungeon Dick (Games)
Dungeon Dick 1.1 Device: iOS Universal Category: Games Price: $.99, Version: 1.1 (iTunes) Description: Dungeon Dick is a fantasy adventure where you must discover the wicked plot to destroy the lands . 'Fling' at your foes and land... | Read more »
Here’s How the Apple Watch Could Transfo...
With the Apple Watch’s generic release date of, “early 2015” hovering on the horizon, it’s only a matter of time before gamers begin to ask “What’s in it for us?” The obvious choice would be to place entire games directly on the face of the watch,... | Read more »

Price Scanner via MacPrices.net

2013 15-inch 2.0GHz Retina MacBook Pro availa...
B&H Photo has leftover previous-generation 15″ 2.0GHz Retina MacBook Pros now available for $1599 including free shipping plus NY sales tax only. Their price is $400 off original MSRP. B&H... Read more
Updated iPad Prices
We’ve updated our iPad Air Price Tracker and our iPad mini Price Tracker with the latest information on prices and availability from Apple and other resellers, including the new iPad Air 2 and the... Read more
Apple Pay Available to Millions of Visa Cardh...
Visa Inc. brings secure, convenient payments to iPad Air 2 and iPad mini 3as well as iPhone 6 and 6 Plus. Starting October 20th, eligible Visa cardholders in the U.S. will be able to use Apple Pay,... Read more
Textkraft Pocket – the missing TextEdit for i...
infovole GmbH has announced the release and immediate availability of Textkraft Pocket 1.0, a professional text editor and note taking app for Apple’s iPhone. In March 2014 rumors were all about... Read more
C Spire to offer iPad Air 2 and iPad mini 3,...
C Spire on Friday announced that it will offer iPad Air 2 and iPad mini 3, both with Wi-Fi + Cellular, on its 4G+ LTE network in the coming weeks. C Spire will offer the new iPads with a range of... Read more
Belkin Announces Full Line of Keyboards and C...
Belkin International has unveiled a new lineup of keyboard cases and accessories for Apple’s newest iPads, featuring three QODE keyboards and a collection of thin, lightweight folios for both the... Read more
Verizon offers new iPad Air 2 preorders for $...
Verizon Wireless is accepting preorders for the new iPad Air 2, cellular models, for $100 off MSRP with a 2-year service agreement: - 16GB iPad Air 2 WiFi + Cellular: $529.99 - 64GB iPad Air 2 WiFi... Read more
Price drops on refurbished Mac minis, now ava...
The Apple Store has dropped prices on Apple Certified Refurbished previous-generation Mac minis, with models now available starting at $419. Apple’s one-year warranty is included with each mini, and... Read more
Apple refurbished 2014 MacBook Airs available...
The Apple Store has Apple Certified Refurbished 2014 MacBook Airs available for up to $180 off the cost of new models. An Apple one-year warranty is included with each MacBook, and shipping is free.... Read more
Refurbished 2013 MacBook Pros available for u...
The Apple Store has Apple Certified Refurbished 13″ and 15″ MacBook Pros available starting at $929. Apple’s one-year warranty is standard, and shipping is free: - 13″ 2.5GHz MacBook Pros (4GB RAM/... Read more

Jobs Board

*Apple* Retail - Multiple Positions (US) - A...
Job Description: Sales Specialist - Retail Customer Service and Sales Transform Apple Store visitors into loyal Apple customers. When customers enter the store, Read more
Position Opening at *Apple* - Apple (United...
…customers purchase our products, you're the one who helps them get more out of their new Apple technology. Your day in the Apple Store is filled with a range of Read more
Position Opening at *Apple* - Apple (United...
**Job Summary** At the Apple Store, you connect business professionals and entrepreneurs with the tools they need in order to put Apple solutions to work in their Read more
Position Opening at *Apple* - Apple (United...
**Job Summary** The Apple Store is a retail environment like no other - uniquely focused on delivering amazing customer experiences. As an Expert, you introduce people Read more
Position Opening at *Apple* - Apple (United...
**Job Summary** As businesses discover the power of Apple computers and mobile devices, it's your job - as a Solutions Engineer - to show them how to introduce these Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.