
Inline Code
Volume Number: 1
Issue Number: 9
Column Tag: Forth Forum

"Inline Code for MacForth"

By Jörg Langowski, Chemical Engineer, Fed. Rep. of Germany, MacTutor Editorial Board

Speeding up Forth with Inline Code

When you use your computer for applications that require a lot of data shuffling and calculations, work with large arrays and matrices and so on, you tend to become a little paranoid about speed. Although Forth code is very compact through its threaded structure, and word execution (i.e. subroutine calling) is reasonably well optimized in MacForth (see MacTutor V1 No2), I have always felt uncomfortable with the overhead that goes into the execution of a simple word like DROP, whose 'active part' consists of one 16-bit word of machine code.

Just as a reminder: when the Forth system executes the token for DROP in a definition, it calls a subroutine that looks like this:

DROP  ADDQ.L  #4,A7
      JMP     (A4)

So a simple 4-byte increment of the stack pointer does the DROP job. But then the next token has to be fetched and executed by jumping to the NEXT routine, whose address is contained in A4, the base pointer. This makes for an overhead of several hundred percent compared to the increment itself. The overhead is less dramatic with other words, but it is still there: all in all, the Sieve benchmark needs 21 seconds to run in MacForth, compared to 9 seconds in compiled C (Consulair).

How can we speed up the code? After all, we have complete control over what goes into the dictionary and could put the machine code that we need right in there, no need for time-expensive subroutine calling. This is what the Forth 2.0 assembler enables you to do. However, if you create a piece of code in Forth assembler, it tends to look much more cryptic than 'normal' assembler, which after all is readable with adequate documentation.

It would be much nicer if we had a means to create the assembly code that corresponds to a DROP by writing a similar word, such as %DROP: something like a macro. No need to worry about which registers to use, and you could use 'almost normal' Forth code for writing your routine.

It shouldn't be that difficult to persuade the Forth system to execute machine code that is embedded in a definition. Every Forth word starts with at least one executable piece of machine code, trap calls for Forth-defined words such as colon definitions and 'real' 68000 code for machine code definitions. However, this gives you either machine code or Forth, not both. Our goal is to define words that allow switching between 68000 and Forth code within one definition. Similar words do exist in the Forth 2.0 assembler, but it lacks a set of macros that allow you to write inline Forth code instead of assembly code. Furthermore, you cannot define control structures that easily.

Assume we have Forth code that looks like this:

 ...
 <token 1>
 <token 2>
 <token X>
 <machine instruction 1>
 <machine instruction 2>
 ...

etc. This sequence of instructions will get executed just fine if <token X> is a word that transfers execution to the word just following. We'll call this word >CODE and define it as follows:

: >CODE 
    here 2+ make.token w, [compile] [ ;
    immediate

This word, which is executed during compilation, takes the next free address in the dictionary, adds 2 (this is where execution of the machine code is to start) and compiles this address as a token into the dictionary. Since a token just tells the Forth interpreter 'jump to the address that I refer to', machine code execution will start at the address following >CODE.

This is what happens at execution time. At compilation time, the words following >CODE in the input stream are executed, not compiled (this is what the [COMPILE] [ does). Therefore, if the words following >CODE are macros that stuff assembly code into the dictionary, you have your inline code right there.

We'll get to those macros in a minute. First, there remains the problem of how to get out of the machine code. You might recall that all machine-level Forth definitions finish with a

 JMP  (A4)

and the NEXT routine, pointed to by A4, gets the next token from the Forth code. The pointer to the next token is in register A3. Unfortunately, after we executed >CODE, A3 remained unchanged and still points to the word following the >CODE token, which is 68000 code and certainly nothing that the interpreter will swallow. Therefore we have to reset A3 before we jump back into the Forth interpreter. This is what the word >FORTH does:

: >FORTH 47fa0004 , 4ed4 w, [compile] ] ;

 LEA  4(PC),A3
 JMP  (A4)

Remember, when >FORTH appears in the input stream, we are still in execution mode, from the preceding >CODE (unless we mixed things up). So >FORTH gets executed when used in a definition; it assembles code that loads A3 with the address following the JMP, then executes the JMP. Then the mode is switched back to regular Forth compilation again.
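The displacement of 4 can be checked by hand. On the 68000, a PC-relative effective address is computed from the address of the extension word, that is, the instruction address plus 2. The following Python sketch redoes that arithmetic; the load address 0x1000 and the function name are hypothetical and chosen only for illustration.

```python
def lea_pc_target(instr_addr, disp):
    """Address loaded by LEA disp(PC),An located at instr_addr.

    For PC-relative addressing the 68000 uses the address of the
    extension word (instr_addr + 2) as the PC value.
    """
    return instr_addr + 2 + disp

LEA_SIZE = 4   # 47FA 0004: opcode word plus displacement word
JMP_SIZE = 2   # 4ED4: JMP (A4)

start = 0x1000                         # hypothetical address where >FORTH lays down its code
a3 = lea_pc_target(start, 4)           # value that ends up in A3
after_jmp = start + LEA_SIZE + JMP_SIZE
print(hex(a3), hex(after_jmp))
```

Both values agree, so A3 points at the first token compiled after the JMP, which is where the interpreter should resume.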

Between >CODE and >FORTH we can now place our macros that generate inline machine code corresponding to Forth primitives. The code for any of the primitives is found very easily by disassembling from the original Forth system. Of course, you may define your own code, use different registers than the MacForth definitions do or optimize the code. For instance, the built-in multiplication routine is a prime candidate for removing overhead. The routine *, which calls the multiplication primitive, M*, always does a 32- by 32-bit multiply and then drops the upper 32 bits of the double precision product. Some sloppiness on the part of Creative Solutions, I presume. Of course, a direct 16- by 16-bit multiply would be much faster.
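To illustrate when a 16- by 16-bit multiply can stand in for the truncated 32- by 32-bit one, the following Python sketch compares the two results. The function names are hypothetical, and the model only covers signed operands that fit in 16 bits; it says nothing about timing.

```python
def low32_signed(x):
    """Interpret the low 32 bits of x as a signed 32-bit number."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def star_32x32(a, b):
    """Model of * as described in the text: full product, upper 32 bits dropped."""
    return low32_signed(a * b)

def mul_16x16(a, b):
    """Model of a signed 16- by 16-bit multiply with a 32-bit result."""
    if not (-32768 <= a <= 32767 and -32768 <= b <= 32767):
        raise ValueError("operands must fit in 16 bits")
    return a * b   # the product of two 16-bit values always fits in 32 bits

samples = [(0, 0), (3, 7), (-5, 12), (32767, 32767),
           (-32768, -32768), (-32768, 32767)]
agree = all(star_32x32(a, b) == mul_16x16(a, b) for a, b in samples)
print(agree)
```

The substitution is therefore only safe where both operands are known to stay within the 16-bit range, as they do for small array indices.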

I have written the macros in hex code, so that they'll work without the assembler, in case you are using Forth 1.1. The machine code is given as a comment in the program text.
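As a rough illustration of what the hex macros do, the following Python sketch models `,` (a 32-bit cell) and `w,` (a 16-bit word) as big-endian byte writers and applies them to the definition of %swap from the listing. The Python function names are hypothetical stand-ins for the Forth words.

```python
import struct

def comma(value):
    """Model of Forth `,`: one 32-bit cell, big-endian as on the 68000."""
    return struct.pack(">I", value & 0xFFFFFFFF)

def w_comma(value):
    """Model of MacForth `w,`: one 16-bit word, big-endian."""
    return struct.pack(">H", value & 0xFFFF)

# %swap is defined as: 202f0004 , 2f570004 , 2e80 w,
swap_code = comma(0x202F0004) + comma(0x2F570004) + w_comma(0x2E80)

# Split the byte stream back into 16-bit words to compare with the comments.
words = [swap_code[i:i + 2].hex() for i in range(0, len(swap_code), 2)]
print(words)
```

The words come out as 202f 0004 (MOVE.L 4(A7),D0), 2f57 0004 (MOVE.L (A7),4(A7)) and 2e80 (MOVE.L D0,(A7)), matching the comments that accompany the macro.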

Literals

The %LIT and %WLIT macros serve as a means to put constants and addresses on the stack. They compile a long move or a word move instruction, respectively, with the number that is on the stack at compilation time compiled as the data word(s) following the instruction. So the way to put the address of a variable on the stack in inline code is just to write: <variable> %LIT.
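A minimal Python sketch of the bytes these two macros lay down, assuming the opcodes 2F3C and 3F3C given in the listing; the function names are hypothetical.

```python
import struct

def pct_lit(n):
    """Bytes compiled by %LIT: MOVE.L #n,-(A7), opcode 2F3C plus 32-bit data."""
    return struct.pack(">HI", 0x2F3C, n & 0xFFFFFFFF)

def pct_wlit(n):
    """Bytes compiled by %WLIT: MOVE #n,-(A7), opcode 3F3C plus 16-bit data."""
    return struct.pack(">HH", 0x3F3C, n & 0xFFFF)

lit_bytes = pct_lit(8192)    # e.g. the constant SIZE from the Sieve listing
wlit_bytes = pct_wlit(3)
print(lit_bytes.hex(), wlit_bytes.hex())
```

Note that %LIT occupies six bytes and %WLIT four, so the literal travels with the instruction and no token fetch is needed at run time.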

Control Structures

The goal was to speed up the Sieve benchmark (as an example). Of course, the code would be far from optimal if we still had to use the Forth control structures; they should be coded inline, too. This means we have to keep track of addresses that we want to branch to.

The program below provides two examples, %IF...%ELSE...%THEN and %DO...%LOOP. The other control structures are not included, since they weren't necessary for this particular example. But after reading through, you should be able to write your own code for them.

%IF compiles a branch that is taken when the number on top of the stack is zero. The branch is first compiled with a zero displacement, and the dictionary address of the displacement word (HERE) is pushed on the compilation-time stack. When %THEN is encountered, the branch displacement is calculated and stored at that address. %ELSE resolves the %IF branch in the same way, but first compiles an unconditional branch that is taken at the end of the %IF part. This second branch is resolved at the %THEN.
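The forward-branch bookkeeping can be modelled in a few lines of Python. This is only a sketch: the `Dictionary` class, the explicit `stack` list and the function names are hypothetical, while the opcodes and the order of operations follow the definitions of %if, %else and %then in the listing.

```python
import struct

class Dictionary:
    """Toy model of the dictionary: a byte buffer whose length is HERE."""

    def __init__(self):
        self.mem = bytearray()

    def here(self):
        return len(self.mem)

    def comma(self, value):            # Forth `,`  (32 bits)
        self.mem += struct.pack(">I", value & 0xFFFFFFFF)

    def w_comma(self, value):          # MacForth `w,` (16 bits)
        self.mem += struct.pack(">H", value & 0xFFFF)

    def w_store(self, value, addr):    # MacForth `w!`
        self.mem[addr:addr + 2] = struct.pack(">H", value & 0xFFFF)

def pct_if(d, stack):
    d.comma(0x4A9F6700)                # TST.L (A7)+ ; BEQ with 16-bit displacement
    stack.append(d.here())             # address of the displacement word
    d.w_comma(0)                       # placeholder, filled in later

def pct_then(d, stack):
    addr = stack.pop()
    d.w_store(d.here() - addr, addr)   # displacement counts from the displacement word

def pct_else(d, stack):
    d.w_comma(0x6000)                  # BRA with 16-bit displacement
    new_addr = d.here()
    d.w_comma(0)
    pct_then(d, stack)                 # %IF branch lands just after the BRA
    stack.append(new_addr)             # left for the closing %THEN

d, stack = Dictionary(), []
pct_if(d, stack)
d.w_comma(0x588F)                      # true part: %drop
pct_else(d, stack)
d.w_comma(0x2F17)                      # false part: %dup
pct_then(d, stack)
print(d.mem.hex())
```

In the resulting bytes the BEQ displacement is 8 and the BRA displacement is 4, so the conditional branch lands on the false part and the unconditional one lands just behind it.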

The code compiled by %DO takes the initial and final values from the stack and puts them on the return stack. During compilation, HERE is put on the stack as a reference for the backward branch taken by %LOOP. %LOOP compiles code that increments the loop counter by one and tests it against the limit; if it is still below the limit, the backward branch is taken (its displacement is calculated at compilation time). %+LOOP behaves just like %LOOP, except that the increment is the number on top of the stack. Note one difference between %+LOOP and the usual Forth +LOOP: while the latter works with positive and negative loop increments, ours works only with positive ones. I did this in the interest of speed.
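The run-time behaviour of the compiled loop can be sketched in Python as a generator. The function name is hypothetical, and raising an error for a non-positive increment is a choice of this model; the compiled code itself performs no such check.

```python
def inline_loop(limit, start, step=1):
    """Yield the index values seen by the body of %DO ... %LOOP / %+LOOP.

    Mirrors the compiled code: the body runs first, then the index is
    incremented, and the backward branch is taken while limit > index.
    """
    if step <= 0:
        raise ValueError("this model, like %+LOOP, assumes a positive increment")
    index = start
    while True:
        yield index
        index += step
        if not (limit > index):
            break

plain = list(inline_loop(5, 0))        # 5 0 %do ... %loop
stepped = list(inline_loop(10, 0, 3))  # 10 0 %do ... 3 %+loop
once = list(inline_loop(0, 0))         # the body still runs once
print(plain, stepped, once)
```

Because the test comes after the body, a loop whose start already equals its limit still executes once.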

The Sieve Benchmark

With all these macros available we can now recode the Sieve of Eratosthenes prime number benchmark into inline machine code. The changes that have to be made to the Forth code are only minor ones. At the point where the inline code is supposed to start, we insert >CODE; all Forth words thereafter are inline macros. They are distinguished from the regular Forth words by the preceding percent sign. When the inline part ends, we write >FORTH to jump back into interpreter mode.

The resulting code works (!!) and executes in 9.7 seconds, as compared to 21 seconds for the Forth code.
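For readers who want to check the count printed by PRIMES without a Macintosh, here is a Python port of the benchmark algorithm, written as a sketch that follows the standard Forth version word by word. The function names are hypothetical, and the trial-division helper is added here only as an independent cross-check.

```python
SIZE = 8192

def primes_sieve(size=SIZE):
    """Python port of the benchmark word PRIMES; returns the count it prints."""
    flags = bytearray([1]) * size          # flags size 01 fill
    count = 0
    for i in range(size):                  # size 0 do ... loop
        if flags[i]:
            prime = 3 + i + i              # 3 i+ i+
            k = prime + i                  # dup i+
            if k < size:                   # size <
                # inner do ... +loop: clear every prime-th flag from k on
                for j in range(k, size, prime):
                    flags[j] = 0
            count += 1                     # drop 1+
    return count

def count_odd_primes(limit):
    """Independent check by trial division: odd primes in 3..limit."""
    def is_prime(n):
        d = 3
        while d * d <= n:
            if n % d == 0:
                return False
            d += 2
        return True
    return sum(1 for n in range(3, limit + 1, 2) if is_prime(n))

print(primes_sieve())
```

Flag number i stands for the odd number 2i + 3, so the sieve covers the odd numbers from 3 to 2 * (SIZE - 1) + 3, and both functions should report the same count.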

Inline compiler definitions ( 060585 jl )

(c) June 1985 MacTutor by J. Langowski

This code is meant as an example for speeding up time-critical Forth code through the insertion of inline machine code. The words defined here are by no means a complete Forth compiler. No attempt was made to use the same words as standard Forth and do context switching; I felt that this would have been a) more complicated and b) actually confusing, because you tend to lose track of when you are in inline mode and when in interpreted Forth mode. Therefore, all inline words are compiled into the standard Forth vocabulary and have the names of the corresponding Forth words preceded by a '%'. The only control structures are %IF...%ELSE...%THEN and %DO...%LOOP/%+LOOP, where the %+LOOP works only for positive increments. You are encouraged to build other control structures, using the same principles.

( inline assembly macros)  ( 060285 jl )
hex
: >code here 2+ make.token w, [compile] [ ;  
immediate

: >forth 47fa0004 , 4ed4 w, [compile] ] ;
{LEA  4(PC),A3 }
{JMP  (A4)} 
: %swap 202f0004 , 2f570004 , 2e80 w, ;
{MOVE.L 4(A7),D0 }
{MOVE.L (A7),4(A7) }
{MOVE.L D0,(A7)  }
 
: %drop 588f w, ; { ADDQ.L  #4,A7  }
: %dup 2f17 w, ;  { MOVE.L  (A7),-(A7) }
: %over 2f2f0004 , ; { MOVE.L 4(A7),-(A7) }

: %+! 205f201f , d190 w, ;
{MOVE.L (A7)+,A0 }
{MOVE.L (A7)+,D0 }
{ADD.L  D0,(A0)  }

: %rot 202f0008 , 2f6f0004 , 0008 w,
       2f570004 , 2e80 w, ;
{MOVE.L 8(A7),D0 }
{MOVE.L 4(A7),8(A7)}
{MOVE.L (A7),4(A7) }
{MOVE.L D0,(A7)  }

: %+ 201fd197 , ;  
{MOVE.L (A7)+,D0 }
{ADD.L  D0,(A7)  }
  
: %- 201f9197 , ;
{MOVE.L (A7)+,D0 }
{SUB.L  D0,(A7)  }

: %i 2f16 w, ;     { MOVE.L   (A6),-(A7) }
: %j 2f2e0008 , ;  { MOVE.L  8(A6),-(A7) }
: %k 2f2e0010 , ;  { MOVE.L 16(A6),-(A7) }
{ %k is a word that does not exist in 
  MacForth, but is very useful to extract 
  a loop index one level further down    }

: %i+ 2017d096 , 2e80 w, ;
{MOVE.L (A7),D0  }
{ADD.L  (A6),D0  }
{MOVE.L D0,(A7)  }

: %c@ 42802057 , 10102e80 , ;
{CLR.L  D0}
{MOVE.L (A7),A0  }
{MOVE.B (A0),D0  }
{MOVE.L D0,(A7)  }
  
: %w@ 20574257 , 3f500002 , ;
{MOVE.L (A7),A0  }
{CLR.W  (A7)}
{MOVE (A0),2(A7) }

: %@ 20572e90 , ;
{MOVE.L (A7),A0  }
{MOVE.L (A0),(A7)}
  
: %c! 205f201f , 1080 w, ;
{MOVE.L (A7)+,A0 }
{MOVE.L (A7)+,D0 }
{MOVE.B D0,(A0)  }

: %w! 205f201f , 3080 w, ;
{MOVE.L (A7)+,A0 }
{MOVE.L (A7)+,D0 }
{MOVE D0,(A0)  }
   
: %! 205f209f , ;
{MOVE.L (A7)+,A0 }
{MOVE.L (A7)+,(A0) }

: %>r 2d1f w, ;  { MOVE.L  (A7)+,-(A6)  }  
: %r> 2f1e w, ;  { MOVE.L  (A6)+,-(A7)  }

: %ic!  201f2056 , 1080 w, ;
{MOVE.L (A7)+,D0 }
{MOVE.L (A6),A0  }
{MOVE.B D0,(A0)  }

: %lit 2f3c w, , ;
{MOVE.L #xxxx,-(A7)}
{ where xxxx is compiled from the stack 
  into the next four bytes }
  
: %wlit 3f3c w, w, ;
{MOVE #xxxx,-(A7)}
{ and compile top of stack into next word }

: %< 4280bf8f , 6c025380 , 2f00 w, ;
{CLR.L  D0}
{CMPM.L (A7)+,(A7)+}
{BGE  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.L D0,-(A7) }

: %> 4280bf8f , 6f025380 , 2f00 w, ;
{CLR.L  D0}
{CMPM.L (A7)+,(A7)+}
{BLE  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.L D0,-(A7) }

: %= 4280bf8f , 66025380 , 2f00 w, ;
{CLR.L  D0}
{CMPM.L (A7)+,(A7)+}
{BNE  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.L D0,-(A7) }

: %0= 42804a97 , 66025380 , 2e80 w, ;
{CLR.L  D0}
{TST.L  (A7)}
{BNE  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.L D0,(A7) }

: %0< 42804a97 , 6a025380 , 2e80 w, ;
{CLR.L  D0}
{TST.L  (A7)}
{BPL  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.L D0,(A7) }

: %0> 42804a97 , 6f025380 , 2e80 w, ;
{CLR.L  D0}
{TST.L  (A7)}
{BLE  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.L D0,(A7) }

: %and 201fc197 , ;
{MOVE.L (A7)+,D0 }
{AND.L  D0,(A7)  }
  
: %or 201f8197 , ;
{MOVE.L (A7)+,D0 }
{OR.L D0,(A7)  }

: %if 4a9f6700 , here 0 w, ;
{TST.L  (A7)+  }
{BEQ  xxxx}
{ xxxx is a 16 bit displacement that is 
  resolved by %THEN   }

: %then here over - swap w! ;
: %else 6000 w, here 0 w, swap %then ;
{BRA  xxxx}
{ resolves preceding %IF and leaves new
  empty unconditional branch to be filled
  by %THEN     }

: %do 2d2f0004 , 2d1f588f , here ;
{MOVE.L 4(A7),-(A6)}
{MOVE.L (A7)+,-(A6)}
{ADDQ.L #4,A7  }
{ leaves HERE on the stack for back branch
  by %LOOP or %+LOOP      }

: %loop 5296204e , b1886e00 , 
                 here - w, ddfc w, 8 , ;
{ADDQ.L #1,(A6)  }
{MOVE.L A6,A0  }
{CMPM.L (A0)+,(A0)+}
{BGT  xxxx}
{ADDA.L #8,A6  }
{ the last instruction cleans up the return
  stack. Branch resolved in this word     }

: %+loop 201fd196 , 204e w, b1886e00 , 
                 here - w, ddfc w, 8 , ;
{MOVE.L (A7)+,D0 }
{ADD.L  D0,(A6)  }
{MOVE.L A6,A0  }
{CMPM.L (A0)+,(A0)+}
{BGT  xxxx}
{ADDA.L #8,A6  }

decimal

( Eratosthenes Sieve Benchmark,
             inline code) ( 060285 jl )
 8192 constant size  
 create flags  size allot
: primes   flags  size 01 fill 
  >code 0 %lit size %lit 0 %lit
    %do  flags %lit %i+ %c@
       %if 3 %lit %i+ %i+ %dup %i+ 
             size %lit %<
         %if size %lit flags %lit %+ 
           %over %i+ flags %lit %+
           %do 0 %lit %ic! %dup %+loop
         %then %drop 1 %lit %+
       %then
    %loop >forth . ." primes  "  ;
 : 10times    
   1 sysbeep 10 0 do  primes cr loop
   1 sysbeep ;

( Eratosthenes Sieve Benchmark,
                standard version)
 8192 constant size       
 create flags  size allot
: primes flags size 01 fill 
  0  size 0  
    do  flags  i+ c@
      if  3 i+ i+ dup i+  size <  
         if  size flags +  over i+  flags +
             do  0 ic!  dup  +loop
         then  drop 1+  
       then
    loop  . ." primes  "  ;
 : 10times    
   1 sysbeep 10 0 do  primes cr loop  
   1 sysbeep ;
 
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.