Speaking to Software

Volume Number: 16 (2000)
Issue Number: 9
Column Tag: Speech Recognition

Speaking to Your Software

by Erik Sea

Making your application work well with IBM ViaVoice Enhanced Edition 2.0

What Can I Say?

It's here! Talking to machines and having them respond and react has been the stuff of science fiction for decades. The promise has been so long in coming that the release of ViaVoice Millennium for Mac last year seemed to take some people by surprise - many a passerby at MacWorld San Francisco was astonished by the speed and accuracy of the system, even in noisy showfloor conditions. Nonetheless, the combination of computational power and algorithm design has finally produced speech recognition software for the Mac that permits routine and productive use, especially as fast, new copper IBM PowerPC chips find their way into more and more Macs.

ViaVoice Millennium, the first release, was a low-end product, providing dictation into a single application, SpeakPad, and non-customizable transfer scripts. It was good for basic dictation, with a large, extensible vocabulary, dictation macros, and AppleScript support. ViaVoice Enhanced builds on this capability, adding new features such as direct dictation into selected applications and allowing customization of "built-in" functions through AppleScript.

"Aha!" you say - "Direct dictation into selected applications, but what if I'm not among the 'selected' few?" Fair enough - IBM can only test and support a few high-profile programs (although the development team is always interested in testing new software for compatibility, particularly games). However, the ViaVoice software doesn't prevent dictation into any application and, in many cases, the Mac OS and ViaVoice extensions that ship with our software are all you need - your application may already support dictation and correction without you writing a single line of code!

Probably, though, you should write a line or two of code. This is essential for maintaining the awe and admiration from your employer, and I know that you really do want to anyway.

ViaVoice Speech Technology

But, before we write code, let's talk about speech. Or speak about talk, and how the ViaVoice engine decides what words it thinks you uttered.

Unlike earlier "discrete" speech recognition systems, which ...required ... distinct ... pauses ... between... words, ViaVoice works with "continuous" speech, with no unnatural breaks between words. In consumer products, we're not quite to the stage where you can have conversations with your computer, or even record or transcribe a speech or a meeting, but for one person, sitting at a computer, speaking clearly and providing cues such as punctuation and formatting, recognition is really quite good. In any case, there are other technologies that will need some work before you can say, "Tea, Earl Grey, Hot" and get what you would like.

Training

Recognition accuracy is also improved by training, which allows ViaVoice to construct a mathematical picture of your voice, which it can then use with its models.

To train the system, the user reads it a prepared story - the system knows what the words are, and what they ought to sound like. By comparing these expected sounds to the actual sounds, a difference can be calculated down to the individual sounds that make up a word. For example, suppose when I talk I make my Ts sound like Ds much of the time: "butter" sounds more like "budder" when I say it, but someone else may very crisply say "butter".

Once this picture of my voice exists, ViaVoice can predict with some level of accuracy how I might say a specific word.

Vocabularies and Language Models

The ViaVoice vocabularies (sometimes called "dictionaries") and language models are basically large databases of word pronunciations and of word positions relative to other words, respectively. You can add your own words to the vocabulary and teach ViaVoice what they sound like - an extension of the kind of spelling-dictionary operation you may be familiar with from word processors.

Language models are a bit trickier to explain. Start with a large number of typical dictation documents, feed them into a shredder, and out pops a language model at the other end. Well, maybe not a shredder. The secret is they are dissected by elves. Honest.

However they are created, you can think of a language model as a collection of trigrams (each comprising three consecutive words). The probability that a given spoken word really is a specific written word is influenced by the words around it, and trigrams capture this relationship. By no means should you equate "language model" with "grammar" - they are not at all the same, as grammars reflect complex usage rules that are difficult even for most humans to apply correctly all the time.

To illustrate further, consider the sentence "Please write to Mr. Wright right away." We hand this to the elves, who produce trigrams along the lines of those shown in Figure 1. Now, assuming that you don't pronounce "write", "Wright", and "right" in noticeably different ways, how does this system figure out which sounds correspond to which words? The trigrams from this sentence, combined with other trigrams from other sentences, end up with results like "If it's preceded by 'Mr.', it's most likely 'Wright'", and so on. ViaVoice is very good at this kind of thing and, where it makes mistakes, it can even learn not to make them in the future through correction, which improves accuracy.


Figure 1. Basic Trigram Construction.
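
For readers who think better in code, here is a minimal sketch of one plausible reading of trigram construction: slide a three-word window along the sentence. It is plain, portable C rather than anything from the ViaVoice engine, and the names SplitWords, TrigramCount, FormatTrigram, and PrintTrigrams are invented for this illustration. It also assumes words are separated by single spaces, which real text will not always honor.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define kMaxWords 64

/* Split text in place on spaces; returns the number of words found. */
size_t SplitWords (char *text, const char *words[], size_t maxWords) {
   size_t   count = 0;
   char     *token = strtok (text, " ");

   while ((token != NULL) && (count < maxWords)) {
      words[count++] = token;
      token = strtok (NULL, " ");
   }
   return count;
}

/* A sequence of n words contains n - 2 overlapping trigrams. */
size_t TrigramCount (size_t wordCount) {
   return (wordCount >= 3) ? (wordCount - 2) : 0;
}

/* Write the trigram that starts at words[index] into out. */
void FormatTrigram (const char *words[], size_t wordCount,
                    size_t index, char *out, size_t outSize) {
   assert (index + 2 < wordCount);
   snprintf (out, outSize, "%s %s %s",
             words[index], words[index + 1], words[index + 2]);
}

/* Print every trigram in the sentence, one per line. */
void PrintTrigrams (char *text) {
   const char   *words[kMaxWords];
   char         line[256];
   size_t       wordCount = SplitWords (text, words, kMaxWords);
   size_t       i;

   for (i = 0; i < TrigramCount (wordCount); i++) {
      FormatTrigram (words, wordCount, i, line, sizeof (line));
      printf ("%s\n", line);
   }
}
```

Handing PrintTrigrams a writable copy of "Please write to Mr. Wright right away" yields five trigrams, starting with "Please write to" and ending with "Wright right away". Note that SplitWords modifies its input, so pass it a buffer you own rather than a string literal.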

By the way, you might want to take a moment to consider the word "to" in the Figure 1 example. How does ViaVoice know the difference between "to" and "two" and "too" when they all sound the same? The answer, once again, is by context, as captured in the trigrams!
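
To make "by context" a little more concrete, the following sketch picks among sound-alike spellings by looking up how often each candidate has followed the two preceding words. Everything here is an assumption made for illustration: the counts in kToyModel are invented toy numbers, the names LookupCount and PickCandidate are hypothetical, and a real engine would presumably weigh probabilities together with the acoustic evidence rather than compare raw counts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
   const char   *first;
   const char   *second;
   const char   *third;
   unsigned     count;   /* toy number, invented for this sketch */
} TrigramEntry;

static const TrigramEntry kToyModel[] = {
   { "Please", "write", "to",     12 },
   { "Please", "write", "two",     2 },
   { "Please", "write", "too",     1 },
   { "to",     "Mr.",   "Wright",  9 },
   { "to",     "Mr.",   "right",   1 }
};

/* How often has w3 been seen right after the pair w1 w2? */
unsigned LookupCount (const char *w1, const char *w2,
                      const char *w3) {
   size_t   i;
   size_t   entries = sizeof (kToyModel) / sizeof (kToyModel[0]);

   for (i = 0; i < entries; i++) {
      if ((strcmp (kToyModel[i].first, w1) == 0) &&
          (strcmp (kToyModel[i].second, w2) == 0) &&
          (strcmp (kToyModel[i].third, w3) == 0)) {
         return kToyModel[i].count;
      }
   }
   return 0;   /* never seen in this context */
}

/* Return the candidate spelling seen most often after w1 w2. */
const char *PickCandidate (const char *w1, const char *w2,
                           const char *const candidates[],
                           size_t candidateCount) {
   const char   *best;
   unsigned     bestCount;
   size_t       i;

   assert (candidateCount > 0);
   best = candidates[0];
   bestCount = LookupCount (w1, w2, best);
   for (i = 1; i < candidateCount; i++) {
      unsigned count = LookupCount (w1, w2, candidates[i]);
      if (count > bestCount) {
         best = candidates[i];
         bestCount = count;
      }
   }
   return best;
}
```

With this toy table, the candidates "write", "Wright", and "right" following "to Mr." resolve to "Wright", and "to", "two", and "too" following "Please write" resolve to "to". The sketch only looks backwards at two words; it says nothing about how ViaVoice itself combines context on either side of a word.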

The more you use it, the better it gets

"I find it interesting that the software is learning about me while I am learning about it."

This customer remark is based on the realization that, as you use ViaVoice, it actually continues to improve your voice model. For example, if you add words that the system doesn't know, those get added to the model. If you make corrections using the correction window (See Figure 2), those corrections get applied to the voice model.


Figure 2. Correction Window - if your associate is actually Mr. "Right".

Also, if you have text documents containing phrases, words, and names that you expect to dictate often, you can analyze those documents, which will further update the voice models. ViaVoice Enhanced has extra facilities that integrate the document analysis features into dictation, so that you don't need to store up documents and run the analysis program yourself.

It is not unusual to hear of accuracy improving from 95% to 98% with persistent use.

Speaking of Japanese

And now, on to the code, and how ViaVoice adopts and extends the Mac OS to bring you dictation as seamlessly as Japanese.

The Mac was probably the first platform to make internationalization a key design objective. As a result, many components of Mac OS, including almost all toolbox functions, are ready-made compatible with other languages and script systems.

You've probably all dealt with localization issues before, ranging from not hardcoding strings to not making assumptions about the size of the label on a button when translated. Some of you have no doubt dealt with the multibyte matters arising from making a product work correctly with Japanese, ranging from not being able to make byte = character assumptions to working nicely with Input Methods.
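
If the byte = character trap is new to you, here is a small portable sketch of it for Shift-JIS style text, where some characters occupy one byte and others two. The helper names IsShiftJISLeadByte and CountShiftJISChars are invented for this example, and the lead-byte ranges are the commonly cited Shift-JIS ones. In a real Mac application you would more likely ask the Script Manager (for example, CharacterByteType) than hardcode ranges like this.

```c
#include <assert.h>
#include <stddef.h>

/* Lead bytes of two-byte characters in Shift-JIS style encodings. */
int IsShiftJISLeadByte (unsigned char b) {
   return ((b >= 0x81) && (b <= 0x9F)) ||
          ((b >= 0xE0) && (b <= 0xFC));
}

/* Count characters, not bytes, in a Shift-JIS buffer. */
size_t CountShiftJISChars (const unsigned char *text,
                           size_t byteLength) {
   size_t   i = 0;
   size_t   chars = 0;

   assert (text != NULL);
   while (i < byteLength) {
      if (IsShiftJISLeadByte (text[i]) && (i + 1 < byteLength)) {
         i += 2;   /* lead byte plus its trail byte */
      } else {
         i += 1;   /* single-byte character (or truncated pair) */
      }
      chars++;
   }
   return chars;
}
```

For the six bytes 0x93 0xFA 0x96 0x7B 'O' 'K', which spell two kanji followed by "OK", the function reports four characters. Notice that the trail byte 0x7B is also the ASCII code for a left brace, which is exactly why scanning such text a byte at a time goes wrong.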

With the distribution of Mac OS 9, it became easier for any Mac user to install multiple input methods, for languages such as Japanese or Korean or different eastern European script systems as well as the "default" (typically Roman) keyboard system. Previously, while such capabilities were available, input methods were not widely used outside of countries where they were absolutely required.

An input method traditionally allows you to enter text in a different script system by typing a few characters, pulling up palettes based on those characters, and selecting similar symbols from those palettes.

There are two forms of input associated with an input method: inline and bottomline, shown in Figures 3 and 4 respectively. Inline input is generally preferred by users; bottomline input requires you to enter data in one place and have it show up in another place, in another font. Bottomline input also lacks the ability to go back and edit later.


Figure 3. Direct inline support in Japanese.


Figure 4. Japanese bottomline input (text entered in lower window).

In designing ViaVoice Enhanced direct dictation, we decided that we could use the input method architecture developed by Apple for non-Roman languages like Japanese, and use it for word-based input and correction. We did end up needing to extend the model slightly, as I'll describe later, but we did it in the background so that, in some circumstances, if you do the work to allow Japanese inline input, you also get ViaVoice speech inline input and correction for free!

TSM, TSMTE, IBM VV & U

These acronyms represent the key players in IBM ViaVoice dictation into any application.

Introduced way back in System 7.1, the Text Services Manager (TSM) has provided functionality for other languages (primarily multibyte languages) for years. With TSM, a savvy developer could write a few lines of initialization code, and then install four Apple event handlers that, much like AppleScript, performed operations like inserting text, showing and hiding the bottomline input window, and telling the input method where a given text offset was. TSM is well-documented in Inside Macintosh: Text, forming all of chapter 7.

Later, Apple introduced the Text Services Manager for TextEdit (TSMTE), which eliminated the need to write any of the event handling code in applications that only used TextEdit - from this point, it was only necessary to initialize the manager and let TSMTE handle the rest. This functionality is well documented in Apple's Tech Note TE27.

Full inline input was not achieved until a new fifth Apple event was added. This is the GetText or 'gtxt' event, and it is only documented in a develop article by Tague Griffith (issue 29), or in Apple's Tech Note 10002 (which is only available in Japanese).

For speech, we determined that the above was not enough. While we could have gone our own way with a completely different model and then tried to sell it to developers, we decided instead to augment the existing TSM calls by adding a couple of extra parameters to GetText, and adding another event which we call SetSelection. With just these two changes, we have the necessary and sufficient conditions for dictation and correction. Sure, we could do more with more events (and we may extend the system to enhance functionality in the future), but you're busy trying to figure out how to get your software to run under Carbon, so we thought we'd cut you a break! Oh, and as you may have inferred, if you've relied on TSMTE for inline support in Japanese, the changes to GetText and the addition of SetSelection are done for you automatically. We'll talk about these additions later when I present the implementation code.

If you don't use TextEdit exclusively, you cannot rely on TSMTE for Japanese input, and you will likewise need to do the work of handling the calls and adding the parameters yourself. But, even so, if you've done the work for Japanese (Japan being the second largest Mac market in the world), the incremental work for adding ViaVoice support is bordering on the trivial! As I write this, it may also be necessary to write these handlers for Mac OS X, whether or not you use TextEdit. By the time you read this, we'll have a better idea of what the story is. Either way, the solution is not difficult, it's just a matter of not knowing which path will be required for Mac OS X.

Adding the Ears

By now, you're probably frothing at the prospect of dictation-enabling your code. Let's get right to it. In Listing 1, you'll see how to enable TSM as part of your startup routine and disable it as part of quitting (Carbon applications don't need to do this - the OS does it for you automatically). While you're enabling TSM, know that while you must set the "high level event aware" bit in the SIZE resource, you do not need to set the "Use text edit services" bit, because it is deprecated (it actually relates to an earlier implementation, prior to TSMTE).

Listing 1: Becoming TSM Aware

Determining if TSM is available, initializing it, and cleaning up

This is more or less boilerplate code that is required for any application. Note that, under Carbon, you do not even need to make these calls, as the system will do them for you - that is, any Carbon application is TSM-aware without any special calls!


Boolean IsTSMAvailable (void) {
   
   SInt32         version;
   Boolean      available   = false;

   // Note: gestaltTSMgrAttr is not defined under Mac OS 9
   // so we use the gestaltTSMgrVersion selector instead...

   if (noErr == Gestalt (gestaltTSMgrVersion, &version)) {
      if (version >= gestaltTSMgr15) {
         available = true;
      } // if
   } // if
   
   return available;

} // IsTSMAvailable

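// Saved value of the smFontForce flag, so that CloseTSM
// can restore it on the way out...
static long gFontForce;

void StartTSM (void) {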

   // Initialize TSM, and install our event handlers...

   OSErr      err       = noErr;

   if (IsTSMAvailable ()) {
   
      #if TARGET_API_MAC_OS8
         err = InitTSMAwareApplication ();
      #endif

      // Install TSM event handlers here - see later section

   } // if   
   gFontForce = GetScriptManagerVariable (smFontForce);
   SetScriptManagerVariable (smFontForce, false);

} // StartTSM

void CloseTSM (void) {

   // Clean up all TSM things, including our event handlers...

   SetScriptManagerVariable (smFontForce, gFontForce);

   if (IsTSMAvailable ()) {
   
      #if TARGET_API_MAC_OS8
         (void) CloseTSMAwareApplication ();
      #endif
      
      // Remove AE handlers here...

   } // if   

} // CloseTSM


You'll notice that there is no special "is ViaVoice installed" code. Again, because we're an input method, the code you write works whether we're installed or not!

If you're only using TextEdit and dialog boxes, set the refcon field of your dialogs to kTSMTEInterfaceType and you're done. Go talk to your computer for a while. Tell your friends/family/coworkers I said it was OK.

Beyond TextEdit Support

Although many applications can live with just TextEdit, the 32K limit, among other things, leads many people to roll their own text engine or use a third-party code library such as WASTE (although Apple has now made MLTE, the Multilingual Text Engine, available). These will require implementation of five Apple event handlers. The complexity of these handlers, naturally, depends on how your code is laid out, but in general, you're just providing an external API to functions or data that you already have written. And you need to do 98% of this for Japanese anyway, so why not squeeze in the 2% for speech? I'll even write the code for you. Alternatively, you can use WASTE (WorldScript-Aware Styled Text Engine, by Marco Piovanelli), which is available in source code form all over the Web and does much of the work (WASTE would need to be adjusted slightly to handle some of the modified events described here).

But before we delve into the handlers themselves, some TSM terminology. The basics of TSM are discussed in chapter 7 of Inside Macintosh: Text, and also in Tague Griffith's article in develop 29 (see links section), so I will gloss over most of that - the code is pretty self explanatory when read with those references in hand. Ironically, Tague even suggests that input methods might one day be used for dictation input!

TSM keeps track of things on a document basis, where a document is a unique editable area of text. You'll need to add some extra handling to your event loop, so that TSM gets a crack at events it may need to intercept with TSMEvent(), and give TSM first crack at menu events with TSMMenuSelect() (you may have noticed that input methods typically have menus - the ViaVoice input method does not have a menu, but you do want to support Japanese too, right?). As well, when your TSM-aware documents become active and inactive, you need to tell TSM with ActivateTSMDocument() and DeactivateTSMDocument().

Input methods also might like to change the cursor (currently, the ViaVoice input method does not do so, but others do and ViaVoice may in the future), so in your cursor-management routines, or at idle time, call SetTSMCursor(). For this to work, your mouse-moved region (the final parameter to WaitNextEvent() which most people lazily set to NULL) needs to be a single point - since you have no way of knowing when TSM wants to change the cursor.

OK, that was pretty fast, but as it's been written before in the references above, I didn't want to repeat it. You can look at the sample application that comes with this article if you're lost.

The key part to dictation-enablement is the Apple event handlers. These handlers need to get installed for the input methods, including ViaVoice, to be able to extract information from your document content. ViaVoice doesn't vary much from the standard architecture, and I will highlight the differences.

Position to Offset Event

This event converts screen coordinates into an offset in your document. You receive a point, and return an offset. ViaVoice does not currently use this event.

OSErr
DoPos2Offset (DialogPtr inDialog, const AppleEvent*
                     inAppleEvent, AppleEvent* outReply) {

   Size                  actualSize;
   DescType            actualType;
   OSErr               err            = noErr;
   Boolean            dragging      = false;
   Point               currentPoint;
   SInt32               offset;
   SInt16               where;
   DialogItemType   dialogType;
   Handle               dialogHandle;
   Rect                  dialogBounds;
   GrafPtr            svPort;
   
   GetPort (&svPort);
   #if TARGET_API_MAC_OS8
      SetPort (inDialog);
   #else
      SetPortDialogPort (inDialog);
   #endif
   
   // Required parameter is a point...
   
   if (err == noErr) {
      err = AEGetParamPtr (inAppleEvent, keyAECurrentPoint,
                  typeQDPoint, &actualType, &currentPoint, 
                  sizeof (currentPoint), &actualSize);
   } // if
   
   // Optional parameter is for dragging...
   
   if (err == noErr) {
      (void) AEGetParamPtr (inAppleEvent, keyAEDragging,
                     typeBoolean, &actualType, &dragging, 
                     sizeof (dragging), &actualSize);
   } // if

   // Now, we should do all sorts of calculations, but,
   // TextEdit will more or less do this for us once we
   // figure out if it's in the right place...
   
   GlobalToLocal (&currentPoint);
   GetDialogItem (inDialog, kEditTextDialogItem, 
                        &dialogType, &dialogHandle, &dialogBounds);
   if (PtInRect (currentPoint, &dialogBounds)) {
      TEHandle      dialogTE = GetDialogTEHandle (inDialog);
      offset = TEGetOffset (currentPoint, dialogTE);
      where = kTSMInsideOfActiveInputArea;
   } else {
      // The point is not over our text at all: say so, and
      // return a defined (if meaningless) offset...
      offset = 0;
      where = kTSMOutsideOfBody;
   } // if

   // Stuff the return values here

   if (err == noErr) {
      err = AEPutParamPtr (outReply, keyAEOffset, 
               typeLongInteger, &offset, sizeof (offset));
   } // if
   if (err == noErr) {
      err = AEPutParamPtr (outReply, keyAERegionClass, 
               typeShortInteger, &where, sizeof (where));
   } // if
   
   SetPort (svPort);
   
   return err;

} // DoPos2Offset



Offset to Position Event

The reverse of Position to Offset: return a global point given a text offset. ViaVoice does not currently use this event.

OSErr
DoOffset2Pos (DialogPtr inDialog, const AppleEvent*
         inAppleEvent, AppleEvent* outReply) {

   Size                  actualSize;
   DescType            actualType;
   OSErr               err            = noErr;
   SInt32               offset;
   GrafPtr            svPort;
   Point               thePoint;
   TEHandle            teHandle      = GetDialogTEHandle (inDialog);
   Rect                  bounds;
   
   GetPort (&svPort);
   #if TARGET_API_MAC_OS8
      SetPort (inDialog);
      bounds = inDialog->portRect;
   #else
      SetPortDialogPort (inDialog);
      GetPortBounds (GetDialogPort (inDialog), &bounds);
   #endif

   // Required parameter is an offset position...
   
   if (err == noErr) {
      err = AEGetParamPtr (inAppleEvent, keyAEOffset,
                  typeLongInteger, &actualType, &offset, 
                  sizeof (offset), &actualSize);
   } // if
   
   // Convert the offset to a position, taking into 
   // account whether it's visible or not...
   
   if (err == noErr) {
      if ((offset < 0) || (offset > (**teHandle).teLength)) {
         err = errOffsetInvalid;
      } else {
         thePoint = TEGetPoint (offset, teHandle);
         if (!PtInRect (thePoint, &bounds)) {
            err = errOffsetIsOutsideOfView;
         } // if
      } // if
   } // if

   // Return the point (in global coordinates), and
   // the parameters of the text...

   if (err == noErr) {
      LocalToGlobal (&thePoint);
   } // if
   
   if (err == noErr) {
      err = AEPutParamPtr (outReply, keyAEPoint, typeQDPoint,
               &thePoint, sizeof (thePoint));
   } // if
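   if (err == noErr) {
      // txFont is a 16-bit field, so widen it into a local
      // before handing it over as a long integer...
      SInt32 theFont = (**teHandle).txFont;
      err = AEPutParamPtr (outReply, keyAETextFont, 
                  typeLongInteger, &theFont, 
                  sizeof (theFont));
   } // if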
   if (err == noErr) {
      Fixed theFixed = Long2Fix((**teHandle).txSize);
      err = AEPutParamPtr (outReply, keyAETextPointSize, 
                  typeFixed, &theFixed, sizeof (theFixed));
   } // if
   if (err == noErr) {
      err = AEPutParamPtr (outReply, keyAETextLineHeight, 
                  typeShortInteger, &(**teHandle).lineHeight,
                  sizeof (SInt16));
   } // if
   if (err == noErr) {
      err = AEPutParamPtr (outReply, keyAETextLineAscent, 
                  typeShortInteger, &(**teHandle).fontAscent,
                  sizeof (SInt16));
    } // if
   
   SetPort (svPort);
   
   return err;

} // DoOffset2Pos

Update Active Input Area Event

This event is used to hilite an area of your document as requested by the input method. ViaVoice does not currently use this event.

OSErr
DoUpdateActiveInputArea (DialogPtr inDialog, const
         AppleEvent* inAppleEvent, AppleEvent* /*outReply*/) {

   Size               actualSize;
   DescType         actualType;
   OSErr            err               = noErr;
   AEDesc            theTextDesc      = {};
   AEDesc            theHiliteDesc   = {};
   AEDesc            theUpdateDesc   = {};
   SInt32            fixLength;
   TextRange      thePinRange;
   TEHandle         teHandle         = GetDialogTEHandle (inDialog);
   ScriptLanguageRecord   scriptCode;

   // Required parameters containing firmed text, script,
   // and fixed length...

   if (err == noErr) {
      err = AEGetParamDesc (inAppleEvent, keyAETheData, 
                  typeChar, &theTextDesc);
   } // if
   if (err == noErr) {
      // Note: "Inside Macintosh - Text" says this parameter
      // is under keyAEScriptTag, but in practice it appears to
      // be under keyAETSMScriptTag...
      err = AEGetParamPtr (inAppleEvent, keyAETSMScriptTag, 
                  typeIntlWritingCode, &actualType, &scriptCode,
                  sizeof (scriptCode), &actualSize);
   } // if
   if (err == noErr) {
      // Note: "Inside Macintosh - Text" says this parameter 
      // is required, but in reality, it seems to be optional
      // and not sent (and redundant with the actual size of
      // the data in  theTextDesc) - we won't use or rely
      // on it...
      (void) AEGetParamPtr (inAppleEvent, keyAEFixLength,
                     typeLongInteger, &actualType, &fixLength,
                     sizeof (fixLength), &actualSize);
   } // if
   
   // Optional parameters hilite range list, update range,
   // and Pin range; we don't use any of these...
   
   if (err == noErr) {
      (void) AEGetParamDesc (inAppleEvent, keyAEHiliteRange,
                   typeTextRangeArray, &theHiliteDesc);
   } // if
   if (err == noErr) {
      (void) AEGetParamDesc (inAppleEvent, keyAEUpdateRange,
                   typeTextRangeArray, &theUpdateDesc);
   } // if
   if (err == noErr) {
      (void) AEGetParamPtr (inAppleEvent, keyAEPinRange,
                     typeTextRange, &actualType, &thePinRange,
                     sizeof (thePinRange), &actualSize);
   } // if
   
   // At this point, we need to be inserting text, 
   // most probably...
   
   if (err == noErr) {
      #if TARGET_API_MAC_OS8
         SInt8      hState;
         hState = HGetState ((Handle) theTextDesc.dataHandle);
         HLock ((Handle) theTextDesc.dataHandle);
         TEDelete (teHandle);   // Clean first...
         TEInsert (*(theTextDesc.dataHandle), GetHandleSize
            ((Handle) theTextDesc.dataHandle), teHandle);
         HSetState ((Handle) theTextDesc.dataHandle, hState);
      #else
         // AEDescs are opaque under Carbon. So we need
         // to allocate and copy using the accessor APIs.
         // OK, fine...
         Size         dataSize = AEGetDescDataSize (&theTextDesc);
         Handle      dataCopy = NewHandle (dataSize);
         if (dataCopy != NULL) {
            HLock (dataCopy);
            err = AEGetDescData (&theTextDesc, *dataCopy,
                      dataSize);
         } else {
            err = memFullErr;
         } // if
         if (err == noErr) {
            TEDelete (teHandle);   // Clean first...
            TEInsert (*dataCopy, dataSize, teHandle);
         } // if
         if (dataCopy != NULL) {   
            DisposeHandle (dataCopy);
         } // if
      #endif
   } // if
   
   // Clean up...
   
   (void) AEDisposeDesc (&theTextDesc);
   (void) AEDisposeDesc (&theHiliteDesc);
   (void) AEDisposeDesc (&theUpdateDesc);
   
   return err;

} // DoUpdateActiveInputArea

Get Text Event

The GetText event is a mystical event introduced by Apple Japan and, until recently, documented only in Japanese. This event allows the input method to request the application to return text that has already been committed to the document. ViaVoice uses this event to extract text for correction.

More than that, however, ViaVoice expects two additional parameters: the offset start and the offset end. Why is this? Because, unlike simple text-editing Input Methods, ViaVoice distinguishes between the first utterance of the word "the" and the second - it actually keeps track of all the dictated text for a session, the relative words, and so on, in order to make correction work. If ViaVoice were to allow correction simply based on the text of the word, all of the additional contextual information and even the audio data would be useless!

Thankfully, if you use TextEdit, the extra parameters are added for you by ViaVoice, but for other wordprocessing situations, you'll need to add them. Relatively painless, in most cases, since you probably know the offsets of the selection anyway!

OSErr
DoGetSelectedText (DialogPtr inDialog, const AppleEvent*
         /*inAppleEvent*/, AppleEvent* outReply) {

   OSErr            err            = noErr;
   TEHandle         teHandle      = GetDialogTEHandle (inDialog);
   Handle            textHandle   = (**teHandle).hText;
   SInt8            hState         = HGetState (textHandle);
   SInt64            selStart      = (**teHandle).selStart;
   SInt64            selEnd         = (**teHandle).selEnd;
   
   // The only required return is the current selected text,
   // which lives in the TextEdit record's hText handle (we
   // lock it while we point into it)...
   
   HLock (textHandle);
   if (err == noErr) {
      err = AEPutParamPtr (outReply, keyAETheData, typeText,
                *textHandle + (Size) selStart,
                (Size) (selEnd - selStart));
   } // if
   
   // For ViaVoice, we also add the numeric values of the
   // start and end of the selection within the text...
   if (err == noErr) {
      err = AEPutParamPtr (outReply, keyVVStartSelectionParam,
                typeSInt64, &selStart, sizeof (selStart));
   } // if
   if (err == noErr) {
      err = AEPutParamPtr (outReply, keyVVEndSelectionParam,
                typeSInt64, &selEnd, sizeof (selEnd));
   } // if
   
   HSetState (textHandle, hState);
   
   return err;
} // DoGetSelectedText
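
To see why the text of a word alone is not enough and the offsets matter, consider this small portable sketch. It locates the first, second, or later whole-word occurrence of a word and reports its start and end offsets, which is the kind of information the extra GetText parameters carry. The function name FindOccurrence is invented for this illustration, it handles single-byte text only, and it is not how ViaVoice tracks dictated words internally.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Find the which-th (1-based) whole-word occurrence of word in
   text. On success, returns 1 and sets the offsets so that the
   word occupies [*outStart, *outEnd). Returns 0 if not found. */
int FindOccurrence (const char *text, const char *word, int which,
                    long *outStart, long *outEnd) {
   size_t       wordLen;
   const char   *scan = text;
   int          seen = 0;

   assert ((text != NULL) && (word != NULL));
   wordLen = strlen (word);
   if ((wordLen == 0) || (which < 1)) {
      return 0;
   }
   while ((scan = strstr (scan, word)) != NULL) {
      /* Reject matches buried inside a longer word. */
      int startsWord = (scan == text) ||
                       !isalnum ((unsigned char) scan[-1]);
      int endsWord = !isalnum ((unsigned char) scan[wordLen]);

      if (startsWord && endsWord && (++seen == which)) {
         *outStart = (long) (scan - text);
         *outEnd = *outStart + (long) wordLen;
         return 1;
      }
      scan++;
   }
   return 0;
}
```

In "the cat saw the dog", the first "the" sits at offsets 0 to 3 and the second at 12 to 15. The selected text is identical in both cases, so only the offsets tell a correction command which one the user means.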

Set Selection Event

This event is new and unique to ViaVoice. For TextEdit, it is implemented for you in one of the ViaVoice extensions (with one caveat of course: if you don't expect the selection to change within your TextEdit fields, you may be surprised to see it change, or if you duplicate the selection range in one of your own data structures, you may end up out of sync).

Really, all this does is ask your application to change the active selection. This is necessary so that commands such as "correct 'the'" will work as the user expects them.

OSErr
DoSetSelection (DialogPtr inDialog, const AppleEvent*
          inAppleEvent, AppleEvent* /*outReply*/) {

   Size                  actualSize;
   DescType            actualType;
   OSErr               err            = noErr;
   TEHandle            teHandle      = GetDialogTEHandle (inDialog);
   SInt64               selStart;
   SInt64               selEnd;
   Boolean            doDraw         = false;
   // This is a ViaVoice-specific event. Retrieve the
   // selection, and the optional draw event, and do it...
   if (err == noErr) {
      err = AEGetParamPtr (inAppleEvent,
               keyVVStartSelectionParam, typeSInt64, &actualType,
               &selStart, sizeof (selStart), &actualSize);
   } // if
   if (err == noErr) {
      err = AEGetParamPtr (inAppleEvent,
               keyVVEndSelectionParam, typeSInt64, &actualType, 
               &selEnd, sizeof (selEnd), &actualSize);
   } // if
   if (err == noErr) {
      (void) AEGetParamPtr (inAppleEvent,
               keyVVDrawSelectionParam, typeBoolean, &actualType, 
               &doDraw, sizeof (doDraw), &actualSize);
   } // if

   // Clip off the ends to TextEdit range...

   if (err == noErr) {
      if (selEnd > 0x7fff) {
         selEnd = 0x7fff;
      } // if
      if (selStart > 0x7fff) {
         selStart = 0x7fff;
      } // if
      if (selStart < 0) {
         selStart = 0;
         err = paramErr;
      } // if
      if (selEnd < 0) {
         selEnd = 0;
         err = paramErr;
      } // if
   } // if
   if (err == noErr) {
      TESetSelect (selStart & 0x7fff, selEnd & 0x7fff,
                            teHandle);
   } // if
   if ((err == noErr) && doDraw) {
      // We would optionally draw here, but TESetSelect does
      // that anyway, there's no point. The idea is that the
      // screen will flicker if you honor this optional
      // parameter...
   } // if
      
   return err;

} // DoSetSelection



Fine Tuning

The Mac OS is a cooperative multitasking system. ViaVoice direct dictation involves the cooperation of at least four different applications, all of which need a slice of time. See Figure 5 for an overview of how the components interact. The recognition engine, in fact, will shut off if it doesn't get enough time to handle the incoming audio stream which, like the real world, isn't very cooperative. Audio data is big, so the amount of time before a shutdown is small. You can simulate this by clicking on a menu in Mac OS while the microphone is on.


Figure 5. Interprocess Communication within ViaVoice - and to your application - requires that everybody share the processor equitably!

So, what you need to keep in mind is that your application, when in the foreground, should be as friendly as it can be with the other processes, particularly calling WaitNextEvent frequently enough that the recognition engine gets time to process audio, the dictation manager gets enough time to assemble words and send them to your application, and the VoiceCenter gets enough time to communicate status and feedback to the user. Some applications try to avoid WaitNextEvent in order to improve their own apparent performance, but if you do this with ViaVoice, you won't get very good throughput, and you may even starve the engine to shutdown.
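
One common way to stay friendly during a long computation is to cut the work into slices and give time away between them. The sketch below shows the shape of that idea in portable C. The YieldProc callback is a stand-in I have invented for whatever your application does to give up time (on Mac OS, a pass through your event loop that ends up in WaitNextEvent), and SumWithYields and CountYield are likewise hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Called between slices of work to give other processes time. */
typedef void (*YieldProc) (void *context);

/* Add up itemCount values, yielding after every itemsPerSlice
   of them so that nobody else is starved for long. */
long SumWithYields (const long *items, size_t itemCount,
                    size_t itemsPerSlice, YieldProc yieldProc,
                    void *context) {
   long     total = 0;
   size_t   i;

   assert (itemsPerSlice > 0);
   for (i = 0; i < itemCount; i++) {
      total += items[i];
      if (((i + 1) % itemsPerSlice == 0) && (yieldProc != NULL)) {
         yieldProc (context);
      }
   }
   return total;
}

/* Example yield procedure: just counts how often it was called.
   A real one would hand time to the system instead. */
void CountYield (void *context) {
   (*(int *) context)++;
}
```

Summing ten values with a slice size of four calls the yield procedure twice, after the fourth and eighth items. Slicing by item count keeps the example simple; slicing by elapsed time is another reasonable choice when the cost of each item varies a lot.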

Not everything that you can type into is appropriate for dictation. Sure, you could say that text is text, but dictation isn't really the same as data entry. Right now, there is no way to restrict dictation to numbers, or to constrain the dictation search to a single-word answer or a set of words that might be appropriate for a given field. Rather, we're focusing on freeing up the keyboard and mouse so that the user can speak and think for prolonged periods in large bodies of text, like letters, email, or other prose. This is not for typing in a choice of eleven-point text!

Test Drive

Bring up your application, start ViaVoice direct dictation services, and activate the dictation system with the phrase "begin direct dictation". When the system is ready, click in a text field of your application and dictate the phrase "Please write to Mr. Wright right away [period]". After a couple of seconds, you should see the text appear. If there are no errors, say "correct mister"; "Mr." will highlight, and the correction window will open with alternatives, as in Figure 6.


Figure 6. Correcting in your application.

Then, you can pick one of the alternatives, or say "close correction window". For more things to try, consult the ViaVoice User's Guide.

Getting Creative

Beyond direct dictation, there are other things you can do. You can write AppleScripts to control your application to perform routine operations. You could even have a "secret about box" phrase bring up a nice little Easter egg in your product. Likewise, you could have other key phrases that, rather than processing as text, trigger behaviors or commands. I'm sure there are other things that I've not thought of yet.
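
As a rough illustration of the key-phrase idea, an application could check each incoming chunk of dictated text against a small table before inserting it. This is a hedged sketch in portable C, not part of any ViaVoice interface: CommandForPhrase, the kCmd constants, and the "save this document" phrase are hypothetical, and it assumes the text reaches your code as a C string (for example, from your own Apple event handler).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical command codes; kCmdNone means "treat as text". */
enum { kCmdNone = 0, kCmdSecretAbout = 1, kCmdSaveDocument = 2 };

typedef struct {
    const char *phrase;
    int         command;
} PhraseEntry;

static const PhraseEntry kPhrases[] = {
    { "secret about box",   kCmdSecretAbout },
    { "save this document", kCmdSaveDocument }
};

/* Returns 1 if the two strings match, ignoring letter case. */
static int EqualIgnoringCase(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Returns the command for an exact key phrase, or kCmdNone if the
   text should be inserted as ordinary dictation. */
int CommandForPhrase(const char *text)
{
    size_t i;

    if (text == NULL) {
        return kCmdNone;
    }
    for (i = 0; i < sizeof(kPhrases) / sizeof(kPhrases[0]); i++) {
        if (EqualIgnoringCase(text, kPhrases[i].phrase)) {
            return kPhrases[i].command;
        }
    }
    return kCmdNone;
}
```

Matching on the whole phrase, rather than on a substring, keeps ordinary sentences that happen to contain a key phrase from triggering a command by accident.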

Future Directions

"Prediction is difficult, especially about the future."

A word about future versions. Simple extrapolation from ViaVoice Millennium late last year to Enhanced in the middle of this year should suggest that the ViaVoice for Mac team has been busy, and continues to be busy, adding features, fixing bugs, and getting the product into our customers' hands. I cannot say what will result from this activity, but it is likely that developer opportunities, already greatly expanded with Enhanced, will continue to grow as the product line itself evolves and matures.

An interesting point on this topic is that ViaVoice is the only speech software technology that is currently available for and has an installed base on Windows, Mac OS, and several flavors of Linux. If you're thinking cross-platform speech, this is where you want to be.

MicroPhone Off

There you have it. The amount of work you need to get dictation into your application varies from "it works already for free" to "I had to write a couple of Apple event handlers." Beyond that, you can get as creative as you want.

ViaVoice for Macintosh has been a best seller since its introduction in 1999, and with ubiquitous dictation availability in the latest edition, there's a good chance that many of your customers will have ViaVoice, and want to dictate into your application. Believe me, the number of comments about "I'd like to be able to dictate into Application X" exceeds just about any other feature request. Make it so!

Acknowledgements

I would like to thank Deborah Grits, Eddie Epstein, Jeff Kusnitz, and Paul McLellan for taking the time to give me feedback on this article as it was being written. My special thanks to the rest of the ViaVoice for Mac team who broke new ground on the Mac - twice - and helped bring the future closer.


Erik has been working on Mac development throughout modern history, and, in that time, has done everything from drivers to GUI, and from telecom to graphics processing. Last year, in search of new challenges, he joined the ViaVoice for Mac team at IBM in Florida, and has led the recent release of the Enhanced Edition, which, it is not commonly known, is written specifically for the Mac from the ground up. In his spare time, he breeds noncarnivorous slinkys in the desks of unsuspecting coworkers. You can reach Erik at esea@us.ibm.com.


© Copyright International Business Machines Corporation, 2000. All Rights Reserved.

Note to U.S. Government Users --- Documentation related to restricted rights --- Use, duplication or disclosure is subject to restrictions set forth in GS ADP Schedule Contract with IBM Corp. INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS ARTICLE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

This publication could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes may be incorporated in new editions of the article. In addition, IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this article at any time.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites and use of those Web sites is at your own risk. Finally, this article contains sample application programs in source language, which illustrates programming techniques for the subject matter. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

 

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.