Kool Tools: MacSpeech Dictate
Volume Number: 25
Issue Number: 03
Column Tag: Kool Tools
by Dennis Sellers & Neil Ticktin
MacSpeech, a provider of speech recognition solutions for the Mac, and Nuance Communications, developer of Dragon NaturallySpeaking, teamed up to offer the MacSpeech Dictate speech recognition solution, which replaced MacSpeech's previous product, iListen.
The very first version of Dictate had some limitations, which MacSpeech was quite open about, but now the technology is coming into its own.
For years, Dragon NaturallySpeaking has been the king of speech recognition ... on the PC. When MacSpeech licensed the underlying dictation technology from Nuance Communications for Dictate, those in the know were more than thrilled.
With the software, Mac users can begin dictating straight into their applications with surprisingly little time spent training the software to recognize their voice. The folks at MacSpeech say that training MacSpeech Dictate to an accuracy level of up to 99 percent generally takes less than five minutes. That may be a bit of an exaggeration, but if you spend a little time on it, you can quickly reach an accuracy level that is more than acceptable (and far better than iListen at its best).
Installing Dictate is a no-brainer. You drag the application from the CD to the Applications folder, load the English language data from another CD, and follow the instructions on the setup screens. (You have to enter your serial number and, first, turn on "Enable access for assistive devices" in the Universal Access pane of Mac OS X's System Preferences.)
There's a wide range of headsets that you can use, but you can't use just any headset. It's very important that the headset you choose has been qualified by MacSpeech for use with Dictate, because microphone quality is critical to the accuracy of speech recognition. This is true not just of MacSpeech's product, but of any speech-recognition product.
For the purpose of this article, we looked at two headsets: one wired and one wireless. In either case, it was very important to make sure that the microphone was positioned properly in front of the mouth. To give you an idea of how precisely you should position the microphone, most people should place it "1-2 fingertips from the corner of the mouth." Specifically, it should be out of the breath stream from both your mouth and your nostrils.
And, it may come as some surprise that you can only use a mic that connects via USB. Specifically, you cannot plug a mic into the audio ports of your Mac.
Once you have your headset connected, you begin by making a profile. These profiles depend on your voice and the microphone type. If you have a heavy accent (such as Dennis' Southern one), you'll need to spend more time "training" Dictate. In fact, you should go through all three of the different training stories. While you can get started in just five minutes, the more vocal training you give Dictate, the more accurate your results will be.
What's more, the training is tied to the headset used. This means that if you replace the headset, you'll need to create a new profile if it's a different model. That's a bit of a pain, but there's good reason for it. On the plus side, Dictate also supports multiple profiles so different people can use the software. In fact, that's exactly how we tested multiple microphones when reviewing the software. (It should come as no surprise that we used Dictate to write this article.)
Once you've got everything set up, you can use Dictate pretty much anywhere with any app in which you can type on your Mac. Dictate also includes its own notepad (try typing in it for some entertainment value). There's also a "command" mode that lets you control various features of your Mac, such as opening applications, switching apps, taking screenshots, etc. You can get as creative as you want in controlling things, but that goes beyond the scope of this article.
In MacSpeech Dictate, spoken commands are recognized separately from dictation, which means you don't have to tell the software to switch modes. With the recently released version 1.2, Dictate customers can dictate any specific word, no matter how obscure, by spelling it letter by letter with the new Spelling mode.
The new version also introduces Phrase Training, which helps increase accuracy even more as you use MacSpeech Dictate. The 1.2 update is also a maintenance release that fixes reported issues and introduces a new "Move" command for easier verbal editing of a document. The MacSpeech Dictate 1.2 update is free of charge and now available to registered customers via the "Check for Updates" feature.
Why you should be in awe
You should be in awe that this works at all ... let alone as well as it does. Most people have no idea how hard speech recognition actually is. Humans are amazingly adept at listening to other people speak and interpreting what they have to say. People can pick a single voice out of a loud room. Computers, on the other hand, have a much more difficult time picking out just the speech from the surrounding noise.
In addition, people use their understanding of what is being said to fill in the gaps of what they don't hear, or even what the speaker didn't utter. Computers don't have it so easy; they cannot understand language as easily as you do. So the computer takes a different path and calculates the probability of the upcoming word to make its "guess."
Think of it this way. Remember the last time you heard someone speaking a foreign language... one that you didn't know at all? If you remember, it sounded like gibberish. You would hear a break from time to time in the speaker's speech, but those breaks didn't fall between words, or even between sentences; they came at the end of something closer to a paragraph. Imagine how difficult it is to break the words apart.
Now, imagine you are the poor computer. Not only do you have a hard time hearing through all the noise, not only can you not read hand gestures, but you don't understand the language.
Furthermore, if you were to look at a sonogram (a printout of what the sound waves look like), you would find it very hard to tell apart similar sets of words. For example, "I scream" is extremely similar to "ice cream" when you look at it on a sonogram, especially when you consider how fast someone says these words.
Or take another example: "an aim" versus "a name" presents a similar kind of problem. Say each to yourself at a normal pace and hear for yourself.
Now imagine that you have a mumbler. That makes things even more difficult. All the words get merged together and it becomes that much more difficult for the computer to tell them apart. Posture, your throat being dry, enunciation, volume, background noise, talking speed and more all can have an effect on the accuracy of speech recognition.
These are only a few of the challenges that speech recognition software has to deal with. And that's why you should be in awe.
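To make the idea of "calculating the probability of the upcoming word" a bit more concrete, here is a minimal sketch of how a language model can break the tie between acoustically similar phrases such as "I scream" and "ice cream." MacSpeech doesn't describe Dictate's internals, so this is not its method; it assumes a simple bigram model with add-one smoothing, and the tiny training corpus and the names bigram_prob, sentence_score and best_guess are all hypothetical, invented purely for illustration.

```python
from collections import Counter

# Hypothetical toy corpus, invented for illustration only (not real data).
CORPUS = (
    "we ate ice cream after dinner . "
    "she bought ice cream at the store . "
    "i scream when i see a spider . "
    "the kids want ice cream now ."
).split()

UNIGRAMS = Counter(CORPUS)                  # how often each word appears
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))  # how often each word pair appears
VOCAB_SIZE = len(UNIGRAMS)


def bigram_prob(prev: str, word: str) -> float:
    """P(word | prev) with add-one (Laplace) smoothing, so unseen pairs aren't zero."""
    return (BIGRAMS[(prev, word)] + 1) / (UNIGRAMS[prev] + VOCAB_SIZE)


def sentence_score(words: list[str]) -> float:
    """Multiply the bigram probabilities of each consecutive word pair."""
    score = 1.0
    for prev, word in zip(words, words[1:]):
        score *= bigram_prob(prev, word)
    return score


def best_guess(context: list[str], candidates: list[list[str]]) -> list[str]:
    """Given sound-alike candidates, pick the one the language model prefers after the context."""
    return max(candidates, key=lambda cand: sentence_score(context + cand))


if __name__ == "__main__":
    sound_alikes = [["i", "scream"], ["ice", "cream"]]
    print(best_guess(["we", "ate"], sound_alikes))  # prints ['ice', 'cream']
```

The acoustics alone can't separate the two candidates, so the sketch lets the preceding words decide: after "we ate," the pair "ice cream" is far more likely in the toy corpus. Real recognizers combine a score like this with an acoustic score and use vastly larger models.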
What we would like to see work better
The speech recognition itself works quite well, especially if you are dictating letters or articles like this one. If you speak at a normal pace and enunciate, the recognition accuracy is quite good. Anyone who wants dictation capabilities should take a look at the product.
What doesn't work as well are shorter bursts of text, like what you might type in iChat. Presumably, this is because you tend not to use natural language when IMing, and tend to use more fragmented speech. But that's being picky.
More importantly, it would be nice if corrections and training were easier. For example, we struggled a bit when trying to train new phrases that may not be part of everyone's speech. And we'd like to be able to click in the recognition choices window and make a correction there. But these are minor issues compared to how well this all works (especially if we're not mumbling).
And of course, we'll always be in search of even better accuracy. Then again, this is a 1.x product, so we expect it will get better, even though it's quite good already.
Samson AirLine 77 Wireless USB microphone
MacSpeech has already qualified a selection of microphones across a wide range of prices. If you are a heavy dictation user, you should seriously look at the Samson AirLine 77 Wireless USB microphone (see below). At a pricey $349, it worked exceptionally well... in fact, even better for us than the wired mics. Designed to go around the back of your head, it's a bit cumbersome to wear at first, but you get used to it.
MacSpeech Dictate requires Mac OS X 10.4.11 or higher and an Intel-based Mac. New MacSpeech Dictate solutions, with a choice of headsets, are priced starting at US$199 (including an entry-level microphone). Registered customers of iListen can purchase MacSpeech Dictate at a special crossgrade price of $99.
For more info, go to http://www.macspeech.com
Dennis is Editor of MacsimumNews and a regular MacTech contributor. Neil has a degree in Linguistics/Computer Science with a specialty in Acoustic Phonetics. In other words, he got to "geek" on this product not only from a Mac point of view, but also from a linguistics point of view. Scary.