Support Multibyte Text

Volume Number: 14 (1998)
Issue Number: 1
Column Tag: Toolbox Techniques

Supporting Multi-byte Text in Your Application

by Nat McCully, Senior Software Engineer, Claris Corporation

How to make the most efficient use of the Script Manager and International APIs in your text engine

Introduction

You have developed the next greatest widget or application and you want to distribute it on the Net. Your application has a text engine, maybe simple TextEdit, or one of the more sophisticated engines available for license, or even one you wrote yourself. One day, you get an e-mail from someone in Japan:

Dear Mr. McCully,
My name is Takeshi Yamamoto and I use your program, MagicBook, everyday. But, I have a problem using Japanese characters in it. When I hit the delete key, I get weird garbage characters. My friends and I wish to use both Japanese and English in your program, but it does not work properly. Please fix it!
T. Yamamoto

Suddenly, there are people on the other side of the world who want to use your application in their language, and you are faced with a dilemma. You have no first-hand knowledge of the language itself, but you may be somewhat familiar with the Macintosh's ability to handle multiple languages in a single document with ease. Simply cracking open Inside Macintosh: Text seems daunting. How will these new routines affect the performance of your program? Will you introduce unwanted instability and anger your existing user base? Where can you find information on how to use which routines best, not just a description of what each routine does?

This article attempts to address some of these issues, and in general familiarize you with some of the best things that make the Mac an excellent international computing environment. Intelligent use of the Macintosh's international routines, WorldScript, and the other managers in the Toolbox can be the difference between a US-only application and a truly "world-ready" tool that any user, anywhere, can utilize as soon as they download it to their hard disk. Although this paper deals primarily with Japanese language issues, the concepts outlined herein can be used with any multi-lingual environment.

What is WorldScript?

WorldScript is the set of patches to the system that enables the correct display and measurement of multi-lingual text. Over time, many of these patches have been rolled into the base system software, but even in Mac OS 7.6.1, you will find a set of WorldScript extensions in the Extensions folder when you install one of the Language Kits available from Apple. The concepts and code snippets in this paper will work equally well on, for example, the Japanese localized Mac OS, or on a standard U.S. system with the Japanese Language Kit (JLK). A good source of localized system software and Language Kits is the Apple Developer Mailing CD-ROM, available from the Apple Developer Catalog. WorldScript is one of the Apple-only technologies that makes multi-lingual computing possible in a far easier way than the other guys. When it comes to having Chinese, Korean and Japanese all in the same document, WorldScript, on Mac OS, is the only thing out there.

Multi-byte Text on the Mac

OK, let's get to the meat, you say. How do you make your text engine handle two-byte characters? Well, before giving you a bunch of code, let's explain how the Mac handles two-byte text.

What is a Script?

Each language that the Mac supports is grouped into categories called "scripts." For example, English and the other Roman letter-based languages like French and German all belong to the Roman script. Japanese belongs to the Japanese script. Character glyphs in the Roman script are each represented by a single-byte character code. Japanese characters are represented by a 16-bit (2 byte) character code.

Setting Up the Port: Pre-System 7 API's versus New API's

On the Mac, each font is also associated with a script. There are Roman script fonts like Helvetica and Palatino, and Japanese script fonts like Osaka and Heisei Minchou. As you know, when text is drawn into a QuickDraw grafPort, you first set up the port with the appropriate font, size and style, and then call DrawText() to draw the text. If you are using the old Script Manager API's (like CharByte() and GetEnvirons()), you need to set the port to a font in the script you are interested in processing. Once the port is set to a particular font, calls to the Script Manager will follow the rules of that font's script. So the port, by way of setting the font, also has an implicit script setting. This is the key to using the Script Manager routines so they will return the correct information to your application using the older API's. The new API's have a script parameter, so it is not necessary to set the font of the port before using them. Since the Script Manager doesn't have to call FontScript() to find out the script of the current font before passing the script to WorldScript, using the newer API's could speed up your application in certain cases.

Adding Script-savvy Features to your Application

First, you need to determine if the user's system has a non-Roman script system installed. Many programs boost performance by calling Script Manager routines only when absolutely necessary. One way to find out all the scripts installed is to loop through all 33 possible script codes (smRoman being 0 and smUninterp being 32) and call GetScriptManagerVariable() with the selector smEnabled for each one. Roman script is always enabled.

Listing 1: Finding Which Scripts are Installed

ScriptCode   script;
for (script = smRoman + 1; script <= smUninterp; script++)
{
   if (GetScriptManagerVariable(script, smEnabled))
      InitInternalScriptInfo(script);
            // non-Roman script present...
}

InitInternalScriptInfo() refers to a routine you write that will initialize your internal data structures that deal with specific script systems, such as line-breaking tables, on a script-by-script basis.

Line Breaking

Most applications don't rely entirely on the Script Manager for line breaking, hit testing, or word selection, because using those routines is thought to be too slow. It is possible to optimize your text engine so that you incorporate the correct behaviors for each script system present, while maintaining the highest possible performance. The Toolbox call for finding line breaks is StyledLineBreak(). To use it, however, you must restrict the text you pass it to lengths of less than 32K (actually, this is true of the whole Script Manager, so tough) and text widths to whole pixel values (can you say 'rounding error?'), and if you are explicitly scaling the text, it won't work at all. You must also organize the text you pass to it in terms of script runs and style runs within them. Therefore, most applications that have word-processing functionality choose to implement their own line-breaking code that is customized for their own needs. Unfortunately, many of these private implementations break when used on WorldScript systems.

Line Breaking with Japanese Text

The simplest line-breaking algorithm for English text is to look for a space (ASCII 0x20) character in the line near the graphic break, and if there is none, to break on the byte-boundary nearest the graphic break. Japanese text is a bit more complicated. Japanese text has no spaces, so you must break at the character boundary nearest the graphic break. There is an additional wrinkle: Certain characters are not allowed to begin a line, and certain characters are not allowed to end a line. This set of line-breaking rules is referred to as Kinsoku shori. For example, you cannot begin a line with a two-byte period. You cannot end a line with a two-byte open parenthesis followed by more text on the next line. A list of kinsoku characters is available from the Japanese Standards Association in the form of a Japanese Industrial Standard (JIS) document. It is also in Ken Lunde's excellent book, Understanding Japanese Information Processing, in the section entitled "Japanese Hyphenation." While not all Japanese agree on the correct set of kinsoku characters, this set is a good default. Some applications allow the user to edit the kinsoku character set to their own liking.

Once you know that the current byte offset in your text is on or just before the graphic break, you need to see if that byte is part of a two-byte character. Then you need to see if the character is a character that can't end a line. Then you need to check the character after it to see if it is a character that can't begin a line. This can be repeated as necessary, for support of a string of kinsoku characters. For example, suppose the character on the graphic break (the break char) can end a line, but the character after it can't begin a line, causing the break char to wrap. However, the character before the break char is one that can't end a line, so you must then check the char before it, and so on, and so on, and... The example below is simplified to illustrate a particular case; actual code for an application would probably be organized differently.

Listing 2: Checking Graphic Break Char with Kinsoku Processing

CheckGraphicBreak
Check a text run's length against the pixel edge of the line
(the so-called graphic break), and then adjust the line break
if there is an illegal Kinsoku character on the break. 

UInt16 *   gStartLineKinsokuChars;   // chars that can't begin
                                     // a line.
UInt8      gNumStartKinsokuChars;    // number of chars above.
UInt16 *   gEndLineKinsokuChars;     // chars that can't end a line
UInt8      gNumEndKinsokuChars;      // number of chars above.

// This function will return FALSE if the char at offset
// is not a valid break point. It checks the char after it,
// but not the char before it, for kinsoku.
static Boolean CheckGraphicBreak(UInt8 *      textPtr,
                                              UInt16      offset,
                                              ScriptCode script)
{
   SInt16 result;

   // The textPtr starts at a known 'good' character
   // boundary. In this case it is the beginning of the
   // line, but it could be the beginning of the
   // stylerun.

   // Find out if script only has 1-byte chars. If so,
   // we assume it's ok to break at this char.
   if (GetScriptVariable(script, smScriptFlags) &
               (1 << smsfSingByte))
      return TRUE;

   result = CharacterByteType((Ptr)textPtr,
                  offset,
                  script);
   if (result == smSingleByte)
      return TRUE;   // In real life, you're not done
                        // until you check the chars before
                        // and after this one for kinsoku.
   if (result == smFirstByte)
      return FALSE;
   if (result == smLastByte)
   {
      UInt8      index;
      UInt16   theChar = *(UInt16 *)&textPtr[offset - 1];

      // Now we have a valid break on a 2-byte char.
      // We need to check if it's a kinsoku character.
      // This code checks Japanese kinsoku only, but
      // with a little work this could be extended to
      // all 2-byte scripts that don't break on spaces.
      if (script != smJapanese)
         return TRUE;
      
      for (index = 0; index < gNumEndKinsokuChars; index++)
      {
         if (theChar == gEndLineKinsokuChars[index])
            return FALSE;
      }
      
      // Now we check the char after this one, in case it
      // is a char that can't start a line. First see if
      // it's a 1-byte char. In real life, there are 1-byte
      // kinsoku chars to check for.
      if (textPtr[offset + 1] == 0 || 
         CharacterByteType((Ptr)&textPtr[offset + 1],
                  0, script) == smSingleByte)
         return TRUE;

      theChar = *(UInt16 *)&textPtr[offset + 1];
      
      for (index = 0; index < gNumStartKinsokuChars;
                        index++)
      {
         if (theChar == gStartLineKinsokuChars[index])
            return FALSE;
      }
   }
   return TRUE;
}
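
The "and so on, and so on" back-up loop described before Listing 2 can be sketched in portable C. This is only a minimal sketch under stated assumptions, not code from a shipping text engine: it assumes the line has already been split into an array of 16-bit character codes (one-byte characters widened), and the names CharCode, InSet() and AdjustBreakForKinsoku() are hypothetical. The two tiny tables hold Shift-JIS codes for a few common punctuation marks; a real table would come from the JIS document or Lunde's book.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short CharCode;  /* one character; 1-byte codes widened */

/* Hypothetical sample sets (Shift-JIS codes), far from complete. */
static const CharCode kCantBegin[] = { 0x8141, 0x8142, 0x816A, 0x8176 };
static const CharCode kCantEnd[]   = { 0x8169, 0x8175 };

#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

static int InSet(CharCode c, const CharCode *set, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (set[i] == c)
            return 1;
    return 0;
}

/* chars[0..count-1] starts at the beginning of the line. breakIndex is
   the index of the first character that would wrap to the next line.
   Returns an index at or before breakIndex where the character before
   the break may end a line and the character after it may begin one. */
size_t AdjustBreakForKinsoku(const CharCode *chars, size_t count,
                             size_t breakIndex)
{
    size_t b = breakIndex;

    assert(chars != NULL && breakIndex <= count);
    /* Back up one character at a time until both neighbors are legal. */
    while (b > 0 && b < count) {
        int badBegin = InSet(chars[b], kCantBegin, COUNT_OF(kCantBegin));
        int badEnd   = InSet(chars[b - 1], kCantEnd, COUNT_OF(kCantEnd));

        if (!badBegin && !badEnd)
            return b;
        b--;
    }
    /* b == 0: no legal break on this line, so force the original one. */
    return (b == 0) ? breakIndex : b;
}
```

Working on whole characters keeps the kinsoku logic separate from the byteness testing. What to do when no legal break exists (here, fall back to the graphic break) is a design choice your engine has to make.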

Hit Testing

Hit testing is another area in your text engine that demands the highest possible performance. When the user clicks in the text, any delay in setting the insertion point there will be noticed. Drag selection is another example of the same code working hard to find the character boundaries and setting the correct hilite area.

Some applications use a locally allocated cache of possible first byte character codes that they use to test a particular character in the text stream for byteness. This is simple to create with the Toolbox call FillParseTable(). FillParseTable() returns in your pre-allocated 256 byte buffer all the bytes that can be a first byte of a two-byte character in the script you pass to it. Be aware that in some scripts, some character codes can be both the first byte of a two-byte character as well as the second byte of a two-byte character, depending on their context within the text stream. Therefore, you need more than just this information to successfully find out what kind of character the byte you're interested in is a part of. In a mixed stream of text with both one-byte and two-byte characters, using the parse table in a single pass over the text is much faster than calling CharacterByteType() for each byte. An example of this is below, in a sample function that goes through a text stream and counts the number of characters in it:

Listing 3: Counting the Chars in Mixed-byte Text

CountCharsInScriptRun
Counts the number of characters in a run of contiguous script
(font), checking for both one-byte and two-byte characters.

UInt32 CountCharsInScriptRun(UInt8 * textPtr, UInt32 length,
               ScriptCode script)
{
   UInt8      parseTable[256];
   UInt32   curByte, charCount;
   

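   (void)FillParseTable((char *)parseTable, script);
   
   for (curByte = 0L, charCount = 0L; curByte < length;
                        curByte++)
   {
      // A first byte means a two-byte character: skip its
      // second byte so that it is never mistaken for a
      // first byte, then count the character once.
      if (parseTable[textPtr[curByte]] == 1)
         curByte++;
      charCount++;
   }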
   return (charCount);
}

Notice that because we started at a known 'good' boundary, we were able to test only the first bytes of the two-byte characters in the stream as we counted along. This code would not work in all cases if we started at an arbitrary point in unknown text, because of the ambiguity of the byteness of some character codes in some scripts. Caching the parse tables for all installed scripts in the user's system at launch time would further speed up your processing, so you wouldn't have to call FillParseTable() every time.
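
To illustrate why an arbitrary offset is ambiguous, here is a minimal portable sketch that finds the byte type at any offset by walking forward from a known good boundary. It does not call the Toolbox: IsLeadByte() is a hypothetical stand-in for a parse table lookup and assumes Shift-JIS style first bytes (0x81-0x9F and 0xE0-0xFC). The names ByteTypeAt() and the k... constants are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

enum { kSingleByte = 0, kFirstByte = 1, kLastByte = 2 };

/* Assumption: Shift-JIS style first bytes. On a Mac this answer would
   come from the table filled in by FillParseTable(). */
static int IsLeadByte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* text[0] must be a known character boundary (start of line or run). */
int ByteTypeAt(const unsigned char *text, size_t length, size_t offset)
{
    size_t i = 0;

    assert(text != NULL && offset < length);
    while (i < length) {
        if (IsLeadByte(text[i]) && i + 1 < length) {
            if (offset == i)
                return kFirstByte;
            if (offset == i + 1)
                return kLastByte;
            i += 2;                 /* step over the whole character */
        } else {
            if (offset == i)
                return kSingleByte;
            i += 1;
        }
    }
    return kSingleByte;             /* not reached when offset < length */
}
```

In the byte sequence 0x81 0x81 0x81 0x81 every byte looks like a first byte in isolation, yet offsets 1 and 3 are second bytes. Only the walk from a good boundary can tell them apart.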

Measuring Two-byte Characters

On the Mac, all two-byte characters are the same width. In a future system software release, proportional two-byte characters will be supported, but up until now all two-byte-savvy applications assume mono-spaced two-byte characters, and even if proportional characters are supported, they will be mono-spaced by default so as not to break every application currently shipping.

Before the Mac OS supported measuring two-byte characters with TextWidth(), a special code point in the single-byte 256 char width table was reserved for the two-byte character width for that font. In the Japanese and both Chinese scripts, this code point is 0x83. In Korean script, it is 0x85. This code point still works, even though Apple now recommends you use TextWidth() for all measuring of multi-byte or mixed text. In the future for proportional measuring, TextWidth() will probably be what you will use.

Below is an example of a function that measures any text, and returns the amount in a Fixed variable. This is useful if you are measuring text and the user has Fractional Glyph Widths turned on (meaning you made a call to SetFractEnable()).

Listing 4: Measuring Mixed-byte Text

GetTextWidth
This function supports multiple styleruns in the text, and consists
of two loops: One for the styles and one for the characters in each
style. The port is set on each stylerun and then Macintosh APIs
are called to measure the text. 

#define JSCTCWidthChar   0x83
#define KWidthChar   0x85

typedef struct tagStyleRun {
   UInt32     styleStart;
   UInt16     font;
   UInt16     size;
   UInt8      face;
} StyleRun,   *StyleRunPtr;

Fixed GetTextWidth(UInt8 * textPtr, UInt32 length,
                           StyleRunPtr styleRuns, UInt32 numStyles)
{
   ScriptCode      curScript;
   UInt32          byteNum, styleNum;
   Fixed           totWidth = 0L;
   FMetricRec      curFontMetrics;
   WidthTable **   curWidthTable;
   UInt8           parseTable[256];
   
   // loop thru each stylerun, measure its characters
   for (styleNum = 0L; styleNum < numStyles; styleNum++)
   {
      // Set up the port (in real life, you'd restore the
      // old settings when you exit) 
      TextFont(styleRuns[styleNum].font);
      TextFace(styleRuns[styleNum].face);
      TextSize(styleRuns[styleNum].size);
      FontMetrics(&curFontMetrics);
      curWidthTable = (WidthTable **)curFontMetrics.wTabHandle;
      HLock((Handle)curWidthTable);
      curScript = FontScript();
      (void)FillParseTable((char *)parseTable, curScript);
      
      // loop thru each char in the stylerun
      for (byteNum = styleRuns[styleNum].styleStart;
            (styleNum + 1 < numStyles &&
               byteNum < styleRuns[styleNum+1].styleStart) ||
            (styleNum + 1 >= numStyles && byteNum < length);
               byteNum++)
      {
         if (parseTable[textPtr[byteNum]] == 1)
         {
            if (curScript == smJapanese ||
                  curScript == smTradChinese ||
                  curScript == smSimpChinese)
               totWidth +=
                  (*curWidthTable)->tabData[JSCTCWidthChar];
            else if (curScript == smKorean)
               totWidth +=
                  (*curWidthTable)->tabData[KWidthChar];
            else
               totWidth += (Fixed)
                  TextWidth(&textPtr[byteNum], 0, 2) << 16;

            byteNum++;
         }
         else
            totWidth += 
               (*curWidthTable)->tabData[textPtr[byteNum]];
      }
      HUnlock((Handle)curWidthTable);
   }

   return (totWidth);
}

The above function still makes expensive calls like FontMetrics(), FillParseTable() and TextWidth() on each stylerun. It would be an even better idea to have a local cache of the width tables and parse tables of fonts you know are in the document, so you don't have to rebuild them every time the user clicks or drags or types in the text.
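
One way such a cache might look is sketched below for parse tables only. It is a hedged illustration rather than a recommended design: the table for each script code is filled lazily, the first time it is requested. BuildParseTable() is a hypothetical stand-in for FillParseTable() so that the sketch compiles anywhere; it assumes script code 1 (smJapanese) uses Shift-JIS style first bytes. gFillCount exists only to show that the expensive call happens once per script.

```c
#include <assert.h>
#include <string.h>

#define kMaxScripts 33          /* script codes 0..32, as in Listing 1 */

typedef struct {
    int           valid;        /* nonzero once table[] has been filled */
    unsigned char table[256];   /* 1 = can be first byte of 2-byte char */
} ParseTableCacheEntry;

static ParseTableCacheEntry gParseCache[kMaxScripts];
static int gFillCount = 0;      /* number of times a table was rebuilt */

/* Hypothetical stand-in for FillParseTable(). Assumes script code 1
   (smJapanese) has first bytes 0x81-0x9F and 0xE0-0xFC. */
static void BuildParseTable(unsigned char *table, int script)
{
    int b;

    memset(table, 0, 256);
    if (script == 1) {
        for (b = 0x81; b <= 0x9F; b++) table[b] = 1;
        for (b = 0xE0; b <= 0xFC; b++) table[b] = 1;
    }
    gFillCount++;
}

/* Returns the cached table for a script, building it on first use. */
const unsigned char *GetCachedParseTable(int script)
{
    ParseTableCacheEntry *entry;

    assert(script >= 0 && script < kMaxScripts);
    entry = &gParseCache[script];
    if (!entry->valid) {
        BuildParseTable(entry->table, script);
        entry->valid = 1;
    }
    return entry->table;
}
```

A width table cache would need a richer key (font, size and face rather than script alone) and some way to discard entries when fonts change, which this sketch does not attempt.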

So, now that you have a relatively fast way of measuring the text, you can use it to find the pixel value of any character in the text, and use that for your internal CharToPixel and PixelToChar logic. Or, you can use the Mac OS Toolbox calls CharToPixel() and PixelToChar(), which will always work on any script but may be slower.

Localizing Your Application for Japan

Now that we have reviewed a few of the basic text engine issues for handling two-byte text, there are a few things about Japan in particular that make localization a challenge.

Japan is possibly the most interesting major software market to localize for, if you are interested in text and text layout. It is a mature market, with a diverse number of products enjoying many millions of dollars in sales each year. The Macintosh has a larger market share there than in the U.S. or Europe. Text in Japan has traditionally been difficult to input and output using machines, and the use of text in graphic design requires that the text layout be extremely flexible. The characters are complex (so complex that bolding them may make them illegible), and emphasis or adornment has forms that use background shading, different types of lines around the text, and even dots or ticks above or to one side of each character. Condensed and extended faces have different results on PostScript printers than they do on QuickDraw. Bold and italic faces were not supported on the first PostScript Japanese printers. Underlines are not drawn by QuickDraw when the font is a Japanese font. These last two things might be fixed in future releases of the system software, but for now the application developer must work around them.

For underline, you must draw a line under the text. The reason QuickDraw doesn't draw it for you is that it usually uses the font's baseline as the underline location, but Japanese fonts' two-byte glyphs take up more room and descend below the baseline. Where you draw your underline is up to you, but take a look at how other Japanese programs do it and make it fairly consistent. Vertical text is pretty much a checkbox item nowadays in Japanese word-processing programs. Most novels and many magazines are laid out vertically, but until recently computers were horizontal-only. While the Windows95® APIs support drawing text vertically, the Mac OS still does not, outside of using QuickDrawGX typography (which is excellent, by the way). In comparing vertical text to horizontal text, several things change about the line layout: The first line starts at the top right, and the text flows down to line-end, then wraps to the next line, which is to the left of the first line; the baseline is generally considered to be in the center of the line; underlines are drawn to the right of the text, as are emphasis dots; two-byte characters are not rotated, but single-byte characters are, 90° clockwise; certain characters have vertical text variants, like many punctuation characters. Where these variants are in the font can be found in the 'tate' table in the font ("tahteh" means "vertical" in Japanese).

Rubi are small annotation characters, placed above, below or to the side of the text they annotate. Usually they provide pronunciation guidance for unusual or hard-to-pronounce Kanji characters.

Date formats in Japan include the current year of the emperor's reign; again, supported on Windows95® but not on Mac OS. It is up to the application to support these formats if so desired. Also, date formats 2 and 3 produce identical results, due to the fact that the abbreviated month and the long month are the same thing in Japanese. Japanese applications may opt to substitute a different format in one of those formats' place.

Find and Replace needs to be expanded to include the different types of characters used in Japanese. Standard Japanese text may contain any of the following types of characters: one-byte Roman, two-byte Roman, one-byte numerals and symbols, two-byte numerals and symbols, one-byte katakana syllables, two-byte katakana syllables, two-byte hiragana syllables, and two-byte Kanji characters. The hiragana and katakana characters are equivalent in terms of the sounds they represent in Japanese, so a good Find/Replace function should include an option to find the search string in either syllabary.
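
As a rough illustration of the "either syllabary" option, the sketch below folds two-byte katakana to hiragana before comparing, assuming Shift-JIS codes already assembled into 16-bit values. The function names are hypothetical, and the sketch deliberately ignores one-byte katakana and the few katakana (such as 0x8394-0x8396) that have no hiragana counterpart in this encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Fold a two-byte Shift-JIS katakana code to the matching hiragana
   code. Any other code is returned unchanged. */
unsigned short FoldKatakanaToHiragana(unsigned short c)
{
    if (c >= 0x8340 && c <= 0x837E)     /* first block of katakana */
        return (unsigned short)(c - 0x8340 + 0x829F);
    if (c >= 0x8380 && c <= 0x8393)     /* after the unused 0x837F */
        return (unsigned short)(c - 0x8380 + 0x82DE);
    return c;
}

/* Compare n characters, treating hiragana and katakana as equal. */
int KanaInsensitiveEqual(const unsigned short *a,
                         const unsigned short *b, size_t n)
{
    size_t i;

    assert((a != NULL && b != NULL) || n == 0);
    for (i = 0; i < n; i++)
        if (FoldKatakanaToHiragana(a[i]) != FoldKatakanaToHiragana(b[i]))
            return 0;
    return 1;
}
```

The same folding idea extends to the one-byte and two-byte forms of Roman letters and numerals, each of which needs its own mapping.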

Sorting in Japanese is difficult because the Kanji characters can have different pronunciations depending on their context. To sort Kanji correctly, you need a separate kana key field that indicates the pronunciation and you sort on that. Also, Mac OS CompareText() doesn't sort the long sound symbol correctly (that symbol changes sound depending on the character before it, but Mac OS always sorts it in the symbols area), so for linguistically correct sorting you need to write your own sorting routine.
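
The kana key field idea can be sketched as follows. This is a simplified illustration with hypothetical names: each record carries the text to display plus a separate reading, and only the reading is compared. It uses a plain byte comparison of the kana key, which is an assumption made for brevity; it does not address the long sound symbol or the finer points of Japanese collation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *display;   /* text shown to the user (Kanji and so on) */
    const char *kanaKey;   /* reading in kana, used only for sorting */
} SortRecord;

/* qsort() callback: order two records by their kana reading. */
static int CompareByKanaKey(const void *a, const void *b)
{
    const SortRecord *ra = (const SortRecord *)a;
    const SortRecord *rb = (const SortRecord *)b;

    return strcmp(ra->kanaKey, rb->kanaKey);
}

/* Sort records in place by reading, not by the displayed text. */
void SortByReading(SortRecord *records, size_t count)
{
    assert(records != NULL || count == 0);
    if (count > 1)
        qsort(records, count, sizeof(SortRecord), CompareByKanaKey);
}
```

Because the key is stored with the record, the user (or the Input Method, at entry time) can supply the correct reading for a name whose Kanji could be pronounced several ways.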

If your application supports character tracking using the Color QuickDraw function CharExtra(), be aware that the CGrafPort member chExtra only uses 4 bits for signed integer values and the other 12 bits for the fraction. The value you pass to CharExtra() is a Fixed value of how many pixels you want to track out (or in) the text, and QuickDraw divides that by the current text size, to arrive at the chExtra value. This means that if the tracking value you pass to CharExtra() is greater than 8 times the text size, the chExtra field will go negative, and your text will be drawn incorrectly. Unfortunately, Japanese text is routinely tracked out beyond this limit in many applications. The only workaround is for you to draw the text one character at a time, and use the QuickDraw pen movement calls like MoveTo() to move the pen yourself. The same is true for SpaceExtra().
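
The arithmetic behind that limit can be checked before calling CharExtra(). The sketch below is an assumption-laden illustration, not QuickDraw's actual code: it takes the description above at face value (tracking divided by text size, stored as a signed value with 4 integer bits and 12 fraction bits) and reports whether the ratio fits in the range -8 to just under 8. The type Fixed16_16 and the function name are hypothetical.

```c
#include <assert.h>

typedef long Fixed16_16;   /* stand-in for the Toolbox Fixed type (16.16) */

/* Returns nonzero if extra / textSize fits a signed 4.12 value, that
   is, if -8 <= ratio < 8. extra is in pixels, as 16.16 fixed point. */
int TrackingFitsInChExtra(Fixed16_16 extra, short textSize)
{
    double ratio;

    assert(textSize > 0);
    ratio = ((double)extra / 65536.0) / (double)textSize;
    return ratio >= -8.0 && ratio < 8.0;
}
```

For 12-point text, 90 pixels of tracking gives a ratio of 7.5 and fits, while 100 pixels gives about 8.33 and does not, so the engine would switch to drawing one character at a time and moving the pen itself.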

Inline Input

Inline input of Japanese, Chinese or Korean is a way of using an intermediate program (called an Input Method) to translate your keystrokes into the many thousands of possible characters in those languages, all in the same place on screen that you would normally see characters typed in the line. In Japanese, the Input Method changes your keystrokes into phonetic Japanese kana characters, then converts some of those characters into Kanji characters to form a mixed kana and Kanji sentence. Then the user hits the return key to confirm the text in the line, ending the inline input session. Inline input on the Mac on System 7.1 or later uses the Text Services Manager (TSM). If your application uses TextEdit as its main text engine, you can support inline input quite easily using TSMTE. If you have your own text engine, you will need to do more work to support TSM Inline Input.

TSM uses Apple events to send and receive data between your application and the Input Method. You must implement several AppleEvent handlers, the most complex of which is the kUpdateActiveInputArea. In that handler, you must draw the text in all its intermediate stages, as the user is composing and editing the Japanese sentence before s/he confirms it to the document. If there is text after the so-called 'inline hole,' you must actively reflow the text if such editing causes the length to change. Each time the user makes a change, the text in the inline hole is received from the Input Method in an Apple event. The application draws it in the text stream, along with special Inline styles that help the user tell which text in the inline hole is raw (unconverted) text, which is converted text, which is the active phrase, where the phrase boundaries are in the inline hole, and other information.

After implementing the TSM support in your application, it is imperative that you test it with third-party Input Methods. At the time TSM was introduced, the documentation for how to write an Input Method was still a little spotty. This resulted in each Input Method handling text slightly differently. Also, Kotoeri, Apple's Input Method, has fewer features than the leading third-party Input Methods. Be sure to test your application with all of them you can find, so you can verify that it won't crash or produce strange results. Some Input Methods have strange quirks, like always eating mouseDown events, or having different requirements about how large a buffer they can handle without crashing. This knowledge comes from testing, and sometimes can be found on the Internet in Usenet newsgroups (in Japanese).

What About Unicode?

Unicode is being billed as the latest panacea for the problems of internationalization. What does Unicode give you? Where does it fall short?

Unicode was designed to solve one problem: There are many incompatible, overlapping encoding schemes for different languages, and supporting all of these encodings is a complex problem. What if there was a single encoding scheme that supported all the writing systems of the world, and guaranteed that you could display text in all the languages Unicode supports if only you had the right Unicode font for each language? Unicode tries to be that encoding.

For Japanese text data, the Mac OS and Windows95 use Shift-JIS internally, while Rhapsody and WindowsNT® use Unicode. On the Internet, most Japanese text is encoded using the 7-bit ISO-2022-JP standard. Whether or not you use Unicode to represent text internally to your application, you will have to support all three standards for full file and data compatibility with the rest of the world. In Unicode, all characters are two bytes long. So, you no longer have to worry about testing for byteness in a Unicode stream. However, all ASCII characters are represented with a leading 0x00 in Unicode. So you can't have loops that look for a terminating NULL in a C-string. And, all your formerly one-byte text doubles in size unless you explicitly compress it (and then you lose the byteness testing advantage).
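
The terminating NULL problem is easy to demonstrate. In this small sketch, the string "Hi" is stored as two-byte Unicode characters in big-endian byte order (an assumption for the example), followed by a two-byte zero terminator. The function Unicode16Length() is a hypothetical helper, not a system call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Hi" as big-endian 16-bit Unicode, ended by a 16-bit zero. */
static const unsigned char kHiUnicode[] =
    { 0x00, 'H', 0x00, 'i', 0x00, 0x00 };

/* Count 16-bit characters up to, not including, the 16-bit zero. */
size_t Unicode16Length(const unsigned char *bytes)
{
    size_t n = 0;

    assert(bytes != NULL);
    while (bytes[2 * n] != 0 || bytes[2 * n + 1] != 0)
        n++;
    return n;
}
```

Here strlen((const char *)kHiUnicode) returns 0, because the very first byte is the leading 0x00 of 'H', while Unicode16Length(kHiUnicode) returns 2. Any C-string routine in your code that scans for a single zero byte needs the same kind of replacement.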

Whether or not you think testing byteness is too complex or expensive to do, you should know that Unicode also does another controversial thing: For the so-called "Han" languages (Japanese, Chinese, Korean) that use characters that originated in China, it attempts to unify them into one codepoint for each character judged by the Unicode Consortium to be unique, even if it has variant forms in each language. The same is true for languages written in the Arabic script (such as Arabic and Persian, which is also called Farsi). Because of this, you cannot tell what language a character is in just from its codepoint. Unicode was not designed to be a multi-lingual solution, in that representations of Chinese and Japanese in the same document will have overlapping character codes, requiring the OS to provide a parallel linguistically-coded data structure to render the glyph forms appropriately to each language. This might be another version of today's font/script/language relationship on Mac OS. As you can imagine, the Chinese, Japanese and Korean governments have each published competing encoding standards to Unicode, labeling the latter as something designed by foreigners who didn't understand the issues (both political and linguistic) involved in trying to make a worldwide encoding system.

Another issue about Unicode is that although it can represent 65,536 characters, there is not enough space for all the Han characters and their variants, plus all the other languages that Unicode currently supports. New languages are becoming computerized as more countries join the Digital Revolution and the Unicode Consortium cannot give space to all of them. Preferring the flat encoding model, they came up with another standard that uses four bytes per character (the ISO 10646 encoding standard). Given that on the Internet, where many languages need simultaneous support on computers, bandwidth is at a premium, I would prefer using the ISO 2022 standard of mixed-byte (7-bit and 14-bit characters) plus the escape codes that tell you what language the current stream is in to sending 32-bit characters through the wire. Since most web pages use this encoding, expect your OS to provide utilities for encoding conversion (like the Mac OS Encoding Converter debuting soon on a Mac near you).
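
To make the escape code idea concrete, here is a minimal sketch that counts characters in an ISO-2022-JP stream. It recognizes only the four escape sequences of the basic ISO-2022-JP profile (ESC $ @ and ESC $ B switch to two-byte JIS characters; ESC ( B and ESC ( J switch back to one-byte ASCII or JIS Roman) and does no validation. The function name is hypothetical, and a real converter would need to handle malformed input and the other ISO 2022 character sets.

```c
#include <assert.h>
#include <stddef.h>

#define kEscape 0x1B

/* Count characters in an ISO-2022-JP byte stream. Escape sequences
   select one-byte or two-byte mode and are not counted themselves. */
size_t CountCharsISO2022JP(const unsigned char *s, size_t length)
{
    size_t i = 0, count = 0;
    int    twoByte = 0;             /* streams start in one-byte mode */

    assert(s != NULL || length == 0);
    while (i < length) {
        if (s[i] == kEscape && i + 2 < length) {
            if (s[i + 1] == '$' && (s[i + 2] == '@' || s[i + 2] == 'B')) {
                twoByte = 1;        /* JIS two-byte characters follow */
                i += 3;
                continue;
            }
            if (s[i + 1] == '(' && (s[i + 2] == 'B' || s[i + 2] == 'J')) {
                twoByte = 0;        /* back to one-byte characters */
                i += 3;
                continue;
            }
        }
        i += (twoByte && i + 1 < length) ? 2 : 1;
        count++;
    }
    return count;
}
```

Unlike Shift-JIS, the byteness here comes from the most recent escape sequence rather than from the byte values, so a scan must start at the beginning of the stream or at a point where the mode is known.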

Cross-Platform Development Issues

Going cross-platform is already complicated without having to think about internationalization. Should you have separate codebases for maximum use of each platform's unique features? Or should you have a single codebase and use an emulation layer for the other OS's APIs? Each has its advantages, but for this article I can speak to those of you who have a joint codebase, and tell you about some of the things that the Windows platform lacks that you have to write yourself for multi-byte support and internationalization.

Windows has no Script Manager. There is no Gestalt Manager. It cannot support multiple two-byte codepages at the same time. It uses totally separate fonts for vertical and horizontal text. It supports proportional kana in Japanese, so you can't assume all two-byte characters are the same width.

If your code uses the Script Manager routines heavily, then you will have to write them yourself on the Windows side. All the convenience of the Mac OS's international routines becomes very clear when you try the same things on a PC!

Also, Japan once again has its own special challenges. Until Windows came out in Japan, each computer manufacturer made its own proprietary OS and hardware. Even floppy disks were incompatible with each other. Now, most companies have adopted the Intel PC standard, but NEC continues to manufacture its own line of incompatible PCs. NEC has such a huge share of the market in Japan that it has teamed up with Microsoft to produce its own version of Windows95 for NEC. So when you buy Windows95 in Japan, you find there are three versions: MS Windows95 for Intel, MS Windows95 for NEC, and NEC Windows95 for NEC. All three versions are basically the same feature-for-feature, but the drivers are different and you need to test your application on each platform to verify compatibility.

On the hardware side, you will find that Japanese hardware is different: it uses different displays, different keyboards, different printers, and different floppy formats. The drive lettering on NEC machines is also different from that on Intel PCs: the hard disk drive is labeled 'A:' on NEC machines and 'C:' on Intel PCs. Make sure your installer isn't hard-coded to install on drive C:.
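One way to avoid the hard-coded drive letter is to derive it from a directory the OS reports, such as the path returned by the Win32 call GetWindowsDirectory. The following is a minimal sketch in portable C, not a complete installer routine; the function name build_install_path is hypothetical, and the system directory is passed in as a string so the logic can be tested without Windows.

```c
#include <stddef.h>
#include <stdio.h>
#include <ctype.h>

/* Hypothetical helper: build "<drive>:\<folder>" using the drive letter of
   the system directory reported by the OS (for example "A:\WINDOWS" on an
   NEC machine) instead of assuming drive C:.
   Returns 0 on success, -1 on bad input or if 'out' is too small. */
int build_install_path(const char *system_dir, const char *folder,
                       char *out, size_t out_size)
{
    int n;

    if (system_dir == NULL || folder == NULL || out == NULL)
        return -1;
    /* Expect a path that starts with a drive letter and a colon. */
    if (!isalpha((unsigned char)system_dir[0]) || system_dir[1] != ':')
        return -1;

    n = snprintf(out, out_size, "%c:\\%s",
                 toupper((unsigned char)system_dir[0]), folder);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With "A:\WINDOWS" as the system directory and "MagicBook" as the folder, the default install path becomes "A:\MagicBook". The user should still be able to override the default in the installer's interface.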

Conclusion

As we have seen, internationalization of your software on Mac OS is not very difficult to do, and it is to your benefit to try to enable as many users as possible to enter text in their own language when using your program. We have also examined Japanese localization in more depth, and demonstrated that Japanese language applications usually require some new features designed specifically for that language's needs and conventions. As more markets around the world reach maturity, you can be sure that there will be ample opportunity to differentiate your product by adding locale-specific features. It is these locale-specific features that will tell your users that they are valued customers, and that their needs are being addressed in a very specific way. For your product, especially if you are in the initial design phases, I would recommend that you make it as easily expandable as possible. Design generic internationalization into the core modules, while leaving open the opportunity to add locale-specific features for certain markets like Japan as you see your product's market expand and its success grow.

Bibliography and Related Reading

  • Apple Computer, Inc. Inside Macintosh: Text, Menlo Park, CA: Addison Wesley, March 1993.
  • Apple Computer, Inc. "Technote OV 20, Internationalization Checklist," Cupertino, CA: Apple Computer, Inc, November 1993.
  • Griffith, Tague. "Gearing Up for Asia With the Text Services Manager and TSMTE," Develop Issue 29. Cupertino, CA: Apple Computer, Inc, March 1997.
  • Apple Computer, Inc. "Technote TE 531, Text Services Manager Q&As," Cupertino, CA: Apple Computer, Inc, May 1993.
  • Lunde, Ken. Understanding Japanese Information Processing, Sebastopol, CA: O'Reilly & Associates, September, 1993.

See also Ken Lunde's home page at http://jasper.ora.com/lunde/ for more information about multi-byte text processing on computers.


Nat McCully has been at Claris in the Japanese Development Group for the last 6 years. He has worked on numerous Japanese products, including MacWrite II-J, FileMaker Pro-J, Claris Impact-J, ClarisDraw-J, and ClarisWorks-J. He speaks, reads, and writes Japanese, and enjoys traveling in Japan. He is currently working as Development Lead on the next release of ClarisWorks-J.

 
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.