The Style Council
Volume Number: 22 (2006)
Issue Number: 9
Column Tag: Microsoft | Mac in the Enterprise
Externals and Internals of Styles on Microsoft Word 2004 for Mac
by Rob Daly & Rick Schaut
Word Styles are almost as old as Microsoft Word itself. In fact, Word Styles will celebrate their 20th anniversary next year, being the double helix of Word's 2 million + lines of code since Word 4.0 for PC released in 1987. This article exposes and celebrates both the fundamental cascading behavior of styles as they appear to the user and the back-end implementation of what is a very elegant piece of code infusion. For brevity, only some of the fundamental principles are discussed here.
What is a Word Style?
In simple terms, a Word Style is a collection of pre-defined or user-customizable formatting properties. By default, when Word launches, around 150 Word Styles (of varying types) are available for users to choose from. This is a vast list that has been developed over the years to facilitate common formatting for common writing tasks. Given the obvious power and benefits of Word Styles, Microsoft Word 2004 for Macintosh introduced a number of UI enhancements and underlying changes to make styles more intuitive and accessible.
Figure 1. Word Style Gallery
In Microsoft Word 2004 for Mac, there are four Word Style types: paragraph, character, list and table. List and table styles are new for Word for Mac, and have their own distinct characteristics. These will not be covered in any depth in this discussion. Instead, we will try to understand how styles exist as cascading sets of properties and delve into how that might be implemented.
All Your BaseStyle Are Belong to Us
To understand how a style cascades through descending trees of properties, let's examine what a style looks like. The easiest way to do this is to peek at the style class defined in Word's object model. The style class is a sub-class of the bounding document class. Breaking it out into its properties, it looks like this:
Figure 2: Word Style Class
Fundamental to styles is the concept of cascading properties. For any instance of the style class, all of its inheritance is keyed off of the BaseStyle property. Styles do not require a value for BaseStyle (in this case that property is set to "none"), but as we'll see later, there are computational advantages to using it.
For now though, let's focus on what the user sees. If Style C's BaseStyle property is Style B and Style B's BaseStyle property is Style A, then changing an attribute of Style A will cascade through Styles B and C. However, there is a pecking order. Let's say that these styles are defined thusly:
Style A: Arial 10pt + Bold
Style B: Style A + Red character formatting, ((Arial 10pt + Bold) + Red)
Style C: Style A + Times New Roman ((Arial 10pt + Bold) + Times New Roman)
Style B is pretty simple - it simply changes the font color property (part of the font class, contained by the style's Font property) to red. Style C, however, has two values for the font property: Arial, inherited from Style A, and Times New Roman, set in Style C itself. In this case, the sub-classed style (Style C) takes precedence for the font property. So Times New Roman will be the font for Style C.
If we now go and change Style A to Apple Chancery 10pt + Bold + Blue + Italic, what will become of Styles B and C? The logic is the same - again, the first thing we need to consider is pecking order. The base style will always change the style that's sub-classed from it, unless the latter contains a property value that explicitly contradicts a property value set in the BaseStyle definition. So, as in this example, changing the font property and the color property of Style A will not override the settings for those properties in Styles B and C. The net result then is that when we make our changes to Style A, we end up with Apple Chancery + 10pt + Bold + Italic + Red (Red trumping Blue) for Style B and Times New Roman + 10pt + Bold + Italic + Blue (Times New Roman trumping Apple Chancery) for Style C.
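The pecking order above can be pictured with a small C sketch. This is only an illustration, not Word's implementation: the Style struct, the UNSET marker, the property and font/color enums, and the lookup function are all hypothetical names invented here. Each style records only the properties it sets itself, and a lookup walks up the BaseStyle chain until it finds an explicit value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical property slots; a real style has many more. */
enum { PROP_FONT, PROP_SIZE_PT, PROP_COLOR, PROP_BOLD, PROP_ITALIC, PROP_COUNT };
enum { FONT_ARIAL, FONT_TIMES, FONT_CHANCERY };
enum { COLOR_AUTO, COLOR_RED, COLOR_BLUE };

#define UNSET (-1)   /* "this style says nothing about the property" */

typedef struct Style {
    const struct Style *base;   /* NULL plays the role of BaseStyle "none" */
    int props[PROP_COUNT];      /* UNSET, or the value this style sets itself */
} Style;

/* Walk from the style up through its bases; the nearest explicit value wins.
   If no style in the chain sets the property, return the caller's fallback. */
static int lookup(const Style *s, int prop, int fallback)
{
    assert(prop >= 0 && prop < PROP_COUNT);
    for (; s != NULL; s = s->base)
        if (s->props[prop] != UNSET)
            return s->props[prop];
    return fallback;
}
```

With Style A as the base of both B and C, changing A's font and color leaves B's red and C's Times New Roman alone, because those lookups stop before they ever reach A.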
Much like CSS, time invested in creating styles that have cascading properties can be a huge benefit, but if things ever become too generic or just don't work, we can override any base style property by setting it as an overriding value in our new style.
The style types themselves are not necessarily discrete in terms of property types either. When applying a style to, say, a document element such as a paragraph, the style definition will naturally change the properties of that paragraph. Moreover, it will also encode the character formatting properties that are implicit to the style definition. Similarly, when we apply a table style, it encodes both paragraph and character properties (perhaps even list properties) to its cell content by the nature of what that content is: any number of paragraphs of any number of characters (or "text runs"). In essence then, styles help us to chart out the hierarchy of a document through chains of elements that form the building blocks of more complex elements, each of which comes with its own set of unique properties.
This brings us to our next benefit, and that is one that helps both the user and Word's own internal processing. Because of the essential etiology of a style being rooted in the properties of another style, Word chooses to store that defined collection of properties just once. To track styles that are based upon other styles, Word stores only the differences.
For the user this provides some nifty benefits. Because of the use of both cascading properties and building by difference, Word can actually strive towards making smart choices based on the context in which a style is used. Sometimes users get confused by this behavior, so let's also take a moment to illustrate how this works: by default Word uses the "Normal" style for all standard text in a document. In Microsoft Word 2004 for Mac, this style is defined as Times New Roman + 12pt. If I now build a new character style that I call "Emphasis Text" that is based on "Default Paragraph Font" (which is the font of the default paragraph style, Normal) and has Italic applied, then that will amount to (Times New Roman + 12pt) + Italic. Applied over a paragraph set to Normal style it looks like this:
Now what happens if a user modifies the Normal style to make it Times New Roman + 10pt + Italic? What would happen to the Emphasis Text style? Technically, Emphasis Text now becomes ((Times New Roman + 10pt + Italic) + Italic). But isn't that second Italic superfluous? In this case, no, because it is direct formatting that the style uses to create a difference from the base style. Word will take the second application of italic, notice that italic has already been applied by the base style, and proceed to remove the italic formatting. Does this mean that ++ = - and vice versa? Not always, as is the case with Word's internal "Emphasis" style. However, in most cases of direct formatting options (such as Bold, Italic, Underline, etc.), it does. Let's get back to the purpose of our style - it was to provide emphasis. When the style was designed, Italics were chosen to be the delta from the base style that supplied that emphasis. By changing Normal (the base style) to now also have Italics, does the Emphasis Text style now serve no purpose? Not if emphasis is now instead supplied by the absence of italics. This is a concept many users find difficult to understand, but when it's grasped it can make life much easier. We can now define styles and think of direct formatting as switches instead of hard formatting. This is certainly not easy with manual formatting, especially if you have, say, a 300-page document.
It is also worth pointing out some Word Style differences between Word 2003 for Windows and Word 2004 for Macintosh. In Word 2003 for Windows, a new LinkStyle property was added for styles. When selecting a portion of a paragraph and applying a different paragraph style to the selection instead of the entire paragraph, Word 2003 creates a new character style that is "linked" to the paragraph style. This new behavior does not exist in Word 2004 for Macintosh since our target users preferred the existing behavior prior to Word 2003. While this is not a major difference in terms of typical usage, knowing the differences here can help explain potential discrepancies when examining styles information when working in cross-platform environments.
Now that we have observed a modicum of how a style behaves, let's indulge in a little of what that means to the internals of Word. Having covered some fundamental elements of style properties and their rules, it's exciting to examine the implementation of those properties and where they live in the global food chain of Word's internal properties.
Every property in a Word document has an associated property modifier (PRM--pronounced "PERM"). Each PRM is grouped according to the kind of property it modifies. Word has six different PRM groups: character, paragraph, picture, table, section and document. While not specifically germane to styles, it is worth noting that there are sub-groups of property modifiers. For example, one can think of list properties as a distinct sub-group of paragraph property modifiers. Another example would include property modifiers for revision marking.
We assign an arbitrary value to each of these PRM groups, and refer to each arbitrary value as a PRM group code (PGC). While the actual values we assign to each PGC are arbitrary, it makes sense to assign them in order of frequency of use. For example, paragraph property modifiers appear more often in documents than table property modifiers, so the actual value we assign to the paragraph PGC should be less than the value we assign to the table PGC.
We can think of a PRM as analogous to an assignment operator, or op-code. The left-hand side of the op-code is the property group that the PRM modifies. The right-hand side of the op-code is the argument list associated with that PRM. Returning to the Emphasis Text example, the character PRM for "italic" has a single argument specifying whether the "italic" character property should be on, off or toggled from its current state. The combination of a PRM and its argument(s) is known as a property list (PRL--pronounced "PREL"). A PRL can be simple, as in the case of the bold or italics character properties, or very complex. A number of property modifiers in the table group fall into the latter category.
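To make the on/off/toggle argument concrete, here is a minimal C sketch of how such an argument might combine with the value coming from the base style. The ToggleArg enum and the ApplyToggleArg function are hypothetical names chosen for this illustration; nothing here is Word's actual encoding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical argument values for an on/off/toggle character property. */
typedef enum { argOff, argOn, argToggle } ToggleArg;

/* Combine the value inherited from the base with the style's own argument. */
static bool ApplyToggleArg(bool fBase, ToggleArg arg)
{
    switch (arg) {
    case argOff:    return false;    /* force the property off */
    case argOn:     return true;     /* force the property on */
    case argToggle: return !fBase;   /* flip whatever the base supplied */
    }
    assert(!"unknown toggle argument");
    return fBase;
}
```

Under this reading, an Emphasis Text style that stores "toggle" for italic yields italic text over an upright Normal, and upright text once Normal itself becomes italic, which is the switch-like behavior described for the Emphasis Text example.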
We can pack multiple property lists into a packed array, or a group of property lists (GRPPRL--pronounced "grouprel"). Moreover, we can sort each PRL within the GRPPRL according to the PGC of the PRM in the PRL. Thus, all PRLs associated with paragraph property modifiers will appear before any character property modifiers, and character property modifiers will appear before any table property modifiers.
With this design, we can write a simple function to apply a given GRPPRL to a property group, and this function can be agnostic to the actual property group being modified. For the sake of illustration, we'll assume the existence of four functions: PrmFromPrl, PgcFromPrm, CbFromPrl, and ApplyPrlPgc. Their actual implementation is left as an exercise for the reader, but their C/C++ prototypes would be:
PRM PrmFromPrl(unsigned char *prl);
// Extracts the property modifier from the given property list
PGC PgcFromPrm(PRM prm);
// Returns the property group associated with the given property modifier
size_t CbFromPrl(unsigned char *prl);
// Returns the count of bytes occupied by the given property list
void ApplyPrlPgc(void *pvProperties, PGC pgc, unsigned char *prl);
// Applies the given PRL to the properties of the given PGC
Our code to apply a full GRPPRL to a given set of properties would look like:
void ApplyGrpprlPgc(void *pvProperties, PGC pgc, unsigned char *grpprl, size_t cbGrpprl)
{
    unsigned char *prlLim = grpprl + cbGrpprl;
    unsigned char *prl = grpprl;

    // skip past any PGCs we don't care about
    while ( prl < prlLim && PgcFromPrm(PrmFromPrl(prl)) < pgc )
        prl += CbFromPrl(prl);

    // Now apply the ones we do care about
    while ( prl < prlLim && PgcFromPrm(PrmFromPrl(prl)) == pgc )
    {
        ApplyPrlPgc(pvProperties, pgc, prl);
        prl += CbFromPrl(prl);
    }

    // Any PRLs left over belong to later groups; just check that we never ran past the end
    assert(prl <= prlLim);
}
That's it! Though, it should be pointed out that the actual code in Word is both a bit more complex and a bit more robust than this example. For the sake of clarity and for the purpose of focusing on styles, we've left out some details that would be important to an actual shipping application.
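For readers who would like to see the four helper functions do something, here is a toy version in C. Everything about the encoding is an assumption made for this sketch and not Word's real format: each PRL is three bytes (PGC, property id, argument), a PRM is the first two bytes packed into 16 bits, and the PAP and CHP structs hold only a field or two. The names pgcPap, pgcChp, prmPJc, prmCBold and prmCItalic are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { pgcPap = 1, pgcChp = 2 } PGC;  /* paragraph sorts before character */
typedef unsigned short PRM;                   /* toy: high byte PGC, low byte property id */

typedef struct { int jc; } PAP;               /* toy paragraph properties: justification */
typedef struct { int fBold; int fItalic; } CHP;   /* toy character properties */

enum {
    prmPJc     = (pgcPap << 8) | 1,
    prmCBold   = (pgcChp << 8) | 1,
    prmCItalic = (pgcChp << 8) | 2
};

/* Toy PRL layout: [PGC][property id][argument], always three bytes. */
PRM PrmFromPrl(unsigned char *prl) { return (PRM)((prl[0] << 8) | prl[1]); }
PGC PgcFromPrm(PRM prm) { return (PGC)(prm >> 8); }
size_t CbFromPrl(unsigned char *prl) { (void)prl; return 3; }

void ApplyPrlPgc(void *pvProperties, PGC pgc, unsigned char *prl)
{
    PRM prm = PrmFromPrl(prl);
    assert(PgcFromPrm(prm) == pgc);   /* caller promised the group matches */
    switch (prm) {
    case prmPJc:     ((PAP *)pvProperties)->jc = prl[2];      break;
    case prmCBold:   ((CHP *)pvProperties)->fBold = prl[2];   break;
    case prmCItalic: ((CHP *)pvProperties)->fItalic = prl[2]; break;
    default:         break;           /* unknown modifiers are ignored in this toy */
    }
}

void ApplyGrpprlPgc(void *pvProperties, PGC pgc, unsigned char *grpprl, size_t cbGrpprl)
{
    unsigned char *prlLim = grpprl + cbGrpprl;
    unsigned char *prl = grpprl;

    /* skip past any groups that sort before the one we want */
    while (prl < prlLim && PgcFromPrm(PrmFromPrl(prl)) < pgc)
        prl += CbFromPrl(prl);

    /* apply the run of PRLs that belong to our group */
    while (prl < prlLim && PgcFromPrm(PrmFromPrl(prl)) == pgc) {
        ApplyPrlPgc(pvProperties, pgc, prl);
        prl += CbFromPrl(prl);
    }
    assert(prl <= prlLim);            /* we never ran past the end */
}
```

Given a GRPPRL holding one paragraph PRL followed by two character PRLs, calling ApplyGrpprlPgc with pgcChp skips the paragraph entry and applies only bold and italic, while calling it with pgcPap applies only the justification and stops at the first character PRL. The final assertion in this sketch checks only that the walk stayed inside the buffer, since PRLs from later groups may legitimately remain.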
What's nice about this design, and very useful when it comes to styles, is that we can encode the difference between two property sets as a GRPPRL that will transform one property set into the other. In order to make this work, however, we have to have a base set of properties that forms our starting point (our base style). Thus, Word has something known as a StandardPap and a StandardChp (PAP and CHP being paragraph and character properties respectively). There are, in fact, standard versions for each property group, but, as we have done before, we'll focus on the PAP and CHP property groups for the rest of this discussion.
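As a rough illustration of encoding a difference, the C sketch below compares two toy character property sets and emits a list of (property id, new value) pairs that turns the first into the second. The CHP layout, the id constants, and the DiffChp and ApplyDiff functions are all hypothetical and much simpler than a real GRPPRL.

```c
#include <assert.h>
#include <stddef.h>

/* Toy character properties; hps is the font size in half-points. */
typedef struct { unsigned char fBold, fItalic, hps; } CHP;

enum { idBold, idItalic, idHps, cProps };

/* Map a property id to the field it names. */
static unsigned char *Field(CHP *chp, int id)
{
    switch (id) {
    case idBold:   return &chp->fBold;
    case idItalic: return &chp->fItalic;
    default:       return &chp->hps;
    }
}

/* Write (id, value) pairs for every field where 'to' differs from 'from'.
   'out' must have room for 2 * cProps bytes. Returns the bytes written. */
size_t DiffChp(const CHP *from, const CHP *to, unsigned char *out)
{
    CHP a = *from, b = *to;
    size_t cb = 0;
    for (int id = 0; id < cProps; id++) {
        if (*Field(&a, id) != *Field(&b, id)) {
            out[cb++] = (unsigned char)id;
            out[cb++] = *Field(&b, id);
        }
    }
    return cb;
}

/* Replay a difference list on top of a property set. */
void ApplyDiff(CHP *chp, const unsigned char *diff, size_t cb)
{
    assert(cb % 2 == 0);
    for (size_t i = 0; i < cb; i += 2) {
        assert(diff[i] < cProps);
        *Field(chp, diff[i]) = diff[i + 1];
    }
}
```

If the base is 12pt regular and the derived set is 16pt bold, the difference is just two pairs (bold on, size 32 half-points), and replaying those pairs over the base reproduces the derived set. A style that changes nothing produces an empty list.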
As noted, each style has a base style. The base itself can be "none," which actually means that the base property groups for the style are the standard property groups. With this design, the work to derive, say, the character properties associated with a given style is a very simple, recursive algorithm given by the following pseudo-code:
ChpFromStyle(out chp, in style)
    if base from style is none
        chp = standard chp
    else
        ChpFromStyle(chp, base from style)
    apply pgcChp GRPPRL from style to chp
The algorithm to derive the full set of paragraph properties from an arbitrary style is left as another exercise for the reader!
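The recursion for character properties can be sketched in C as follows. This is a simplified model under stated assumptions: a style's character GRPPRL is represented by a plain array of hypothetical Mod records (id plus value) instead of packed bytes, and chpStandard stands in for the StandardChp with made-up default values.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int fBold; int fItalic; int hps; } CHP;   /* hps: half-points */

enum { modBold, modItalic, modHps };
typedef struct { int id; int value; } Mod;   /* toy stand-in for one character PRL */

typedef struct Style {
    const struct Style *base;   /* NULL means the base is "none" */
    const Mod *mods;            /* this style's differences from its base */
    size_t cMods;
} Style;

/* Assumed standard character properties: regular, 12pt. */
static const CHP chpStandard = { 0, 0, 24 };

static void ApplyMod(CHP *chp, Mod mod)
{
    switch (mod.id) {
    case modBold:   chp->fBold = mod.value;   break;
    case modItalic: chp->fItalic = mod.value; break;
    case modHps:    chp->hps = mod.value;     break;
    default:        assert(!"unknown modifier"); break;
    }
}

/* Start from the standard CHP at the root of the chain, then layer each
   style's own differences on the way back down to the requested style. */
void ChpFromStyle(CHP *chp, const Style *style)
{
    assert(chp != NULL && style != NULL);
    if (style->base == NULL)
        *chp = chpStandard;
    else
        ChpFromStyle(chp, style->base);
    for (size_t i = 0; i < style->cMods; i++)
        ApplyMod(chp, style->mods[i]);
}
```

Because the base is resolved first and the style's own modifiers are applied last, a derived style's explicit values always win over inherited ones, which matches the pecking order seen from the user's side.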
Form and function
For the reader who finds Microsoft Word a fascinating topic, this discussion has, hopefully, provided the sparks for further thought. On the inside, we have an exercise in organization and recursion. On the outside, we have a set of cascading structures upon which the savvy user places her trust!
Applying some of the concepts, it would be pretty easy to create an AppleScript that showcases the PAP/CHP dichotomy while providing some useful functionality. When writing an AppleScript code sample in Word, it's bothersome how much formatting it takes to make it look like compiled AppleScript. This, of course, is easily remedied by just copying and pasting from an AppleScript editor directly into Word. What, though, if I want the AppleScript formatting to be easily changeable within Word? Rather than changing my AppleScript settings for every formatting change I want to make (say, make comments red), why not create styles for all of the formatting types that AppleScript commonly uses? Applying those styles is still tedious, but not if we create a script that can do it for us.
The AppleScript provided on MacTech's ftp site this month will do something like that, creating the styles needed and then setting about the rather tedious task of parsing and applying. The script itself is rather inefficient, and could be improved by the energetic reader. Also, it presumes that the document itself is all AppleScript code, but could easily be altered to apply the formatting to a text range or a selection object. Finally, to avoid too much complexity, all variables must be declared for this script to effectively format variables - but there are smarter ways to do that too...
(Script found here <http://www.mactech.com/editorial/filearchives.html>.)
As an exercise, the reader should print the script and transcribe or copy it into Word (in Normal). Then copy the script into your favorite script editor and run it! The resulting Word document should look pretty close to the compiled script in the script editor. The avid reader will find the few minor bugs.
Once you've run this script, change any of the styles whose names are prefixed with "AS" and watch how quickly the document can change its look. Once again, if this had been formatted by hand, it would take hours to make the changes that a style can make in one action. The fact that the character properties (CHP) can co-exist with the paragraph properties (PAP) means that I can make one paragraph style the base for all AppleScript (ASBaseStyle) and apply formatting case by case using my various character styles. The character styles laid on top will apply their formatting over the "Default Paragraph Style," which in this case is "ASStyle". AppleScript fun!
The ability to build a document relies on an understanding of its structures. These structures, defined by Word's PRM categories, form the basis for how a document works, not just for Word, but for the writer. An understanding of these categories leads the individual to a quest for compartmentalization and efficiency within their paradigms. The bounty of that quest lies nested in the cascading palimpsests of Styles. As shown in this article, understanding Word Styles is not only vital to understanding how Word works, it can further enable one to take full advantage of the power of Word.
Rob Daly is the Word Test Lead for Microsoft's Macintosh Business Unit. He is known for walking the corridors of Microsoft in search of Rick Schaut.
Rick Schaut is the Word Development Lead for Microsoft's Macintosh Business Unit. He is a veteran Word developer, having played a part in every major release of Word for the Macintosh since Word 5.0. He is known for walking the corridors of Microsoft eluding Rob Daly.