November 93 - QD3D
Jonathan A Hess
Special effects created by high-end computer graphics hardware awe almost everyone. Jurassic Park is a recent and widely seen example. Less theatrical applications are just as worthy of 3d graphics, and many even require the third dimension. Examples include medical imaging, molecular modeling, scientific data visualization, and visualization of detailed information hierarchies. This article briefly describes an architecture for interactive 3d applications, discusses implementation issues and suggestions for that architecture, and describes the Qd3d library that serves as the foundation for such an architecture.
Elements of an interactive 3d application
Figure 1 represents key components of 3d applications and the types of information passed between those components. The user drives the hierarchy from the top by interacting with the view window, view controller, and model editing tools. Visual feedback to the user is provided by the 3d renderer at the bottom. The view window sequences update events and sends "draw yourself" commands to the application specific model. The view controller is responsible for manipulating the viewing parameters of the three dimensional scene including viewer position and viewer direction. The view controller may also manipulate viewing parameters to act as an animator for movie recording.
At the center of any 3d application is the organization of the data being modeled. Examples of application specific models include simple geometry information describing a building, a kinetic simulation of a human gymnast, a vector field interpolation of engineering data, and so on. There are three key things to remember regarding the application specific model. First, application code is responsible for saving, organizing, and retrieving the essence of the model. Second, application code breaks the draw yourself message down into the 3d drawing primitives recognized by the 3d renderer. Third, to decrease module cross coupling, the model rendering code rarely concerns itself with the 3d viewing parameters.
Separating the lighting model, model editing tools, and view controller elements from the model itself is important from a flexibility and maintainability standpoint. By keeping the lighting model separate, it becomes feasible to have a default high-speed lighting model that can be swapped out with a slower but more realistic lighting model. The model need only tag different elements with attributes such as "sand," "copper," or "red." These attributes are then passed to the lighting model which in turn checks viewing parameters and calculates a suitable RGB color.
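The separation described above can be sketched in a few lines of C++. This is only an illustration under assumed names: LightingModel, FlatLighting, DiffuseLighting, and the color values are hypothetical and are not part of Qd3d. The model supplies nothing but a material tag, and either lighting model can be swapped in behind the same interface.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical types for illustration; none of these names come from Qd3d.
struct RGB { double r, g, b; };

// The model tags elements with a material name. A lighting model turns the
// tag, plus a simple lighting term, into a final color.
class LightingModel {
public:
    virtual ~LightingModel() = default;
    // facing: cosine of the angle between surface normal and light, in [-1, 1].
    virtual RGB Shade(const std::string& material, double facing) const = 0;

protected:
    // Assumed base colors; unknown tags fall back to gray.
    static RGB BaseColor(const std::string& material) {
        if (material == "copper") return {0.72, 0.45, 0.20};
        if (material == "sand")   return {0.76, 0.70, 0.50};
        if (material == "red")    return {1.00, 0.00, 0.00};
        return {0.5, 0.5, 0.5};
    }
};

// Fast default: ignore the lighting geometry and return the flat base color.
class FlatLighting : public LightingModel {
public:
    RGB Shade(const std::string& material, double) const override {
        return BaseColor(material);
    }
};

// Slower, more realistic: scale the base color by a diffuse term.
class DiffuseLighting : public LightingModel {
public:
    RGB Shade(const std::string& material, double facing) const override {
        assert(facing >= -1.0 && facing <= 1.0);
        const double k = std::max(0.0, facing);  // surfaces facing away go black
        const RGB c = BaseColor(material);
        return {c.r * k, c.g * k, c.b * k};
    }
};
```

Model code would hold a LightingModel pointer and never needs to know which subclass is installed, which is the flexibility the paragraph argues for.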
Even more critical is the flexibility required for the editing tools used for model manipulation. Editing tools often interface to real devices such as the Z Mouse from Multipoint Technology Corporation [Multi93] and the Mattel Power Glove [Glove93]. Alternatively, editing tools may interface to virtual 3d control devices such as the triad mouse [NiO86], or a virtual sphere controller [Chen93]. Editing tools may also take advantage of application specific conditions. For example, if a 3d object's movement is restricted to a plane, 2d desktop mouse movement can directly be mapped to that plane. Query tools such as "tell me the name of the object I'm pointing at" or "what is the height value at this point on the surface" also qualify as editing tools. Finally, the current editing tool must easily switch between different tools similar to how a draw program lets users switch between translation and rotation operations. Only by separating editing tool code from the model can these manipulation requirements be met.
At the bottom of Figure 1 lies the 3d renderer. The renderer is responsible for creating the image resulting from the primitives called out by the model "draw yourself" code. Editing tools and the view controller also use the renderer to provide 3d manipulation feedback. Finally, the renderer maintains and manipulates the viewing parameters under direction from the view controller.
Key to the 3d application framework of Figure 1 is the lack of specifics and constraints. Swapping out different elements of the architecture, such as the lighting model, has already been discussed. In addition, the lighting model, the model editing tools, and the view controller are optional. When working entirely with wireframes, there is little need for a lighting model. If the model is static, there is no need to provide editing tools. And, if the application is a "fly through" visual explorer, the application is the view controller. Also, the diagram does not preclude multiple view windows from sharing a single application model. Simultaneously viewing a model from multiple directions helps with understanding and manipulation issues. Another consideration is that multiple hardware input devices could feed into the system at once. Finally, nothing prevents this 3d application framework from being built on top of favorite application frameworks such as MacApp, the Think Class Library (TCL), or Bedrock. For example, "viewing window" could be an item in a MacApp or TCL view hierarchy.
Where do Qd3d, 3dPane, and SmartPane fit in?
The application structure in Figure 1 describes interactive 3d applications as opposed to photo-realistic but largely static applications. As a result, a polygon based renderer provides an appropriate foundation for the figure. Qd3d is such a renderer.
Qd3d has Macintosh roots reaching back to the Fall of 1988. After existing as an in house development tool, Qd3d went commercial in the Spring of 1992 with version 1.2. Since that time, individuals and institutions have used Qd3d in applications such as sports medicine, geology, molecular modeling, information sciences, robotics, and scientific visualization. The most recent version, 2.1, was released in September 1993 and features support for the Cyberscope stereoscopic viewer from Simsalabim Systems.
Qd3d is a collection of classes and routines that provide a robust set of features for 3d polygonal rendering. Primitives include point marking, line drawing, polygon framing and filling, and text placement. Except for text placement, these primitives are defined in color constant and color varying varieties. Color constant primitives use the present GrafPort's foreground color and color varying primitives allow the association of individual colors to each vertex of the primitive. During quality renderings, color varying primitives smoothly blend color between the vertices. Polygons filled with such a color blending are referred to as Gouraud shaded polygons.
Similar to QuickDraw's rendering of 2d primitives through the GrafPort pointed to by "thePort," Qd3d sends its 3d primitives through "the3dPort." Unlike QuickDraw's thePort structure pointer, the3dPort is an object pointer of type CQd3dPort. As a result, access to Qd3d viewing parameters, such as viewer location and viewing direction, is cleanly wrapped in messages defined for the CQd3dPort class. In addition, all aspects of 3d rendering can be specialized or augmented by subclassing, overriding, and inheriting.
CQd3dPorts maintain a set of rendering options which affect the appearance of every primitive. These rendering options dramatically modify rendering speed and quality and include OnlyQD, Wireframe, UseZBuff, DepthCue, and clipping level. OnlyQD controls whether Qd3d quickly renders color varying primitives as color constant. Wireframe maps polygon fill primitives to polygon frame primitives. Clipping level controls how accurately Qd3d clips primitives to the boundary of the viewing window. The default clipping level clips primitives so they fit just within the view window. In contrast, the drastic clipping level entirely skips primitives that cross the viewing area. UseZBuff requires OnlyQD to be off and uses a "z-buffer" the same size as the image for hidden surface removal. Finally, DepthCue causes distant primitive colors to be blended with the scene background color but leaves primitives closer to the viewer at their original color. This background color blending helps with depth perception. The manipulation of these rendering options allows the same model rendering code to be used for quick OnlyQD previews and for final Gouraud-shaded z-buffered renderings.
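To make the depth cueing idea concrete, here is a minimal sketch of blending a color toward the background as distance grows. The linear ramp and the nearZ and farZ parameters are assumptions chosen for illustration; the article does not say how Qd3d computes the blend, and DepthCueBlend is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>

struct Color { double r, g, b; };

// Blend `c` toward `background` as depth runs from nearZ to farZ.
// At or before nearZ the color is unchanged; at or beyond farZ it is
// fully replaced by the background. The linear ramp is an assumption.
Color DepthCueBlend(const Color& c, const Color& background,
                    double depth, double nearZ, double farZ) {
    assert(farZ > nearZ);
    const double raw = (depth - nearZ) / (farZ - nearZ);
    const double t = std::min(1.0, std::max(0.0, raw));
    return {c.r + (background.r - c.r) * t,
            c.g + (background.g - c.g) * t,
            c.b + (background.b - c.b) * t};
}
```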
Text placement primitives render 2d text at the projection point of a 3d location. Text rendering options and attributes include: left, right, and center justification; whether the text honors depth cueing; automatic font scaling with respect to distance from the viewer; whether text is removed if its area is greater than the projection area of a 3d object; and whether all text should be skipped.
CQd3dPort supports parallel and perspective projections. The familiar perspective projection renders close objects larger than more distant objects of the same size. In contrast, parallel projections appear flat because objects of the same size are rendered at the same size regardless of distance from the viewer. Parallel projections appear unnatural but have engineering applications that perspective projections cannot support.
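The difference between the two projections comes down to a single divide by depth. The sketch below assumes a viewer at the origin looking down the positive z axis with the screen plane at distance screenDist; these conventions and function names are illustrative and are not taken from Qd3d.

```cpp
#include <cassert>

struct Point3 { double x, y, z; };
struct Point2 { double x, y; };

// Perspective: screen position shrinks in proportion to depth.
Point2 ProjectPerspective(const Point3& p, double screenDist) {
    assert(p.z > 0.0);  // the point must be in front of the viewer
    return {p.x * screenDist / p.z, p.y * screenDist / p.z};
}

// Parallel: depth is simply dropped, so size is independent of distance.
Point2 ProjectParallel(const Point3& p) {
    return {p.x, p.y};
}
```

Doubling a point's depth halves its perspective screen coordinates but leaves its parallel coordinates untouched, which is why parallel views preserve measurable sizes.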
Qd3d also supports stereoscopic projections with the CStereo3dPort and CCyberscope3dPort subclasses. Depth perception is a function of the visual cortex comparing minor differences between left and right eye views. Unfortunately, the 2d nature of CRT monitors strips 3d graphics of their depth information. This becomes acute in scenes that are unfamiliar to users or when the refresh rate of view reorientation is low. Using stereoscopic techniques restores depth perception by creating and presenting different images to the left and right eyes.
CStereo3dPort acts as a template for Qd3d stereoscopic projections and provides for straight-eyed and cross-eyed side-by-side stereoscopic image pairs. Figure 2 is an example of a cross-eyed side-by-side stereo image pair.
To view Figure 2, hold the images squarely to the line of sight. Look cross-eyed at the image pair and four images will appear. Vary cross-eyedness so the two inner images overlap. Concentrate on the center image until it becomes focused. Practice. Side-by-side stereo images require no additional gadgets for viewing but beginners find it difficult to fuse image pairs and such viewing causes eye strain and fatigue.
An economical device for recombining stereo image pairs, on the order of US $180, is Simsalabim's Cyberscope [Sims93]. The Cyberscope is a hood velcroed to the front of a regular computer monitor. Looking straight through the Cyberscope reveals the monitor screen as before with the exception of a vertical divider to separate left and right eye views. Looking down in the Cyberscope allows its front surfaced mirrors to recombine left and right eye views rotated on the monitor into a stereoscopic image. Advantages of the Cyberscope over other 3d viewing methods include no need to go cross-eyed, full color, increased depth perception from rotated images being wider than they are tall, and no flicker from shuttered LCD glasses. The CCyberscope3dPort class, included with Qd3d, creates the rotated images as required for the Cyberscope.
3dPane and SmartPane
As Qd3d provides an application framework independent 3d renderer for the architecture in Figure 1, 3dPane provides a view controller and view window for the TCL. The C3dPane class marries the3dPort of Qd3d to the TCL CPane class. TCL application developers override the Draw3dScene method for their application specific drawing. C3dPolarControl provides a polar coordinate based view controller. With C3dPolarControl a horizontal scroll bar controls longitudinal viewing position about an object of interest and a vertical scroll bar controls the latitudinal viewing position.
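A polar view controller of this kind reduces to converting two scroll bar values into a viewer position on a sphere around the object of interest. The following sketch is not the C3dPolarControl implementation; the function name, the z-up axis convention, and the use of degrees are assumptions.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Place the viewer on a sphere of the given radius around `target`.
// Assumed convention: z is up, longitude is measured in the xy plane
// from the +x axis, latitude is measured from the xy plane toward +z.
Vec3 ViewerPosition(const Vec3& target, double radius,
                    double longitudeDeg, double latitudeDeg) {
    assert(radius > 0.0);
    const double kPi = 3.14159265358979323846;
    const double lon = longitudeDeg * kPi / 180.0;
    const double lat = latitudeDeg * kPi / 180.0;
    return {target.x + radius * std::cos(lat) * std::cos(lon),
            target.y + radius * std::cos(lat) * std::sin(lon),
            target.z + radius * std::sin(lat)};
}
```

The horizontal scroll bar would drive longitudeDeg and the vertical scroll bar latitudeDeg, with the viewing direction always pointing from the returned position back at the target.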
The SmartPane library supplements the existing TCL Pane hierarchy with offscreen image buffering that reduces flicker. SmartPane also adds animation support and QuickTime movie recording of images.
Application 3d Models
This section briefly suggests and describes considerations for developing a class hierarchy forming the application specific model of Figure 1. This section also describes facilities present in Qd3d for supporting tight integration between the application model and actual rendering.
Many 3d objects are appropriately rendered as collections of polygons known as polyhedra. A key feature of polyhedra is that many polygons often share the same vertices. Taking advantage of this sharing allows polyhedra to be described by vertex locations and a special list of polygons. The special list of polygons lists the polygons as ordered lists of vertex indices. This representation of a polyhedron adds structure when multiple polyhedra are involved, and it compresses the data used to represent the polyhedron. Data is compressed because values for individual vertices are stored only once and subsequently referenced by index in the polygon lists.
The idea of referencing vertices by index is important from a rendering performance standpoint. Transformation and projection of a world coordinate point to screen coordinates is computationally expensive. By using the vertex indexing scheme during polyhedron rendering, each vertex can be transformed and projected once into a cache of screen coordinates. As each individual polygon is rendered, the screen coordinates of the associated vertices are looked up in the screen coordinate cache. Without this cache, rendering a polyhedron can result in approximately 4 to 6 times as many transformation and projection operations as necessary.
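The indexed representation and the screen coordinate cache can be illustrated with a small self-contained sketch. It does not use Hedra or any Qd3d call; Polyhedron, Projector, and the two render functions are hypothetical stand-ins that only count how many projections each strategy performs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point3 { double x, y, z; };
struct Point2 { double x, y; };

// A polyhedron stored as shared vertices plus polygons of vertex indices.
struct Polyhedron {
    std::vector<Point3> vertices;
    std::vector<std::vector<std::size_t>> polygons;
};

// Stand-in for the expensive transform-and-project step; counts its calls.
struct Projector {
    std::size_t calls = 0;
    Point2 Project(const Point3& p) {
        ++calls;
        return {p.x / p.z, p.y / p.z};  // simple perspective divide
    }
};

// Project every vertex once into a cache, then look polygons up by index.
std::size_t RenderCached(const Polyhedron& h, Projector& proj) {
    std::vector<Point2> cache;
    cache.reserve(h.vertices.size());
    for (const Point3& v : h.vertices) cache.push_back(proj.Project(v));
    std::size_t emitted = 0;
    for (const auto& poly : h.polygons) {
        for (std::size_t i : poly) {
            assert(i < cache.size());
            const Point2& screen = cache[i];  // cache lookup, no projection
            (void)screen;                     // a real renderer would draw here
            ++emitted;
        }
    }
    return emitted;
}

// Project a vertex again every time a polygon references it.
std::size_t RenderNaive(const Polyhedron& h, Projector& proj) {
    std::size_t emitted = 0;
    for (const auto& poly : h.polygons) {
        for (std::size_t i : poly) {
            assert(i < h.vertices.size());
            (void)proj.Project(h.vertices[i]);
            ++emitted;
        }
    }
    return emitted;
}

// Unit cube placed in front of the viewer: 8 vertices, 6 four-sided faces.
Polyhedron MakeCube() {
    Polyhedron h;
    for (int i = 0; i < 8; ++i) {
        h.vertices.push_back({double(i & 1), double((i >> 1) & 1),
                              2.0 + ((i >> 2) & 1)});
    }
    h.polygons = {{0, 1, 3, 2}, {4, 5, 7, 6}, {0, 1, 5, 4},
                  {2, 3, 7, 6}, {0, 2, 6, 4}, {1, 3, 7, 5}};
    return h;
}
```

For the cube, the cached path performs 8 projections against 24 for the naive path, a factor of 3. Triangle meshes, where each vertex is typically shared by more polygons, move toward the larger factors quoted above.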
Qd3d includes a meta-primitive library called Hedra. Hedra implements the polyhedron description and screen coordinate caching scheme just described. Hedra is fully described in Qd3d documentation [Viv93] and an example usage is given in [Hess92b]. Hedra suffices for very simple modeling requirements but begins to fail when polygonal rendering is used to approximate curved surfaces.
Qd3d Integration Support for Models
Qd3d includes support for a high level of integration between application specific rendering meta-primitives and Qd3d's rendering pipeline. The focal point of this integration is the PBPoly3dPrim message of CQd3dPort. That message takes a parameter block indicating the primitive type, such as polyFrame or polyFill, a pointer to the world coordinate vertices of the primitive, and an optional pointer to vertex colors for color varying primitives. The parameter block also includes an optional screen coordinate cache pointer allowing improved efficiency under meta-primitive control. Vertices indicated in the parameter block may be in sequential array order or may be specified by index with optional vertex index arrays. These vertex and index arrays are analogous to those previously described for Hedra. In fact, the Hedra meta-primitive uses PBPoly3dPrim in its implementation. For an example of PBPoly3dPrim usage consider a scheme for rendering a triangular Bézier patch.
Figure 3 depicts a wireframe polygonal triangular Bézier patch approximation. Composite triangles are rendered triangle by triangle and row by row. For performance reasons the Bézier patch points, the vertices of the triangles, are cached at world coordinate and screen coordinate levels. World coordinate patch points are cached because their calculation is expensive. Screen coordinate points are cached to avoid the redundant transformations and projections previously noted. Colors at triangle vertices are also cached and are used when Gouraud shading the patch. Pseudo code for rendering the patch:
get points, normals, colors, & screen coordinates for row 0 points
for every row i
    get points, normals, colors, & screen coordinates for row i+1
    for every triangle j in row i
        fix parameter block cache indices
        render triangle (call PBPoly3dPrim)
    set caches for next row
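The row-by-row loop above can be sketched in C++ as follows. This is a simplified illustration, not the Qd3d code: it caches only world coordinate points (the article also caches normals, colors, and screen coordinates), it collects triangles instead of calling PBPoly3dPrim, and it assumes a quadratic patch. QuadPatch, Evaluate, and Tessellate are hypothetical names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Point3 { double x, y, z; };
struct Triangle { Point3 a, b, c; };

// Control points of a quadratic triangular Bezier patch.
struct QuadPatch { Point3 b200, b020, b002, b110, b011, b101; };

// Evaluate the patch at barycentric coordinates (u, v, w), u + v + w = 1.
Point3 Evaluate(const QuadPatch& p, double u, double v, double w) {
    auto blend = [&](double Point3::*m) {
        return u * u * (p.b200.*m) + v * v * (p.b020.*m) + w * w * (p.b002.*m)
             + 2 * u * v * (p.b110.*m) + 2 * v * w * (p.b011.*m)
             + 2 * u * w * (p.b101.*m);
    };
    return {blend(&Point3::x), blend(&Point3::y), blend(&Point3::z)};
}

// Build the triangles row by row, keeping only two rows of points cached.
// `evaluations` counts how many patch points were actually computed.
std::vector<Triangle> Tessellate(const QuadPatch& patch, int rows,
                                 int& evaluations) {
    assert(rows > 0);
    // Point row k holds k + 1 points; row 0 is the single corner b200.
    auto rowPoints = [&](int k) {
        std::vector<Point3> row;
        for (int j = 0; j <= k; ++j) {
            const double u = 1.0 - double(k) / rows;
            const double w = double(j) / rows;
            const double v = 1.0 - u - w;
            row.push_back(Evaluate(patch, u, v, w));
            ++evaluations;
        }
        return row;
    };
    std::vector<Triangle> triangles;
    std::vector<Point3> upper = rowPoints(0);
    for (int i = 0; i < rows; ++i) {
        std::vector<Point3> lower = rowPoints(i + 1);
        for (int j = 0; j <= i; ++j) {
            triangles.push_back({upper[j], lower[j], lower[j + 1]});
            if (j < i) {
                triangles.push_back({upper[j], lower[j + 1], upper[j + 1]});
            }
        }
        upper = std::move(lower);  // set caches for the next row
    }
    return triangles;
}
```

With 4 rows the sketch produces 16 triangles from 15 evaluated points, where evaluating every triangle corner independently would take 48 evaluations.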
Hidden Surface Removal
Another issue for performance integration is hidden surface removal. A popular method for hidden surface removal is z-buffering. Z-buffering correctly places one polygon in front of another, handles intersecting polygons, and performs this hidden surface removal regardless of polygon rendering order. Unfortunately, using a z-buffer requires an amount of storage equivalent to that used for the image and z-buffering takes a severe performance hit on the Macintosh which lacks a hardware z-buffer.
To avoid the use of a z-buffer and still have realistic pictures, painter's algorithm techniques are recommended. Briefly stated, the painter's algorithm renders the polygons most distant in the scene first, followed by polygons closer to the viewer. When a scene is completed, the closest polygons have been rendered most recently and have overwritten more distant polygons. However, the painter's algorithm will fail when polygons intersect or when polygons have large depths of field.
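A minimal sketch of the ordering step follows. It sorts polygons by the mean depth of their vertices, which is one common choice and an assumption here; the article does not prescribe a depth measure. Polygon and PainterOrder are hypothetical names, and the sketch shares the failure cases noted above for intersecting or deep polygons.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// A polygon reduced to an id and the viewer distances of its vertices.
struct Polygon {
    int id;
    std::vector<double> vertexDepths;
};

double MeanDepth(const Polygon& p) {
    assert(!p.vertexDepths.empty());
    return std::accumulate(p.vertexDepths.begin(), p.vertexDepths.end(), 0.0)
           / p.vertexDepths.size();
}

// Return polygon ids in drawing order: most distant first, closest last.
std::vector<int> PainterOrder(std::vector<Polygon> polygons) {
    std::stable_sort(polygons.begin(), polygons.end(),
                     [](const Polygon& a, const Polygon& b) {
                         return MeanDepth(a) > MeanDepth(b);
                     });
    std::vector<int> order;
    for (const Polygon& p : polygons) order.push_back(p.id);
    return order;
}
```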
Again consider the triangular Bézier patch. As depicted in Figure 3, the rendering order honors the painter's algorithm if the point C is closest to the viewer, then the point B, and finally A. If point A were closest to the viewer, the labeled rendering order would exhibit hidden surface rendering errors. A solution for the triangular Bézier patch is to modify the rendering order depending on the relative distance of A, B, and C from the viewer. For triangular Bézier patches this reordering may still fail if the patch is not well behaved.
Transform Based Modeling
One final issue regarding Qd3d integration with the application model is transformation matrix concatenation. In some 3d modeling applications objects are defined not by the world coordinate locations of object points but by transforms on object points given in some local coordinate system. Using such transforms allows objects with many points to be simultaneously rendered and moved through world coordinate space more efficiently. Transform based definitions also reduce memory requirements when many scene objects are duplicated from a small set of master objects. To support the use of transform based object modeling, CQd3dPort provides get and set transformation matrix operations. Pseudo code describing the rendering of a transform based object follows (it makes sense to those who need to know):
retrieve and store the existing transform matrix (CQd3dPort::GetTMat)
create the matrix mapping local object coordinates to destination
multiply the mapping matrix and the stored matrix, with the stored matrix on the right
set the Qd3d transform matrix to the new matrix (CQd3dPort::SetTMat)
generate the primitives in local coordinates for the transform based object
restore the original transform matrix (CQd3dPort::SetTMat)
For complete transform control, CQd3dPort::Transform could be overridden.
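The save, concatenate, draw, and restore sequence can be sketched against a stand-in port class. MockPort below is hypothetical: it borrows the GetTMat and SetTMat names from the text, but its signatures, the 4x4 matrix type, and the row-vector convention are all assumptions and may not match CQd3dPort.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Mat4 = std::array<std::array<double, 4>, 4>;
struct Point3 { double x, y, z; };

Mat4 Identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
    return m;
}

// Row-vector convention (an assumption): a point transforms as p * M,
// so the translation sits in the last row.
Mat4 Translation(double tx, double ty, double tz) {
    Mat4 m = Identity();
    m[3][0] = tx; m[3][1] = ty; m[3][2] = tz;
    return m;
}

Mat4 Multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k) r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Hypothetical stand-in for a 3d port; records transformed points.
class MockPort {
public:
    Mat4 GetTMat() const { return tmat_; }
    void SetTMat(const Mat4& m) { tmat_ = m; }
    void DrawPoint(const Point3& p) {
        const Mat4& m = tmat_;
        drawn.push_back(
            {p.x * m[0][0] + p.y * m[1][0] + p.z * m[2][0] + m[3][0],
             p.x * m[0][1] + p.y * m[1][1] + p.z * m[2][1] + m[3][1],
             p.x * m[0][2] + p.y * m[1][2] + p.z * m[2][2] + m[3][2]});
    }
    std::vector<Point3> drawn;

private:
    Mat4 tmat_ = Identity();
};

// Render local-coordinate points through a temporary concatenated matrix.
void DrawInstance(MockPort& port, const Mat4& localToDest,
                  const std::vector<Point3>& localPoints) {
    const Mat4 saved = port.GetTMat();            // retrieve and store
    port.SetTMat(Multiply(localToDest, saved));   // stored matrix on the right
    for (const Point3& p : localPoints) port.DrawPoint(p);
    port.SetTMat(saved);                          // restore the original
    assert(port.GetTMat() == saved);
}
```

Because the stored matrix is on the right, the local mapping is applied to each point first and the port's existing transform second, so instances can be nested inside already transformed scenes.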
Modeling Using an OOP Class Hierarchy
Advanced modeling requirements quickly outstrip Hedra's abilities and the different modeling requirements of applications often clash with each other. For this reason, ViviStar recommends developing modeling code tailored to individual application requirements rather than attempting to use a single "catch all" modeling method. Catch all modeling methods introduce too many unnecessary restrictions and complexities. ViviStar also recommends developing an object oriented class hierarchy for modeling that could potentially have parts reused in similar applications. Specific requirements for new but related applications could quickly be implemented by subclassing and specializing existing classes in the model hierarchy.
What are key concepts and traits for a 3d modeling class hierarchy? Usage, composing, containment, processing, and easy expansion.
Usage implies that different types of elements in a model should be accessed and manipulated in similar ways. For example, if a manipulation tool keeps track of selected elements in a selection list, the tool should be able to send list elements "move this direction with this distance" messages and all list elements should respond accordingly. Stated another way, the base class of the modeling hierarchy should define a minimal set of operations that all elements respond to.
Composing and containment relate to building higher level elements from compositions of multiple sub-elements. For example, a "human" is a composite of "head," "abdomen," and 4 "limb" elements. In turn, limbs are composed of "long bone" elements, and so on. Containment implies that the union of the volumes of constituent elements is completely enclosed within the bounding volume of the composed element. Because of this containment property, ray intersections and other volume space queries need not be performed for every single atomic element in a model. If a ray does not intersect the bounding volume of a composite element, the ray cannot possibly intersect one of the contained elements. This type of trivial rejection for queries is important for performance considerations in element and user interaction. Composing does not imply containment because a reference element, such as an alignment plane, could be used in the composite definition of an element. Nevertheless, that composite element does not "contain" the reference element.
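Trivial rejection through containment can be sketched with bounding spheres. The choice of spheres, the Element layout, and the function names are assumptions made for illustration; any bounding volume with a cheap ray test would serve the same purpose.

```cpp
#include <cassert>
#include <vector>

struct Vec3 { double x, y, z; };
struct Ray { Vec3 origin; Vec3 dir; };  // dir is assumed to be unit length

double Dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// True when the ray touches the sphere in front of (or around) its origin.
bool RayHitsSphere(const Ray& ray, const Vec3& center, double radius) {
    assert(radius >= 0.0);
    const Vec3 oc{center.x - ray.origin.x, center.y - ray.origin.y,
                  center.z - ray.origin.z};
    const double distSq = Dot(oc, oc);
    if (distSq <= radius * radius) return true;  // origin inside the sphere
    const double along = Dot(oc, ray.dir);
    if (along < 0.0) return false;               // sphere is behind the ray
    return distSq - along * along <= radius * radius;
}

// A composite element whose bounding sphere encloses all of its children.
struct Element {
    Vec3 center;
    double radius;
    std::vector<Element> children;
};

// Count the leaf elements hit by the ray; `tests` counts sphere tests made.
int CountHits(const Element& e, const Ray& ray, int& tests) {
    ++tests;
    if (!RayHitsSphere(ray, e.center, e.radius)) return 0;  // skip subtree
    if (e.children.empty()) return 1;
    int hits = 0;
    for (const Element& child : e.children) hits += CountHits(child, ray, tests);
    return hits;
}
```

A ray that misses the composite costs one test no matter how many elements the composite contains.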
Containment expedites implementation of the painter's algorithm for hidden surface removal. Draw messages are sent to contained elements in back to front order. If the sub-elements also honor the painter's algorithm, a quick hidden surface removal algorithm for the model is complete.
A strict tree composition hierarchy suffices for most models. In tree hierarchies display and query functions can be implemented with simple recursion. However, if a model allows more than a tree hierarchy, such as two composite elements referencing another element for alignment, simple recursion will result in redundant processing or even infinite recursion. Therefore, "processing" refers to the necessary mechanisms to prevent such redundancy and infinite recursion. For a more detailed description of processing issues for a specific application see [Hess92a].
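One simple form of "processing" is to remember which elements have already been handled during a traversal. The sketch below uses a visited set; this is a generic technique offered as an illustration and is not necessarily the mechanism described in [Hess92a]. Element and Visit are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <vector>

// An element that may reference other elements, including shared ones.
struct Element {
    int id;
    std::vector<Element*> refs;
};

// Visit every reachable element exactly once, even when elements are
// shared between composites or the references form a cycle.
void Visit(const Element* e, std::set<const Element*>& seen,
           std::vector<int>& order) {
    assert(e != nullptr);
    if (!seen.insert(e).second) return;  // already processed
    order.push_back(e->id);
    for (const Element* r : e->refs) Visit(r, seen, order);
}
```

Plain recursion without the set would process a shared alignment element once per referencing composite and would never terminate on a cycle.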
The final aspect of a class hierarchy for 3d modeling is the most vague. Creating a new element class should require as little overriding and creation of new methods as possible. Ideally, many of the details of maintaining usage, composing, containment, and processing are already taken care of and supported by the existing element hierarchy. With that support in place, an element hierarchy provides for "easy expansion."
Many application models do not require all the key concepts and traits discussed here. As previously stated, an application model that is strictly tree based does not require an involved mechanism for processing. Trying to implement processing support when simple recursive techniques will do clutters and unnecessarily complicates the application. If an application only performs quality rendering and has no interaction, there may be little need for implementing the containment concept. In short, the model hierarchy need only be as sophisticated as the application requires. Anything more than that becomes a performance, maintenance, and development liability.
Macintosh applications traditionally excel at direct manipulation and this feeling of direct manipulation is largely what makes the Macintosh user interface so friendly, successful, and productive. Use of the third dimension provides a whole new set of problems, considerations, and opportunities for user interface designers and implementors. Premier among these problems is that the vehicle typically driving direct manipulation, the desktop mouse, is strictly 2d. While 3d input devices exist they typically drive solution costs up and differ widely in capability.
Mice, Modes, and Mouse Clicks
The primary function of the 2d mouse is the specification of relative 2d locations. Devices for specifying points in three dimensions range from a box with three dials to "boxes" that sense the position of an ultrasonic transducer attached to a glove or helmet. As already stated, the use of such physical devices drives system cost up. Fortunately, alternatives exist to squeeze 3d location information out of 2d devices.
One archaic method for placing points in a 3d scene involves the use of at least two different projections of the 3d scene. Operators are required to indicate a 2d point in each projection and those 2d points are then converted into corresponding 3d rays. The 3d point specified is the intersection of the rays or the midpoint of the shortest segment between the rays.
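The last step, combining two rays into one 3d point, is a small closed-form calculation. The sketch below treats each ray as an infinite line and returns the midpoint of the shortest segment between them, which equals the intersection point when the lines actually meet. The type and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Ray { Vec3 origin; Vec3 dir; };

Vec3 operator+(const Vec3& a, const Vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(const Vec3& a, double s) { return {a.x * s, a.y * s, a.z * s}; }
double Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Midpoint of the shortest segment between two lines. Returns false when
// the lines are (nearly) parallel and no unique closest pair exists.
bool PointFromTwoRays(const Ray& r1, const Ray& r2, Vec3& result) {
    const Vec3 w0 = r1.origin - r2.origin;
    const double a = Dot(r1.dir, r1.dir);
    const double b = Dot(r1.dir, r2.dir);
    const double c = Dot(r2.dir, r2.dir);
    const double d = Dot(r1.dir, w0);
    const double e = Dot(r2.dir, w0);
    assert(a > 0.0 && c > 0.0);  // directions must be non-zero
    const double denom = a * c - b * b;
    if (std::fabs(denom) < 1e-12 * a * c) return false;
    const double s = (b * e - c * d) / denom;  // parameter along r1
    const double t = (a * e - b * d) / denom;  // parameter along r2
    const Vec3 p1 = r1.origin + r1.dir * s;
    const Vec3 p2 = r2.origin + r2.dir * t;
    result = (p1 + p2) * 0.5;
    return true;
}
```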
A more elegant solution is the triad mouse [NiO86]. Triad mouse operation divides 2d mouse movement into six regions of directional movement. The six directions correspond to the positive and negative directions of the projections of the 3d x, y, and z axes. Relative mouse movement in one of the six 2d directions is mapped and restricted to a relative 3d movement in the associated 3d axis direction. Using the triad mouse also requires "triad cursor" feedback as a visual cue for the 6 mouse directions. Mapping 2d movements to visually corresponding 3d movements gives the triad mouse an intuitive characteristic unlike other 2d methods for 3d manipulation. Unfortunately, triad mouse operation becomes awkward when one of the 3d axes becomes perpendicular to the screen or when projections of two of the axes become relatively collinear.
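The six-region mapping can be sketched by comparing the mouse movement against the screen projections of the three axes and keeping the best match. This is an illustration of the idea only, not the algorithm from [NiO86] or the Qd3d UpdateTriadPosition method; TriadMove, the axes2d input, and the scaling rule are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec2 { double x, y; };
struct Vec3 { double x, y, z; };

// axes2d holds the screen projections of the unit x, y, and z axes.
// The mouse delta is assigned to the axis whose projection (in either
// sign) lies closest in angle, then scaled so that the projected 3d
// movement matches the mouse movement along that axis.
Vec3 TriadMove(const std::array<Vec2, 3>& axes2d, Vec2 delta) {
    assert(std::isfinite(delta.x) && std::isfinite(delta.y));
    int best = -1;
    double bestAlong = 0.0;
    double bestLen = 1.0;
    for (int i = 0; i < 3; ++i) {
        const double len = std::hypot(axes2d[i].x, axes2d[i].y);
        if (len < 1e-6) continue;  // axis perpendicular to the screen
        const double along =
            (delta.x * axes2d[i].x + delta.y * axes2d[i].y) / len;
        if (std::fabs(along) > std::fabs(bestAlong)) {
            best = i;
            bestAlong = along;
            bestLen = len;
        }
    }
    Vec3 move{0.0, 0.0, 0.0};
    const double amount = bestAlong / bestLen;
    if (best == 0) move.x = amount;
    else if (best == 1) move.y = amount;
    else if (best == 2) move.z = amount;
    return move;
}
```

The skipped axis in the loop and the near ties between two projected axes correspond to the two awkward cases noted above.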
Another strength of the Macintosh user interface is the lack of modes and the automatic transition between modes [App93]. If a 3d application supports pull down menus and 3d point specification using a triad mouse, there is an inherent mode transition between 2d and triad mouse behavior. If the user must make a conscious effort for this mode transition, such as issuing a command key solely to start triad operation and another command solely to return to 2d mouse operation, it is likely an example of a poor user interface.
The author's favorite editing tool scheme allows 3d model element selection using the 2d mouse. When a user starts dragging on an element the application interface naturally presumes the user wishes to translate (move) the element in three dimensional space. The mode transition to triad mouse operation is inherently implied, automatically made, and triad mouse operation begins. After the user has specified the 3d translation with accumulated triad mouse movement, the user releases the mouse, the 3d translation is applied, and mouse operation returns to 2d mode.
A 3d application user interface may also be streamlined by taking advantage of application specific constraints. For example, if a user needs to place points on a 3d sphere, there is no need to attempt to use a triad mouse or some scheme of mapping mouse x and y movements to longitude and latitude coordinates on the sphere. Simply use the closest intersection of the sphere with the 3d ray corresponding to a normal clicked 2d mouse location.
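The sphere example reduces to the standard ray and sphere intersection test. The sketch assumes the 3d ray for the clicked location is already available (in Qd3d that would come from the port, as discussed next) and that its direction is unit length. ClosestSphereHit is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

double Dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Closest intersection of a ray with a sphere, measured from the ray
// origin. `dir` is assumed to be unit length. Returns false on a miss.
bool ClosestSphereHit(const Vec3& origin, const Vec3& dir,
                      const Vec3& center, double radius, Vec3& hit) {
    assert(radius > 0.0);
    const Vec3 oc{origin.x - center.x, origin.y - center.y,
                  origin.z - center.z};
    const double b = Dot(oc, dir);
    const double c = Dot(oc, oc) - radius * radius;
    const double disc = b * b - c;
    if (disc < 0.0) return false;          // ray passes beside the sphere
    const double root = std::sqrt(disc);
    double t = -b - root;                  // nearer of the two solutions
    if (t < 0.0) t = -b + root;            // origin is inside the sphere
    if (t < 0.0) return false;             // sphere is entirely behind
    hit = {origin.x + dir.x * t, origin.y + dir.y * t, origin.z + dir.z * t};
    return true;
}
```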
Since conversion of 2d screen locations to world coordinate 3d rays requires access to viewing parameters and transformations, such functionality should be a feature of the foundation 3d library. This functionality is needed to isolate manipulation code from the details of viewing parameters and associated coordinate transformations. Similar arguments apply to the interplay between the 3d axes of the triad mouse and their 2d projections. Qd3d answers these requirements with the GetXRay and UpdateTriadPosition methods. The stereo port subclasses also override these CQd3dPort methods so they function properly with stereoscopic projections. As a result, manipulation code works correctly whether it is used with a mono or a stereo projection.
Event Transition Diagrams and Languages
While the direct manipulation experience is much more rewarding to its users, it presents a greater challenge to application developers. Event languages and state transition diagrams [Gre86] are two methods for describing direct manipulation interfaces, a class of human-computer dialog, and for increasing their functionality, maintainability, and reliability.
Consider Figure 4 from [Hess92a] in light of the automatic 2d and 3d mouse mode switching.
The bubbles represent different states of the manipulation tool and the arcs represent different events that can occur. Arcs may also associate processing actions to events such as changing state information and providing user feedback. For example, the "MouseMove" arc of the neutral state could adjust the cursor shape to provide feedback on the type of element underneath the cursor. Using state diagrams is very useful to designers for communicating thoughts and considering interface alternatives. Techniques for finite state machines from theoretical computer science also allow the diagrams to be analyzed for unreachable nodes and states of no return. Finally, the diagrams provide concrete guides to interface implementors.
Event language concepts provide facilities for converting the direct manipulation state diagrams into actual code. Event language "keywords" correspond to events such as ButtonDown and MouseMoved. Event language "handlers" correspond to a single type of manipulation tool by collecting code for recognized keywords and containing tool state information. For implementation, the object oriented class construct provides a framework from which to build event language "handlers." Messages defined in a tool class define the keywords of the event language and instance variables store state information. An object instantiated from such a class is an event handler, that is, a manipulation tool. Additionally, the inheritance in object oriented subclassing facilitates the development or specialization of tools for a specific application. Finally, since tools have been created as objects, it is easy to switch between tools at runtime. This includes switching to tools that connect to physical 3d devices or, when those devices are not present, falling back on virtual 3d tools that use the 2d mouse.
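A tool class built this way might look like the following sketch. The keyword names ButtonDown and MouseMoved come from the text; ButtonUp, the two states, and the class names are assumptions, since the states of Figure 4 are not reproduced here. Movement is kept 2d for brevity, where a real tool would accumulate triad mouse movement.

```cpp
#include <cassert>

// Event language keywords become virtual messages on a base class.
class ManipulationTool {
public:
    virtual ~ManipulationTool() = default;
    virtual void ButtonDown(bool overElement) = 0;
    virtual void MouseMoved(double dx, double dy) = 0;
    virtual void ButtonUp() = 0;
};

// A translation tool: instance variables hold the state machine's state.
class TranslateTool : public ManipulationTool {
public:
    enum class State { Neutral, Dragging };

    void ButtonDown(bool overElement) override {
        assert(state_ == State::Neutral);  // no second press before release
        if (overElement) {
            state_ = State::Dragging;      // automatic mode transition
            pendingX_ = 0.0;
            pendingY_ = 0.0;
        }
    }
    void MouseMoved(double dx, double dy) override {
        if (state_ == State::Dragging) {   // accumulate movement while dragging
            pendingX_ += dx;
            pendingY_ += dy;
        }
    }
    void ButtonUp() override {
        if (state_ == State::Dragging) {   // apply the translation on release
            appliedX_ += pendingX_;
            appliedY_ += pendingY_;
            state_ = State::Neutral;
        }
    }

    State state() const { return state_; }
    double appliedX() const { return appliedX_; }
    double appliedY() const { return appliedY_; }

private:
    State state_ = State::Neutral;
    double pendingX_ = 0.0, pendingY_ = 0.0;
    double appliedX_ = 0.0, appliedY_ = 0.0;
};
```

Because the view window only holds a ManipulationTool reference, replacing the current tool at runtime is a single pointer assignment.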
This article has briefly introduced an architecture for interactive 3d applications and has discussed related implementation strategies and issues. Present personal computers such as the Macintosh possess enough power to make effective use of 3d graphics today, and the future will only see more applications making sophisticated use of three or more dimensions to increase the information bandwidth from computers to humans. With a little time, personal computer hardware developments will enable even more intensive 3d processing. For ingenious and creative developers, use of the third dimension opens a new set of opportunities today, and these opportunities will only become greater with time.
[App93] Apple Computer Inc., Macintosh Human Interface Guidelines, Addison-Wesley, 1993.
[Chen93] Chen, M.: 3-D Rotation Using a 2-D Input Device, Develop: The Apple Technical Journal, 14 pp. 40-53 (June 1993).
[Glove93] Email subscription list. Send message body "subscribe glove-list <your name>" to email@example.com.
[Gre86] Green, M.: A Survey of Three Dialogue Models, ACM Transactions on Graphics, 5(3) 244-275 (1986).
[Hess92a] Hess, J. A.: A Direct Manipulation Three Dimensional Software Visualization Tool, MS Thesis, Arizona State University, 1992.
[Hess92b] Hess, J. A.: Qd3d in Action, THINKin' CaP: The Journal of the Symantec Programming Languages Association, No. 4, pp. 66-73 (August 1992).
[Multi93] Multipoint Technology Corporation; Suite 201, 319 Littleton Road; Westford, MA 01886; (508) 692-0689; AppleLink: MULTIPOINT
[NiO86] Nielson, G. M. and Olsen, D. R. Jr.: Direct Manipulation Techniques for 3D Objects Using 2D Locator Devices, Proceedings 1986 Workshop on Interactive 3-D Graphics, Chapel Hill, pp. 259-269, 1986.
[Sims93] Simsalabim Systems, Inc.; PO Box 4446; Berkeley, CA 94704-0446; (510) 528-2021.
[Viv93] ViviStar Consulting: Qd3d & 3dPane: User Manual: Version 2, Scottsdale, AZ, 1993.