OpenGL: An API for Interactive 3D Graphics
Volume Number: 20 (2004)
Issue Number: 1
Column Tag: Programming
by David J Harr
3D for fun, profit, and world domination
This is the first in an open-ended series of articles intended to explain the basics of 3D interactive programming on the Macintosh using OpenGL. This article will serve as an introduction to OpenGL, give the history of its development, discuss the details of its architecture, and talk about Apple's implementation. Finally, we will close with a program showing some of the capabilities of OpenGL, without going too deeply into the details of the code; it is primarily designed to whet your appetite for what is to come. Future articles in the series will discuss basic topics in 3D programming, including how to construct and display objects, positioning the camera, and texture mapping, just to name a few. After covering the basics, we will go on to more advanced topics such as bump-mapping, shadow volumes, and particle systems. We will begin each article with a short discussion of the mathematics of the topic, then give a basic implementation in OpenGL. If space permits, we may also show some variations on the theme. Finally, we will close with some ideas for advanced experimentation. Hopefully, through these articles, the readers can get a taste of what 3D programming is like, and hone their skills in OpenGL.
Have you ever dreamed of writing a 3D action game? Do you crave the thrill of writing your own flight simulator? Do you live and breathe vectors, matrix math and trigonometry? Then you need to be able to program your computer to display 3D graphics. It was not too many years ago that anyone wanting to do any 3D on a Macintosh or PC was pretty much forced to write all their own routines for doing the graphics. Recently, however, there has been an explosion of interest in the field of graphics, and 3D graphics in particular. In response to this interest, hardware manufacturers such as ATI and nVidia have produced ever more powerful graphics accelerators, usually containing extensive support for hardware acceleration of common 3D graphics operations. Initially, in order to take advantage of the features of these boards, a program had to be specially modified to support each flavor of accelerator. Finally, the people programming 3D applications rebelled, and hardware independent interfaces for the most common graphics boards began appearing.
Microsoft created DirectX, which provides a system-level interface for graphics operations and takes advantage of whatever hardware acceleration is available. The problem with DirectX is that it is only available on machines running some version of Microsoft Windows, which is not terribly useful for anyone programming on a Macintosh.
Fortunately, there exists an alternative to DirectX that is almost as well supported under Windows as DirectX, and also runs under Linux, most commercial versions of Unix, and Mac OS 9 and X. This is OpenGL. OpenGL is used by many commercial games, including the upcoming Doom III, all versions of Quake, Half-Life, SpecOps, and others. It is also used in many high-end rendering and modeling packages such as Maya, 3DS Max, and Lightwave 3D. In other words, it is an industrial strength, cross-platform interface to allow programmers to write interactive 3D applications.
In 1992, Silicon Graphics (SGI), a maker of high-end graphical workstations, proposed to create an open interface to graphics hardware for use by application programmers. Their proposed Open Graphics Library, or OpenGL, was based on their proprietary IRIS GL library that was used to program their graphics workstations. Drawing on their extensive experience with IRIS GL and graphics programming in general, SGI made significant changes to the IRIS GL to make it more appropriate for use as a general-purpose graphics interface, and presented it as the OpenGL specification, version 1.0. In order to ensure that OpenGL remained an open standard, SGI surrendered control of the specification to the OpenGL Architecture Review Board, a group made up of members from many of the leading graphics vendors. The ARB approves OpenGL features and extensions, and determines how various implementations need to conform to the published standard. The current version of the OpenGL standard is version 1.4.
OpenGL came to be widely adopted; it became the standard 3D API under the X Window System and many versions of Unix, and most video card manufacturers have drivers for Microsoft Windows that support both DirectX and OpenGL. Around the time of the initial release of Mac OS X, Apple decided that they too would support OpenGL, and announced that it was the system interface for 3D rendering on the Macintosh. With the release of Jaguar (Mac OS X version 10.2), Apple extensively revamped its implementation of OpenGL, using it as the basis for many of the graphics capabilities of Core Graphics. This brought sophisticated graphical operations such as transparency, compositing, rotation, and scaling to the standard imaging model, greatly increasing the range of graphical capabilities of even the simplest applications.
OpenGL is an API that is designed to provide device-independent access to a common core of 3D graphics capabilities, while still allowing for hardware acceleration where the hardware supports it. In addition, extensions to the core capabilities can give the programmer access to hardware-specific features. OpenGL is a real-time graphics API; it is designed for interactive 3D graphics, not for off-line rendering. Therefore, OpenGL is designed to function as efficiently as possible. To this end, OpenGL operates at a lower rather than a higher level to allow for efficient implementations. Objects are defined as points, lines, and polygons, rather than as cubes, spheres, or other higher-level objects. Lighting is calculated per vertex, rather than with a sophisticated global illumination model such as ray tracing or radiosity.
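To make the idea of per-vertex lighting concrete, here is a minimal sketch in C of a single diffuse (Lambertian) term evaluated at one vertex. It is an illustration only, not OpenGL's full lighting equation, which also includes ambient, specular, and attenuation terms. The names vec3 and diffuse_term are hypothetical, and both vectors are assumed to be unit length already (OpenGL likewise expects unit-length normals unless GL_NORMALIZE is enabled).

```c
/* Hypothetical 3-component vector type used only for this sketch. */
typedef struct { float x, y, z; } vec3;

/* Dot product of two vectors. */
static float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Diffuse intensity at one vertex: max(0, N . L).
   Assumes 'normal' and 'to_light' are both unit length. */
float diffuse_term(vec3 normal, vec3 to_light)
{
    float d = dot3(normal, to_light);
    return d > 0.0f ? d : 0.0f;   /* surfaces facing away receive no light */
}
```

The renderer evaluates a term like this once per vertex and then, with smooth shading, interpolates the resulting colors across the face of the polygon.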
OpenGL has two main parts. The GL library contains the core routines for interfacing with graphics hardware and performing basic graphical operations. GL contains several hundred commands: commands for the specification of two- and three-dimensional objects, as well as commands that control how these objects are rendered into the frame buffer, or video memory. These commands are implemented as function calls to GL. A program using GL will typically open a window referencing the frame buffer where the objects are to be drawn, then make some calls to allocate and initialize a GL drawing environment, or context. Once the context is prepared, the program begins making function calls to GL in order to issue commands. There are commands to draw objects made up of geometric primitives such as points, lines, and triangles. Other commands alter the way these objects are drawn, including such things as the color of the objects, whether they are lit or not, the kind of shading they are drawn with, and the way in which the objects are transformed from their own two- or three-dimensional space onto the screen. There are also commands to read and write data directly from and to the frame buffer, for example, copying portions of the frame buffer into texture memory (an area of video memory used for storing pixel information to be used in texture-mapping operations) or overwriting the frame buffer with static pixel data. Most of the time, GL operates in immediate mode, where issuing a command causes it to be executed. It is also possible to accumulate GL commands into a display list for later execution, and send them down to the GL library all at once. This is referred to as batch mode or retained mode.
On top of GL is the GL Utility library, or GLU. It uses GL commands to allow the program to operate at a higher level of abstraction than is supported by GL. GLU provides facilities for performing some useful operations. GL is only capable of drawing convex polygons, or polygons that have no holes or indentations in them. GLU can take a non convex polygon of arbitrary complexity, and reduce it to a series of convex polygons, that can be drawn by GL. This process is known as tessellating a concave polygon. In addition, GLU can return the boundary of such a polygon as a series of line segments. GLU also allows you to specify quadrics, such as spheres, cones, and cylinders. GLU then generates the GL primitives to draw these into a display list, so they can be rendered by the application. Finally, GLU allows curves and surfaces to be represented mathematically. GLU supports several popular mathematical representations, including Bezier curves and surfaces and Nonuniform Rational B-Splines or NURBS. Similarly to quadrics, when GLU encounters curves and surfaces of these types, it converts them into a display list of GL primitives for rendering and display. Thus, using GLU, an application can work with 3D graphics at a much higher level than is possible using just the basic GL interface.
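Before handing a polygon to the GLU tessellator, a program may want to know whether tessellation is needed at all. The following is a minimal sketch in C of one common convexity test: walk the outline and check that every corner turns in the same direction. The names point2 and is_convex are hypothetical, and the test assumes a simple polygon (one whose edges do not cross each other); it is not how GLU itself works internally.

```c
#include <stddef.h>

/* Hypothetical 2D point type used only for this sketch. */
typedef struct { double x, y; } point2;

/* Returns 1 if the simple polygon p[0..n-1] is convex, 0 otherwise.
   Each corner's turn direction is the sign of the z component of the
   cross product of the two edges meeting there. */
int is_convex(const point2 *p, size_t n)
{
    int sign = 0;
    if (n < 3)
        return 0;
    for (size_t i = 0; i < n; ++i) {
        point2 a = p[i];
        point2 b = p[(i + 1) % n];
        point2 c = p[(i + 2) % n];
        double cross = (b.x - a.x) * (c.y - b.y) - (b.y - a.y) * (c.x - b.x);
        if (cross != 0.0) {
            int s = cross > 0.0 ? 1 : -1;
            if (sign == 0)
                sign = s;          /* remember the first turn direction */
            else if (s != sign)
                return 0;          /* a turn the other way: an indentation */
        }
    }
    return sign != 0;              /* all points collinear is not a polygon */
}
```

A square passes this test and can be drawn directly; an L-shaped outline fails it and would need to be tessellated first.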
Many versions of OpenGL also include a third component, the GL Utility Toolkit or GLUT. GLUT was originally written for X Windows, and has since been ported to most operating systems that have OpenGL available. GLUT provides a platform independent way of handling window and event management, freeing the programmer to concentrate on the rendering portion of the application, rather than worrying about the mundane business of tracking mouse and keyboard events, and handling window update events. Anyone planning to do a cross-platform OpenGL application should consider using GLUT to simplify interface issues. Even if you are only writing for one platform, using GLUT can considerably simplify your application. I will be using GLUT for several of the applications in this and future articles.
OpenGL is built on a client-server model. In other words, the OpenGL application (client) sends commands to the OpenGL renderer (server). These commands are then interpreted by the renderer, and the renderer modifies the frame buffer in accordance with the commands. Although all implementations of OpenGL are guaranteed to support the required features of the language, there is no guarantee of the performance or final rendering results of any given command. Increasingly, computers have dedicated graphics hardware that allows for acceleration of many of the more common graphics operations. A good implementation will take advantage of the hardware, but may have to fall back on software rendering for less common cases, which will have much lower performance than the accelerated operations. In addition, there may be mathematical operations on the data that can be accelerated at the cost of some precision in the calculations. For these and other reasons, the OpenGL standard does not dictate the implementation of operations, but rather describes the ideal behavior and specifies the range of deviation allowed by implementations. In those cases where deviation from the ideal occurs, OpenGL specifies the rules the implementation must follow to approximate the ideal behavior. Because the behavior of operations is not exactly set out, two different implementations of OpenGL with identical frame buffer configurations may not produce pixel-identical results for identical command inputs.
The results of rendering commands sent down to GL are determined by the settings of the current GL context. A GL context encapsulates all the state settings of the current drawing environment. These settings consist of values like the current lighting model, the current background color, the current texture and texture coordinates, and many others. The graphics context is all these state variables taken together. The values of these state variables are set in the current context by issuing commands to the GL libraries through function calls. Rather than having to specify a complete set of states every time a piece of geometry is to be drawn, it is possible to set up multiple contexts and switch between them. For example, let us say that an application has two contexts, one context with the settings so that everything is drawn as a wire frame, and another set up so that everything is drawn texture-mapped. If the application keeps references to both contexts, it will be possible to alternate wire frame drawing with texture-mapped drawing merely by switching contexts between objects. OpenGL commands are always processed in the order they are received, so when drawing two objects, all the drawing operations for the first object are guaranteed to complete before any of the second object's commands are executed. One of the results of this is that any queries of the internal state of the context and the pixel values of the frame buffer are guaranteed to be consistent with all the previously dispatched commands executing before any query or pixel operation returns.
Figure 1. OpenGL Command Flow Diagram
An understanding of the GL renderer operation can be helpful in comprehending the results of OpenGL commands. Figure 1 shows the data flow in the GL graphics pipeline. Commands issued by the application enter the pipeline from the left. Looking at figure 1, we can see that a command takes one of two paths, depending on whether it deals with vertex or pixel data. Vertex commands follow the upper path. First, they enter the evaluator stage of the pipeline. Here, evaluators provide the means for specifying a polynomial or rational polynomial mapping to produce vertex coordinates, normal coordinates, texture coordinates and colors. These values are then passed along to the later stages of the pipeline as if they had been provided to the pipeline directly by the client. Evaluators are the mechanism that GLU uses to create GL vertex data from Bezier and NURBS surfaces. Vertex primitives (points, line segments, and polygons) are operated on in the per-vertex operation phase. Here, vertices are transformed and lit, and primitives are clipped to the viewing volume. In the rasterization stage, the rasterizer analyzes the vertex data and produces a series of frame buffer addresses and associated values, known as fragments. Each fragment is then fed into the last stage of the pipeline, per-fragment operations. This stage performs any final operations on the fragment before it is stored as a pixel in the frame buffer. Among the operations performed are conditional updates to the frame buffer based on the state of the depth buffer, blending the fragment color value with the current pixel value in the frame buffer, subpixel sampling for antialiasing, and various arithmetic and logical operations on the pixel values in the frame buffer.
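Two of the per-fragment operations named above, the depth test and blending, can be sketched in a few lines of C. This is an illustration under stated assumptions rather than a description of any real renderer: the types fragment and pixel and the function apply_fragment are hypothetical, the depth comparison is fixed to "nearer wins" (what OpenGL calls GL_LESS), and the blend is fixed to the common source-alpha mix (GL_SRC_ALPHA with GL_ONE_MINUS_SRC_ALPHA).

```c
/* Hypothetical types for this sketch: an incoming fragment and the
   pixel already stored at the same frame buffer address. */
typedef struct { float r, g, b, a, depth; } fragment;
typedef struct { float r, g, b, depth; } pixel;

/* Applies one fragment to one pixel. Returns 1 if the pixel was
   updated, 0 if the fragment failed the depth test and was discarded. */
int apply_fragment(pixel *dst, fragment src)
{
    /* depth test: keep the fragment only if it is nearer than what
       the depth buffer already holds */
    if (src.depth >= dst->depth)
        return 0;

    /* blending: mix the fragment color with the stored color,
       weighted by the fragment's alpha */
    dst->r = src.a * src.r + (1.0f - src.a) * dst->r;
    dst->g = src.a * src.g + (1.0f - src.a) * dst->g;
    dst->b = src.a * src.b + (1.0f - src.a) * dst->b;

    /* record the new nearest depth */
    dst->depth = src.depth;
    return 1;
}
```

In this sketch a half-transparent red fragment landing on a white pixel yields pink, and a later fragment that lies farther away is simply dropped.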
Commands dealing with pixel data bypass all the geometric operations and are instead processed as pixels directly in the pixel operations stage. Results are then stored as texture memory for later use in the rasterization stage, or are rasterized. In the case of rasterization, the resulting fragments are stored to the frame buffer just as if they were generated and rasterized from geometric commands in the other command path. Once pixels have been written to the frame buffer, they can be copied back to the pixel operations stage, and then either used as textures or sent back through the pixel pipeline for further processing.
Figure 2. Screenshot of FirstGL
Putting It All Together - FirstGL
We have covered the basics of the architecture and operation of OpenGL. "How do you write a program with it?" I can hear you asking. This month's article comes with the source code for a very simple OpenGL program that opens a window and draws a cube with six colored faces rotating in space. When the mouse is clicked in the window, the rate of rotation of the cube changes. A click in the left side of the window will change it in one direction and a click in the right side of the window will change it in the other. A command-click or right mouse click will switch it between filled and wireframe drawing. We will briefly examine the structure of the program and the sorts of GL commands that it uses to do the drawing. However, an in-depth discussion of all the techniques and a detailed explanation of all the concepts will have to wait for a later day. Everything that this program does will be explained in mind-numbing detail in the coming months.
Listing 1: main() (FirstGL.c)
int main(int argc, char *argv[])
Sets up the initial GLUT environment and registers the callbacks we will be using in the program.
The only callbacks FirstGL uses are the mouse, idle and display callbacks.
// forward declarations of the callbacks
void init (void);
void idle (void);
void display (void);
void mouse (int button, int state, int x, int y);
// our main function
int main (int argc, char *argv[])
{
    // initialization for the GLUT library
    glutInit(&argc, argv);
    // We are using an RGB, double buffered window, with a z-buffer
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
    // top left corner of the window (placeholder coordinates)
    glutInitWindowPosition(100, 100);
    // Make a WINDOW_WIDTH X WINDOW_HEIGHT window
    glutInitWindowSize(WINDOW_WIDTH, WINDOW_HEIGHT);
    // The string specifies the title of the window
    g_window = glutCreateWindow("FirstGL");
    // this is the function where we do our initial OpenGL setup. Note
    // that it is called AFTER the window is created. OpenGL will only
    // function when the OS has already set up the windowing environment.
    init();
    // The function calls below all install event callbacks into GLUT.
    // They specify the functions that GLUT needs to call whenever events
    // of a certain type happen. We will only use the idle, display, and
    // mouse click functions, although there are also provisions for mouse
    // moved, keyboard, window resizing and other callbacks as well.
    // the idle function updates the rotation of the cube, and forces a redraw
    glutIdleFunc(idle);
    // This function does the heavy lifting for the drawing in the window.
    glutDisplayFunc(display);
    // when the mouse is clicked in the window, the speed of the rotation changes
    // on command/right-clicked it switches to and from wireframe
    glutMouseFunc(mouse);
    // This function will never return. Whenever it encounters an event, it
    // calls the appropriate callback. It will also quit the application.
    glutMainLoop();
    return 0;
}
In order to simplify the programming, FirstGL uses the GLUT library for all window and event handling. So, the first thing to do is to look at how FirstGL's program code interacts with GLUT. In the main function, FirstGL makes a series of calls to functions of the form glutXXXXXX. Unsurprisingly, these are GLUT library functions. First, glutInit is called with the command line arguments that were received by main. Then, FirstGL calls glutInitDisplayMode with the arguments GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH. GLUT_DOUBLE tells GLUT to make the context double-buffered. In other words, FirstGL will be drawing into one buffer while the other is being displayed. This is a way to reduce flicker while doing animation. GLUT_RGB tells GLUT to create a window using RGB coloring instead of indexed colors, so there are no palettes to worry about. GLUT_DEPTH instructs GLUT to allocate a depth buffer (or z-buffer), which is used to draw overlapping objects correctly sorted by depth.
The next three calls to GLUT, glutInitWindowPosition, glutInitWindowSize, and glutCreateWindow are reasonably self-explanatory. The next function main calls is the first of FirstGL's own functions, the init function. This is where FirstGL sets up the OpenGL environment. We will examine that code a bit later. The next three calls, to glutIdleFunc, glutDisplayFunc, and glutMouseFunc, register callbacks for idle events, display events, and mouse click events, respectively. Finally, FirstGL calls glutMainLoop, which is GLUT's main event handler. From this point on, the code runs completely inside glutMainLoop, returning only when the application quits.
From within glutMainLoop, GLUT makes calls to the callback functions that have been registered with it. Each time through the main loop, the idle function is called, then GLUT polls an internal event queue. For every type of event, if the application has registered a callback for that event type, GLUT calls the callback routine. FirstGL registers callbacks for mouse-clicks, window updates, and idle events. Other possible events that an application can register callbacks for are mouse-moved, keyboard, window resizing, and window moving, as well as other, less common types of events.
The way that FirstGL works is as follows. In the idle callback, the rotation of the cube is modified by the amount stored in rot_change. Then, the idle callback calls glutPostRedisplay, which forces a redraw of the window. If the user clicks in the window, a click in the right half of the window increases rot_change, and a click in the left half of the window decreases it. When rot_change becomes negative, the direction of rotation reverses. Finally, when the idle function calls glutPostRedisplay, the display function is called, and that is where all the actual drawing commands for the cube are issued.
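The logic described above can be sketched independently of GLUT. The following C fragment is not the FirstGL source; it is a hypothetical reconstruction of the behavior just described, with the state gathered into a struct so it can be exercised on its own. The names spin_state, on_idle, on_click, and the step size ROT_STEP are assumptions; only rot and rot_change come from the article. In a real GLUT program, functions like these would be called from the callbacks registered with glutIdleFunc and glutMouseFunc, and the idle callback would then call glutPostRedisplay.

```c
/* Assumed amount by which one click changes the rotation rate. */
#define ROT_STEP 0.5f

/* Hypothetical container for the two values the article names. */
typedef struct {
    float rot;         /* current rotation of the cube, in degrees */
    float rot_change;  /* amount added to rot on every idle event  */
} spin_state;

/* Idle event: advance the rotation by the current rate. */
void on_idle(spin_state *s)
{
    s->rot += s->rot_change;
}

/* Mouse click at horizontal position x in a window window_width
   pixels wide: the right half speeds up, the left half slows down.
   A negative rot_change reverses the direction of rotation. */
void on_click(spin_state *s, int x, int window_width)
{
    if (x >= window_width / 2)
        s->rot_change += ROT_STEP;
    else
        s->rot_change -= ROT_STEP;
}
```

Keeping the state update separate from the drawing code mirrors the split between the idle and display callbacks: one changes the numbers, the other only reads them.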
Listing 2: init() (FirstGL.c)
Sets the OpenGL environment variables to the desired initial state.
// this is the initialization function. Here, we set all the
// initial parameters for the OpenGL drawing environment.
void init (void)
{
    // enable depth buffer, so that drawing is depth sorted correctly
    glEnable(GL_DEPTH_TEST);
    // Allow a more realistic shading model
    glShadeModel(GL_SMOOTH);
    // all vertices for faces go in counter-clockwise order
    glFrontFace(GL_CCW);
    // set the background color to white
    glClearColor(1.0, 1.0, 1.0, 1.0);
    // turn on lighting in OpenGL
    glEnable(GL_LIGHTING);
    // turn on light 0 (out of 8)
    glEnable(GL_LIGHT0);
}
There are two places where actual OpenGL calls are made. One is in the init function, and the other is in the display function. Let's look at init first. init() is composed entirely of commands that modify the settings of some of OpenGL's state variables. In other words, these calls put the OpenGL context into the state the application needs for the drawing to look the way it expects. The first call is to glEnable(GL_DEPTH_TEST). The function glEnable, together with its counterpart glDisable, allows an application to turn a wide variety of OpenGL features on and off. In this case, the parameter GL_DEPTH_TEST instructs OpenGL to start using the depth buffer that we told GLUT to allocate in our call to glutInitDisplayMode back in main. The call to glShadeModel(GL_SMOOTH) informs GL that FirstGL wants smooth shading, in which the colors computed at the vertices are interpolated across each polygon face rather than the whole face being drawn in a single flat color. glFrontFace(GL_CCW) tells the renderer to expect all geometry to be constructed such that the front of a polygon is the side from which its vertices appear in counter-clockwise order. glClearColor sets the color used to erase the window, in this case white. Finally, init has two more calls to glEnable. In the first one, it activates lighting in the scene. In the second one, it activates one of the eight OpenGL lights. If init didn't activate a light, the cube would be drawn as a black solid, even though lighting was enabled in the scene.
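One way to check that a face really is specified in counter-clockwise order is to compute the direction its vertex order implies and confirm that it points out of the object. The sketch below does this in C using the cross product of two consecutive edges; the names vec3 and winding_normal are hypothetical, and this is a check a programmer might run on their own data, not something OpenGL requires. For the red face of the cube in Listing 3, whose first three vertices are (1, -1, 1), (1, -1, -1), and (1, 1, -1), the result points along the positive x-axis, away from the center of the cube.

```c
/* Hypothetical 3-component vector type used only for this sketch. */
typedef struct { float x, y, z; } vec3;

/* Returns the (unnormalized) direction implied by visiting a, b, c in
   that order: the cross product of edge a->b with edge b->c. Seen from
   the side this vector points toward, the vertices run
   counter-clockwise. */
vec3 winding_normal(vec3 a, vec3 b, vec3 c)
{
    vec3 u = { b.x - a.x, b.y - a.y, b.z - a.z };
    vec3 v = { c.x - b.x, c.y - b.y, c.z - b.z };
    vec3 n = {
        u.y * v.z - u.z * v.y,
        u.z * v.x - u.x * v.z,
        u.x * v.y - u.y * v.x
    };
    return n;
}
```

If the result pointed toward the center of the cube instead, the face would be wound clockwise as seen from outside, and OpenGL would treat the visible side as the back of the polygon.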
Listing 3: display() (FirstGL.c)
Called whenever there is a redraw event for GLUT. This is where all the actual
drawing in the program takes place
// this is the callback for window drawing -- the meat of the program
// lives here. Everything else is just window dressing, pardon the pun.
void display (void)
{
    <Camera and window setup omitted for clarity>
    // start transforming the model space
    glMatrixMode(GL_MODELVIEW);
    // initialize the matrix stack to the identity matrix
    glLoadIdentity();
    // set the rotation for the cube, values set in the idle function;
    // rotate by rot degrees about the (1, 1, 1) diagonal
    glRotatef(rot, 1.0f, 1.0f, 1.0f);
    // cant the cube on its end, 45 degrees around the x-axis
    glRotatef(45.0f, 1.0f, 0.0f, 0.0f);
    // from here, we are specifying the vertices of the faces of the cube.
    // as we stated in the init function, the vertices are in CCW order.
    // the normals are the same as the vertices, since the center of the
    // cube is at the origin.
    // Each face is wrapped in a glBegin/glEnd pair.
    glBegin(GL_QUADS);
    glColor3f(1.0f, 0.0f, 0.0f); // red face
    // each normal is set before the vertex it applies to
    glNormal3f(1.0f, -1.0f, 1.0f);
    glVertex3f(1.0f, -1.0f, 1.0f);
    glNormal3f(1.0f, -1.0f, -1.0f);
    glVertex3f(1.0f, -1.0f, -1.0f);
    glNormal3f(1.0f, 1.0f, -1.0f);
    glVertex3f(1.0f, 1.0f, -1.0f);
    glNormal3f(1.0f, 1.0f, 1.0f);
    glVertex3f(1.0f, 1.0f, 1.0f);
    glEnd();
    <Repeat five times, one for each face of the cube>
    // start displaying the buffer we have just finished drawing into.
    glutSwapBuffers();
}
Now, let us look at the heart of the program, display. At the top of the function is a bit of bookkeeping to get the correct camera position and transformation matrices in place. Again, all of these commands modify the current settings of the GL context; none of them actually causes any drawing to take place. Finally comes a call to glBegin(GL_QUADS). This is where drawing starts taking place. Most drawing commands in OpenGL are contained in a glBegin/glEnd block. glBegin(GL_QUADS) informs OpenGL that a series of vertices, and possibly other information, is coming down, and that each group of four vertices makes up a quad, or four-sided polygon. After the glBegin, display sets the color for the polygon. Then display specifies the coordinates of each vertex together with the direction of its normal, which is used in the lighting calculations. Since the center of the cube is located at the origin, the direction of each normal is simply the coordinates of the associated vertex. display specifies six faces for the cube. OpenGL applies the transformations that were set out at the beginning of display, draws the transformed polygons, and colors them; display then swaps the buffers so that the finished drawing is displayed in the window. And that is essentially how a program uses OpenGL. The drawing environment is controlled through commands that set the values of the various state variables of the context. The appropriate transformations are sent down, then geometry is specified, and finally, the finished drawing is shown.
Acknowledgements and Further Reading
Many thanks to my wife, Carol, who suffered through several drafts of this article, each seemingly more impenetrable than the last, but who finally got me to write it in such a way that she could follow my meandering exposition. Thanks also to Paul Snively, who provided important feedback on the direction I was going with the article, and also proofread it, pointing out several errors. He also did the OCaml version of FirstGL. Also not to be forgotten is Chris Page, who looked over the article, and translated FirstGL to Dylan. Finally, thanks to Marshall Clow, who first gave me the idea of writing these articles, and then kindly led me through the process of getting it in to the folks at MacTech.
You may notice that I mention OCaml and Dylan versions of FirstGL. Undoubtedly, you are wondering what I am talking about. OCaml and Dylan are both programming languages that take a very different approach from the C/C++ family of languages. It is almost certain that all of us would be a lot more productive and write more correct code faster if we were using these languages to develop in. In an attempt to get people to look at some interesting alternative languages, I am going to try to provide versions of all the applications for this series in C/C++, OCaml and Dylan. To find out more about OCaml, you can go to http://www.ocaml.org. You can learn more than you ever wanted to know about Dylan at http://www.gwydiondylan.org. I encourage you to check both languages out, if for no other reason than to become acquainted with a different programming worldview.
Anyone who is serious about programming OpenGL cannot live without the OpenGL Programming Guide or the OpenGL Reference Manual. My discussion of the architecture and operation of OpenGL was heavily influenced by both of these, and the illustration of the OpenGL command flow diagram in figure 1 was inspired by an illustration in the Programming Guide. Both of these books have a wealth of information in them regarding just about every facet of OpenGL you could care to name, although it is not always completely obvious where things are. There are also numerous web pages and various discussion groups on the web for OpenGL programming. A good place to start would be the official OpenGL web site: http://www.opengl.org.
OpenGL Architecture Review Board, et al. OpenGL Programming Guide, The Official Guide to Learning OpenGL, Version 1.2. 3rd edition. Addison-Wesley, Reading, Massachusetts, 1999.
OpenGL Architecture Review Board, Dave Shreiner, ed. OpenGL Reference Manual, The Official Reference Document to OpenGL, Version 1.2. 3rd edition. Addison-Wesley, Reading, Massachusetts, 2000.
David Harr (and the voices in his head) has been programming Macs and things not Macs more years than he cares to count. As punishment for his many sins, he has been forced to program 3D graphics over and over until he got it mostly right. Currently, he is taking a sabbatical from programming professionally and is enjoying the quaint customs of the natives of Academia. He can be reached at firstname.lastname@example.org.