January 93 - Persistent Objects and Object-Oriented Databases for C++
Dirk Bartels and Jonathan Robie
Reprinted with permission from C++ Report September 1992. Copyright© 1992 SIGS Publications, 588 Broadway, NY, NY; 212/274-0640; 212/274-0646 (fax).
Many programmers first learn about object-oriented programming by buying a C++ compiler and a GUI class library. These libraries are generally well structured, support true object-oriented design and offer a simple, powerful programming interface. This is often enough to convince programmers to use true object-oriented designs. Unfortunately, many of these programmers find themselves tearing their designs apart when they integrate their programs with conventional database systems, which provide almost no support for objects or for expressing the relationships among objects.
Conventional databases are good at managing large amounts of data, sharing data among programs, and fast value-based queries. They are not very good at modeling the relationships among data, however, since everything must be represented as a series of two-dimensional tables.
Object-oriented database systems (OODBS) are a relatively new tool for software developers. Unlike relational and table-oriented systems, they provide full support for the object-oriented programming model used in languages like C++ and Smalltalk. This model is intuitive, good at modeling relationships, and very suitable for large software projects.
An object-oriented database combines the semantics of an object-oriented programming language with the data management and query facilities of a conventional database system. This makes it easy to manage large amounts of data and to model the relationships among the data. If an object-oriented database is integrated with an object-oriented language, it should support the semantics of that language; relationships established in the program should automatically be represented in the database when objects are stored.
This article focuses on the advantages of object-oriented databases over conventional table-oriented and relational databases and the integration of an OODBS into C++. For small applications these advantages mean that your program will be less complex and easier to understand. For large or complex applications these advantages may mean the difference between success and failure.
LIMITS OF CONVENTIONAL DATABASE SYSTEMS
Database systems are designed for managing large amounts of data, and they provide many important features that object-oriented programming languages do not: permanent storage, fast queries, sharing of objects among programs, device-independent formats, and sophisticated error handling for database operations.
Relational database systems (RDBS) and table-oriented systems based on B-Tree or Indexed Sequential Access Method (ISAM) are the standard systems currently used in most software development. Each requires that all data be portrayed as a series of two-dimensional tables. The relational model declares the structures, operations, and design principles to be used with these tables.
These systems are quite appropriate for some applications and were a real breakthrough in their time, but software developers are rapidly learning that life is not a series of two-dimensional tables. The growing complexity of modern programs and the increasing use of dynamic data models have pushed traditional databases to their limit. The limited data models they support can result in significant software development costs since they do not allow program designs that closely match the problem domain. They are not even worth considering for some application areas like computer-aided design (CAD), computer-aided engineering (CAE), multimedia, and office automation.
Limited Data Types
Conventional databases have a fixed set of data types. The better systems include both simple data types like INTEGER, FLOAT, or CHAR and complex data types like DATE, TIME, or CURRENCY. New data types cannot be added by the user. If your database does not have the data type you need, then you are stuck. Aggregate data types like arrays are rarely available. The only way to group data is to put it in a table.
Modern software systems often contain data types that are not easily modeled using such predeclared types. For example, a CAD program might have an array of shapes, or a desktop publishing program might model a page as a series of frames which may contain bitmaps, paragraphs, or vector drawings. We have already seen that object-oriented programs allow us to declare new data types as needed.
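As a rough illustration of the desktop publishing example, the sketch below declares new data types in C++ and groups them without any table. The names Frame, Bitmap, Paragraph, Page, and makeSamplePage are hypothetical and chosen only for this example, and the code assumes a present-day (C++14) compiler.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base type for anything that can be placed on a page.
struct Frame {
    virtual ~Frame() = default;
    virtual std::string kind() const = 0;
};

// A user-defined type that no fixed column type such as INTEGER or CHAR can express.
struct Bitmap : Frame {
    int width;
    int height;
    Bitmap(int w, int h) : width(w), height(h) {
        assert(w > 0 && h > 0);  // assumed invariant for this sketch
    }
    std::string kind() const override { return "bitmap"; }
};

struct Paragraph : Frame {
    std::string text;
    explicit Paragraph(std::string t) : text(std::move(t)) {}
    std::string kind() const override { return "paragraph"; }
};

// A page is an aggregate: an ordered series of frames of mixed types.
struct Page {
    std::vector<std::unique_ptr<Frame>> frames;
};

// Builds a small page holding one bitmap followed by one paragraph.
Page makeSamplePage() {
    Page page;
    page.frames.push_back(std::make_unique<Bitmap>(640, 480));
    page.frames.push_back(std::make_unique<Paragraph>("Hello"));
    return page;
}
```

Storing such a page in a conventional database would mean flattening the frames into rows of predeclared column types, which is the restriction described above.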
Limited Modeling of Data Relationships
In conventional database systems, each item is represented as a row in a table. Tables may be accessed sequentially or by searching for values. The only way to express relationships among items is by setting values in the rows. In each table one or more columns are chosen as the primary key; this key must be unique for each row in the table. For instance, the primary keys for a student, a teacher, and a class might each be represented as identification numbers.
The relational model is weak when showing many-to-many relationships, which generally require the introduction of a new table. In our example, a student may take many classes and a class has many students, so the only way to show which students are taking a class is to create an "enrollment" table which has a row for each student enrolled in each class and contains the student identification number and class identification number in each row.
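The contrast can be sketched in C++. The first half mimics the enrollment table as rows of identification numbers that must be searched; the second half expresses the same relationship directly as references between objects. All names here (EnrollmentRow, studentsInClass, Student, Course, enroll) are hypothetical, and the rule that a student enrolls in a course at most once is an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Table style: the relationship exists only as matching values in rows.
struct EnrollmentRow {
    int studentId;
    int classId;
};

// Finds the students in a class by scanning the enrollment table.
std::vector<int> studentsInClass(const std::vector<EnrollmentRow>& table,
                                 int classId) {
    std::vector<int> ids;
    for (const EnrollmentRow& row : table) {
        if (row.classId == classId) ids.push_back(row.studentId);
    }
    return ids;
}

// Object style: each object refers directly to the objects it is related to.
struct Course;

struct Student {
    std::string name;
    std::vector<Course*> courses;
};

struct Course {
    std::string title;
    std::vector<Student*> students;
};

// Records the relationship on both sides; no separate table is needed.
void enroll(Student& s, Course& c) {
    // Assumed rule for this sketch: a student enrolls in a course at most once.
    assert(std::find(c.students.begin(), c.students.end(), &s) ==
           c.students.end());
    s.courses.push_back(&c);
    c.students.push_back(&s);
}
```

In the object version, finding the students of a course is a matter of following the pointers in `students` rather than searching a third table.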
Since relational databases have no concept of hierarchy, it is difficult to model the IS-A relationship. Suppose we have a "people" table, a "students" table, and a "teachers" table. Every student is also a person, so a student's fields are split between the "people" table and the "students" table. To update all of a student's information you must find the row in each table whose identification number matches. Every level in the hierarchy requires a new table, and every program using the database must update every relevant table appropriately. The hierarchy is not explicitly represented in the database; you simply have to know why the various tables are there.
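For comparison, here is a minimal sketch of the same people, students, and teachers hierarchy written in C++, where the IS-A relationship is stated explicitly through inheritance. The class members and the describe function are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Person {
public:
    Person(int id, std::string name) : id_(id), name_(std::move(name)) {
        assert(id > 0);  // assumed: identification numbers are positive
    }
    virtual ~Person() = default;

    int id() const { return id_; }
    const std::string& name() const { return name_; }
    void rename(const std::string& name) { name_ = name; }
    virtual std::string role() const { return "person"; }

private:
    int id_;
    std::string name_;
};

// A Student IS-A Person: the hierarchy is part of the declaration.
class Student : public Person {
public:
    Student(int id, std::string name, int year)
        : Person(id, std::move(name)), year_(year) {}
    int year() const { return year_; }
    std::string role() const override { return "student"; }

private:
    int year_;
};

class Teacher : public Person {
public:
    Teacher(int id, std::string name, std::string department)
        : Person(id, std::move(name)), department_(std::move(department)) {}
    const std::string& department() const { return department_; }
    std::string role() const override { return "teacher"; }

private:
    std::string department_;
};

// Works for any Person, including students and teachers.
std::string describe(const Person& p) {
    return p.name() + " (" + p.role() + ")";
}
```

Updating a student's name touches one object; there is no second row to locate by identification number.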
No Way of Grouping Code With Data
We have already seen that object-oriented programming languages allow related code and data to be combined to form objects. There is no way to do this in a conventional database system. If you know the name of a table you may use it, and the system will not prevent you from changing the wrong table. As long as you have the right password everything in the database is globally accessible to all of your code.
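A small sketch of what grouping code with data means in C++: the class list below can be changed only through the member function that enforces its rules. The Course class, its capacity limit, and the no-duplicates rule are assumptions invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class Course {
public:
    Course(std::string title, std::size_t capacity)
        : title_(std::move(title)), capacity_(capacity) {
        assert(capacity > 0);  // assumed: a course has room for someone
    }

    // The only way to change the class list. Refuses duplicates and
    // refuses to exceed the capacity.
    bool enroll(int studentId) {
        if (students_.size() >= capacity_) return false;
        for (int id : students_) {
            if (id == studentId) return false;
        }
        students_.push_back(studentId);
        return true;
    }

    std::size_t enrolled() const { return students_.size(); }
    const std::string& title() const { return title_; }

private:
    // Private data: other code cannot modify the list directly.
    std::string title_;
    std::size_t capacity_;
    std::vector<int> students_;
};
```

With a table, any code that knows the table name could insert a duplicate or an extra row; here the compiler rejects code that tries to bypass `enroll`.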
Limited Manipulation of Data
Database languages are often very poor at manipulating data. SQL, for instance, does not allow you to perform computations on your data as input to a query, nor does it allow you to perform computations on the result of a query. A computer language designer would say that SQL is not computationally complete even though it is relationally complete; a normal human being might say that SQL is great for searching but lousy for anything else. Because of this, most serious applications are written in conventional programming languages using some kind of SQL-based interface to the database.
Since the database and the host programming language use two different models and different data types, the programmer must either perform all operations directly in the database or constantly convert between the two systems. The first method does not let the programmer use many features of the host language; the second means a great deal of overhead and frustration since the relationships among data must be constantly converted to support both programming models. Such a program has two distinct designs, one for the program itself and one for the database.
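The conversion work can be sketched as follows. The row structs stand in for what a table-oriented interface might hand back, and the two functions are the glue a programmer has to write and maintain for every type. PersonRow, StudentRow, Student, fromRows, and toRows are all hypothetical names; no particular database interface is implied.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Database design: one struct per table, linked only by matching ids.
struct PersonRow {
    int id;
    std::string name;
};

struct StudentRow {
    int id;
    int year;
};

// Program design: one object holding everything about a student.
struct Student {
    int id;
    std::string name;
    int year;
};

// Loading: combine the rows that share an identification number.
Student fromRows(const PersonRow& person, const StudentRow& student) {
    assert(person.id == student.id);  // the rows must describe the same person
    return Student{person.id, person.name, student.year};
}

// Storing: dissect the object back into one row per table.
std::pair<PersonRow, StudentRow> toRows(const Student& s) {
    return {PersonRow{s.id, s.name}, StudentRow{s.id, s.year}};
}
```

Every new field or class in the program design requires a matching change in both functions and in the table layout, which is the duplicated design effort described above.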
To store data in a conventional database, it must be dissected into a series of two-dimensional tables. Only predeclared data types are supported. Object-oriented programming languages have a rich set of features for creating data types and representing the relationships among data that are not supported in such databases. In the rest of this article, we discuss features that an object-oriented database must support. To illustrate these features, we examine POET, a commercial object-oriented database system with which we are connected.
The original implementation of Smalltalk had a simple method for storing objects: the program's entire memory image could be dumped to disk and restored when running the program later. This scheme has some real advantages. It is very simple to implement, requires almost no effort from the programmer, and fully implements all aspects of the programming language (after all, the program sticks everything in memory somewhere!). It also has some real disadvantages. The number of objects that can be stored depends on the amount of main memory available, the programming context must be stored and retrieved as a whole, objects may not be shared among programs or retrieved on another kind of computer, and there is no way to implement intelligent error recovery.
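To make the trade-off concrete, here is a toy C++ analogue of the whole-image approach. It is not how Smalltalk implemented its image; it only assumes a fixed-size block of memory that is copied out and back as raw bytes. The Image layout, kMaxPoints, dumpImage, and restoreImage are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

const std::size_t kMaxPoints = 16;  // capacity is fixed by the memory set aside

struct Point {
    int x;
    int y;
};

// The entire program state lives in one block of memory.
struct Image {
    std::size_t count;
    Point points[kMaxPoints];
};

static_assert(std::is_trivially_copyable<Image>::value,
              "raw byte copies are valid only for trivially copyable types");

// Dumps the whole image at once; a single object cannot be saved alone.
std::string dumpImage(const Image& image) {
    assert(image.count <= kMaxPoints);
    return std::string(reinterpret_cast<const char*>(&image), sizeof image);
}

// Restores the whole image. The bytes are meaningful only to a program
// built with the same layout, byte order, and integer sizes.
bool restoreImage(const std::string& bytes, Image& image) {
    if (bytes.size() != sizeof image) return false;
    std::memcpy(&image, bytes.data(), sizeof image);
    return image.count <= kMaxPoints;
}
```

The sketch shows the same strengths and weaknesses in miniature: saving takes one line and no effort per type, but the capacity is bounded by memory, everything is stored and retrieved as a whole, and the byte format is tied to one kind of machine.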