Column Tag: Version Control
Keeping Things Straight, Orthogonally
Do do that VOODOO
By Christoph Reichenberger
Organizing Variants and Revisions
The importance of software version management increases with the size of a software project. The implementation of large projects produces program families (programs existing in many different variants and revisions). Therefore, all components of a project (source texts, documentation, pictures, etc.) must be stored in many different versions, and have to be retrievable at any time.
A drawback of many version control tools (SCCS and RCS, as well as most of their derivatives like DSEE, ClearCase, Projector/SourceServer, etc.) is the failure to distinguish variants and revisions. This is called intermixed version organization. Here, variants and revisions of components are organized together in version trees. Variants are generated by branching off new revisions from certain points and developing them in parallel. The component forms the center of attention, and the different versions of the components are managed by maintaining a single version tree for each component. This results in two main shortcomings:
Revisions and variants are identified by means of a multi-digit number. When a variant is split off, two more digits are appended to the version number. Especially in projects with a couple of variants, this numbering scheme becomes unwieldy.
The tree structure represents the chronological order in which variants were branched off. As the components' development histories will in general differ, distinct tree structures arise within the project, all representing the same variant information. The user of the version control tool must know each of these structures in order to retrieve the specific variants and revisions of the components to be worked with. Moreover, these trees soon become difficult to survey as the number of variants of the project increases.
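The growth of version numbers under this scheme can be modeled in a few lines. The following Python sketch is a simplified illustration of the numbering described above, not the exact rules of SCCS or RCS; the function names `next_revision` and `branch_off` are hypothetical.

```python
def next_revision(version: str) -> str:
    """Return the successor on the same line of development (last number + 1)."""
    parts = version.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


def branch_off(version: str, branch_no: int = 1) -> str:
    """Start a variant at `version` by appending two numbers: branch and sequence."""
    return f"{version}.{branch_no}.1"


trunk = "2.1"
variant = branch_off(trunk)                          # a variant split off from the trunk
nested = branch_off(next_revision(variant))          # a variant of that variant
print(trunk, variant, nested)
```

Each call to `branch_off` lengthens the identifier by two numbers, so the depth of branching, not the meaning of the variant, determines what a version is called.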
Let's take an example. Suppose there is a project consisting of two components A and B. At the beginning, this project is implemented in Modula-2 on a Macintosh computer. Some revisions of A and B arise. Figure 1 shows the version trees for A and B that would evolve when managing this project with SCCS at instant 0.
Figure 1. The project's version trees at instant 0
Next we need to implement a variant of the project in Pascal and branch off a new variant of the component A for the Pascal implementation, where the programmers expect that component B will not differ for the Pascal and Modula-2 implementations. Next we branch off a variant of component B for a Sun workstation, where (conversely) component A is believed identical for Macintosh and Sun (instant 1).
At instant 1, the project exists in three different implementations:
1. in Modula for Macintosh, consisting of A 2.1 and B 2.1
2. in Pascal for Macintosh, consisting of A 220.127.116.11 and B 2.1
3. in Modula for Sun, consisting of A 2.1 and B 18.104.22.168
This information, however, cannot be inferred from the version trees in Figure 2.
Figure 2. The project's version trees at instant 1
After development of some revisions of the different variants it turns out that component A must be implemented differently on both machines, and component B has to be developed in two variants according to the programming languages (instant 2). The resulting version trees are presented in Figure 3.
Figure 3. The project's version trees at instant 2
This small example shows that the chronological order of branching is reflected in the naming of the different variants of the project's components. The shapes of the trees of components A and B differ, although both have the same variant structure. As the tree structure is used for naming software objects, the user must know the different structures in order to retrieve a certain revision of a given variant. A fixed revision i of the variant Sun/Modula of component A is referred to by 2.i, whereas the same version of component B is called 22.214.171.124.1.i. Moreover, at each new branch two digits have to be appended. If there are four distinctive marks within a project, the software objects in SCCS-like systems have revision numbers like 126.96.36.199.188.8.131.52.3.4, bordering on incomprehensibility.
These drawbacks can be avoided by using orthogonal organization of variants and revisions. Two main characteristics distinguish orthogonal from intermixed version organization.
Instead of managing versions of atomic objects only, orthogonal version organization also deals with variants and revisions of the whole software project. In fact, the management of versions of the whole software project is the basic idea of orthogonal version management.
Variants and revisions span the whole project and are considered to be orthogonal to each other.
Using the orthogonal organization model, a software project consists of a set of objects which we call the object pool, and a set of project structure trees. The following sections explain these terms.
The Object Pool. As mentioned above, our task is to manage a project consisting of a set of fundamental components. Each of these components may exist in different variants and revisions. We use the term object for an instance of a component, which is uniquely defined by a certain variant and a certain revision. An object could be the source text of a module within a software project, i.e., a particular implementation of a component (e.g., revision 7.3 of module XY belonging to variant A). We represent an object graphically as a small cube (Figure 4).
The object pool of a project is the collection of all these different variants and revisions of all of the project's components. We can envision the object pool as a three-dimensional space whose three dimensions are component, variant, and revision, as shown in Figure 5.
Figure 4. Object
Figure 5. Object pool
The object pool can be projected in various ways. For instance, by cutting off a vertical slice, we get a project revision (Figure 6), comprising all variants of all components of the project at a given time. Restricting a project revision to a particular variant gives a component group (Figure 7). Since the component group consists of all objects for a specific implementation of the project at a given time, it may constitute the basis for automatic build management.
Figure 6. Project revision
Figure 7. Component group
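The object pool and its two projections can be sketched as a set of (component, variant, revision) triples. This is a minimal Python illustration under assumed names: the class `Obj`, the functions `project_revision` and `component_group`, and the sample components and variants are all hypothetical.

```python
from itertools import product
from typing import NamedTuple


class Obj(NamedTuple):
    """One cube of the object pool: a component in one variant and one revision."""
    component: str
    variant: str
    revision: int


# A tiny, fully populated pool: 2 components x 2 variants x 3 revisions.
pool = {Obj(c, v, r) for c, v, r in product(["A", "B"], ["Mac", "Sun"], [1, 2, 3])}


def project_revision(pool, revision):
    """Vertical slice: all variants of all components at one revision."""
    return {obj for obj in pool if obj.revision == revision}


def component_group(pool, revision, variant):
    """A project revision restricted to a single variant."""
    return {obj for obj in project_revision(pool, revision) if obj.variant == variant}


print(sorted(component_group(pool, 2, "Sun")))
```

A component group selected this way contains exactly one object per component, which is what a build step would need as input.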
There is also, however, a hierarchical structure connecting the components of a project. It is represented by the project structure tree, which consists of structure nodes and component nodes. Component nodes are the leaves of the project structure tree and stand for the indivisible components of the project (e.g., a module of a program system, a chapter of a manual, etc.). Note that a component node does not correspond to a physical file. It is a logical element of the project and can exist in multiple versions. Structure nodes are the inner nodes of the project structure tree. They comprise several structure and/or component nodes and serve to organize the project hierarchically. The relation between parent and child in the structure tree can be defined as contains, as in a hierarchical file system. (Note that the project structure tree has nothing to do with relations among the objects; it cannot, for example, represent an import relationship.)
Figure 8 shows a project structure tree together with the object pool. In this example, the project consists of five components (c1, c2, m1, m2 and m3). Figure 8 shows further that each horizontal slice of the object pool (called a project component) corresponds to one component node.
Figure 8. Connection between project structure tree and object pool
Revisions and Variants of the Project Structure. Since the project structure may change in the course of time, more than one project structure tree may exist within one project, each describing the structure of the project for a certain period of time. Thus, a particular project structure tree does not describe the structure of a whole project, but rather the structure of a project revision. In the extreme case, every project revision can have its own structure tree.
Figure 9 shows an object pool consisting of five project revisions together with two revisions of the structure tree. Revision 1 of the structure tree was valid for project revisions 1 and 2, when the structure node m contained only components m1 and m2. Starting with project revision 3, the structure node m additionally contains component m3.
Figure 9. Revisions of the project structure tree
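One way to picture revisions of the structure tree is a list of trees, each tagged with the first project revision for which it is valid. The Python sketch below follows the example of Figure 9; the structure node name `c` and the function `structure_for` are assumptions made for illustration.

```python
from bisect import bisect_right

# (first project revision for which the tree is valid, structure tree)
structure_revisions = [
    (1, {"c": ["c1", "c2"], "m": ["m1", "m2"]}),
    (3, {"c": ["c1", "c2"], "m": ["m1", "m2", "m3"]}),
]


def structure_for(project_revision: int) -> dict:
    """Return the structure tree that was valid for the given project revision."""
    starts = [start for start, _ in structure_revisions]
    index = bisect_right(starts, project_revision) - 1
    if index < 0:
        raise ValueError("no structure tree exists for this project revision")
    return structure_revisions[index][1]


for revision in range(1, 6):
    print(revision, structure_for(revision)["m"])
```

Storing only the revisions at which the structure changed keeps the common case cheap, where many project revisions share one tree.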
Even for a given project revision, we cannot speak of the project structure. A certain project variant may not use some of the components, so that the structures of various project variants may differ. However, we can always imagine a total project structure tree, which is a project structure tree with all single trees overlaid.
Figure 10 shows possible variants of the total project structure tree shown in Figure 8. We can see that project variant x does not use components c1 and c2. Project variant y does not use component m1, and component m3 is not used in project variant z.
Figure 10. Variants of the project structure tree
This detailed project structure tells the user exactly which components must be chosen when building a special variant of the project. (Note that there is no statement about how they are to be assembled; that is the responsibility of configuration management.)
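The variant-specific trees of Figure 10 can be derived from the total project structure tree by removing the components a variant does not use. The following Python sketch assumes a structure node named `c` and drops structure nodes that become empty; both choices, like the function name `variant_tree`, are illustrative assumptions.

```python
total_tree = {"c": ["c1", "c2"], "m": ["m1", "m2", "m3"]}

# Components each project variant does NOT use, as described for Figure 10.
unused = {"x": {"c1", "c2"}, "y": {"m1"}, "z": {"m3"}}


def variant_tree(variant: str) -> dict:
    """Return the structure tree restricted to the components a variant uses."""
    skipped = unused.get(variant, set())
    tree = {}
    for node, components in total_tree.items():
        kept = [component for component in components if component not in skipped]
        if kept:  # assumption: a structure node without children is not shown
            tree[node] = kept
    return tree


for name in ("x", "y", "z"):
    print(name, variant_tree(name))
```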
In every software version control tool, delta storage is used to store all the versions space-efficiently. Only one version is stored in full; the others are stored as delta scripts. A delta script (or delta) is a sequence of edit commands transforming one version of a document into another. One of the first version control tools, SCCS, was designed to store different versions of files in a UNIX environment. All source documents in this environment were text documents with an inherent line structure. The delta algorithm made use of this structure information and regarded lines as atomic elements of the file. The same holds for most of the other even more sophisticated version control tools like RCS and DSEE.
But today's software projects do not consist exclusively of text files. Programming environments may store source programs in the form of an intermediate language; the documentation may be written using a word processor which stores data in a special file format; different versions of drawings have to be stored. A particularly nasty example is files on the MacOS, which consist of a data fork and a resource fork.
Thus, modern version control systems must make no assumptions about the structure of the files to be stored, but have to supply delta storage for arbitrary files. This means that they must be able to generate delta scripts between any two byte streams.
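To make the idea of a delta script between two byte streams concrete, here is a minimal Python sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in matcher and is not the algorithm discussed in this article; the command names `copy` and `add` and the functions `make_delta` and `apply_delta` are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Return edit commands that rebuild `new` from `old`."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2 - i1))   # reuse bytes of the old version
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))     # literal bytes found only in new
        # "delete" needs no command: those old bytes are simply never copied
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Reconstruct the new version from the old version and a delta script."""
    out = bytearray()
    for command in delta:
        if command[0] == "copy":
            _, start, length = command
            out += old[start:start + length]
        else:
            out += command[1]
    return bytes(out)


old = b"\x00\x01HEADER\xff" + b"payload " * 8
new = b"\x00\x02HEADER\xff" + b"payload " * 7 + b"changed!"
assert apply_delta(old, make_delta(old, new)) == new
```

Because the matcher treats the input as a plain sequence of bytes, no line structure is assumed. The quadratic worst case of this stand-in makes it suitable only for small inputs.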
In [5] we introduced an algorithm for generating deltas between arbitrary files. Besides being applicable to arbitrary files, it produces smaller deltas, and calculates them faster, than other algorithms [1, 2, 9].
We generated deltas between a large number of files and compared the results with those generated by SCCS (on a SUN SparcStation 2). For non-text files we used two versions each of about 30 files of different types. The total size of the 30 files was about 4 MB. Half of the files could not be managed by SCCS. Table 1 shows the results of the delta generation between the remaining files.
Table 1. Delta generation between non-text files
The test suite for text files consisted of two versions each of about 300 plain text files. The total size of the 300 files was about 20 MB. Table 2 summarizes the results.
Table 2. Delta generation between text files
VOODOO implements the above techniques, and is usable not only for the organization of software development projects in a narrow sense (program development), but also for CAD, technical documentation, desktop publishing, etc. Even the writing of a book, for example, is a project in which multiple elementary building blocks (the individual chapters, illustrations, etc.) evolve in various revisions.
VOODOO models the structure of the software project as a project tree, an extension of the structure tree. The project tree represents the logical associations of the individual components and gives insights into their association with project variants. Figure 11 shows the complete project tree of a small sample project.
Figure 11. Project tree consisting of four kinds of nodes
Let us examine the meaning of the individual kinds of nodes:
Structure node
Structure nodes provide the logical structure of the project. Each structure node can contain structure nodes and component nodes.
Component node
Component nodes represent the elementary building blocks of a project (e.g., a module of a program system, a chapter of the documentation, etc.). A component node does not represent a physical file, however. It represents a project component that can exist in multiple variants and revisions. Each component node is the child of a structure node and can contain any number of version group nodes.
Version group node
A version group node represents a certain version of a component. Each version group node is the child of a component node and can contain any number of variant nodes.
Variant node
A variant node always carries the name of a project variant. Variant nodes identify in which project variants a certain version group is used. If a version group node contains a variant node x, this means that this version group is used in the project variant x. Each variant node is the child of a version group node and cannot have children itself.
In Figure 11, the example composite project (a compiler) consists of two parts, implementation and documentation. The implementation is subdivided into lexical analysis and syntax analysis (including code generation), of which each part again consists of two components. The further structure of the documentation is not yet defined.
The project is being developed in two variants, one with optimization and one without. The component Parser can be used commonly by both variants (a version group node that contains both variant nodes). The components Scanner and CodeGen are being developed differently for each of the two project variants (each variant node has its own version group node). The component Switches is used only in the variant Optimizing (no version group node for the variant Standard).
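The relationship between component nodes, version group nodes and variant nodes in this example can be written down as a small data structure. The Python sketch below is an illustration only: structure nodes are omitted, and the version group names (`vg1`, `vg2`) and the helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VersionGroup:
    name: str        # hypothetical label; the article does not name version groups
    variants: set    # the variant nodes contained in this version group node


@dataclass
class Component:
    name: str
    version_groups: list


components = [
    Component("Scanner", [VersionGroup("vg1", {"Optimizing"}),
                          VersionGroup("vg2", {"Standard"})]),
    Component("Parser", [VersionGroup("vg1", {"Optimizing", "Standard"})]),
    Component("CodeGen", [VersionGroup("vg1", {"Optimizing"}),
                          VersionGroup("vg2", {"Standard"})]),
    Component("Switches", [VersionGroup("vg1", {"Optimizing"})]),
]


def version_group_for(component, variant):
    """Return the version group a project variant uses, or None if unused."""
    for group in component.version_groups:
        if variant in group.variants:
            return group
    return None


def components_of(variant):
    """List the components that take part in the given project variant."""
    return [c.name for c in components if version_group_for(c, variant)]


print(components_of("Standard"))
print(components_of("Optimizing"))
```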
The connection between the project tree and the object pool can be illustrated by turning the version group nodes 90 degrees, i.e., by visualizing the third dimension (see Figure 12). To keep the illustration clear and simple, the connections between the component nodes and the object pool have been drawn only for the components Scanner and Parser.
Figure 12. Connection between project tree and object pool
Within a project managed with VOODOO, the project tree not only defines the logical connections between the project's components, but also constitutes the basis of the user interface. It can be manipulated with a browser, and can be filtered by particular components, variants and/or revisions.
VOODOO presents the software project in two windows. In Figure 13, the front window shows the project tree. The user is about to check in a new version of the component TCLTools. The other window shows the project history (see below).
Both windows can be filtered. The variant information that applies to the project is not mixed with the information about the revisions of the individual components. The two are managed in a strictly orthogonal fashion. VOODOO is thus able to filter even the project tree according to variants of the project, i.e., to display only those parts of the project tree that are associated with a certain variant (or a set of variants). The filtering of the project tree is applied in two levels.
Figure 13. Screen snapshot during a typical VOODOO session
Figure 14. Different view menus for different users
First of all, the project tree shows nodes of only those variants to which the user has at least read-only privileges. The names of variants to which he lacks access are not displayed in the View menu either. The user is thus not even aware that these variants exist (Figure 14).
Then, within those variants that are visible to a user, he can set variant filters that further restrict the view of the project. Figure 15 shows the unfiltered project tree for the sample project. The user has set the variant filter to the variant Standard, but has not yet activated the filter mechanism (Use Filter is not checked). The project tree is thus displayed unfiltered for both variants, Optimizing and Standard.
By activating the filter mechanism, the nodes associated with the variant Optimizing are hidden. Since the component Switches is used only in the variant Optimizing, this component node is not visible in the filtered display of the project tree (Figure 16).
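The two filtering levels can be summarized in code. This Python sketch is a simplified model, not VOODOO's implementation: the project tree is reduced to a mapping from component names to the variant sets of their version groups, and the functions `visible_variants` and `filter_tree` are hypothetical.

```python
# component -> one set of variant names per version group node
project_tree = {
    "Scanner": [{"Optimizing"}, {"Standard"}],
    "Parser": [{"Optimizing", "Standard"}],
    "CodeGen": [{"Optimizing"}, {"Standard"}],
    "Switches": [{"Optimizing"}],
}


def visible_variants(all_variants, readable, variant_filter, use_filter):
    """Level 1: keep only readable variants. Level 2: apply the filter if active."""
    visible = {variant for variant in all_variants if variant in readable}
    if use_filter:
        visible &= set(variant_filter)
    return visible


def filter_tree(tree, visible):
    """Hide variant nodes, version groups and components with nothing visible."""
    filtered = {}
    for component, groups in tree.items():
        kept = [group & visible for group in groups if group & visible]
        if kept:
            filtered[component] = kept
    return filtered


everything = {"Optimizing", "Standard"}
shown = visible_variants(everything, everything, {"Standard"}, use_filter=True)
print(filter_tree(project_tree, shown))
```

With `use_filter=False` the same call leaves the tree unfiltered, which corresponds to setting a variant filter without checking Use Filter.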
Support for drag and drop simplifies the creation of the project tree as well as the task of checking in files. The user drags the files he wants to store from any other application to the VOODOO project window. VOODOO looks up the corresponding components and brings up a dialog that shows which files will be checked in to which version group nodes. Figure 17 shows an example of how files are stored to the VOODOO object pool by dragging them from a Symantec Project Manager window.
Figure 15. Unfiltered project tree
Figure 16. Filtered project tree
Figure 17. Archiving objects using Macintosh Drag and Drop
Each time a new object is archived or modifications are made to the project structure or variant information, VOODOO generates an entry in the project history. The entry consists of the date and time of the modification, the affected components and variants, the name of the user having made the modifications, and a comment.
Figure 18 shows the history window with its two types of entries. Modifications to the project structure, as well as named configurations, are displayed in bold, all other entries in plain type.
The small boxes in front of the names indicate whether changes were made to the variant information and/or to the project structure, or whether a line represents a named configuration.
To provide a general view of the software project, the entries in the history window can also be filtered according to various criteria. Setting the variant filters affects the history window as it does the project tree - only the entries for checked variants are displayed. If one or more nodes in the project tree are selected, then only the history records of these nodes are displayed in the history window. If no node is selected, all entries are shown. To set the viewing time of the entire software project (project tree, variants, revisions, etc.), you select the line in the history window with the desired date/time and press the Turn Back button. The entire project will then appear as it looked at the selected time.
Figure 18. History window
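The history filters described above combine naturally into a single predicate. The Python sketch below is an assumed model of a history entry and its filtering; the class `HistoryEntry`, the function `filter_history`, and the sample users, dates and comments are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class HistoryEntry:
    when: datetime
    components: frozenset
    variants: frozenset
    user: str
    comment: str


def filter_history(entries, checked_variants=None, selected_nodes=None,
                   turned_back_to=None):
    """Return the entries that remain visible under the given filters."""
    visible = []
    for entry in entries:
        if checked_variants is not None and not (entry.variants & checked_variants):
            continue  # entry affects none of the checked variants
        if selected_nodes and not (entry.components & selected_nodes):
            continue  # nodes are selected, and this entry touches none of them
        if turned_back_to is not None and entry.when > turned_back_to:
            continue  # entry lies after the chosen viewing time
        visible.append(entry)
    return visible


history = [
    HistoryEntry(datetime(1995, 3, 1, 9, 0), frozenset({"Scanner"}),
                 frozenset({"Standard"}), "anna", "initial check-in"),
    HistoryEntry(datetime(1995, 3, 2, 14, 30), frozenset({"Switches"}),
                 frozenset({"Optimizing"}), "ben", "add switches"),
    HistoryEntry(datetime(1995, 3, 5, 11, 15), frozenset({"Scanner"}),
                 frozenset({"Optimizing"}), "anna", "faster scanning"),
]

for entry in filter_history(history, selected_nodes={"Scanner"}):
    print(entry.when, entry.user, entry.comment)
```

Passing `turned_back_to` models the Turn Back button in a limited way: it hides later entries but does not reconstruct the project tree as it looked at that time.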
A problem that often arises in software projects with multiple programmers is that of simultaneous access of several programmers to a given component. Version control tools can help to solve this problem by providing a locking mechanism.
Figure 19. Project tree with locked nodes
VOODOO provides its locking mechanism at the version-group level rather than at the component level. Since various version groups represent different parallel development branches, two version groups of a component can be worked on simultaneously by two team members without causing any problems. Version groups can be locked to prevent other team members from overwriting them until they are unlocked again, either explicitly or as a side effect of retrieving or archiving objects. The current status of a version group is displayed in the project tree with corresponding icons (Figure 19).
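Locking at the version-group level means the lock key is the pair of component and version group rather than the component alone. The following Python sketch illustrates that design choice under assumed names (`LockTable`, `LockError`, the version group labels and the user names); it is not VOODOO's implementation.

```python
class LockError(Exception):
    """Raised when a version group is locked by someone else."""


class LockTable:
    def __init__(self):
        self._owners = {}  # (component, version_group) -> user holding the lock

    def lock(self, component, version_group, user):
        key = (component, version_group)
        owner = self._owners.get(key)
        if owner is not None and owner != user:
            raise LockError(f"{component}/{version_group} is locked by {owner}")
        self._owners[key] = user

    def unlock(self, component, version_group, user):
        key = (component, version_group)
        if self._owners.get(key) != user:
            raise LockError(f"{component}/{version_group} is not locked by {user}")
        del self._owners[key]


locks = LockTable()
locks.lock("Scanner", "vg1", "anna")   # one development branch of Scanner
locks.lock("Scanner", "vg2", "ben")    # a parallel branch of the same component
try:
    locks.lock("Scanner", "vg1", "ben")
except LockError as error:
    print(error)
```

Had the key been the component alone, the second call would already have failed, and the two team members could not have worked on parallel branches of Scanner at the same time.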
VOODOO also supports locking of files within the local workspace, using the Finder's locked flag (instead of the 'ckid' resource) to lock local files.
A working demo of VOODOO can be found at:
1. Heckel, P. A Technique for Isolating Differences Between Files. Communications of the ACM 21:4 (April 1978).
2. Hunt, J. W., and T. G. Szymanski. A Fast Algorithm for Computing Longest Common Subsequences. Communications of the ACM 20:5 (May 1977).
3. Leblang, D. B., and R. P. Chase, Jr. Computer-Aided Software Engineering in a Distributed Workstation Environment. Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, Pittsburgh 84, ACM Software Engineering Notes 9:3 (1984).
4. Reichenberger, C. Orthogonal Version Management. Proceedings of the 2nd International Workshop on Software Configuration Management, ACM SIGSOFT Software Engineering Notes 14:7 (November 1989).
5. Reichenberger, C. Delta Storage for Arbitrary Non-Text-Files. Proceedings of the 3rd International Workshop on Software Configuration Management, Trondheim, Norway (June 12-14, 1991). ACM Press (Order Number: 594910).
6. Reichenberger, C. VOODOO: A Tool for Orthogonal Version Management. Proceedings of the 4th International Workshop on Software Configuration Management, Baltimore, Maryland, USA (May 21-22, 1993).
7. Rochkind, M. J. The Source Code Control System. IEEE Transactions on Software Engineering SE-1:4 (December 1975).
8. Tichy, W. F. Design, Implementation, and Evaluation of a Revision Control System. Proceedings of the 6th International Conference on Software Engineering, ACM, IEEE, IPS, NBS (September 1982).
9. Tichy, W. F. The String-to-String Correction Problem with Block Moves. ACM Transactions on Computer Systems 2:4 (November 1984).
10. Tichy, W. F. RCS - A System for Version Control. Software - Practice and Experience 15:7 (July 1985).