CGI Programming with REALbasic and Apache

Volume Number: 20 (2004)
Issue Number: 6
Column Tag: Programming

by Mark Choate

With the recent release of REALbasic 5.5, RB has become an excellent tool for web development. The most recent version sports improved networking features and support for XML (including XSLT and XQuery), plus the ability to compile command-line applications, called console applications in REALbasic. Perhaps most interesting is the ability to compile applications for Windows and Linux in addition to Macintosh platforms.

Traditionally, Mac web servers communicated with CGI applications through Apple events. This doesn't work with Apache, however, so a CGI application needs to be able to receive information from the server in the normal CGI way: through environment variables. This article illustrates the steps necessary to implement this in REALbasic. One important thing to note: since many of the features that enable CGI programming in REALbasic are new, the current release (5.5.1) has some bugs, which I have had to work around. Some may be fixed by the time this article is published. For those that have not been, the workarounds described here should save you some time.

The first step will be to review CGI programming for those who aren't familiar with it. CGI stands for the Common Gateway Interface. It's called an interface because it provides the means for Apache (or any web server that supports CGI) to execute scripts and applications on the host machine of a web server. When a user types a URL into his or her web browser, that URL often represents the location of an HTML file that the server just picks up and sends back to the browser. With CGI, the URL instead represents a script or a program that gets executed, and the output of that program is sent back to the user. For security, Apache lets the administrator configure which directories are allowed to contain executable CGI programs. On OS X the cgi-bin directory is here:

    /Library/WebServer/CGI-Executables/

This article assumes that you haven't made any changes to the default Apache configuration that comes with OS X. The configuration file that Apache uses is /etc/httpd/httpd.conf. If you have never modified this file, now is not a good time to start, but you shouldn't need to. It's worth taking a look at it, though, just to make sure that CGI is set up properly. My httpd.conf file has this, about two-thirds of the way through the document:

    # ScriptAlias: This controls which directories contain server scripts.
    # ScriptAliases are essentially the same as Aliases, except that
    # documents in the realname directory are treated as applications and
    # run by the server when requested rather than as documents sent to the client.
    # The same rules about trailing "/" apply to ScriptAlias directives as to
    # Alias.
    ScriptAlias /cgi-bin/ "/Library/WebServer/CGI-Executables/"

The last line indicates two things. "/cgi-bin/" is going to be part of the URL for the CGI application, something like http://localhost/cgi-bin/ plus the name of your script. The second path is the absolute path for this directory on the server. For this example, we'll be placing our REALbasic CGI program in this directory. Sometimes you'll see CGI scripts that end with a ".cgi" extension, but we won't need one; in fact, you should avoid using any extension, because it will mess things up. Other scripting languages, like Perl and Python, usually reside on the web server as text files that are executed by an interpreter, and Apache can use file extensions to decide which files to run as scripts. Since a REALbasic application is compiled, it doesn't need an interpreter, and it's better just to leave the extension off. It also makes for a much nicer URL, which is important, too.
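The ScriptAlias mapping can be modeled in a few lines. The following Python sketch (Python rather than REALbasic only so the logic can be tried without building an application) turns a request path into the file Apache would run. The function name script_path_for_url is hypothetical, and the model is deliberately simplified: it ignores anything after the script name and skips Apache's permission and symlink checks.

```python
from pathlib import PurePosixPath
from typing import Optional

# Values taken from the default ScriptAlias line in OS X's httpd.conf.
URL_PREFIX = "/cgi-bin/"
SCRIPT_DIR = "/Library/WebServer/CGI-Executables/"


def script_path_for_url(url_path: str) -> Optional[str]:
    """Return the file Apache would execute for url_path, or None when the
    path is not under the ScriptAlias prefix. Simplified model: any extra
    path segments after the script name are ignored."""
    if not url_path.startswith(URL_PREFIX):
        return None
    rest = url_path[len(URL_PREFIX):].split("?", 1)[0]  # drop the query string
    name = rest.split("/", 1)[0]                        # first segment is the script
    if name in ("", ".", ".."):
        return None
    return str(PurePosixPath(SCRIPT_DIR) / name)


print(script_path_for_url("/cgi-bin/CGI?name=Mark"))
# prints /Library/WebServer/CGI-Executables/CGI
```

With an extension-free name like "CGI", the last URL segment and the file name line up one-to-one, which is part of what makes the short name convenient.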

Now we can start work on the program. The easiest way to work is to save the project in the CGI-Executables directory. This is because you'll need to compile the application in order to test it with Apache, and it's easier to just compile it and leave it there to test than it would be to compile it and copy it to the CGI directory.

In RB, a console application is one that does not have a graphical interface - it runs on the command line. In order to create a console application, simply create a new project in REALbasic 5.5+, and select the "Console Application" template. Once that is done, RB will provide you with the shell of an application with one class called "App".

Figure 1. Starting a new console application in REALbasic.

There are two default events in a console application - "UnhandledException" and "Run". The "Run" event is triggered when the program is launched - in the case of a CGI application, it is triggered when a user requests it by typing the application's URL in her web browser.

Figure 2. Blank console application project.

Now is a good time to select the File > Build Settings... menu and configure the application. Select "Build for OS X" (this program has only been tested on OS X, although it should work on other platforms as well). Click on the top popup menu on the page, and select "Mac OS Settings". The only thing to change here is the name: be sure to give it a name without an extension, spaces, or punctuation. In this example, I've chosen the name "CGI", which is short and easy to type into a browser window.

Once that is done, it's time to write some code.

Since console applications do not have a graphical interface, they have to input and output data in some other fashion. For programs that are executed on the command line, these channels are called "Standard Input" and "Standard Output" respectively. With a REALbasic console application, the command "INPUT" represents (you guessed it) standard input, and "PRINT" sends data to standard output. In addition to standard input and output, CGI applications also make use of environment variables that are set by the web server. To access environment variables, you need the System object, whose System.EnvironmentVariable() method returns the value of the environment variable whose name is passed to it. In the current version (5.5.1) there is a bug that causes REALbasic to crash if you try to access a variable that does not exist. This places some real limitations on what you can do, but it is supposed to be fixed in 5.5.2.
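The 5.5.1 workaround is to ask only for variables you know Apache sets. Where the lookup itself does not crash, the usual pattern is a lookup with a default value. Here is a minimal Python sketch of that pattern; the helper name env_value is hypothetical, and the optional environ argument is an assumption added so the lookup can be exercised without a web server.

```python
import os
from typing import Mapping, Optional


def env_value(name: str, environ: Optional[Mapping[str, str]] = None,
              default: str = "") -> str:
    """Return a CGI environment variable, or `default` if the server did
    not set it (the case that crashes REALbasic 5.5.1)."""
    if environ is None:
        environ = os.environ  # the real process environment set by Apache
    return environ.get(name, default)
```

For example, env_value("HTTP_COOKIE", {}) returns an empty string instead of failing when no cookie was sent.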

The console application "App" class is where the action is. It has two events: "Run" and "UnhandledException". The "Run" event is triggered when the application is invoked by the web server, so the "Run" event is where we put the main part of our code. I also created a "request" class; an instance of it is created when the "Run" event executes. It is a subclass of Dictionary, and it holds the data that Apache passes to the CGI application. It also has a "Write" method that sends data back to the client browser.

The "Run" method should look like this:

#pragma disableBackgroundTasks 
request = new request
request.value("SERVER_SOFTWARE") = system.environmentVariable("SERVER_SOFTWARE")
request.value("SERVER_NAME") = system.environmentVariable("SERVER_NAME")
request.value("REQUEST_METHOD") = system.environmentVariable("REQUEST_METHOD")
request.value("QUERY_STRING") = system.environmentVariable("QUERY_STRING")
request.value("REMOTE_ADDR") = system.environmentVariable("REMOTE_ADDR")

Background tasks are disabled because Apache doesn't work well with them. If you don't disable them, every time you do a loop, or execute anything that triggers a new thread or background task, the application crashes mercilessly.

In this example, I have gathered only the minimal set of environment variables needed to run the program, because of the bug mentioned earlier. One notable variable missing is "HTTP_COOKIE", which is very useful if you use cookies to track visitors to your site. A complete list of variables is included in the sample script, but commented out.
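Once missing variables can be skipped safely, the request can be filled from a complete list instead of a hand-picked subset. The Python sketch below is an illustration rather than the article's REALbasic code; the names CGI_VARIABLES and collect_request are hypothetical, and the list is a representative subset, not exhaustive. It copies whichever variables are present, HTTP_COOKIE included, into a dictionary, much as the Run event fills the Dictionary subclass.

```python
import os
from typing import Dict, Mapping, Optional

# A subset of the CGI meta-variables (RFC 3875) plus two HTTP_* header
# variables; Apache sets only those that apply to a given request.
CGI_VARIABLES = (
    "SERVER_SOFTWARE", "SERVER_NAME", "GATEWAY_INTERFACE", "SERVER_PROTOCOL",
    "SERVER_PORT", "REQUEST_METHOD", "PATH_INFO", "SCRIPT_NAME",
    "QUERY_STRING", "REMOTE_ADDR", "CONTENT_TYPE", "CONTENT_LENGTH",
    "HTTP_COOKIE", "HTTP_USER_AGENT",
)


def collect_request(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the CGI variables that are actually present into a dict,
    skipping missing ones instead of failing on them."""
    if environ is None:
        environ = os.environ
    return {name: environ[name] for name in CGI_VARIABLES if name in environ}
```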

The two variables that matter most to us are "REQUEST_METHOD" and "QUERY_STRING". There are several kinds of requests a web server can accept. The two that concern us are "Post" requests and "Get" requests. In actual practice the distinction between the two is virtually non-existent, except that it changes the way form data is passed to the CGI program.

Any time you fill out a form on a web page, either to log in or make a purchase, the information that you enter needs to be transferred to the server so that it can take some appropriate action. When you create a form in HTML, you have the option of selecting the request method you want to use - either "Get" or "Post". If you choose "Get", then the data from the form is encoded and sent across as part of the URL. If you use "Post", then the data is sent to the CGI program as standard input. Here is an example of a "Get" request URL:


The first step in processing a CGI request is to find out what kind of request it is, and process it accordingly. In the request class, I have implemented the following method:

#pragma disableBackgroundTasks // Throws an error during the loop
  Dim query_string, field, key, value As String
  Dim x As Integer
  query = New Dictionary
  // If the REQUEST_METHOD is "POST", get the string from standard input;
  // otherwise get it from QUERY_STRING
  If me.hasKey("REQUEST_METHOD") then
    if me.value("REQUEST_METHOD") = "POST" Then
      query_string = Input
    Else
      query_string = System.EnvironmentVariable("QUERY_STRING")
    End If
  end if
  if query_string <> "" then
    // parse the query string: fields are separated by "&", key and value by "="
    For x = 1 to CountFields(query_string, "&")
      field = NthField(query_string, "&", x)
      key = NthField(field, "=", 1)
      value = NthField(field, "=", 2)
      value = ReplaceAll(value, "+", " ")  // "+" encodes a space in form data
      value = DecodeURLComponent(value)
      query.value(key) = value
    Next
  end if

The method creates a new dictionary to hold the values of the query (the data from the form). If the request method is a "Post", then the method grabs the string from standard input. If it is a "Get", then it grabs it from the environment variable "QUERY_STRING". Beyond that, everything else is the same and the string is parsed and the dictionary values are set.
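The same read-then-parse logic looks like this in Python, shown only so the steps can be tried outside Apache; the function names are hypothetical. Two differences are deliberate. For POST it reads exactly CONTENT_LENGTH bytes, which is what the CGI specification (RFC 3875) tells scripts to do, rather than relying on line-oriented input, since a form body need not end with a newline. It also splits each field on the first "=" only and URL-decodes the key as well as the value.

```python
import io
from typing import BinaryIO, Dict, Mapping
from urllib.parse import unquote_plus


def read_raw_query(environ: Mapping[str, str], stdin: BinaryIO) -> str:
    """Return the still-encoded form data: the request body for POST,
    otherwise the QUERY_STRING environment variable."""
    if environ.get("REQUEST_METHOD", "").upper() == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        # Read exactly CONTENT_LENGTH bytes; the body need not end in a newline.
        return stdin.read(length).decode("latin-1")
    return environ.get("QUERY_STRING", "")


def parse_query(query_string: str) -> Dict[str, str]:
    """Split on '&', then on the first '=', and URL-decode both halves
    ('+' becomes a space, %XX escapes are decoded as UTF-8)."""
    query: Dict[str, str] = {}
    for field in query_string.split("&"):
        if not field:
            continue  # tolerate "a=1&&b=2" and a trailing '&'
        key, _, value = field.partition("=")
        query[unquote_plus(key)] = unquote_plus(value)
    return query


if __name__ == "__main__":
    # Simulate a POST request without Apache; the trailing newline is not read.
    fake_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "20"}
    body = io.BytesIO(b"name=Mark+Choate&x=1\n")
    print(parse_query(read_raw_query(fake_env, body)))
```

In a real CGI program the call would be parse_query(read_raw_query(os.environ, sys.stdin.buffer)).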

We now have a request object that contains all the needed values from the request, plus the query parsed into a dictionary. Normally, this would be sent to some method that would provide a response based upon the content of the query. For our example, we'll just send back to the client all the information stored in the request object.

To send data back to the client, we need to send some header information followed by an HTML string.

#pragma disableBackgroundTasks
  // simple write method that returns the data in the request.
  dim output as string
  dim html as string
  dim requestString, queryString as string
  dim x,y as integer
  // set the value for "Content-type", followed by a blank line
  output = "Content-type: text/html" + chr(13) + chr(10) + chr(13) + chr(10)
  // create the html string
  html = "<html><head><title>TestOutput</title></head><body>"
  // list every name/value pair stored in the request dictionary
  y = me.count
  for x = 0 to y-1
    requestString = requestString + me.key(x) + ": " + me.value(me.key(x)) + "<br />"
  next
  // then every name/value pair parsed from the query string
  y = me.query.count
  for x = 0 to y-1
    queryString = queryString + me.query.key(x) + ": " + me.query.value(me.query.key(x)) + "<br />"
  next
  html = html + requestString + queryString + "</body></html>"
  output = output + html
  print output
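For comparison, here is a Python sketch of the same response; the function name build_response is hypothetical. It produces the Content-type header, the blank line that ends the headers, and the HTML listing. Unlike the REALbasic version, it passes every name and value through html.escape, because echoing form data unescaped lets a visitor inject arbitrary HTML or script into the page.

```python
import html
from typing import Mapping

CRLF = "\r\n"


def build_response(request: Mapping[str, str], query: Mapping[str, str]) -> str:
    """Return a CGI response: a Content-type header, a blank line, then an
    HTML page listing every request value followed by every query value."""
    header = "Content-type: text/html" + CRLF + CRLF
    lines = [
        f"{html.escape(k)}: {html.escape(v)}<br />"
        for pairs in (request, query)
        for k, v in pairs.items()
    ]
    body = ("<html><head><title>TestOutput</title></head><body>"
            + "".join(lines) + "</body></html>")
    return header + body
```

A CGI program would send the result with sys.stdout.write(build_response(request, query)).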

If you placed the application in the /Library/WebServer/CGI-Executables directory and named it "CGI", then you should be able to access the script from the following URL:

    http://localhost/cgi-bin/CGI

You should be able to paste it in the browser, hit return, and then get back a list of the variables. If you want to test the query string, then enter a URL like the following:


Figure 3. Results of CGI application.

You now have a good starting point for writing CGI programs in REALbasic for Apache. One thing you'll notice, especially if you have a lot of traffic on your site, is that CGI can be slow at times. The reason is that the program has to be started up for each request, which produces a lot of overhead. The downside to RB here is that it produces large executable files (about 1.3 MB for this simple CGI program), so this particular solution is best limited to low-traffic sites. Because of this overhead, a variety of CGI workarounds have been developed to speed up the process. The way they work is that instead of invoking the program each time it is requested, the program stays resident in memory and handles requests as they come in. This is usually accomplished with an Apache plug-in. It is an interesting approach that can be used with REALbasic as well, and you don't need to rely on console programming.

I developed an RB application that worked with an Apache plug-in called "mod_scgi". Mod_scgi takes the data that Apache would normally send to a CGI program as environment variables and instead sends it as a block of data over a TCP connection. Using REALbasic's networking abilities, you can create a SocketServer that manages a pool of TCPSockets listening on the appropriate port; each socket gets the data when it is available, parses it and acts on it just like a CGI program. As soon as an individual socket is done, instead of exiting, it returns to listening on the port for the next request. This gives a huge performance boost, and it is a tactic worth considering if you expect a lot of traffic to your site.
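To make the mod_scgi approach concrete: in the SCGI protocol the request arrives as a netstring (the length in decimal, a colon, the data, and a comma) whose data is a run of NUL-terminated name/value pairs, starting with CONTENT_LENGTH, followed by the request body. The Python sketch below parses one complete request already held in memory; parse_scgi_request is a hypothetical name, and this is not the code from my RB application. In a REALbasic version, each socket would accumulate incoming bytes until the whole netstring and body had arrived, then do the same splitting.

```python
from typing import Dict, Tuple


def parse_scgi_request(data: bytes) -> Tuple[Dict[str, str], bytes]:
    """Split one complete SCGI request into (headers, body).

    Layout: "<length>:<headers>," where <headers> is a run of
    NUL-terminated name/value pairs, then CONTENT_LENGTH body bytes.
    """
    colon = data.index(b":")          # raises ValueError if there is no length prefix
    length = int(data[:colon])
    start = colon + 1
    end = start + length
    if data[end:end + 1] != b",":
        raise ValueError("malformed netstring: missing trailing comma")
    items = data[start:end].split(b"\0")
    # The header block ends with a NUL, so split() leaves an empty last item.
    if items and items[-1] == b"":
        items.pop()
    if len(items) % 2:
        raise ValueError("odd number of header fields")
    headers = {items[i].decode("latin-1"): items[i + 1].decode("latin-1")
               for i in range(0, len(items), 2)}
    body_len = int(headers.get("CONTENT_LENGTH", "0"))
    body = data[end + 1:end + 1 + body_len]
    return headers, body
```

After parsing, the headers dictionary plays the role of the CGI environment, and the response is written back over the same connection in CGI format.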

The original (and best) guide to CGI from the inventors of Mosaic, NCSA:

Mark Choate


All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.