Quoting in the shell
Volume Number: 23 (2007)
Issue Number: 09
Column Tag: MacEnterprise
Dealing with interesting characters in shell programming
By Philip Rinehart, Yale University
This month, a rather interesting problem arose on the MacEnterprise mailing list: how are quotes used and dealt with when scripting for the bash shell? It can be a particularly difficult problem, as OS X allows spaces and other non-standard characters in filenames. If quoting is not done properly, unexpected and even disastrous results may occur. Does anyone remember iTunes 2.0 and the destruction of hard drive data? It was a quoting problem! Thus, it is extremely important to quote things properly when shell programming, both to avoid unexpected results and to properly sanitize input.
Let's start the process by looking at the use of single quotes in bash shell scripting. The original question was about the use of a directory listing. The directory listing was then piped to a second command, which failed due to spaces in some of the directory names. The specific example:
ls -F /Applications
Try it. Note how many applications contain spaces in their names. If Adobe Acrobat 8 Professional were installed, any shell script attempting to perform an action on the unquoted path would fail. Why? As the shell interprets spaces as separators between words, Adobe, Acrobat, 8 and Professional are all seen as individual paths. In the above example, if a chmod operation were being performed, chmod would try to modify "Adobe", "Acrobat", "8" and "Professional" separately. Naturally, this action would fail, as none of these paths exists. Would using single quotes help?
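The word splitting described above can be seen directly with a small sketch. The helper count_args and the variable app are hypothetical names introduced only for this illustration; nothing is changed on disk.

```shell
# Hypothetical helper: reports how many arguments it received.
count_args() {
    echo "$#"
}

app="/Applications/Adobe Acrobat 8 Professional"

# Unquoted: the shell splits the value at each space before calling the helper.
unquoted=$(count_args $app)

# Double quoted: the value is passed along as one argument.
quoted=$(count_args "$app")

echo "unquoted: $unquoted argument(s)"   # prints: unquoted: 4 argument(s)
echo "quoted: $quoted argument(s)"       # prints: quoted: 1 argument(s)
```

A command such as chmod receives its arguments in exactly the same way, which is why the unquoted form ends up operating on four paths that do not exist.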
Single quotes treat everything contained between them as exact character data in the bash shell. In practice, single quotes are difficult to use in scripts, as variables and command substitutions are not expanded inside them. Even the backslash character is not considered special within single quotes. In our example, this means that a variable holding a directory name cannot be used inside single quotes. As a result, single quotes aren't going to solve the problem above, so a different solution is required.
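A short sketch of the difference, using the hypothetical variable names app, single and double, may help here. Inside single quotes both the dollar sign and the backslash are kept as ordinary characters.

```shell
app="Adobe Acrobat 8 Professional"

# Single quotes: $app and \n are stored literally, nothing is expanded.
single='Path is $app\n'

# Double quotes: $app is replaced by the value of the variable.
double="Path is $app"

printf '%s\n' "$single"   # prints: Path is $app\n
printf '%s\n' "$double"   # prints: Path is Adobe Acrobat 8 Professional
```

Single quotes remain a good choice for a fixed string that is typed out in full, since nothing inside them can be reinterpreted by the shell.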
What about the use of double quotes? Can they solve the problem? Hmm, it gets a bit closer this time. Inside double quotes, variables are still expanded, but the result is treated as a single word, even if it contains spaces. Wow, what a mouthful! Here's a better example. Let's use Adobe Acrobat 8 Professional again. Remember how, without quoting, each word was treated separately? With double quoting, the entire name remains intact and is not split. So, if the variable test is assigned and then echoed out using double quotes, the entire name, Adobe Acrobat 8 Professional, is used. A brief shell script snippet to further reinforce this concept:
test="Adobe Acrobat 8 Professional"
printf "Single word is $test\n"
Note that when these commands are entered and run, the variable is correctly displayed by printf. Without the quotes, only the word Single is printed, because printf treats the first word as its format string. Does this solution work for the problem encountered on the list? Well, not quite. It certainly gets somewhat closer, as spaces are correctly preserved within double quotes, but the unquoted output of a command substitution is still split at every space, so each space-separated entry is seen as a separate word. If single and double quotes won't solve the problem, what will? Can the field separator be altered, and will it solve the problem?
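To see why command substitution is still a problem, consider this sketch. It assumes a scratch directory created with mktemp standing in for /Applications, and the directory names are only examples.

```shell
# Scratch directory standing in for /Applications (example names only).
dir=$(mktemp -d)
mkdir "$dir/Adobe Acrobat 8 Professional" "$dir/Safari"

count=0
# The unquoted substitution is split at every space and newline.
for entry in $(ls "$dir"); do
    count=$((count + 1))
done

echo "loop ran $count times for 2 directories"   # prints: loop ran 5 times for 2 directories

rm -rf "$dir"
```

Quoting the whole substitution does not help either, as the loop would then run once with the entire listing as a single word.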
IFS? Never heard of it before? IFS stands for Internal Field Separator. It determines how bash interprets word boundaries. By default, three special characters are used: space, tab, and newline. Using the Adobe Acrobat 8 example, it should now be somewhat clearer why each part of the name is treated as a separate word: bash reads each space in the name as a field separator. With the default value for IFS, the result of a command substitution is split at each space. This can be changed by setting the IFS variable at the top of a shell script. Any character can be the field separator, from a semi-colon to a space or a comma. The MacEnterprise list came up with the following solution for the problem when a directory listing contains spaces:
The author inserted this line at the beginning of the shell script. Now that IFS has been reset, spaces are no longer recognized as field separators. The problem is now gone, and the variable is correctly assigned. It is also useful to note that explicitly setting IFS ensures that it has not been improperly set before the shell script is run. As a security precaution when writing a shell script, just as the PATH variable is set, make it a habit to also set the IFS variable. If input is being read in or parsed, any spoofing or accidental resetting of this variable becomes a non-issue.
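A minimal sketch of this approach follows. It assumes the separator is set to a newline only, which is one common choice and not necessarily the exact line posted to the list. The scratch directory and the variable names are hypothetical.

```shell
# Scratch directory standing in for /Applications (example names only).
dir=$(mktemp -d)
mkdir "$dir/Adobe Acrobat 8 Professional" "$dir/Safari"

old_ifs=$IFS
# Assumed value: a newline only, so spaces no longer separate fields.
IFS='
'

count=0
for entry in $(ls "$dir"); do
    count=$((count + 1))
done

# Restore the previous separator for the rest of the script.
IFS=$old_ifs

echo "loop ran $count times for 2 directories"   # prints: loop ran 2 times for 2 directories

rm -rf "$dir"
```

In bash the same value can also be written as IFS=$'\n'. Names that themselves contain a newline would still be split with this setting.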
Let's return to the original list command that the article opened with. The original question was how to take a list of directories and change their permissions. Jeff McCune from The Ohio State University suggested another method on the list. He used the find utility to accomplish the same thing in a more efficient and unix-y way. Let's start with the command in its full form:
find /Applications -type d -maxdepth 1 -mindepth 1 -print0 | xargs -0 chmod 750
O.K., that's a bit much! As the find utility may be new to some, let's break it down into its components. The first argument, /Applications, is the directory that find will operate on; it can be any directory. The next argument, -type, restricts the results to directories, as specified by the option d. After that, the next two arguments, -maxdepth 1 and -mindepth 1, limit the search to the top level: find does not descend into any of the application packages, and /Applications itself is left out of the results. The next option is where the real magic begins. The option -print0 prints each result to standard output, terminated by the ASCII NUL character rather than a newline. It will become clear why this option is important in a second. Try just the first part of the command before the pipe character, both with -print and -print0. Note the difference, as it is critical to this command's proper function.
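The difference between the two print options can also be counted rather than eyeballed. This sketch assumes a scratch directory in place of /Applications; the variable names are hypothetical, and the depth options are placed first only because some versions of find complain otherwise.

```shell
# Scratch directory standing in for /Applications (example names only).
dir=$(mktemp -d)
mkdir "$dir/Adobe Acrobat 8 Professional" "$dir/Safari"

# -print: one newline after each result.
newline_count=$(find "$dir" -mindepth 1 -maxdepth 1 -type d -print | wc -l | tr -d ' ')

# -print0: one NUL after each result; keep only the NUL bytes and count them.
nul_count=$(find "$dir" -mindepth 1 -maxdepth 1 -type d -print0 | tr -cd '\000' | wc -c | tr -d ' ')

# -print0 output contains no newlines at all.
print0_newlines=$(find "$dir" -mindepth 1 -maxdepth 1 -type d -print0 | wc -l | tr -d ' ')

echo "newlines with -print:  $newline_count"     # prints: newlines with -print:  2
echo "NULs with -print0:     $nul_count"         # prints: NULs with -print0:     2
echo "newlines with -print0: $print0_newlines"   # prints: newlines with -print0: 0

rm -rf "$dir"
```

Since a NUL can never appear inside a filename, it is a separator that no name can collide with, unlike a space or a newline.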
Now that the find results are complete, it is time to process them using xargs. xargs reads a list of items from standard input and passes them as arguments to another command line utility. Remember the use of -print0. As the NUL character terminates each result, xargs must be given the -0 flag. Without it, xargs splits its input at spaces and newlines, and the names containing spaces are broken apart again. This technique is also generally considered faster by Unix gurus, as it executes the chmod command as few times as possible, usually only once, which may not be the case for a shell loop or for the -exec switch of find when it runs the command once per result.
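The whole pipeline can be tried safely on a scratch directory before pointing it at /Applications. The sketch below makes that assumption; the directory names and variables are hypothetical, and the small sh -c command is there only to show how many names arrive in a single invocation.

```shell
# Scratch directory standing in for /Applications (example names only).
dir=$(mktemp -d)
mkdir "$dir/Adobe Acrobat 8 Professional" "$dir/Safari"

# How many names does one invocation receive? Both arrive in a single call.
args_per_call=$(find "$dir" -mindepth 1 -maxdepth 1 -type d -print0 | xargs -0 sh -c 'echo "$#"' sh)
echo "arguments in one call: $args_per_call"   # prints: arguments in one call: 2

# The pipeline from the article, applied to the scratch directory.
find "$dir" -mindepth 1 -maxdepth 1 -type d -print0 | xargs -0 chmod 750

# An alternative that also batches its arguments, without a pipe.
find "$dir" -mindepth 1 -maxdepth 1 -type d -exec chmod 750 {} +

# Check the resulting mode of the directory with spaces in its name.
perms=$(ls -ld "$dir/Adobe Acrobat 8 Professional" | cut -c1-10)
echo "$perms"   # prints: drwxr-x---

rm -rf "$dir"
```

With a very long list of names, xargs may still split the work across more than one chmod call, but each call receives whole names.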
As you can see, the problem presented on the list is not as straightforward as one might expect. As always, there is more than one way to solve the problem, but with a full understanding of the problem's intricacies, it should now be trivial to solve. Until next month, see you on the lists!
Philip Rinehart is co-chair of the steering committee leading the Mac OS X Enterprise Project (macenterprise.org) and is the Lead Mac Analyst at Yale University. He has been using Macintosh Computers since the days of the Macintosh SE, and Mac OS X since its Developer Preview Release. Before coming to Yale, he worked as a Unix system administrator for a dot-com company. He can be reached at: firstname.lastname@example.org.
The MacEnterprise project is a community of IT professionals sharing information and solutions to support Macs in an enterprise. We collaborate on the deployment, management, and integration of Mac OS X client and server computers into multi-platform computing environments.