TweetFollow Us on Twitter

Text-based File Formats

Volume Number: 19 (2003)
Issue Number: 2
Column Tag: Section 7

Text-based File Formats

CSV, OML, XML, YAML...

by Rich Morin

BSD and OSX inherit a long tradition (stretching back into the earliest days of Unix) of using text files for data storage. Although there are some exceptions, most control, log, and other system data files are written in ASCII. This makes them easier to inspect, post-process, and even edit.

Apple, whose historic bent has been more toward binary file formats (e.g., the resource fork), seems to have adopted this idea wholeheartedly. In fact, they have gone a bit further, adopting XML (rather than line-oriented files) and Unicode (rather than ASCII). As a result, many OSX files are well structured, language-independent, and quite accessible to both humans and programs.

Many vendors (e.g., Microsoft) are also joining the XML caravan. Assuming that they document both the syntax and semantics of their interchange formats, we could see a dramatic change in possibilities for file interchange.

It is not clear, however, that XML is the Right Answer for all problems. Let's look at some of the alternatives, examining their strengths and weaknesses. Don't expect a comprehensive list; there are zillions of data formats in use. Here, in any event, are some that I would recommend.

CSV

Although CSV stands for "comma-separated values", commas are by no means essential to the idea. In fact, another term for this is "flat file format". Basically, the idea is that each line is a record and that some other delimiter (e.g., colons, commas, white space) is used to separate fields. Quotes or other devices are sometimes used to protect instances of the delimiter in the body of a field. Here are some examples:

/etc/crontab
  15 3 * * * root periodic daily
/etc/gettytab
  a|std.110|110-baud:\
        :np:nd#1:cd#1:uc:sp#110:
/var/log/netinfo.log
  Dec 21 17:59:33 cerberus netinfod ...
An_Excel_File.csv
  1,2,3
  "1,2,3","3,4,5"

Some files get a bit complex, adding syntax to support block structure, comments, line breaks, shell commands, etc. A highly-ornamented CSV file can begin to look like a "little language". Here is a fairly complex format, drawn from /var/named/named.local:

$TTL 86400
@ IN SOA localhost. root.localhost. (
    1997022700 ; Serial
    28800      ; Refresh
    14400      ; Retry
    3600000    ; Expire
    86400 )    ; Minimum
  IN NS  localhost.
1 IN PTR localhost.

Many BSD control files, logs, and reports use white space to delineate fields, making the assumption that the included data will not contain spaces. This was never a safe assumption for path names, but the advent of OSX has made the problem all too real.

Try running "ps -axww" and see how man path names (for both commands and arguments) contain spaces. Then, consider how you would code up a way to determine which spaces are field separators and which are not. Not simple...

Of course, ps has other problems. Run "vi 1 '2 3' 4" in one window and "ps" in another. In the ps output, the COMMAND field looks like "vi 1 2 3 4", dooming any effort to parse it into a command path and distinct command-line arguments. Some sort of quoting convention is desperately needed here.

Blanks in path names make it impossible to parse certain log files, have already broken one Apple installer script, and could well foul up many BSD control files. It's probably too late to get Apple to back off from their use of embedded blanks in system path names, so you can expect to see some problems of this nature coming along...

OML

Although CSV is attractively simple, it may be too simple for your needs. For instance, you may need to support optional data, hierarchical structures, etc. On the other hand, you may not be ready for the formality of XML (Extensible Markup Language) and unwilling to design your own markup language (and parser).

OML (Ostensible Markup Language :-) is a powerful, simple, and convenient solution to this dilemma:

# This is a sample of file-system metadata.
<snap>
  <file>ASCII text</>
  <flags>f,avbstclinmed,,</>
  <lstat>303507,33200,1,1000,20,914,8</>
  <md5>2e1240f444fc3f984186fc5a4fd28eb0</>
  <times>1040087752,230,1740993,10890145</>
</snap>

This looks quite a bit like XML, but there are some small peculiarities. That comment, for instance, isn't legal XML. Nor is "</>" a legal termination for a tag. Is that really CSV syntax in the middle of some fields? Finally, where are the header lines?

Though OML is seemingly designed to give hives to XML purists, it is also designed to work smoothly with existing XML tools. A couple of lines of Perl will strip out the comments and fill out the terminations. A quick pass through an XML parser extracts all of the named fields and attributes. Perl's split() operator, if need be, can break up the CSV data.

Since OML is pretty much a "roll your own" kind of thing, there isn't any real documentation. I'd suggest a look at "Doing it Simpler" (Leigh Dodds; www.xml.com) for some ideas, however.

XML

Despite any appearance to the contrary, I am quite a fan of XML. An enormous amount of meticulous and thoughtful effort (and some rather fancy computer science!) is going into creating "industrial strength" data formats and processing tools. In addition, the W3C (World Wide Web Consortium; www.w3.org) is being very careful to make sure that the official standards are open to all players.

For some projects, you really need to bring in power tools. The host of translators, validators, and other tools that XML provides can make otherwise impossible projects feasible, if not necessarily reasonable. Using XML also increases the chance that someone else's program will be able to parse your data. Given all of that, complaining about a bit of formality seems rather petty.

I won't try to cover XML here; there are shelves of books on the subject, with more coming out on a weekly basis. O'Reilly and Addison-Wesley have the broadest coverage; O'Reilly's XML web site (www.xml.com) is a good place to start your journey...

YAML

CSV has low overhead, is simple to read and edit, and handles lists and arrays well. OML and XML are a bit more bulky, but handle optional data and hierarchies smoothly. XML has an impressive suite of documentation, standards activities, and support software. Sometimes, however, you want low overhead, simplicity, and support for arbitrary data structures.

YAML (YAML Ain't Markup Language) fills this niche quite admirably. The syntax is simple and clean. The basic data structures are sequences (i.e., Perl arrays) and mappings (i.e., Perl hashes). YAML handles lists, arrays, and hierarchies easily; with a bit of extra work, it can handle arbitrary Perl data structures (e.g., cyclic graphs).

Here is the previous example, transliterated into YAML:

# This is a sample of file-system metadata.
snap:
  file:  'ASCII text'
  flags: 'f,avbstclinmed,,'
  lstat: '303507,33200,1,1000,20,914,8'
  md5:   '2e1240f444fc3f984186fc5a4fd28eb0'
  times: '1040087752,230,1740993,10890145'

A more idiomatic rendering, however, would look like:

# This is a sample of file-system metadata.
snap:
  file:  ASCII text
  flags: [ f, avbstclinmed, , ]
  lstat: [ 303507, 33200, 1, 1000, 20, 914, 8 ]
  md5:   2e1240f444fc3f984186fc5a4fd28eb0
  times: [ 1040087752, 230, 1740993, 10890145 ]

Aside from the fact that some spaces have been added after commas and the quotes have been eliminated (some turned into brackets), the second version looks very similar to the first. The resulting data structure is quite different, however; the bracketed lists have been turned into YAML sequences. This means that they don't have to be parsed in a follow-on step. Here is some access code, in Perl:

$file = $yaml{snap}{file};
$uid  = $yaml{snap}{lstat}[3];

YAML has several ways to write textual data. Here are some examples:

  - a simple text item
  - "double-quoted text\n "
  - 'single-quoted text'
- >
    This text
    is freeform.
- |
    This text
    isn't.

Although YAML has nowhere near the amount of documentation that XML has, there are some useful resources to recommend. The YAML web site (www.yaml.org) is the logical place to start; be sure to visit the YAML wiki. I'd also recommend a look at "Look Ma, No Tags" (Kendall Clark Grant; www.xml.com) for an informal introduction.


Rich Morin has been using computers since 1970, Unix since 1983, and Mac-based Unix since 1986 (when he helped Apple create A/UX 1.0). When he isn't writing this column, Rich runs Prime Time Freeware (www.ptf.com), a publisher of books and CD-ROMs for the Free and Open Source software community. Feel free to write to Rich at rdm@ptf.com.

 
AAPL
$439.66
Apple Inc.
-3.27
MSFT
$34.85
Microsoft Corpora
-0.23
GOOG
$906.97
Google Inc.
-1.56

MacTech Search:
Community Search:

Software Updates via MacUpdate

KeyCue 6.5 - Displays all menu shortcut...
KeyCue helps you to use your OS X applications more effectively. Just hold down the Command key for a while - KeyCue comes to help and shows a table of all currently available keyboard shortcuts.... Read more
HoudahSpot 3.7.8 - Advanced front-end fo...
HoudahSpot is a flexible file-search tool based on Apple's powerful Spotlight engine. Keep frequently used files within reach Retrieve the files you didn't know you still had Don't waste time... Read more
Cobook Contacts 1.2.6 - Intelligent addr...
Cobook Contacts is a better address book that makes contact management enjoyable for millions of people every day. Find contacts faster and organize them with tags. Get integrated social profiles... Read more
AppDelete 4.0.7 - Delete your unwanted a...
AppDelete is an uninstaller for Macs that will remove not only applications but also widgets, preference panes, plugins and screensavers along with their associated files. Without AppDelete these... Read more
OnyX 2.6.9 - Maintenance and optimizatio...
OnyX is a multifunctional utility for OS X. It allows you to verify the startup disk and the structure of its System files, to run miscellaneous tasks of system maintenance, to configure the hidden... Read more
Apple iTunes 11.0.3 - Manage your music,...
Apple iTunes lets you organize and play digital music and video on your computer. It can automatically download new music, app, and book purchases across all your devices and computers. And it's a... Read more
Spotify 0.9.0.133. - Stream music, creat...
Spotify is a new way to enjoy music. Simply download and install. Before you know it you'll be singing along to the genre, artist, or song of your choice. With Spotify you are never far away from... Read more
JollysFastVNC 1.46 - Fast VNC client. (S...
JollysFastVNC is a VNC client which aims to become the best VNC client on the Mac. When I started ScreenRecycler I thought that there are enough VNC clients out there to support it. When the program... Read more
Skitch 2.5.2 - Take screenshots, annotat...
Skitch allows you to take screenshots on your Mac, edit them and share them with others. It makes the sharing process seamless by making it a natural workflow to send the image (with edited arrows... Read more
Backblaze 2.1.0.608 - Online backup serv...
Backblaze is an online backup service, available fo $5/month for unlimited storage. With half of the founding team heralding from Apple, Backblaze is deeply committed to the Mac platform. The... Read more

Blitz Brigade Review
Blitz Brigade Review By Andrew Stevens on May 21st, 2013 Our Rating: :: CHAMPION KILLERUniversal App - Designed for iPhone and iPad Blitz Brigade is an enjoyable first-person shooter where players fight online in multiple gameplay... | Read more »
gMusic Submits Update To Bring Google’s...
gMusic Submits Update To Bring Google’s All Access Streaming Music Service To iOS Posted by Andrew Stevens on May 21st, 2013 [ permalink ] gMusic: A Google Mus | Read more »
CandyMeleon Review
CandyMeleon Review By Blake Grundman on May 21st, 2013 Our Rating: :: SWEETLY ADDICTIVEUniversal App - Designed for iPhone and iPad Who could say no to a Chameleon that is this cute? Feed his sweet tooth and you will see just how... | Read more »
Fire & Forget: The Final Assault Rev...
Fire & Forget: The Final Assault Review By Rob Rich on May 21st, 2013 Our Rating: :: MY CAR IS FIGHTUniversal App - Designed for iPhone and iPad Fire & Forget: The Final Assault is one crazy post-apocalyptic ride.   | Read more »
Appy Geek Updates With Enhanced Design a...
Appy Geek Updates With Enhanced Design and Customizable Home Screen Posted by Andrew Stevens on May 21st, 2013 [ permalink ] | Read more »
What’s the Deal with rymdkapsel?
rymdkapsel made a bit of a splash when it was released on the PlayStation Vita a few weeks ago. And in another couple of months this excessively minimal and abstract strategic base building “sim” will be making its way on to the App Store for... | Read more »
Star Command Getting Exploding Ships, Sp...
Star Command Getting Exploding Ships, Spreading Fires, and Away Teams In Future Updates Posted by Andrew Stevens on May 21st, 2013 [ permalink ] | Read more »
Catch a Ninja Review
Catch a Ninja Review By Jordan Minor on May 21st, 2013 Our Rating: :: CATCH AND RELEASEiPhone App - Designed for the iPhone, compatible with the iPad It turns out ninjas aren’t that much tougher than fruit.   | Read more »
The Portable Podcast, Episode 186
On This Episode: Carter and Kurt Bieg of Simple Machine talk about his studio’s new release, Tomb Breaker, how it spawned from a nearly-complete prototype of another game, and how it fits in with his other titles, Circadia and Twirdie. Break into... | Read more »
Flickr Upgrades Its Free Users To 1 Tera...
Flickr Upgrades Its Free Users To 1 Terabyte Of Photo And Video Storage Posted by Andrew Stevens on May 21st, 2013 [ permalink ] | Read more »

Price Scanner via MacPrices.net

iPads with Retina Displays (Apple refurbished) ava...
The Apple Store has Apple Certified Refurbished 4th generation iPads with Retina Displays, Wi-Fi & Cellular, available for $50 off MSRP. Apple’s one-year warranty is included with each iPad, and... Read more
Apple MacBook Orders To Rise 20% Sequentially In 2...
Digitimes’ Aaron Lee and Joseph Tsai say that with Apple ready to release its new MacBook products in the near future, sources from the upstream supply chain have revealed that orders for MacBook... Read more
Trial Production of 5th-Generation iPad To Begin R...
Digitimes’ Max Wang and Adam Hwang report that trial production of Apple’s 5th-generation 9.7-inch iPad will begin soon with volume production to begin in July, and monthly shipments ramping up to 2-... Read more
Dell’s $100 Thumb-Sized Android PC To Ship In July...
9to5google.com says that Dell’s Project Orphelia, a thumb-sized drive that turns any display with an HDMI port into an Android PC, is to start shipping in July at a price of around $100 according to... Read more
MacBook Airs (Apple refurbished) available startin...
 The Apple Store has Apple Certified Refurbished 2012 MacBook AIrs available for up to $240 off MSRP, with models starting at $849. An Apple one-year warranty is included with each model, and... Read more
Updated Mac Pro, iMac, and Mac mini Price Trackers
We’ve updated our Mac Pro Price Tracker, iMac Price Tracker, and Mac mini Price Tracker with the latest information on prices, bundles, and availability from Apple’s Authorized Internet/Catalog... Read more
Updated MacBook Price Trackers
We’ve updated our MacBook Price Trackers with the latest information on prices, bundles, and availability on MacBook Airs, MacBook Pros, and the MacBook Pros with Retina Displays from Apple’s... Read more
15″ 2.3GHz MacBook Pro on sale for $1659 w/free bu...
B&H Photo has the 15″ 2.3GHz MacBook Pro on sale for $1659 including free shipping. Their price is $140 off MSRP. B&H will include free copies of Parallels Desktop, Bento Database, and LoJack... Read more
15-inch Retina MacBook Pros on sale for $200 off M...
 B&H Photo has 15″ Retina MacBook Pros on sale for $200 off MSRP including free shipping. B&H will also include free copies of Parallels Desktop, Bento Database, and LoJack for Laptops... Read more
Apple refurbished iPad minis available starting at...
The Apple Store has a full lineup of Apple Certified Refurbished iPad minis available starting at $299 – up to $40 off new models. Apple’s one-year warranty is included with each mini, and shipping... Read more

Jobs Board

*Apple* At-Home Team Manager - Apple (U...
Changing the world is all in a day's work at Apple . If you love innovation, here's your chance to make a career of it. You'll work hard. But the job comes with more than Read more
Class 1 District *Apple* Technician -...
QUALIFICATIONS: High School diploma Associate Degree in Technology preferred. Apple Certified Support Professional Mac OS X 10.5, 10.6, 10.7, 10.8 Apple Certified Read more
*Apple* Infrastructure Engineer II - Ba...
39964 Apple Infrastructure Engineer II Full Time Regular posted 04/22/2013 San Ramon, CA San Francisco, CA Requirements What sets Bank of the West apart from other banks Read more
*Apple* Retail - Manager - Apple (Unite...
Job SummaryKeeping an Apple Store thriving requires a diverse set of leadership skills, and as a Manager, youre a master of them all. In the stores fast-paced, dynamic Read more
*Apple* At-Home Team Manager - Apple (U...
Changing the world is all in a day's work at Apple . If you love innovation, here's your chance to make a career of it. You'll work hard. But the job comes with more than Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.