TweetFollow Us on Twitter

Text-based File Formats

Volume Number: 19 (2003)
Issue Number: 2
Column Tag: Section 7

Text-based File Formats

CSV, OML, XML, YAML...

by Rich Morin

BSD and OSX inherit a long tradition (stretching back into the earliest days of Unix) of using text files for data storage. Although there are some exceptions, most control, log, and other system data files are written in ASCII. This makes them easier to inspect, post-process, and even edit.

Apple, whose historic bent has been more toward binary file formats (e.g., the resource fork), seems to have adopted this idea wholeheartedly. In fact, they have gone a bit further, adopting XML (rather than line-oriented files) and Unicode (rather than ASCII). As a result, many OSX files are well structured, language-independent, and quite accessible to both humans and programs.

Many vendors (e.g., Microsoft) are also joining the XML caravan. Assuming that they document both the syntax and semantics of their interchange formats, we could see a dramatic change in possibilities for file interchange.

It is not clear, however, that XML is the Right Answer for all problems. Let's look at some of the alternatives, examining their strengths and weaknesses. Don't expect a comprehensive list; there are zillions of data formats in use. Here, in any event, are some that I would recommend.

CSV

Although CSV stands for "comma-separated values", commas are by no means essential to the idea. In fact, another term for this is "flat file format". Basically, the idea is that each line is a record and that some other delimiter (e.g., colons, commas, white space) is used to separate fields. Quotes or other devices are sometimes used to protect instances of the delimiter in the body of a field. Here are some examples:

/etc/crontab
  15 3 * * * root periodic daily
/etc/gettytab
  a|std.110|110-baud:\
        :np:nd#1:cd#1:uc:sp#110:
/var/log/netinfo.log
  Dec 21 17:59:33 cerberus netinfod ...
An_Excel_File.csv
  1,2,3
  "1,2,3","3,4,5"

Some files get a bit complex, adding syntax to support block structure, comments, line breaks, shell commands, etc. A highly-ornamented CSV file can begin to look like a "little language". Here is a fairly complex format, drawn from /var/named/named.local:

$TTL 86400
@ IN SOA localhost. root.localhost. (
    1997022700 ; Serial
    28800      ; Refresh
    14400      ; Retry
    3600000    ; Expire
    86400 )    ; Minimum
  IN NS  localhost.
1 IN PTR localhost.

Many BSD control files, logs, and reports use white space to delineate fields, making the assumption that the included data will not contain spaces. This was never a safe assumption for path names, but the advent of OSX has made the problem all too real.

Try running "ps -axww" and see how man path names (for both commands and arguments) contain spaces. Then, consider how you would code up a way to determine which spaces are field separators and which are not. Not simple...

Of course, ps has other problems. Run "vi 1 '2 3' 4" in one window and "ps" in another. In the ps output, the COMMAND field looks like "vi 1 2 3 4", dooming any effort to parse it into a command path and distinct command-line arguments. Some sort of quoting convention is desperately needed here.

Blanks in path names make it impossible to parse certain log files, have already broken one Apple installer script, and could well foul up many BSD control files. It's probably too late to get Apple to back off from their use of embedded blanks in system path names, so you can expect to see some problems of this nature coming along...

OML

Although CSV is attractively simple, it may be too simple for your needs. For instance, you may need to support optional data, hierarchical structures, etc. On the other hand, you may not be ready for the formality of XML (Extensible Markup Language) and unwilling to design your own markup language (and parser).

OML (Ostensible Markup Language :-) is a powerful, simple, and convenient solution to this dilemma:

# This is a sample of file-system metadata.
<snap>
  <file>ASCII text</>
  <flags>f,avbstclinmed,,</>
  <lstat>303507,33200,1,1000,20,914,8</>
  <md5>2e1240f444fc3f984186fc5a4fd28eb0</>
  <times>1040087752,230,1740993,10890145</>
</snap>

This looks quite a bit like XML, but there are some small peculiarities. That comment, for instance, isn't legal XML. Nor is "</>" a legal termination for a tag. Is that really CSV syntax in the middle of some fields? Finally, where are the header lines?

Though OML is seemingly designed to give hives to XML purists, it is also designed to work smoothly with existing XML tools. A couple of lines of Perl will strip out the comments and fill out the terminations. A quick pass through an XML parser extracts all of the named fields and attributes. Perl's split() operator, if need be, can break up the CSV data.

Since OML is pretty much a "roll your own" kind of thing, there isn't any real documentation. I'd suggest a look at "Doing it Simpler" (Leigh Dodds; www.xml.com) for some ideas, however.

XML

Despite any appearance to the contrary, I am quite a fan of XML. An enormous amount of meticulous and thoughtful effort (and some rather fancy computer science!) is going into creating "industrial strength" data formats and processing tools. In addition, the W3C (World Wide Web Consortium; www.w3.org) is being very careful to make sure that the official standards are open to all players.

For some projects, you really need to bring in power tools. The host of translators, validators, and other tools that XML provides can make otherwise impossible projects feasible, if not necessarily reasonable. Using XML also increases the chance that someone else's program will be able to parse your data. Given all of that, complaining about a bit of formality seems rather petty.

I won't try to cover XML here; there are shelves of books on the subject, with more coming out on a weekly basis. O'Reilly and Addison-Wesley have the broadest coverage; O'Reilly's XML web site (www.xml.com) is a good place to start your journey...

YAML

CSV has low overhead, is simple to read and edit, and handles lists and arrays well. OML and XML are a bit more bulky, but handle optional data and hierarchies smoothly. XML has an impressive suite of documentation, standards activities, and support software. Sometimes, however, you want low overhead, simplicity, and support for arbitrary data structures.

YAML (YAML Ain't Markup Language) fills this niche quite admirably. The syntax is simple and clean. The basic data structures are sequences (i.e., Perl arrays) and mappings (i.e., Perl hashes). YAML handles lists, arrays, and hierarchies easily; with a bit of extra work, it can handle arbitrary Perl data structures (e.g., cyclic graphs).

Here is the previous example, transliterated into YAML:

# This is a sample of file-system metadata.
snap:
  file:  'ASCII text'
  flags: 'f,avbstclinmed,,'
  lstat: '303507,33200,1,1000,20,914,8'
  md5:   '2e1240f444fc3f984186fc5a4fd28eb0'
  times: '1040087752,230,1740993,10890145'

A more idiomatic rendering, however, would look like:

# This is a sample of file-system metadata.
snap:
  file:  ASCII text
  flags: [ f, avbstclinmed, , ]
  lstat: [ 303507, 33200, 1, 1000, 20, 914, 8 ]
  md5:   2e1240f444fc3f984186fc5a4fd28eb0
  times: [ 1040087752, 230, 1740993, 10890145 ]

Aside from the fact that some spaces have been added after commas and the quotes have been eliminated (some turned into brackets), the second version looks very similar to the first. The resulting data structure is quite different, however; the bracketed lists have been turned into YAML sequences. This means that they don't have to be parsed in a follow-on step. Here is some access code, in Perl:

$file = $yaml{snap}{file};
$uid  = $yaml{snap}{lstat}[3];

YAML has several ways to write textual data. Here are some examples:

  - a simple text item
  - "double-quoted text\n "
  - 'single-quoted text'
- >
    This text
    is freeform.
- |
    This text
    isn't.

Although YAML has nowhere near the amount of documentation that XML has, there are some useful resources to recommend. The YAML web site (www.yaml.org) is the logical place to start; be sure to visit the YAML wiki. I'd also recommend a look at "Look Ma, No Tags" (Kendall Clark Grant; www.xml.com) for an informal introduction.


Rich Morin has been using computers since 1970, Unix since 1983, and Mac-based Unix since 1986 (when he helped Apple create A/UX 1.0). When he isn't writing this column, Rich runs Prime Time Freeware (www.ptf.com), a publisher of books and CD-ROMs for the Free and Open Source software community. Feel free to write to Rich at rdm@ptf.com.

 
AAPL
$98.20
Apple Inc.
-0.18
MSFT
$43.46
Microsoft Corpora
-0.43
GOOG
$586.32
Google Inc.
+0.71

MacTech Search:
Community Search:

Software Updates via MacUpdate

Mellel 3.3.6 - Powerful word processor w...
Mellel is the leading word processor for OS X and has been widely considered the industry standard since its inception. Mellel focuses on writers and scholars for technical writing and multilingual... Read more
LibreOffice 4.3.0.4 - Free Open Source o...
LibreOffice is an office suite (word processor, spreadsheet, presentations, drawing tool) compatible with other major office suites. The Document Foundation is coordinating development and... Read more
Freeway Pro 7.0 - Drag-and-drop Web desi...
Freeway Pro lets you build websites with speed and precision... without writing a line of code! With it's user-oriented drag-and-drop interface, Freeway Pro helps you piece together the website of... Read more
Drive Genius 3.2.4 - Powerful system uti...
Drive Genius is an OS X utility designed to provide unsurpassed storage management. Featuring an easy-to-use interface, Drive Genius is packed with powerful tools such as a drive optimizer, a... Read more
Vitamin-R 2.15 - Personal productivity t...
Vitamin-R creates the optimal conditions for your brain to work at its best by structuring your work into short bursts of distraction-free, highly focused activity alternating with opportunities for... Read more
Toast Titanium 12.0 - The ultimate media...
Toast Titanium goes way beyond the very basic burning in the Mac OS and iLife software, and sets the standard for burning CDs, DVDs, and now Blu-ray discs on the Mac. Create superior sounding audio... Read more
OS X Yosemite Wallpaper 1.0 - Desktop im...
OS X Yosemite Wallpaper is the gorgeous new background image for Apple's upcoming OS X 10.10 Yosemite. This wallpaper is available for all screen resolutions with a source file that measures 5,418... Read more
Acorn 4.4 - Bitmap image editor. (Demo)
Acorn is a new image editor built with one goal in mind - simplicity. Fast, easy, and fluid, Acorn provides the options you'll need without any overhead. Acorn feels right, and won't drain your bank... Read more
Bartender 1.2.20 - Organize your menu ba...
Bartender lets you organize your menu bar apps. Features: Lets you tidy your menu bar apps how you want. See your menu bar apps when you want. Hide the apps you need to run, but do not need to... Read more
TotalFinder 1.6.2 - Adds tabs, hotkeys,...
TotalFinder is a universally acclaimed navigational companion for your Mac. Enhance your Mac's Finder with features so smart and convenient, you won't believe you ever lived without them. Tab-based... Read more

Latest Forum Discussions

See All

Become a Guardians of Galactic Peace Wit...
Become a Guardians of Galactic Peace With the New Spacefaring Sim, Kairobotica. Posted by Jessica Fisher on July 30th, 2014 [ permalink ] | Read more »
Soul Guardians: Age of Midgard Review
Soul Guardians: Age of Midgard Review By George Fagundes on July 30th, 2014 Our Rating: :: SO MUCH GRIND IT CRUNCHESUniversal App - Designed for iPhone and iPad Swords and trading cards are fun, right? So is Soul Guardians: Age of... | Read more »
NFL Fantasy Football App Redesigned Ahea...
NFL Fantasy Football App Redesigned Ahead of Upcoming 2014 Season Posted by Ellis Spice on July 30th, 2014 [ permalink ] | Read more »
Matter Review
Matter Review By Jennifer Allen on July 30th, 2014 Our Rating: :: ORIGINAL PHOTO MANIPULATIONUniversal App - Designed for iPhone and iPad Matter lets you add geometric 3d shapes to your images and manipulate things so each image... | Read more »
Note Review
Note Review By Jennifer Allen on July 29th, 2014 Our Rating: :: TOO SIMPLEiPhone App - Designed for the iPhone, compatible with the iPad Note is a note taking app that’s a little too short on features to be worth its asking price... | Read more »
Chainsaw Warrior Goes on Sale & Ther...
Chainsaw Warrior Goes on Sale & There’s a Chance to Win a Copy of the Original Board Game Posted by Jennifer Allen on July 29th, 2014 [ permalink | Read more »
It Came From Canada: Tiny Tower Vegas
If you go to a casino, you might make a lot of money. If you run a casino, you’re guaranteed to make a lot of money. The choice seems pretty obvious. So while waiting for your shady real estate deals to move forward, get prepared with Tiny Tower... | Read more »
Z Hunter Review
Z Hunter Review By Lee Hamlet on July 29th, 2014 Our Rating: :: RIGHT ON TARGETUniversal App - Designed for iPhone and iPad While it might not necessarily break new ground, Z Hunter has enough tricks up its sleeve to ensure that... | Read more »
Huge Update Comes To Duet, Adding 48 New...
Huge Update Comes To Duet, Adding 48 New Stages Posted by Jennifer Allen on July 29th, 2014 [ permalink ] Universal App - Designed for iPhone and iPad | Read more »
Sharknado: The Video Game Available Now....
Sharknado: The Video Game Available Now. Seriously. Posted by Rob Rich on July 29th, 2014 [ permalink ] Universal App - Designed for iPhone and iPad | Read more »

Price Scanner via MacPrices.net

13-inch 2.5GHz MacBook Pro on sale for $999,...
Best Buy has the 13″ 2.5GHz MacBook Pro available for $999.99 on their online store. Choose free shipping or free instant local store pickup (if available). Their price is $100 off MSRP. Price is... Read more
WaterField Unveils 15″ Outback Solo & 13″...
Hard on the heels of Apple’s refreshed MacBook Pro Retina laptops announcement, WaterField Designs has unveiled a 15-inch version of the waxed-canvas and leather Outback Solo and a 13-inch version of... Read more
New Roxio Toast 12 Delivers Digital Media Pow...
Roxio Toast 12 is a hub for sharing digital media to virtually any platform or device. has introduced two new additions to its Roxio Toast product family – Roxio Toast 12 Titanium and Roxio Toast 12... Read more
The lowest prices on leftover Retina MacBook...
Best Buy has dropped prices on leftover 13″ and 15″ Retina MacBook Pros by up to $300 off original MSRP on their online store for a limited time. Choose free local store pickup (if available) or free... Read more
Apple Updates MacBook Pro with Retina Display...
Apple today updated its MacBook Pro with Retina display with faster processors and double the amount of memory in both entry-level configurations. MacBook Pro with Retina display features a Retina... Read more
Up to $250 price drop on leftover 15-inch Mac...
B&H Photo has dropped prices on 2013 15″ Retina MacBook Pros by as much as $250 off original MSRP. Shipping is free, and B&H charges NY sales tax only: - 15″ 2.3GHz Retina MacBook Pro: $2349... Read more
Updated MacBook Pro Price Trackers
We’ve updated our MacBook Pro Price Trackers with the latest information on prices, bundles, and availability on the new 2014 models from Apple’s authorized internet/catalog resellers as well as... Read more
Apple updates MacBook Pros with slightly fast...
Apple updated 13″ and 15″ Retina MacBook Pros today with slightly faster Haswell processors. 13″ models now ship with 8GB of RAM standard, while 15″ MacBook Pros ship with 16GB across the board. Most... Read more
Apple drops price on 13″ 2.5GHz MacBook Pro b...
The Apple Store has dropped their price for the 13″ 2.5GHz MacBook Pro by $100 to $1099 including free shipping. Read more
Apple drops prices on refurbished 2013 MacBoo...
The Apple Store has dropped prices on Apple Certified Refurbished 13″ and 15″ 2013 MacBook Pros, with model now available starting at $929. Apple’s one-year warranty is standard, and shipping is free... Read more

Jobs Board

Sr Software Lead Engineer, *Apple* Online S...
Sr Software Lead Engineer, Apple Online Store Publishing Systems Keywords: Company: Apple Job Code: E3PCAK8MgYYkw Location (City or ZIP): Santa Clara Status: Full Read more
*Apple* Solutions Consultant (ASC) - Apple (...
**Job Summary** The ASC is an Apple employee who serves as an Apple brand ambassador and influencer in a Reseller's store. The ASC's role is to grow Apple Read more
Sr. Product Leader, *Apple* Store Apps - Ap...
**Job Summary** Imagine what you could do here. At Apple , great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring Read more
*Apple* Solutions Consultant (ASC) - Apple (...
**Job Summary** The ASC is an Apple employee who serves as an Apple brand ambassador and influencer in a Reseller's store. The ASC's role is to grow Apple Read more
*Apple* Solutions Consultant (ASC) - Apple (...
**Job Summary** The ASC is an Apple employee who serves as an Apple brand ambassador and influencer in a Reseller's store. The ASC's role is to grow Apple Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.