Text-based File Formats

Volume Number: 19 (2003)
Issue Number: 2
Column Tag: Section 7

CSV, OML, XML, YAML...

by Rich Morin

BSD and OSX inherit a long tradition (stretching back into the earliest days of Unix) of using text files for data storage. Although there are some exceptions, most control, log, and other system data files are written in ASCII. This makes them easier to inspect, post-process, and even edit.

Apple, whose historic bent has been more toward binary file formats (e.g., the resource fork), seems to have adopted this idea wholeheartedly. In fact, they have gone a bit further, adopting XML (rather than line-oriented files) and Unicode (rather than ASCII). As a result, many OSX files are well structured, language-independent, and quite accessible to both humans and programs.

Many vendors (e.g., Microsoft) are also joining the XML caravan. Assuming that they document both the syntax and semantics of their interchange formats, we could see a dramatic change in possibilities for file interchange.

It is not clear, however, that XML is the Right Answer for all problems. Let's look at some of the alternatives, examining their strengths and weaknesses. Don't expect a comprehensive list; there are zillions of data formats in use. Here, in any event, are some that I would recommend.

CSV

Although CSV stands for "comma-separated values", commas are by no means essential to the idea. In fact, another term for this is "flat file format". Basically, the idea is that each line is a record and that some other delimiter (e.g., colons, commas, white space) is used to separate fields. Quotes or other devices are sometimes used to protect instances of the delimiter in the body of a field. Here are some examples:

/etc/crontab
  15 3 * * * root periodic daily
/etc/gettytab
  a|std.110|110-baud:\
        :np:nd#1:cd#1:uc:sp#110:
/var/log/netinfo.log
  Dec 21 17:59:33 cerberus netinfod ...
An_Excel_File.csv
  1,2,3
  "1,2,3","3,4,5"

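As a minimal sketch of how a quoting convention protects embedded delimiters, the following Python fragment feeds the two spreadsheet-style lines above through the standard csv module. Python is used here purely for illustration, and the helper name parse_csv_lines is hypothetical.

```python
import csv
import io

def parse_csv_lines(text):
    """Return a list of records, each a list of field strings."""
    # csv.reader honors double quotes, so a quoted comma stays inside its field.
    return list(csv.reader(io.StringIO(text)))

sample = '1,2,3\n"1,2,3","3,4,5"\n'
records = parse_csv_lines(sample)
for record in records:
    print(len(record), record)
```

The first line yields three fields and the second yields two, because the quoted commas are treated as data rather than as separators. A naive split on commas would report six fields for the second line.
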
Some files get a bit complex, adding syntax to support block structure, comments, line breaks, shell commands, etc. A highly-ornamented CSV file can begin to look like a "little language". Here is a fairly complex format, drawn from /var/named/named.local:

$TTL 86400
@ IN SOA localhost. root.localhost. (
    1997022700 ; Serial
    28800      ; Refresh
    14400      ; Retry
    3600000    ; Expire
    86400 )    ; Minimum
  IN NS  localhost.
1 IN PTR localhost.

Many BSD control files, logs, and reports use white space to delineate fields, making the assumption that the included data will not contain spaces. This was never a safe assumption for path names, but the advent of OSX has made the problem all too real.

Try running "ps -axww" and see how many path names (for both commands and arguments) contain spaces. Then, consider how you would code up a way to determine which spaces are field separators and which are not. Not simple...

Of course, ps has other problems. Run "vi 1 '2 3' 4" in one window and "ps" in another. In the ps output, the COMMAND field looks like "vi 1 2 3 4", dooming any effort to parse it into a command path and distinct command-line arguments. Some sort of quoting convention is desperately needed here.
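
To illustrate what such a quoting convention might buy, here is a small Python sketch. It assumes shell-style quoting as the convention (one possibility among many, not something ps offers) and uses the standard shlex module; the variable names are hypothetical.

```python
import shlex

# The argument vector from the vi example: four items, one containing a space.
argv = ["vi", "1", "2 3", "4"]

# Joining with bare spaces, as the COMMAND field appears to do, loses the
# boundary between arguments.
flat = " ".join(argv)
print(flat)
print(len(flat.split()))   # five pieces, not the original four

# Quoting each argument before joining keeps the boundaries recoverable.
quoted = " ".join(shlex.quote(arg) for arg in argv)
print(quoted)
print(shlex.split(quoted) == argv)
```

With quoting applied on output, a reader of the line can split it back into exactly the arguments that were passed. Without it, the original boundaries cannot be reconstructed from the text alone.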

Blanks in path names make it impossible to parse certain log files, have already broken one Apple installer script, and could well foul up many BSD control files. It's probably too late to get Apple to back off from their use of embedded blanks in system path names, so you can expect to see some problems of this nature coming along...

OML

Although CSV is attractively simple, it may be too simple for your needs. For instance, you may need to support optional data, hierarchical structures, etc. On the other hand, you may not be ready for the formality of XML (Extensible Markup Language) and unwilling to design your own markup language (and parser).

OML (Ostensible Markup Language :-) is a powerful, simple, and convenient solution to this dilemma:

# This is a sample of file-system metadata.
<snap>
  <file>ASCII text</>
  <flags>f,avbstclinmed,,</>
  <lstat>303507,33200,1,1000,20,914,8</>
  <md5>2e1240f444fc3f984186fc5a4fd28eb0</>
  <times>1040087752,230,1740993,10890145</>
</snap>

This looks quite a bit like XML, but there are some small peculiarities. That comment, for instance, isn't legal XML. Nor is "</>" a legal termination for a tag. Is that really CSV syntax in the middle of some fields? Finally, where are the header lines?

Though OML is seemingly designed to give hives to XML purists, it is also designed to work smoothly with existing XML tools. A couple of lines of Perl will strip out the comments and fill out the terminations. A quick pass through an XML parser extracts all of the named fields and attributes. Perl's split() operator, if need be, can break up the CSV data.
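
The conversion described above can be sketched roughly as follows. This version is in Python rather than Perl, and it assumes that comments occupy whole lines starting with "#" and that each "</>" closes a tag opened on the same line with no nested tags inside. The function name oml_to_xml is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

OML = """\
# This is a sample of file-system metadata.
<snap>
  <file>ASCII text</>
  <flags>f,avbstclinmed,,</>
  <lstat>303507,33200,1,1000,20,914,8</>
  <md5>2e1240f444fc3f984186fc5a4fd28eb0</>
  <times>1040087752,230,1740993,10890145</>
</snap>
"""

def oml_to_xml(text):
    """Drop '#' comment lines and expand each '</>' into a full end tag."""
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("#")]
    body = "\n".join(lines)
    # Assumes '<name>...</>' sits on one line with no nested tags inside.
    return re.sub(r"<(\w+)>([^<]*)</>", r"<\1>\2</\1>", body)

# A stock XML parser can now extract the named fields.
root = ET.fromstring(oml_to_xml(OML))
fields = {child.tag: child.text for child in root}

# The embedded CSV is broken up in a separate, second step.
lstat = fields["lstat"].split(",")
print(fields["file"])
print(lstat[3])
```

A more permissive OML dialect (nested short terminations, trailing comments, attributes) would need a small stack-based pass rather than a single regular expression.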

Since OML is pretty much a "roll your own" kind of thing, there isn't any real documentation. I'd suggest a look at "Doing it Simpler" (Leigh Dodds; www.xml.com) for some ideas, however.

XML

Despite any appearance to the contrary, I am quite a fan of XML. An enormous amount of meticulous and thoughtful effort (and some rather fancy computer science!) is going into creating "industrial strength" data formats and processing tools. In addition, the W3C (World Wide Web Consortium; www.w3.org) is being very careful to make sure that the official standards are open to all players.

For some projects, you really need to bring in power tools. The host of translators, validators, and other tools that XML provides can make otherwise impossible projects feasible, if not necessarily reasonable. Using XML also increases the chance that someone else's program will be able to parse your data. Given all of that, complaining about a bit of formality seems rather petty.

I won't try to cover XML here; there are shelves of books on the subject, with more coming out on a weekly basis. O'Reilly and Addison-Wesley have the broadest coverage; O'Reilly's XML web site (www.xml.com) is a good place to start your journey...

YAML

CSV has low overhead, is simple to read and edit, and handles lists and arrays well. OML and XML are a bit more bulky, but handle optional data and hierarchies smoothly. XML has an impressive suite of documentation, standards activities, and support software. Sometimes, however, you want low overhead, simplicity, and support for arbitrary data structures.

YAML (YAML Ain't Markup Language) fills this niche quite admirably. The syntax is simple and clean. The basic data structures are sequences (i.e., Perl arrays) and mappings (i.e., Perl hashes). YAML handles lists, arrays, and hierarchies easily; with a bit of extra work, it can handle arbitrary Perl data structures (e.g., cyclic graphs).

Here is the previous example, transliterated into YAML:

# This is a sample of file-system metadata.
snap:
  file:  'ASCII text'
  flags: 'f,avbstclinmed,,'
  lstat: '303507,33200,1,1000,20,914,8'
  md5:   '2e1240f444fc3f984186fc5a4fd28eb0'
  times: '1040087752,230,1740993,10890145'

A more idiomatic rendering, however, would look like:

# This is a sample of file-system metadata.
snap:
  file:  ASCII text
  flags: [ f, avbstclinmed, , ]
  lstat: [ 303507, 33200, 1, 1000, 20, 914, 8 ]
  md5:   2e1240f444fc3f984186fc5a4fd28eb0
  times: [ 1040087752, 230, 1740993, 10890145 ]

Aside from the fact that some spaces have been added after commas and the quotes have been eliminated (some turned into brackets), the second version looks very similar to the first. The resulting data structure is quite different, however; the bracketed lists have been turned into YAML sequences. This means that they don't have to be parsed in a follow-on step. Here is some access code, in Perl:

$file = $yaml{snap}{file};
$uid  = $yaml{snap}{lstat}[3];
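
To make the difference between the two renderings concrete, here is a rough Python sketch. The two dictionaries are written by hand to stand in for what a YAML loader would typically be expected to return for each rendering; no YAML library is actually invoked, and all names are hypothetical.

```python
# Quoted rendering: lstat arrives as one string and needs a follow-on split.
quoted = {"snap": {"file": "ASCII text",
                   "lstat": "303507,33200,1,1000,20,914,8"}}
uid_from_string = int(quoted["snap"]["lstat"].split(",")[3])

# Idiomatic rendering: lstat arrives as a sequence, ready for indexing.
idiomatic = {"snap": {"file": "ASCII text",
                      "lstat": [303507, 33200, 1, 1000, 20, 914, 8]}}
uid_from_sequence = idiomatic["snap"]["lstat"][3]

print(uid_from_string, uid_from_sequence)
```

Both paths arrive at the same value, but only the first needs code that knows the field is comma-separated.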

YAML has several ways to write textual data. Here are some examples:

- a simple text item
- "double-quoted text\n "
- 'single-quoted text'
- >
    This text
    is freeform.
- |
    This text
    isn't.

Although YAML has nowhere near the amount of documentation that XML has, there are some useful resources to recommend. The YAML web site (www.yaml.org) is the logical place to start; be sure to visit the YAML wiki. I'd also recommend a look at "Look Ma, No Tags" (Kendall Grant Clark; www.xml.com) for an informal introduction.


Rich Morin has been using computers since 1970, Unix since 1983, and Mac-based Unix since 1986 (when he helped Apple create A/UX 1.0). When he isn't writing this column, Rich runs Prime Time Freeware (www.ptf.com), a publisher of books and CD-ROMs for the Free and Open Source software community. Feel free to write to Rich at rdm@ptf.com.

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.