TweetFollow Us on Twitter

Text-based File Formats

Volume Number: 19 (2003)
Issue Number: 2
Column Tag: Section 7

Text-based File Formats

CSV, OML, XML, YAML...

by Rich Morin

BSD and OSX inherit a long tradition (stretching back into the earliest days of Unix) of using text files for data storage. Although there are some exceptions, most control, log, and other system data files are written in ASCII. This makes them easier to inspect, post-process, and even edit.

Apple, whose historic bent has been more toward binary file formats (e.g., the resource fork), seems to have adopted this idea wholeheartedly. In fact, they have gone a bit further, adopting XML (rather than line-oriented files) and Unicode (rather than ASCII). As a result, many OSX files are well structured, language-independent, and quite accessible to both humans and programs.

Many vendors (e.g., Microsoft) are also joining the XML caravan. Assuming that they document both the syntax and semantics of their interchange formats, we could see a dramatic change in possibilities for file interchange.

It is not clear, however, that XML is the Right Answer for all problems. Let's look at some of the alternatives, examining their strengths and weaknesses. Don't expect a comprehensive list; there are zillions of data formats in use. Here, in any event, are some that I would recommend.

CSV

Although CSV stands for "comma-separated values", commas are by no means essential to the idea. In fact, another term for this is "flat file format". Basically, the idea is that each line is a record and that some other delimiter (e.g., colons, commas, white space) is used to separate fields. Quotes or other devices are sometimes used to protect instances of the delimiter in the body of a field. Here are some examples:

/etc/crontab
  15 3 * * * root periodic daily
/etc/gettytab
  a|std.110|110-baud:\
        :np:nd#1:cd#1:uc:sp#110:
/var/log/netinfo.log
  Dec 21 17:59:33 cerberus netinfod ...
An_Excel_File.csv
  1,2,3
  "1,2,3","3,4,5"

Some files get a bit complex, adding syntax to support block structure, comments, line breaks, shell commands, etc. A highly-ornamented CSV file can begin to look like a "little language". Here is a fairly complex format, drawn from /var/named/named.local:

$TTL 86400
@ IN SOA localhost. root.localhost. (
    1997022700 ; Serial
    28800      ; Refresh
    14400      ; Retry
    3600000    ; Expire
    86400 )    ; Minimum
  IN NS  localhost.
1 IN PTR localhost.

Many BSD control files, logs, and reports use white space to delineate fields, making the assumption that the included data will not contain spaces. This was never a safe assumption for path names, but the advent of OSX has made the problem all too real.

Try running "ps -axww" and see how man path names (for both commands and arguments) contain spaces. Then, consider how you would code up a way to determine which spaces are field separators and which are not. Not simple...

Of course, ps has other problems. Run "vi 1 '2 3' 4" in one window and "ps" in another. In the ps output, the COMMAND field looks like "vi 1 2 3 4", dooming any effort to parse it into a command path and distinct command-line arguments. Some sort of quoting convention is desperately needed here.

Blanks in path names make it impossible to parse certain log files, have already broken one Apple installer script, and could well foul up many BSD control files. It's probably too late to get Apple to back off from their use of embedded blanks in system path names, so you can expect to see some problems of this nature coming along...

OML

Although CSV is attractively simple, it may be too simple for your needs. For instance, you may need to support optional data, hierarchical structures, etc. On the other hand, you may not be ready for the formality of XML (Extensible Markup Language) and unwilling to design your own markup language (and parser).

OML (Ostensible Markup Language :-) is a powerful, simple, and convenient solution to this dilemma:

# This is a sample of file-system metadata.
<snap>
  <file>ASCII text</>
  <flags>f,avbstclinmed,,</>
  <lstat>303507,33200,1,1000,20,914,8</>
  <md5>2e1240f444fc3f984186fc5a4fd28eb0</>
  <times>1040087752,230,1740993,10890145</>
</snap>

This looks quite a bit like XML, but there are some small peculiarities. That comment, for instance, isn't legal XML. Nor is "</>" a legal termination for a tag. Is that really CSV syntax in the middle of some fields? Finally, where are the header lines?

Though OML is seemingly designed to give hives to XML purists, it is also designed to work smoothly with existing XML tools. A couple of lines of Perl will strip out the comments and fill out the terminations. A quick pass through an XML parser extracts all of the named fields and attributes. Perl's split() operator, if need be, can break up the CSV data.

Since OML is pretty much a "roll your own" kind of thing, there isn't any real documentation. I'd suggest a look at "Doing it Simpler" (Leigh Dodds; www.xml.com) for some ideas, however.

XML

Despite any appearance to the contrary, I am quite a fan of XML. An enormous amount of meticulous and thoughtful effort (and some rather fancy computer science!) is going into creating "industrial strength" data formats and processing tools. In addition, the W3C (World Wide Web Consortium; www.w3.org) is being very careful to make sure that the official standards are open to all players.

For some projects, you really need to bring in power tools. The host of translators, validators, and other tools that XML provides can make otherwise impossible projects feasible, if not necessarily reasonable. Using XML also increases the chance that someone else's program will be able to parse your data. Given all of that, complaining about a bit of formality seems rather petty.

I won't try to cover XML here; there are shelves of books on the subject, with more coming out on a weekly basis. O'Reilly and Addison-Wesley have the broadest coverage; O'Reilly's XML web site (www.xml.com) is a good place to start your journey...

YAML

CSV has low overhead, is simple to read and edit, and handles lists and arrays well. OML and XML are a bit more bulky, but handle optional data and hierarchies smoothly. XML has an impressive suite of documentation, standards activities, and support software. Sometimes, however, you want low overhead, simplicity, and support for arbitrary data structures.

YAML (YAML Ain't Markup Language) fills this niche quite admirably. The syntax is simple and clean. The basic data structures are sequences (i.e., Perl arrays) and mappings (i.e., Perl hashes). YAML handles lists, arrays, and hierarchies easily; with a bit of extra work, it can handle arbitrary Perl data structures (e.g., cyclic graphs).

Here is the previous example, transliterated into YAML:

# This is a sample of file-system metadata.
snap:
  file:  'ASCII text'
  flags: 'f,avbstclinmed,,'
  lstat: '303507,33200,1,1000,20,914,8'
  md5:   '2e1240f444fc3f984186fc5a4fd28eb0'
  times: '1040087752,230,1740993,10890145'

A more idiomatic rendering, however, would look like:

# This is a sample of file-system metadata.
snap:
  file:  ASCII text
  flags: [ f, avbstclinmed, , ]
  lstat: [ 303507, 33200, 1, 1000, 20, 914, 8 ]
  md5:   2e1240f444fc3f984186fc5a4fd28eb0
  times: [ 1040087752, 230, 1740993, 10890145 ]

Aside from the fact that some spaces have been added after commas and the quotes have been eliminated (some turned into brackets), the second version looks very similar to the first. The resulting data structure is quite different, however; the bracketed lists have been turned into YAML sequences. This means that they don't have to be parsed in a follow-on step. Here is some access code, in Perl:

$file = $yaml{snap}{file};
$uid  = $yaml{snap}{lstat}[3];

YAML has several ways to write textual data. Here are some examples:

  - a simple text item
  - "double-quoted text\n "
  - 'single-quoted text'
- >
    This text
    is freeform.
- |
    This text
    isn't.

Although YAML has nowhere near the amount of documentation that XML has, there are some useful resources to recommend. The YAML web site (www.yaml.org) is the logical place to start; be sure to visit the YAML wiki. I'd also recommend a look at "Look Ma, No Tags" (Kendall Clark Grant; www.xml.com) for an informal introduction.


Rich Morin has been using computers since 1970, Unix since 1983, and Mac-based Unix since 1986 (when he helped Apple create A/UX 1.0). When he isn't writing this column, Rich runs Prime Time Freeware (www.ptf.com), a publisher of books and CD-ROMs for the Free and Open Source software community. Feel free to write to Rich at rdm@ptf.com.

 
AAPL
$501.11
Apple Inc.
+2.43
MSFT
$34.64
Microsoft Corpora
+0.15
GOOG
$898.03
Google Inc.
+16.02

MacTech Search:
Community Search:

Software Updates via MacUpdate

CrossOver 12.5.1 - Run Windows apps on y...
CrossOver can get your Windows productivity applications and PC games up and running on your Mac quickly and easily. CrossOver runs the Windows software that you need on Mac at home, in the office,... Read more
Paperless 2.3.1 - Digital documents mana...
Paperless is a digital documents manager. Remember when everyone talked about how we would soon be a paperless society? Now it seems like we use paper more than ever. Let's face it - we need and we... Read more
Apple HP Printer Drivers 2.16.1 - For OS...
Apple HP Printer Drivers includes the latest HP printing and scanning software for Mac OS X 10.6, 10.7 and 10.8. For information about supported printer models, see this page.Version 2.16.1: This... Read more
Yep 3.5.1 - Organize and manage all your...
Yep is a document organization and management tool. Like iTunes for music or iPhoto for photos, Yep lets you search and view your documents in a comfortable interface, while offering the ability to... Read more
Apple Canon Laser Printer Drivers 2.11 -...
Apple Canon Laser Printer Drivers is the latest Canon Laser printing and scanning software for Mac OS X 10.6, 10.7 and 10.8. For information about supported printer models, see this page.Version 2.11... Read more
Apple Java for Mac OS X 10.6 Update 17 -...
Apple Java for Mac OS X 10.6 delivers improved security, reliability, and compatibility by updating Java SE 6.Version Update 17: Java for Mac OS X 10.6 Update 17 delivers improved security,... Read more
Arq 3.3 - Online backup (requires Amazon...
Arq is online backup for the Mac using Amazon S3 and Amazon Glacier. It backs-up and faithfully restores all the special metadata of Mac files that other products don't, including resource forks,... Read more
Apple Java 2013-005 - For OS X 10.7 and...
Apple Java for OS X 2013-005 delivers improved security, reliability, and compatibility by updating Java SE 6 to 1.6.0_65. On systems that have not already installed Java for OS X 2012-006, this... Read more
DEVONthink Pro 2.7 - Knowledge base, inf...
Save 10% with our exclusive coupon code: MACUPDATE10 DEVONthink Pro is your essential assistant for today's world, where almost everything is digital. From shopping receipts to important research... Read more
VirtualBox 4.3.0 - x86 virtualization so...
VirtualBox is a family of powerful x86 virtualization products for enterprise as well as home use. Not only is VirtualBox an extremely feature rich, high performance product for enterprise customers... Read more

Briquid Gets Updated with New Undo Butto...
Briquid Gets Updated with New Undo Button, Achievements, and Leaderboards, on Sale for $0.99 Posted by Andrew Stevens on October 16th, 2013 [ | Read more »
Halloween – iLovecraft Brings Frightenin...
Halloween – iLovecraft Brings Frightening Stories From Author H.P. | Read more »
The Blockheads Creator David Frampton Gi...
The Blockheads Creator David Frampton Gives a Postmortem on the Creation Process of the Game Posted by Andrew Stevens on October 16th, 2013 [ permalink ] Hey, a | Read more »
Sorcery! Enhances the Gameplay in Latest...
Sorcery! | Read more »
It Came From Australia: Tiny Death Star
NimbleBit and Disney have teamed up to make Star Wars: Tiny Death Star, a Star Wars take on Tiny Tower. Right now, the game is in testing in Australia (you will never find a more wretched hive of scum and villainy) but we were able to sneak past... | Read more »
FIST OF AWESOME Review
FIST OF AWESOME Review By Rob Rich on October 16th, 2013 Our Rating: :: TALK TO THE FISTUniversal App - Designed for iPhone and iPad A totalitarian society of bears is only the tip of the iceberg in this throwback brawler.   | Read more »
PROVERBidioms Paints English Sayings in...
PROVERBidioms Paints English Sayings in a Picture for Users to Find Posted by Andrew Stevens on October 16th, 2013 [ permalink ] | Read more »
OmniFocus 2 for iPhone Review
OmniFocus 2 for iPhone Review By Carter Dotson on October 16th, 2013 Our Rating: :: OMNIPOTENTiPhone App - Designed for the iPhone, compatible with the iPad OmniFocus 2 for iPhone is a task management app for people who absolutely... | Read more »
Ingress – Google’s Augmented-Reality Gam...
Ingress – Google’s Augmented-Reality Game to Make its Way to iOS Next Year Posted by Andrew Stevens on October 16th, 2013 [ permalink ] | Read more »
CSR Classics is Full of Ridiculously Pre...
CSR Classics is Full of Ridiculously Pretty Classic Automobiles Posted by Rob Rich on October 16th, 2013 [ permalink ] | Read more »

Price Scanner via MacPrices.net

Apple Store Canada offers refurbished 11-inch...
 The Apple Store Canada has Apple Certified Refurbished 2013 11″ MacBook Airs available starting at CDN$ 849. Save up to $180 off the cost of new models. An Apple one-year warranty is included with... Read more
Updated MacBook Price Trackers
We’ve updated our MacBook Price Trackers with the latest information on prices, bundles, and availability on MacBook Airs, MacBook Pros, and the MacBook Pros with Retina Displays from Apple’s... Read more
13-inch Retina MacBook Pros on sale for up to...
B&H Photo has the 13″ 2.5GHz Retina MacBook Pro on sale for $1399 including free shipping. Their price is $100 off MSRP. They have the 13″ 2.6GHz Retina MacBook Pro on sale for $1580 which is $... Read more
AppleCare Protection Plans on sale for up to...
B&H Photo has 3-Year AppleCare Warranties on sale for up to $105 off MSRP including free shipping plus NY sales tax only: - Mac Laptops 15″ and Above: $244 $105 off MSRP - Mac Laptops 13″ and... Read more
Apple’s 64-bit A7 Processor: One Step Closer...
PC Pro’s Darien Graham-Smith reported that Canonical founder and Ubuntu Linux creator Mark Shuttleworth believes Apple intends to follow Ubuntu’s lead and merge its desktop and mobile operating... Read more
MacBook Pro First, Followed By iPad At The En...
French site Info MacG’s Florian Innocente says he has received availability dates and order of arrival for the next MacBook Pro and the iPad from the same contact who had warned hom of the arrival of... Read more
Chart: iPad Value Decline From NextWorth
With every announcement of a new Apple device, serial upgraders begin selling off their previous models – driving down the resale value. So, with the Oct. 22 Apple announcement date approaching,... Read more
SOASTA Survey: What App Do You Check First in...
SOASTA Inc., the leader in cloud and mobile testing announced the results of its recent survey showing which mobile apps are popular with smartphone owners in major American markets. SOASTA’s survey... Read more
Apple, Samsung Reportedly Both Developing 12-...
Digitimes’ Aaron Lee and Joseph Tsai report that Apple and Samsung Electronics are said to both be planning to release 12-inch tablets, and that Apple is currently cooperating with Quanta Computer on... Read more
Apple’s 2011 MacBook Pro Lineup Suffering Fro...
Appleinsider’s Shane Cole says that owners of early-2011 15-inch and 17-inch MacBook Pros are reporting issues with those models’ discrete AMD graphics processors, which in some cases results in the... Read more

Jobs Board

*Apple* Retail - Manager - Apple (United Sta...
Job SummaryKeeping an Apple Store thriving requires a diverse set of leadership skills, and as a Manager, youre a master of them all. In the stores fast-paced, dynamic Read more
*Apple* Support / *Apple* Technician / Mac...
Apple Support / Apple Technician / Mac Support / Mac Set up / Mac TechnicianMac Set up and Apple Support technicianThe person we are looking for will have worked Read more
Senior Mac / *Apple* Systems Engineer - 318...
318 Inc, a top provider of Apple solutions is seeking a new Senior Apple Systems Engineer to be based out of our Santa Monica, California location. We are a Read more
*Apple* Retail - Manager - Apple Inc. (Unite...
Job Summary Keeping an Apple Store thriving requires a diverse set of leadership skills, and as a Manager, you’re a master of them all. In the store’s fast-paced, Read more
*Apple* Solutions Consultant - Apple (United...
**Job Summary** Apple Solutions Consultant (ASC) - Retail Representatives Apple Solutions Consultants are trained by Apple on selling Apple -branded products Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.