TweetFollow Us on Twitter

Text-based File Formats

Volume Number: 19 (2003)
Issue Number: 2
Column Tag: Section 7

Text-based File Formats

CSV, OML, XML, YAML...

by Rich Morin

BSD and OSX inherit a long tradition (stretching back into the earliest days of Unix) of using text files for data storage. Although there are some exceptions, most control, log, and other system data files are written in ASCII. This makes them easier to inspect, post-process, and even edit.

Apple, whose historic bent has been more toward binary file formats (e.g., the resource fork), seems to have adopted this idea wholeheartedly. In fact, they have gone a bit further, adopting XML (rather than line-oriented files) and Unicode (rather than ASCII). As a result, many OSX files are well structured, language-independent, and quite accessible to both humans and programs.

Many vendors (e.g., Microsoft) are also joining the XML caravan. Assuming that they document both the syntax and semantics of their interchange formats, we could see a dramatic change in possibilities for file interchange.

It is not clear, however, that XML is the Right Answer for all problems. Let's look at some of the alternatives, examining their strengths and weaknesses. Don't expect a comprehensive list; there are zillions of data formats in use. Here, in any event, are some that I would recommend.

CSV

Although CSV stands for "comma-separated values", commas are by no means essential to the idea. In fact, another term for this is "flat file format". Basically, the idea is that each line is a record and that some other delimiter (e.g., colons, commas, white space) is used to separate fields. Quotes or other devices are sometimes used to protect instances of the delimiter in the body of a field. Here are some examples:

/etc/crontab
  15 3 * * * root periodic daily
/etc/gettytab
  a|std.110|110-baud:\
        :np:nd#1:cd#1:uc:sp#110:
/var/log/netinfo.log
  Dec 21 17:59:33 cerberus netinfod ...
An_Excel_File.csv
  1,2,3
  "1,2,3","3,4,5"

Some files get a bit complex, adding syntax to support block structure, comments, line breaks, shell commands, etc. A highly-ornamented CSV file can begin to look like a "little language". Here is a fairly complex format, drawn from /var/named/named.local:

$TTL 86400
@ IN SOA localhost. root.localhost. (
    1997022700 ; Serial
    28800      ; Refresh
    14400      ; Retry
    3600000    ; Expire
    86400 )    ; Minimum
  IN NS  localhost.
1 IN PTR localhost.

Many BSD control files, logs, and reports use white space to delineate fields, making the assumption that the included data will not contain spaces. This was never a safe assumption for path names, but the advent of OSX has made the problem all too real.

Try running "ps -axww" and see how man path names (for both commands and arguments) contain spaces. Then, consider how you would code up a way to determine which spaces are field separators and which are not. Not simple...

Of course, ps has other problems. Run "vi 1 '2 3' 4" in one window and "ps" in another. In the ps output, the COMMAND field looks like "vi 1 2 3 4", dooming any effort to parse it into a command path and distinct command-line arguments. Some sort of quoting convention is desperately needed here.

Blanks in path names make it impossible to parse certain log files, have already broken one Apple installer script, and could well foul up many BSD control files. It's probably too late to get Apple to back off from their use of embedded blanks in system path names, so you can expect to see some problems of this nature coming along...

OML

Although CSV is attractively simple, it may be too simple for your needs. For instance, you may need to support optional data, hierarchical structures, etc. On the other hand, you may not be ready for the formality of XML (Extensible Markup Language) and unwilling to design your own markup language (and parser).

OML (Ostensible Markup Language :-) is a powerful, simple, and convenient solution to this dilemma:

# This is a sample of file-system metadata.
<snap>
  <file>ASCII text</>
  <flags>f,avbstclinmed,,</>
  <lstat>303507,33200,1,1000,20,914,8</>
  <md5>2e1240f444fc3f984186fc5a4fd28eb0</>
  <times>1040087752,230,1740993,10890145</>
</snap>

This looks quite a bit like XML, but there are some small peculiarities. That comment, for instance, isn't legal XML. Nor is "</>" a legal termination for a tag. Is that really CSV syntax in the middle of some fields? Finally, where are the header lines?

Though OML is seemingly designed to give hives to XML purists, it is also designed to work smoothly with existing XML tools. A couple of lines of Perl will strip out the comments and fill out the terminations. A quick pass through an XML parser extracts all of the named fields and attributes. Perl's split() operator, if need be, can break up the CSV data.

Since OML is pretty much a "roll your own" kind of thing, there isn't any real documentation. I'd suggest a look at "Doing it Simpler" (Leigh Dodds; www.xml.com) for some ideas, however.

XML

Despite any appearance to the contrary, I am quite a fan of XML. An enormous amount of meticulous and thoughtful effort (and some rather fancy computer science!) is going into creating "industrial strength" data formats and processing tools. In addition, the W3C (World Wide Web Consortium; www.w3.org) is being very careful to make sure that the official standards are open to all players.

For some projects, you really need to bring in power tools. The host of translators, validators, and other tools that XML provides can make otherwise impossible projects feasible, if not necessarily reasonable. Using XML also increases the chance that someone else's program will be able to parse your data. Given all of that, complaining about a bit of formality seems rather petty.

I won't try to cover XML here; there are shelves of books on the subject, with more coming out on a weekly basis. O'Reilly and Addison-Wesley have the broadest coverage; O'Reilly's XML web site (www.xml.com) is a good place to start your journey...

YAML

CSV has low overhead, is simple to read and edit, and handles lists and arrays well. OML and XML are a bit more bulky, but handle optional data and hierarchies smoothly. XML has an impressive suite of documentation, standards activities, and support software. Sometimes, however, you want low overhead, simplicity, and support for arbitrary data structures.

YAML (YAML Ain't Markup Language) fills this niche quite admirably. The syntax is simple and clean. The basic data structures are sequences (i.e., Perl arrays) and mappings (i.e., Perl hashes). YAML handles lists, arrays, and hierarchies easily; with a bit of extra work, it can handle arbitrary Perl data structures (e.g., cyclic graphs).

Here is the previous example, transliterated into YAML:

# This is a sample of file-system metadata.
snap:
  file:  'ASCII text'
  flags: 'f,avbstclinmed,,'
  lstat: '303507,33200,1,1000,20,914,8'
  md5:   '2e1240f444fc3f984186fc5a4fd28eb0'
  times: '1040087752,230,1740993,10890145'

A more idiomatic rendering, however, would look like:

# This is a sample of file-system metadata.
snap:
  file:  ASCII text
  flags: [ f, avbstclinmed, , ]
  lstat: [ 303507, 33200, 1, 1000, 20, 914, 8 ]
  md5:   2e1240f444fc3f984186fc5a4fd28eb0
  times: [ 1040087752, 230, 1740993, 10890145 ]

Aside from the fact that some spaces have been added after commas and the quotes have been eliminated (some turned into brackets), the second version looks very similar to the first. The resulting data structure is quite different, however; the bracketed lists have been turned into YAML sequences. This means that they don't have to be parsed in a follow-on step. Here is some access code, in Perl:

$file = $yaml{snap}{file};
$uid  = $yaml{snap}{lstat}[3];

YAML has several ways to write textual data. Here are some examples:

  - a simple text item
  - "double-quoted text\n "
  - 'single-quoted text'
- >
    This text
    is freeform.
- |
    This text
    isn't.

Although YAML has nowhere near the amount of documentation that XML has, there are some useful resources to recommend. The YAML web site (www.yaml.org) is the logical place to start; be sure to visit the YAML wiki. I'd also recommend a look at "Look Ma, No Tags" (Kendall Clark Grant; www.xml.com) for an informal introduction.


Rich Morin has been using computers since 1970, Unix since 1983, and Mac-based Unix since 1986 (when he helped Apple create A/UX 1.0). When he isn't writing this column, Rich runs Prime Time Freeware (www.ptf.com), a publisher of books and CD-ROMs for the Free and Open Source software community. Feel free to write to Rich at rdm@ptf.com.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

jAlbum Pro 12.6.4 - Organize your digita...
jAlbum Pro has all the features you love in jAlbum, but comes with a commercial license. With jAlbum, you can create gorgeous custom photo galleries for the Web without writing a line of code!... Read more
jAlbum 12.6.4 - Create custom photo gall...
With jAlbum, you can create gorgeous custom photo galleries for the Web without writing a line of code! Beginner-friendly, with pro results Simply drag and drop photos into groups, choose a design... Read more
Microsoft Remote Desktop 8.0.16 - Connec...
With Microsoft Remote Desktop, you can connect to a remote PC and your work resources from almost anywhere. Experience the power of Windows with RemoteFX in a Remote Desktop client designed to help... Read more
Spotify 1.0.4.90. - Stream music, create...
Spotify is a streaming music service that gives you on-demand access to millions of songs. Whether you like driving rock, silky R&B, or grandiose classical music, Spotify's massive catalogue puts... Read more
djay Pro 1.1 - Transform your Mac into a...
djay Pro provides a complete toolkit for performing DJs. Its unique modern interface is built around a sophisticated integration with iTunes and Spotify, giving you instant access to millions of... Read more
Vivaldi 1.0.118.19 - Lightweight browser...
Vivaldi browser. In 1994, two programmers started working on a web browser. Our idea was to make a really fast browser, capable of running on limited hardware, keeping in mind that users are... Read more
Stacks 2.6.11 - New way to create pages...
Stacks is a new way to create pages in RapidWeaver. It's a plugin designed to combine drag-and-drop simplicity with the power of fluid layout. Features: Fluid Layout: Stacks lets you build pages... Read more
xScope 4.1.3 - Onscreen graphic measurem...
xScope is powerful set of tools that are ideal for measuring, inspecting, and testing on-screen graphics and layouts. Its tools float above your desktop windows and can be accessed via a toolbar,... Read more
Cyberduck 4.7 - FTP and SFTP browser. (F...
Cyberduck is a robust FTP/FTP-TLS/SFTP browser for the Mac whose lack of visual clutter and cleverly intuitive features make it easy to use. Support for external editors and system technologies such... Read more
Labels & Addresses 1.7 - Powerful la...
Labels & Addresses is a home and office tool for printing all sorts of labels, envelopes, inventory labels, and price tags. Merge-printing capability makes the program a great tool for holiday... Read more

Discover Your Reflexes With Minimalist G...
Discover O, BYOF Studios, may look simple at first with its' color matching premise and swipe controls, but as you speed up the task becomes more daunting. [Read more] | Read more »
Here's Another Roundup of Notable A...
Now that the Apple Watch is publically available (kind of), even more apps and games have been popping up for it. Some of them are updates to existing software, others are brand new. The main thing is that they're all for the Apple Watch, and if you... | Read more »
Use Batting Average and the Apple Watch...
Batting Average, by Pixolini, is designed to help you manage your statistics. Every time you go to bat, you can use your Apple Watch to track  your swings, strikes, and hits. [Read more] | Read more »
Celebrate Studio Pango's 3rd Annive...
It is time to party, Pangoland pals! Studio Pango is celebrating their 3rd birthday and their gift to you is a new update to Pangoland. [Read more] | Read more »
Become the World's Most Important D...
Must Deliver, by cherrypick games, is a top-down endless-runner witha healthy dose of the living dead. [Read more] | Read more »
SoundHound + LiveLyrics is Making its De...
SoundHound Inc. has announced that SoundHound + LiveLyrics, will be one of the first third-party apps to hit the Apple Watch. With  SoundHound you'll be able to tap on your watch and have the app recognize the music you are listening to, then have... | Read more »
Adobe Joins the Apple Watch Lineup With...
A whole tidal wave of apps are headed for the Apple Watch, and Adobe has joined in with 3 new ways to enhance your creativity and collaborate with others. The watch apps pair with iPad/iPhone apps to give you total control over your Adobe projects... | Read more »
Z Steel Soldiers, Sequel to Kavcom'...
Kavcom has released Z Steel Soldiers, which continues the story of the comedic RTS originally created by the Bitmap Brothers. [Read more] | Read more »
Seene Lets You Create 3D Images With You...
Seene, by Obvious Engineering, is a 3D capture app that's meant to allow you to create visually stunning 3D images with a tap of your finger, and then share them as a 3D photo, video or gif. [Read more] | Read more »
Lost Within - Tips, Tricks, and Strategi...
Have you just downloaded Lost Within and are you in need of a guiding hand? While it’s not the toughest of games out there you might still want some helpful tips to get you started. [Read more] | Read more »

Price Scanner via MacPrices.net

Zoho Business Apps for Apple Watch Put Select...
Pleasanton, California based Zoho has launched Zoho Business Apps for Apple Watch, a line of apps that extends Zoho business applications to let users perform select functions from an Apple Watch.... Read more
Universal Stylus Initiative Launched to Creat...
OEMs, stylus and touch controller manufacturers have announced the launch of Universal Stylus Initiative (USI), a new organization formed to develop and promote an industry specification for an... Read more
Amazon Shopping App for Apple Watch
With the new Amazon shopping app for Apple Watch, Amazon customers with one of the wearable devices can simply tap the app on the watch to purchase items in seconds, or save an idea for later. The... Read more
Intel Compute Stick: A New Mini-Computing For...
The Intel Compute Stick, a new pocket-sized computer based on a quad-core Intel Atom processor running Windows 8.1 with Bing, is available now through Intel Authorized Dealers across much of the... Read more
Heal to Launch First One-Touch House Call Doc...
Santa Monica, California based Heal, a pioneer in on-demand personal health care services — will offer the first one-touch, on-demand house call doctor app for the Apple Watch. Heal’s Watch app,... Read more
Mac Notebooks: Avoiding MagSafe Power Adapter...
Apple Support says proper usage, care, and maintenance of Your Mac notebook’s MagSafe power adapter can substantially increase the the adapter’s service life. Of course, MagSafe itself is an Apple... Read more
12″ Retina MacBook In Shootout With Air And P...
BareFeats’ rob-ART morgan has posted another comparison of the 12″ MacBook with other Mac laptops, noting that the general goodness of all Mac laptops can make which one to purchase a tough decision... Read more
FileMaker Go for iPad and iPhone: Over 1.5 Mi...
FileMaker has announced that its FileMaker Go for iPad and iPhone app has surpassed 1.5 million downloads from the iTunes App Store. The milestone confirms the continued popularity of the FileMaker... Read more
Sale! 13-inch 2.7GHz Retina MacBook Pro for $...
 Best Buy has the new 2015 13″ 2.7GHz/128GB Retina MacBook Pro on sale for $1099 – $200 off MSRP. Choose free shipping or free local store pickup (if available). Price for online orders only, in-... Read more
Minimalist MacBook Confirms Death of Steve Jo...
ReadWrite’s Adriana Lee has posted a eulogy for the “Digital Hub” concept Steve Jobs first proposed back in 2001, declaring the new 12-inch MacBook with its single, over-subscribed USB-C port to be... Read more

Jobs Board

*Apple* Client Systems Solution Specialist -...
…drive revenue and profit in assigned sales segment and/or region specific to the Apple brand and product sets. This person will work directly with CDW Account Managers Read more
*Apple* Retail - Multiple Positions (US) - A...
Sales Specialist - Retail Customer Service and Sales Transform Apple Store visitors into loyal Apple customers. When customers enter the store, you're also the Read more
*Apple* Support Technician IV - Jack Henry a...
Job Description Jack Henry & Associates is seeking an Apple Support Technician. This position while acting independently, ensures the proper day-to-day control of Read more
*Apple* Client Systems Solution Specialist -...
…drive revenue and profit in assigned sales segment and/or region specific to the Apple brand and product sets. This person will work directly with CDW Account Managers Read more
*Apple* Software Support - Casper (Can work...
…experience . Full knowledge of Mac OS X and prior . Mac OSX / Server . Apple Remote Desktop . Process Documentation . Ability to prioritize multiple tasks in a fast pace Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.