TweetFollow Us on Twitter

Text-based File Formats

Volume Number: 19 (2003)
Issue Number: 2
Column Tag: Section 7

Text-based File Formats


by Rich Morin

BSD and OSX inherit a long tradition (stretching back into the earliest days of Unix) of using text files for data storage. Although there are some exceptions, most control, log, and other system data files are written in ASCII. This makes them easier to inspect, post-process, and even edit.

Apple, whose historic bent has been more toward binary file formats (e.g., the resource fork), seems to have adopted this idea wholeheartedly. In fact, they have gone a bit further, adopting XML (rather than line-oriented files) and Unicode (rather than ASCII). As a result, many OSX files are well structured, language-independent, and quite accessible to both humans and programs.

Many vendors (e.g., Microsoft) are also joining the XML caravan. Assuming that they document both the syntax and semantics of their interchange formats, we could see a dramatic change in possibilities for file interchange.

It is not clear, however, that XML is the Right Answer for all problems. Let's look at some of the alternatives, examining their strengths and weaknesses. Don't expect a comprehensive list; there are zillions of data formats in use. Here, in any event, are some that I would recommend.


Although CSV stands for "comma-separated values", commas are by no means essential to the idea. In fact, another term for this is "flat file format". Basically, the idea is that each line is a record and that some other delimiter (e.g., colons, commas, white space) is used to separate fields. Quotes or other devices are sometimes used to protect instances of the delimiter in the body of a field. Here are some examples:

  15 3 * * * root periodic daily
  Dec 21 17:59:33 cerberus netinfod ...

Some files get a bit complex, adding syntax to support block structure, comments, line breaks, shell commands, etc. A highly-ornamented CSV file can begin to look like a "little language". Here is a fairly complex format, drawn from /var/named/named.local:

$TTL 86400
@ IN SOA localhost. root.localhost. (
    1997022700 ; Serial
    28800      ; Refresh
    14400      ; Retry
    3600000    ; Expire
    86400 )    ; Minimum
  IN NS  localhost.
1 IN PTR localhost.

Many BSD control files, logs, and reports use white space to delineate fields, making the assumption that the included data will not contain spaces. This was never a safe assumption for path names, but the advent of OSX has made the problem all too real.

Try running "ps -axww" and see how man path names (for both commands and arguments) contain spaces. Then, consider how you would code up a way to determine which spaces are field separators and which are not. Not simple...

Of course, ps has other problems. Run "vi 1 '2 3' 4" in one window and "ps" in another. In the ps output, the COMMAND field looks like "vi 1 2 3 4", dooming any effort to parse it into a command path and distinct command-line arguments. Some sort of quoting convention is desperately needed here.

Blanks in path names make it impossible to parse certain log files, have already broken one Apple installer script, and could well foul up many BSD control files. It's probably too late to get Apple to back off from their use of embedded blanks in system path names, so you can expect to see some problems of this nature coming along...


Although CSV is attractively simple, it may be too simple for your needs. For instance, you may need to support optional data, hierarchical structures, etc. On the other hand, you may not be ready for the formality of XML (Extensible Markup Language) and unwilling to design your own markup language (and parser).

OML (Ostensible Markup Language :-) is a powerful, simple, and convenient solution to this dilemma:

# This is a sample of file-system metadata.
  <file>ASCII text</>

This looks quite a bit like XML, but there are some small peculiarities. That comment, for instance, isn't legal XML. Nor is "</>" a legal termination for a tag. Is that really CSV syntax in the middle of some fields? Finally, where are the header lines?

Though OML is seemingly designed to give hives to XML purists, it is also designed to work smoothly with existing XML tools. A couple of lines of Perl will strip out the comments and fill out the terminations. A quick pass through an XML parser extracts all of the named fields and attributes. Perl's split() operator, if need be, can break up the CSV data.

Since OML is pretty much a "roll your own" kind of thing, there isn't any real documentation. I'd suggest a look at "Doing it Simpler" (Leigh Dodds; for some ideas, however.


Despite any appearance to the contrary, I am quite a fan of XML. An enormous amount of meticulous and thoughtful effort (and some rather fancy computer science!) is going into creating "industrial strength" data formats and processing tools. In addition, the W3C (World Wide Web Consortium; is being very careful to make sure that the official standards are open to all players.

For some projects, you really need to bring in power tools. The host of translators, validators, and other tools that XML provides can make otherwise impossible projects feasible, if not necessarily reasonable. Using XML also increases the chance that someone else's program will be able to parse your data. Given all of that, complaining about a bit of formality seems rather petty.

I won't try to cover XML here; there are shelves of books on the subject, with more coming out on a weekly basis. O'Reilly and Addison-Wesley have the broadest coverage; O'Reilly's XML web site ( is a good place to start your journey...


CSV has low overhead, is simple to read and edit, and handles lists and arrays well. OML and XML are a bit more bulky, but handle optional data and hierarchies smoothly. XML has an impressive suite of documentation, standards activities, and support software. Sometimes, however, you want low overhead, simplicity, and support for arbitrary data structures.

YAML (YAML Ain't Markup Language) fills this niche quite admirably. The syntax is simple and clean. The basic data structures are sequences (i.e., Perl arrays) and mappings (i.e., Perl hashes). YAML handles lists, arrays, and hierarchies easily; with a bit of extra work, it can handle arbitrary Perl data structures (e.g., cyclic graphs).

Here is the previous example, transliterated into YAML:

# This is a sample of file-system metadata.
  file:  'ASCII text'
  flags: 'f,avbstclinmed,,'
  lstat: '303507,33200,1,1000,20,914,8'
  md5:   '2e1240f444fc3f984186fc5a4fd28eb0'
  times: '1040087752,230,1740993,10890145'

A more idiomatic rendering, however, would look like:

# This is a sample of file-system metadata.
  file:  ASCII text
  flags: [ f, avbstclinmed, , ]
  lstat: [ 303507, 33200, 1, 1000, 20, 914, 8 ]
  md5:   2e1240f444fc3f984186fc5a4fd28eb0
  times: [ 1040087752, 230, 1740993, 10890145 ]

Aside from the fact that some spaces have been added after commas and the quotes have been eliminated (some turned into brackets), the second version looks very similar to the first. The resulting data structure is quite different, however; the bracketed lists have been turned into YAML sequences. This means that they don't have to be parsed in a follow-on step. Here is some access code, in Perl:

$file = $yaml{snap}{file};
$uid  = $yaml{snap}{lstat}[3];

YAML has several ways to write textual data. Here are some examples:

  - a simple text item
  - "double-quoted text\n "
  - 'single-quoted text'
- >
    This text
    is freeform.
- |
    This text

Although YAML has nowhere near the amount of documentation that XML has, there are some useful resources to recommend. The YAML web site ( is the logical place to start; be sure to visit the YAML wiki. I'd also recommend a look at "Look Ma, No Tags" (Kendall Clark Grant; for an informal introduction.

Rich Morin has been using computers since 1970, Unix since 1983, and Mac-based Unix since 1986 (when he helped Apple create A/UX 1.0). When he isn't writing this column, Rich runs Prime Time Freeware (, a publisher of books and CD-ROMs for the Free and Open Source software community. Feel free to write to Rich at


Community Search:
MacTech Search:

Software Updates via MacUpdate

Civilization VI 1.1.0 - Next iteration o...
Sid Meier’s Civilization VI is the next entry in the popular Civilization franchise. Originally created by legendary game designer Sid Meier, Civilization is a strategy game in which you attempt to... Read more
Network Radar 2.3.3 - $17.99
Network Radar is an advanced network scanning and managing tool. Featuring an easy-to-use and streamlined design, the all-new Network Radar 2 has been engineered from the ground up as a modern Mac... Read more
Quicken 5.5.6 - Complete personal financ...
Quicken makes managing your money easier than ever. Whether paying bills, upgrading from Windows, enjoying more reliable downloads, or getting expert product help, Quicken's new and improved features... Read more
Civilization VI 1.1.0 - Next iteration o...
Sid Meier’s Civilization VI is the next entry in the popular Civilization franchise. Originally created by legendary game designer Sid Meier, Civilization is a strategy game in which you attempt to... Read more
Network Radar 2.3.3 - $17.99
Network Radar is an advanced network scanning and managing tool. Featuring an easy-to-use and streamlined design, the all-new Network Radar 2 has been engineered from the ground up as a modern Mac... Read more
Printopia 3.0.8 - Share Mac printers wit...
Run Printopia on your Mac to share its printers to any capable iPhone, iPad, or iPod Touch. Printopia will also add virtual printers, allowing you to save print-outs to your Mac and send to apps.... Read more
ForkLift 3.2.1 - Powerful file manager:...
ForkLift is a powerful file manager and ferociously fast FTP client clothed in a clean and versatile UI that offers the combination of absolute simplicity and raw power expected from a well-executed... Read more
BetterTouchTool 2.417 - Customize multi-...
BetterTouchTool adds many new, fully customizable gestures to the Magic Mouse, Multi-Touch MacBook trackpad, and Magic Trackpad. These gestures are customizable: Magic Mouse: Pinch in / out (zoom... Read more
Little Snitch 4.0.6 - Alerts you about o...
Little Snitch gives you control over your private outgoing data. Track background activity As soon as your computer connects to the Internet, applications often have permission to send any... Read more
Google Chrome 65.0.3325.181 - Modern and...
Google Chrome is a Web browser by Google, created to be a modern platform for Web pages and applications. It utilizes very fast loading of Web pages and has a V8 engine, which is a custom built... Read more

Latest Forum Discussions

See All

The best games that came out for iPhone...
It's not a huge surprise that there's not a massive influx of new, must-buy games on the App Store this week. After all, GDC is happening, so everyone's busy at parties and networking and dying from a sinister form of jetlag. That said, there are... | Read more »
Destiny meets its mobile match - Everyth...
Shadowgun Legends is the latest game in the Shadowgun series, and it's taking the franchise in some interesting new directions. Which is good news. The even better news is that it's coming out tomorrow, so if you didn't make it into the beta you... | Read more »
How PUBG, Fortnite, and the battle royal...
The history of the battle royale genre isn't a long one. While the nascent parts of the experience have existed ever since players first started killing one another online, it's really only in the past six years that the genre has coalesced into... | Read more »
Around the Empire: What have you missed...
Oh hi nice reader, and thanks for popping in to check out our weekly round-up of all the stuff that you might have missed across the Steel Media network. Yeah, that's right, it's a big ol' network. Obviously 148Apps is the best, but there are some... | Read more »
All the best games on sale for iPhone an...
It might not have been the greatest week for new releases on the App Store, but don't let that get you down, because there are some truly incredible games on sale for iPhone and iPad right now. Seriously, you could buy anything on this list and I... | Read more »
Everything You Need to Know About The Fo...
In just over a week, Epic Games has made a flurry of announcements. First, they revealed that Fortnite—their ultra-popular PUBG competitor—is coming to mobile. This was followed by brief sign-up period for interested beta testers before sending out... | Read more »
The best games that came out for iPhone...
It's not been the best week for games on the App Store. There are a few decent ones here and there, but nothing that's really going to make you throw down what you're doing and run to the nearest WiFi hotspot in order to download it. That's not to... | Read more »
Death Coming (Games)
Death Coming Device: iOS Universal Category: Games Price: $1.99, Version: (iTunes) Description: --- Background Story ---You Died. Pure and simple, but death was not the end. You have become an agent of Death: a... | Read more »
Hints, tips, and tricks for Empires and...
Empires and Puzzles is a slick match-stuff RPG that mixes in a bunch of city-building aspects to keep things fresh. And it's currently the Game of the Day over on the App Store. So, if you're picking it up for the first time today, we thought it'd... | Read more »
What You Need to Know About Sam Barlow’s...
Sam Barlow’s follow up to Her Story is #WarGames, an interactive video series that reimagines the 1983 film WarGames in a more present day context. It’s not exactly a game, but it’s definitely still interesting. Here are the top things you should... | Read more »

Price Scanner via

Thursday roundup of the best 13″ MacBook Pro...
B&H Photo has new 2017 13″ MacBook Pros on sale for up to $200 off MSRP. Shipping is free, and B&H charges sales tax for NY & NJ residents only. Their prices are the lowest available for... Read more
Sale: 9.7-inch 2017 WiFi iPads starting at $2...
B&H Photo has 9.7″ 2017 WiFi Apple iPads on sale for $40 off MSRP for a limited time. Shipping is free, and pay sales tax in NY & NJ only: – 32GB iPad WiFi: $289, $40 off – 128GB iPad WiFi: $... Read more
Roundup of Certified Refurbished iPads, iPad...
Apple has Certified Refurbished 9.7″ WiFi iPads available for $50-$80 off the cost of new models. An Apple one-year warranty is included with each iPad, and shipping is free: – 9″ 32GB WiFi iPad: $... Read more
Back in stock! Apple’s full line of Certified...
Save $300-$300 on the purchase of a 2017 13″ MacBook Pro today with Certified Refurbished models at Apple. Apple’s refurbished prices are the lowest available for each model from any reseller. A... Read more
Wednesday deals: Huge sale on Apple 15″ MacBo...
Adorama has new 2017 15″ MacBook Pros on sale for $250-$300 off MSRP. Shipping is free, and Adorama charges sales tax in NJ and NY only: – 15″ 2.8GHz Touch Bar MacBook Pro Space Gray (MPTR2LL/A): $... Read more
Apple offers Certified Refurbished Series 3 A...
Apple has Certified Refurbished Series 3 Apple Watch GPS models available for $50, or 13%, off the cost of new models. Apple’s standard 1-year warranty is included, and shipping is free. Numerous... Read more
12″ 1.2GHz Space Gray MacBook on sale for $11...
B&H Photo has the Space Gray 12″ 1.2GHz MacBook on sale for $100 off MSRP. Shipping is free, and B&H charges sales tax for NY & NJ residents only: – 12″ 1.2GHz Space Gray MacBook: $1199 $... Read more
Mac minis available for up to $150 off MSRP w...
Apple has restocked Certified Refurbished Mac minis starting at $419. Apple’s one-year warranty is included with each mini, and shipping is free: – 1.4GHz Mac mini: $419 $80 off MSRP – 2.6GHz Mac... Read more
Back in stock: 13-inch 2.5GHz MacBook Pro (Ce...
Apple has Certified Refurbished 13″ 2.5GHz MacBook Pros (MD101LL/A) available for $829, or $270 off original MSRP. Apple’s one-year warranty is standard, and shipping is free: – 13″ 2.5GHz MacBook... Read more
Apple restocks Certified Refurbished 2017 13″...
Apple has Certified Refurbished 2017 13″ MacBook Airs available starting at $849. An Apple one-year warranty is included with each MacBook, and shipping is free: – 13″ 1.8GHz/8GB/128GB MacBook Air (... Read more

Jobs Board

Payments Counsel - *Apple* Pay (payments, c...
# Payments Counsel - Apple Pay (payments, credit/debit) Job Number: 112941729 Santa Clara Valley, California, United States Posted: 26-Feb-2018 Weekly Hours: 40.00 Read more
Firmware Engineer - *Apple* Accessories - A...
# Firmware Engineer - Apple Accessories Job Number: 113452350 Santa Clara Valley, California, United States Posted: 28-Feb-2018 Weekly Hours: 40.00 **Job Summary** Read more
*Apple* Solutions Consultant - Apple (United...
# Apple Solutions Consultant Job Number: 113501424 Norman, Oklahoma, United States Posted: 15-Feb-2018 Weekly Hours: 40.00 **Job Summary** Are you passionate about Read more
*Apple* Inc. Is Look For *Apple* Genius Te...
Apple Inc. Is Look For Apple Genius Technical Customer Service Minneapolis Mn In Minneapolis - Apple , Inc. Apple Genius Technical Customer Service Read more
*Apple* Genius Technical Customer Service Co...
Apple Genius Technical Customer Service Columbus Oh Apple Inc. - Apple , Inc. Apple Genius Technical Customer Service Columbus Oh - Apple , Inc. Job Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.