Adding Regular Expressions To Your Cocoa Application.

Volume Number: 19 (2003)
Issue Number: 4
Column Tag: Cocoa Development

Using MOKit to add the ability to match regular expressions in Cocoa.

by Ron Davis

Does your application need to parse data out of a bunch of text, or match strings that vary somewhat but follow a regular syntax? Do you have a Find command in your text editor? If so, you need to add regular expression matching to your app. Regular expressions are textual representations of string-matching patterns. They go beyond finding a literal string and let you do things like find a string that begins and ends with certain characters but can have anything in the middle, or a string that contains four numbers followed by a letter.

I've been around the Mac a long time and never really thought about grep or regex or other commands that use regular expressions. But OS X changes that. Every UNIX geek out there knows about grep and its various offspring. Scripting languages like Perl use regular expressions as well, so I thought I needed to learn about them. Once I did I was hooked, and wanted to use them in my own applications. That led me to Mike Ferris' MOKit, a Cocoa framework that lets you easily deal with regular expressions in your application.

Introduction to Regular Expressions

We'll start with a quick look at regular expression syntax for those of you who have no idea what I'm talking about. The introduction will be fast and shallow. If you need more information check out the URL in the Bibliography at the end of the article.

  
Symbol         Meaning                       Example
character      The character typed,          a matches a, b matches b, etc.
               with the exception of
               special characters.

[character-    Any one of a range of         [a-d] matches a, b, c, or d.
 character]    characters.

.              Any one character,
               except line breaks.

#              Any digit.                    0, 1, 2, 3, 4, 5, 6, 7, 8, 9

\r             Return.

\t             Tab.

\              The escape character, as      \. matches a period.
               in printf. Putting a          \\ matches a backslash.
               backslash in front of a
               special character allows
               that character to be
               matched.

?              0 or 1 of the previous        ca?t matches cat or ct,
               character.                    but not caat.

*              0 or more of the              ca*t matches ct, cat,
               previous character.           caat, caaat.

+              1 or more of the              ca+t matches cat, caat,
               previous character.           caaat, but not ct.

[^characters]  Any character but the         [^r23] matches any character
               ones after the caret.         but r, 2, or 3.

pattern |      Either pattern.               ca|t matches ca or t,
pattern                                      but not cat.

(pattern)      Matching: treats what is      (ca)*t matches cat or
               in the parentheses as a       cacat, but not ct.
               single unit.

               Searching: delineates the     c(.*?)t on the string coat
               information to be             returns "oa".
               remembered in a find.

The last pattern there gives you a hint that regular expressions can be used in two different ways. One way is matching, where you have a string and you want to know if it conforms to a regular expression. This returns a Boolean value: either the string matches or it doesn't. The other way to use regular expressions is to find a substring or substrings in a longer string. When you do this you give an expression and you specify what part of the matched string you want back by placing that part in parentheses.
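The two modes can be tried outside Cocoa. The sketch below uses Python's standard re module as a stand-in for MOKit. That substitution is an assumption made purely for illustration: the engines share the basic operators in the table, but they are not guaranteed to agree on every detail. The helper names matches and find are hypothetical; fullmatch plays the role of matching and search with a parenthesized group plays the role of finding.

```python
import re

def matches(pattern, text):
    """Matching: True only if the whole of text conforms to pattern."""
    return re.fullmatch(pattern, text) is not None

def find(pattern, text):
    """Searching: return the first parenthesized group, or None if nothing is found."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

# Quantifier rows from the table above.
print(matches(r"ca?t", "ct"), matches(r"ca?t", "caat"))       # True False
print(matches(r"ca*t", "caaat"), matches(r"ca+t", "ct"))      # True False
print(matches(r"(ca)*t", "cacat"), matches(r"(ca)*t", "ct"))  # True False

# Searching returns only the part inside the parentheses.
print(find(r"c(.*?)t", "coat"))                               # oa
```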

Let's look at an example or two. Say you let the user input a five-digit ZIP code and you want to make sure they didn't put any letters in there. You could get their input string and compare it against the regular expression "#+", which matches 1 or more digits, but wouldn't match an empty string, nor one with letters in it. Note that "#+" only checks that every character is a digit; it does not check how many digits there are.
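As a rough check of that digit test, here is a minimal sketch in Python's re module rather than MOKit. Python has no # shorthand, so [0-9] stands in for it, and the counted form {5} is an addition that the table above does not cover. The function names are hypothetical.

```python
import re

def is_all_digits(text):
    """True if text is one or more ASCII digits and nothing else."""
    return re.fullmatch(r"[0-9]+", text) is not None

def is_five_digits(text):
    """Stricter variant: exactly five ASCII digits."""
    return re.fullmatch(r"[0-9]{5}", text) is not None

for candidate in ["72201", "7220a", "", "722"]:
    print(repr(candidate), is_all_digits(candidate), is_five_digits(candidate))
```

The looser pattern accepts "722", which is why a length check (or the counted form) is still needed when the field must hold a complete code.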

Now say you have an HTML tag for a link like <A HREF=http://www.radproductions.net/>RAD productions</A> and you want to pull out the URL. You could search with the regular expression "=(.*?)>" and you would get back http://www.radproductions.net/. You may wonder why the ? is there. If you just put ".*", which means match 0 or more of any character, the match runs on to the last > in the string, because brackets are characters too. This is called a greedy search. Putting the ? after the * tells it to stop as soon as it finds the next part of the expression.
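The difference between the greedy and non-greedy forms is easy to see side by side. This is a small sketch using Python's re module as a stand-in for MOKit (an assumption for illustration only), run on the link from the paragraph above.

```python
import re

link = "<A HREF=http://www.radproductions.net/>RAD productions</A>"

# Non-greedy: stop at the first ">" after the "=".
lazy = re.search(r"=(.*?)>", link).group(1)

# Greedy: run to the last ">" in the string.
greedy = re.search(r"=(.*)>", link).group(1)

print(lazy)    # http://www.radproductions.net/
print(greedy)  # http://www.radproductions.net/>RAD productions</A
```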

MOKit

MOKit is a Cocoa framework written by Mike Ferris. It contains some text manipulation classes, one of which handles regular expressions. The underlying regular expression engine is actually a standard package written by Henry Spencer and used in one form or another by a lot of interesting things such as Tcl and Perl. MOKit classes are "not public domain, but they are free" according to the web page. The code can be downloaded at http://www.lorax.com/FreeStuff/MOKit.html. You can get both compiled frameworks and the source to MOKit. Version 2.6 was used for this article.

MOKit has two main parts, classes for text completion and classes for regular expressions. We'll only be talking about the regular expression classes here. These classes are MORegularExpression and MORegexFormatter. MORegularExpression is the main class for handling the evaluation of regular expressions. It is the one we'll use in our sample code. Here's its declaration.

Listing 1: MORegularExpression interface.

@interface MORegularExpression : NSObject <NSCopying, NSCoding> {
  @private
    NSString *_expressionString;
    NSString *_lastMatch;
    NSRange _lastSubexpressionRanges[MO_REGEXP_MAX_SUBEXPRESSIONS];
    void *_compiledExpression;
    BOOL _ignoreCase;
}

+ (BOOL)validExpressionString:(NSString *)expressionString;

+ (id)regularExpressionWithString:(NSString *)expressionString
                       ignoreCase:(BOOL)ignoreCaseFlag;
+ (id)regularExpressionWithString:(NSString *)expressionString;

- (id)initWithExpressionString:(NSString *)expressionString
                    ignoreCase:(BOOL)ignoreCaseFlag;
- (id)initWithExpressionString:(NSString *)expressionString;

- (NSString *)expressionString;

- (BOOL)matchesString:(NSString *)candidate;
- (NSRange)rangeForSubexpressionAtIndex:(unsigned)index
                               inString:(NSString *)candidate;
- (NSString *)substringForSubexpressionAtIndex:(unsigned)index
                                      inString:(NSString *)candidate;
- (NSArray *)subexpressionsForString:(NSString *)candidate;

@end

As you can see, it is a fairly simple class. To use a regular expression in your code you create an instance of this class. If you need to keep it around, using the initWithExpressionString methods will probably be easiest. If you're just going to use it in the scope of a single method, use the class methods regularExpressionWithString, so you won't have to deal with releasing. Both of these methods have twins that take an ignoreCase parameter which, if set to YES, will cause evaluations to ignore the case of the characters in the expression and the search string. If you don't explicitly set case sensitivity then searches are case sensitive. Here's an example of how to create an expression for finding HREFs in a string of HTML:

MORegularExpression*   linkURLExp = [MORegularExpression regularExpressionWithString: 
                                    @"<A HREF=.*?</A>" ignoreCase:YES];

If you want to make sure the expression you create is valid you can call the class method validExpressionString, which will return YES if the expression is a valid regular expression. If you want to know what an MORegularExpression object's expression is you can get it from the expressionString accessor.
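The same two ideas, checking that an expression is valid before using it and turning off case sensitivity, can be sketched outside Cocoa. The example below uses Python's re module, not MOKit, so it only illustrates the concepts behind validExpressionString and the ignoreCase parameter. The helper name is_valid_expression is hypothetical.

```python
import re

def is_valid_expression(expression):
    """Return True if the expression string compiles as a regular expression."""
    try:
        re.compile(expression)
        return True
    except re.error:
        return False

print(is_valid_expression(r"<A HREF=.*?</A>"))  # True
print(is_valid_expression(r"(unclosed"))        # False

# Case-insensitive evaluation, comparable in spirit to ignoreCase:YES.
link_exp = re.compile(r"<A HREF=.*?</A>", re.IGNORECASE)
print(link_exp.search("<a href=index.html>Home</a>") is not None)  # True
```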

Now we can actually do some evaluations. As I said previously, there are two ways to use regular expressions: to match a string and to find a sub-string. If you have a string and you want to make sure it conforms to the regular expression you created, you can pass it into matchesString and the result will tell you if it matches. This is what MORegexFormatter does. It is a formatter you can add to a field, and it will validate the value in that field against the regular expression you give it.

Getting sub-expressions is interesting. If you just want to find the location in the target string of a sub-string, you can use the rangeForSubexpressionAtIndex method. If you want the whole sub-string back as a new NSString*, use substringForSubexpressionAtIndex, passing the string you are searching in the inString parameter. The index is which value in parentheses you want back. There can be 0 to 20 sets of parentheses in a MOKit expression, and the index indicates which one you want the range for. So you could create an expression like "<A HREF=(.*?)>(.*?)</A>" to search for a link in an HTML page. If you used the HTML in Listing 2 and asked for index 0, you would get the whole link: "<A HREF=http://www.radproductions.net/>R.A.D. Productions</A>". If you asked for index 1, you'd get the URL back: "http://www.radproductions.net/". If you asked for index 2, you'd get back the link text: "R.A.D. Productions".

Listing 2: Sample HTML

<HTML>
<TITLE>R.A.D. Productions Home Page</TITLE>
<BODY>
<A HREF=http://www.radproductions.net/>R.A.D. Productions</A>
</BODY>
</HTML>

In a nutshell, that is all there is to finding sub-strings with MORegularExpression. The last method in the interface, subexpressionsForString, is there for backwards compatibility and I'm not even going to explain it.
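The index numbering is the usual one for regular expression groups, so it can be checked against the HTML in Listing 2 with any engine. Here is a minimal sketch in Python's re module (standing in for MOKit, which is an assumption for illustration): group(n) corresponds to the sub-string at index n, and span(n) is the counterpart of the range.

```python
import re

html = """<HTML>
<TITLE>R.A.D. Productions Home Page</TITLE>
<BODY>
<A HREF=http://www.radproductions.net/>R.A.D. Productions</A>
</BODY>
</HTML>"""

link_exp = re.compile(r"<A HREF=(.*?)>(.*?)</A>", re.IGNORECASE)
m = link_exp.search(html)

print(m.group(0))  # <A HREF=http://www.radproductions.net/>R.A.D. Productions</A>
print(m.group(1))  # http://www.radproductions.net/
print(m.group(2))  # R.A.D. Productions

# span(n) gives the (start, end) offsets of group n within the searched string.
start, end = m.span(1)
print(html[start:end] == m.group(1))  # True
```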

There is one tricky thing about using MORegularExpression on a large amount of text. What happens if you want to find every link in an HTML page? substringForSubexpressionAtIndex is only going to return the first occurrence in the string. It turns out there is no way to say "start searching at character n" in the candidate string. What I did was truncate the string after each search to find the next one. Here's my code to find all of the links and their URLs in an HTML page.

Listing 3: Finding all of the links.

-(void)handleHTML:(NSString*)inHTML
{
   // Matches a whole link; index 1 is the URL, index 2 is the link text.
   MORegularExpression*   bothExp = 
                        [MORegularExpression 
                        regularExpressionWithString:
                        @"<A HREF=(.*?)>(.*?)</A>" 
                        ignoreCase:YES];
   
   // Limits the search to the text between the <HTML> tags.
   MORegularExpression*   startStopExp = 
                        [MORegularExpression 
                        regularExpressionWithString:
                        @"<HTML>(.*?)</HTML>"];
   NSRange               range;
   NSString*            curString = [startStopExp 
                        substringForSubexpressionAtIndex:1
                        inString:inHTML];
   
   do 
   {
      // Range of the whole link (index 0) in what is left of the HTML.
      range = [bothExp rangeForSubexpressionAtIndex:0
                     inString:curString ];
      if ( range.length > 0 )
         {
         NSString*   URLString;
         NSString*   linkString;
         NSURL*      fullURL;
         
         URLString = [bothExp 
                        substringForSubexpressionAtIndex:1
                        inString:curString];
         // baseURL is the instance variable holding the page's URL.
         fullURL = [NSURL URLWithString:URLString 
                        relativeToURL:baseURL];
         URLString = [fullURL absoluteString];
         
         linkString = [bothExp
                        substringForSubexpressionAtIndex:2
                        inString:curString];
         // A message to nil returns 0, so this also rejects nil strings.
         if ( [linkString length] > 0 && 
               [URLString length] > 0 )
            {
            [self addURL:URLString withText:linkString];
            }
         // Chop off everything through the end of this link.
         curString = [curString substringFromIndex:
                        (range.location + range.length)];
         }
   }
   while ( range.length > 0 && 
               [curString length] > 0 );
}

A little explanation. The method is in a class that has a method addURL. The class also keeps two arrays, one for URLs and one for the link text. When you call addURL the URL and the link string are added to the arrays for future reference. The class also knows what the URL of the page you are parsing is, and saves it in a variable called baseURL.

The first thing the method does is set up our regular expression for links. Then it makes a new string that will contain only the text between the <HTML> tags. You can use this to limit the search to just a certain part of the page. Then it sets up a loop, which will always execute at least once and will end when we don't get anything back from our search, or we run out of HTML to parse. Inside the loop we first try to find our expression's range in the HTML. If it isn't there, we're done. If we find something, then we use our expression to get the sub-string for the URL. Sometimes a URL will be relative, so we use NSURL with the page's URL to create a full URL. Then we ask for the second index, which is the link text. If we get both, we add them to our list.

If we find something, then we need to search from the end of the string we found. So we create a sub-string from our current HTML string, that starts at the end of what we found and ends at the end of the current string. This effectively chops off everything from the beginning of the string to the end of what we just found. Then we loop.
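The truncate-and-search-again loop is not specific to MOKit. The following sketch shows the same idea in Python's re module (a stand-in chosen for illustration, not the article's framework). The function name find_links and the example.com base URL are hypothetical, and urljoin plays the part that NSURL plays in Listing 3.

```python
import re
from urllib.parse import urljoin

LINK_EXP = re.compile(r"<A HREF=(.*?)>(.*?)</A>", re.IGNORECASE)

def find_links(html, base_url):
    """Collect (absolute_url, link_text) pairs by repeatedly truncating the string."""
    links = []
    cur = html
    while cur:
        m = LINK_EXP.search(cur)
        if m is None:
            break  # no more links in what is left
        url, text = m.group(1), m.group(2)
        if url and text:
            # Resolve relative URLs against the page's own URL.
            links.append((urljoin(base_url, url), text))
        # Chop off everything up to the end of this match, then search again.
        cur = cur[m.end():]
    return links

page = ("<HTML><BODY>"
        "<A HREF=http://www.radproductions.net/>R.A.D. Productions</A>"
        "<a href=products.html>Products</a>"
        "</BODY></HTML>")
for url, text in find_links(page, "http://www.example.com/index.html"):
    print(url, text)
```

Python's compiled patterns also accept a start position, as in LINK_EXP.search(html, pos), which avoids copying the string. The truncating version is shown because it mirrors the workaround described above.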

Hopefully you've seen the coolness of regular expressions and want to use them in your Cocoa apps. MOKit makes this easy. So go to Mike Ferris' website, download it, and add regular expressions to your app.

Bibliography

Mastering Regular Expressions, Jeffrey E. F. Friedl, http://www.ora.com/catalog/regex2/

Using Regular Expressions, Stephen Ramsay, http://etext.lib.virginia.edu/helpsheets/regex.html

Regular Expressions specification, http://www.opengroup.org/onlinepubs/007908799/xbd/re.html

A Tao of Regular Expressions, http://sitescooper.org/tao_regexps.html

BBEdit Grep Tutorial, http://www.anybrowser.org/bbedit/grep.shtml


Ron Davis is a long time Mac programmer, having worked on everything from Virex Anti-Virus to CodeWarrior. His day job is working for Alsoft, and his evening job is R.A.D. Productions, makers of Suck It Down and FinderEye.

 
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.