Regular Expression resources

Anybody who has attended an Inaport demonstration or training course will know that I am a big fan of regular expressions. For those who have not, here are the highlights:

  • Regular expressions are a language for matching patterns in text.
  • They let you find, extract, replace, and validate text in ways that are tedious or impractical with ordinary string functions.
  • While very powerful, they can be a little intimidating to start with.
  • Inaport has full support for regular expressions, using the .NET regex engine.
  • Using the test capabilities of the Inaport expression editor, and the preview capabilities in the Profile Editor, you can build up regexes gradually and test them on your data to see the results.

As an example of the power of regex, consider migrating GoldMine to Microsoft Dynamics CRM, SalesLogix, or SageCRM. The notes field in GoldMine history and activities can be formatted as an HTML page; this does not play well with Dynamics et al. The following Inaport expression can be used to strip all HTML formatting from the notes field, leaving plain text:

snip(find(#NOTES, "(?si)<BODY[^>]*>(.+)</BODY>"), "(?i:<[a-z]+[^>]*>)|(?i:</[a-z\d]+[^>]*>)|&nbsp;")

The inner find() extracts everything between the <body> and </body> tags, i.e. the main body of the page. The snip() function then removes all HTML tags and non-breaking spaces (&nbsp;), leaving plain text.
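
For readers who want to try the two patterns outside Inaport, here is a minimal sketch of the same two-step idea in Python. It is an illustration only: Python's re module is not the .NET engine that Inaport uses (although these two patterns happen to be valid in both), and the function name strip_html_notes and the sample text are made up for the example. It also assumes that find() returns the captured group and yields an empty string when nothing matches, which this post does not confirm.

```python
import re

# Step 1 pattern: capture everything between <body ...> and </body>.
# (?s) lets "." match newlines; (?i) makes the match case-insensitive.
BODY_RE = re.compile(r"(?si)<BODY[^>]*>(.+)</BODY>")

# Step 2 pattern: any opening tag, any closing tag, or a non-breaking space entity.
TAG_RE = re.compile(r"(?i:<[a-z]+[^>]*>)|(?i:</[a-z\d]+[^>]*>)|&nbsp;")


def strip_html_notes(notes: str) -> str:
    """Hypothetical stand-in for snip(find(...)) in the Inaport expression."""
    match = BODY_RE.search(notes)
    # Assumption: with no <body> section, the result is an empty string.
    body = match.group(1) if match else ""
    # Delete every tag and &nbsp; from the extracted body.
    return TAG_RE.sub("", body)


# Made-up sample resembling an HTML-formatted notes field.
sample = (
    "<html><head><title>Note</title></head>"
    "<BODY bgcolor='white'><p>Call <b>Bob</b> on Monday.&nbsp;</p></body></html>"
)
print(strip_html_notes(sample))  # prints: Call Bob on Monday.
```

Note that the entity is simply deleted rather than replaced with a space, so words separated only by &nbsp; would run together.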

I leave it as an exercise for the student to work out how to do this without regular expressions. 

While Inaport’s capabilities and help are very useful for developing and testing regular expressions, I thought it might be useful to highlight some other resources available on the web.

One of the nicest tools is gSkinner, a very cool Flash-based tool for building and testing regular expressions. It lets you paste sample text into a large text box, then build a regex. As you build the regex, the portions of the text that match are highlighted; as you roll your mouse over the regex, its elements are highlighted and a small help dialogue pops up to explain each fragment. This makes it much easier to see the effect of changes to the regex as you build it.

Another site with a good tool for building and testing regular expressions is NRegEx. NRegEx uses the .NET regular expression engine, which is the same as Inaport; this means that expressions you test here will work in the same way in Inaport.

RegExLib is a web site dedicated to regular expressions, with tutorials and links to other resources. Most usefully, it has a large library of regular expressions submitted by many different people. There is a search facility that lets you search by category and keyword, and each regex has brief details, a facility to review it and add comments, and a test button that lets you test the regex against sample text to see what it does.

RegExAdvice is a community forum dedicated to regular expressions; it is integrated with RegExLib.

Regular-Expressions.info offers tutorials, reference material, and links to many other tools.

Inaport uses the regex engine provided by the Microsoft .NET framework. The Inaport help and expression editor contain most of the information you will need on a day-to-day basis for using regex; for the full reference, see Microsoft's .NET regular expression documentation.

InaPlex provides a short training course if you would like to learn or improve your regex skills; the course can be focussed on your data to make it relevant for you. Contact info@inaplex.com if you are interested.
