Regular Expression resources
Anybody who has attended an Inaport demonstration or training course will know that I am a big fan of regular expressions. For those who have not, here are the highlights:
- Regular expressions are a language for matching patterns in text.
- They let you find, extract, replace, and remove text in your data in almost any way you need.
- While very powerful, they can be a little intimidating to start with.
- Inaport has full support for regular expressions, using the .NET regex engine.
- Using the test capabilities of the Inaport expression editor, and the preview capabilities in the Profile Editor, you can build up regexes gradually and test them on your data to see the results.
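To give a feel for what "building up gradually" looks like, here is a minimal sketch in Python rather than in Inaport itself. The sample rows, the invoice-number format, and the preview() helper are all hypothetical, and Python's re module stands in for the .NET engine; the point is only the workflow of adding one piece to the pattern at a time and checking the result against real rows.

```python
import re

# Hypothetical sample rows standing in for a column of source data.
rows = ["Invoice INV-2041 paid", "ref inv-77", "no reference"]


def preview(pattern, data):
    """Return the matched text for each row, or None where nothing matches."""
    results = []
    for row in data:
        match = re.search(pattern, row)
        results.append(match.group(0) if match else None)
    return results


# Step 1: start with the literal prefix. Only the upper-case row matches.
print(preview(r"INV", rows))          # ['INV', None, None]

# Step 2: ignore case and require a hyphen followed by digits.
print(preview(r"(?i)INV-\d+", rows))  # ['INV-2041', 'inv-77', None]

# Step 3: add a capture group to pull out just the number.
print(re.search(r"(?i)INV-(\d+)", rows[0]).group(1))  # 2041
```

Each step changes one thing, so when a row stops matching it is obvious which addition caused it.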
As an example of the power of regex, consider migrating GoldMine to Microsoft Dynamics CRM, SalesLogix, or SageCRM. The notes field in GoldMine history and activities can be formatted as an HTML page; this does not play well with Dynamics et al. The following Inaport expression can be used to strip all HTML formatting from the notes field, leaving plain text:
snip(find(#NOTES, "(?si)<BODY[^>]*>(.+)</BODY>"), "(?i:<[a-z]+[^>]*>)|(?i:</[a-z\d]+[^>]*>)|&nbsp;")
The inner find() extracts all the text between the <body> and </body> tags, i.e. the main body of the page. The snip() function then removes all HTML tags and non-breaking spaces, leaving plain text.
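For readers who want to try the same two-step idea outside Inaport, here is a minimal Python sketch. It is not Inaport code: the function name strip_html_notes is hypothetical, Python's re module stands in for the .NET engine, and I am assuming that the last alternative in the snip() pattern is the &nbsp; entity and that a note with no body element should yield an empty string.

```python
import re

# Pattern taken from the find() call: capture everything inside <BODY>...</BODY>.
# (?s) lets "." match newlines, (?i) ignores case.
BODY_RE = re.compile(r"(?si)<BODY[^>]*>(.+)</BODY>")

# Pattern modelled on the snip() call: opening tags, closing tags, or &nbsp;.
# Assumption: the final alternative is the literal entity text "&nbsp;".
TAG_RE = re.compile(r"(?i:<[a-z]+[^>]*>)|(?i:</[a-z\d]+[^>]*>)|&nbsp;")


def strip_html_notes(notes: str) -> str:
    """Hypothetical helper: return the plain text of an HTML-formatted note."""
    match = BODY_RE.search(notes)
    if match is None:
        # Assumption: no <body> element means there is nothing to keep.
        return ""
    return TAG_RE.sub("", match.group(1))


sample = (
    '<html><head><title>Note</title></head>'
    '<BODY bgcolor="#ffffff"><p>Call back <b>Monday</b>&nbsp;</p></BODY></html>'
)
print(strip_html_notes(sample))  # prints: Call back Monday
```

Entities other than &nbsp; (for example &amp;) are left untouched by this pattern, so a real migration may need a further decoding step.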
I leave it as an exercise for the student to work out how to do this without regular expressions.
While Inaport’s capabilities and help are very useful for developing and testing regular expressions, I thought it might be useful to highlight some other resources available on the web.
One of the nicest tools is gSkinner, a very cool Flash-based tool for building and testing regular expressions. It lets you paste sample text into a large text box, then build a regex. As you build the regex, the portions of the text that match are highlighted; as you roll your mouse over the regex, its elements are highlighted and a small help dialogue pops up to explain each fragment. This makes it much easier to see the effect of changes to the regex as you build it.
Another site with a good tool for building and testing regular expressions is NRegEx. NRegEx uses the .NET regular expression engine, the same engine that Inaport uses; this means that expressions you test there will work in the same way in Inaport.
RegExLib is a web site dedicated to regular expressions, with tutorials and links to other resources. Most usefully, it has a large library of regular expressions submitted by many different people. There is a search facility that lets you search by category and keyword, and each regex has brief details, a facility to review and add comments, and a test button that lets you test the regex against sample text to see what it does.
RegExAdvice is a community forum dedicated to regular expressions; it is integrated with RegExLib.
Regular-Expressions.info offers tutorials, reference material, and links to many other tools.
Inaport uses the regex engine provided by the Microsoft .NET framework. The Inaport help and expression editor contain most of the information you will need on a day-to-day basis for using regex, but the full reference is available on the Microsoft documentation site.
InaPlex provides a short training course if you would like to learn or improve your regex skills; the course can be focussed on your data to make it relevant for you. Contact info@inaplex.com if you are interested.