Wednesday, June 16, 2004

Regular Expression

The situation

Regular expressions are worth their weight in gold. I use them daily and can pretty much think in regex when needed. But what about devs who have never seen one and need to create an expression right now? Sure, one could go to RegexLib and grab any of the hundreds of great expressions freely available. However, what if a custom expression is needed? I propose, and am in the process of developing, an easy-to-use, result-based solution for creating regular expressions. No previous knowledge necessary.

What do I mean by result-based? Excuse me while I grab my easel... I need a regular expression to validate a credit card. Simple enough, right? It might look something like this: ^5[1-5][\d]{14}$ (from the beginning of the line, any value that starts with 51 through 55 and follows with 14 digits). What about dashes and spaces? Okay, how about: ^5[1-5][\d]{2}[\- ]*[\d]{4}[\- ]*[\d]{4}[\- ]*[\d]{4}$. We are starting to get a little more complex (we can optimize this but I will save that for another day). I would like a solution whereby I could enter "5411-1111-1111-1111" on the IDE canvas and get the pattern from the needed result.
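To make the difference between the two patterns concrete, here is a small check, assuming Python's re module (the post itself is not tied to any language). The patterns are copied from the paragraph above; the sample strings and the names STRICT and LENIENT are made up for illustration.

```python
import re

# Both patterns are taken verbatim from the text above.
STRICT = re.compile(r"^5[1-5][\d]{14}$")
LENIENT = re.compile(r"^5[1-5][\d]{2}[\- ]*[\d]{4}[\- ]*[\d]{4}[\- ]*[\d]{4}$")

# Illustrative inputs only; none of these is a real card number.
dashed = "5411-1111-1111-1111"
samples = [
    dashed.replace("-", ""),   # 16 digits, no separators
    dashed,                    # dashes between groups
    dashed.replace("-", " "),  # spaces between groups
    "4111-1111-1111-1111",     # does not start with 51 through 55
]

for s in samples:
    # Print whether each pattern accepts the sample.
    print(f"{s!r:24} strict={bool(STRICT.match(s))} lenient={bool(LENIENT.match(s))}")
```

The first pattern only accepts the bare 16 digits, while the second also accepts the dashed and spaced forms. Neither pattern checks anything beyond the shape of the number.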

Is there a way to represent a pattern visually without the visual representation becoming more complicated than the pattern itself?

Current tool box

I typically use Expresso for my quick and dirty regexes. I also sometimes use The Regulator (which now has an IntelliSense pop-up. w00t!) for more complicated ones. In the super user category I put The Regex Coach. It is a great program for seeing the parse tree and having a "debugger" type interaction with the FSM. However, I already know how to write regular expressions and these applications are designed with that in mind.

What about existing visual editors? There is really only one program I am aware of and it seems to be on the right track: The KDE RegExp Editor for Linux with KDE (also available for Win32). This program basically represents a regular expression visually as a series of nested containers (groups and character classes). It's the most "visual" editor I've found for regex editing (although it is buggy on the Win32 edition; it uses GTK+). In all the various ideas I've come up with (so far), I'm starting to wonder if this is the best visual option. My only beef with this solution is that it is designed for editing the pattern in a visual way, not for visually designing the pattern.

YAREE (Yet Another Regular Expression Editor)

I've spent my idle moments (commuting to and from work, shopping with Kristin, watching TV) trying to come up with what I feel would be an easy and intuitive regex editor. I seem to always hit a road block when trying to decide the best way to visually represent a pattern. I don't want to rehash programs like The Regulator, The Regex Coach, or KDE's RegExp Editor (although I do want to incorporate many of the great ideas these programs exhibit). Instead, I want to focus on the end result. I want to visually work with the data I'm trying to match or split on. I want to abstract away the character classes and endless groupings.
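As a rough illustration of what "working from the end result" could mean, here is a naive sketch in Python. It is not the design of the tool, just one possible starting point: it walks an example string, groups consecutive characters of the same kind, and emits a character class with a repeat count for each run. The names char_class and infer_pattern are hypothetical.

```python
import re
import string
from itertools import groupby


def char_class(ch):
    """Map one character to the regex token that generalizes it (an assumed, very coarse scheme)."""
    if ch in string.digits:
        return r"\d"
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    # Anything else (dashes, spaces, punctuation) is kept as an escaped literal.
    return re.escape(ch)


def infer_pattern(example):
    """Build an anchored pattern from one example by collapsing runs of the same token."""
    parts = []
    for token, run in groupby(example, key=char_class):
        count = len(list(run))
        parts.append(token if count == 1 else f"{token}{{{count}}}")
    return "^" + "".join(parts) + "$"


pattern = infer_pattern("5411-1111-1111-1111")
print(pattern)
print(bool(re.match(pattern, "5500-2222-3333-4444")))  # same shape, different digits
print(bool(re.match(pattern, "5411 1111 1111 1111")))  # spaces were not in the example
```

A sketch like this shows the gap that still has to be closed: it only learns the shape of a single example, so it knows nothing about the 51 through 55 prefix or about optional separators. Those would need several examples or some hand tweaking.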

I've run into hurdles while trying to theorize this utility. Below are a few of them:

  • Verboseness of expressions - Regex patterns are verbose in nature. Coming up with a way to express this verboseness visually is a challenge. I feel like this is a major bottleneck.
  • Lookaheads/lookbehinds/repeating/nested tokens - Representing things like nested double quotes and repeating values is going to be hard. This ties back in with the verboseness of the language. I think there are certain constructs that can be used as a pattern for repeating and nesting, but the pattern may need to be tweaked by hand.
  • Optimizations
  • Plug-in architecture

I will think about this some more over the next few days. I know there is a way to do this. I just can't seem to see the forest for the trees.

Grace and peace to you,

