Thoughts and links about programming, by

Regular Expression Article #2
Posted on Thursday, December 10, 2009.

In January 2007 I posted an article on my web site titled “Regular Expression Matching Can Be Simple And Fast.” I intended this to be the first of three; the second would explain how to do submatching using automata, and the third would explain how to make a really fast DFA. I got distracted for a few years but have finally finished the second article.

Simple (easy to follow, I hope) code demonstrating the techniques from the article can be found at

(Comments originally posted via Blogger.)

  • Tobias Svensson (December 10, 2009 12:06 PM) Great write up! I'm happy to see that you're posting regularly again, as I have always greatly enjoyed your blog.

  • tim.henderson (December 10, 2009 7:57 PM) I hope you do write your third post, I found your first two to be among the most readable explanations on this topic I have encountered.

  • rog peppe (December 11, 2009 6:17 AM) beautifully clear article, thanks.
    BTW, i think this paragraph might still be in its nascent stage: "If have other criteria can compare submatch sets and choose the better one. The Eighth Edition Unix library used leftmost longest to match DFA-based tools like awk and egrep."

  • Marius Gedminas (December 11, 2009 6:41 AM) A very interesting article, thank you!

    One note: "Character classes are typically implemented as a special VM instruction rather than expanding them into alternations, just as we did for dot above." refers to something that doesn't exist in the text.

  • Russ Cox (December 11, 2009 8:30 AM) Thanks everyone; I've corrected these mistakes and a couple others.

    I also added a bit more about POSIX submatching; thanks to Chris Kuklewicz.

  • Matt Doar (December 15, 2009 9:56 AM) The article is the best around - well written, obviously carefully edited. Great job! Looking forward to #3


  • Jonathan (December 28, 2009 12:58 AM) Thanks for the article about little virtual machines.

    A few weeks back I implemented a little virtual machine for a processing task. The close resemblance of the two virtual machines makes me feel vindicated.