Here's a PHP regular expression for matching *nested* parentheses (e.g. blocks of code):
((?:[^()]++|\((?1)\))*)
(?1) is a recursive reference to the subpattern in capturing group 1, i.e. the outermost parentheses. It is a feature of PCRE, the regex library behind PHP's preg_* functions.
See Jeffrey Friedl's Mastering Regular Expressions, 3rd ed., p. 476, "Recursive reference to a set of capturing parentheses".
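To see the pattern at work, here is a minimal PHP sketch. The helper names is_balanced() and top_level_groups() are hypothetical. The first wraps the pattern above in ^...$ so it validates a whole string. The second uses a small variant written for this sketch, in which group 1 is a single (...) block, so preg_match_all() can pull out each outermost block.

```php
<?php
// Hypothetical helpers: is_balanced() and top_level_groups() are
// illustrations, not PHP built-ins.

// The post's pattern, anchored with ^...$ so the whole string must consist
// of non-paren runs and properly nested (...) groups. (?1) re-enters
// capturing group 1, i.e. the outermost parentheses.
function is_balanced(string $s): bool
{
    return preg_match('/^((?:[^()]++|\((?1)\))*)$/', $s) === 1;
}

// Variant for extraction: group 1 is itself one "(...)" block, so (?1)
// matches a nested block and preg_match_all() returns each outermost block.
function top_level_groups(string $s): array
{
    preg_match_all('/(\((?:[^()]++|(?1))*\))/', $s, $m);
    return $m[1];
}

var_dump(is_balanced('f(a, g(b, c))'));  // bool(true)
var_dump(is_balanced('f(a, g(b, c)'));   // bool(false): last ")" is missing
print_r(top_level_groups('a(b)c(d(e)f)g')); // "(b)" and "(d(e)f)"
```

Without the ^...$ anchors the original pattern would happily match just a balanced prefix of an unbalanced string, which is why the validation helper adds them.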
Another potentially useful regex technique is "named capture":
^(?P<protocol>https?)://(?P<host>[^/:]+)(?::(?P<port>\d+))?
Here you can use either $matches[1], $matches[2], $matches[3] or $matches['protocol'], $matches['host'], $matches['port'] ($matches[0] holds the entire match).
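A minimal sketch of the named-capture pattern with preg_match(). parse_url_prefix() is a hypothetical helper and the URLs are sample input; for real URL handling PHP's built-in parse_url() is usually the better choice.

```php
<?php
// Hypothetical helper: parse_url_prefix() is an illustration, not a built-in.
function parse_url_prefix(string $url): ?array
{
    // The post's pattern; "~" is used as the delimiter so the "/" characters
    // in "://" need no escaping.
    $re = '~^(?P<protocol>https?)://(?P<host>[^/:]+)(?::(?P<port>\d+))?~';
    if (preg_match($re, $url, $matches) !== 1) {
        return null;
    }
    // $matches[0] is the whole match; the named groups are also available
    // as $matches[1], $matches[2], $matches[3]. A trailing optional group
    // that did not participate is left out of $matches, hence the ?? fallback.
    return [
        'protocol' => $matches['protocol'],
        'host'     => $matches['host'],
        'port'     => $matches['port'] ?? null,
    ];
}

print_r(parse_url_prefix('https://example.com:8080/path'));
print_r(parse_url_prefix('http://example.com/index.html')); // port is null
```

The ?? fallback matters because the port group is optional: when a URL has no port, PHP simply omits that key instead of setting it to an empty string.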
I didn't know about ?1. Is it also in other languages like Perl or Ruby?
With things like ?1, perhaps regular expressions should now be called context free expressions instead.
-K
Hi Kaushik - Some regex engines have recursive matching. There are a couple of entries in Friedl's book about recursive matching:
Perl has a "dynamic regex": (??{perl code})
Java had ?1 as an undocumented feature until 1.4.2, after which it was removed.
.NET uses balancing groups such as (?<DEPTH>) and (?<-DEPTH>) to achieve something similar.
Interesting! I guess I just didn't venture beyond bare-bones regexes.
Thanks for your reply!
Regular expressions are really wonderful for parsing HTML and matching patterns. I use them a lot when I code. Whenever I learn a new language, the first thing I check is whether it supports regex, and I feel at ease when I find that it does.
http://icfun.blogspot.com/2008/04/ruby-regular-expression-handling.html
This post is about Ruby regex. I wrote it when I first learned Ruby regex, so it should be helpful for new coders.
Good to know - thanks Wolf!
Will recursive regex ever take off? They take the potential of regex for confusion and errors to another dimension. There's an example in this recursive regex tutorial that shows a subtle problem in a recursive expression because of atomic grouping. Anyhow, thanks for spreading the regex "Gospel".