OCLUG, 2009-05-05, Richard Guy Briggs, REGEXes

rgb@tricolour.net, http://tricolour.net

What is a REGEX?

From Wikipedia:

In computing, regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters. Regular expressions (abbreviated as regex or regexp, with plural forms regexes, regexps, or regexen) are written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.


Why is this useful?

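REGEXes can be used to search text for items of interest, isolating them from a large volume of text. They can also be used in search-and-replace operations to automate the substitution of text that matches a pattern. There are two main POSIX flavours: Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE). Exactly which features each supports varies between tools and environments. In BREs, some metacharacters must be escaped with a backslash to take on their special meaning. For example, grouping is written \( \) and intervals \{ \}, whereas an ERE uses plain ( ) and { }.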
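As a minimal sketch of both uses (searching and search-and-replace), here is an example using Python's re module. Note that Python's regex syntax is Perl-style rather than POSIX BRE/ERE; the pattern below was chosen because it is also valid as an ERE. The sample text and variable names are purely illustrative.

```python
import re

text = """Meeting on 2009-05-05 at 19:30.
Next meeting: 2009-06-02.
Contact rgb@tricolour.net for details."""

# Search: find every YYYY-MM-DD date (4 digits, dash, 2 digits, dash, 2 digits).
date_pattern = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
dates = date_pattern.findall(text)
print(dates)  # ['2009-05-05', '2009-06-02']

# Search and replace: rewrite YYYY-MM-DD as DD/MM/YYYY using capture groups
# (\1, \2, \3 refer back to the parenthesised parts of the match).
swapped = re.sub(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", r"\3/\2/\1", text)
print(swapped.splitlines()[0])  # Meeting on 05/05/2009 at 19:30.
```

The same date pattern works unchanged with grep -E (ERE). Plain grep (BRE) needs the interval braces escaped: [0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}.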


What is some of the history?

In the 1950s, mathematician Stephen Kleene came up with the notation for "regular sets". UNIX co-creator Ken Thompson later incorporated this notation, first into the QED editor and subsequently into the ed editor and the separate command "grep". grep gets its name from the ed command g/re/p (global / regular expression / print), which prints every line that matches a regular expression.
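To make the g/re/p idea concrete, here is a minimal Python sketch of a grep-like filter. It is not how grep or ed are actually implemented. It uses Python's re syntax rather than ed's, and the function name grep and the sample lines are hypothetical.

```python
import re
from typing import Iterable, Iterator


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield each line containing a match for pattern (the g/re/p idea)."""
    regex = re.compile(pattern)
    for line in lines:          # g: apply the command globally, to every line
        if regex.search(line):  # re: test the line against the regular expression
            yield line          # p: "print" (here, yield) the matching line


sample = [
    "g/re/p prints matching lines",
    "ed was the standard editor",
    "grep takes its name from ed",
]

# \b marks a word boundary, so only the whole word "ed" matches.
for line in grep(r"\bed\b", sample):
    print(line)
```

Only the second and third sample lines are printed. The first line contains no standalone word "ed".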

The first free REGEX library was written by Henry Spencer in 1986. It is the basis for most of the REGEX implementations you are likely to encounter.

Perl's regular expression engine (PerlRE) was derived from it and then extended.

The PCRE (Perl Compatible Regular Expressions) library was developed by Philip Hazel for the Exim mail server and is intended to closely mimic the extended functionality of PerlRE.


Concepts: