From Wikipedia:
In computing, regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters. Regular expressions (abbreviated as regex or regexp, with plural forms regexes, regexps, or regexen) are written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.
REGEXes can be used to search a large volume of text and isolate the items of interest. They can also be used in search-and-replace operations to automate the replacement of specific patterns of text. There are Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE); the exact differences depend on the environment. In BREs, some metacharacters must be preceded by a backslash to take on their special meaning.
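As a minimal sketch of the two uses described above (searching and search-and-replace), the following uses Python's standard `re` module. The sample sentence and variable names are made up for illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "The colour of the sky; the colour of the sea."

# Searching: collect every occurrence of either spelling.
# "u?" makes the "u" optional, so both "color" and "colour" would match.
matches = re.findall(r"colou?r", text)

# Search and replace: rewrite every "colour" as "color".
replaced = re.sub(r"colour", "color", text)

print(matches)
print(replaced)
```

Python's `re` syntax is closest to the Perl style, so none of the metacharacters here need the extra backslashes that a BRE tool would require.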
In the 1950s, mathematician Stephen Kleene came up with the notation for "regular sets". UNIX co-creator Ken Thompson later incorporated this notation first into the QED editor, then into the ed editor, and finally into the separate command "grep", which gets its name from the ed regular expression search command g/re/p (global / regular expression / print).
The first free REGEX library was written by Henry Spencer in 1986. It is the basis for most of the REGEX implementations you are likely to encounter.
PerlRE was derived from it and extended.
The PCRE library was developed by Philip Hazel for Exim and is intended to closely mimic the extended functionality of PerlRE.
cat matches "cat" anywhere in the input to be searched.
a|b matches "a" or "b".
grey|gray is the same as
gr(e|a)y
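The examples above can be checked with a short sketch in Python's `re` module. The test words are arbitrary, and `fullmatch` is used so that the whole word has to match the pattern.

```python
import re

# "cat" matches anywhere in the input, even inside a longer word.
inside_word = re.search(r"cat", "concatenate") is not None

# a|b matches either alternative.
found = re.findall(r"a|b", "abc")

# grey|gray and gr(e|a)y accept exactly the same strings.
words = ["grey", "gray", "gruy"]
alt = [w for w in words if re.fullmatch(r"grey|gray", w)]
grouped = [w for w in words if re.fullmatch(r"gr(e|a)y", w)]

print(inside_word, found, alt, grouped)
```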
? | 0 or 1 occurrence |
* | 0 or more occurrences |
+ | 1 or more occurrences |
{N} | exactly N occurrences |
{M,N} | between M and N occurrences |
{,N} | at most N occurrences |
{N,} | at least N occurrences |
\t | tab |
\n | newline |
\r | carriage return |
\b | match any word boundary |
\B | not word boundary |
^ | beginning of line |
$ | end of line |
[] | "bracket expression" is a set of characters |
^ | within [], means "not the following characters" |
. | match any character except newline |
\. | match "." |
\\ | match "\" |
\^ | match "^" |
\$ | match "$" |
\{ | match "{" |
\} | match "}" |
\( | match "(" |
\) | match ")" |
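The quantifiers, anchors, bracket expressions, and escapes listed above can be tried out with a short sketch in Python's `re` module. The helper name `full` and all sample strings are made up for illustration, and other regex flavors may treat some of these forms differently (for example `{,N}`).

```python
import re

def full(pattern, s):
    """Return True if the whole string s matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None

# Quantifiers
optional_u = full(r"colou?r", "color")    # ?     : 0 or 1
zero_b     = full(r"ab*c", "ac")          # *     : 0 or more
needs_b    = full(r"ab+c", "ac")          # +     : 1 or more, so this fails
exactly_3  = full(r"a{3}", "aaa")         # {N}
too_many   = full(r"a{2,4}", "aaaaa")     # {M,N} : five a's is out of range
at_most_2  = full(r"a{,2}", "aa")         # {,N}
at_least_2 = full(r"a{2,}", "aaaaaaa")    # {N,}

# Word boundaries: \b needs a boundary, \B needs the absence of one.
whole_word = re.findall(r"\bcat\b", "cat concat cats")
mid_word = re.findall(r"\Bcat", "cat concat")

# ^ and $ match at every line start/end only with the MULTILINE flag.
lines = "first line\nsecond line"
starts = re.findall(r"^\w+", lines, re.MULTILINE)
ends = re.findall(r"\w+$", lines, re.MULTILINE)

# Bracket expressions, plain and negated.
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")

# "." matches any character except newline; "\." matches a literal dot.
any_char = re.findall(r"a.c", "abc a\nc a.c")
literal_dot = re.findall(r"a\.c", "abc a.c")

# re.escape adds the backslashes needed to match a string literally.
literal_ok = full(re.escape("1+1=2"), "1+1=2")
```

Without `re.MULTILINE`, Python anchors `^` only at the start of the whole string and `$` only at its end (or just before a trailing newline).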
| POSIX | Perl | ASCII | Description |
|---|---|---|---|
| [:alnum:] | | [A-Za-z0-9] | Alphanumeric characters |
| [:word:] | \w | [A-Za-z0-9_] | Alphanumeric characters plus "_" |
| | \W | [^\w] | Non-word character |
| [:alpha:] | | [A-Za-z] | Alphabetic characters |
| [:blank:] | | [ \t] | Space and tab |
| [:cntrl:] | | [\x00-\x1F\x7F] | Control characters |
| [:digit:] | \d | [0-9] | Digits |
| | \D | [^\d] | Non-digit |
| [:graph:] | | [\x21-\x7E] | Visible characters |
| [:lower:] | | [a-z] | Lowercase letters |
| [:print:] | | [\x20-\x7E] | Visible characters and spaces |
| [:punct:] | | [-!"#$%&'()*+,./:;<=>?@[\\\]_`{\|}~] | Punctuation characters |
| [:space:] | \s | [ \t\r\n\v\f] | Whitespace characters |
| | \S | [^\s] | Non-whitespace character |
| [:upper:] | | [A-Z] | Uppercase letters |
| [:xdigit:] | | [A-Fa-f0-9] | Hexadecimal digits |
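The Perl-style shorthands in the table can be tried out with a short sketch in Python's `re` module. Python's `re` does not implement the POSIX bracket classes such as `[:alpha:]`, so the sketch uses the explicit ranges from the ASCII column where no shorthand exists. The sample string is made up, and `re.ASCII` is passed because Python 3 otherwise lets `\w`, `\d`, and `\s` match Unicode characters as well.

```python
import re

# Hypothetical sample string, chosen only for illustration.
sample = "id_7 = 0x1F"

words = re.findall(r"\w+", sample, re.ASCII)       # [A-Za-z0-9_]
non_words = re.findall(r"\W", sample, re.ASCII)    # everything else
digits = re.findall(r"\d", sample, re.ASCII)       # [0-9]
non_digits = re.findall(r"\D+", "a1b22c")          # runs of non-digits
spaces = re.findall(r"\s", sample)                 # whitespace
hex_runs = re.findall(r"[A-Fa-f0-9]+", "0x1F")     # [:xdigit:] equivalent

# With re.ASCII, \w is limited to the ASCII column of the table;
# without it, Python also accepts non-ASCII letters such as "é".
unicode_word = re.fullmatch(r"\w", "é") is not None
ascii_word = re.fullmatch(r"\w", "é", re.ASCII) is not None

print(words, non_words, digits, non_digits, spaces, hex_runs)
```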
regex: Vim, expr, lex, EMACS, SED, GREP, AWK
PerlRE: Perl, Python, Java, Ruby, TCL
PCRE: Perl-Compatible Regular Expressions: PHP, Apache HTTP Server, Exim MTA, KDE, Postfix, Analog, Nmap, Safari
More information: