Regex: character classes bracket the possible

Sunday, January 23rd, 2011

One of the ways in which regular expressions (regex) are more powerful than simple pattern matching filters is that the regex syntax offers a wide set of metacharacters that can be used to identify complex patterns.

For instance, regex uses a pair of square brackets, [], to hold a character class: a set of possible characters, any one of which could fill a single position in the pattern.

In other words, using a character class, you can match an expression that could have one of a number of characters in a given position.

For instance, the regex h[eu]llo World would match either hello World or hullo World. (Matching is case sensitive by default, so it would not match Hello World.)
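To see a character class at work, here is a minimal sketch using Python's built-in re module. The choice of Python and the list of test strings are my own assumptions for illustration; the pattern is the one from the example above.

```python
import re

# The class [eu] accepts exactly one character: either "e" or "u".
pattern = re.compile(r"h[eu]llo World")

# Candidate strings are made up for this illustration.
candidates = ["hello World", "hullo World", "hallo World", "Hello World"]

# fullmatch() succeeds only if the whole string fits the pattern.
matches = [s for s in candidates if pattern.fullmatch(s)]
print(matches)  # ['hello World', 'hullo World']
```

Note that the capitalized "Hello World" is rejected, because matching is case sensitive unless you ask for otherwise (in Python, by passing re.IGNORECASE).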

Character classes have a few metacharacters of their own to help with more advanced searching.

Within a character class, the - character represents a range of characters: <H[1-6]> would match <H1> through <H6>.
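A quick sketch of the heading example, again in Python's re module. The sample string is invented for illustration.

```python
import re

# [1-6] is a range: any single digit from 1 through 6.
heading = re.compile(r"<H[1-6]>")

# Made-up sample text containing valid and invalid heading tags.
sample = "<H1> <H3> <H6> <H7> <H0> <h2>"

found = heading.findall(sample)
print(found)  # ['<H1>', '<H3>', '<H6>']
```

<H7> and <H0> fall outside the range, and <h2> is skipped because the literal H in the pattern is uppercase.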

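Ranges within character classes also work for letters, though they are case sensitive: [a-z] covers only lowercase letters, while [a-zA-Z] matches any single upper- or lowercase letter.

Character classes can consist of a combination of ranges and literal characters: [a-z7!] matches any lowercase letter, the digit 7 or an exclamation point.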
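The letter ranges and the mixed class can be compared side by side. This is only a sketch; the sample sentence is an assumption of mine, chosen so that each class picks out something different.

```python
import re

sample = "Route 7 is open!"  # made-up sample text

# findall() returns every single character the class accepts, in order.
lower_only = "".join(re.findall(r"[a-z]", sample))
any_letter = "".join(re.findall(r"[a-zA-Z]", sample))
mixed = "".join(re.findall(r"[a-z7!]", sample))

print(lower_only)  # outeisopen
print(any_letter)  # Routeisopen
print(mixed)       # oute7isopen!
```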

Note, however, that each instance of a character class is a set of possible values for a single position: [acquainted] will match any one character that is an a, c, q, u, i, n, t, e or d (and so will find a match in every word containing at least one of those letters), not the word acquainted itself.
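The difference between the bracketed and unbracketed forms is easy to check. A minimal sketch, with sample words of my own choosing:

```python
import re

char_class = re.compile(r"[acquainted]")  # one character from the set
literal = re.compile(r"acquainted")       # the ten-letter word itself

# The class matches one character at a time, wherever it finds one.
print(char_class.findall("queen"))  # ['q', 'u', 'e', 'e', 'n']
print(char_class.findall("dog"))    # ['d']

# As a whole-string match, the class cannot cover the full word...
whole_word = char_class.fullmatch("acquainted")
print(whole_word)  # None

# ...while the literal pattern finds it.
hit = literal.search("We are acquainted.")
print(hit.group())  # acquainted
```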

You can also match characters that are not in a particular set, by placing ^ first inside a character class: [^c] matches any single character that is not the letter c. s[^k] will highlight any instances where an “s” is followed by a character other than “k,” and ignore those where it is followed by “k” (such as “sky”).
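Negated classes are a common source of confusion, so here is a small sketch of what they actually consume. The sample strings are invented for illustration.

```python
import re

not_c = re.compile(r"[^c]")         # any one character except "c"
s_then_not_k = re.compile(r"s[^k]")  # "s" plus one character that is not "k"

print(not_c.findall("cocoa"))  # ['o', 'o', 'a']

# "sky" and "ski" are skipped; "sun" and "stop" each yield a two-character match.
pairs = s_then_not_k.findall("sky ski sun stop")
print(pairs)  # ['su', 'st']

# A negated class still has to match a character, so an "s" at the
# very end of the string finds nothing.
trailing = s_then_not_k.search("bus")
print(trailing)  # None
```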

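The dot, “.”, is a placeholder. It represents any single character. For instance, if you are looking for a word with an unknown second character (“h7llo” or “hxllo”), you could use h.llo, which would match any occurrence of the pattern “h?llo.” Note that the dot must sit outside square brackets to work this way: inside a character class it is just a literal period, so h[.]llo would match only “h.llo.”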
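The sketch below contrasts a bare dot with a dot placed inside square brackets, since the two behave differently. The word list is made up for illustration.

```python
import re

any_char = re.compile(r"h.llo")       # bare dot: any single character
literal_dot = re.compile(r"h[.]llo")  # dot inside a class: a literal period

words = ["h7llo", "hxllo", "hello", "h.llo", "hllo"]

any_hits = [w for w in words if any_char.fullmatch(w)]
dot_hits = [w for w in words if literal_dot.fullmatch(w)]

print(any_hits)  # ['h7llo', 'hxllo', 'hello', 'h.llo']
print(dot_hits)  # ['h.llo']
```

"hllo" fails both patterns, because the dot (like a character class) must match exactly one character.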

Keep in mind that regex metacharacters such as “^” and “-” have different meanings when they are placed inside character classes than when they are outside them.
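As a rough illustration of that last point, the following sketch runs each metacharacter both inside and outside a class. The sample strings are my own, and the behavior shown is that of Python's re module with default flags.

```python
import re

# Outside a class, ^ anchors the match to the start of the string.
anchored = re.findall(r"^c", "cc-ac")
# First inside a class, ^ negates the set.
negated = re.findall(r"[^c]", "cc-ac")

print(anchored)  # ['c']
print(negated)   # ['-', 'a']

# Outside a class, - is an ordinary hyphen.
hyphen = re.findall(r"a-c", "a-c abc")
# Between two characters inside a class, - defines a range.
ranged = re.findall(r"[a-c]", "a-c abc")

print(hyphen)  # ['a-c']
print(ranged)  # ['a', 'c', 'a', 'b', 'c']
```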

Material taken from the book:




all mistakes are my own however…–Joab Jackson