Archive for the ‘regular expressions’ Category

Regex: Encompassing your needs with parentheses

Wednesday, February 2nd, 2011

While character classes can be used to sum up the possible variations within a single space, the regular expression language also provides a way to choose among several multi-character expressions. This is called alternation, and it is written with parentheses, (), and the | symbol.

For instance, if you are looking, in a particular location, for either the word “train” or the word “bus,” you would express that as “(train|bus)”.

Alternation can also be used to match alternative spellings of a word. If you are looking for either the word “color” or “colour,” one way to build the expression would be “col(o|ou)r”.
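As a rough illustration, the two alternation patterns above can be tried out in Python's built-in re module. The helper name first_match and the sample sentences are made up for this sketch; only the patterns come from the post.

```python
import re

# Alternation: the group matches either whole word.
transport = re.compile(r"(train|bus)")
# Alternation inside a longer pattern: only the grouped part varies.
spelling = re.compile(r"col(o|ou)r")

def first_match(pattern, text):
    """Return the text of the first match, or None if there is none.

    (Hypothetical helper, written only for this example.)
    """
    m = pattern.search(text)
    return m.group(0) if m else None

print(first_match(transport, "I took the bus home"))  # bus
print(first_match(spelling, "my favourite colour"))   # colour
print(first_match(spelling, "the color red"))         # color
print(first_match(spelling, "a colr typo"))           # None
```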

Material taken from the book:




All mistakes are my own, however… –Joab Jackson





Regex: character classes bracket the possible

Sunday, January 23rd, 2011

One of the ways in which regular expressions (regex) are more powerful than simple pattern matching filters is that the regex syntax offers a wide set of metacharacters that can be used to identify complex patterns.

For instance, regex uses a set of square brackets, [], to hold a character class, or a range of possible characters that could fit within a single space.

In other words, using a character class, you can match an expression that could have any one of a number of characters in a given space.

For instance, the regex H[eu]llo World would match either Hello World or Hullo World. (Regex matching is case sensitive by default, so the pattern needs the capital H.)
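A minimal Python sketch of a character class filling a single position. The pattern is written with a capital H because matching is case sensitive by default; the function name is_greeting is invented for the example.

```python
import re

# One character class = one position; here that position is "e" or "u".
greeting = re.compile(r"H[eu]llo World")

def is_greeting(text):
    """True if the whole string matches the pattern (hypothetical helper)."""
    return greeting.fullmatch(text) is not None

print(is_greeting("Hello World"))  # True
print(is_greeting("Hullo World"))  # True
print(is_greeting("Hallo World"))  # False: "a" is not in the class
print(is_greeting("hello World"))  # False: lowercase "h" does not match "H"
```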

Character classes have their own set of metacharacters to support more advanced searching.

Within a character class, the - character represents a range of characters: <H[1-6]> would match <H1> through <H6>.

Ranges within character classes also work for letters, though they are case sensitive: [a-z] covers only lowercase letters, so [a-zA-Z] is needed to match any letter in either case.

Character classes can consist of a combination of ranges and literal characters: [a-z7!].
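The range examples above can be checked with a short Python sketch. The variable names and test strings are chosen only for illustration.

```python
import re

heading = re.compile(r"<H[1-6]>")  # a digit from 1 through 6
letter = re.compile(r"[a-zA-Z]")   # any ASCII letter, either case
mixed = re.compile(r"[a-z7!]")     # a lowercase letter, "7", or "!"

print(heading.findall("<H1> <H6> <H7>"))  # ['<H1>', '<H6>']
print(letter.findall("a1B2"))             # ['a', 'B']
print(mixed.findall("A7!b"))              # ['7', '!', 'b']
```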

Note, however, that each instance of a character class is a set of possible values for a single space: [acquainted] will match any single character that is one of the letters a, c, q, u, i, n, t, e or d (and so will find a hit in every word containing one of them), not the word acquainted itself.
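To see that a character class only ever stands for one character, here is a small Python sketch. The sample words are arbitrary.

```python
import re

letters = re.compile(r"[acquainted]")

# Each match is a single character drawn from the set.
print(letters.findall("quiz"))                   # ['q', 'u', 'i']
# The class cannot match the ten-letter word as a whole.
print(letters.fullmatch("acquainted"))           # None
# It does match any one of its member letters on its own.
print(letters.fullmatch("q") is not None)        # True
```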

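You can also match characters that are not in a given set, by placing ^ at the start of a character class: [^c] matches any single character other than the letter c. s[^k] will highlight any instance where an “s” is followed by a character other than “k” (such as the “su” in “sun”), and ignore those where it is followed by “k” (such as “sky”). Because [^k] must still match a character, an “s” at the very end of the text is not matched.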
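A brief Python sketch of a negated character class. It shows that the negated class still consumes exactly one character; the test strings are invented for the example.

```python
import re

not_c = re.compile(r"[^c]")      # any one character except "c"
s_not_k = re.compile(r"s[^k]")   # "s" followed by one character that is not "k"

print(not_c.findall("cab"))          # ['a', 'b']
print(s_not_k.findall("sky sun"))    # ['su']
# A trailing "s" has nothing after it for [^k] to match.
print(s_not_k.search("bus"))         # None
```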

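The dot, “.”, is a placeholder. It represents any single character (in most regex flavors, any character except a line break). For instance, if you are looking for a word with an unknown second character (“h7llo” or “hxllo”), you could use h.llo, which would match any occurrence of an “h,” then one arbitrary character, then “llo”. Note that the dot must sit outside square brackets: h[.]llo matches only the literal text h.llo, since a dot inside a character class stands for itself.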
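A quick way to see the difference between a bare dot and a dot inside square brackets, sketched in Python. Inside a character class the dot is an ordinary character, so only the bare dot acts as a wildcard.

```python
import re

any_second = re.compile(r"h.llo")     # "." matches any one character
literal_dot = re.compile(r"h[.]llo")  # "[.]" matches only a real period

sample = "h7llo hxllo hello h.llo"

print(any_second.findall(sample))   # ['h7llo', 'hxllo', 'hello', 'h.llo']
print(literal_dot.findall(sample))  # ['h.llo']
```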

Keep in mind that, within regular expressions, metacharacters such as “^” and “-” have different meanings when they are placed inside character classes than when they are outside them.
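The contrast can be sketched in Python as follows. Outside a class, ^ anchors a match to the start of the text and - is just a hyphen; inside a class they negate and form ranges, respectively. The variable names are illustrative only.

```python
import re

# Outside a class, ^ anchors the match to the start of the string.
starts_with_a = re.compile(r"^a")
# As the first character inside a class, ^ negates the class.
not_a = re.compile(r"[^a]")
# Outside a class, - is a literal hyphen.
literal_hyphen = re.compile(r"a-c")
# Inside a class, between two characters, - forms a range.
range_a_to_c = re.compile(r"[a-c]")

print(starts_with_a.findall("banana"))     # []
print(not_a.findall("banana"))             # ['b', 'n', 'n']
print(literal_hyphen.findall("a-c abc"))   # ['a-c']
print(range_a_to_c.findall("a-c abc"))     # ['a', 'c', 'a', 'b', 'c']
```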

Material taken from the book:




All mistakes are my own, however… –Joab Jackson