Current version: 1.0.1

Regular Expression Capabilities


The following is a comprehensive list of the regular expression syntax that regldg understands.

Individual Characters
  Meta-characters
Meta-character classes
Groupings
Alternations
Backreferences
Character classes
Quantifiers


Individual characters


You can enter individual characters in several ways. regldg operates on characters in the ASCII and extended ASCII range, values 0 through 255.

Regular expression  Meaning                                       Example  Produces
p                   Any printable character p produces itself     p        p
\a                  Bell character (ASCII 7)                      \a       [BEL]
\b                  Backspace character (ASCII 8)                 \b       [BS]
\t                  Horizontal tab character (ASCII 9)            \t       [HT]
\n                  Newline character (ASCII 10)                  \n       [NL]
\v                  Vertical tab character (ASCII 11)             \v       [VT]
\f                  Form feed character (ASCII 12)                \f       [FF]
\r                  Carriage return character (ASCII 13)          \r       [CR]
\e                  Escape character (ASCII 27)                   \e       [ESC]
\zNNN               The character with decimal ASCII code NNN     \z49     1
                    (1 to 3 digits, less than 256)
\z{NNN}             Same as \zNNN; the { and } avoid ambiguity    \z{119}  w
                    (see the note below)
\oNNN               The character with octal ASCII code NNN       \o072    :
                    (1 to 3 digits, less than 400 octal)
\o{NNN}             Same as \oNNN; the { and } avoid ambiguity    \o{12}   [NL]
                    (see the note below)
\xNN                The character with hexadecimal ASCII code NN  \x5D     ]
                    (1 or 2 digits, at most FF hexadecimal)
\x{NN}              Same as \xNN; the { and } avoid ambiguity     \x{26}   &
                    (see the note below)

Possible confusion with numerically-specified characters

Numerically specified characters (using the constructs \zNNN, \oNNN, and \xNN) are a source of possible confusion. Consider the regular expression \z1234. Does it mean \z1 followed by 234, \z12 followed by 34, or \z123 followed by 4? Who's to say? regldg interprets it as the last case, because it keeps adding digits to a numerically specified character until the maximum for that type is reached. A decimal numerically specified character can use up to three digits, and since three were available, regldg used all three. To avoid confusion, use the { and } characters to tell regldg exactly which digits belong to each numerically specified character: \z{1}234, \z{12}34, or \z{123}4.
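The greedy rule can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not regldg's actual parser: the function name expand_numeric_escapes is hypothetical, the maximum digit counts (three for \z and \o, two for \x) come from the table above, and every other escape is simply copied through unchanged.

```python
import re

# Hypothetical model of the greedy rule described above; NOT regldg's parser.
# Each entry maps the escape letter to (base, pattern). The braced form is
# tried first; otherwise the bare form greedily takes up to the maximum
# number of digits (3 for decimal and octal, 2 for hexadecimal).
ESCAPES = {
    "z": (10, re.compile(r"\{([0-9]{1,3})\}|([0-9]{1,3})")),
    "o": (8, re.compile(r"\{([0-7]{1,3})\}|([0-7]{1,3})")),
    "x": (16, re.compile(r"\{([0-9A-Fa-f]{1,2})\}|([0-9A-Fa-f]{1,2})")),
}


def expand_numeric_escapes(pattern: str) -> str:
    """Replace each \\zNNN, \\oNNN, and \\xNN escape (braced or bare) with its character.

    Any other escape, such as \\n or \\\\, is copied through unchanged.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] != "\\" or i + 1 >= len(pattern):
            out.append(pattern[i])
            i += 1
            continue
        letter = pattern[i + 1]
        m = None
        if letter in ESCAPES:
            base, digits = ESCAPES[letter]
            m = digits.match(pattern, i + 2)
        if m is None:
            # Some other two-character escape: keep it as written.
            out.append(pattern[i : i + 2])
            i += 2
            continue
        value = int(m.group(1) or m.group(2), base)
        if value > 255:
            raise ValueError(f"character code {value} is above 255")
        out.append(chr(value))
        i = m.end()
    return "".join(out)


print(repr(expand_numeric_escapes(r"\z1234")))    # '{4'
print(repr(expand_numeric_escapes(r"\z{12}34")))  # '\x0c34'
```

With this model, \z1234 expands to { followed by 4 (ASCII 123 is {), matching the interpretation described above, while the braced form \z{12}34 yields a form feed followed by 34.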

Meta-characters

Certain characters have two meanings in regular expressions. Used alone, they do not stand for themselves; the sections below describe each meta-character's special meaning. To use a meta-character's literal meaning, put a \ before it (this is called "escaping" it). These characters are:

Meta-characters which must be escaped
\   |   *   ?
+   .   (   )
[   ]   {   }

Consider the regular expression 1+1. It does not produce the literal text 1+1, because + is a quantifier (see the section Quantifiers below): it means one or more 1s followed by another 1. To produce 1+1, you must escape the +, making the proper regular expression 1\+1.
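Escaping literal text can be automated. The helper below is a hypothetical Python sketch, not part of regldg: it prefixes each of the twelve meta-characters in the table above with a backslash. It does not treat ^ or - specially (they matter only inside character classes), and Python's own re.escape escapes a different set of characters, so it is not a substitute for that function.

```python
# The twelve meta-characters from the table above.
META_CHARACTERS = frozenset("\\|*?+.()[]{}")


def escape_literal(text: str) -> str:
    """Return text with every meta-character prefixed by a backslash."""
    return "".join("\\" + ch if ch in META_CHARACTERS else ch for ch in text)


print(escape_literal("1+1"))    # 1\+1
print(escape_literal("f(x)."))  # f\(x\)\.
```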


Meta-character classes


regldg understands the basic meta-character classes from Perl. In regldg, however, meta-character classes are subject to the constraints of the character universe and to how strictly the character universe is checked. For more information about the character universe, see the character universes documentation.

Meta-character class  Characters included                     Description
.                     Any character in the current character universe (including \n)
\d                    0123456789                              Digits
\D                    Any character in the current character universe,
                      excluding the members of \d
\s                    [SPACE][HT][VT][NL][FF]                 Whitespace
\S                    Any character in the current character universe,
                      excluding the members of \s
\w                    ABCDEFGHIJKLMNOPQRSTUVWXYZ              Alphanumerics and _
                      abcdefghijklmnopqrstuvwxyz0123456789_
\W                    Any character in the current character universe,
                      excluding the members of \w
\u{1}                 ABCDEFGHIJKLMNOPQRSTUVWXYZ              Uppercase letters
\u{2}                 abcdefghijklmnopqrstuvwxyz              Lowercase letters
\u{4}                 0123456789                              Digits
\u{8}                 !@#$%^&*                                Shift-with-numbers
\u{16}                ;`:'[SPACE],".?_                        Punctuation
\u{32}                (){}[]                                  Closures
\u{64}                ~\/|                                    Others
\u{128}               +-=<>                                   Math
\u{NNN}               NNN is a decimal number from 0 to 255 equal to the sum of
                      the pre-defined universe character class numbers. Because
                      each class number is a distinct power of two, every NNN
                      selects exactly one combination of classes. The resulting
                      character class is the union of all the included
                      pre-defined universe character classes.

                      Example: \u{233}
                      233 = 128 + 64 + 32 + 8 + 1
                      So, \u{233} is the union of \u{1}, \u{8}, \u{32}, \u{64},
                      and \u{128}.
\U{NNN}               NNN is a decimal number from 0 to 255, interpreted as for
                      \u{NNN}. The resulting character class is any character in
                      the current character universe, excluding the members of
                      the union of all the included pre-defined universe
                      character classes.

                      Example: \U{189}
                      189 = 128 + 32 + 16 + 8 + 4 + 1
                      So, \U{189} is any character in the current character
                      universe, excluding the members of the union of \u{1},
                      \u{4}, \u{8}, \u{16}, \u{32}, and \u{128}.
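Because the class numbers are distinct powers of two, decomposing NNN is a bitwise operation. The Python sketch below models \u{NNN} and \U{NNN} as sets. The function names are hypothetical, the character lists are copied from the table above, and the demonstration assumes a universe of printable ASCII (codes 32 to 126), which may differ from regldg's actual character universe settings.

```python
# Character lists copied from the \u{N} rows of the table above.
UNIVERSE_CLASSES = {
    1: "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    2: "abcdefghijklmnopqrstuvwxyz",
    4: "0123456789",
    8: "!@#$%^&*",
    16: ";`:' ,\".?_",
    32: "(){}[]",
    64: "~\\/|",
    128: "+-=<>",
}


def components(nnn: int) -> list[int]:
    """Split NNN (0-255) into the class numbers whose sum it is."""
    if not 0 <= nnn <= 255:
        raise ValueError("NNN must be between 0 and 255")
    # Each class number is a distinct power of two, so a bitwise AND picks
    # out exactly the classes that were added together.
    return [bit for bit in UNIVERSE_CLASSES if nnn & bit]


def u_class(nnn: int) -> set[str]:
    """Model of \\u{NNN}: the union of the selected pre-defined classes."""
    chars: set[str] = set()
    for bit in components(nnn):
        chars.update(UNIVERSE_CLASSES[bit])
    return chars


def negated_u_class(nnn: int, universe: set[str]) -> set[str]:
    """Model of \\U{NNN}: the given universe minus the \\u{NNN} union."""
    return universe - u_class(nnn)


printable = {chr(c) for c in range(32, 127)}  # assumed universe for this demo
print(components(233))  # [1, 8, 32, 64, 128]
print(components(189))  # [1, 4, 8, 16, 32, 128]
print(sorted(negated_u_class(189, printable)))  # lowercase letters plus / \ | ~
```

With these lists, u_class(255) is exactly the 95 printable ASCII characters, so the eight pre-defined classes partition printable ASCII without overlap.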


Groupings


Groupings, nested groupings, and backreferences to groupings are supported. Grouping characters together clarifies alternations and allows earlier patterns to be repeated (using backreferences and quantifiers) within a single output word.


  > regldg -m 35 "(firstpart)anotherpart(second(third)part)"
firstpartanotherpartsecondthirdpart
 


Alternations


An alternation, written with |, lets part of a regular expression produce "this" or "that".


  > regldg "ab|cd"
ab
cd
 

Alternations are often combined with groupings so that the rest of the regular expression is not part of the "this" or "that" choice.


  > regldg "fla(t|pper)"
flat
flapper
 

regldg also handles alternations with more than two branches: "this" or "that" or "that" or "that" or "that".


  > regldg "(spl|th|fl|r)at"
splat
that
flat
rat
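These outputs behave like a product of choices. The Python sketch below (the function expand is hypothetical, not regldg's internals) joins one alternative from each position. Its ordering, with the leftmost position varying fastest, was chosen to reproduce the listings on this page; regldg's general ordering rules are not documented here.

```python
from itertools import product


def expand(positions: list[list[str]]) -> list[str]:
    """Join one choice from every position, with the leftmost position varying fastest.

    Each position is a list of alternatives: literal text is a one-item list
    such as ["at"], and a group such as (spl|th|fl|r) lists its branches.
    """
    # product() varies its LAST argument fastest, so the positions are fed in
    # reverse and each combination is reversed back into reading order.
    return ["".join(reversed(combo)) for combo in product(*reversed(positions))]


print(expand([["fla"], ["t", "pper"]]))            # ['flat', 'flapper']
print(expand([["spl", "th", "fl", "r"], ["at"]]))  # ['splat', 'that', 'flat', 'rat']
```

A sequence of character classes fits the same model: expand([["a", "b"], ["c", "d"]]) gives ac, bc, ad, bd, the order shown for [ab][cd] in the Character classes section.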
 


Backreferences


Backreferences are placeholders that repeat, later in the same pattern, whatever an earlier grouping produced. Groupings are numbered in the order of their opening ( and can be referred to only after they have been closed with a ).


  > regldg -us 19 -m 46 "(Pat|Grandma) went to school today in \1's car\."
Pat went to school today in Pat's car.
Grandma went to school today in Grandma's car.
 


  > regldg -m 9 -us 19 "(a(b)c) \1 \2"
abc abc b
 

regldg includes an alternative backreference syntax. Instead of \1 for a backreference to grouping 1, you can use \!{1}. This completely avoids the ambiguity of whether \1 is a backreference or an octally specified character. (This assumes, of course, that you know this syntax; otherwise it may be completely confusing!) In action:


  > regldg -m 9 -us 19 "(a(b)c) \\!{1} \\!{2}"
abc abc b
 

Note the doubled backslashes: they were required for me to enter this regex in tcsh. To avoid this problem, you could use the command-line option --file=- and type the regex directly into the program instead.
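Python's re module numbers groups the same way, by the order of their opening parenthesis, so it can be used to check which grouping a backreference refers to. This sketch (group_texts is a hypothetical helper) uses re.fullmatch to test a candidate output against the pattern. It illustrates Python's regex engine, not regldg, and Python has no equivalent of regldg's \!{N} form.

```python
import re


def group_texts(pattern: str, text: str) -> dict[int, str]:
    """Map each group number to the text it captured when pattern fully matches text."""
    m = re.fullmatch(pattern, text)
    if m is None:
        raise ValueError("pattern does not match text")
    return {n: m.group(n) for n in range(1, m.re.groups + 1)}


# The outer group opens first, so it is group 1; the nested (b) is group 2.
print(group_texts(r"(a(b)c) \1 \2", "abc abc b"))  # {1: 'abc', 2: 'b'}
# A backreference repeats the captured text, so the two names must agree.
print(group_texts(r"(Pat|Grandma) went to school today in \1's car\.",
                  "Grandma went to school today in Grandma's car."))  # {1: 'Grandma'}
```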


Character classes


A character class represents all the possible characters for a single position in the output.


  > regldg "[ab][cd]"
ac
bc
ad
bd
 

Some meta-characters do not need to be escaped inside character classes: (, *, +, ?, {, [, |, ), and }. The \ and . characters must be escaped inside a character class. The range character - and the end-of-class character ] must be escaped unless they are the only character in the character class.


  > regldg -uc 0 "[(*+?{[|)}\\\-\]\.]"
(
*
+
?
{
[
|
)
}
\
-
]
.
 

regldg also supports negated character classes, that is, character classes starting with the ^ character. A negated character class represents all characters in the current character universe except those explicitly written in the negated character class.


  > regldg -us 2 "[^abcde]"
f
g
h
i
j
k
l
m
n
o
p
q
...
 

[-] and []] are both handled correctly: [-] is a character class containing only a - character, and []] is a character class containing only a ] character. Both are rather pointless, because a one-element character class can be replaced by that single character (written \] in the case of ]). In any other character class, the - and ] characters are meta-characters and need to be escaped.
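The rules for escapes, ranges, negation, and a lone - or ] can be summarized as a small parser. This is a hypothetical Python model, not regldg's implementation: it assumes a printable-ASCII universe by default, treats a backslash followed by any character as that literal character (so \d or \n inside a class are not modeled), and raises an error wherever this page says a character must be escaped.

```python
# Hypothetical model of the character-class rules described above; NOT
# regldg's parser. Assumptions: the default universe is printable ASCII
# (32-126), and "\c" inside a class means the literal character c.
UNIVERSE = "".join(chr(c) for c in range(32, 127))


def parse_class(body: str, universe: str = UNIVERSE) -> list[str]:
    """Expand a class body (the text between [ and ]) into its characters, in order."""
    negate = body.startswith("^")
    if negate:
        body = body[1:]
    if body in ("-", "]"):
        members = [body]  # [-] and []]: a lone - or ] needs no escape
    else:
        # Tokenize into (character, was_escaped) pairs.
        tokens = []
        i = 0
        while i < len(body):
            if body[i] == "\\" and i + 1 < len(body):
                tokens.append((body[i + 1], True))
                i += 2
            else:
                tokens.append((body[i], False))
                i += 1
        members = []
        k = 0
        while k < len(tokens):
            ch, escaped = tokens[k]
            if not escaped and ch in "\\.-]":
                raise ValueError(f"{ch!r} must be escaped inside a character class")
            # An unescaped - between two characters forms a range such as a-e.
            if k + 2 < len(tokens) and tokens[k + 1] == ("-", False):
                end = tokens[k + 2][0]
                members.extend(chr(c) for c in range(ord(ch), ord(end) + 1))
                k += 3
            else:
                members.append(ch)
                k += 1
    if negate:
        excluded = set(members)
        return [c for c in universe if c not in excluded]
    return members


print(parse_class("a-e"))                                  # ['a', 'b', 'c', 'd', 'e']
print(parse_class("^abcde", "abcdefghijklmnopqrstuvwxyz")[:3])  # ['f', 'g', 'h']
```

Applied to the class body from the earlier example, (*+?{[|)}\\\-\]\., the model returns the thirteen characters in the same order regldg listed them.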


Quantifiers


Quantifiers allow you to write a character, character class, meta-character class, or group once and have it occur a specified (possibly variable) number of times.

Quantifier  Meaning
*           The previous character, character class, meta-character class, or group
            occurs between 0 and unlimited times (inclusive). Unlimited is bounded
            by the maximum word length of the program.
+           The previous character, character class, meta-character class, or group
            occurs between 1 and unlimited times (inclusive). Unlimited is bounded
            by the maximum word length of the program.
?           The previous character, character class, meta-character class, or group
            occurs 0 or 1 times.
{2}         The previous character, character class, meta-character class, or group
            occurs exactly 2 times.
{1,3}       The previous character, character class, meta-character class, or group
            occurs between 1 and 3 times (inclusive).
{4,}        The previous character, character class, meta-character class, or group
            occurs between 4 and unlimited times (inclusive). Unlimited is bounded
            by the maximum word length of the program.

This page assumes you are familiar with how these quantifiers work. One important case, however, is a quantifier acting on a group that contains an alternation: each repetition chooses its side of the alternation independently, so a single output word can combine different sides. The regular expression (a|b){2} produces aa and bb, and also ab and ba. Shown in an explicit example:


  > regldg "(ab|cd){2}"
abab
cdab
abcd
cdcd
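Listing the words for a quantified alternation follows directly from this: each repetition picks any alternative independently. In the Python sketch below (repeat_group is hypothetical), words are grouped by repetition count, shortest count first, which is an assumption; within a count the leftmost repetition varies fastest, which reproduces the (ab|cd){2} listing above. Unbounded quantifiers such as * and + are not modeled, because regldg limits them by maximum word length rather than by a repetition count.

```python
from itertools import product


def repeat_group(alternatives: list[str], low: int, high: int) -> list[str]:
    """Model of (alt1|alt2|...){low,high}: every repetition picks any alternative."""
    if not 0 <= low <= high:
        raise ValueError("need 0 <= low <= high")
    words = []
    for count in range(low, high + 1):
        for combo in product(alternatives, repeat=count):
            # product() varies its last slot fastest; reversing each tuple
            # makes the first repetition vary fastest instead.
            words.append("".join(reversed(combo)))
    return words


print(repeat_group(["ab", "cd"], 2, 2))  # ['abab', 'cdab', 'abcd', 'cdcd']
print(repeat_group(["a", "b"], 2, 2))    # ['aa', 'ba', 'ab', 'bb']
```

For k alternatives repeated between m and n times, the model produces k^m + ... + k^n words; (ab|cd){1,2}, for instance, gives 2 + 4 = 6.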