regldg home

Current version: 1.0.1

What is regldg?

regldg, as it says above, is a regular expression grammar language dictionary generator. This means regldg can generate all possible strings of text that match a given pattern. This is the opposite of how regular expressions are usually used in other languages, most notably perl, where a regular expression is used to test whether a string you already have matches a pattern. Put another way: you give regldg a pattern, and it creates all the strings of text that match it; you give perl a pattern and a string to test, and perl tells you whether the string matches the pattern.
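To make the contrast concrete, here is a minimal sketch in Python (not regldg's own code). The `generate` helper and the list-of-choices representation of the pattern [ab][01] are assumptions made up for this illustration; only `re.fullmatch` is a real library call.

```python
import itertools
import re

def generate(parts):
    """Yield every string formed by picking one alternative from each part."""
    for combo in itertools.product(*parts):
        yield "".join(combo)

# Hypothetical stand-in for the pattern [ab][01]: one list of choices per position.
pattern_parts = [["a", "b"], ["0", "1"]]

# Generation (what regldg does): pattern in, all matching strings out.
generated = list(generate(pattern_parts))
print(generated)  # ['a0', 'a1', 'b0', 'b1']

# Matching (the usual use, as in perl): pattern and candidate string in, yes/no out.
print(bool(re.fullmatch(r"[ab][01]", "b1")))  # True
print(bool(re.fullmatch(r"[ab][01]", "c1")))  # False
```

Every string the generator produces is, by construction, one the matcher accepts.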

A regular expression is a concise way to define a pattern of text. As a simple example, let's say you have an English dictionary file on your computer, and you want to see all of the words that start with a and also have a p in them somewhere. Or, you want to list all words containing adjacent, duplicate letters. These two examples are very easy to accomplish using regular expressions.
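The two dictionary searches just described might look like this in Python. The word list is a small made-up sample standing in for a real dictionary file, and the variable names are chosen only for this sketch.

```python
import re

# A tiny stand-in word list; a real dictionary file would be read from disk.
words = ["apple", "banana", "ant", "adapt", "book", "cat", "happy"]

# Starts with "a" and has a "p" somewhere later in the word.
starts_a_has_p = re.compile(r"^a.*p")

# Any character immediately followed by the same character (a backreference).
doubled = re.compile(r"(.)\1")

a_p_words = [w for w in words if starts_a_has_p.search(w)]
doubled_words = [w for w in words if doubled.search(w)]

print(a_p_words)      # ['apple', 'adapt']
print(doubled_words)  # ['apple', 'book', 'happy']
```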

In computer science speak, a regular expression pattern defines a grammar. All words (called "strings" in the paragraphs above) that match the pattern are said to be in the language defined by the grammar. And as we all know, a book containing the words of a language is called a dictionary. Thus the title and function of the program are now clarified.

This website will not teach regular expressions, but will focus on their use with the regldg program. For more information on regular expressions, a simple Google search will yield (at the time of this writing) 48,500,000 results. My source for learning regular expressions was the perl man pages. (No, they're not only for men; "man" is short for manual.) Make sure to check out the sections entitled "perlrequick", "perlretut", and "perlre". On *nix systems, try man perlrequick and man perlretut. I can also recommend O'Reilly's book "Mastering Regular Expressions".

regldg is written in C, and should run (though this has not been tested) on any platform which has a C compiler. In other words, if you have a computer, you can use regldg.

Before you begin

Before you jump in, there are a few things to know. First, there are some differences between the regular expressions you find in perl and the regular expressions this program understands. Some of perl's regular expression features weren't required for my uses, could be expressed in other ways, or just didn't make sense, so they were not implemented. A brief summary of the differences is given below, but for a more comprehensive list, you'd do better to check out the documentation pages "regular expressions capabilities" and "regular expressions differences between perl and regldg".

Specific ASCII characters can be specified in perl in decimal, octal, hexadecimal, and wide hex formats. regldg can do all these except wide hex, but in slightly different ways. For example, in perl, \40 will produce a space character. (40 in octal is 32 in decimal, and character 32 in ASCII is a space.) In regldg, you can use \o40, \x20, or \z32 (z for decimal integer). This was done to remove the ambiguity of \1 in perl: is it an ASCII 1 character, or is it a backreference to the first grouping?

Some features of perl's regular expressions are missing. perl's or POSIX's named character classes like {IsUpper} or [:digit:] are not implemented in regldg. You can, however, create your own character classes like [A-Z] and [0-9], so that shouldn't be a big deal.

Also currently missing are advanced regular expression features like non-capturing groupings, zero-width assertions, and probably a bunch more things that I don't know about. These weren't necessary for my own applications, and could be added in later versions if necessary.

Finally, some regular expression features just don't make sense for use in regldg. Given the regular expressions a*? and a*, what difference would you expect in regldg's output? I don't know either, so I didn't implement it.
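As a small illustration of the \o, \x, and \z character escapes described above, the following Python sketch decodes each form to the character it names. The `decode_escape` helper is hypothetical and is not regldg's parser; in particular, how many digits regldg accepts after each prefix is not stated here, so this sketch simply takes everything after the prefix letter.

```python
def decode_escape(token):
    """Decode one \\o (octal), \\x (hex), or \\z (decimal) escape into a character."""
    bases = {"o": 8, "x": 16, "z": 10}
    if len(token) < 3 or token[0] != "\\" or token[1] not in bases:
        raise ValueError(f"not a recognised escape: {token!r}")
    # The prefix letter selects the base; the remaining digits give the code point.
    return chr(int(token[2:], bases[token[1]]))

for token in (r"\o40", r"\x20", r"\z32"):
    print(token, repr(decode_escape(token)))  # each line ends with ' '
```

Because each escape carries an explicit prefix letter, a bare \1 is left free to mean a backreference.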

As a final note, uniqueness in the output is not guaranteed. Quite the contrary: regular expressions like a{2,3}a{3,2} are guaranteed to produce duplicates.
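To see where such duplicates come from, here is a rough Python sketch of a naive enumeration over two adjacent quantifiers. It assumes both quantifiers allow 2 or 3 repetitions; how regldg itself reads the reversed range {3,2} is not stated here, so treat that as an assumption of this example. It also shows one way to filter duplicates after the fact.

```python
from collections import Counter

# Assumption: each quantifier allows 2 or 3 repetitions of "a".
first_counts = (2, 3)
second_counts = (2, 3)

# Enumerate every combination of repetition counts, as a naive generator would.
outputs = ["a" * n + "a" * m for n in first_counts for m in second_counts]
print(outputs)  # ['aaaa', 'aaaaa', 'aaaaa', 'aaaaaa']

# 2+3 and 3+2 both yield five a's, so that string appears twice.
duplicates = [s for s, count in Counter(outputs).items() if count > 1]
print(duplicates)  # ['aaaaa']

# Order-preserving de-duplication, if unique output is needed.
unique = list(dict.fromkeys(outputs))
print(unique)  # ['aaaa', 'aaaaa', 'aaaaaa']
```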

Meaning of the logo

O'Reilly's series of computer books uses a different animal for each topic. They use an owl for the cover of Mastering Regular Expressions. regldg takes regular expressions and processes them to the bone to extract every possible thing they mean. Hence the logo: an owl being processed in a meat grinder, resulting in a dictionary.

Origins of regldg

The development of regldg began under the hot Namibian sun for morally-correct reasons, which include an unreasonable person in a position of authority. She was in charge of a large branch of a volunteer organization, and strangely lacked the right experience, common sense, or concern for others. I just wanted to check my email once a month. I knew her password started with omu and was 7 characters long, and I only had access to a very old computer. Thus regldg was born.