November 28th, 2008

When Larry Wall set out to organize the RFCs submitted to him for regular expressions, he threw everything out and redid the whole thing. Grammars are the result. In Perl6, regular expressions become “first class citizens”: there is now a type for them, regex. In Perl5, regular expressions are simple strings, which made it hard to write readable ones, especially when you wanted to embed one regular expression inside another, and so on. Another nice thing about grammars is that they act like classes, so you can use inheritance.
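
To make the “first class citizen” idea concrete, here is a minimal sketch of regexes stored in variables and embedded in a larger regex. It is written against current Rakudo syntax, which is an assumption: the 2008 build used later in this post may not accept it. The variable names are made up for illustration.

```raku
# Regexes are values: they can be stored in variables and passed around.
# All names here are hypothetical, chosen only for this sketch.
my $digits = rx/ \d+ /;
my $word   = rx/ <[a..z]>+ /;

# Build a larger regex out of the smaller ones. <$var> uses the Regex
# object stored in the variable as a (non-capturing) subrule.
my $both = rx/ ^ <$word> <$digits> $ /;

say $digits.^name;                 # Regex
say so 'testing234234' ~~ $both;   # True
say so '234234testing' ~~ $both;   # False
```

Because the pieces are ordinary values, the same $digits could be reused in any number of other regexes without copying its text around.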

Another “side effect” of grammars is that they look like Backus-Naur Form (BNF), which is used to describe the syntax of programming languages. This makes it easy to implement a grammar for a programming language. That is powerful: it simplifies building syntax highlighting for a language in a text editor (one that is written in Perl6), and it also makes it easier to build your own interpreter for a language.
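
As a rough illustration of the BNF resemblance, here is a tiny grammar next to the BNF it mirrors. This is a sketch in current Rakudo syntax (an assumption, it has not been tried on the 2008 build mentioned below), and the grammar name Addition and its rules are invented for the example.

```raku
# BNF being mirrored:
#   <sum>    ::= <number> ( "+" <number> )*
#   <number> ::= <digit>+
# "Addition" and its rule names are hypothetical, for illustration only.
grammar Addition {
    token TOP    { <number> [ '+' <number> ]* }
    token number { \d+ }
}

my $m = Addition.parse('1+22+333');

# <number> matched several times, so $m<number> is a list of matches.
say $m<number>.map(*.Str).join(', ');                  # 1, 22, 333

# .parse must consume the whole input, so a dangling "+" is rejected.
say Addition.parse('1+') ?? 'parsed' !! 'no match';    # no match
```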

On to some code. The following has been tested with rakudo perl, revision 33317 running on top of parrot 0.8.1-devel.

Some things to keep in mind

  • ~~ is the smart match operator (Perl5 uses =~); $string !~~ regex is the same as !($string ~~ regex)
  • $/ when evaluated as a string is the whole string that matched ($& in Perl5)
  • say is a print with an added \n at the end.

my $string = "testing234234";
my $exp = token { \w+ };
say $/ if ($string ~~ $exp);

That prints the whole string, since all of it consists of word characters. Next, let’s try something a bit more interesting: separating the letters from the digits.

my $string = "testing234234";
token characters { <[a..z]>+ };
token digits { \d+ };
say $/ if ($string ~~ /<characters> <digits>/);

The above code will also print the string, since the match succeeds. But now we get to the interesting part: what if we want to know exactly what each token matched? In Perl5 we’d have to keep track of numbered capturing groups ($1, $2 and so on). Here we can try the following code:

my $string = "testing234234";
token characters { <[a..z]>+ };
token digits { \d+ };
if ($string ~~ /<characters> <digits>/) {
    say $/<characters>;
    say $/<digits>;
}

The output is:

testing
234234

The key here is that the smart match object returned in $/ is actually a tree. I have attached to this article an example grammar that matches a URL and also shows inheritance when applied to grammars. Some things to note about the code:

  • The zip operation does not currently work on lists of match objects (the infix Z on line 29)
  • Declaring the grammar as URL::HTTPS is URL does not currently work, hence the use of HTTPS is URL
  • !~~ does not work; you have to negate a positive match
  • There appears to be a bug requiring a semicolon after a grammar definition that comes before an if statement

The file: grammar.p6
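
The attached file is not reproduced here. As an independent sketch of the same two ideas (the match as a tree, and inheritance between grammars), something along these lines should work on current Rakudo. That is an assumption: it has not been run on the 2008 build above, and the rules are invented for this example rather than taken from grammar.p6.

```raku
# A deliberately simplified URL grammar; the rules are hypothetical and
# far looser than a real URL specification.
grammar URL {
    token TOP    { <scheme> '://' <host> <path>? }
    token scheme { 'http' }
    token host   { <label>+ % '.' }
    token label  { <[a..z 0..9 \-]>+ }
    token path   { [ '/' <[\w . \-]>* ]+ }
}

# The derived grammar inherits every rule and overrides only <scheme>.
grammar HTTPS is URL {
    token scheme { 'https' }
}

my $m = HTTPS.parse('https://www.example.org/docs/index.html');

say ~$m<scheme>;                                 # https
say ~$m<host>;                                   # www.example.org
# The match is a tree: <host> holds its own <label> submatches.
say $m<host><label>.map(*.Str).join(' | ');      # www | example | org
say ~$m<path>;                                   # /docs/index.html

# The base grammar still only accepts plain http.
say so URL.parse('https://www.example.org/');    # False
```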

What this is about.

November 28th, 2008

Opinions are like assholes, everyone has one.

Everyone and their mother has a blog. I want to contribute to this world-wide bandwidth waste with random crap, but most of it will be somewhat useful.