CYFPL - Part 2 - Inception 01-02-2017, 06:29 PM
#1

Hello all, and welcome to part 2 of the CYFPL series. Today we're going to start working on the fun stuff. We're going to define the rules that our language will live by.

You should remember from last time our non-terminals. Let's start working with those, shall we.

Firstly, I want to start with our factors, these will be the last non-terminal to be parsed before we hit a value (digit-seq or ident). Because these are the last, they will be the easiest to understand.

Let's hit the obvious ones first, a factor can be a number, or a variable name.

Perfect, now we have two cases to look at, so let's move up one step in the chain, to a term.

A term can obviously be a factor, so let's hammer that one off right away. But, aside from that, a term can also be made up of two parts. Those two parts are a factor mult-op another factor. Allowing us to be able to parse 5 * 6, which is handy.

It may not look it, but the order those are in is actually extremely important. See, a RDT works by parsing whichever combination matches first, and since we will only be matching in non-terminals, we aren't going to check if it was the last token in the statement. So, if we had the single factor line first, then the compiler would think that 5 * 5 = 5, not 25.

Now that we have the basics of a term down, let's move up again to an expression.

The obvious case for an expression is that it can be a single term, so we're going to handle that one last. At this point, you may be wondering why a term is not something + something, and this will probably be the most confusing part of the whole series for you. To comply with order of operations, we MUST compute multiplication before addition, therefore we need to look at multiplication after addition. This is because we are writing a descent parser, so code is actually processed backwards by group. So, an expression can also be a term add-op another term, therefore allowing us to handle order correctly.

Alright, now we are able to parse single terms properly, we have created the basis for a descent parser language. Pat yourself on the back, but come right back because we aren't finished yet!

Ok, so we have our basic math language in descent form, but we are trying to build a complete language, which is where the word recursive comes into play. See, right now all this language can handle is single term arithmetic (term used in sense of mathematics). So, let's look at it from the top to the bottom. We have:

However, looking at expression, an expression can easily contain an expression of its own, right?

Also, a term can also include another term.

perfect! If you spend a little time thinking about it, it will make sense why we did it that way. We won't need the extra case because each of those will ultimately evaluate to an integer.

Now, there is one last piece we are missing. Order of operations includes parenthesis. So, we're going to think about it. What does something in parens mean in mathematics?

It means to evaluate the expression independently.

So, that means that we need to include

in one of our steps.

If you remember from just a short while ago, that we execute from bottom up, it will make sense that we should parse this at the factor level, since it is our last hit. We will also scan for this first

Perfect!

Now, we have basically grammar definitions for a calculator, which is really all that a programming language is. However, ours is very basic and we don't have the ability to automate anything. We will start that off with variables.

This is where we will travel one more level up, to the statement.

Now, we have a full fledged calculator. However, we want to make this more of a language than a bad implementation of R. So, let's add a little more functionality.

We want the following to make it seem more code like:

I think those are all pretty obvious, except for printing text. So far, we haven't defined text-seq, though I'm sure you know what it means, just be aware of it.

So, let's look at our entire language spec at this point.

Awesome! We now have the exact grammar for <your name here> programming language. Just for fun, I want you to draw some trees of the language to test it out. At the same time, try and convert this set of trees into their statements.

Keep in mind, the terminology in this one is for a different spec, but the basics are identical.

Next up, let's start writing the code! I will be using the C language to build this parser, but I am happy to work with any compiled language (that means no python or VB) if you guys have issues understanding C code.

As always, thank you for reading this, feel free to leave replies, reputation, or hit me up on discord.

You should remember from last time our non-terminals. Let's start working with those, shall we.

Firstly, I want to start with our factors, these will be the last non-terminal to be parsed before we hit a value (digit-seq or ident). Because these are the last, they will be the easiest to understand.

Let's hit the obvious ones first, a factor can be a number, or a variable name.

Code:

`F -> digit-seq`

F -> ident

Perfect, now we have two cases to look at, so let's move up one step in the chain, to a term.

A term can obviously be a factor, so let's hammer that one off right away. But, aside from that, a term can also be made up of two parts. Those two parts are a factor mult-op another factor. Allowing us to be able to parse 5 * 6, which is handy.

Code:

`T -> F mult-op F`

T -> F

Now that we have the basics of a term down, let's move up again to an expression.

The obvious case for an expression is that it can be a single term, so we're going to handle that one last. At this point, you may be wondering why a term is not something + something, and this will probably be the most confusing part of the whole series for you. To comply with order of operations, we MUST compute multiplication before addition, therefore we need to look at multiplication after addition. This is because we are writing a descent parser, so code is actually processed backwards by group. So, an expression can also be a term add-op another term, therefore allowing us to handle order correctly.

Code:

`E -> T add-op T`

E -> T

Alright, now we are able to parse single terms properly, we have created the basis for a descent parser language. Pat yourself on the back, but come right back because we aren't finished yet!

Ok, so we have our basic math language in descent form, but we are trying to build a complete language, which is where the word recursive comes into play. See, right now all this language can handle is single term arithmetic (term used in sense of mathematics). So, let's look at it from the top to the bottom. We have:

Code:

`E -> T add-op T`

E -> T

T -> F mult-op F

T -> F

F -> digit-seq

F -> ident

However, looking at expression, an expression can easily contain an expression of its own, right?

Also, a term can also include another term.

Code:

`E -> T add-op E`

E -> T

T -> F mult-op T

T -> F

F -> digit-seq

F -> ident

Now, there is one last piece we are missing. Order of operations includes parenthesis. So, we're going to think about it. What does something in parens mean in mathematics?

It means to evaluate the expression independently.

So, that means that we need to include

Code:

`(E)`

If you remember from just a short while ago, that we execute from bottom up, it will make sense that we should parse this at the factor level, since it is our last hit. We will also scan for this first

Code:

`E -> T add-op E`

E -> T

T -> F mult-op T

T -> F

F -> (E)

F -> digit-seq

F -> ident

Perfect!

Now, we have basically grammar definitions for a calculator, which is really all that a programming language is. However, ours is very basic and we don't have the ability to automate anything. We will start that off with variables.

This is where we will travel one more level up, to the statement.

Code:

`S -> ident = E`

We want the following to make it seem more code like:

- Ability to create and delete variables at runtime

- Ability to read input

- Ability to write text

- Ability to write variables

- Ability to call functions

Code:

`S -> var ident`

S -> del ident

S -> read ident

S -> print ident

S -> print text-seq

S -> call ident

S -> return

So, let's look at our entire language spec at this point.

Code:

`S -> var ident`

S -> del ident

S -> read ident

S -> print ident

S -> print text-seq

S -> call ident

S -> return

S -> ident = E

E -> T add-op E

E -> T

T -> F mult-op T

T -> F

F -> (E)

F -> digit-seq

F -> ident

add-op -> + -

mult-op-> * / %

ident -> name

digit-seq-> 0123456789

text-seq-> string

Awesome! We now have the exact grammar for <your name here> programming language. Just for fun, I want you to draw some trees of the language to test it out. At the same time, try and convert this set of trees into their statements.

Keep in mind, the terminology in this one is for a different spec, but the basics are identical.

Next up, let's start writing the code! I will be using the C language to build this parser, but I am happy to work with any compiled language (that means no python or VB) if you guys have issues understanding C code.

As always, thank you for reading this, feel free to leave replies, reputation, or hit me up on discord.