Tutorial CYFA - Creating Your First Assembler - The Processing
So, in Part 8 - The Language we wrote all of the structures, enumerations, format declarations, etc. for our language. In a normal dev environment we would probably use yacc (Yet Another Compiler Compiler), but I wanted to show you the guts of how it works.

Before we begin
I realized that I made a mistake in Part 6 - Getting structured, dealing with the structure for handling Operand2 when using a register. Rather than explain all of the changes, I'll simply paste the new contents of instruction/data.h below. I'll also be pasting this same block in part 6 (if I can still edit it).
Code:
#ifndef INSTRUCTION_DATA_H_INCLUDED
#define INSTRUCTION_DATA_H_INCLUDED

#include <stdint.h>

/* fixes */
enum instruction_data_operand2_register_shift_type
{
   kDATA_OP2_LOGIC_LEFT      = 0x0,
   kDATA_OP2_LOGIC_RIGHT     = 0x1,
   kDATA_OP2_LOGIC_ARR_RIGHT = 0x2,
   kDATA_OP2_LOGIC_ROT_RIGHT = 0x3,
};
struct instruction_data_operand_register_shift_imm
{
   uint8_t hard0: 0x1; // set 0x0
   uint8_t type : 0x2; // see enum instruction_data_operand2_register_shift_type
   uint8_t imm  : 0x5; // shift amount
};
struct instruction_data_operand_register_shift_reg
{
   uint8_t hard1 : 0x1; // set 0x1
   uint8_t type  : 0x2; // see enum instruction_data_operand2_register_shift_type
   uint8_t hard0 : 0x1; // set 0x0
   uint8_t reg   : 0x4; // register holding value
};
union instruction_data_operand2_register_high
{
   struct instruction_data_operand_register_shift_imm imm;
   struct instruction_data_operand_register_shift_reg reg;
};
struct instruction_data_operand2_register
{
   uint8_t Rm : 0x4; // op2 as register
   union instruction_data_operand2_register_high shift;
};
/* operand2 for data processing */
struct instruction_data_operand2_immediate
{
   uint8_t     imm;           // 8-bit unsigned immediate value to be rotated
   uint8_t     rotate  : 0x4; // rotate value (multiplied by 2 by CPU)
};
union instruction_data_operand2
{
   struct instruction_data_operand2_immediate imm;
   struct instruction_data_operand2_register reg;
};
/* structure for instruction */
struct instruction_data
{
   uint8_t     cond    : 0x4; // condition
   uint8_t     hard    : 0x2; // set 0x0
   uint8_t     I       : 0x1; // immediate bit
   uint8_t     opcode  : 0x4; // operation code
   uint8_t     S       : 0x1; // condition set
   uint8_t     Rn      : 0x4; // 1st operand reg
   uint8_t     Rd      : 0x4; // destination reg
   union instruction_data_operand2 op2; // 2nd operand (imm=0 : reg, imm=1 : imm)
};

#endif
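
To make the operand2 layout a bit more concrete, here is a minimal sketch of how the two register-shift forms could be packed into the low 12 bits of an instruction using plain shifts and masks instead of bit fields. The helper names op2_reg_shift_imm and op2_reg_shift_reg are my own (they are not part of the tutorial's code), and the enum below simply mirrors the values of instruction_data_operand2_register_shift_type.

```c
#include <stdint.h>

/* Same values as instruction_data_operand2_register_shift_type above. */
enum shift_type
{
    SHIFT_LSL = 0x0, /* logical left */
    SHIFT_LSR = 0x1, /* logical right */
    SHIFT_ASR = 0x2, /* arithmetic right */
    SHIFT_ROR = 0x3  /* rotate right */
};

/* Hypothetical helper: Rm shifted by an immediate amount.
 * Bits 11-7: shift amount, bits 6-5: shift type, bit 4: 0, bits 3-0: Rm. */
static uint16_t op2_reg_shift_imm(uint8_t rm, enum shift_type type, uint8_t amount)
{
    return (uint16_t)(((amount & 0x1Fu) << 7) | ((type & 0x3u) << 5) | (rm & 0xFu));
}

/* Hypothetical helper: Rm shifted by the value held in register Rs.
 * Bits 11-8: Rs, bit 7: 0, bits 6-5: shift type, bit 4: 1, bits 3-0: Rm. */
static uint16_t op2_reg_shift_reg(uint8_t rm, enum shift_type type, uint8_t rs)
{
    return (uint16_t)(((rs & 0xFu) << 8) | ((type & 0x3u) << 5) | (1u << 4) | (rm & 0xFu));
}
```

For example, op2_reg_shift_imm(0, SHIFT_LSL, 1) describes "R0, LSL #1". Writing the shifts out by hand like this avoids depending on how a particular compiler lays out bit fields, at the cost of a little more typing.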

So, in this part we're going to start writing the rules, as well as some of the processing. First, a note: our parser will not accept a rule unless it starts with an opcode. This means that we can set up category rules to catch and organize things. Let's go ahead and write 3 basic rules. These will be category rules, meaning that they group our instructions into our 3 different types at the top level. Remember, these types are as follows:
  1. Data Processing
  2. Single Data Transfer
  3. Branch
So, let's go ahead and navigate to language/rules.c. I'll walk you through the first one.
First, we will need to define (and undefine at the end) two macros. These make it possible for us to initialize our list statically. They are as follows:
Code:
#define SYN_SRT (enum language_token[]) {
#define SYN_END }
We'll undefine these after our rules block, so don't worry.
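
If the macro trick looks like magic, here is a small stand-alone sketch of what it expands to. SYN_SRT and SYN_END simply wrap the token list in a C99 compound literal, which decays to a pointer to its first element. The enum and the demo_rule struct here are cut-down stand-ins I made up for the example, not the real ones from language.h.

```c
/* Hypothetical stand-ins for the tutorial's token enum and rule struct. */
enum language_token { kTOKEN_MNEMONIC, kTOKEN_REGISTER, kTOKEN_CHARACTER };

struct demo_rule
{
    const char *name;
    enum language_token *syntax;
};

#define SYN_SRT (enum language_token[]) {
#define SYN_END }

/* SYN_SRT ... SYN_END expands to (enum language_token[]) { ... }.
 * At file scope the compound literal has static storage, so its address
 * is a valid constant initializer for the pointer member. */
static struct demo_rule demo_rules[] =
{
    {
        "DEMO",
        SYN_SRT
            kTOKEN_MNEMONIC,
            kTOKEN_REGISTER,
            kTOKEN_CHARACTER,
            kTOKEN_REGISTER
        SYN_END
    }
};

#undef SYN_END
#undef SYN_SRT
```

After this, demo_rules[0].syntax[0] is kTOKEN_MNEMONIC. Note that a plain pointer carries no length, so the parser still needs some way of knowing how many tokens a rule has.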

Next, we need to modify language/constants.h
I mistakenly told you previously that these should all be const static uint32_t when actually they need to be macros. Go ahead and replace with this text:
Code:
#ifndef CONSTANTS_H_INCLUDED
#define CONSTANTS_H_INCLUDED

/* each mask is parenthesised so it behaves as a single value in expressions */
#define C_OPCODES_DP ( \
   kOPCODE_AND | \
   kOPCODE_EOR | \
   kOPCODE_SUB | \
   kOPCODE_RSB | \
   kOPCODE_ADD | \
   kOPCODE_ADC | \
   kOPCODE_SBC | \
   kOPCODE_RSC | \
   kOPCODE_TST | \
   kOPCODE_TEQ | \
   kOPCODE_CMP | \
   kOPCODE_CMN | \
   kOPCODE_ORR | \
   kOPCODE_MOV | \
   kOPCODE_BIC | \
   kOPCODE_MVN )
#define C_OPCODES_DT ( \
   kOPCODE_LDR | \
   kOPCODE_STR )
#define C_OPCODES_BR ( \
   kOPCODE_B   | \
   kOPCODE_BL )

#endif // CONSTANTS_H_INCLUDED
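
Here is a rough sketch of how a mask like this might be checked later on, assuming each opcode is a single bit (one-hot). The kDEMO_* values, the DEMO_* masks and opcode_allowed are all made up for the example; the real values live in the enum from Part 8. The demo masks are wrapped in parentheses so that an expression such as `x & DEMO_OPCODES_DP` applies to the whole mask rather than just its first term.

```c
#include <stdint.h>

/* Assumed one-hot values; the real enum language_opcode comes from Part 8. */
enum demo_opcode
{
    kDEMO_AND = 1u << 0,
    kDEMO_EOR = 1u << 1,
    kDEMO_LDR = 1u << 16,
    kDEMO_STR = 1u << 17
};

/* Parenthesised so each mask acts as one value inside larger expressions. */
#define DEMO_OPCODES_DP (kDEMO_AND | kDEMO_EOR)
#define DEMO_OPCODES_DT (kDEMO_LDR | kDEMO_STR)

/* Returns 1 if the single opcode bit is present in the rule's mask. */
static int opcode_allowed(uint32_t allowed_opcodes, uint32_t opcode)
{
    return (allowed_opcodes & opcode) != 0;
}
```

So opcode_allowed(DEMO_OPCODES_DP, kDEMO_EOR) is true, while opcode_allowed(DEMO_OPCODES_DP, kDEMO_LDR) is false.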
Remember our format:
Code:
struct language_rule
{
   char *name;
   enum instruction_type type;
   enum language_token *syntax;
   char *characters;
   uint32_t allowed_opcodes;
};
I like to paste this block (in comments) in the file I'm working with when I write this, so that I don't lose track of anything. Don't worry, I'll also comment the fields.
Let's go over the fields real quick:
Code:
Name : Let's call it CAT_DP. This stands for Category Data Processing. Pretty straightforward
Type : Data processing
Syntax : opcode{.cond} rd,rn,rm
These ones are the basic ones. So let's go ahead and set the name and type:
Code:
{
       "CAT_DP",           /* .name */
       kINSTRUCTION_DATA,  /* .type */
OK. Now, let's break down the syntax into tokens.
Code:
opcode{.cond}
Now, we have two tokens here, technically 3. We have
Code:
OPCODE
CHARACTER
CONDITION
Now, we actually wrote a shorthand for this, it's the token kTOKEN_MNEMONIC, which can either evaluate to a straight opcode, or an opcode and a condition. This is the key to category rules. So that structure looks like this:
Code:
SYN_SRT                   /* .syntax */
kTOKEN_MNEMONIC,
kTOKEN_REGISTER,
kTOKEN_CHARACTER,
kTOKEN_REGISTER,
kTOKEN_CHARACTER,
kTOKEN_REGISTER
SYN_END
I apologize for the formatting....it's acting weird.
Ok. So next we need to fill in the characters, because we have 2 of those. We know these two are commas.
Lastly, what are our allowed opcodes? Well, it's all 16 of the data processing ones. Rather than writing them all out there, we have a handy constant that we made in the last one, called C_OPCODES_DP. Our final rule should look like this:
Code:
{
   "CAT_DP",           /* .name */
   kINSTRUCTION_DATA,  /* .type */
   SYN_SRT             /* .syntax */
       kTOKEN_MNEMONIC,
       kTOKEN_REGISTER,
       kTOKEN_CHARACTER,
       kTOKEN_REGISTER,
       kTOKEN_CHARACTER,
       kTOKEN_REGISTER
   SYN_END,
   ",,",              /* .characters */
   C_OPCODES_DP       /* .allowed_opcodes */
}

Cool. Now, I'm going to write the rules for DT and BR, feel free to try writing these on your own as well. The final file looks like this:
Code:
#include <stdlib.h>
#include <language.h>

struct language_rule rules[] =
{
   {
       "CAT_DP",           /* .name */
       kINSTRUCTION_DATA,  /* .type */
       SYN_SRT             /* .syntax */
           kTOKEN_MNEMONIC,
           kTOKEN_REGISTER,
           kTOKEN_CHARACTER,
           kTOKEN_REGISTER,
           kTOKEN_CHARACTER,
           kTOKEN_REGISTER
       SYN_END,
       ",,",              /* .characters */
       C_OPCODES_DP       /* .allowed_opcodes */
   },
   {
       "CAT_DT",               /* .name */
       kINSTRUCTION_TRANSFER,  /* .type */
       SYN_SRT                 /* .syntax */
           kTOKEN_MNEMONIC,
           kTOKEN_REGISTER,
           kTOKEN_CHARACTER,
           kTOKEN_REGISTER,
           kTOKEN_CHARACTER,
           kTOKEN_CONSTANT
       SYN_END,
       ",,",                  /* .characters */
       C_OPCODES_DT           /* .allowed_opcodes */
   },
   {
       "CAT_BR",             /* .name */
       kINSTRUCTION_BRANCH,  /* .type */
       SYN_SRT               /* .syntax */
           kTOKEN_MNEMONIC,
           kTOKEN_CONSTANT
       SYN_END,
       NULL,                /* .characters */
       C_OPCODES_BR         /* .allowed_opcodes */
   }
};

Cool. Now, in order for this to work, we need to hard-code 2 rules (not the same kind of rule as above) into our assembler:
  1. Every line must begin by matching a rule whose Syntax[0] is kTOKEN_MNEMONIC
  2. Every matched rule must have Syntax[0] = kTOKEN_OPCODE before being passed to the encoder
What this basically says is that a line must match a category rule, and then be processed from there. This will prevent us from having to write seriously complex rules when we start adding more instructions later, and allow us to have a clear debug path.
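
As a rough idea of what those two hard-coded checks could look like once the parser exists, here is a minimal sketch. The function names and the cut-down token enum are hypothetical, I have not written the real versions yet.

```c
#include <stddef.h>

/* Hypothetical subset of the token enum, for illustration only. */
enum language_token { kTOKEN_MNEMONIC, kTOKEN_OPCODE, kTOKEN_REGISTER };

/* Hard-coded rule 1: a line is first matched against a category rule,
 * i.e. a rule whose first syntax token is a mnemonic. */
static int is_category_rule(const enum language_token *syntax)
{
    return syntax != NULL && syntax[0] == kTOKEN_MNEMONIC;
}

/* Hard-coded rule 2: only rules that start with a bare opcode
 * may be handed to the encoder. */
static int is_encodable_rule(const enum language_token *syntax)
{
    return syntax != NULL && syntax[0] == kTOKEN_OPCODE;
}
```

The idea is that a line first has to pass is_category_rule on some rule, and only a rule that passes is_encodable_rule ever reaches the encoder.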

Ok, now I want to make a few changes to our rules list. These categories match the most basic form and operate only on registers. We actually need 4 rules to match it all. First, data processing has two forms: one that takes a register as the third argument, and one that takes an offset. We need two different rules for this. Secondly, since these are category rules, we should avoid using kTOKEN_CONSTANT, because that requires the user to know the constant up front and disallows simple math operators, which are handy. Let's go ahead and change that to kTOKEN_EXPRESSION, which will allow us to use a constant, a simple equation, or even the name of a label (which is really useful for branching into subroutines). My updated file looks like this:
Code:
#include <stdlib.h>
#include <language.h>

#define SYN_SRT (enum language_token[]) {
#define SYN_END }

struct language_rule rules[] =
{
   /* category rules */
   { /* CATEGORY RULE for Data Processing (with register) */
       "CAT_DP_REG",       /* .name */
       kINSTRUCTION_DATA,  /* .type */
       SYN_SRT             /* .syntax */
           kTOKEN_MNEMONIC,
           kTOKEN_REGISTER, /* Rd */
           kTOKEN_CHARACTER,
           kTOKEN_REGISTER, /* Rn */
           kTOKEN_CHARACTER,
           kTOKEN_REGISTER  /* Rm */
       SYN_END,
       ",,",              /* .characters */
       (uint32_t)C_OPCODES_DP       /* .allowed_opcodes */
   },
   { /* CATEGORY RULE for Data Processing (with offset) */
       "CAT_DP_OFF",       /* .name */
       kINSTRUCTION_DATA,  /* .type */
       SYN_SRT             /* .syntax */
           kTOKEN_MNEMONIC,
           kTOKEN_REGISTER,   /* Rd */
           kTOKEN_CHARACTER,
           kTOKEN_REGISTER,   /* Rn */
           kTOKEN_CHARACTER,
           kTOKEN_EXPRESSION  /* Offset */
       SYN_END,
       ",,",              /* .characters */
       C_OPCODES_DP       /* .allowed_opcodes */
   },
   { /* CATEGORY RULE for Single Data Transfer */
       "CAT_DT",               /* .name */
       kINSTRUCTION_TRANSFER,  /* .type */
       SYN_SRT                 /* .syntax */
           kTOKEN_MNEMONIC,
           kTOKEN_REGISTER,   /* Rd */
           kTOKEN_CHARACTER,
           kTOKEN_REGISTER,   /* Rn */
           kTOKEN_CHARACTER,
           kTOKEN_EXPRESSION  /* offset */
       SYN_END,
       ",,",                  /* .characters */
       C_OPCODES_DT           /* .allowed_opcodes */
   },
   { /* CATEGORY RULE for Branches */
       "CAT_BR",             /* .name */
       kINSTRUCTION_BRANCH,  /* .type */
       SYN_SRT               /* .syntax */
           kTOKEN_MNEMONIC,
           kTOKEN_EXPRESSION /* offset */
       SYN_END,
       NULL,                /* .characters */
       C_OPCODES_BR         /* .allowed_opcodes */
   }
   /* parsing rules */
};

#undef SYN_END
#undef SYN_SRT

Just for the sake of being organized, I'm going to add some comments to separate category rules from parsing rules and the null rule. I won't repaste the code, you'll see it in the next paste.



Ok, I want to take a quick break to apologize for the number of errors I found and corrected without much explanation in this and the last part. Like I've said before, I only have a general plan about how this should all fit together, so occasionally I'm off by a little bit. Ok, now back to the tutorial.



So, now it's time to write some rules. I want to segment them so that we have some more valuable debug information, even though we don't have to. I'll separate the data processing instructions into three groups. The groups will be as follows:
Code:
LGC
   AND
   EOR
   ORR
   BIC
   MVN
CMP
   TST
   TEQ
   CMP
   CMN
ARR
   SUB
   RSB
   ADD
   ADC
   SBC
   RSC

So, let's go into language/constants.h and create these three groups:
Code:
#define CG_OPCODES_LGC ( \
   kOPCODE_AND | \
   kOPCODE_EOR | \
   kOPCODE_ORR | \
   kOPCODE_BIC | \
   kOPCODE_MVN )
#define CG_OPCODES_CMP ( \
   kOPCODE_TST | \
   kOPCODE_TEQ | \
   kOPCODE_CMP | \
   kOPCODE_CMN )
#define CG_OPCODES_ARR ( \
   kOPCODE_SUB | \
   kOPCODE_RSB | \
   kOPCODE_ADD | \
   kOPCODE_ADC | \
   kOPCODE_SBC | \
   kOPCODE_RSC )
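
Here is a small sketch of how these group masks could be used to tell which group an opcode belongs to, again assuming one bit per opcode. The kOP_* values, the G_* masks and dp_group_of are placeholders for the example only. One thing the sketch makes visible: as the groups are listed above, MOV does not fall into any of the three.

```c
#include <stdint.h>

/* Assumed one-hot encoding, one bit per data processing opcode. */
enum demo_opcode
{
    kOP_AND = 1u << 0,  kOP_EOR = 1u << 1,  kOP_SUB = 1u << 2,  kOP_RSB = 1u << 3,
    kOP_ADD = 1u << 4,  kOP_ADC = 1u << 5,  kOP_SBC = 1u << 6,  kOP_RSC = 1u << 7,
    kOP_TST = 1u << 8,  kOP_TEQ = 1u << 9,  kOP_CMP = 1u << 10, kOP_CMN = 1u << 11,
    kOP_ORR = 1u << 12, kOP_MOV = 1u << 13, kOP_BIC = 1u << 14, kOP_MVN = 1u << 15
};

#define G_LGC (kOP_AND | kOP_EOR | kOP_ORR | kOP_BIC | kOP_MVN)
#define G_CMP (kOP_TST | kOP_TEQ | kOP_CMP | kOP_CMN)
#define G_ARR (kOP_SUB | kOP_RSB | kOP_ADD | kOP_ADC | kOP_SBC | kOP_RSC)

enum dp_group { GROUP_NONE, GROUP_LGC, GROUP_CMP, GROUP_ARR };

/* Hypothetical helper: map a single opcode bit to its group. */
static enum dp_group dp_group_of(uint32_t opcode)
{
    if (opcode & G_LGC) return GROUP_LGC;
    if (opcode & G_CMP) return GROUP_CMP;
    if (opcode & G_ARR) return GROUP_ARR;
    return GROUP_NONE; /* e.g. MOV, which is in none of the three groups */
}
```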

Now, the only instructions we don't yet have more defined groups for are
Code:
LDR, STR,
B, BL
That's ok. Because there are only two instructions per group for these already, we can just use their opcode specifiers when we write their rules. Let's go ahead and write the basic rule for the logic (LGC) group. While we're at it, let's go ahead and remove that cast from the rule "CAT_DP_REG".
Ok, so our new rule is not a category rule. This means it will not start with a mnemonic, but rather with an opcode. It can still end in an expression if we want it to, though. Since this is our basic rule, it will take the most basic form of the instruction: an opcode with no condition (AKA condition=AL) that takes 3 registers as arguments and does not evaluate expressions. The three-register form of an instruction takes an optional shift to be applied to the value in the last register. Examples are:
Code:
EOR R0, R0, R0 ;set R0=0
EOR R1, R1, R1 ;set R1=0
ADD R0, R0, #2 ;set R0=2 (immediate form, used here for setup)
ADD R1, R1, R0, #1 ;R1 = R1 + (R0<<1) = 0 + (2<<1) = 4
;it can also take the following:
ADD R1, R1, R0, LSL #1 ;R1 = R1 + (R0<<1)
ADD R1, R1, R0, LSR #1 ;R1 = R1 + (R0>>1)
ADD R1, R1, R0, ASR #1 ;R1 = R1 + (R0 arithmetic shifted right 1 bit)
ADD R1, R1, R0, ROR #1 ;R1 = R1 + (R0 rotated right 1 bit)
So, for our absolute base rule, we will use only the default form, which is three registers, no shift or rotation applied. Here is the code I wrote for the logic group:
Code:
{
   "AB_DP_LGC",        /* .name */
   kINSTRUCTION_DATA,  /* .type */
   SYN_SRT             /* .syntax */
       kTOKEN_OPCODE,
       kTOKEN_REGISTER,
       kTOKEN_CHARACTER,
       kTOKEN_REGISTER,
       kTOKEN_CHARACTER,
       kTOKEN_REGISTER
   SYN_END,
   ",,",               /* .characters */
   CG_OPCODES_LGC
}
As you can see, I gave the name a new prefix, AB. This stands for "Absolute Base", and it will help us when we are reading the match output, or just with general organization. Now, let's go ahead and write the code for the other two (CMP and ARR). I advise that you try to write these rules yourself; there will only be a couple of changes to my code above, and it will be good practice.
Code:
/* DP AB Rules */
{ /* ABSOLUTE BASE RULE for Data Processing (logic group) */
   "AB_DP_LGC",        /* .name */
   kINSTRUCTION_DATA,  /* .type */
   SYN_SRT             /* .syntax */
       kTOKEN_OPCODE,
       kTOKEN_REGISTER,
       kTOKEN_CHARACTER,
       kTOKEN_REGISTER,
       kTOKEN_CHARACTER,
       kTOKEN_REGISTER
   SYN_END,
   ",,",               /* .characters */
   CG_OPCODES_LGC      /* .allowed_opcodes */
},
{ /* ABSOLUTE BASE RULE for Data Processing (comparison group) */
   "AB_DP_CMP",        /* .name */
   kINSTRUCTION_DATA,  /* .type */
   SYN_SRT             /* .syntax */
       kTOKEN_OPCODE,
       kTOKEN_REGISTER,
       kTOKEN_CHARACTER,
       kTOKEN_REGISTER,
       kTOKEN_CHARACTER,
       kTOKEN_REGISTER
   SYN_END,
   ",,",               /* .characters */
   CG_OPCODES_CMP      /* .allowed_opcodes */
},
{ /* ABSOLUTE BASE RULE for Data Processing (arithmetic group) */
   "AB_DP_ARR",        /* .name */
   kINSTRUCTION_DATA,  /* .type */
   SYN_SRT             /* .syntax */
       kTOKEN_OPCODE,
       kTOKEN_REGISTER,
       kTOKEN_CHARACTER,
       kTOKEN_REGISTER,
       kTOKEN_CHARACTER,
       kTOKEN_REGISTER
   SYN_END,
   ",,",               /* .characters */
   CG_OPCODES_ARR      /* .allowed_opcodes */
}
Perfect, now our (unwritten) parser should be able to match this form. At this point, it can match every instruction we need to write a basic program. While we're at it, let's write the absolute base rules for DT and BR:
Code:
/* DT/BR AB Rules */
{ /* ABSOLUTE BASE RULE for Single Data Transfer */
   "AB_DT",                    /* .name */
   kINSTRUCTION_TRANSFER,      /* .type */
   SYN_SRT                     /* .syntax */
       kTOKEN_OPCODE,
       kTOKEN_REGISTER,
       kTOKEN_CHARACTER,
       kTOKEN_REGISTER,
       kTOKEN_CHARACTER,
       kTOKEN_CONSTANT
   SYN_END,
   ",,",                       /* .characters */
   kOPCODE_LDR | kOPCODE_STR   /* .allowed_opcodes */
},
{ /* ABSOLUTE BASE RULE for Branches */
   "AB_BR",                /* .name */
   kINSTRUCTION_BRANCH,    /* .type */
   SYN_SRT                 /* .syntax */
       kTOKEN_OPCODE,
       kTOKEN_CONSTANT     /* offset */
   SYN_END,
   NULL,                   /* .characters */
   kOPCODE_B | kOPCODE_BL  /* .allowed_opcodes */
}
At this point, it should look (organizationally speaking) like this:
[Image: RRuThHD.png]



At this point, I want to take a break from writing rules and start working on the parser. I'm going to start in language/parsing.h
[Image: zhoeVPa.png]

Here we have the three structures that will help us decode and match our instruction lines, but we aren't actually doing anything with them yet, let's change that. Right now, we don't really have any place to hold a mnemonic, but we should just to stay consistent.
Code:
struct language_parsing_mnemonic
{
   struct language_parsing_opcode      opcode;
   struct language_parsing_condition   condition;
   uint8_t                             setflags : 1;
};
Now, I want to make a couple of changes to this file at this point.
1. We don't need to call things reg_name, opcode_value, etc. We can just use name and value. This makes it a lot easier to remember later on.
2. We don't actually need to embed the whole language_parsing structures in the mnemonic; doing so would tempt us to modify them, which we shouldn't.
Here is the new file:
Code:
#ifndef PARSING_H_INCLUDED
#define PARSING_H_INCLUDED

#include <stdint.h>

struct language_parsing_register
{
   char       *name;
   uint8_t     value : 4; // register number
};

struct language_parsing_opcode
{
   char                   *name;
   enum language_opcode    value; /* DO NOT LOR THESE! */
};

struct language_parsing_condition
{
   char                       *name;
   enum instruction_condition  value;
};

/* same base type for every field, so they can share one 32-bit unit */
struct language_parsing_mnemonic
{
   uint32_t opcode     : 27;
   uint32_t condition  : 4;
   uint32_t setflags   : 1;
};

#endif // PARSING_H_INCLUDED
Now, why did I use bit fields there? Well, a couple of reasons. With bit fields, the three members add up to exactly 32 bits, so on common compilers the whole structure fits in a single 32-bit word (strictly speaking, bit-field layout is implementation-defined). This makes the structure cheap to read and also saves space in memory. We are currently using 20 bits to hold our instructions, so I gave it an extra 7 bits (room for 7 more instructions) just so we don't have to adjust. If we need more than that, then we will lose our benefit, but for now it's worth it.
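
If you want to convince yourself of the size, here is a small sketch that mirrors the mnemonic layout with one base type for every field and fills it in. demo_mnemonic and make_mnemonic are names I made up for the example. The packing is still up to the compiler, so treat the 4-byte size as something to check on your toolchain rather than a guarantee.

```c
#include <stdint.h>

/* Illustrative copy of the mnemonic layout: 27 + 4 + 1 = 32 bits. */
struct demo_mnemonic
{
    uint32_t opcode    : 27; /* one-hot opcode bit */
    uint32_t condition : 4;  /* condition code */
    uint32_t setflags  : 1;  /* 1 if the S suffix was present */
};

/* Hypothetical helper that fills the structure. */
static struct demo_mnemonic make_mnemonic(uint32_t opcode, uint32_t cond, int s)
{
    struct demo_mnemonic m;
    m.opcode = opcode;
    m.condition = cond;
    m.setflags = s ? 1u : 0u;
    return m;
}
```

On GCC and Clang, sizeof(struct demo_mnemonic) comes out as 4 with this layout.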
Secondly, what is this setflags bit?
Well, every data processing instruction has the ability to act as a comparison instruction at the end, by updating the status register with its result. For example, the following C code:
Code:
void a(int input)
{
    int a = 5;
    a -= input;
    if (a == 0)
         b();
}
would normally look like this (without the set bit)
Code:
a:
 EOR R0, R0, R0 ;clear R0 (a)
 ADD R0, R0, #5 ;set a=5
 SUB R0, R0, R1 ;a -= R1 (input)
 EOR R2, R2, R2 ;clear R2 (temp)
 TEQ R0, R2 ;Rd is ignored anyways, im not even going to fill it
 BLEQ b ;if equal, call function b
 RET ;return from frame

But, if we use the set bit, we can lose a few lines of code. The set bit updates the status register AFTER the operation is completed. TST and TEQ are instructions that do nothing but update the status register: TST ANDs its operands and TEQ EORs them, and both throw the result away. TEQ sets the Z (equality) bit when its two operands are equal. Here's the code with the set bit:
Code:
a:
 EOR R0, R0, R0 ;clear R0 (a)
 ADD R0, R0, #5 ;set a=5
 SUBS R0, R0, R1 ;a -= R1 (input), set status register bits with result
 BLEQ b ;if equal, call function b
 RET ;return from frame
Not only is that sequence shorter (and thus denser and easier to read), but it also saves 2 whole instructions, which is roughly 2 clock cycles if we assume one cycle each! That may not seem like a lot, but imagine this is in a loop from 0 to 5000. That means it saves 10k clock cycles. At a clock rate of 700MHz (pretty common for ARM), that is about 14 microseconds. See, they add up.
So, that's what the set bit does, and why you need it.
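
To show where the set bit actually ends up, here is a minimal sketch that packs a data processing instruction with shifts and masks. encode_dp is a hypothetical helper, not our real encoder, and its opcode argument is the 4-bit hardware opcode (SUB is 0x2), not the one-bit-per-opcode value the parser uses.

```c
#include <stdint.h>

/* Field positions of an ARM data processing instruction:
 * cond 31-28, I 25, opcode 24-21, S 20, Rn 19-16, Rd 15-12, operand2 11-0.
 * Bits 27-26 stay 0 for this instruction class. */
static uint32_t encode_dp(uint32_t cond, uint32_t imm, uint32_t opcode,
                          uint32_t s, uint32_t rn, uint32_t rd, uint32_t op2)
{
    return ((cond & 0xFu) << 28) |
           ((imm & 0x1u) << 25) |
           ((opcode & 0xFu) << 21) |
           ((s & 0x1u) << 20) |
           ((rn & 0xFu) << 16) |
           ((rd & 0xFu) << 12) |
           (op2 & 0xFFFu);
}
```

With cond=AL (0xE), SUB R0, R0, R1 encodes as 0xE0400001 and SUBS R0, R0, R1 as 0xE0500001. The only difference between the two is bit 20.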

Ok, this part might get a bit spotty, I was writing it and had a system crash, and when I tried to recover it I found out that I was over the mybb post size limit, so I'm having to rewrite it.
Let's go ahead and write our first parsing function. This function is going to take in a line of input and spit out a linked list of tokens that other functions will process. The point of this function is to strip out the white space and other unnecessary bits. Let's start by making our linked list.
Create a folder called data_structures and inside of it create two files:

data_structures/linked_list.h
Code:
#ifndef LINKED_LIST_H_INCLUDED
#define LINKED_LIST_H_INCLUDED

struct linked_node
{
   void *data;
   struct linked_node *next;
};

void llist_add(struct linked_node **list, void *data);
void llist_free(struct linked_node *list);

#endif // LINKED_LIST_H_INCLUDED

data_structures/linked_list.c
Code:
#include <stdlib.h>
#include <assert.h>
#include <data_structures/linked_list.h>

void llist_add(struct linked_node **list, void *data)
{
   struct linked_node *node;
   struct linked_node *ptr;
   assert(list != NULL);
   node = malloc(sizeof(struct linked_node));
   if (node == NULL)
       abort(); //out of memory
   node->data = data;
   node->next = NULL;
   if (*list != NULL)
   {
       for (ptr = *list; ptr->next != NULL; ptr = ptr->next); //go to end of list
       ptr->next = node;
   }
   else
       *list = node; //set head element
}
void llist_free(struct linked_node *list)
{
   struct linked_node *next;
   while (list != NULL) //an empty list (NULL) is fine, nothing to free
   {
       next = list->next;
       free(list->data); //the list owns its data
       free(list);
       list = next;
   }
}

Now, I just built a generic linked list, it will work fine for our project.

Next, create a folder called parser and create two files: parser/rules.c and parser/rules.h
At this point, your project tree should look like this:
[Image: oeZFQgP.png]

Inside parser/rules.c we need to start off with our includes. It's pretty obvious that we want to include our own header, and since our function will return a linked list, we need to include that too
Code:
#include <parser/rules.h>
#include <data_structures/linked_list.h>

Ok, so we're going to call this function tokenize and it will take in a single line, strip the white space and other stuff out of it, and then spit out a linked list. Let's go ahead and put the function signature in the header file:
Code:
#ifndef RULES_H_INCLUDED
#define RULES_H_INCLUDED

struct linked_node *tokenize(const char *line);

#endif // RULES_H_INCLUDED

Ok, now I'm just going to write this function myself, to keep it simple. I'm using a basic strtok loop. Here's my code
Code:
#include <stdlib.h>
#include <string.h>
#include <parser/rules.h>
#include <data_structures/linked_list.h>

struct linked_node *tokenize(const char *line)
{
   struct linked_node *list;
   char *copy; // make a copy, because strtok destroys the input string
   char *token; // will store the output of strtok
   token = NULL; // init
   list = NULL; // init (so add works properly)
   copy = strdup(line); // note: strdup is POSIX, not standard C
   if (copy == NULL) // out of memory
       return NULL;
   token = strtok(copy, " \t\r\n");
   if (token != NULL) //line is not empty
   {
       llist_add(&list, strdup(token));
       while ((token = strtok(NULL, " \t\r\n")))
           llist_add(&list, strdup(token));
   }
   free(copy);
   return list;
}
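
If you want to poke at the strtok loop on its own, without the linked list, here is a tiny stand-alone sketch that just counts the tokens in a line. count_tokens is a made-up helper for experimenting, it is not part of the assembler.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count whitespace-separated tokens in a line. */
static size_t count_tokens(const char *line)
{
    const char *delims = " \t\r\n";
    size_t count = 0;
    char *token;
    char *copy = malloc(strlen(line) + 1); /* strtok writes into its input */

    if (copy == NULL)
        return 0;
    strcpy(copy, line);
    for (token = strtok(copy, delims); token != NULL; token = strtok(NULL, delims))
        count++;
    free(copy);
    return count;
}
```

Because the only delimiters are whitespace, the commas stay glued to the register names, and a trailing comment is split into tokens just like everything else.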

Let's do a quick build check to make sure we are on track
[Image: 3r0HCpq.png]

Perfect! Ok, now let's run a test of it. Let's go ahead and add some includes to main and write a very basic function to test that.
[Image: Mc2CJWV.png]

Cool, now that it's in place and we know it builds, let's set our breakpoint, build it, and start a debugger on it:
[Image: bh2rInC.png]

Awesome. At this point, our tokenize function has run, and we should have a first token. This token should be "EOR", let's check it
[Image: IqWDJ0K.png]
Perfect! Now, I'm just going to run through and check all of the tokens
[Image: PsuQuT9.png]

We got
Code:
EOR
R0,
R0,
R0
;clear
R0

This should look strange to you. Our tokenizer did work exactly how we wrote it to, but it's not quite finished. We should only have (for final processing)
Code:
EOR
R0,
R0,
R0

In our test string, I added a comment. We need to filter those out. We'll do that in two places. The easiest one is inside tokenize
Go ahead and find the line that says
Code:
if (token != NULL)
and change it to be
Code:
if (token != NULL && *token != ';') //line is not empty and not comment

Cool. Now, we need a second function. This one will be internal to our parser, so no function signature in the header. Its job is to walk through the list, find the first token that starts with ';', and delete both it and everything after it. We already know that token 0 will not be a comment, so we don't have to worry about nulling the whole list. This function is actually pretty simple:
Code:
void remove_comments(struct linked_node *list)
{
   struct linked_node *curr;
   // an empty list (NULL) has nothing to strip, so check curr first
   for (curr = list; curr != NULL && curr->next != NULL; curr = curr->next)
   {
       if (*(char *)curr->next->data == ';')
       {
           llist_free(curr->next); // delete elements
           curr->next = NULL; // remove link
           break; // stop the loop
       }
   }
}
Ok, let's give it a test: I just added this into my main
[Image: jw2kX4t.png]
Perfect, now let's add the signature in parser/rules.c, add it to tokenize (right above the return) and wrap it up!
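
For comparison, here is a sketch of a different way to get the same result: cut the line at the first ';' before it is ever tokenized. This is not what we are doing in this tutorial, and strip_comment is a hypothetical name. It also assumes a ';' never appears inside something we want to keep, such as a string literal.

```c
#include <string.h>

/* Hypothetical alternative: truncate the line at the first ';'.
 * The buffer is modified in place, so it must be writable. */
static void strip_comment(char *line)
{
    char *semi = strchr(line, ';');
    if (semi != NULL)
        *semi = '\0';
}
```

The upside is that no comment tokens are ever allocated. The downside is that it needs a writable copy of the line, which tokenize already makes for strtok anyway.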



Ok, well due to constraints on how long this post can be (the text alone is 26Kb), I have to wrap this one up. Next time we'll work more on our parser, sorry. Please talk about this thread below!
