Tutorial CYFA - Creating Your First Assembler - Getting Structured
CYFA - Creating Your First Assembler - Getting Structured #1
NOTE: An error has been found in this thread. Please see the note at the end; it's important. The same correction notice will also be present in Part 9 in case you miss it.

Ok, so it's been a while since I've done a tutorial, and I really wanted to get into the cool stuff of this one, so here we go.

First of all, we'll be using Code::Blocks, since I'm assuming most of you are on Windows. Those of you who aren't, impress me by making a nice Makefile project :)

Let's get our project set up:
[Image: umXaA1S.png]
We're going to go ahead and click new project down there from our list

At this point, a dialog should appear, and we're going to create a new console application (meaning this will not have a GUI), and click OK
[Image: YyDPbHq.png]

Now we can select that this will be a C project. We won't need any sort of object orientation, so we'll stick with strict C, where things are easier.
Go ahead and fill out the project path settings like mine
[Image: i8KnC6i.png]
and go through the rest of the dialog

Now, we should have a new project with nothing in it, but we don't, because this is a Windows editor and they think we all need to start with a hello world program.
Modify it to look like this
[Image: Ctt3Efh.png]



Go ahead and add a new header file, "instruction.h", to the project. This file is what we'll be working on today.
[Image: nG43z2C.png]
In this file, let's lay out the skeleton for our instruction processing
Code:
#ifndef INSTRUCTION_H_INCLUDED
#define INSTRUCTION_H_INCLUDED

#include <stdint.h>

/* define individual instruction classes */
struct instruction_data
{
   
};
struct instruction_transfer
{
   
};
struct instruction_branch
{
   
};
/* wrapper body */
union instruction
{

};

#endif


Now, this is where your knowledge of C is going to be really important. There are two ways to do this from here on out: we can do it the rookie way and have a bunch of processing functions, or we can do it so that it's native to our application (using unions and bit fields). We'll be doing the second one.
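
For comparison, here is a rough sketch of what the first approach (plain processing functions) might look like for a data processing instruction. The function name encode_data_processing and its parameter names are made up for this illustration and are not part of the tutorial's code. The bit positions assume the standard ARM data processing layout: condition in the top four bits, operand2 in the bottom twelve.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the tutorial's code): packs the fields of
 * an ARM data processing instruction into one 32-bit word by hand.
 * Bits 27-26 are the hard-coded 0x0 and are simply left clear. */
uint32_t encode_data_processing(uint32_t cond, uint32_t imm, uint32_t opcode,
                                uint32_t cset, uint32_t rn, uint32_t rd,
                                uint32_t operand2)
{
    return ((cond     & 0xFu)   << 28)  /* bits 31-28: condition         */
         | ((imm      & 0x1u)   << 25)  /* bit  25   : immediate bit     */
         | ((opcode   & 0xFu)   << 21)  /* bits 24-21: operation code    */
         | ((cset     & 0x1u)   << 20)  /* bit  20   : condition set bit */
         | ((rn       & 0xFu)   << 16)  /* bits 19-16: 1st operand reg   */
         | ((rd       & 0xFu)   << 12)  /* bits 15-12: destination reg   */
         |  (operand2 & 0xFFFu);        /* bits 11-0 : operand2          */
}
```

With cond 0xE, the immediate bit set, opcode 0xD, and operand2 0x001, this packs to 0xE3A00001, which is the usual encoding of MOV R0, #1. It works, but every field needs its own shift and mask, which is the bookkeeping the union and bit field approach is meant to avoid.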

If your knowledge of C is rusty, now would be the time to go read up on the following
  • Structures
  • Unions
  • Bit Fields
  • Endianness
I'm going to leave an hr break here so that you can find your place again once you come back
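
If you want something small to poke at while you read up, here is a minimal sketch that touches three of those topics at once. The names nibbles, nibble_view and pack_nibbles are invented for this example. Note that using uint8_t as a bit field type is a compiler extension, although GCC, Clang and MSVC all accept it.

```c
#include <stdint.h>

/* Two 4-bit fields sharing a single byte. */
struct nibbles
{
    uint8_t first  : 4;
    uint8_t second : 4;
};

/* Lets us look at the same byte either as fields or as a raw value. */
union nibble_view
{
    struct nibbles fields;
    uint8_t        raw;
};

/* Hypothetical demo: store two nibbles through the bit fields,
 * then read the byte back through the raw member. */
uint8_t pack_nibbles(uint8_t first, uint8_t second)
{
    union nibble_view v;
    v.raw = 0;
    v.fields.first  = first;   // assignment keeps only the low 4 bits
    v.fields.second = second;
    return v.raw;
}
```

Whether pack_nibbles(0x5, 0xA) comes back as 0xA5 or 0x5A depends on your compiler and target, because the order of bit fields inside a byte is implementation-defined. That is worth keeping in mind for everything that follows.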



Ok, now that you've done that, let's use comments to explain what we're doing here. We're going to leave endianness out of the mix until the very end, but keep it in mind.
I'm using the same diagram from Part 3 to make this comment, so you may want to pull that up and refresh yourself.
Code:
struct instruction_data
{
   //4 - condition
   //2 - hard code 0x0
   //1 - immediate bit
   //4 - operation code
   //1 - condition set bit
   //4 - 1st operand register
   //4 - destination register
   //12- operand2
};
In this structure, I've used comments to describe how many bits each field takes up, and what goes in them. Now I'm going to go ahead and replace all those comments with bit fields. I'll have to come back to 2 of them

Ok, my revised code looks like this. I've left the comments in place for now
Code:
struct instruction_data
{
   //4 - condition
   uint8_t     cond    : 0x4; // condition
   //2 - hard code 0x0
   uint8_t     hard    : 0x2; // set 0x0
   //1 - immediate bit
   uint8_t     imm     : 0x1; // immediate bit
   //4 - operation code
   uint8_t     opcode  : 0x4; // operation code
   //1 - condition set bit
   uint8_t     cset    : 0x1; // condition set
   //4 - 1st operand register
   uint8_t     op1     : 0x4; // 1st operand reg
   //4 - destination register
   uint8_t     dst     : 0x4; // destination reg
   //12- operand2
   //?????????
};
Notice how I just have a bunch of question marks for operand2? Yeah, that's where things get a little tricky. Operand2 actually has 2 different forms. Let's take a look
[Image: iNbunK8.png]

Ok, so for those, let's go ahead and make a new set of structures for each. Here's my comment code for those
Code:
/* operand2 for data processing */
struct instruction_data_operand2_register
{
   //8 - shift
   //4 - register
};
struct instruction_data_operand2_immediate
{
   //4 - rotate
   //8 - unsigned immediate
};
Ok, so let's go through and fill those two out, just so we have them
Code:
/* operand2 for data processing */
struct instruction_data_operand2_register
{
   //8 - shift
   uint8_t     shift;         // shift applied to reg
   //4 - register
   uint8_t     reg     : 0x4; // op2 as register
};
struct instruction_data_operand2_immediate
{
   //4 - rotate
   uint8_t     rotate  : 0x4; // rotate value (multiplied by 2 by CPU)
   //8 - unsigned immediate
   uint8_t     imm;           // 8-bit unsigned immediate value to be rotated
};
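
Since the comment on rotate is easy to skim past, here is a small sketch of what the immediate form actually stands for: the 8-bit value rotated right by twice the rotate field. Both function names are made up for this example; they are not part of the assembler we are building.

```c
#include <stdint.h>

/* Hypothetical helper: rotate a 32-bit value right by `amount` bits. */
static uint32_t rotate_right32(uint32_t value, unsigned amount)
{
    amount &= 31u;
    if (amount == 0)
        return value;              /* avoid an undefined shift by 32 */
    return (value >> amount) | (value << (32u - amount));
}

/* Hypothetical helper: the 32-bit value an immediate operand2 represents,
 * i.e. the 8-bit immediate rotated right by 2 * rotate. */
uint32_t operand2_immediate_value(uint8_t rotate, uint8_t imm)
{
    return rotate_right32(imm, 2u * (rotate & 0xFu));
}
```

For example, rotate = 4 and imm = 0xFF gives 0xFF000000, and rotate = 15 with imm = 1 gives 4.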
Cool, but we still have a problem. We can't add these two structs to our instruction_data struct, because it would make it 12 bits too big. What can we do? Unions
So, you should be familiar with them by now, but here's a basic description of what they do. Unions work like structures, with one key difference. Every member you add to a structure makes the whole structure larger, while a union is only ever as large as its largest member. All members in a union share the same memory space, meaning if you change the values in one member, you change the values in all members. This is useful when you only ever need to store data in one of the members, like we do in this case. We will never need both the register and the immediate structure, and we know which one we need based on the imm bit in our instruction_data structure.
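
A quick sketch of that behaviour, using a throwaway union that has nothing to do with our instruction format. The names word_or_bytes, overwrite_first_byte and union_size are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Both members start at the same address and overlap. */
union word_or_bytes
{
    uint32_t word;
    uint8_t  bytes[4];
};

/* Hypothetical demo: write through one member, read through the other. */
uint32_t overwrite_first_byte(uint32_t start, uint8_t new_byte)
{
    union word_or_bytes u;
    u.word = start;
    u.bytes[0] = new_byte;   /* also changes u.word: same storage */
    return u.word;
}

/* The union is as large as its largest member, not the sum of its members. */
size_t union_size(void)
{
    return sizeof(union word_or_bytes);
}
```

Here union_size() is 4, not 8, and overwrite_first_byte(0, 0xFF) returns a non-zero word even though word was never assigned after the initial 0. Which end of the word changes depends on the endianness of your machine.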
Let's go ahead and make that union. While we're at it, we can go ahead and take our planning comments out.
The updated code looks like this:
Code:
/* operand2 for data processing */
struct instruction_data_operand2_register
{
   uint8_t     shift;         // shift applied to reg
   uint8_t     reg     : 0x4; // op2 as register
};
struct instruction_data_operand2_immediate
{
   uint8_t     rotate  : 0x4; // rotate value (multiplied by 2 by CPU)
   uint8_t     imm;           // 8-bit unsigned immediate value to be rotated
};
union instruction_data_operand2
{
   struct instruction_data_operand2_immediate imm;
   struct instruction_data_operand2_register reg;
};
I want to point out before we go any further that I didn't use a bit field for imm in instruction_data_operand2_immediate. Before you all ask why, it's because it's an 8-bit field, and uint8_t is an "unsigned 8-bit integer". A bit field would be fine here, but its width would be 0x8, which is already how big the base type is. If you're OCD about it all, feel free to add it to your code.

Now, we can go ahead and add our union to the instruction_data struct. We won't use a bit field here, because a bit field has to be an integer type and can't be a union; the union simply stands in for the 12 bits of operand2. Sorry to all you OCD fellas, this one won't line up either.
I'm also going to remove the planning comments.
With all that done, our entire instruction_data structure and all its dependencies are complete. Here is the full code for all of it:
Code:
/* operand2 for data processing */
struct instruction_data_operand2_register
{
   uint8_t     shift;         // shift applied to reg
   uint8_t     reg     : 0x4; // op2 as register
};
struct instruction_data_operand2_immediate
{
   uint8_t     rotate  : 0x4; // rotate value (multiplied by 2 by CPU)
   uint8_t     imm;           // 8-bit unsigned immediate value to be rotated
};
union instruction_data_operand2
{
   struct instruction_data_operand2_immediate imm;
   struct instruction_data_operand2_register reg;
};
/* define individual instruction classes */
struct instruction_data
{
   uint8_t     cond    : 0x4; // condition
   uint8_t     hard    : 0x2; // set 0x0
   uint8_t     imm     : 0x1; // immediate bit
   uint8_t     opcode  : 0x4; // operation code
   uint8_t     cset    : 0x1; // condition set
   uint8_t     op1     : 0x4; // 1st operand reg
   uint8_t     dst     : 0x4; // destination reg
   union instruction_data_operand2 op2; // 2nd operand (imm=0 : reg, imm=1 : imm)
};
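
To show how the imm bit and the op2 union are meant to work together, here is a minimal sketch that fills in an instruction whose second operand is an immediate. The helper name make_data_immediate is made up, and the value 0xE for cond is the "always" condition from the ARM condition table, which this part of the tutorial has not covered.

```c
#include <stdint.h>

/* Definitions copied from the listing above. */
struct instruction_data_operand2_register
{
   uint8_t     shift;         // shift applied to reg
   uint8_t     reg     : 0x4; // op2 as register
};
struct instruction_data_operand2_immediate
{
   uint8_t     rotate  : 0x4; // rotate value (multiplied by 2 by CPU)
   uint8_t     imm;           // 8-bit unsigned immediate value to be rotated
};
union instruction_data_operand2
{
   struct instruction_data_operand2_immediate imm;
   struct instruction_data_operand2_register reg;
};
struct instruction_data
{
   uint8_t     cond    : 0x4; // condition
   uint8_t     hard    : 0x2; // set 0x0
   uint8_t     imm     : 0x1; // immediate bit
   uint8_t     opcode  : 0x4; // operation code
   uint8_t     cset    : 0x1; // condition set
   uint8_t     op1     : 0x4; // 1st operand reg
   uint8_t     dst     : 0x4; // destination reg
   union instruction_data_operand2 op2; // 2nd operand (imm=0 : reg, imm=1 : imm)
};

/* Hypothetical helper: build a data processing instruction with an
 * immediate second operand. */
struct instruction_data make_data_immediate(uint8_t opcode, uint8_t dst,
                                            uint8_t op1, uint8_t rotate,
                                            uint8_t imm8)
{
   struct instruction_data ins = {0};
   ins.cond   = 0xE;             // assumed: 0xE means "always"
   ins.hard   = 0x0;
   ins.imm    = 1;               // tells the reader op2.imm is the live view
   ins.opcode = opcode;
   ins.cset   = 0;
   ins.op1    = op1;
   ins.dst    = dst;
   ins.op2.imm.rotate = rotate;
   ins.op2.imm.imm    = imm8;
   return ins;
}
```

Anything reading the structure back should check the imm bit first and only then touch op2.imm or op2.reg, since both views share the same storage.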



Ok, at this point, I think you've all got the hang of it, so I'm not going to hold your hand through writing the instruction_transfer and instruction_branch structures. I've included my code in a spoiler below for you to check your work.

instruction_transfer
Spoiler:
Code:
struct instruction_transfer
{
   uint8_t     cond    : 0x4; // condition
   uint8_t     hard    : 0x2; // set 0x1
   uint8_t     I       : 0x1; // offset is immediate value (set 0)
   uint8_t     P       : 0x1; // pre/post index (set 1)
   uint8_t     U       : 0x1; // up/down bit (negative sign for offset)
   uint8_t     B       : 0x1; // byte/word bit (set 0 for word)
   uint8_t     W       : 0x1; // write-back register (set 0)
   uint8_t     op1     : 0x4; // source register
   uint8_t     dst     : 0x4; // destination register
   uint16_t    offset  : 0xC; // immediate/offset
};

instruction_branch
Spoiler:
Code:
struct instruction_branch
{
   uint8_t     cond    : 0x4; // condition
   uint8_t     hard    : 0x3; // set 0x5
   uint8_t     link    : 0x1; // link bit
   uint32_t    offset  : 0x18;// offset
};



Ok, so at this point, all we have left to do is finish our instruction union. As you suspected, this will be easy: our union is made up of structures that each describe a full 32-bit instruction, so we don't need to do bit fields or any of that sort of thing.
Here's the code
Code:
/* wrapper body */
union instruction
{
   struct instruction_data         DP;     // data processing
   struct instruction_transfer     DT;     // single data transfer
   struct instruction_branch       BR;     // branch
};
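
How bit fields get packed is up to the compiler, and an ordinary member such as op2 always starts on a byte boundary, so it may be worth checking how large the union really ends up on your setup. The sketch below repeats the definitions so it compiles on its own; instruction_size and instruction_is_32_bits are hypothetical helper names, and the sketch makes no promise about what they return for you.

```c
#include <stddef.h>
#include <stdint.h>

/* Definitions copied from the listings above. */
struct instruction_data_operand2_register
{
   uint8_t     shift;
   uint8_t     reg     : 0x4;
};
struct instruction_data_operand2_immediate
{
   uint8_t     rotate  : 0x4;
   uint8_t     imm;
};
union instruction_data_operand2
{
   struct instruction_data_operand2_immediate imm;
   struct instruction_data_operand2_register reg;
};
struct instruction_data
{
   uint8_t     cond    : 0x4;
   uint8_t     hard    : 0x2;
   uint8_t     imm     : 0x1;
   uint8_t     opcode  : 0x4;
   uint8_t     cset    : 0x1;
   uint8_t     op1     : 0x4;
   uint8_t     dst     : 0x4;
   union instruction_data_operand2 op2;
};
struct instruction_transfer
{
   uint8_t     cond    : 0x4;
   uint8_t     hard    : 0x2;
   uint8_t     I       : 0x1;
   uint8_t     P       : 0x1;
   uint8_t     U       : 0x1;
   uint8_t     B       : 0x1;
   uint8_t     W       : 0x1;
   uint8_t     op1     : 0x4;
   uint8_t     dst     : 0x4;
   uint16_t    offset  : 0xC;
};
struct instruction_branch
{
   uint8_t     cond    : 0x4;
   uint8_t     hard    : 0x3;
   uint8_t     link    : 0x1;
   uint32_t    offset  : 0x18;
};
union instruction
{
   struct instruction_data         DP;
   struct instruction_transfer     DT;
   struct instruction_branch       BR;
};

/* Hypothetical helper: size in bytes of the wrapper union on this compiler. */
size_t instruction_size(void)
{
   return sizeof(union instruction);
}

/* Hypothetical helper: 1 if the union is exactly one 32-bit word, else 0. */
int instruction_is_32_bits(void)
{
   return instruction_size() == sizeof(uint32_t);
}
```

If you would rather have the build fail than find out at run time, C11's _Static_assert(sizeof(union instruction) == 4, "instruction must be 32 bits") does the same check at compile time.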




Alright, I apologize that this thread has run so long, but I'm happy we got through the tough stuff
Please stop at this point, until you have completed writing your instruction_transfer and instruction_branch structures. The remainder of this thread does not place those in spoilers



Ok, well since we got all the code hammered out, let's do a little refactoring. At this point, your entire instruction.h file should look like this:
Code:
#ifndef INSTRUCTION_H_INCLUDED
#define INSTRUCTION_H_INCLUDED

#include <stdint.h>

/* operand2 for data processing */
struct instruction_data_operand2_register
{
   uint8_t     shift;         // shift applied to reg
   uint8_t     reg     : 0x4; // op2 as register
};
struct instruction_data_operand2_immediate
{
   uint8_t     rotate  : 0x4; // rotate value (multiplied by 2 by CPU)
   uint8_t     imm;           // 8-bit unsigned immediate value to be rotated
};
union instruction_data_operand2
{
   struct instruction_data_operand2_immediate imm;
   struct instruction_data_operand2_register reg;
};
/* define individual instruction classes */
struct instruction_data
{
   uint8_t     cond    : 0x4; // condition
   uint8_t     hard    : 0x2; // set 0x0
   uint8_t     imm     : 0x1; // immediate bit
   uint8_t     opcode  : 0x4; // operation code
   uint8_t     cset    : 0x1; // condition set
   uint8_t     op1     : 0x4; // 1st operand reg
   uint8_t     dst     : 0x4; // destination reg
   union instruction_data_operand2 op2; // 2nd operand (imm=0 : reg, imm=1 : imm)
};
struct instruction_transfer
{
   uint8_t     cond    : 0x4; // condition
   uint8_t     hard    : 0x2; // set 0x1
   uint8_t     I       : 0x1; // offset is immediate value (set 0)
   uint8_t     P       : 0x1; // pre/post index (set 1)
   uint8_t     U       : 0x1; // up/down bit (negative sign for offset)
   uint8_t     B       : 0x1; // byte/word bit (set 0 for word)
   uint8_t     W       : 0x1; // write-back register (set 0)
   uint8_t     op1     : 0x4; // source register
   uint8_t     dst     : 0x4; // destination register
   uint16_t    offset  : 0xC; // immediate/offset
};
struct instruction_branch
{
   uint8_t     cond    : 0x4; // condition
   uint8_t     hard    : 0x3; // set 0x5
   uint8_t     link    : 0x1; // link bit
   uint32_t    offset  : 0x18;// offset
};
/* wrapper body */
union instruction
{
   struct instruction_data         DP;     // data processing
   struct instruction_transfer     DT;     // single data transfer
   struct instruction_branch       BR;     // branch
};

#endif

I want to refactor it all so that it properly fits the ARM spec. This pretty much just means changing the names of the elements to either their bit codes or their proper register notation.
At the end of that refactor, using the following register definitions
Rn = source register
Rd = destination register
Rm = 2nd register
My code looks like this
Code:
#ifndef INSTRUCTION_H_INCLUDED
#define INSTRUCTION_H_INCLUDED

#include <stdint.h>

/* operand2 for data processing */
struct instruction_data_operand2_register
{
   uint8_t     shift;         // shift applied to reg
   uint8_t     Rm      : 0x4; // op2 as register
};
struct instruction_data_operand2_immediate
{
   uint8_t     rotate  : 0x4; // rotate value (multiplied by 2 by CPU)
   uint8_t     imm;           // 8-bit unsigned immediate value to be rotated
};
union instruction_data_operand2
{
   struct instruction_data_operand2_immediate imm;
   struct instruction_data_operand2_register reg;
};
/* define individual instruction classes */
struct instruction_data
{
   uint8_t     cond    : 0x4; // condition
   uint8_t     hard    : 0x2; // set 0x0
   uint8_t     I       : 0x1; // immediate bit
   uint8_t     opcode  : 0x4; // operation code
   uint8_t     S       : 0x1; // condition set
   uint8_t     Rn      : 0x4; // 1st operand reg
   uint8_t     Rd      : 0x4; // destination reg
   union instruction_data_operand2 op2; // 2nd operand (I=0 : reg, I=1 : imm)
};
struct instruction_transfer
{
   uint8_t     cond    : 0x4; // condition
   uint8_t     hard    : 0x2; // set 0x1
   uint8_t     I       : 0x1; // offset is immediate value (set 0)
   uint8_t     P       : 0x1; // pre/post index (set 1)
   uint8_t     U       : 0x1; // up/down bit (negative sign for offset)
   uint8_t     B       : 0x1; // byte/word bit (set 0 for word)
   uint8_t     W       : 0x1; // write-back register (set 0)
   uint8_t     Rn      : 0x4; // source register
   uint8_t     Rd      : 0x4; // destination register
   uint16_t    offset  : 0xC; // immediate/offset
};
struct instruction_branch
{
   uint8_t     cond    : 0x4; // condition
   uint8_t     hard    : 0x3; // set 0x5
   uint8_t     L       : 0x1; // link bit
   uint32_t    offset  : 0x18;// offset
};
/* wrapper body */
union instruction
{
   struct instruction_data         DP;     // data processing
   struct instruction_transfer     DT;     // single data transfer
   struct instruction_branch       BR;     // branch
};

#endif



Now, I think this header file is a bit too long for my comfort, and for reference (63 lines of struct defs), so I want to split this apart into 4 files:
  • instruction.h
  • instruction_data.h
  • instruction_transfer.h
  • instruction_branch.h
I want to keep each file as close to its title as possible, and limit all the mixing of components. Here are my updated files:

instruction.h
Code:
#ifndef INSTRUCTION_H_INCLUDED
#define INSTRUCTION_H_INCLUDED

#include <instruction_data.h>
#include <instruction_transfer.h>
#include <instruction_branch.h>

union instruction
{
   struct instruction_data         DP;     // data processing
   struct instruction_transfer     DT;     // single data transfer
   struct instruction_branch       BR;     // branch
};

#endif

instruction_data.h
Code:
#ifndef INSTRUCTION_DATA_H_INCLUDED
#define INSTRUCTION_DATA_H_INCLUDED

#include <stdint.h>
/* operand2 for data processing */
struct instruction_data_operand2_register
{
   uint8_t     shift;         // shift applied to reg
   uint8_t     Rm      : 0x4; // op2 as register
};
struct instruction_data_operand2_immediate
{
   uint8_t     rotate  : 0x4; // rotate value (multiplied by 2 by CPU)
   uint8_t     imm;           // 8-bit unsigned immediate value to be rotated
};
union instruction_data_operand2
{
   struct instruction_data_operand2_immediate imm;
   struct instruction_data_operand2_register reg;
};
/* structure for instruction */
struct instruction_data
{
   uint8_t     cond    : 0x4; // condition
   uint8_t     hard    : 0x2; // set 0x0
   uint8_t     I       : 0x1; // immediate bit
   uint8_t     opcode  : 0x4; // operation code
   uint8_t     S       : 0x1; // condition set
   uint8_t     Rn      : 0x4; // 1st operand reg
   uint8_t     Rd      : 0x4; // destination reg
   union instruction_data_operand2 op2; // 2nd operand (I=0 : reg, I=1 : imm)
};

#endif

instruction_transfer.h
Code:
#ifndef INSTRUCTION_TRANSFER_H_INCLUDED
#define INSTRUCTION_TRANSFER_H_INCLUDED

#include <stdint.h>
struct instruction_transfer
{
   uint8_t     cond    : 0x4; // condition
   uint8_t     hard    : 0x2; // set 0x1
   uint8_t     I       : 0x1; // offset is immediate value (set 0)
   uint8_t     P       : 0x1; // pre/post index (set 1)
   uint8_t     U       : 0x1; // up/down bit (negative sign for offset)
   uint8_t     B       : 0x1; // byte/word bit (set 0 for word)
   uint8_t     W       : 0x1; // write-back register (set 0)
   uint8_t     Rn      : 0x4; // source register
   uint8_t     Rd      : 0x4; // destination register
   uint16_t    offset  : 0xC; // immediate/offset
};

#endif

instruction_branch.h
Code:
#ifndef INSTRUCTION_BRANCH_H_INCLUDED
#define INSTRUCTION_BRANCH_H_INCLUDED

#include <stdint.h>
struct instruction_branch
{
   uint8_t     cond    : 0x4; // condition
   uint8_t     hard    : 0x3; // set 0x5
   uint8_t     L       : 0x1; // link bit
   uint32_t    offset  : 0x18;// offset
};

#endif



Awesome! We've now taken the first 5 parts and made them into the foundation for our very first assembler! Let's go ahead and add our include to main.c just to wrap this up.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <instruction.h>

int main(int argc, char * const * argv)
{
   return 0;
}



One final note, if you try to build this right now, you will get an error like this
[Image: o0gGi5c.png]
Do not worry, you didn't accidentally wipe it or something, it's just not set up properly.
Open the build settings
[Image: m1cOu94.png]
go into Search directories for the whole project
[Image: duojbum.png]
Click add, then make sure it points at your project (this path will be different for all users)
[Image: vN8BAWY.png]
Now go ahead and click OK, save the project, then build it again
[Image: dNLKkrH.png]

EDIT: I've restructured this project so it's a little more organized. I moved the instruction_x.h files to a folder called instruction and renamed them to just x.h.
This only affects instruction.h. Here's updated code for that file:
Code:
#ifndef INSTRUCTION_H_INCLUDED
#define INSTRUCTION_H_INCLUDED

#include <instruction/data.h>
#include <instruction/transfer.h>
#include <instruction/branch.h>

union instruction
{
   struct instruction_data         DP;     // data processing
   struct instruction_transfer     DT;     // single data transfer
   struct instruction_branch       BR;     // branch
};

#endif
and a screenshot of the project again:
[Image: ZwfjSQY.png]



Congrats! You've made it to the end of part 6! Things should start coming together in your mind about what's going to happen next.
PLEASE write replies to this thread! Comment on what you liked or didn't like. Ask questions. Suggest what we could do to make it better. Just have a conversation! I hate seeing these threads go stale.

Also, don't forget about the webpage that holds organized links to all of these. It's linked in my signature, but you can visit it by going to https://goo.gl/8hvoog as well!



UPDATE:
I realized that I made a mistake in Part 6 - Getting structured, dealing with the structure for handling Operand2 when using a register. Rather than explain all of the changes, I'll simply paste the new contents of instruction/data.h below. I'll also be pasting this same block in part 6 (if I can still edit it).
Code:
#ifndef INSTRUCTION_DATA_H_INCLUDED
#define INSTRUCTION_DATA_H_INCLUDED

#include <stdint.h>

/* fixes */
enum instruction_data_operand2_register_shift_type
{
    kDATA_OP2_LOGIC_LEFT      = 0x0,
    kDATA_OP2_LOGIC_RIGHT     = 0x1,
    kDATA_OP2_LOGIC_ARR_RIGHT = 0x2,
    kDATA_OP2_LOGIC_ROT_RIGHT = 0x3,
};
struct instruction_data_operand_register_shift_imm
{
    uint8_t hard0: 0x1; // set 0x0
    uint8_t type : 0x2; // see enum instruction_data_operand2_register_shift_type
    uint8_t imm  : 0x5; // shift amount
};
struct instruction_data_operand_register_shift_reg
{
    uint8_t hard1 : 0x1; // set 0x1
    uint8_t type  : 0x2; // see enum instruction_data_operand2_register_shift_type
    uint8_t hard0 : 0x1; // set 0x0
    uint8_t reg   : 0x4; // register holding value
};
union instruction_data_operand2_register_high
{
    struct instruction_data_operand_register_shift_imm imm;
    struct instruction_data_operand_register_shift_reg reg;
};
struct instruction_data_operand2_register
{
    uint8_t Rm : 0x4; // op2 as register
    union instruction_data_operand2_register_high shift;
};
/* operand2 for data processing */
struct instruction_data_operand2_immediate
{
    uint8_t     imm;           // 8-bit unsigned immediate value to be rotated
    uint8_t     rotate  : 0x4; // rotate value (multiplied by 2 by CPU)
};
union instruction_data_operand2
{
    struct instruction_data_operand2_immediate imm;
    struct instruction_data_operand2_register reg;
};
/* structure for instruction */
struct instruction_data
{
    uint8_t     cond    : 0x4; // condition
    uint8_t     hard    : 0x2; // set 0x0
    uint8_t     I       : 0x1; // immediate bit
    uint8_t     opcode  : 0x4; // operation code
    uint8_t     S       : 0x1; // condition set
    uint8_t     Rn      : 0x4; // 1st operand reg
    uint8_t     Rd      : 0x4; // destination reg
    union instruction_data_operand2 op2; // 2nd operand (I=0 : reg, I=1 : imm)
};

#endif
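
As a cross-check of the corrected layout, the register form of operand2 can also be packed by hand with shifts and masks. This is only a sketch: the enum constants and both function names are invented for the example. The bit positions assume the ARM encoding, with Rm in bits 3-0, bit 4 choosing between the two shift forms, the shift type in bits 6-5, and either a 5-bit amount in bits 11-7 or a register in bits 11-8.

```c
#include <stdint.h>

/* Shift type codes, mirroring the values of the enum in the header above. */
enum { SHIFT_LSL = 0x0, SHIFT_LSR = 0x1, SHIFT_ASR = 0x2, SHIFT_ROR = 0x3 };

/* Hypothetical helper: operand2 = Rm shifted by a 5-bit constant. */
uint16_t operand2_shift_by_imm(uint8_t rm, uint8_t type, uint8_t amount)
{
    return (uint16_t)(((amount & 0x1Fu) << 7)   /* bits 11-7: shift amount  */
                    | ((type   & 0x3u)  << 5)   /* bits 6-5 : shift type    */
                    |  (rm     & 0xFu));        /* bits 3-0 : Rm, bit 4 = 0 */
}

/* Hypothetical helper: operand2 = Rm shifted by the value held in Rs. */
uint16_t operand2_shift_by_reg(uint8_t rm, uint8_t type, uint8_t rs)
{
    return (uint16_t)(((rs   & 0xFu) << 8)      /* bits 11-8: Rs, bit 7 = 0 */
                    | ((type & 0x3u) << 5)      /* bits 6-5 : shift type    */
                    |  (1u << 4)                /* bit 4 = 1: shift by reg  */
                    |  (rm   & 0xFu));          /* bits 3-0 : Rm            */
}
```

For example, R2 shifted left by 3 (operand2_shift_by_imm(2, SHIFT_LSL, 3)) packs to 0x182, and R2 shifted right by the value in R3 (operand2_shift_by_reg(2, SHIFT_LSR, 3)) packs to 0x332. If the struct version and this version disagree on your compiler, the bit field ordering is the first thing to look at.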
(This post was last modified: 01-17-2018, 07:51 PM by phyrrus9. Edit Reason: Fixed issue with Operand2_Register in DP )


RE: CYFA - Creating Your First Assembler - Getting Structured #2
(11-26-2017, 07:32 PM)phyrrus9 Wrote:
Code:
/* operand2 for data processing */
struct instruction_data_operand2_register
{
   //8 - shift
   uint8_t     shift   : 0x8; // shift applied to reg
   //4 - register
   uint8_t     reg     : 0x4; // op2 as register
};
struct instruction_data_operand2_immediate
{
   //4 - rotate
   uint8_t     rotate  : 0x4; // rotate value (multiplied by 2 by CPU)
   //8 - unsigned immediate
   uint8_t     imm;           // 8-bit unsigned immediate value to be rotated
};

I'm one of those OCD people, so I'd like to point out that you used a bit field for shift, even though you didn't for imm.

Nice guide, lots of code, I like it. But eww, Windows with Code::Blocks ;) (btw, at the top, you put Code:Blocks instead of Code::Blocks)



RE: CYFA - Creating Your First Assembler - Getting Structured #3
(11-27-2017, 09:48 PM)Ender Wrote:
(11-26-2017, 07:32 PM)phyrrus9 Wrote:
Code:
/* operand2 for data processing */
struct instruction_data_operand2_register
{
   //8 - shift
   uint8_t     shift   : 0x8; // shift applied to reg
   //4 - register
   uint8_t     reg     : 0x4; // op2 as register
};
struct instruction_data_operand2_immediate
{
   //4 - rotate
   uint8_t     rotate  : 0x4; // rotate value (multiplied by 2 by CPU)
   //8 - unsigned immediate
   uint8_t     imm;           // 8-bit unsigned immediate value to be rotated
};

I'm one of those OCD people, so I'd like to point out that you used a bit field for shift, even though you didn't for imm.

Nice guide, lots of code, I like it. But eww, Windows with Code::Blocks ;) (btw, at the top, you put Code:Blocks instead of Code::Blocks)

Updated. While having the field there doesn't change the code, I did forget and was out of uniformity.
