Tutorial CYFA - Creating Your First Assembler - Getting Help
CYFA - Creating Your First Assembler - Getting Help #1
Ok, so in the last part we built structures to hold and encode our instructions, but we still have a little left to do with them. In this part, we're going to take care of all the helpers we need to wrap up our assembler's encoding mechanism. I've listed them below:
  • initializers for the individual structures
  • hidden functions to detect endianness and perform conversion if necessary
  • function to convert our structure to a 32-bit integer
  • enumeration for condition codes

Alright, now this is a pretty big list, so I won't be walking you through it step by step but rather I'll explain the first step and then just complete the rest for you. You're welcome (and encouraged) to try this for yourself. I'm going to break this down into sections based on the above list. Enjoy, and don't forget to discuss this at the end.



1. Initializers
Ok, so for this one, we're going to initialize to a default state depending on the instruction we choose. Some instructions have hardcoded values, others we will just want to init to a valid state so we can reduce our code in processing. Let's start off by making x_init functions for each structure and making an enum to hold each type. Your new instruction.h file should look like this:
Code:
#ifndef INSTRUCTION_H_INCLUDED
#define INSTRUCTION_H_INCLUDED

#include <instruction/data.h>
#include <instruction/transfer.h>
#include <instruction/branch.h>

union instruction
{
   struct instruction_data         DP;     // data processing
   struct instruction_transfer     DT;     // single data transfer
   struct instruction_branch       BR;     // branch
};

enum instruction_type
{
   kINSTRUCTION_DATA,
   kINSTRUCTION_TRANSFER,
   kINSTRUCTION_BRANCH,
   kINSTRUCTION_UNDEF
};

void instruction_init(union instruction *, enum instruction_type);

#endif
I hope that's pretty clear as to how it works. We don't need to set enum values because these won't be getting combined, and we add an UNDEF enum so that we can process errors as time goes on. Now, we can go ahead and create .c files for each of the instruction headers. The contents of these C files are shown below, as well as a project screenshot to show their organization.
[Image: fJrsrYj.png]
instruction.c
Code:
#include <instruction.h>

void instruction_init_data(struct instruction_data *);
void instruction_init_transfer(struct instruction_transfer *);
void instruction_init_branch(struct instruction_branch *);

void instruction_init(union instruction *encoder, enum instruction_type type)
{

}
instruction/data.c
Code:
#include <instruction/data.h>

void instruction_init_data(struct instruction_data *encode)
{
   encode->hard = 0x0;
}
instruction/transfer.c
Code:
#include <instruction/transfer.h>

void instruction_init_transfer(struct instruction_transfer *encode)
{
   encode->hard = 0x1; // hard code to 0b1 per arm spec
   /* set bits for this assembler's spec */
   encode->I = 0x0;
   encode->P = 0x1;
   encode->B = 0x0;
   encode->W = 0x1;
}
instruction/branch.c
Code:
#include <instruction/branch.h>

void instruction_init_branch(struct instruction_branch *encode)
{
   encode->hard = 0x5; // hard code 0b101 per ARM spec
}
Ok, now we can go ahead and finish the main initializer function. This one takes in 2 arguments and will initialize the encoding union with the correct values. Let's go ahead and write that code using a basic switch. The instruction.c file should now look like this
Code:
#include <instruction.h>

void instruction_init_data(struct instruction_data *);
void instruction_init_transfer(struct instruction_transfer *);
void instruction_init_branch(struct instruction_branch *);

void instruction_init(union instruction *encoder, enum instruction_type type)
{
   switch (type)
   {
   case kINSTRUCTION_DATA:
       instruction_init_data(&encoder->DP);
       break;
   case kINSTRUCTION_TRANSFER:
       instruction_init_transfer(&encoder->DT);
       break;
   case kINSTRUCTION_BRANCH:
       instruction_init_branch(&encoder->BR);
       break;
   default: // will allow our undef to go through
       break;
   }
}

Now, before we move on, I want to point out why I didn't put the instruction_init_x functions in their respective header files. This comes down to scoping: these functions should only be called by the primary init, so there's no need for them to be in a header file. By declaring them only here, we keep their signatures out of every file that includes our headers, and thus don't clutter up the namespace of future code. (Strictly speaking, the linker can still see these symbols; only a function marked static is truly private to a single .c file.) This is how we handled namespaces before C++ came out: if you don't need it, don't include it.
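To make the scoping point concrete, here is a minimal single-file sketch of the same dispatcher. The demo_ structures are hypothetical, cut-down stand-ins (the real ones come from the previous part and are not shown here), and because everything lives in one file the helpers can be marked static, which the multi-file layout above cannot do.

```c
/* Hypothetical stand-ins for the structures from the previous part;
 * only the fields the initializers touch are included (assumption). */
struct demo_data     { unsigned hard : 2; };
struct demo_transfer { unsigned hard : 2, I : 1, P : 1, B : 1, W : 1; };
struct demo_branch   { unsigned hard : 3; };

union demo_instruction
{
   struct demo_data     DP;
   struct demo_transfer DT;
   struct demo_branch   BR;
};

enum demo_type { kDEMO_DATA, kDEMO_TRANSFER, kDEMO_BRANCH, kDEMO_UNDEF };

/* static gives these helpers internal linkage: no other .c file can call them */
static void demo_init_data(struct demo_data *encode)
{
   encode->hard = 0x0;
}

static void demo_init_transfer(struct demo_transfer *encode)
{
   encode->hard = 0x1;
   encode->I = 0x0;
   encode->P = 0x1;
   encode->B = 0x0;
   encode->W = 0x1;
}

static void demo_init_branch(struct demo_branch *encode)
{
   encode->hard = 0x5;
}

/* the only function a header would need to declare */
void demo_init(union demo_instruction *encoder, enum demo_type type)
{
   switch (type)
   {
   case kDEMO_DATA:
       demo_init_data(&encoder->DP);
       break;
   case kDEMO_TRANSFER:
       demo_init_transfer(&encoder->DT);
       break;
   case kDEMO_BRANCH:
       demo_init_branch(&encoder->BR);
       break;
   default: /* undefined type: leave the union untouched */
       break;
   }
}
```

A caller would declare a union demo_instruction, pass its address to demo_init along with the type, and then fill in the remaining fields.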



2. hidden functions to detect endianness and perform conversion if necessary
Ok, so if we've filled in this structure and then realize we're on a big endian system (they exist), then we have a problem because by default ARM is little endian. We need to make a function that detects that and compensates accordingly. The good thing is, all we have to do is swap the byte order, so we'll be using the network functions htonl and ntohl. We'll do it pretty simply. There are compiler macros to detect this, but we won't be using them, just for the sake of cross compatibility. Let's go ahead and add this signature and a bare function to our instruction.c file.
Now, the includes for this are a little weird, since we have to use 2 different files: arpa/inet.h for linux, BSD, and osx and winsock2.h for Windows systems. (The lowercase spelling matters if you ever cross compile for Windows from a case-sensitive file system.) We'll be using a header switch for that. Our new instruction.c file should look like this:
Code:
#include <stdint.h>
#ifdef _WIN32
#include <winsock2.h>
#else
#include <arpa/inet.h>
#endif
#include <instruction.h>

/* signatures not defined in this file */
void instruction_init_data(struct instruction_data *);
void instruction_init_transfer(struct instruction_transfer *);
void instruction_init_branch(struct instruction_branch *);
/* signatures for this file (hidden) */
uint32_t instruction_endian(uint32_t);

void instruction_init(union instruction *encoder, enum instruction_type type)
{
   switch (type)
   {
   case kINSTRUCTION_DATA:
       instruction_init_data(&encoder->DP);
       break;
   case kINSTRUCTION_TRANSFER:
       instruction_init_transfer(&encoder->DT);
       break;
   case kINSTRUCTION_BRANCH:
       instruction_init_branch(&encoder->BR);
       break;
   default: // will allow our undef to go through
       break;
   }
}

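uint32_t instruction_endian(uint32_t encode)
{
   if (htonl(1) == 1) // system is big endian, where htonl and ntohl do nothing
       return ((encode & 0x000000FFu) << 24) | // so reverse the bytes by hand
              ((encode & 0x0000FF00u) << 8)  |
              ((encode & 0x00FF0000u) >> 8)  |
              ((encode & 0xFF000000u) >> 24);
   return encode; // system is little endian
}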
Now, the standard questions: why did you put the function signature inside the .c file?
Well, that's because this is a hidden function. The point is that this function isn't intended to be called outside of instruction.c, so we aren't going to advertise it in a header. We could even go so far as to mark it static, which would stop the linker from exporting its symbol at all, but we don't need to. This is how we keep our namespace clean.
Second question: what do htonl and ntohl do?
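These function names are actually abbreviations for Host To Network Long and Network To Host Long. Let's define some of those terms for you. A long (in the original C networking spec) is a 32-bit unsigned integer (uint32_t), and the network order is big endian. What we're doing with this function is looking for change. If we convert a value to the network order (big endian) and it doesn't change, then the host is already big endian and the value needs to be converted to little endian. If it does change, then the host is little endian and we don't need any conversion. Two things to watch for: a value whose bytes read the same in both directions (0, for example) never changes, so a known constant such as 1 makes a more reliable test than the input itself; and on a big endian host both htonl and ntohl return their argument untouched, so the actual byte swap has to be done by hand.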
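If you want to see the detection and the swap without any networking headers, here is a minimal sketch. The names host_is_big_endian, swap32 and to_little_endian are made up for illustration and are not part of the assembler; the detection looks at the first byte in memory of a known value instead of calling htonl.

```c
#include <stdint.h>

/* Returns 1 on a big endian host and 0 on a little endian one, by checking
 * which byte of the value 1 is stored first in memory. */
int host_is_big_endian(void)
{
   const uint32_t probe = 1;
   return *(const unsigned char *)&probe == 0;
}

/* Reverses the order of the four bytes in a 32-bit value. */
uint32_t swap32(uint32_t value)
{
   return ((value & 0x000000FFu) << 24) |
          ((value & 0x0000FF00u) << 8)  |
          ((value & 0x00FF0000u) >> 8)  |
          ((value & 0xFF000000u) >> 24);
}

/* Hypothetical counterpart to instruction_endian: returns a value whose
 * in-memory byte layout is little endian regardless of the host. */
uint32_t to_little_endian(uint32_t value)
{
   return host_is_big_endian() ? swap32(value) : value;
}
```

On either kind of host, the first byte in memory of to_little_endian(0x11223344) is 0x44, which is the byte order a little endian ARM core expects.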
Third question: why won't it build?
If you try to build this library, you should get something similar to the following:
[Image: Fp7E5QC.png]
Relax, the code is correct. This actually comes down to why we did it this way. We're using a shared library to handle the conversion. This makes our code more platform independent. If we had used macros, then the code would only work for the exact system it was compiled on, but if we use a shared library, it will work on any system that has that library. This means that every system could have a different version of the library, and would always correlate to the endianness of the system it's being run on. We need to link that library. Go ahead and go to the build options and select as.
Go into Linker settings
[Image: yhjh09U.png]
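If you're using Windows, you will want to add ws2_32 to your build flags for this file. On linux (and BSD and osx), htonl and ntohl live in the standard C library, so nothing extra should be needed; -lsocket is only required on systems such as Solaris that keep the socket functions in a separate library.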
[Image: X2WYBJx.png]
Go ahead and exit that dialog and save the project.
[Image: u4xdPSd.png]
Congrats! We're done with section 2 of this installment.



3. function to convert our structure to a 32-bit integer
Ok, so once we've populated our instruction union, we need something that converts it into a form that's easy to write out. We know that all instructions are 32 bits long, so we can stick with our trusty uint32_t. Unfortunately, we don't have a very good conversion for this, so we'll have to make a hidden union for it. Our conversion function also has to take endianness into account; good thing we wrote that function a little while ago. Let's start with the hidden union.
We're going to define this in instruction.c, and its task is going to be converting our original union into type uint32_t. This union should have two members then: our source union and our destination integer.
Code:
union instruction_encoder
{
   union instruction source;
   uint32_t destination;
};
Ok, that was easy, let's go ahead and make the signature for this function. This is a function that's intended to be called by our main routine, so we'll want to export this to a higher namespace by placing the signature in instruction.h. The signature for this will be the following
Code:
uint32_t instruction_encode(union instruction *);
Why did we use a pointer, you ask? We did that because this function is intended to be called outside of our scope or control, and the usual C convention is to pass structures and unions by pointer rather than copying them on every call. C itself passes everything by value, so this is a convention and not a rule in the spec. Leaving out const also means we aren't promising the main program that we will never modify it.
Let's go back to instruction.c and write this function.
Code:
uint32_t instruction_encode(union instruction *encode)
{
   union instruction_encoder encoder;
   encoder.source = *encode; // make a copy for translation
   return instruction_endian(encoder.destination); // convert endians and return our integer
}
This is a pretty simple function, but it actually does a lot of work. This function takes our input instruction structure (with all of its flags), converts it to the type of uint32_t (to make it easier to write out), and rearranges it to fit the proper endian for ARM execution.
This part was pretty small, because we laid all of the groundwork for it in the previous sections and parts. This is the very reason it's so important to plan your projects out before you start them. Since we knew exactly how the pieces were supposed to fall together, we were able to design it down to the line to make the whole program much shorter.
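For comparison, here is a sketch of a different technique from the union above: packing a branch instruction with shifts and masks, then storing it byte by byte. It doesn't depend on how the compiler lays out bit-fields or on host endianness. The function names are hypothetical, and the field positions are the standard ARM branch layout (condition in bits 31-28, 0b101 in bits 27-25, link flag in bit 24, 24-bit offset in bits 23-0).

```c
#include <stdint.h>

/* Packs an ARM branch instruction from its fields (hypothetical helper). */
uint32_t demo_encode_branch(uint32_t cond, uint32_t link, int32_t offset)
{
   return ((cond & 0xFu) << 28) |             /* condition code */
          (0x5u << 25) |                      /* hard coded 0b101 */
          ((link & 0x1u) << 24) |             /* 1 for BL, 0 for B */
          ((uint32_t)offset & 0x00FFFFFFu);   /* signed word offset, 24 bits */
}

/* Stores a 32-bit word as four bytes, least significant byte first,
 * which is little endian order on any host. */
void demo_store_le(uint32_t word, unsigned char out[4])
{
   out[0] = (unsigned char)(word & 0xFFu);
   out[1] = (unsigned char)((word >> 8) & 0xFFu);
   out[2] = (unsigned char)((word >> 16) & 0xFFu);
   out[3] = (unsigned char)((word >> 24) & 0xFFu);
}
```

With condition 0xE (always), no link and an offset of -2, demo_encode_branch gives 0xEAFFFFFE, the familiar branch-to-self encoding. The tradeoff is that every field has to be shifted by hand instead of letting the structure do it.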



4. enumeration for condition codes
Now, this one will be the easiest, but it will also be the most tedious. In instruction.h, we're going to make an enumeration that contains all of the instruction condition codes, so that it will be easier to use them both in parsing and in the instruction structure. You can find them by using the following table.
[Image: iPjcqlY.png]
I'll write the code and put it below, but at least read through the table
Code:
enum instruction_condition
{
   kCONDITION_EQ,
   kCONDITION_NE,
   kCONDITION_CS,
   kCONDITION_CC,
   kCONDITION_MI,
   kCONDITION_PL,
   kCONDITION_VS,
   kCONDITION_VC,
   kCONDITION_HI,
   kCONDITION_LS,
   kCONDITION_GE,
   kCONDITION_LT,
   kCONDITION_GT,
   kCONDITION_LE,
   kCONDITION_AL,
   kCONDITION_NV
};
Congrats! We've completed part 7!
Now, I included all of the condition codes, even though we aren't going to worry about many of them. I did this so that I would not have to number the enum: enum values start at 0 and count up, so listing all 16 in table order makes each name equal to its 4-bit condition code (EQ is 0x0, AL is 0xE, NV is 0xF). Another question you may ask is why I prefixed them all with kCONDITION. I did this because enumeration constants in C share one namespace with every other ordinary identifier, so we need a unique prefix for them. Prefixing constants with a lowercase k is a common convention; you will see it all over Apple's APIs, for one.
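As a taste of how this enum could be used once we get to parsing, here is a small sketch that maps a two-letter suffix to its condition code. The name table and demo_condition_from_suffix are hypothetical and not part of the series code; the lookup relies on the enum being in table order, and it only matches uppercase suffixes.

```c
#include <string.h>

enum instruction_condition
{
   kCONDITION_EQ, kCONDITION_NE, kCONDITION_CS, kCONDITION_CC,
   kCONDITION_MI, kCONDITION_PL, kCONDITION_VS, kCONDITION_VC,
   kCONDITION_HI, kCONDITION_LS, kCONDITION_GE, kCONDITION_LT,
   kCONDITION_GT, kCONDITION_LE, kCONDITION_AL, kCONDITION_NV
};

/* Suffixes in the same order as the enum, so the index is the code. */
static const char *const kConditionNames[] =
{
   "EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
   "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"
};

/* Returns the condition code for a suffix such as "NE",
 * or -1 if the suffix is not a known condition. */
int demo_condition_from_suffix(const char *suffix)
{
   int i;
   for (i = 0; i <= kCONDITION_NV; ++i)
   {
       if (strcmp(suffix, kConditionNames[i]) == 0)
           return i;
   }
   return -1;
}
```

The returned value can be stored directly in the 4-bit condition field of an instruction, since it already equals the encoding from the table.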



In the next part, we will start to look at parsing. I'm hoping this series will be no longer than 10 parts, so the last couple will likely be long reads. At the end of the series, I will make a post (in this section) containing the full source of this project for your reference.

After careful deliberation, I've decided to release this today rather than next week. Merry Christmas.

EDIT: In order to help generate activity both in the programming section and on these threads, I will add a giveaway to this. The drawing will happen once 15 replies have been posted, and the prize will either be 25NSP or $10 paypal. The drawing will be at random (due to rule violation concerns).
Rules:
1. Your post must contribute in some way either to the programming section as a whole or this thread/series
(this means that short posts similar to "count me in" will not be counted. The post does not have to be long, but needs to be something of value. The idea is to help SL grow its programmer base)
2. Your posts must obey all SL rules
(This post was last modified: 11-29-2017, 01:09 AM by phyrrus9.)


RE: CYFA - Creating Your First Assembler - Getting Help #2
I'm going to bump this. The contest has been OK'd by staff, so let's hear what you have to say. Every qualifying post you make increases your chances of winning.


RE: CYFA - Creating Your First Assembler - Getting Help #3
I really hate to do this, bumping this once more, since more people are active due to the christmas giveaway.


RE: CYFA - Creating Your First Assembler - Getting Help #4
"Relax, the code is correct. This actually comes down to why we did it this way. We're using a shared library to handle the conversion. This makes our code more platform independent. If we had used macros, then the code would only work for the exact system it was compiled on, but if we use a shared library, it will work on any system that has that library. This means that every system could have a different version of the library, and would always correlate to the endianness of the system it's being run on. We need to link that library. Go ahead and go to the build options and select as."

This is the most smooth, relaxing paragraph I have ever seen written about C code. I have no idea why it has that effect, it sends chills down my spine.

That was a lot of code, looking forward to the next part!


(11-02-2018, 02:51 AM)Skullmeat Wrote: Ok, there no real practical reason for doing this, but that's never stopped me.


RE: CYFA - Creating Your First Assembler - Getting Help #5
pretty much straightforward.


RE: CYFA - Creating Your First Assembler - Getting Help #6
(12-01-2017, 10:59 PM)Ender Wrote: "Relax, the code is correct. This actually comes down to why we did it this way. We're using a shared library to handle the conversion. This makes our code more platform independent. If we had used macros, then the code would only work for the exact system it was compiled on, but if we use a shared library, it will work on any system that has that library. This means that every system could have a different version of the library, and would always correlate to the endianness of the system it's being run on. We need to link that library. Go ahead and go to the build options and select as."

This is the most smooth, relaxing paragraph I have ever seen written about C code. I have no idea why it has that effect, it sends chills down my spine.

That was a lot of code, looking forward to the next part!

I had to write it like that because I was afraid of trolls running around telling me I could've just done
Code:
#if (!*(unsigned char *)&(uint16_t){1})
encode = htonl(encode);
#endif
Which actually wouldn't work for our case. Using shared libraries defeats this issue, since the code is dependent on the host, not our program. The only thing worse than non-optimal code is trolls who think your code is non-optimal
