chevron_left chevron_right
Login Register invert_colors photo_library


Stay updated and chat with others! - Join the Discord!
Thread Rating:
  • 1 Vote(s) - 5 Average


Tutorial [Algorithm] Run Length Encoding filter_list
Author
Message
[Algorithm] Run Length Encoding #1
Run length encoding (RLE) is a compression mechanism used when a file consists of many repeated bits or bytes. It is commonly used to encode graphics data.

Take for example a 4x4 all black bitmap. The data for this image would be
Code:
00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00

But why would we want to waste 16 bytes if they are all the same?

a RLE interpretation of the same bitmap might look something like
Code:
2A 0C 00

see how much smaller that is? (81% smaller for those of you who don't know)...

but since this is binary data, it would probably look something like this
Code:
0C 00

which is 88% smaller than the original.

Alright, so here is how it works.

A flag character is chosen, most RLE examples use an asterisk (*) (0x2A) for this.

When a group of more than three bytes of the same value is found, that whole string is replaced by * # x where * is our flag character, # is the number of bytes (4-255), and x is the actual byte.



Seems pretty simple, let's work through one.
Code:
WWWWWWWWWWWWBWWWWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWWBBBBW

now, you could store the full 72 bytes, or we could go through it with RLE...
what we do is we read until we hit a character that is different.

We read 12 W's before we hit a B..
Code:
WWWWWWWWWWWW
becomes
Code:
*12W
or
2A 0B 57

We read 1 B before we hit a W
Code:
B
becomes
Code:
B
and our current state looks like this:
Code:
2A 0B 57 42

We read 15 W's before we hit a B
Code:
WWWWWWWWWWWWWWW
becomes
Code:
*15W
or
2A 0F 57
and our state looks like
Code:
2A 0B 57 42 2A 0F 57

We read 3 B's before we hit a W
this translates to 3 B's, since we save no space by encoding it..
our state is this:
Code:
2A 0B 57 42 2A 0F 57 42 42 42

We read 21 W's before we hit a B
this translates to
Code:
*21W
or
2A 15 57
and our current state is
Code:
2A 0B 57 42 2A 0F 57 42 42 42 2A 15 57

We read 1 B before we hit a W,
once again, this stays a B, so our state is
Code:
2A 0B 57 42 2A 0F 57 42 42 42 2A 15 57 42

We read 14 W's before we hit a B
this translates to
Code:
*14W
or
2A 0E 57
and our state is
Code:
2A 0B 57 42 2A 0F 57 42 42 42 2A 15 57 42
2A 0E 57

We read 4 B's before we hit a W
this translates to
Code:
*4B
or
2A 04 42
and our state is
Code:
2A 0B 57 42 2A 0F 57 42 42 42 2A 15 57 42
2A 0E 57 2A 04 42

We read one W until we hit the end of the file
this stays as a W, and our final state is
Code:
2A 0B 57 42 2A 0F 57 42 42 42 2A 15 57 42
2A 0E 57 2A 04 42 57



After run length encoding the file, we wound up with 21 bytes, saving 51 bytes for a compression ratio of 29% (71% deflated).



but what if our file uses all 256 possible characters and we can't pick a flag character?

there are two ways to tackle this problem.

1: RLE everything in the file, even if it is only a single byte (meaning we are adding 2 extra bytes for each single byte)
2: pick a flag character anyways, and when that flag character appears in the input file, encode it anyways. Example:

flag character: *
text:
Code:
aaaa****b*
would RLE to
Code:
*4a*4*b*1*
or
2A 04 61 2A 04 2A 62 2A 01 2A
in this example, our compression ratio is 100% (0% deflation), so at least we didn't gain any extra bytes. for a larger example like
Code:
aaaa***************************************************************************************************************************************************************************************************************************************************************bbbbb**a*
it would rle to
Code:
*4a*255**5ba*1*
or
2A 04 61 2A FF 2A 2A 05 62 61 2A 01 2A
which is FAR smaller.



decoding

Decoding is simple...take the following example:
Code:
abc*5hg*4b
we simply go through the input character by character, if the character is not our flag, then we do nothing, and output it.
Code:
a: output a
b: output b
c: output c
*: N is our next character
    5: C is our next character
    g: output 5 g's (note N and C are variables here)
h: output h
g: output g
*: N is our next character
    4: C is our next character
    b: output 4 b's
and the output is
Code:
abcggggghgbbbb



that is probably the simplest form of compression out there (and probably the worst general purpose one), but it works great for encoding things like graphics.

[+] 1 user Likes phyrrus9's post
Reply

RE: [Algorithm] Run Length Encoding #2
I actually read your tutorials... Even though I know almost nothing about coding.

Reply

RE: [Algorithm] Run Length Encoding #3
(06-25-2015, 07:28 AM)Losi Wrote: I actually read your tutorials... Even though I know almost nothing about coding.

I have a pretty wide range of knowledge....what is a topic you would like to see me make a tutorial on?

Reply

RE: [Algorithm] Run Length Encoding #4
(06-25-2015, 07:46 AM)phyrrus9 Wrote: I have a pretty wide range of knowledge....what is a topic you would like to see me make a tutorial on?

Assembly (I'm using NASM) and C. I'm trying to learn both.

Reply

RE: [Algorithm] Run Length Encoding #5
Great tut @phyrrus9. I just wish I knew enough to fully understand it Biggrin

Reply






Users browsing this thread: 1 Guest(s)