Tutorial [Algorithm] Run Length Encoding filter_list Author Message
[Algorithm] Run Length Encoding #1
Run length encoding (RLE) is a compression mechanism used when a file consists of many repeated bits or bytes. It is commonly used to encode graphics data.

Take for example a 4x4 all black bitmap. The data for this image would be
Code:
```00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00```

But why would we want to waste 16 bytes if they are all the same?

a RLE interpretation of the same bitmap might look something like
Code:
`2A 0C 00`

see how much smaller that is? (81% smaller for those of you who don't know)...

but since this is binary data, it would probably look something like this
Code:
`0C 00`

which is 88% smaller than the original.

Alright, so here is how it works.

A flag character is chosen, most RLE examples use an asterisk (*) (0x2A) for this.

When a group of more than three bytes of the same value is found, that whole string is replaced by * # x where * is our flag character, # is the number of bytes (4-255), and x is the actual byte.

Seems pretty simple, let's work through one.
Code:
`WWWWWWWWWWWWBWWWWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWWBBBBW`

now, you could store the full 72 bytes, or we could go through it with RLE...
what we do is we read until we hit a character that is different.

We read 12 W's before we hit a B..
Code:
`WWWWWWWWWWWW`
becomes
Code:
```*12W or 2A 0B 57```

We read 1 B before we hit a W
Code:
`B`
becomes
Code:
`B`
and our current state looks like this:
Code:
`2A 0B 57 42`

We read 15 W's before we hit a B
Code:
`WWWWWWWWWWWWWWW`
becomes
Code:
```*15W or 2A 0F 57```
and our state looks like
Code:
`2A 0B 57 42 2A 0F 57`

We read 3 B's before we hit a W
this translates to 3 B's, since we save no space by encoding it..
our state is this:
Code:
`2A 0B 57 42 2A 0F 57 42 42 42`

We read 21 W's before we hit a B
this translates to
Code:
```*21W or 2A 15 57```
and our current state is
Code:
`2A 0B 57 42 2A 0F 57 42 42 42 2A 15 57`

We read 1 B before we hit a W,
once again, this stays a B, so our state is
Code:
`2A 0B 57 42 2A 0F 57 42 42 42 2A 15 57 42`

We read 14 W's before we hit a B
this translates to
Code:
```*14W or 2A 0E 57```
and our state is
Code:
```2A 0B 57 42 2A 0F 57 42 42 42 2A 15 57 42 2A 0E 57```

We read 4 B's before we hit a W
this translates to
Code:
```*4B or 2A 04 42```
and our state is
Code:
```2A 0B 57 42 2A 0F 57 42 42 42 2A 15 57 42 2A 0E 57 2A 04 42```

We read one W until we hit the end of the file
this stays as a W, and our final state is
Code:
```2A 0B 57 42 2A 0F 57 42 42 42 2A 15 57 42 2A 0E 57 2A 04 42 57```

After run length encoding the file, we wound up with 21 bytes, saving 51 bytes for a compression ratio of 29% (71% deflated).

but what if our file uses all 256 possible characters and we can't pick a flag character?

there are two ways to tackle this problem.

1: RLE everything in the file, even if it is only a single byte (meaning we are adding 2 extra bytes for each single byte)
2: pick a flag character anyways, and when that flag character appears in the input file, encode it anyways. Example:

flag character: *
text:
Code:
`aaaa****b*`
would RLE to
Code:
```*4a*4*b*1* or 2A 04 61 2A 04 2A 62 2A 01 2A```
in this example, our compression ratio is 100% (0% deflation), so at least we didn't gain any extra bytes. for a larger example like
Code:
`aaaa***************************************************************************************************************************************************************************************************************************************************************bbbbb**a*`
it would rle to
Code:
```*4a*255**5ba*1* or 2A 04 61 2A FF 2A 2A 05 62 61 2A 01 2A```
which is FAR smaller.

decoding

Decoding is simple...take the following example:
Code:
`abc*5hg*4b`
we simply go through the input character by character, if the character is not our flag, then we do nothing, and output it.
Code:
```a: output a b: output b c: output c *: N is our next character     5: C is our next character     g: output 5 g's (note N and C are variables here) h: output h g: output g *: N is our next character     4: C is our next character     b: output 4 b's```
and the output is
Code:
`abcggggghgbbbb`

that is probably the simplest form of compression out there (and probably the worst general purpose one), but it works great for encoding things like graphics.

1 user Likes phyrrus9's post
RE: [Algorithm] Run Length Encoding #2

RE: [Algorithm] Run Length Encoding #3
(06-25-2015, 07:28 AM)Losi Wrote: I actually read your tutorials... Even though I know almost nothing about coding.

I have a pretty wide range of knowledge....what is a topic you would like to see me make a tutorial on?

RE: [Algorithm] Run Length Encoding #4
(06-25-2015, 07:46 AM)phyrrus9 Wrote: I have a pretty wide range of knowledge....what is a topic you would like to see me make a tutorial on?

Assembly (I'm using NASM) and C. I'm trying to learn both.

RE: [Algorithm] Run Length Encoding #5
Great tut @phyrrus9. I just wish I knew enough to fully understand it

Users browsing this thread: 1 Guest(s)