Sunrast's Implementation of Run length Encoding

lionaneesh's Avatar author of Sunrast's Implementation of Run length Encoding
This is an article on Sunrast's Implementation of Run length Encoding in Python.
Sun Rasterfile also known as Sunrast is an image format developed by Sun Microsystems, It uses a version of Run Length Encoding to compress images.

In this tutorial we'll be discussing how Sun Rasterfile's implementation of RLE is different from the basic vanila version and how is it used to reduce the size of images. The format implements a Run length encoding trigger, which is used to reduce the wastage of bytes in case of a non redudant data.
0x80 (RLE Trigger) is a trigger to read a runlength for the next value, otherwise the bytes are written unchanged (as in original).

Note: This is a continuation of my previous article on Run Length Encoding

The Working

The sunrast format uses RLE in the following format :-

Where “RLE TRIGGER” is equal to 0x80, (which is predefined in the format's specification)
RUN is the run length (i.e length of repetition)
and Byte is the value to be repeated.

Lets look at some examples and try to understand How Sunrast encodes them :-

Original : [0x11 0x22 0x33]
Encoded : [0x11 0x22 0x33]

Here there is no difference between the original and the encoded data because the data is non redudant.

Original : [0x11 0x11 0x11]
Encoded : [0x80 0x02 0x11]
  • 0x80 (RLE Trigger)
  • 0x02 (Runlength – 1)
  • 0x11 (Value)

Original : [0x80 0x11 0x11]
Encoded : [0x80 0x00 0x02 0x11]

This example is a bit tricky, this involves [0x80] to be encoded to [0x80 0x00].

Q: Why is that?
A: Basically its a way to indicate that the byte value is same as the RLE trigger and without RLE. Its just a way sunrast handles it and its predefined in the format's specifications.

Original : [0x80 0x80 0x80]
Encoded : [0x80 0x03 0x80]

What so ever be the byte value, if its being repeated, then the format follows the normal RLE format i.e [RLE TRIGGER] [RUN – 1] [BYTE].

Libav already has a working Sunrast Encoder (which is written by me )

Exercise :

Now that we understand the Sunrast's version of RLE lets try to solve some examples to check our knowledge :-
  • 0x01 0x80 0x00 0x11 0x12 0x12 0x12
  • 0x01 0x80 0x80 0x80 0x80 0x80 0x80
  • 0x00 0x00 0x00 0x00 0x00 0x00 0x00
  • 0x01 0x02 0x03 0x04 0x05 0x06 0x07

Here's the Python Code to Check your answers :-

Code: python
# Sunrast's Version of RLE

def rle(data) :
    run = 1
    encoded = []
    for i in range(1, len(data)) :
        if data[i] == data[i - 1] :
            run = run + 1
        else :
            if run == 1 : # no RLE
                if data[i - 1] == '0x80' :
                    encoded.append(('0x80', '0x00'))
                else :
            else :
                encoded.append(('0x80', run - 1 , data[i - 1]))
                run = 1
        if i == len(data) - 1 :
            if run == 1 and data[i] != '0x80':
            elif run == 1 and data[i] == '0x80':
                encoded.append(('0x80', '0x00'))
            else :
                encoded.append(('0x80', run - 1 , data[i]))
    return encoded

# Test :-
print rle(['0x01', '0x80', '0x11', '0x12', '0x12', '0x12', '0x80'])


Sunrast's implementation of RLE reduces the wastage of bytes in case of non redudant data, but some times it can be a little expensive if a file contains a lot of 0x80 bytes at different places, encoding a 0x80 byte (non-redudant) would take 2 bytes [0x80 0x00] which will cost more bytes than the original data. The worst case i.e a file with alternate 0x80 bytes can cost double the bytes as used in the original data.