Run length Encoding also known as RLE is probably the most basic encoding method that's being used in different fields of computing like, Multimedia Formats etc. In this tutorial we'll be looking at how RLE works and a basic python example which implements RLE. How does it works Run length encoding works best on the files with lots redundancy, example an Image with lots of whitespace, like a scanned document etc. Run length encoding can be very effective in such files, and can reduce the size of such files to around half of the original size. So, to understand how it works lets take an example :- Original Data : 0x00 0x00 0x01 0x01 0x01 0x01 0x01 Encoded Data : 0x02 0x00 0x06 0x01 Original data has a sequence of bytes which have a lot of repetition, 0x00 has been repeated 2 times and 0x01 has been repeated 6 times, Now its quite simple to understand the encoded data. It follows the following format :- Code: [RUN] [BYTE] [RUN2] [BYTE2] ... So, in the example which we discussed, 0x00 has been repeated 2 times, in this case RUN = 2 and BYTE = 0x00, same is with 0x01. Now that we understand How this algorithm works, why not write a basic python script to do some RLE. Code: #!/bin/py def rle(data) : run = 1 encoded = [] for i in range(1, len(data)) : if data[i] == data[i - 1] : run = run + 1 else : encoded.append((run , data[i - 1])) run = 1 if i == len(data) - 1 : encoded.append((run , data[i])) return encoded # Test :- print rle([ 'A', 'A', 'B', 'B', 'C', 'C' ]) Exercise : Now that we understand how the algoritm works lets test our knowledge by solving some basic exercises. ABCDEFGHOOPPPP Go4expert Aneesh Shabbir You can check your answers by using the python script above. Conclusion Run Length Encoding can be used to encode any type of data losslessly but its beneficial with only a specific type of data which involves a fair amount of redundancy, in some other casses it can even be expensive to use and could cost more bytes than the original data. Example: lets try to encode 'go4expert' using RLE. Here is the modified python script to do that :- Code: #!/bin/py def rle(data) : run = 1 encoded = [] for i in range(1, len(data)) : if data[i] == data[i - 1] : run = run + 1 else : encoded.append((run , data[i - 1])) run = 1 if i == len(data) - 1 : encoded.append((run , data[i])) return encoded # Test :- print rle([ 'g','o','4','e','p','e','r','t' ]) Output :- Code: [(1, 'g'), (1, 'o'), (1, '4'), (1, 'e'), (1, 'p'), (1, 'e'), (1, 'r'), (1, 't')] As we can see the original data costs us 8 bytes but the encoded data actually costs double, i.e 16 bytes.
Note: In the above Python code , '[' has been replaced by [ (I don't know the reason), Please replace that before using that code.
In programming language, Many times, it required to get all same types of data or any other details in arranged format, So that we can work perfectly and also user can easily understand it. The Data encoding is really very useful. The given detail with example is really appreciated.