
New idea for content compression

By Sami Abu Shawarib posted 12-16-2011 19:00

  

What new ideas can be suggested to reduce content storage?

Many ideas could be explored, and some of them are already in use; the problem is that most of them are neither published nor registered. I remember that when we were working on the content agent, we combined LZW compression with tree compression to improve on the results of LZW alone.

Another idea that was tested and gave good results, but was never used, is to replace words with codes and then apply PKZIP compression, which gives excellent results when you are dealing with numbers only. The strength of this idea shows when the codes are stored as hex values: a single 32-bit word can represent up to 4G (about 4 billion) distinct words, so all you need is to build your own dictionary that maps each word to a code "number".
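To make the dictionary idea concrete, here is a minimal sketch in Python. It is only an illustration, not the code we tested: the function names `build_dictionary` and `encode` are made up for this example, words are split on whitespace, and codes are handed out in order of first appearance.

```python
# Hypothetical sketch: map each distinct word to a numeric code.
MAX_CODES = 2 ** 32  # a 32-bit word can hold 4,294,967,296 distinct codes


def build_dictionary(text):
    """Assign each distinct word a code in order of first appearance."""
    codes = {}
    for word in text.split():
        if word not in codes:
            codes[word] = len(codes)
    return codes


def encode(text, codes):
    """Replace each word with its code, written as 8 hex digits (32 bits)."""
    return [format(codes[word], "08x") for word in text.split()]


sample = "the cat saw the dog and the dog saw the cat"
codes = build_dictionary(sample)
encoded = encode(sample, codes)
```

Here `codes` ends up with five entries, and every occurrence of the same word gets the same 8-digit hex code. A real dictionary would have to be stored or shared so that the reader of the file can map the codes back to words.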

How does this process work?

For a test, take any content that fills an A4 page, which contains about 250 words on average. Compressed with PKZIP directly, it comes down to the equivalent of around 50 words. Now, for testing purposes only, pass the 250 words through a parser, which gives you 250 codes; run these through PKZIP and the result comes down to the equivalent of 5-10 words, a gain of about 5 times.
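Anyone who wants to try the test above can start from a sketch like the following. It rests on several assumptions: Python's `zlib` module (DEFLATE, the method most commonly used inside PKZIP archives) stands in for PKZIP, the "parser" is a plain whitespace split, each code is packed as a 32-bit integer, and the names `compress_plain`, `compress_coded` and `decompress_coded` are invented for this example. The dictionary itself is not counted in the compressed size, so the numbers it prints are not a full accounting.

```python
import struct
import zlib


def compress_plain(text):
    """Compress the raw text directly."""
    return zlib.compress(text.encode("utf-8"), 9)


def compress_coded(text):
    """Replace words with 32-bit codes, then compress the code stream."""
    codes = {}
    stream = [codes.setdefault(word, len(codes)) for word in text.split()]
    packed = struct.pack(f"<{len(stream)}I", *stream)
    return zlib.compress(packed, 9), codes


def decompress_coded(blob, codes):
    """Reverse compress_coded; the dictionary must be available."""
    packed = zlib.decompress(blob)
    words = {code: word for word, code in codes.items()}
    values = struct.unpack(f"<{len(packed) // 4}I", packed)
    return " ".join(words[value] for value in values)


# Placeholder text; substitute a real page of about 250 words.
page = "replace this sample with a real page of text " * 25
plain = compress_plain(page)
coded, codes = compress_coded(page)
print(len(page.encode("utf-8")), len(plain), len(coded))
```

The printed sizes depend entirely on the input, so run it on your own content before drawing any conclusion about the gain.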

I wonder what would happen if these ideas were shared and made public to all the brilliant minds working on mathematical algorithms. With today’s powerful processors, which give results in no time, this could save a lot of storage space!

The same idea would be perfect for JPG images, which compress badly because JPG is already a compressed format, so there should be public research in these areas to benefit our world and reduce storage use.

 



#ECM