Frequently Asked Questions


    Q1. What do you mean by recursive compression?

    As the literal meaning of the words suggests, it is the compression of a piece of binary data in a repeated manner, by applying the same method again and again.

     

    Q2. How could a file be compressed repeatedly by means of the same method?

    This is due to the capability of our technology to compress random data. Once a piece of random binary data has been compressed by our technology, the output is still random binary data; therefore the method used to compress the original file can be used to compress the compressed files, over and over again.

     

    Q3. What do you mean by random binary data? Does it mean data randomly produced for the sake of being random?

    No. By random binary data we mean any binary data at all, without identification of its source, including the randomly produced data mentioned in the question; hence it is also known as data from a discrete source.

     

    Q4. Why can't other compression technologies compress random data and thus become recursive?

    Almost all existing compression technologies target data from a specific known source, such as alphanumeric, graphic or sonic data. Apart from the long headers and trailers that have become part and parcel of such technologies, they can only create signals within a specific type of data, not outside it; and signals are essential tools for all compression technologies. For example, with alphanumeric data, strings of characters that do not occur in meaningful text, such as ^%$##@Z/, can be used as signals. However, once the file has been compressed, those same signals cannot be used again, because the compressed file is made up of random data, and whatever signals are used now become part of the random data. Without valid signals, no compression technology can work.
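
    By way of illustration only (this sketch is not part of the technology, and the signal byte and format are invented for the example), here is how a signal works with alphanumeric data: a byte that never occurs in ordinary text marks a run-length replacement, and the same trick stops working once the output is arbitrary binary data in which that byte can occur naturally.

        # Illustrative only: run-length coding of ASCII text, using a byte that
        # ordinary prose never contains ("\x00" here) as the compression signal.
        # Once the output is arbitrary binary data, "\x00" can occur naturally,
        # so the same signal cannot be reused on a second pass.

        SIGNAL = "\x00"  # hypothetical signal, assumed absent from the input text

        def rle_encode(text: str) -> str:
            out, i = [], 0
            while i < len(text):
                j = i
                while j < len(text) and text[j] == text[i]:
                    j += 1
                run = j - i
                if run >= 4:                      # long runs are replaced by a signal
                    out.append(f"{SIGNAL}{text[i]}{run:03d}")
                else:
                    out.append(text[i] * run)
                i = j
            return "".join(out)

        def rle_decode(data: str) -> str:
            out, i = [], 0
            while i < len(data):
                if data[i] == SIGNAL:             # signal: next character repeated N times
                    out.append(data[i + 1] * int(data[i + 2:i + 5]))
                    i += 5
                else:
                    out.append(data[i])
                    i += 1
            return "".join(out)

        text = "heading" + "-" * 40 + "body"
        assert rle_decode(rle_encode(text)) == text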

     

    Q5. How does your technology overcome the problems of creating and using signals, since whatever signals your technology uses will become part of the random data?

    This is getting close to trade secrets. It suffices here to say that the argument holds true if the data to be compressed is dealt with in a linear manner, and that our technology does not work in a linear manner but, rather, in three dimensions. For instance, by separating the data at the compressed level into a control section and a subordinate section (two dimensions here), the signals can exist in the control section without becoming part of the random data, which occupies the subordinate section. At this stage we are not prepared to explain what the third dimension is.

     

    Q6. How are the two sections, the control and the subordinate, separated? If a dividing barrier is used for this purpose, how long would the barrier be?

    There may or may not be a dividing barrier, depending on how the data is processed. No barrier is needed if the two sections are processed in opposite directions, say the control section from left to right and the subordinate section from right to left, with the two meeting in the middle at the end of processing, as sketched below. If a barrier is used to separate the two, its length would be very short indeed, no longer than twenty bits, which pales into insignificance compared with the long headers and trailers used by most compression technologies. This technology has virtually no header or trailer, apart from a few bytes used for such purposes as recording the number of passes. For this reason it can compress relatively short files, which other technologies with long overheads cannot: if a very short file is zipped, it will almost certainly end up longer, because the compression achieved cannot cover the overhead.
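
    A minimal sketch of the barrier-free arrangement described above, purely for illustration (the 3-bit control field and the chunking are invented for the example and are not the actual format): the control codes are read from the left end of the buffer, the subordinate bits from the right end, and decoding simply stops when the two cursors meet.

        # Sketch of a barrier-free layout: control codes fill the buffer from the
        # left, subordinate bits from the right; decoding ends when the two
        # cursors meet, so no dividing barrier is stored anywhere.

        WIDTH = 3  # hypothetical fixed width of one control code, in bits

        def pack(chunks):
            control, subordinate = [], []
            for chunk in chunks:
                control += [int(b) for b in format(len(chunk), f"0{WIDTH}b")]
                subordinate += chunk
            return control + subordinate[::-1]    # subordinate written right to left

        def unpack(buf):
            left, right, chunks = 0, len(buf), []
            while left < right:                   # the two cursors meet: done
                n = int("".join(map(str, buf[left:left + WIDTH])), 2)
                left += WIDTH
                chunks.append(buf[right - n:right][::-1])
                right -= n
            return chunks

        pieces = [[1, 0, 1, 1], [0, 1], [1, 1, 1]]
        assert unpack(pack(pieces)) == pieces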

     

    Q7. It is well known among compression experts and mathematicians that what stands in the way of the compression of random binary data is the entropy problem. How does this technology overcome it?

    The entropy theory states that the encoding of a binary string cannot be shorter than its entropy. In lay terms, eight bits of data need eight bits to encode; if a single bit is dropped, half the information is lost: 2^8 = 256, whereas 2^7 = 128.

    Our technology overcomes this impasse by means of flexible compression ratios resulting from compressing binary sections of equal length. In answer to the above mathematical objection, our theory states that the quantity of values represented by a certain number of bits equals the aggregate quantity of values represented by all the smaller numbers of bits, less two values. For example, four bits represent 2^4 = 16 values, whereas three bits plus two bits plus one bit represent 2^3 + 2^2 + 2^1 = 8 + 4 + 2 = 14 values, i.e. 16 values minus 2. This holds true for eight bits, for eight hundred bits and, for that matter, for eight billion bits. Therefore, if a piece of data is divided into equal sections of, say, eight bits, and if some of these sections are compressed to seven bits, some to six bits and so on, then 254/256 of the total values of the data can be represented by the sum total of all the shorter sections. If some of the sections remain the same length, then no value is lost at all. Thus lossless recursive compression is achieved despite the entropy constraints.
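
    The arithmetic quoted above can be checked directly; the few lines below merely verify the figures given in this answer (16 versus 14 values for four bits, and 254 out of 256 for eight-bit sections):

        # Check of the figures quoted above: for a section of n bits, the values
        # representable by all the shorter widths (n-1 bits down to 1 bit) add up
        # to 2**n - 2, i.e. two values fewer than the n-bit section itself.

        for n in (4, 8, 800):
            shorter = sum(2**k for k in range(1, n))   # 2^(n-1) + ... + 2^1
            assert shorter == 2**n - 2

        # n = 4:  2**3 + 2**2 + 2**1 = 14 = 16 - 2
        # n = 8:  254 bit strings of lengths 1 through 7, against 256 of length 8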

     

    Q8. Have you heard of the counting argument? And what is your answer to it?

    The counting argument uses the analogy of fitting sixteen pigeons into fifteen holes: if two pigeons are put into the same hole, they are no longer unique. This is a rather simplistic view. Compression technologies relying on pattern repetition already put two or more identical pigeons in the same hole. The count of pigeons and holes is even less applicable to our technology, in which the holes are of unequal size and some are big enough to accommodate even non-identical pigeons, analogically speaking.

     

    Q9. If equal sections of binary data can be compressed to sections of unequal lengths, how can the compressed data be decoded back to the original data?

    Obviously, in a linear approach this would be impossible: unless there is a marker between every two sections of different lengths, the computer has no way of knowing how long the section compressed from a particular section of the original data is, and so cannot decode it; and the trouble with markers is that the expansion they cause would far exceed the compression achieved. That is why anyone who tried to compress random data ended up making it longer, until now. There would be no such problem if all sections of equal length, say eight bits, were compressed to equal shorter sections, say seven bits, because the computer could simply decode seven bits at a time and restore the original eight-bit sections; but unfortunately, if that method were used, half of the information would be lost, as explained in the answer to Q7. Our technology has achieved the feat of compressing sections of equal length in the original data to unequal shorter sections in the compressed data without the need for markers. This is made possible by using the signals in the control section to indicate how many bits to process at any particular time, as sketched below.
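
    A minimal sketch of the marker-free bookkeeping, for illustration only: the mapping below (write a section in 7 bits when its value fits, otherwise keep all 8 bits, with one signal bit per section in the control stream) is a stand-in, not the actual compression method, which remains a trade secret.

        # Sketch of marker-free decoding: equal 8-bit sections are written with
        # unequal lengths, and one signal bit per section in the control stream
        # tells the decoder how many bits to read back.  The stand-in mapping
        # simply drops the leading 0 when a section's value fits in 7 bits.

        def encode(sections: list[int]) -> tuple[list[int], list[int]]:
            control, data = [], []
            for value in sections:                # each value is one 8-bit section
                if value < 128:
                    control.append(0)             # signal: a 7-bit section follows
                    data += [int(b) for b in format(value, "07b")]
                else:
                    control.append(1)             # signal: a full 8-bit section follows
                    data += [int(b) for b in format(value, "08b")]
            return control, data

        def decode(control: list[int], data: list[int]) -> list[int]:
            sections, pos = [], 0
            for signal in control:
                width = 8 if signal else 7        # the signal says how many bits to read
                sections.append(int("".join(map(str, data[pos:pos + width])), 2))
                pos += width
            return sections

        original = [3, 200, 127, 128, 0]
        assert decode(*encode(original)) == original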

     

    Q10. If two files are compressed with the Recursive Compression Technology, even if only one bit is compressed away in each pass, eventually they could both be reduced to a single 0 or, for that matter, a single 1. How could this same bit be decoded back to two different files?

    This will never happen, because after a file has gone through a large number of passes, it first enters a phase of diminishing returns and eventually reaches the optimum, at which point the effect of further passes becomes negligible. Furthermore, even if two files became exactly the same at their optimum by sheer coincidence, the decompressed files would still be different, because the two files would have gone through different numbers of passes to reach the same optimum.

    As for the doubting Thomases' spurious critique about Shannon's theorem, entropy and all that jazz, and in particular the smart-alec argument that, if two different files can both be compressed recursively to a single 0, how can one decode that single 0 back to two different files: this simply does not hold water, and I have already refuted it on our website with a simplistic example. Our technology can compress a run of 0s to half its length at each pass; therefore, in decoding, a single 0 is decoded back to two 0s at the first pass, four 0s at the next pass, and so on. So if one file has gone through four passes to become a single 0 and another has gone through five passes to become the same single 0, then decoding the single 0 four times restores the first to its original 16 0s, and decoding it five times restores the second to its original 32 0s; thus two completely different files are decoded from the same single 0 (see the sketch below). Of course this is a very simplistic example.

    I can give a slightly more complex one. Our technology compresses by replacing bit strings with shorter signals, and suppose the shortest replaceable string is, say, 9 bits. Then if many files can be compressed recursively down to an 8-bit file, there are 256 different possible compressed files, and each can be decoded through a different number of passes, resulting in a virtually unlimited number of different files. Furthermore, not every string over 8 bits can be replaced by a signal, so a file may eventually end up as a string of 9, 10, 11 or 12 bits, etc., which cannot be replaced by a signal; if we apply the same principle to these as to the 8-bit irreplaceable strings, the number of files that can be restored from these last-pass strings is astronomical. That is what I mean by the optimum: the point at which a compressed file cannot be compressed any further, because it can no longer be replaced by a signal.
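
    The run-of-zeros illustration can be written out in a few lines; this is merely the simplistic website example restated in code, not the technology itself:

        # The simplistic run-of-zeros example: each compression pass halves the
        # run, so each decoding pass doubles it, and the recorded number of
        # passes decides which original file a single "0" expands back into.

        def compress_pass(zeros: str) -> str:
            return "0" * (len(zeros) // 2)        # one pass halves the run of 0s

        def decode(compressed: str, passes: int) -> str:
            for _ in range(passes):
                compressed = "0" * (len(compressed) * 2)   # one pass doubles the run
            return compressed

        run = "0" * 16
        for _ in range(4):
            run = compress_pass(run)              # 16 -> 8 -> 4 -> 2 -> 1
        assert run == "0"

        assert decode("0", 4) == "0" * 16         # four passes back: 16 zeros
        assert decode("0", 5) == "0" * 32         # five passes back: 32 zeros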

     

    Q11. How is the optimum reached?

    This question has already been partially answered under Q10. According to the underlying principle of unequal compression ratios, sections with an equal number of bits in a pre-compression file (not necessarily the original file) can be compressed into sections with unequal numbers of bits (see the answer to Q7). However, some sections in the compressed file may contain only one bit fewer than, or even no fewer than, their pre-compression counterparts. When an overwhelming majority of sections in the compressed file belong to this category, the recursive compression process is deemed to have reached the optimum.

     

    Q12. Are there files that cannot be compressed with this technology?

    Yes, but only when a file has been specially designed for the particular purpose of being recursion-proof, for instance a file deliberately made up predominantly of sections that cannot be compressed, or can be compressed only ineffectively; a recursively compressed file already at the optimum would be a case in point. However, such files are unlikely to contain any meaning or sense, and in the real world few people would be foolhardy enough to engage in such an exercise in futility.

     

    Q13. Since the technology can be applied to any random data, universally, will it show different results and efficiencies with different files?

    It goes without saying that this would be the case. If the number of highly compressible sections outstrips the number of less compressible or uncompressible sections, the result will show a relatively low compression ratio, and vice versa. See the answers to Q11 and Q12.

     

    Q14. Given the fact that different files will return different results when compressed with this technology, what is the average compression ratio achieved by this academic version? And how many passes are needed on average for a file to be compressed by 90%?

    From our extensive tests, the worst, or plateau, result shows a 0.4% reduction in file size after each pass, and the best over 1.25%. It is therefore a modest assessment to say that the average reduction per pass should be above 0.6%. At that rate, as the arithmetic below shows, it takes fewer than 400 passes for this academic version to compress a file down to 10% of its original size.
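
    The figure of fewer than 400 passes follows from compounding the quoted average reduction of 0.6% per pass; the short calculation below uses only the numbers given in this answer:

        import math

        # Compounding a 0.6% reduction per pass: the number of passes needed to
        # reach 10% of the original size is log(0.1) / log(1 - 0.006), roughly 383.
        passes = math.ceil(math.log(0.1) / math.log(1 - 0.006))
        assert passes < 400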

     

    Q15. If a file has to go through many passes to achieve a substantial compression ratio by means of this technology, would it not take a long time to decode?

    Obviously, the more passes a file goes through, the longer it takes to decode. It is a matter of trade-off between compression ratio and decoding time. By going through a large number of passes, a file can be compressed down to a trickle, which is particularly useful where decoding time is of little consequence, such as in the storage of archival materials.

     

    Q16. If a large number of passes are needed to achieve substantially low compression ratios, is it possible to apply it in real-time operations, such as telecommunication and video streaming?

    Of course, to achieve an acceptable speed a file can only go through a limited number of passes, which will no doubt reduce the compression efficiency. However, the technology should remain effective regardless, especially as it can be applied on top of other compression technologies. With skillful buffering, near real-time operation is a distinct possibility; one must also be mindful that real time is a relative rather than an absolute term: just note the delay between a televised game or race and the real thing watched simultaneously.

     

    Q17. Are there any security features built into the technology?

    Yes. As a matter of fact, this technology has a built-in security feature: in order to break into, or intercept and decode, a file compressed with this technology, an intruder must first possess the decoding software and, secondly, must know how many passes the file in question has gone through. For these reasons, this technology is an ideal unbreakable security code for confidential materials in the commercial world and in defense and military operations.

     

    Q18. How do you know for sure those not-yet-implemented features can actually be implemented?

    One can be sure of that because these features are simply more of the same as what has already been done in this prototype. It can be compared to the multiplication table: if one knows how to compile it from one times one equals one up to three times nine equals twenty-seven, surely he or she knows how to compile the rest of the table. With our technology, more funding and time will see these features implemented. Such development and implementation can be described as horizontal. Each feature can also be further developed in a vertical manner, and such vertical development is by nature open-ended: theoretically it can go on forever, comparable to space technology, and the end result could be exponential.