From: Gldblade

I think your basic idea is that, given a set of data, you can represent it with a smaller set of data from which the original can be reconstructed. Then you hope to perform multiple passes, each pass needing less data to represent the previous one, in the end reaching an extremely small file size.
Based upon this synopsis, you will require an algorithm that satisfies two requirements: it can reduce file size indefinitely, and it is completely reversible.
I believe that you stated it was up to math to find such an algorithm. However, I will state that it is ignorance that suggests such an algorithm exists in the first place. My rudimentary understanding of information theory suggests that such a process cannot possibly exist, and if you believe otherwise, please prove it by locating such an algorithm. Then proceed to contact the many information theory scientists across the globe to tell them that their decades of research are wrong.
*EDIT*
Just to clarify: you can come up with an algorithm that can reduce file size indefinitely (approximately, at least; you can't get below 1 bit). You can come up with an algorithm that is reversible. But you can't come up with an algorithm that does both.
The reason is that reducing file size indefinitely intrinsically loses information. You can have lossless compression up to a certain, mathematically proven limit (roughly, the entropy of the data) using various tricks; beyond that point, lossy compression is required to decrease file size further. Since information is lost in the process, you cannot fully reconstruct the previous data. With each pass, more and more information would be lost, leaving less and less of the original data.
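As a rough illustration of why both requirements can't hold at once (this is just a counting sketch in Python that I'm putting together for illustration, not anyone's actual compressor): there are more bit strings of length n than there are strictly shorter bit strings, so any encoder that always shrinks its input must send two different inputs to the same output, and at that point it can't be reversed.

```python
# Pigeonhole sketch: compare the number of bit strings of exactly
# length n with the number of strictly shorter bit strings. An
# "always shrinks, always reversible" encoder would need a one-to-one
# map from the first set into the second, which is impossible
# because the first set is larger.
def count_strings(n):
    inputs = 2 ** n                          # bit strings of exactly length n
    shorter = sum(2 ** k for k in range(n))  # lengths 0 .. n-1, total 2**n - 1
    return inputs, shorter

for n in range(1, 9):
    inputs, shorter = count_strings(n)
    print(f"n={n}: {inputs} inputs, only {shorter} shorter outputs")
```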
For example, let's take an algorithm where you merely divide by 10 and round.
123456
12346
1235
124
12
1
Information is clearly lost with each iteration, so the more iterations you run, the less hope you have of reconstructing the original data.
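To make that concrete, here is a minimal sketch of that lossy scheme (assuming ordinary round-half-up behaviour; the function name is mine): the forward pass is trivial, but any would-be decoder has to guess which of roughly ten candidates produced each output.

```python
# Lossy "compressor": divide by 10 and round. Each pass throws away
# roughly one digit of information, and there is no way to know which
# of the ~10 possible predecessors produced a given output.
def shrink(n):
    return (n + 5) // 10   # divide by 10, rounding half up

value = 123456
while value > 1:
    value = shrink(value)
    print(value)           # 12346, 1235, 124, 12, 1

# Every integer from 5 to 14 shrinks to 1, so "1" alone cannot tell
# you what the previous value was, let alone the original.
print([n for n in range(5, 15) if shrink(n) == 1])
```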
You may claim that there is an algorithm out there somewhere that will somehow meet both requirements of indefinite file size reduction and full reversibility. Once again, I would state that this is due to a lack of understanding of the limits of mathematics.
An alternative approach to having a single algorithm is to have multiple formulas that will each individually reduce the size of the data, which is probably one of the ideas that you were trying to get across.
For example,
120, formula: divide by 5
24, formula: divide by 4
6, formula: divide by 3
2, formula: divide by 2
1
In which case, if you knew every single formula in the chain, you could theoretically reduce the original data to a very small file size. However, even this approach has some major issues. First, will these formulas be predetermined, or will they be generated on the fly?
If the formulas are predetermined, then the process can only represent a *very* limited subset of data. For example, the chain above cannot represent a starting value of 121. Therefore, to be practical, the formulas would need to be generated on the fly.
However, how then do you tell the decoder what to do with the end data? You would need to store the formula chain somehow, probably alongside the processed data that you will use to reconstruct the original data. But then have you really reduced the file size? Doesn't the storage of the formula chain count towards the file size? If you take this into account, I think you will realize that it imposes a significant limit on your compression efficiency, and may even exact a severe penalty in some cases. Using the example above, storage of the number 120 requires only 8 bits. Storage of the number 1, along with the formula chain, would require many more than that.
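A back-of-the-envelope version of that accounting (the encode/decode helpers and the one-byte-per-divisor format below are purely hypothetical, just to put numbers on the idea): the chain is fully reversible, but the "compressed" form of 120 ends up several times larger than the single byte it started as.

```python
# Toy accounting for the formula-chain idea: store the final value plus
# the list of divisors needed to undo the chain. The encoding assumed
# here (one byte per divisor, one byte for the residue) is made up.
def encode(n, divisors):
    chain = []
    for d in divisors:
        if n % d != 0:
            raise ValueError("chain does not divide evenly")
        n //= d
        chain.append(d)
    return n, chain

def decode(residue, chain):
    for d in reversed(chain):   # undo the divisions in reverse order
        residue *= d
    return residue

residue, chain = encode(120, [5, 4, 3, 2])
print(residue, chain)                    # 1 [5, 4, 3, 2]
print(decode(residue, chain))            # 120, fully reversible...

original_bits = 8                        # 120 fits in one byte
compressed_bits = 8 * (1 + len(chain))   # residue byte + one byte per divisor
print(original_bits, compressed_bits)    # ...but 8 bits became 40 bits
```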
I must applaud anyone who read through all of this. I can't believe I actually wrote this much considering it's 4:25 in the morning. I must apologize for any lack of coherence.