A man claims to store multiple movies on 64kb

door **Johan1951** » do 14 okt 2010, 10:18

From: Virus

Finding that 1 out of 129,934,811,447,123,020,117,172,145,698,449 sequences can be represented by 3 bytes is not compression, at all. Please, let's not confuse "synthesis" with "compression", because they are different beasts.

"Compression" simply means "taking arbitrary data of any length and reduce its size". The amount of reduction depends on the "features" of the source, of course, and it can be zero.

So, you find that this particular number (let's call it N for the sake of simplicity) can be represented by just 3 bytes calculating a power (7^38). How do you plan to code N-1, then? Because you also need it. You also need to code at least the numbers from 0 to N if you want to have a true compression algorithm (better would be "0 to the smallest power of 2 >= N", and even better "0 to infinity").

Simple. You code it as (7^38)-1. That means that you're applying a different operation (power+subtraction), no? Then, to be able to reconstruct the data you also need to code what type of operation you need to perform. Saying that

7, 255, 38, 255, 1

can reproduce the number is plain wrong. You also need to code the "power+subtraction" thing (the "formula type"), and it takes more bits.

And how do you plan to code N - 1600, then?
Simple: (7^38) - (40^2)!

7, 255, 38, 255, 40, 255, 2, of course, plus the overhead to describe what to do with your data (power, power, then subtraction)

And what about N - 1601 then? ... (and so on)

Eventually you'll end up with a formula for every integer in [0, N], but as you can see you'll need several types of formulas. Eventually much more complex than calculating a couple of powers. So you'll have the overhead for the "type" and all the operands. Do you really think that the average code length (computed on all the codewords, not just 1) will be so low?

(BTW 0x38 is 56 in decimal... that's why I've switched to plain decimal my numbers)

door **Johan1951** » do 14 okt 2010, 12:18

From: FccHandler

Virus wrote:
Finding that 1 out of 129,934,811,447,123,020,117,172,145,698,449 sequences can be represented by 3 bytes is not compression, at all. Please, let's not confuse "synthesis" with "compression", because they are different beasts.

Fine, so let's forget compression and talk about synthesis.

So, you find that this particular number (let's call it N for the sake of simplicity) can be represented by just 3 bytes calculating a power (7^38). How do you plan to code N-1, then? Because you also need it. You also need to code at least the numbers from 0 to N if you want to have a true compression algorithm (better would be "0 to the smallest power of 2 >= N", and even better "0 to infinity").

Simple. You code it as (7^38)-1. That means that you're applying a different operation (power+subtraction), no? Then, to be able to reconstruct the data you also need to code what type of operation you need to perform. Saying that

7, 255, 38, 255, 1

can reproduce the number is plain wrong. You also need to code the "power+subtraction" thing (the "formula type"), and it takes more bits.

Errm, that code means 7^38^1. We need to define a new token to represent subtraction, let's choose 254. So then we can code it as

7, 255, 38, 254, 1

And yes it will take more bits to code it that way, but let's suppose that the compressor is powerful enough to find a simpler equation for 7^38-N? I don't know that one exists, but if it does then it will be the compressor's job to reduce it to the simplest form.

And how do you plane to code N - 1600, then?
Simple: (7^38) - (40^2)!

7, 255, 38, 255, 40, 255, 2, of course, plus the overhead to describe what to do with your data (power, power, then subtraction)

No, we code it as 7, 255, 38, 254, 40, 255, 2. And to save some bits we can set a fixed rule for the codec that exponentiation is always performed before subtraction (just like programming languages assume). If we must specify operational order, we can define some tokens to represent parentheses, say 253 and 252. (We still have 152 unused symbols.)

Eventually you'll end up with a formula for every integer in [0, N], but as you can see you'll need several types of formulas. Eventually much more complex than calculating a couple of powers. So you'll have the overhead for the "type" and all the operands.

How many operators do we really need?

+ - * / ^ ( )

Wouldn't that just about cover every "simple" equation? Also, I think only the "integer" type should be used. We don't want to get into the hassle of floating-point imprecision.

(BTW 0x38 is 56 in decimal... that's why I've switched to plain decimal my numbers)

oops

P.S. Just to clarify my stance, I don't honestly believe that this idea is at all feasible in the real world, but I can sort of visualize the approach, and I think it has merit. Kind of like how I can imagine travelling to other stars, even though we may never be able to do it...

door **Johan1951** » do 14 okt 2010, 12:19

From: FccHandler

Virus wrote:
And how do you plan to code N - 1600, then?
And what about N - 1601 then? ... (and so on)

Thinking about this further, (7^38)-(40^2) would be better coded as follows:

7 255 38 254 16 0

(where the BCD "16, 0" = 16 * 100 + 0)

Then (7^38)-1601 would be:

7 255 38 254 16 1

door **Johan1951** » do 14 okt 2010, 18:23

From: Virus

FccHandler wrote:
And yes it will take more bits to code it that way, but let's suppose that the compressor is powerful enough to find a simpler equation for 7^38-1?

You look a bit too much optimist to me

Simpler than 7^38-1? Using integer numbers only? :shock:

And doing that for all the numbers between 0 and 7^38? nah, you don't need to be a mathematician to see that it's impossible.
Keep in mind, for example, that all the numbers in the form x^y, with x and y in [1, 255] are just a bit more than 65000. Compare this number with the N you wrote before and see how lucky you need to be to find a number which is close to a certain power of an integer number.
If you don't believe me, just draw a random number between 1000 and 9999 and try to do the calculations manually. The formulas will be quite a bit complex, and they'll often need many operators and large operands. And no matter how efficient your bitstream format is going to be, you are going to use a lot of bits...

Look at prime numbers for example: you cannot factorize them so powers and multiplications cannot bring you the result you want (you need to do "something more" to reach it). So you'll probably need to combine several operators here, and probably large operands, too. (and not just three small numbers like 7, 38, 1

)

Just to clarify my stance, I don't honestly believe that this idea is at all feasible in the real world, but I can sort of visualize the approach, and I think it has merit.

I can visualize the approach, too. But I don't think it will really compress anything, much in the same way you cannot reproduce efficiently whatever video clip you want using Flash

EDIT: oh, I see you improved your bitstream format :mrgreen:

what about N - 14897131 then?
(note that this is still just a small "perturbation" of 7^38, given how large 7^38 is)

EDIT2: stupid grammar error corrected

door **Johan1951** » do 14 okt 2010, 18:25

From: FccHandler

Virus wrote:
I can visualize the approach, too. But I don't think it will really compress anything

Well, just like other compression ideas it takes advantage of weaknesses in the data. The "weakness" of my chosen number is the fact that it just happens to be equal to 7^38. But other numbers (including the perturbation you posted) may have hidden weaknesses we don't know about. In fact, I seem to remember a theory that all primes > 3 are of the form 6N +/- 1, so that may be one "weakness" that the compressor (oops, I mean synthesizer) could exploit.

Also, my bitstream format isn't the main obstacle here (I admit it can stand a lot of improvement)! As I see it, the problem is that even with today's technology it would take an impossibly powerful computer to break apart any giant random number and express it as the solution of a simplistic equation. Heck, 7^38 is only 14 bytes. How to synthesize 4GB of a DVD movie? :shock:

But who knows... Maybe the Star Wars DVD just happens to be the solution to the equation (742987 ^ 80175908972457823712) - 1. :mrgreen:

edit; drat, another math error

door **Johan1951** » do 14 okt 2010, 18:27

From: Virus

FccHandler wrote:
As I see it, the problem is that even with today's technology it would take an impossibly powerful computer to break apart any giant random number and express it as the solution of a simplistic equation. Heck, 7^38 is only 14 bytes. How to synthesize 4GB of a DVD movie?

yep, there must be some reason if I limited my analysis to the "pure efficiency" viewpoint

How can we classify such an algorithm?

Really-NP-hard? :mrgreen:

door **Johan1951** » do 14 okt 2010, 18:28

From: FccHandler

Thinking about it further, virus is right. You only have to experiment a little bit to realize that statistically, the chances of being able to represent any large arbitrary number as the solution of a simple equation, with the representation of the equation being smaller than the representation of the number, are hopelessly small.

@callmeace: Confound it, you almost had me convinced!

It was fun to think about, though. I'm going away now to work on my perpetual motion machine.

door **Johan1951** » do 14 okt 2010, 18:29

From: Joe Fenton

Well, I can record 256 movies in just 256 bytes. Just tell me which 256 movies, then I'll send you 180G drive with the player program on it (the ONLY thing on it), and a floppy disk with the 256 byte movie file. :mrgreen:

door **Johan1951** » do 14 okt 2010, 18:32

From: Callmeace

Okay, very interesting reading your posts

I had kept thinking of how to respond and put across what I wanted, so here goes...

Many of you seem stuck on a problem of how to allow for 'all possible combinations' and of random numbers and so forth. These are presented as disapproving the concept I present, but in my concept, These things are not an issue which will need to be dealt with.

Let me explain I hope

I think this will answer Manaos post.

Data compression is a challenge. It's hard to compress data but it's not so hard to represent a long number by much less characters which can be put back into a formula, so we first represent the data which we want to compress as a number.

Do you recall in earlier posts here that I said about unique character string? And also that a number 'is only that number'?

Do you see that immediately we have a case in maths where (so long as you set the rules) equations will only give one result.

In stages, What the concept does is first present, via code-book and mapping, the data which you want to compress as a series of numbers, unique to that data for being derived from it. Note that the mapping and code-book use also allows essential 'reversing'. Then in following stages the number is broken down into equations which would use powers and algebra for example, until one is left with a character string.

When you want your original file back, the character string when fed back into the formula generates back up to your original data number, This is because it 'adds up' to the number mathematically, which at the end is reversely written (possible because of the code-book and mapping) back into your file.

Now, how could one get a number from some data? Hmm..... Well, here is one way just as a concept suggestion. On any computer.

Firstly, understand that functions can be performed on the hosting computer, so you want to get a number unique to any file? will you allow that the binary information of the file must exist with any PC? We could use that! It will be a unique number unique only to that file and derived from it.

So in stage one The formula maps this binary information of the input file as continuous code being a long number consisting only of the digits 1 and 0. Let me explain what I mean by that. We literally want to use this binary code as a long numbers without spaces, but there are naturally going to be limits hit on how big a number could be. imagine a string consisting of 0s and 1s in a text document being as an actual number (like 1001 is 'one thousand and one') but being much longer. So rather than having the binary data of the input file as a huge unmanageble number in the form of an incredibly long sequence of binary, this would be cut in sections which are mapped to blocks, as a number at the biggest size which can be handled.

So regarding the above, as the numbers only contain 1s and 0s, even better! How much easier to handle and compress

We know that they all must be multiples of ten plus with or without an added one!, and the numbers beginning with 0s must have their S.F. (significant figures count) accounted for too.

So, we have lots of mapped blocks each consisting of a big number whose digits are either 1s or 0s.
In case I have lost you, Remember that these blocks are just cut sections of the original huge continuous number which is the binary code of the input file. We want to break this number down through a series of formulas and various Mathematical and other compression techniques like mapping, to a string of characters. The string of characters when put back into the formula will then generate that number (which will then be put back to your original data)

For the sake of argument lets say that a thousand thousands are a million, a thousand millions are a billion, a thousand billions are a trillion, and a thousand trillions are a zillion. This is much much smaller than it would really be but lets say each number (block) doesn't exceed the zillions.
I will insert commas here just for clarity, Imagine then that the first block has the number
Block01: 10,010,001,010,011,100
the second has
Block02: 110,101,001,110,101,001

and so on

in this stage it would maybe be that every block is given a multiple or power of ten number and flagged as having plus one or not and a SF number - this is just a suggestion

Remember that all this work is taking place running through the process, to be under 64KB its only the formula and the character string total we are concerned about.

In the following stages I'll leave to your imagination the advanced calculations from years of mathematical thinking which could be employed. But you would end up with a character string.

In all this quite briefly stated above I think I have shown the concept better than before. Can you see that there is no need to worry about infinite combinations of numbers and the like, there is no concerns about hashes not being reversible or anything like that. Those issues would not be encountered. The maths itself is simple but the mathematical techniques and knowledge would assuredly be complex and advanced to generate the character string.

Dificult and a painstaking task requiring dedication, yes. But it's not unrealistic, impossible or anything like that.

:edited add some clarity I hope

door **Johan1951** » do 14 okt 2010, 20:35

From: StephanV

i could understand fcc, with callmeace im lost...

Do you see that immediately we have a case in maths where (so long as you set the rules) equations will only give one result

rules create exceptions and you cant have that with compression. its not acceptable to not being able to compress a random set of data.

A man claims to store multiple movies on 64kb

Re: A man claims to store multiple movies on 64kb

Re: A man claims to store multiple movies on 64kb

Re: A man claims to store multiple movies on 64kb

Re: A man claims to store multiple movies on 64kb

Re: A man claims to store multiple movies on 64kb

Re: A man claims to store multiple movies on 64kb

Re: A man claims to store multiple movies on 64kb

Re: A man claims to store multiple movies on 64kb

Re: A man claims to store multiple movies on 64kb

Re: A man claims to store multiple movies on 64kb