David Hofman wrote: Hmm, after reading your whitepaper, I must admit I'm a bit disappointed.
Sorry to hear that, but could it be that your own expectations were a little too high?
David Hofman wrote: Essentially, you seem to be storing chunks from the input files in the reference table (well, not literally, but that's what it boils down to fundamentally).
It is not as simple as you put it, but let's say it boils down to that fundamentally.

David Hofman wrote: This means it only works for storing ('encoding') multiple files if they share common patterns.
Isn't that exactly what compression tools do as well: finding common patterns (redundancy)?
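For illustration only (this is not the DNA code, just a minimal Python sketch using the standard zlib module), a general-purpose compressor gains exactly where such common patterns exist:

import os
import zlib

redundant = b"ABCD" * 64 * 1024        # 256 KiB of one repeating pattern
random_data = os.urandom(256 * 1024)   # 256 KiB with practically no redundancy

print(len(zlib.compress(redundant)))    # only a few hundred bytes
print(len(zlib.compress(random_data)))  # stays close to 256 KiB, often slightly larger

The repetitive buffer collapses to almost nothing, while the random buffer barely shrinks at all, which is the same point about redundancy made above.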
David Hofman wrote: If you encode 1000 .zip or .rar files of 1-2MB each, using DNA, you will NOT be able to reduce them all to 256 bytes and keep your reference table at or below 740 MB.
The white paper is about version 2 of DNA; I never mentioned a size of 740 MB for the reference table here. The table size you are referring to was mentioned in connection with version 3 and has nothing to do with version 2 or the white paper, which explains the basics of DNA.
David Hofman wrote: Furthermore, I'm pretty sure that if you compress a bunch of files using Rar or 7-Zip with solid archiving, the resulting archive will always be smaller than encoding it with DNA and taking the resulting reference file size into account. This means there is essentially NO advantage in required disk size to store certain data, or the effective compression ratio achieved by this method.
This conclusion is up to you.
David Hofman wrote: It would be very impressive indeed, because this is mathematically impossible. You cannot store every possible combination of N bits in K bits if K < N.
You are absolutely right here.
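To make the counting argument concrete, here is a tiny Python sketch (the sizes N = 4 and K = 3 are my own example values): with fewer codes than possible inputs, at least two different inputs must end up with the same code.

from itertools import product

N, K = 4, 3                                # any K < N shows the same effect
inputs = list(product((0, 1), repeat=N))   # 2**N = 16 possible N-bit inputs
codes  = list(product((0, 1), repeat=K))   # 2**K = 8 possible K-bit codes
print(len(inputs), ">", len(codes))        # 16 > 8, so collisions are unavoidable

That is the counting (pigeonhole) argument behind the statement quoted above.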
David Hofman wrote: I really hope I'm wrong about this
I just want to quote from the white paper:
The input file size is 544606 bytes, but it takes some iterations to reach the final requested size of the key code. These iterations make the amount of data flowing through the encoder a lot larger than the size of the input file. In this case the encoder had to process 2271608 bytes and its final output was 473839 bytes. In fact, the real data reduction percentage has to be calculated from these values and is (2271608 – 473839) / (2271608 / 100) = 79.14%, starting from an empty reference table.
This indicates that the algorithms have already found matching sequences and can rebuild the total stream of 2271608 bytes from these 8855 references, which have a total size of 473839 bytes!
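For reference, the quoted percentage follows directly from those two byte counts; a short Python check (variable names are mine) reproduces it:

processed = 2271608   # total bytes that flowed through the encoder
output    = 473839    # size of the final output in bytes
reduction = (processed - output) / processed * 100
print(round(reduction, 2))   # 79.14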
Best regards, SDSC.