David Hofman wrote: Hmm, after reading your whitepaper, I must admit I'm a bit disappointed.
Sorry to hear that, but could it be that your own expectations were a little too high?
David Hofman wrote: Essentially, you seem to be storing chunks from the input files in the reference table (well, not literally, but that's what it boils down to fundamentally).
It is not as simple as you put it, but let's say it boils down to that fundamentally.

David Hofman wrote: This means it only works for storing ('encoding') multiple files if they share common patterns.
Isn't that exactly what compression tools do as well: finding common patterns (redundancy)?
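For illustration only (this is not the DNA code, just a minimal Python sketch using the standard zlib module), a general-purpose compressor gains exactly where such common patterns exist:

import os
import zlib

redundant = b"ABCD" * 64 * 1024        # 256 KiB of one repeating pattern
random_data = os.urandom(256 * 1024)   # 256 KiB with practically no redundancy

print(len(zlib.compress(redundant)))    # only a few hundred bytes
print(len(zlib.compress(random_data)))  # stays close to 256 KiB, often slightly larger

The repetitive buffer collapses to almost nothing, while the random buffer barely shrinks at all, which is the same point about redundancy made above.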
David Hofman wrote: If you encode 1000 .zip or .rar files of 1-2MB each, using DNA, you will NOT be able to reduce them all to 256 bytes and keep your reference table at or below 740 MB.
The white paper is about version 2 of DNA; I never mentioned a size of 740 MB for the reference table here. The table size you are referring to was mentioned in connection with version 3 and has nothing to do with version 2 or the white paper, which explains the basics of DNA.
David Hofman wrote: Furthermore, I'm pretty sure that if you compress a bunch of files using Rar or 7-Zip with solid archiving, the resulting archive will always be smaller than encoding it with DNA and taking the resulting reference file size into account. This means there is essentially NO advantage in required disk size to store certain data, or the effective compression ratio achieved by this method.
This conclusion is up to you.
David Hofman wrote: It would be very impressive indeed, because this is mathematically impossible. You cannot store every possible combination of N bits in K bits if K < N.
You are absolutely right here.
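To make the counting argument concrete, here is a tiny Python sketch (the sizes N = 4 and K = 3 are my own example values): with fewer codes than possible inputs, at least two different inputs must end up with the same code.

from itertools import product

N, K = 4, 3                                # any K < N shows the same effect
inputs = list(product((0, 1), repeat=N))   # 2**N = 16 possible N-bit inputs
codes  = list(product((0, 1), repeat=K))   # 2**K = 8 possible K-bit codes
print(len(inputs), ">", len(codes))        # 16 > 8, so collisions are unavoidable

That is the counting (pigeonhole) argument behind the statement quoted above.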
David Hofman wrote: I really hope I'm wrong about this
I just want to quote from the white paper:
The input file size is 544606 bytes, but it takes some iterations to reach the final requested size of the key code. These iterations make the amount of data flowing through the encoder a lot larger than the size of the input file. In this case the encoder had to process 2271608 bytes and its final output was 473839 bytes. In fact, the real data reduction percentage has to be calculated from these values and is (2271608 – 473839) / (2271608 / 100) = 79.14%, starting from an empty reference table.
This indicates that the algorithms have already found matching sequences and can rebuild the total stream of 2271608 bytes from these 8855 references, which have a total size of 473839 bytes!
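For reference, the quoted percentage follows directly from those two byte counts; a short Python check (variable names are mine) reproduces it:

processed = 2271608   # total bytes that flowed through the encoder
output    = 473839    # size of the final output in bytes
reduction = (processed - output) / processed * 100
print(round(reduction, 2))   # 79.14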
Best regards, SDSC.