
Re: General Questions (Guest threads Allowed)

Posted: Sun 22 Jan 2012, 13:59
by Webmaster
uwequbit wrote:@Webmaster
Who has "translated" my text? Please repeat! ;)

Hello Uwe,

When needed, all messages in this Topic will be translated manually
(with the original message included). Some people can read English but
have difficulty writing it. This makes it possible for everyone to follow
this Topic.

Yes, it's repeatable ;)

Best regards, Webmaster

Re: General Questions (Guest threads Allowed)

Posted: Sun 22 Jan 2012, 16:04
by uwequbit
Hello Webmaster,

Please translate the newly posted text into English as well. I have posted the second set again.

Thank you (and Google Translator) ;)

regards

uwequbit

Re: General Questions (Guest threads Allowed)

Posted: Sun 22 Jan 2012, 19:12
by Siegmund
@SDSC
You're welcome.
@Webmaster
Thanks for translating.

Siegmund

Re: General Questions (Guest threads Allowed)

Posted: Wed 25 Jan 2012, 22:51
by SDSC
@ uwequbit,

uwequbit wrote:Thank you very much for publishing the documentation. Respect!!!

Thanks.

uwequbit wrote:The more files are compressed, the bigger the reference table will be. Regarding the reference table, is it a mathematical model or does it grow iteratively, i.e. is it created during compression? (That would be logical, but it will never reach 4^255 patterns (4-byte patterns).)

The reference table for version 2 of DNA grows iteratively, but sequences (patterns) will only be added once. Store data only once!
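
A minimal sketch of what "added only once" means, in Python (this is just an illustration of the idea with made-up names, not the real DNA data structure):

Code:
reference_table = {}          # sequence -> index, grown iteratively
references = []               # one short reference per processed block

def add_sequence(sequence: bytes) -> int:
    """Return the index of a sequence, adding it to the table only if it is new."""
    if sequence not in reference_table:
        reference_table[sequence] = len(reference_table)
    return reference_table[sequence]

for seq in (b"ACGT", b"TTGA", b"ACGT"):   # the repeated sequence is stored once
    references.append(add_sequence(seq))

print(references)              # [0, 1, 0]
print(len(reference_table))    # 2 entries, duplicates are not added again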

uwequbit wrote:but it will never reach 4^255 patterns (4-byte patterns).

Can you explain more?

uwequbit wrote:How big is one pattern in the reference table?

The size of a DNA sequence (pattern) in the reference table varies. It depends on the block size and the data content of the block. I prefer to use the term factor here, which is defined as [block size] / [sequence size]. For version 2 of DNA this factor has a minimum of 4.26, and tests showed values up to 4.78; these tests always started with an empty reference table and therefore represent the worst-case situation. The main goal at this moment is to improve this value: the higher, the better. The magical value for this factor is 5.68; if that value can be reached, DNA can do without a reference table.
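
To put some numbers to that definition (the 4096-byte block size below is just an assumption for the example, not a real DNA figure):

Code:
# factor = [block size] / [sequence size]; a higher factor means a shorter sequence
def sequence_size(block_size: int, factor: float) -> float:
    return block_size / factor

block = 4096                      # hypothetical block size in bytes
for f in (4.26, 4.78, 5.68):      # minimum, best measured, "magical" value
    print(f"factor {f}: about {sequence_size(block, f):.0f} bytes of sequence per {block}-byte block")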

Best regards, SDSC.

Re: General Questions (Guest threads Allowed)

Posted: Thu 26 Jan 2012, 13:19
by Karel Jan
Interesting Topic.

This is one of the few Topics I'll follow. Thanks for sharing (WP).

Karel Jan

Re: General Questions (Guest threads Allowed)

Posted: Sun 29 Jan 2012, 12:25
by David Hofman
Hmm, after reading your whitepaper, I must admit I'm a bit disappointed. Essentially, you seem to be storing chunks from the input files in the reference table (well, not literally, but that's what it boils down to fundamentally).

This means it only works for storing ('encoding') multiple files if they share common patterns.

If you encode 1000 .zip or .rar files of 1-2MB each, using DNA, you will NOT be able to reduce them all to 256 bytes and keep your reference table at or below 740 MB.

Furthermore, I'm pretty sure that if you compress a bunch of files using Rar or 7-Zip with solid archiving, the resulting archive will always be smaller than encoding them with DNA once the size of the resulting reference file is taken into account. This means there is essentially NO advantage in the disk space required to store the data, or in the effective compression ratio achieved by this method.

Maybe I'm missing the point, but I don't see any revolutionary benefit here? :(

Re: General Questions (Guest threads Allowed)

Posted: Mon 30 Jan 2012, 09:24
by Eberhard
Some quotes from your white paper:
SDSC wrote:By creating unique (DNA) sequences from blocks of data which are equal for every file type,
it would be possible to replace this block by a shorter reference.

I suppose 'every file type' includes Zip, Rar, 7-Zip, etc.

SDSC wrote:Try to compress a compressed file again and you will see that it will not compress any more
because all the redundancy was removed in the first run. The algorithms have reached the end
of their capacity, and as a result the file grows again. (negative compression!)

Known limitations :(
SDSC wrote:DNA is not a replacement for current compression techniques but an addition to them, and they
can work very well together.
Very interesting :)


Some other quotes:
SDSC wrote:For version 2 of DNA this factor has a minimum of 4.26, and tests showed values up to 4.78;
these tests always started with an empty reference table and therefore represent the worst-case situation.

and
SDSC wrote:DNA in this case needs more files/data to become effective (it has to learn).

Referring to diagram 1, you showed that this factor grows during the learning process.

When I put all the quotes together, it looks like you can shrink every compressed file by a factor
of >= 4.26 with an empty reference table. If that's true, then it is very impressive!
If you can get the speed under control, your test version implemented in current compression tools
would give them a real boost :roll:


Regards, Eberhard (Röth)

Re: General Questions (Guest threads Allowed)

Posted: Mon 30 Jan 2012, 09:35
by Eberhard
It was impossible to modify my earlier reply :(
SDSC wrote:Up to the fifth iteration the key code decreases strongly, but after that the effect of the
remaining iterations gets smaller (thirteen iterations for 750 bytes).

Is there a possibility to improve your encoding time?

Eberhard

Re: General Questions (Guest threads Allowed)

Posted: Mon 30 Jan 2012, 10:29
by David Hofman
Eberhard wrote:When I put all the quotes together, it looks like you can shrink every compressed file by a factor
of >= 4.26 with an empty reference table. If that's true, then it is very impressive!

It would be very impressive indeed, because this is mathematically impossible :)

You cannot store any possible combination of N bits in K bits, if K < N.
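
A quick way to see the counting argument (the bit-truncating 'encoder' below is of course just a stand-in; any fixed encoder runs into the same problem):

Code:
from collections import defaultdict

N, K = 12, 8                        # 4096 possible inputs, only 256 possible codes

def toy_encoder(x: int) -> int:     # stand-in encoder: keep only the low K bits
    return x & ((1 << K) - 1)

codes = defaultdict(list)
for x in range(1 << N):             # try every possible N-bit input
    codes[toy_encoder(x)].append(x)

collisions = sum(1 for inputs in codes.values() if len(inputs) > 1)
print(f"{1 << N} inputs, {1 << K} codes, codes hit more than once: {collisions}")
# With 2**N inputs and only 2**K codes, some inputs MUST share a code,
# so the mapping cannot be reversed without storing extra information somewhere.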

@SDSC: could you please comment on my previous post above? I really hope I'm wrong about this :(

Re: General Questions (Guest threads Allowed)

Posted: Wed 01 Feb 2012, 22:00
by SDSC
@ Eberhard,

Eberhard wrote:I suppose 'every file type' includes Zip, Rar, 7-Zip, etc.

Yes, DNA treats every file the same: just a sequence of numbers!

Eberhard wrote:Some other quotes:
SDSC wrote:
For version 2 of DNA this factor has a minimum of 4.26, and tests showed values up to 4.78;
these tests always started with an empty reference table and therefore represent the worst-case situation.
and
SDSC wrote:
DNA in this case needs more files/data to become effective (it has to learn).
Referring to diagram 1, you showed that this factor grows during the learning process.

Yes.

Eberhard wrote:When I put all the quotes together, it looks like you can shrink every compressed file by a factor
of >= 4.26 with an empty reference table. If that's true, then it is very impressive!

I think there is a little misunderstanding here; let me quote the relevant question and answer.

SDSC wrote:uwequbit wrote:
How big is one pattern in the reference table?
The size of a DNA sequence (pattern) in the reference table varies. It depends on the block size and the data content of the block. I prefer to use the term factor here, which is defined as [block size] / [sequence size]. For version 2 of DNA this factor has a minimum of 4.26, and tests showed values up to 4.78; these tests always started with an empty reference table and therefore represent the worst-case situation. The main goal at this moment is to improve this value: the higher, the better. The magical value for this factor is 5.68; if that value can be reached, DNA can do without a reference table.

The factor mentioned here is the ratio between block size and sequence size, which means that for every block processed, the DNA algorithms will calculate a sequence (pattern) that is at least 4.26 times smaller than the processed block. These tests were always started with an empty reference table; this way I could check the maximum amount of data a file would add to the reference table (worst case). This does not mean that you can shrink every (compressed) file by a factor of >= 4.26 with an empty reference table!
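
To make the difference concrete, a small back-of-the-envelope sketch (the block size, the 8-byte reference and the repeated blocks are assumptions for the example, not real DNA figures):

Code:
BLOCK_SIZE = 4096                 # hypothetical block size in bytes
FACTOR = 4.26                     # the worst-case factor mentioned above
REFERENCE_SIZE = 8                # assumed size of one block reference in the key code

sequence_size = BLOCK_SIZE / FACTOR          # about 962 bytes per new sequence
reference_table = 0.0                        # grows only for sequences not seen before
key_code = 0

blocks = ["A", "B", "A", "C", "A"]           # block "A" repeats and is stored only once
seen = set()
for block in blocks:
    key_code += REFERENCE_SIZE
    if block not in seen:
        seen.add(block)
        reference_table += sequence_size

print(f"key code: {key_code} bytes, reference table: {reference_table:.0f} bytes")
# The key code is tiny, but every new sequence still ends up in the reference table,
# so arbitrary (compressed) data is not simply shrunk by a factor of 4.26.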

Eberhard wrote:SDSC wrote:
Up to the fifth iteration the key code decreases strongly, but after that the effect of the
remaining iterations gets smaller (thirteen iterations for 750 bytes).
Is there a possibility to improve your encoding time?

I am afraid not; during encoding the last iterations are the fastest ones. The algorithms are very heavy and this needs optimisation. In the test version some brute-force programming was also used to speed up testing, and of course this is not the optimal way. :oops:

Best regards, SDSC.