
Re: General Questions (Guest threads Allowed)

Posted: Sun 22 Jan 2012, 13:59
by Webmaster
uwequbit wrote:@Webmaster
Who has "translated" my text? Please repeat! ;)

Hello Uwe,

When needed, all messages in this topic will be translated manually
(with the original message included). Some people can read English but
have difficulty writing it. This makes it possible for everyone to
follow the topic.

Yes, it's repeatable ;)

Best regards, Webmaster

Re: General Questions (Guest threads Allowed)

Posted: Sun 22 Jan 2012, 16:04
by uwequbit
Hello Webmaster,

Please post the newly translated English text. I have posted the second set again.

Thank you (and Google Translator) ;)

regards

uwequbit

Re: General Questions (Guest threads Allowed)

Posted: Sun 22 Jan 2012, 19:12
by Siegmund
@SDSC
You're welcome.
@Webmaster
Thanks for translating.

Siegmund

Re: General Questions (Guest threads Allowed)

Posted: Wed 25 Jan 2012, 22:51
by SDSC
@ uwequbit,

uwequbit wrote:Thank you very much for publishing the documentation. Respect!!!

Thanks.

uwequbit wrote:The more files are compressed, the bigger the reference table will be. Regarding the reference table: is this a mathematical model, or does it grow iteratively, i.e. is it created during compression? (That would be logical, but it will never reach 4^255 patterns (4-byte patterns).)

The reference table for version 2 of DNA grows iteratively, but sequences (patterns) are only added once. Store data only once!
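
Roughly like this, as a minimal sketch in Python (my own simplified illustration, not the actual DNA code; the real sequence calculation is of course much more involved):

    # Simplified illustration: a reference table that grows iteratively,
    # storing each sequence (pattern) only once.
    def encode_blocks(blocks, table):
        refs = []
        for block in blocks:
            pattern = bytes(block)            # stand-in for the real sequence calculation
            if pattern not in table:
                table[pattern] = len(table)   # new sequence: stored exactly once
            refs.append(table[pattern])       # known sequence: only a short reference
        return refs

    table = {}
    print(encode_blocks([b"abc", b"def", b"abc"], table), len(table))
    # -> [0, 1, 0] 2 : the repeated block adds nothing to the table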

uwequbit wrote:but it will never reach 4^255 patterns (4-byte patterns).

Can you explain more?

uwequbit wrote:How big is one pattern in the reference table?

The size of a DNA sequence (pattern) in the reference table varies. It depends on the block size and the data content of the block. I prefer to use the term factor here, which is defined as [block size] / [sequence size]. For version 2 of DNA this factor has a minimum of 4.26, and tests showed values up to 4.78; these tests always started with an empty reference table and therefore represent the worst-case situation. The main goal at the moment is to improve this value: the higher, the better. The magical value for this factor is 5.68; if that value can be reached, DNA can do without a reference table.
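
To make the numbers concrete, a small example of how the factor is computed (made-up values, not real measurements):

    # Made-up example values, only to show how the factor is defined.
    block_size = 4096                     # size of a processed block in bytes (example value)
    sequence_size = 900                   # size of the stored sequence for that block (example value)
    factor = block_size / sequence_size   # factor = [block size] / [sequence size]
    print(round(factor, 2))               # 4.55, between the observed 4.26 and 4.78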

Best regards, SDSC.

Re: General Questions (Guest threads Allowed)

NotaPublicado: Jue 26 Ene 2012, 13:19
por Karel Jan
Interesting Topic.

This is one of the few topics I'll follow. Thanks for sharing (WP).

Karel Jan

Re: General Questions (Guest threads Allowed)

Posted: Sun 29 Jan 2012, 12:25
by David Hofman
Hmm, after reading your whitepaper, I must admit I'm a bit disappointed. Essentially, you seem to be storing chunks from the input files in the reference table (well, not literally, but that's what it boils down to fundamentally).

This means it only works for storing ('encoding') multiple files if they share common patterns.

If you encode 1000 .zip or .rar files of 1-2MB each, using DNA, you will NOT be able to reduce them all to 256 bytes and keep your reference table at or below 740 MB.
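
A quick back-of-the-envelope calculation to make that concrete (rough numbers, assuming ~1.5 MB per file on average):

    # Rough estimate for the scenario above.
    files = 1000
    avg_file = 1.5 * 1024**2                    # ~1.5 MB of already-compressed data per file
    input_total = files * avg_file              # ~1500 MB of (mostly incompressible) input
    output_total = files * 256 + 740 * 1024**2  # 256-byte keys plus the claimed table bound
    print(input_total / 1024**2, output_total / 1024**2)
    # -> 1500.0 vs ~740.2 MB: the output simply cannot hold all that information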

Furthermore, I'm pretty sure that if you compress a bunch of files using Rar or 7-Zip with solid archiving, the resulting archive will always be smaller than encoding it with DNA and taking the resulting reference file size into account. This means there is essentially NO advantage in required disk size to store certain data, or the effective compression ratio achieved by this method.

Maybe I miss the point, but I don't see any revolutionary benefits here? :(

Re: General Questions (Guest threads Allowed)

Posted: Mon 30 Jan 2012, 09:24
by Eberhard
Some quotes from your white paper
SDSC wrote:By creating unique (DNA) sequences from blocks of data, which are equal for every file type,
it would be possible to replace this block with a shorter reference.

I suppose "every file type" includes Zip, Rar, 7-Zip, etc.

SDSC wrote:Try to compress a compressed file again and you will see that it will not compress any more
because all the redundancy was removed in the first run. The algorithms have reached the end
of their capacity, with the result that the file grows again. (negative compression!)

Known limitations :(
SDSC wrote:DNA is not a replacement for current compression techniques but an addition to them, and they
can work very well together.

Very interesting :)


Some other quotes
SDSC wrote:For version 2 of DNA this factor has a minimum of 4.26, and tests showed values up to 4.78;
these tests always started with an empty reference table and therefore represent the worst-case situation.

and
SDSC wrote:DNA in this case needs more files/data to become effective (it has to learn).

Referring to diagram 1, you showed that this factor grows during the learning process.

When I put all the quotes together, it looks like you can shrink every compressed file by a factor
of >= 4.26 with an empty reference table. If that's true, then it is very impressive!
If you can handle the speed, your test version implemented in current compression tools
would give them a boost :roll:


Regards, Eberhard (Röth)

Re: General Questions (Guest threads Allowed)

Posted: Mon 30 Jan 2012, 09:35
by Eberhard
It was impossible to modify my earlier reply :(
SDSC wrote:Up to the fifth iteration the key code decreases strongly, but after that the effect of the
remaining iterations gets smaller (thirteen iterations for 750 bytes).

Is there a possibility to improve your encoding time?

Eberhard

Re: General Questions (Guest threads Allowed)

Posted: Mon 30 Jan 2012, 10:29
by David Hofman
Eberhard wrote:When I put all the quotes together, it looks like you can shrink every compressed file by a factor
of >= 4.26 with an empty reference table. If that's true, then it is very impressive!

It would be very impressive indeed, because this is mathematically impossible :)

You cannot store any possible combination of N bits in K bits, if K < N.
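
To spell the counting argument out with a tiny example:

    # Pigeonhole check: with K < N there are fewer K-bit codes than N-bit inputs,
    # so at least two different inputs must end up with the same code.
    N, K = 8, 7
    print(2**N, 2**K, 2**N > 2**K)   # 256 128 True -> no lossless one-to-one mapping exists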

@SDSC: could you please comment on my previous post above? I really hope I'm wrong about this :(

Re: General Questions (Guest threads Allowed)

Posted: Wed 01 Feb 2012, 22:00
by SDSC
@ Eberhard,

Eberhard wrote:I suppose "every file type" includes Zip, Rar, 7-Zip, etc.

Yes, DNA treats every file the same, just as a sequence of numbers!

Eberhard wrote:Some other quotes
SDSC wrote:
For version 2 of DNA this factor has a minimum of 4.26, and tests showed values up to 4.78;
these tests always started with an empty reference table and therefore represent the worst-case situation.
and
SDSC wrote:
DNA in this case needs more files/data to become effective (it has to learn).
Referring to diagram 1, you showed that this factor grows during the learning process.

Yes.

Eberhard wrote:When I put all the quotes together, it looks like you can shrink every compressed file by a factor
of >= 4.26 with an empty reference table. If that's true, then it is very impressive!

I think there is a little misunderstanding here; let me quote the relevant question and answer.

SDSC wrote:uwequbit wrote:
How big is one pattern in the reference table?
The size of a DNA sequence (pattern) in the reference table varies. It depends on the block size and the data content of the block. I prefer to use the term factor here, which is defined as [block size] / [sequence size]. For version 2 of DNA this factor has a minimum of 4.26, and tests showed values up to 4.78; these tests always started with an empty reference table and therefore represent the worst-case situation. The main goal at the moment is to improve this value: the higher, the better. The magical value for this factor is 5.68; if that value can be reached, DNA can do without a reference table.

The factor mentioned here is the factor between block size and sequence size, which means that for every block processed, the DNA algorithms will calculate a sequence (pattern) that is at least 4.26 times smaller than the size of the processed block. These tests were always started with an empty reference table; this way I could check the maximum amount of data a file would add (worst case). This does not mean that you can shrink every (compressed) file by a factor of >= 4.26 with an empty reference table!
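
A rough sketch with rounded numbers of what this means for a single file on an empty table (simplified, ignoring all overheads; the numbers are my own illustration only):

    # Simplified worst-case sketch: the factor applies per processed block,
    # not to the final size of file plus reference table.
    file_size = 1 * 1024**2             # one 1 MB file, empty reference table
    factor = 4.26                       # minimum observed factor
    table_growth = file_size / factor   # new sequence data the table must store (worst case)
    print(round(table_growth / 1024))   # ~240 KB added to the table, plus the short key
    # Only when later data reuses sequences already in the table does the total get smaller.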

Eberhard wrote:SDSC wrote:
Up to the fifth iteration the key code decreases strongly, but after that the effect of the
remaining iterations gets smaller (thirteen iterations for 750 bytes).
Is there a possibility to improve your encoding time?

I am afraid not; during encoding the last iterations are the fastest ones. The algorithms are very heavy and need optimisation. In the test version some brute-force programming was also used to speed up testing, and of course this is not the best and most optimal way. :oops:

Best regards, SDSC.