WUGNET, the Windows User Group Network

Is there a hard drive file organizer that will ...

 
thanatoid

Since: Aug 21, 2006
Posts: 592



(Msg. 25) Posted: Mon Oct 27, 2008 3:09 am
Post subject: Re: Is there a hard drive file organizer that will ...
Archived from groups: microsoft>public>win98>gen_discussion

"FromTheRafters" <erratic.DeleteThis@nomail.afraid.org> wrote in
news:#mS$aA8NJHA.3876@TK2MSFTNGP04.phx.gbl:

>> Your understanding of networks is not nearly as impeccable as
>> your logic of finding duplicates on one drive on which I
>> commented in my previous reply.
>
> Okay, so as this thread reaches its EOL, it may interest
> someone that all might not be as it seems.
>
> I'm not sure about modern disk operating systems, but
> some older ones would not actually make a copy when
> asked to do so. Rather, they would make another full
> path to the same data on disk (why waste space with
> redundant data). Copying to another disk, or partition
> on the same disk, would actually necessitate a copy
> and would take longer as a result. When access was
> made to the file, and it was modified, then the path used
> to access that file would point to a newly created file
> while the *original* would still be accessed from the
> other paths.
>
> So, deleting duplicate files on a single drive in this case
> would only clean up the file system without freeing up
> any hard drive space.

OR deleting duplicates, it would seem (don't want to read it
again, see below).

Thanks for the headache. What a nightmare.
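FromTheRafters' description is close to what a modern file system does with hard links, so one way to see the effect on a duplicate finder is simply to make one. The sketch below is an illustration in Python, not anything from the thread: it assumes the temp directory sits on a file system that supports hard links (NTFS or ext4 do, FAT32 does not), and hard_link_demo is a hypothetical name. One difference from the quoted scheme: writing through either name of a hard link changes the data seen by both, whereas copy-on-write clones (Btrfs reflinks, APFS clones) behave more like the description above.

```python
import os
import tempfile

def hard_link_demo():
    """Create a file plus a hard link to it and report how the OS sees them.

    Returns (same_file, link_count_before_delete, link_count_after_delete).
    Assumes the temp directory supports hard links (e.g. NTFS, ext4; not FAT32).
    """
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")
        second_name = os.path.join(d, "second_name.txt")
        with open(original, "w") as f:
            f.write("same bytes, stored once")
        os.link(original, second_name)      # new directory entry; no data is copied
        same = os.path.samefile(original, second_name)  # same device and inode
        before = os.stat(original).st_nlink              # two names, one file
        os.remove(second_name)              # removes a name, frees no data blocks
        after = os.stat(original).st_nlink
        return same, before, after

print(hard_link_demo())  # (True, 2, 1)
```

A duplicate finder could call os.path.samefile (or compare st_dev and st_ino) before listing a pair as reclaimable space, since deleting one name of a hard-linked pair frees nothing.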


--
Those who cast the votes decide nothing. Those who count the
votes decide everything.
- Josef Stalin

NB: Not only is my KF over 4 KB and growing, I am also filtering
everything from discussions.microsoft and google groups, so no
offense if you don't get a reply/comment unless I see you quoted
in another post.

Bill in Co.

Since: Apr 24, 2005
Posts: 989



(Msg. 26) Posted: Mon Oct 27, 2008 3:09 am
Post subject: Re: Is there a hard drive file organizer that will ...
Archived from groups: per prev. post

FromTheRafters wrote:
>> AFAICS, a fundamental flaw in duplicate finder software is that it
>> relies on direct binary comparisons. With programs like FindDup, if we
>> have 3 files of equal size, then we would need to compare file1 with
>> file2, file1 with file3, and file2 with file3. This requires 6 reads.
>> For n equally sized files, the number of reads is n(n-1).
>>
>> Alternatively, if we relied on MD5 checksums, then each file would
>> only need to be read once.
>
> So...once it is found to be the same checksum, what should the
> program do next? How important are these files?

> A fundamental
> flaw would be to trust MD5 checksums as an indication that the
> files are indeed duplicates.

Since when? What is the statistical likelihood of that being true?

> You can mostly trust MD5 checksums
> to indicate two files are different, but the other way around?

Franc Zabkar

Since: Sep 03, 2005
Posts: 1504



(Msg. 27) Posted: Mon Oct 27, 2008 3:09 am
Post subject: Re: Is there a hard drive file organizer that will ...
Archived from groups: per prev. post

On Sun, 26 Oct 2008 16:23:16 -0400, "FromTheRafters"
<erratic RemoveThis @nomail.afraid.org> put finger to keyboard and composed:

>> AFAICS, a fundamental flaw in duplicate finder software is that it
>> relies on direct binary comparisons. With programs like FindDup, if we
>> have 3 files of equal size, then we would need to compare file1 with
>> file2, file1 with file3, and file2 with file3. This requires 6 reads.
>> For n equally sized files, the number of reads is n(n-1).
>>
>> Alternatively, if we relied on MD5 checksums, then each file would
>> only need to be read once.
>
>So...once it is found to be the same checksum, what should the
>program do next? How important are these files? A fundamental
>flaw would be to trust MD5 checksums as an indication that the
>files are indeed duplicates. You can mostly trust MD5 checksums
>to indicate two files are different, but the other way around?

OK, I retract my ill-informed comment, but it still seems to me that
the benefits far outweigh the risks. FindDup has been running for the
past 18 hours or so as I write this, so I'm happy to accept a 30
minute alternative. In any case, all programs appear to require that
the user decides whether or not a file can be safely deleted. To this
end the programmer could allow for a binary comparison in those cases
where there is any doubt.
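The read counts in the quoted FindDup paragraph can be checked by enumerating the pairs. This is a hypothetical counting sketch that assumes every pairwise comparison reads both files in full, with no caching and no early exit at the first differing byte (the worst case, i.e. a group of identical files); it says nothing about how FindDup itself is implemented.

```python
from collections import Counter
from itertools import combinations

def pairwise_reads(n):
    """Total file reads when every pair of n same-size files is compared directly.

    Assumes each comparison reads both files once, with no caching.
    """
    reads = Counter()
    for a, b in combinations(range(n), 2):
        reads[a] += 1  # each file is read once per partner,
        reads[b] += 1  # so it ends up being read n - 1 times
    return sum(reads.values())

def hash_reads(n):
    """Total file reads when each file is hashed exactly once."""
    return n

for n in (3, 10, 100):
    print(n, pairwise_reads(n), hash_reads(n))
# 3 6 3
# 10 90 10
# 100 9900 100
```

The totals match n(n-1) for pairwise comparison against n for hashing, so the gap grows quickly as groups of same-size files get larger.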

- Franc Zabkar
--
Please remove one 'i' from my address when replying by email.

98 Guy

Since: Mar 12, 2005
Posts: 666



(Msg. 28) Posted: Mon Oct 27, 2008 3:09 am
Post subject: Re: Is there a hard drive file organizer that will ...
Archived from groups: per prev. post

"Bill in Co." wrote:

> > A fundamental flaw would be to trust MD5 checksums as an
> > indication that the files are indeed duplicates.
>
> Since when? What is the statistical likelihood of that being
> true?

If there was no malicious intent or source involved, I'd say the odds
are pretty low. But even if you had 2 identical hashes, it's simple
enough to just see if the files are the same length, and if they were,
then you do a byte-by-byte comparison.
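98 Guy's suggestion (check the length, then compare byte by byte) combines naturally with hashing: group files by size, hash only the files that share a size, then confirm any matching digests with an exact comparison. A minimal sketch, assuming Python's standard hashlib and filecmp modules; find_duplicates and md5_of are hypothetical names. The MD5 step only narrows the candidates, so a collision would cost an extra comparison rather than a wrong answer.

```python
import filecmp
import hashlib
import os
from collections import defaultdict

def md5_of(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks to keep memory use flat."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(paths):
    """Return groups of paths whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[md5_of(p)].append(p)
        for candidates in by_hash.values():
            # Equal MD5 digests are a strong hint, not proof: confirm byte by byte.
            while len(candidates) > 1:
                first, rest = candidates[0], candidates[1:]
                same = [p for p in rest if filecmp.cmp(first, p, shallow=False)]
                if same:
                    groups.append([first] + same)
                candidates = [p for p in rest if p not in same]
    return groups
```

The exact check is limited to files whose size and digest already agree, so in the usual case it runs only on real duplicates.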

FromTheRafters

Since: Sep 28, 2008
Posts: 58



(Msg. 29) Posted: Mon Oct 27, 2008 3:09 am
Post subject: Re: Is there a hard drive file organizer that will ...
Archived from groups: per prev. post

"Franc Zabkar" <fzabkar.RemoveThis@iinternode.on.net> wrote in message
news:rpn9g4h6d10d3kv20ud3j02e2phuq66ucg@4ax.com...
> On Sun, 26 Oct 2008 16:23:16 -0400, "FromTheRafters"
> <erratic.RemoveThis@nomail.afraid.org> put finger to keyboard and composed:
>
>>> AFAICS, a fundamental flaw in duplicate finder software is that it
>>> relies on direct binary comparisons. With programs like FindDup, if we
>>> have 3 files of equal size, then we would need to compare file1 with
>>> file2, file1 with file3, and file2 with file3. This requires 6 reads.
>>> For n equally sized files, the number of reads is n(n-1).
>>>
>>> Alternatively, if we relied on MD5 checksums, then each file would
>>> only need to be read once.
>>
>>So...once it is found to be the same checksum, what should the
>>program do next? How important are these files? A fundamental
>>flaw would be to trust MD5 checksums as an indication that the
>>files are indeed duplicates. You can mostly trust MD5 checksums
>>to indicate two files are different, but the other way around?
>
> OK, I retract my ill-informed comment, but it still seems to me that
> the benefits far outweigh the risks. FindDup has been running for the
> past 18 hours or so as I write this, so I'm happy to accept a 30
> minute alternative. In any case, all programs appear to require that
> the user decides whether or not a file can be safely deleted. To this
> end the programmer could allow for a binary comparison in those cases
> where there is any doubt.

It all depends on the risk you are willing to assume. It would be nice
to have a hybrid option where you could switch between the MD5
mode and the byte-by-byte mode depending on such factors as the type
or location of files etc.
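The hybrid option described here can be expressed as a small policy function. In this sketch the extension and folder lists are hypothetical examples of "valuable" files, not recommendations, and the function names are made up for illustration: files matching the policy get an exact comparison, everything else is accepted on an MD5 match.

```python
import filecmp
import hashlib
from pathlib import Path

# Hypothetical policy: files considered valuable enough to always verify exactly.
VERIFY_EXTENSIONS = {".doc", ".xls", ".mdb", ".pst"}
VERIFY_DIRS = ("C:/My Documents",)

def needs_exact_check(path, verify_dirs=VERIFY_DIRS):
    """Return True if the policy says this file must be compared byte by byte."""
    p = Path(path).resolve()
    if p.suffix.lower() in VERIFY_EXTENSIONS:
        return True
    return any(Path(d).resolve() in p.parents for d in verify_dirs)

def md5_digest(path):
    """Raw MD5 digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.digest()

def is_duplicate(a, b):
    """Decide whether files a and b are duplicates, picking the mode per policy."""
    if md5_digest(a) != md5_digest(b):
        return False  # different digests always mean different contents
    if needs_exact_check(a) or needs_exact_check(b):
        return filecmp.cmp(a, b, shallow=False)  # byte-by-byte mode
    return True  # MD5 mode: accept the small chance of a collision
```

Because a digest mismatch is conclusive, only files that hash equal and fall under the policy pay for the second, exact read.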

FromTheRafters

Since: Sep 28, 2008
Posts: 58



(Msg. 30) Posted: Mon Oct 27, 2008 3:09 am
Post subject: Re: Is there a hard drive file organizer that will ...
Archived from groups: per prev. post

"Bill in Co." <not_really_here DeleteThis @earthlink.net> wrote in message
news:%23RrPjC7NJHA.1908@TK2MSFTNGP04.phx.gbl...
> FromTheRafters wrote:
>>> AFAICS, a fundamental flaw in duplicate finder software is that it
>>> relies on direct binary comparisons. With programs like FindDup, if we
>>> have 3 files of equal size, then we would need to compare file1 with
>>> file2, file1 with file3, and file2 with file3. This requires 6 reads.
>>> For n equally sized files, the number of reads is n(n-1).
>>>
>>> Alternatively, if we relied on MD5 checksums, then each file would
>>> only need to be read once.
>>
>> So...once it is found to be the same checksum, what should the
>> program do next? How important are these files?
>
>> A fundamental
>> flaw would be to trust MD5 checksums as an indication that the
>> files are indeed duplicates.
>
> Since when?

Forever.

Checksums are often smaller than the file they are derived from
(that's kinda the point, eh?).

> What is the statistical likelihood of that being true?

Greater than zero.
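The argument here is the pigeonhole principle: an MD5 digest is always 128 bits, so once files are longer than 16 bytes there are more possible files than digests, and some different files must share one. A full 128-bit collision is far too rare to stumble on, so this illustrative sketch (find_truncated_collision is a hypothetical name) keeps only the first 2 bytes of MD5; among 65,537 distinct inputs a repeat is guaranteed, and one usually shows up after a few hundred.

```python
import hashlib
from itertools import count

def find_truncated_collision(digest_bytes=2):
    """Return two different inputs whose MD5 digests agree in the first bytes.

    Pigeonhole guarantees a hit within 256**digest_bytes + 1 distinct inputs.
    """
    seen = {}
    for i in count():
        data = f"file-{i}".encode()
        tag = hashlib.md5(data).digest()[:digest_bytes]
        if tag in seen:
            return seen[tag], data
        seen[tag] = data

a, b = find_truncated_collision()
print(a, b, a != b)
```

The same reasoning applies to the full 128-bit digest; only the number of inputs needed changes. Deliberately constructed MD5 collisions have also been publicly demonstrated since 2004, which matters when files may come from a hostile source.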

Bill in Co.

Since: Apr 24, 2005
Posts: 989



(Msg. 31) Posted: Mon Oct 27, 2008 3:09 am
Post subject: Re: Is there a hard drive file organizer that will ...
Archived from groups: per prev. post

FromTheRafters wrote:
> "Bill in Co." <not_really_here.DeleteThis@earthlink.net> wrote in message
> news:%23RrPjC7NJHA.1908@TK2MSFTNGP04.phx.gbl...
>> FromTheRafters wrote:
>>>> AFAICS, a fundamental flaw in duplicate finder software is that it
>>>> relies on direct binary comparisons. With programs like FindDup, if we
>>>> have 3 files of equal size, then we would need to compare file1 with
>>>> file2, file1 with file3, and file2 with file3. This requires 6 reads.
>>>> For n equally sized files, the number of reads is n(n-1).
>>>>
>>>> Alternatively, if we relied on MD5 checksums, then each file would
>>>> only need to be read once.
>>>
>>> So...once it is found to be the same checksum, what should the
>>> program do next? How important are these files?
>>
>>> A fundamental
>>> flaw would be to trust MD5 checksums as an indication that the
>>> files are indeed duplicates.
>>
>> Since when?
>
> Forever.
>
> Checksums are often smaller than the file they are derived from
> (that's kinda the point, eh?).

No, that's not the point. Your statement was that the checksums did not
assure the integrity of the file, whatsoever - i.e., that two files could
have the same hash values and yet be different, which I still say is *highly*
unlikely. A statistically insignificant probability, so that using hash
values is often prudent and is much more expedient, of course.
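"Statistically insignificant" can be put into numbers with the standard birthday-bound approximation, assuming the hash behaves like a random 128-bit function (that is, no deliberately crafted collisions): for n distinct files, the chance that any two share a digest is about 1 - exp(-n(n-1) / 2^129). A short sketch, with a hypothetical function name:

```python
import math

def accidental_collision_probability(n_files, hash_bits=128):
    """Birthday-bound estimate that any two of n_files distinct files share a hash.

    Assumes the hash acts like a uniformly random function (no crafted
    collisions): p ~= 1 - exp(-pairs / 2**hash_bits), pairs = n(n-1)/2.
    """
    pairs = n_files * (n_files - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2.0 ** hash_bits)

for n in (10_000, 1_000_000):
    print(n, accidental_collision_probability(n))
```

For a million files the estimate comes out around 1.5e-27, which is why hashing is usually an acceptable shortcut for accidental duplicates; the estimate says nothing about files built to collide.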

FromTheRafters

Since: Sep 28, 2008
Posts: 58



(Msg. 32) Posted: Mon Oct 27, 2008 3:09 am
Post subject: Re: Is there a hard drive file organizer that will ...
Archived from groups: per prev. post

"Bill in Co." <not_really_here.TakeThisOut@earthlink.net> wrote in message
news:ugvJco8NJHA.3636@TK2MSFTNGP05.phx.gbl...
> FromTheRafters wrote:
>> "Bill in Co." <not_really_here.TakeThisOut@earthlink.net> wrote in message
>> news:%23RrPjC7NJHA.1908@TK2MSFTNGP04.phx.gbl...
>>> FromTheRafters wrote:
>>>>> AFAICS, a fundamental flaw in duplicate finder software is that it
>>>>> relies on direct binary comparisons. With programs like FindDup, if we
>>>>> have 3 files of equal size, then we would need to compare file1 with
>>>>> file2, file1 with file3, and file2 with file3. This requires 6 reads.
>>>>> For n equally sized files, the number of reads is n(n-1).
>>>>>
>>>>> Alternatively, if we relied on MD5 checksums, then each file would
>>>>> only need to be read once.
>>>>
>>>> So...once it is found to be the same checksum, what should the
>>>> program do next? How important are these files?
>>>
>>>> A fundamental
>>>> flaw would be to trust MD5 checksums as an indication that the
>>>> files are indeed duplicates.
>>>
>>> Since when?
>>
>> Forever.
>>
>> Checksums are often smaller than the file they are derived from
>> (that's kinda the point, eh?).
>
> No, that's not the point. Your statement was that the checksums did not
> assure the integrity of the file, whatsoever

I didn't say anything about the integrity of a file, and I also didn't
say 'whatsoever'. You can still read what I said above.

If you want to ensure they are duplicates, compare the files exactly.
If you only need to be reasonably sure they are duplicates, checksums
are adequate.

> - i.e., that two files could have the same hash values and yet be
> different, which I still say is *highly* unlikely.

Highly unlikely, yes. But files can be highly valuable too. Just
how fast does such a program need to be? How much speed
is worth how much accuracy?

> A statistically insignificant probability, so that using hash values is
> often prudent and is much more expedient, of course.

True, but to aim toward accuracy instead of speed is not a flaw.

All times are Eastern Time (US & Canada).