Replies

To: zeugma

In the current version of PowerShell (V4):

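$md5 = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$ht = @{}
Get-ChildItem C:\ -Recurse -File -ErrorAction SilentlyContinue |
Select-Object -ExpandProperty FullName |
ForEach-Object {
    # After -ExpandProperty, $_ is already the full path string, so use it directly
    $hash = [System.BitConverter]::ToString($md5.ComputeHash([System.IO.File]::ReadAllBytes($_)))
    # Collect every path that produced this hash
    $ht[$hash] += @($_)
}

# Keep only hashes shared by two or more files
$ht.GetEnumerator() |
Where-Object { $_.Value.Count -gt 1 }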
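For comparison, the same hashtable idea (hash every file, collect the paths under each hash, keep only groups with more than one entry) can be sketched in plain shell using an awk associative array. This is an illustrative sketch, not code from either poster; it assumes GNU coreutils md5sum and file names without newlines or backslashes, and find_dupes_awk is a hypothetical name.

```shell
#!/bin/sh
# Sketch: group files by MD5 with an awk associative array,
# mirroring the PowerShell hashtable ($ht[$hash] += @($path)).
# Assumes GNU coreutils md5sum and file names with no newlines or backslashes.
# find_dupes_awk is a hypothetical helper name used only for illustration.

find_dupes_awk() {
    find "$@" -type f -exec md5sum {} + |
    awk '{
        hash = substr($0, 1, 32)   # first 32 characters: the hex digest
        path = substr($0, 35)      # skip digest plus the two separating spaces
        count[hash]++
        paths[hash] = paths[hash] path "\n"
    }
    END {
        # Print each shared hash, then its paths, then a blank line
        for (h in count)
            if (count[h] > 1)
                printf "%s\n%s\n", h, paths[h]
    }'
}
```

Usage would be something like find_dupes_awk /home /srv; the group order is whatever awk's array traversal produces.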

V5, which will be released with Windows 10, has a Get-FileHash cmdlet that would cut that down to about half as many lines of code.

So by what measure is that not nearly as powerful as sed, awk, and grep (assuming that’s what your five-line bit of shell code uses to find those duplicate files)?

43 posted on 05/29/2015 12:53:30 PM PDT by tacticalogic ("Oh, bother!" said Pooh, as he chambered his last round.)


To: tacticalogic

Well then, congratulations on finally making it into the 21st century! That stuff is almost as unreadable as Perl. :-)

I guess now I won’t be letting the Windows admin guys off the hook anymore just because the poor bastards don’t have the tools they need to do their job properly.

46 posted on 05/29/2015 2:29:24 PM PDT by zeugma (Are there more nearby spiders than the sun is big?)


To: tacticalogic

BTW, your solution is almost the same as mine... the 'find' that does the work is actually one line of code, but I break it into separate lines for legibility. I've considered switching to sha256, but so far I haven't had any false collisions using md5. I love hashes. Useful little buggers.
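If you do switch to SHA-256, the one non-obvious change in a uniq-based pipeline is the comparison width: sha256sum prints a 64-character hex digest, so uniq has to compare 64 characters instead of the 32 used for MD5. A minimal sketch, assuming GNU coreutils (sha256sum and uniq --all-repeated); list_dupes_sha256 is a hypothetical name.

```shell
#!/bin/sh
# Sketch: list duplicate files grouped by SHA-256 digest.
# sha256sum prints a 64-hex-character digest, so uniq compares
# the first 64 characters (-w 64) instead of MD5's 32.
# Assumes GNU coreutils; list_dupes_sha256 is a hypothetical helper name.

list_dupes_sha256() {
    find "$@" -type f -print0 |
        xargs -0 sha256sum |
        sort |
        uniq -w 64 --all-repeated=separate
}
```

Each output group is a run of "digest  path" lines, with groups separated by a blank line.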

#!/bin/sh
# Writes rem-duplicates.sh: every duplicate file as a commented-out "rm"
# line, grouped by MD5 and separated by blank lines.

OUTF=rem-duplicates.sh

echo "#! /bin/sh" > "$OUTF"

find "$@" -type f -print0 | \
    xargs -0 md5sum | \
    sort | \
    uniq -w 32 --all-repeated=separate | \
    sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> "$OUTF"

chmod a+x "$OUTF"; ls -l "$OUTF"
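The sed stage is the densest part of that script. As a sketch of what it does, here it is wrapped in a small function (to_rm_line is a hypothetical name) and fed one made-up md5sum line; the digest is just the MD5 of an empty file, used as sample input. It assumes GNU sed for -r.

```shell
#!/bin/sh
# Sketch: what the sed stage does to one md5sum output line.
# 1) strip the leading hex digest and the spaces after it,
# 2) backslash-escape every character outside [a-zA-Z0-9./_-],
# 3) prefix the line with "#rm " (blank separator lines stay blank).
# Assumes GNU sed; to_rm_line is a hypothetical helper name.

to_rm_line() {
    sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/'
}

# Sample line in md5sum's "digest  path" format
printf '%s\n' 'd41d8cd98f00b204e9800998ecf8427e  ./My Docs/report (1).txt' | to_rm_line
# prints: #rm ./My\ Docs/report\ \(1\).txt
```

Because every generated line starts with #rm, running rem-duplicates.sh deletes nothing until you uncomment the copies you actually want gone.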

48 posted on 05/29/2015 2:36:15 PM PDT by zeugma (Are there more nearby spiders than the sun is big?)


Free Republic

FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794