In the current version (V4):
$md5 = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$ht = @{}
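# Hash every file and collect the full paths under each hash.
# (Piping through Select -ExpandProperty FullName first would leave plain strings, so $_.FullName would be empty.)
Get-ChildItem C:\ -Recurse -File |
ForEach-Object {
$hash = [System.BitConverter]::ToString($md5.ComputeHash([System.IO.File]::ReadAllBytes($_.FullName)))
$ht[$hash] += @($_.FullName)
}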
$ht.GetEnumerator() |
Where {$_.Value.count -gt 1}
V5, which will be released with Windows 10, has a Get-FileHash command that would cut that down to about half as many lines of code.
So, by what measure is that not nearly as powerful as sed, awk, and grep (assuming that’s what your 5-line bit of shell code is using to find those duplicate files)?
Well then, congratulations on finally making it into the 21st century! That stuff is almost as unreadable as Perl. :-)
I guess now I won’t be letting the Windows admin guys off anymore because the poor bastards don’t have the tools needed to do their job properly.
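#!/bin/sh
# Writes rem-duplicates.sh: one commented-out rm line per duplicate file,
# with a blank line between groups of identical files. Needs GNU md5sum, uniq and sed.
OUTF=rem-duplicates.sh
echo "#! /bin/sh" > "$OUTF"
find "$@" -type f -print0 | \
xargs -0 -n1 md5sum | \
sort | \
uniq -w 32 --all-repeated=separate | \
sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> "$OUTF"
chmod a+x "$OUTF"; ls -l "$OUTF"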
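
For anyone who wants to poke at the core of that pipeline without generating a script, here is a minimal sketch of just the "hash, sort, keep the repeats" part. It is an illustration, not a replacement for the script above: the function name list_dupes is made up, it assumes GNU coreutils (md5sum, and uniq with -w and --all-repeated), and it assumes file names contain no newlines or backslashes.

```shell
#!/bin/sh
# Sketch only: print the paths of files whose MD5 sums match another file's,
# one path per line, with a blank line between groups of identical files.
# list_dupes is a hypothetical helper name, not part of any tool.
list_dupes() {
    find "$@" -type f -print0 |
        xargs -0 -r md5sum |                     # one "hash  path" line per file
        sort |                                   # identical hashes become adjacent
        uniq -w 32 --all-repeated=separate |     # compare only the 32 hex digits
        cut -c 35-                               # drop the hash and the 2-char separator
}
```

Called as `list_dupes ~/some/dir`, it only prints candidates; nothing is deleted, which keeps the same "review before you rm" spirit as the commented-out rm lines.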