Replies

To: tacticalogic

BTW, your solution is almost the same as mine... the 'find' that does the work is actually one line of code, but I break it into separate lines for legibility. I've actually considered going to sha256, but so far I haven't had any false collisions using md5. I love hashes. Useful little buggers.

#!/bin/sh

OUTF=rem-duplicates.sh;

echo "#! /bin/sh" > $OUTF;

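# Hash every file, sort so identical hashes end up adjacent, keep only the
# groups that share a hash (blank line between groups), then turn each
# remaining line into a commented-out, shell-escaped "rm" for manual review.
find "$@" -type f -print0 | \
    xargs -0 -n1 md5sum | \
    sort --key=1,32 | \
    uniq -w 32 -d --all-repeated=separate | \
    sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> $OUTF;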

chmod a+x $OUTF; ls -l $OUTF

48 posted on 05/29/2015 2:36:15 PM PDT by zeugma (Are there more nearby spiders than the sun is big?)
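
For anyone curious what the sha256 switch mentioned in post 48 would involve, here is a rough sketch, assuming GNU findutils and coreutils. The main change is the comparison width: sha256sum prints 64 hex characters where md5sum prints 32. The function name dupes_sha256 is made up for illustration, and the sed step that builds the rm script is left out to keep it short.

```shell
#!/bin/sh
# Sketch only: list groups of files that share a SHA-256 digest.
# dupes_sha256 is a hypothetical name, not something from the thread.
dupes_sha256() {
    # -r (GNU xargs) skips running sha256sum when find returns nothing.
    find "$@" -type f -print0 |
        xargs -0 -r sha256sum |
        sort |
        uniq -w 64 --all-repeated=separate   # compare only the 64-char digest
}
```

Calling it as dupes_sha256 /some/dir prints each group of identical files as "digest  path" lines, with a blank line between groups.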

I’m loading mine into a hash table (using the hash value as the key) as the hashes are calculated, instead of calculating all the hashes and then going back and sorting for unique values after.

50 posted on 05/29/2015 2:45:07 PM PDT by tacticalogic ("Oh, bother!" said Pooh, as he chambered his last round.)
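
The hash-table approach described in post 50 might look roughly like this in shell, with an awk associative array standing in for the table. This is only a sketch: tacticalogic's actual code isn't shown in the thread, list_dupes is a made-up name, and it assumes GNU md5sum's output layout (32 hex characters, two separator characters, then the path). File names containing newlines or backslashes are not handled.

```shell
#!/bin/sh
# Sketch only: report duplicates as the hashes arrive, with no sort pass.
# list_dupes is a hypothetical name, not something from the thread.
list_dupes() {
    find "$@" -type f -exec md5sum {} + |
    awk '{
        hash = $1
        path = substr($0, 35)   # skip 32 hex chars plus 2 separator chars
        if (hash in seen)
            print path " duplicates " seen[hash]
        else
            seen[hash] = path   # first file seen with this hash is the reference
    }'
}
```

Calling it as list_dupes /some/dir prints one line per extra copy. Compared with the sort-and-uniq pipeline, each file is checked against the table as soon as its hash is known, at the cost of holding one table entry per distinct hash in memory.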
