Free Republic
Browse · Search
General/Chat
Topics · Post Article

To: tacticalogic
BTW, your solution is almost the same as mine... the 'find' that does the work is actually one line of code, but I break it into separate lines for legibility. I've considered going to sha256, but so far I haven't had any false collisions using md5. I love hashes. Useful little buggers.

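#!/bin/sh
# Writes rem-duplicates.sh, listing duplicate files (same MD5) as
# commented-out rm commands. Uncomment the ones you want deleted.

OUTF=rem-duplicates.sh

echo "#! /bin/sh" > "$OUTF"

# Hash every file, sort so equal hashes are adjacent, keep only the groups
# whose first 32 characters (the MD5) repeat, then strip the hash,
# backslash-escape unusual characters in the name, and prefix "#rm ".
find "$@" -type f -print0 | \
    xargs -0 -n1 md5sum | \
    sort --key=1,32 | \
    uniq -w 32 -d --all-repeated=separate | \
    sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> "$OUTF"

chmod a+x "$OUTF"; ls -l "$OUTF"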
 

48 posted on 05/29/2015 2:36:15 PM PDT by zeugma (Are there more nearby spiders than the sun is big?)


To: zeugma

I’m loading mine into a hash table (using the hash value as the key) as the hashes are calculated, instead of calculating all the hashes and then going back and sorting for unique values after.
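
A minimal sketch of that single-pass idea, for comparison with the sort/uniq pipeline. The thread doesn't show the actual implementation or say what language it's in, so this is only an assumption of how it might look in plain sh: awk's associative array stands in for the hash table, keyed by the MD5 digest. It assumes GNU md5sum (32-character digest, two separator characters, then the path), and `list_duplicates` is a made-up name. File names containing backslashes or newlines are not handled, since md5sum escapes those lines.

```shell
#!/bin/sh
# Sketch only: list files whose MD5 digest matches an earlier file.
# list_duplicates is a hypothetical helper name, not from the thread.
list_duplicates() {
    find "$@" -type f -exec md5sum {} + |
    awk '{
        hash = substr($0, 1, 32)   # MD5 hex digest
        path = substr($0, 35)      # skip digest plus two separator characters
        if (hash in first) {
            # Digest already in the table: this file duplicates an earlier one.
            if (!(hash in reported)) {
                print first[hash]
                reported[hash] = 1
            }
            print path
        } else {
            # First time this digest is seen: remember the path.
            first[hash] = path
        }
    }'
}
```

Called as `list_duplicates ~/Pictures ~/Backup`, it prints each path that shares a digest with some other file. No sort is needed, though paths come out in the order they are found rather than grouped by hash.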


50 posted on 05/29/2015 2:45:07 PM PDT by tacticalogic ("Oh, bother!" said Pooh, as he chambered his last round.)


