file wrangling help requested

footfootfoot • Feb 23, 2011 12:40 pm
I've got a problem with one of my backup drives due to poor computer hygiene.

There are about 50,000 files on it, of which a huge number are duplicates residing in a series of nested folders.

For example,
L:\music\elvis costello\my aim is true has tracks 1,2,4,5,7,8,9

and
L:\music\itunes music\~E\elvis costello\my aim is true has tracks 1,2,3,6,10,11

and
L:\music\MISC Vinyl\elvis costello\my aim is true has a live version of track 3 with the same file name as track 3 above, but a different bit rate and time.

How can I get all these organized, deleting the dupes without deleting the similarly named songs?

I've tried using Digital Volcano's Duplicate Cleaner with some success, but it still misses dupes when comparing by MD5 or byte by byte. It also comes up with false positives when there are multiple versions of a song, which is especially a problem with greatest hits and compilation discs.

Any suggestions?
glatt • Feb 23, 2011 12:49 pm
You need an intern. Post an opening at your local college. Spring semester is internship time.
Perry Winkle • Feb 23, 2011 1:00 pm
I don't know of any out of the box software.

If you search around for a Python or Ruby script, I'm sure you could find one.

Basically you want something that will create an index of all of your music based on something like an md5 checksum of the file (the filename doesn't matter). Then it should remove duplicates, and maybe move them all to a consistent location.
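
To make that concrete, here is a rough Python sketch of the indexing step. It is only an illustration under my own assumptions, not a tested tool: the function names (file_md5, find_duplicates) are made up for this example, and it only reports duplicates, it does not delete or move anything.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Walk root and group files by checksum; the filename plays no part in the match.

    Returns a dict mapping each checksum to a sorted list of paths,
    keeping only checksums shared by two or more files.
    """
    by_hash = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_hash[file_md5(path)].append(path)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

You would call it with something like find_duplicates(r"L:\music") and print each group. Because the match is on file contents, a live version that shares a filename with the studio track but has a different bit rate gets a different checksum and is left alone. The flip side is that two copies of the same song whose embedded tags differ are no longer byte-identical, so a checksum will not pair them up.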
Perry Winkle • Feb 23, 2011 1:02 pm
This will give you a list of all of the duplicates.
footfootfoot • Feb 23, 2011 1:34 pm
OK, I will get an intern to run that script for me.
Gravdigr • Mar 1, 2011 2:38 am
Hah!