September 25, 2010

removing duplicated photos from your hard drive

Over the last few days I've been trying to organize my photo collection. I realized that some photos were stored in more than one folder, so my hard drive held two or even three copies of the same picture.


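So I wrote a Python script to solve my problem. It walks a directory tree, computes the MD5 checksum of every file, and deletes any file whose checksum it has already seen. Here it is: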

import os
import sys
import hashlib

def removeDuplicatedImages(dirname):
    print("DEBUG: looking for images at " + dirname)
    # Maps each MD5 checksum to the first file found with that content.
    list_of_images = {}
    for root, dirs, files in os.walk(dirname):
        for file in files:
            filename = os.path.join(root, file)
            # Open in binary mode so the bytes are hashed exactly as stored.
            with open(filename, 'rb') as f:
                md5sum = hashlib.md5(f.read()).hexdigest()
            if md5sum not in list_of_images:
                list_of_images[md5sum] = [filename]
            else:
                # Same content as an earlier file: delete this copy.
                print("DEBUG: removing " + filename)
                os.remove(filename)

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print("Usage: " + sys.argv[0] + " dirname")
        sys.exit(1)
    dirname = sys.argv[1]
    removeDuplicatedImages(dirname)
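
The script above deletes a file as soon as its checksum matches one seen earlier, and it reads each file into memory in one go. A more cautious variant might only report the duplicates and hash the files in chunks. What follows is a rough sketch of that idea for Python 3, not the script I actually ran; the names file_md5 and find_duplicates and the 1 MiB chunk size are my own assumptions for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(dirname):
    """Group files under dirname by checksum.

    Returns a list of groups; each group is a sorted list of paths
    whose contents hash to the same MD5 value. Nothing is deleted.
    """
    by_hash = defaultdict(list)
    for root, dirs, files in os.walk(dirname):
        for name in files:
            path = os.path.join(root, name)
            by_hash[file_md5(path)].append(path)
    # Only checksums shared by two or more files are duplicates.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Calling find_duplicates on the photo directory returns the groups of identical files, so you can look them over and decide which copy to keep before removing anything.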

2 comments:

Anonymous said...

fdupes -r directory --delete

Isn't that easier?

Jordi said...

Yes! That looks much easier! I did not know about this tool. Thanks for posting it.