Determine Which Duplicate Images to Remove using Python Dictionary

Question


asked Jan 27, 2021 in Technique by 深蓝 (71.8m points)


I've written a script that identifies duplicate and near-duplicate images based on some criteria. The results are placed in a dictionary where the keys represent an image and the values are duplicate images. For example, for image0 there are duplicate images 1-5. Now, I'm trying to make a list of candidates to delete based on my dictionary. I'd like to keep the first image that appears in the dictionary (image0), delete images 1-5, and then skip keys 1-5 because those images have already been removed. How would I do this? Or is there a better way to go about identifying candidates for deletion?

Example Dictionary:

{0: [1, 2, 3, 4, 5],
 1: [0, 2, 3, 4, 5],
 2: [0, 1, 3, 4, 5],
 3: [0, 1, 2, 4, 5],
 4: [0, 1, 2, 3, 5],
 5: [0, 1, 2, 3, 4],
 6: [7, 8, 9, 10, 11],
 7: [6, 8, 9, 10, 11],
 8: [6, 7, 9, 10, 11],
 9: [6, 7, 8, 10, 11],
 10: [6, 7, 8, 9, 11],
 11: [6, 7, 8, 9, 10],
 12: [13, 14, 15, 16, 17],
 13: [12, 14, 15, 16, 17],
 14: [12, 13, 15, 16, 17],
 15: [12, 13, 14, 16, 17],
 16: [12, 13, 14, 15, 17],
 17: [12, 13, 14, 15, 16],
 18: [19, 20, 21, 22, 23],
 19: [18, 20, 21, 22, 23],
 20: [18, 19, 21, 22, 23],
 21: [18, 19, 20, 22, 23],
 22: [18, 19, 20, 21, 23],
 23: [18, 19, 20, 21, 22],
 24: [25, 26, 27, 28, 29],
 25: [24, 26, 27, 28, 29],
 26: [24, 25, 27, 28, 29],
 27: [24, 25, 26, 28, 29],
 28: [24, 25, 26, 27, 29],
 29: [24, 25, 26, 27, 28],
 30: [31, 32, 33, 34, 35],
 31: [30, 32, 33, 34, 35],
 32: [30, 31, 33, 34, 35],
 33: [30, 31, 32, 34, 35],
 34: [30, 31, 32, 33, 35],
 35: [30, 31, 32, 33, 34],
 36: [37, 38, 39],
 37: [36, 38, 39],
 38: [36, 37, 39],
 39: [36, 37, 38],
 40: [41, 42, 43],
 41: [40, 42, 43],
 42: [40, 41, 43],
 43: [40, 41, 42],
 44: [45, 46],
 45: [44, 46],
 46: [44, 45]}


1 Answer

深蓝 · Answer 1 · 2021-01-26T20:38:33+0000

The logic of what you want to do is very straightforward. You can keep a set of delete candidates and iterate over the keys in the dict. For each key, you check whether it is already in the set. If it is, you skip it, because it has already been marked as a duplicate of an earlier image. If it isn't, that key is an image you keep, and its value is the list of images that duplicate it, so you add all of those to the set of delete candidates. This relies on the dict being iterated in insertion order, which Python guarantees from version 3.7 on, so "the first image that appears" is the one that gets kept.

If you really want a list as the result, you can convert the set to a list at the end. A set has no guaranteed order, so use sorted() if the order matters.

Here's the code to do that:

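# data is the example dictionary from the question
dups = set()

for i in data:
    # Skip keys already marked as duplicates of an earlier image
    if i not in dups:
        dups.update(data[i])  # add in place instead of building a new set

print(sorted(dups))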

Result:

[1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 37, 38, 39, 41, 42, 43, 45, 46]
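
If you also want to know which image is kept for each group, the same loop can record the keepers as it goes. This is only a sketch: the function name split_keep_delete and the small sample dictionary are made up for illustration, and it assumes the dictionary is symmetric as in the question (if A lists B, then B lists A).

```python
def split_keep_delete(dup_map):
    """Return (keep, delete) lists from a mapping of image -> duplicates.

    Hypothetical helper: relies on dict insertion order (Python 3.7+),
    so the first image seen in each group is the one that is kept.
    """
    keep = []          # images to keep, in the order they were first seen
    kept = set()       # same images, for fast membership checks
    delete = set()     # images marked for deletion

    for image, duplicates in dup_map.items():
        if image in delete:
            continue   # already a duplicate of an earlier image
        keep.append(image)
        kept.add(image)
        # Never mark an image we have already decided to keep.
        delete.update(d for d in duplicates if d not in kept)

    return keep, sorted(delete)


# Small made-up sample in the same shape as the question's dictionary.
sample = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1],
    3: [4],
    4: [3],
}

keep, delete = split_keep_delete(sample)
print(keep)    # [0, 3]
print(delete)  # [1, 2, 4]
```

Keeping the two results separate makes it easy to double-check the candidates before anything is actually removed from disk.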
