Detect Corrupt Photos (using Bash)

Corrupt Images

Corrupt (broken) image files have been popping up in my collection for a few years, but until now I have only fixed them one at a time, restoring each from backup as and when I found it.

Possible Causes

I am not certain what caused the corruption. The files could have been damaged by a dying hard drive, or corrupted in RAM and then saved to disk.

Automated Detection

This weekend I decided to systematically find all the defective images in my collection (both JPEG and .CR2 raw files). I couldn’t find a free, ready-made solution, so I ended up creating my own. I am documenting it here for others to find, because getting to this point meant consolidating a lot of posts and pages from around the Internet, plus a bit of experimentation.

Overview

The solution uses Bash and ImageMagick®. I’ve developed and tested it using Cygwin, but I expect it to work on most Linux distributions as well.

Required Libraries and Dependencies

You will need ImageMagick. If you would also like to check raw files, you will need to install ufraw-batch as well.
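Before starting, it can be worth confirming that the tools are on your PATH. The following is only a minimal sketch: missing_tools is a hypothetical helper name, and it assumes that ImageMagick provides the identify command used later in this post and that the raw converter is installed under the name ufraw-batch.

```shell
# missing_tools NAME...
# Print the name of every command that cannot be found on the PATH.
# (Hypothetical helper, not part of ImageMagick or ufraw.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Lists whichever of the two tools is not installed; prints nothing if both are.
missing_tools identify ufraw-batch
```

If nothing is printed, both commands were found.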

Solution

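Step #1 is to enumerate all the files you wish to check. The name tests are grouped in parentheses so that -type f applies to both patterns, and the patterns are quoted so that the shell passes them to find unexpanded:

find . -type f \( -iname '*.jp*g' -o -iname '*.CR2' \) > all-files.out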

Step #2 is to make sure the two result files exist, because the next command reads them:

touch done.out
touch failed.out

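Then run this line (the -d '\n' option is specific to GNU xargs and makes each line one file name, even if it contains spaces or quotes):

awk '{if (f==1) { r[$0] } else if (! ($0 in r)) { print $0 } } ' f=1 done.out failed.out f=2 all-files.out | xargs -d '\n' -P 2 -I '{}' ./check-image.sh '{}'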

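The first part makes the command resumable. awk reads done.out and failed.out first and remembers every file name listed there, then prints only those lines of all-files.out that it has not already seen, so files that have been checked before are skipped.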
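To see the resume filter on its own, here is a small sketch that runs the same idea against three throwaway files. The function name pending_files and the sample file names are made up for illustration, and the awk program is a slightly shortened equivalent of the one above rather than a copy of it.

```shell
# pending_files DONE FAILED ALL
# Print the lines of ALL that appear in neither DONE nor FAILED.
pending_files() {
    # f=1 is set before the first two files are read, f=2 before the third.
    awk 'f == 1 { seen[$0]; next } !($0 in seen)' f=1 "$1" "$2" f=2 "$3"
}

# Small demonstration with made-up file names in a temporary directory.
demo=$(mktemp -d)
printf '%s\n' './2019/a.jpg' > "$demo/done.out"
printf '%s\n' './2019/b.CR2' > "$demo/failed.out"
printf '%s\n' './2019/a.jpg' './2019/b.CR2' './2020/c d.jpg' > "$demo/all-files.out"
pending_files "$demo/done.out" "$demo/failed.out" "$demo/all-files.out"
rm -r "$demo"
```

Only ./2020/c d.jpg is printed, because the other two names are already recorded as done or failed.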

xargs then calls the check-image.sh script (see below) once for each remaining file, running two checks at a time (-P 2).

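The check-image.sh script looks like this (remember to make it executable with chmod +x check-image.sh):

#!/bin/bash
# Usage: ./check-image.sh FILE
# Record FILE in done.out if identify succeeds, otherwise in failed.out.
if identify -verbose "$1" >/dev/null; then
    echo "$1" >> done.out
else
    echo "$1" >> failed.out
fi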
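Once a run has finished (or been interrupted), the three files can be summarised to see how far it got. This is just a sketch: count_lines and progress_report are hypothetical helper names, and it assumes the three files sit in the current directory with one file name per line, as produced by the steps above.

```shell
# Count the lines in a file, treating a missing file as empty.
count_lines() {
    if [ -f "$1" ]; then
        wc -l < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# Print how many files have been checked so far and how many of them failed.
progress_report() {
    total=$(count_lines all-files.out)
    ok=$(count_lines done.out)
    bad=$(count_lines failed.out)
    echo "checked $((ok + bad)) of $total, $bad failed"
}

progress_report
```

The names of the damaged files themselves are simply the contents of failed.out.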

Improvements

There is a possibility that two (or more) of the parallel processes could write to the done or failed file at exactly the same time and interleave their lines. This hasn’t happened to me yet, even with three processes running in parallel. I think this is largely because processing an image file, especially a raw file, takes so much longer than appending a line to a file.
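One way to rule the problem out would be to take a lock around each append. The sketch below assumes the flock utility from util-linux is installed, which I have not verified for every environment. The helper name append_locked and the ".lock" companion file are my own choices for illustration.

```shell
# append_locked FILE LINE
# Append LINE to FILE while holding an exclusive lock on FILE.lock,
# so that concurrent writers cannot interleave their output.
# Assumes the flock command (util-linux) is available.
append_locked() {
    (
        flock -x 9                  # block until file descriptor 9 is locked
        printf '%s\n' "$2" >> "$1"  # the lock is released when the subshell exits
    ) 9> "$1.lock"
}
```

In check-image.sh, the two echo lines would then become append_locked done.out "$1" and append_locked failed.out "$1".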

Comments

If you have any comments or suggestions, please let me know.