Detect Corrupt Photos (using Bash)
Corrupt Images
Corrupt (broken) image files have been popping up in my collection for a few years, but until now I’ve only ever fixed them by restoring from backups as and when I found them.
Possible Causes
I am not certain what caused the corruption. The files could have been damaged by a dying hard drive, or corrupted in RAM and then saved to disk.
Automated Detection
This weekend I decided to systematically find all the defective images in my collection (both JPEG and .CR2 raw files). I couldn’t find a free ready-made solution, so I ended up creating my own. I am documenting it here for others to find, because getting to this point meant piecing together a lot of posts and pages on the Internet, plus a bit of experimentation.
Overview
The solution uses Bash and ImageMagick®. I’ve developed and tested it using Cygwin but I’m sure it will work on most Linux distributions as well.
Required Libraries and Dependencies
You will need ImageMagick. If you would also like to check raw files, you will need to install ufraw-batch as well.
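Before starting, it can be worth confirming that the tools are actually on your PATH. The snippet below is only a rough sketch: have_tool is a helper name I made up for illustration, and it assumes ImageMagick exposes its identify command under that name on your system.

```bash
#!/bin/bash
# have_tool is a hypothetical helper: it succeeds if the named command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# identify ships with ImageMagick; ufraw-batch is only needed for raw files.
for tool in identify ufraw-batch; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

If ufraw-batch is reported missing you can still check the JPEG files; just leave the .CR2 pattern out of the file list.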
Solution
Step #1 is to enumerate all the files you wish to check:
find . -type f \( -iname '*.jp*g' -o -iname '*.CR2' \) > all-files.out
Step #2 is to ensure you have the required files:
touch done.out
touch failed.out
Then run this line:
awk '{if (f==1) { r[$0] } else if (! ($0 in r)) { print $0 } } ' f=1 done.out failed.out f=2 all-files.out | xargs -d '\n' -P 2 -I '{}' ./check-image.sh '{}'
The first part facilitates resuming the command. Here awk removes the lines already in done.out and failed.out so we don’t need to check them again. The xargs then calls the check-image.sh script (see below) in parallel (-P 2).
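To see what the awk filter does on its own, here is a small self-contained demonstration. It is only a sketch: the file names are invented sample data, and it runs in a throwaway temporary directory so it will not touch your real lists.

```bash
#!/bin/bash
# Demonstrates the resume filter on throwaway sample lists (hypothetical file names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' ./a.jpg ./b.jpg ./c.jpg ./d.CR2 > all-files.out
printf '%s\n' ./a.jpg > done.out      # already checked, readable
printf '%s\n' ./b.jpg > failed.out    # already checked, broken

# While f==1 every line is remembered as a key of r;
# once f==2 only lines that were never seen are printed.
remaining=$(awk '{ if (f==1) { r[$0] } else if (!($0 in r)) { print $0 } }' \
    f=1 done.out failed.out f=2 all-files.out)

printf '%s\n' "$remaining"
```

Only ./c.jpg and ./d.CR2 are printed, so those are the only files that would be passed on to xargs.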
The check-image.sh script looks like this:
#!/bin/bash
# Record the file in done.out if ImageMagick can read it, otherwise in failed.out.
if identify -verbose "$1" >/dev/null; then
    echo "$1" >> done.out
else
    echo "$1" >> failed.out
fi
Improvements
There is a possibility that two (or more) of the parallel processes could write to done.out or failed.out at exactly the same time, leaving two lines jumbled together, but this hasn’t happened to me yet, even when running 3 processes. I think this is largely because processing an image file, especially a raw file, takes so much longer than writing a line to a file.
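One possible way to rule the problem out entirely is to take a lock around each write. The following is an untested-in-anger sketch, not what I actually ran: it assumes the flock command from util-linux is installed, and append_line and the .lock file name are made up for illustration.

```bash
#!/bin/bash
# Sketch: serialise appends with flock(1) so parallel workers cannot interleave lines.
# append_line is a hypothetical helper, not part of the original script.
append_line() {
    local file=$1 line=$2
    (
        flock -x 9                        # wait for an exclusive lock on fd 9
        printf '%s\n' "$line" >> "$file"
    ) 9>> "$file.lock"
}

# Demo: 20 background writers appending to one throwaway file.
out=$(mktemp)
i=1
while [ "$i" -le 20 ]; do
    append_line "$out" "image-$i.jpg" &
    i=$((i + 1))
done
wait

line_count=$(grep -c '' "$out")
echo "$line_count lines written"
```

In check-image.sh the two echo lines would then become calls such as append_line done.out "$1" and append_line failed.out "$1".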
Comments
If you have any comments or suggestions please let me know.