This plugin for Lightroom allows you to validate images and check for file corruption or “bit rot”. It works by computing a hash for each file and then comparing it to a previously stored value to see if your file has changed unexpectedly.
Note that this plugin is under development and should be considered a work in progress. It requires Lightroom 4 or later, runs on both Mac OS X and Windows, and works with the current version of Lightroom (v5.7).
Note: Please read the section about writing XMP to Metadata before using this plugin.
How it works
For each file, Validator reads the entire image file and computes a hash value based on every bit in it. You can think of this as a digital fingerprint: just as no two people have exactly the same fingerprints, in practice no two different files have the same hash. (Mathematically speaking, there is a tiny probability that two different files share a hash, but it is so close to zero that we can ignore it.)
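To make the idea concrete, here is a minimal Python sketch of hashing a whole file. It is not Validator's own code (Lightroom plugins are written in Lua); it assumes MD5, the algorithm used in the Arc de Triomphe example, and the function name `file_hash` and the 1 MiB chunk size are arbitrary choices for illustration. Reading in chunks means even multi-gigabyte files never have to fit in memory at once.

```python
import hashlib


def file_hash(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of every byte in the file at `path`.

    The file is read in fixed-size chunks so that large images (or videos)
    never have to be loaded into memory in one piece.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Called on a file, `file_hash` returns a 32-character hex string for MD5; `hashlib.new` also accepts other algorithm names such as `"sha256"` if you prefer a different hash.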
The key property of the hash algorithm is that if any bit in the file changes, the hash value (the fingerprint) also changes. For example, if I compute the MD5 hash of my picture of the Arc de Triomphe, I get 2642a5f5f7468c115e63f7e35c67c20a.
Now let's test this by making a small change to the file: if I take a single pixel in the sky and modify its value from RGB = (40,41,80) to RGB = (40,41,81), I get a completely different hash: 435070d0f98d682502ed058833780e96.
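The same effect can be reproduced without an image editor. The hedged Python sketch below uses an arbitrary byte string as a stand-in for an image file and flips a single bit, the smallest possible change (80 and 81 differ only in their lowest bit, although in a compressed file format a pixel edit may change more bytes on disk). The helper names are made up for this example.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of `data` with a single bit inverted."""
    buf = bytearray(data)
    buf[byte_index] ^= 1 << bit
    return bytes(buf)


def differing_hex_digits(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex digests differ."""
    return sum(x != y for x, y in zip(a, b))


original = bytes(range(256)) * 64  # stand-in for an image file's bytes
damaged = flip_bit(original, byte_index=1000, bit=0)

h_original = md5_hex(original)
h_damaged = md5_hex(damaged)
print(h_original)
print(h_damaged)
print(differing_hex_digits(h_original, h_damaged), "of 32 hex digits differ")
```

Typically most of the 32 hex digits change, even though only one bit of the input did.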
The main idea is that if your data becomes corrupted, we can detect it by recomputing the hash and comparing it with the original value computed when we first brought the image into Lightroom. Detecting these changes is critical, because the last thing we want is for an image file to be silently damaged and then copied into all of our backups.
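A minimal sketch of that detect-by-recomputing loop, again in Python and not the plugin's implementation: `build_baseline` records one hash per file, and `find_changed` re-hashes later and reports mismatches. Both function names are hypothetical, and a plain dictionary stands in for wherever the hashes are stored (Validator keeps them in the Lightroom catalog).

```python
import hashlib


def file_hash(path, chunk_size=1 << 20):
    """MD5 of every byte in the file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_baseline(paths):
    """Hash each file once (the 'original value') and remember it by path."""
    return {str(p): file_hash(p) for p in paths}


def find_changed(baseline):
    """Re-hash every file and return the paths whose hash no longer matches.

    A file that has gone missing raises FileNotFoundError here; a real tool
    would report that too rather than stop.
    """
    return [path for path, stored in baseline.items() if file_hash(path) != stored]
```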
Writing XMP to Metadata
Under the catalog settings in Lightroom there is an option to “Automatically write changes into XMP”. (On a Mac, choose Lightroom > Catalog Settings and click the Metadata tab. On Windows, choose Edit > Catalog Settings and click the Metadata tab.) With this option turned on, for file types with publicly documented formats (i.e., TIFF, JPEG, PSD, and DNG), Lightroom writes metadata such as captions and keywords directly into the file. Since the actual file changes, its hash value changes as well.
I recommend leaving this option turned off. I often go back and alter captions or add keywords to an image, and having Lightroom write those changes into the image file would change its hash even though the image data is still good. There are three main ways of dealing with this issue:
- Turn off “Automatically write changes into XMP”. Lightroom will then store the metadata in the catalog, and your image files won’t change even when you edit the metadata.
- Run Validator only on RAW files, which Lightroom does not modify. For these files Lightroom puts the metadata into a sidecar file (with a .xmp extension).
- Run Validator on TIFF files (and other file types that Lightroom modifies), but update the hashes whenever you alter the metadata or make direct changes to the file, such as retouching in Photoshop.
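One way to apply the second option consistently is to filter files by extension before hashing. The Python sketch below is illustrative only: the extension set is an assumption built from the formats listed above (TIFF, JPEG, PSD, and DNG) plus their common spellings, and `select_for_validation` is a hypothetical helper, not a Validator setting.

```python
from pathlib import Path

# Assumed extensions for the formats Lightroom writes XMP into
# (TIFF, JPEG, PSD, DNG) when "Automatically write changes into XMP" is on.
EMBEDDED_XMP_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".psd", ".dng"}


def metadata_edits_change_file(path, auto_write_xmp):
    """True if editing metadata in Lightroom could rewrite this file's bytes."""
    return auto_write_xmp and Path(path).suffix.lower() in EMBEDDED_XMP_EXTENSIONS


def select_for_validation(paths, auto_write_xmp):
    """Keep only files whose hashes metadata edits should not disturb."""
    return [p for p in paths if not metadata_edits_change_file(p, auto_write_xmp)]
```

With the option on, only the RAW files survive the filter; with it off, everything is kept.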
Workflow with Validator
The commands for Validator can be found under Library > Plug-in Extras. Here’s how I use Validator with my images:
- I run Generate Hashes on all of my RAW files and any TIFF master images (client ready images that have all necessary edits and retouching done).
- Every few months, I select all of my images (RAW + TIFF masters) and run Verify Files.
- Verify Files will identify any image file that has changed and place it into a collection (the default is validator_changed).
- I manually verify the changed files:
  - If the image file is okay, I run Accept Changed Hashes to update the hash values.
  - If an image has been corrupted (this has happened to me only once), I replace it with a working version from my image backups.
In most cases, when Verify Files turns up a change, it is a TIFF image that I edited further in an external program like Photoshop. Lightroom never modifies RAW files, so their hash values should always stay the same.
All of Validator's commands are under Library > Plug-in Extras. There are five: Generate Hashes, Verify Files, Accept Changed Hashes, Clear Hashes, and Help.
Generate Hashes
This command generates a hash value for each selected image and stores it, along with the date, in the plug-in-specific metadata fields Archive Hash and Archive Date. It opens a dialog box where you can choose the file types for which to generate hashes and whether to include virtual copies (the hash for a virtual copy is the same as for the original image).
Generate Hashes will run as a background task in Lightroom. On completion, you will see a dialog box with summary statistics. In addition, the log file will store a detailed record of the actions taken for each image and indicate if there were any errors (such as problems reading an image file).
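The Generate Hashes step can be sketched the same way. This hypothetical Python function hashes only the requested file types, records an archive hash and a UTC timestamp per file (mirroring the Archive Hash and Archive Date fields), logs any read errors, and returns summary counts similar in spirit to the completion dialog. The field and function names are illustrative, and virtual copies are not modelled because they share the original's file.

```python
import hashlib
import logging
from datetime import datetime, timezone
from pathlib import Path

log = logging.getLogger("validator_sketch")


def file_hash(path, chunk_size=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def generate_hashes(paths, file_types):
    """Hash files whose extension is in `file_types`; return (records, summary)."""
    records = {}
    summary = {"hashed": 0, "skipped": 0, "errors": 0}
    for p in map(Path, paths):
        if p.suffix.lower() not in file_types:
            summary["skipped"] += 1
            continue
        try:
            digest = file_hash(p)
        except OSError as exc:  # missing, unreadable, permission denied, ...
            summary["errors"] += 1
            log.error("could not read %s: %s", p, exc)
            continue
        records[str(p)] = {
            "archive_hash": digest,
            "archive_date": datetime.now(timezone.utc).isoformat(),
        }
        summary["hashed"] += 1
        log.info("hashed %s -> %s", p, digest)
    return records, summary
```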
Verify Files
For images that have an Archive Hash, this command checks whether the file has changed by computing a new hash and comparing it with the stored value. Files that have changed are added to the specified collection.
When the command finishes, a dialog box summarizes the results. In this example, 3 files had changed.
Verify Files will store the new hash in the field Last Hash. It will not change the Archive Hash.
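Continuing with the same hypothetical record layout, a Verify Files step might look like the sketch below: it writes `last_hash` and `last_date`, never touches `archive_hash`, and adds mismatches to a Python set that stands in for the validator_changed collection. It illustrates the behaviour described here; it is not the plugin's code.

```python
import hashlib
from datetime import datetime, timezone


def file_hash(path, chunk_size=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_files(records, changed_collection):
    """Recompute hashes for records that have an archive hash.

    Only last_hash/last_date are written; archive_hash is left untouched.
    Paths whose new hash differs are added to `changed_collection`.
    Returns the number of changed files.
    """
    changed = 0
    for path, rec in records.items():
        if not rec.get("archive_hash"):
            continue  # never hashed: nothing to compare against
        rec["last_hash"] = file_hash(path)
        rec["last_date"] = datetime.now(timezone.utc).isoformat()
        if rec["last_hash"] != rec["archive_hash"]:
            changed_collection.add(path)
            changed += 1
    return changed
```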
Accept Changed Hashes
If there are changed files (Last Hash does not equal Archive Hash), you should manually check whether this is because (1) you edited the file or its metadata or (2) the file is damaged or corrupted. In the former case, you can accept the new hash value by selecting the affected images and running Accept Changed Hashes. This updates Archive Hash and Archive Date with the new values.
Note: this command ignores selected images where there is no change in hash values.
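In the same hypothetical record layout, accepting changed hashes amounts to promoting the last hash to the archive hash for the selected records whose hashes differ. One detail is an assumption: the sketch takes the new Archive Date from `last_date` (when the accepted hash was computed), which the description above does not specify.

```python
def accept_changed_hashes(records, selected_paths):
    """Promote last_hash to archive_hash for selected, changed records.

    Assumption: the new Archive Date is taken from last_date. Records that
    are unchanged or have never been verified are skipped.
    Returns how many records were updated.
    """
    accepted = 0
    for path in selected_paths:
        rec = records.get(path)
        if not rec or not rec.get("last_hash"):
            continue
        if rec["last_hash"] == rec.get("archive_hash"):
            continue  # no change in hash values: ignored
        rec["archive_hash"] = rec["last_hash"]
        rec["archive_date"] = rec.get("last_date")
        accepted += 1
    return accepted
```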
Clear Hashes
This command clears all hash values from the selected images. Normally you should not need to use it.
Help
Running Help opens this webpage in a browser window.
Validator creates 5 custom metadata fields to track image hashes:
- Archive Hash — the hash value of the image
- Archive Date — date the Archive Hash was computed
- Last Hash — the most recent hash value computed by running Verify Files
- Last Date — date the Last Hash was computed
- Status — takes on one of four values: MATCH, CHANGE, NEW, N/A
Note that these fields are read-only and cannot be edited directly. You can change the fields only by executing the various Plug-in menu commands.
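The five fields can be modelled as a small record, shown below as a Python dataclass. The page does not define when Status is NEW or N/A, so the mapping here is a guess, labelled as such in the code: N/A when there is no Archive Hash, NEW when a hash exists but Verify Files has not been run yet, and otherwise MATCH or CHANGE depending on whether Last Hash equals Archive Hash.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidatorFields:
    archive_hash: Optional[str] = None
    archive_date: Optional[str] = None
    last_hash: Optional[str] = None
    last_date: Optional[str] = None

    @property
    def status(self) -> str:
        # Hypothetical mapping: the meanings of NEW and N/A are assumed here.
        if self.archive_hash is None:
            return "N/A"
        if self.last_hash is None:
            return "NEW"
        return "MATCH" if self.last_hash == self.archive_hash else "CHANGE"
```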
You can show the fields by setting the Metadata panel to display “Validator”.
Richard Costin says
Great looking plugin, will be testing this tonight!
Igor Shishkin says
It’s a very interesting idea to compute an MD5 hash of the image. I’m not sure how Lightroom generates previews, but the MD5 result could depend on the profile applied to the raw image (to generate the RGB image). Please correct me if I’m wrong.
Have you considered using the same idea to find duplicate images?
Igor — with RAW images, Lightroom won’t touch the image file and will store edits either in the catalog database or in a separate XMP sidecar file, so the MD5 hash won’t change.
However, for formats like TIFF, JPEG, and DNG, Lightroom may write information into the file depending on the catalog settings, causing the hash to change. See the section on writing XMP to metadata above to prevent this.
Regarding duplicates, I think this would be useful but hashes like md5 are probably not the right technique. Ideally I would like a duplicate finding tool to identify a jpeg or tiff derivative as being identical to the original RAW. However a jpeg version of a file will have a completely different hash from the RAW.
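For byte-identical copies, on the other hand, a file hash works well. The Python sketch below (hypothetical helpers, not part of Validator) groups files by MD5; as noted, it will not match a JPEG or TIFF derivative to its RAW, or two copies that differ only in embedded metadata. If you want certainty for the candidates it finds, you can compare them byte by byte, for example with `filecmp.cmp(a, b, shallow=False)`.

```python
import hashlib
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(paths):
    """Group paths whose contents are byte-for-byte identical (same MD5)."""
    groups = defaultdict(list)
    for p in paths:
        groups[md5_of_file(p)].append(p)
    return [group for group in groups.values() if len(group) > 1]
```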
Eyal Oren says
Just checking that the plugin is still working with LR 5.6 as the notes above reference LR 4 and there are no release notes in the zip file.
Yes, Validator works in LR 5.6. I need to update the documentation.
Hi Stephen, I just learned about your plug-in. I have tested it on a small batch of RAW photos using LR 5.7 and it looks very promising.
Do you have any plans to bring the tool forward to a 1.0 release and update documentation?
Thanks for your efforts on this and best regards,
Peter — I expect to continue developing Validator although I do not have any major improvements planned at this point. I expect/hope that Lightroom will get PSB support (for large files) and I will update the list of file types when that happens.
Are there particular features you would like to see, or topics you would like covered in the documentation?
Hi Stephen, thanks for the response. My only point was that I have always assumed a beta release is subject to updates and changes of direction, whereas a 1.0 release indicates a more stable and tested environment.
At this point I am happy with what you have been able to accomplish. My plans after the beginning of the year are to install Validator on my main catalog and use it only on RAW files so I can continue to write updates to XMP [captions, settings, keywords, etc.]. Am I correct that Lightroom has plug-in specific fields within the catalog structure and that is where you populate the hash values? Where do the hash values get stored if writing XMP to sidecar? In the sidecar or catalog?
Best regards and Merry Christmas
I released Validator as a beta mainly to allow time for bugs and UI issues to surface from users. I don’t plan to add major functionality unless I get significant user requests.
Lightroom stores fields from plug-ins inside the catalog as part of the underlying database. However, Lightroom will not write the plug-in data into the XMP sidecar files (this was a surprise to me). So the only way to export the hash values with the images is to use the “Export as catalog” function. I don’t know if Adobe plans to have the custom metadata written into the XMP in future versions.
Hi Stephen, thanks for developing this plugin; it’s great.
I’ve come across an error with one of my video files that’s around 2 GB in size.
“An internal error has occurred: Attempted to seek past end of file”
Could this error be because the file is so large? The next largest file I’ve tried the plugin on was about 435 MB, and it worked fine.
If there’s any other information or logs you’d like me to send you to help find the problem let me know.
Thanks for the bug report. I can confirm there’s an issue with very large file sizes and I’ll be working on an update.
I’ve released an updated version of Validator that fixes the bug with large files and adds PNG support. It’s available on the download page.
Rick Warburton says
How does your plugin differ from Library > Validate DNG Files, other than working with more file types?
Rick — as you note, the main benefit of the Validator plugin is that it works on additional file types including RAWs from Canon, Nikon, and other manufacturers. If you exclusively use DNG then you can use the built-in DNG validation.
I have a feature request.
Would it be possible to add the hash value of a RAW/DNG photo to the EXIF data of any exported versions (JPEG, TIFF, etc.) of the same photo, for the purpose of validating ownership of the photo?
Yes, EXIF data can be changed or erased, but if a person using your photo erased it, they could not confirm the hash value on request, nor could they generate the correct hash without the original RAW/DNG.
Just an idea based on your plugin.
Also, is your plugin compatible with LR6 / CC?
The plugin should work with LR6/CC — I’ve been using it since LR6 release and have not heard any bug reports from users.
Regarding saving the hash in EXIF, I don’t think I understand your use case. Why would you ask a person using your photo for the hash value? I think if you want to prove your ownership, you could use the original RAW image. That said, I would like to be able to save the hash with the metadata of an image for archiving purposes. Right now, Adobe Lightroom does not export metadata from plugins (the hash is only stored in the database). I’ll need to look into how to do this with the SDK.
Hello, would it be possible to add an “ignore XMP” option? Many LR users use the “Automatically write changes into XMP” feature as a backup to the catalog file, and turning that off is kind of a deal breaker…
With Validator you can leave “Automatically write changes into XMP” on, but be aware that you might get hash mismatches caused by the metadata being modified rather than the image data itself. For example, if you add a keyword to a TIFF file with XMP writing on, you change the original file, so Validator will flag it even though the image data is the same. If this is the behaviour you want, then all is good. However, I personally don’t like it, because I frequently add keywords to old images and I only want to flag actual image corruption.
For RAW files, though, LR writes the XMP to a sidecar file, so the hash shouldn’t change. In other words, for RAW files the write-XMP setting makes no difference.
Hi. I realize this is an old post, but I am hoping you can help me. I am paring down 30,000 photos by removing duplicates, unwanted images, etc. Yesterday I tried to email a few images through Lightroom and received an error message that one was corrupt. Although the thumbnail looks great, the lower third of the actual image is completely black. Are you aware of a plug-in or any other method for doing a mass search for files that are already corrupt? I don’t think running Validator would accomplish my goal, because it would generate hashes based on the images’ current, corrupt state. Thanks.
You’re correct that Validator doesn’t attempt to verify the structure of the image file and can’t detect files that are already corrupt. Since Lightroom produced an error message, could you try a bulk export of all your images and see whether any more errors are reported? If your images are DNG, I believe you can use the built-in validation tools, which check for more than just hash changes.
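For JPEG files specifically, one cheap structural check outside Lightroom targets a common kind of damage: truncation. A JPEG begins with the start-of-image marker (bytes FF D8) and ends with the end-of-image marker (FF D9), and a truncated file usually lacks the end marker. The Python sketch below is a rough heuristic, not a real validator: the 64-byte tail window is an arbitrary assumption, some valid files carry extra data after the end marker (and would be flagged), and damage inside a complete file will not be caught. The function names are hypothetical.

```python
import os
from pathlib import Path


def jpeg_looks_truncated(path, tail_bytes=64):
    """Heuristic: True if the file does not start like a JPEG or lacks an end marker near its end."""
    with open(path, "rb") as f:
        head = f.read(2)
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail_bytes))
        tail = f.read()
    if head != b"\xff\xd8":  # SOI (start of image) marker missing
        return True
    return b"\xff\xd9" not in tail  # EOI (end of image) marker not found near the end


def suspicious_jpegs(folder):
    """List JPEGs under `folder` (recursively) that fail the heuristic."""
    return [str(p) for p in sorted(Path(folder).rglob("*"))
            if p.suffix.lower() in {".jpg", ".jpeg"} and jpeg_looks_truncated(p)]
```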
Wayne Mor says
I have been collating my entire family’s images and have about 100K images with numerous duplicates.
I read your comment above from some years ago:
“Regarding duplicates, I think this would be useful but hashes like md5 are probably not the right technique. Ideally I would like a duplicate finding tool to identify a jpeg or tiff derivative as being identical to the original RAW. However a jpeg version of a file will have a completely different hash from the RAW.”
Do you know of a LR tool which can achieve this outcome?
Most of the duplicate finders I have reviewed work on metadata rather than image content. This seems to have two outcomes: different images are detected as duplicates, or, as you note, a small change to or clearing of metadata results in duplicates going undetected.
Wayne — I don’t know of any tool / plugin that will find duplicates/similars based on image appearance in Lightroom. In fact, I suspect this can’t really exist because the Lightroom SDK doesn’t have any functions for accessing the image data itself.
At one point I did play around with OpenCV (Open Source Computer Vision Library) in python and it’s not hard to come up with some basic methods for finding slightly different images (e.g. resized, slight crop, slight tone & color variations) but I never made anything formal.
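For readers curious what such a basic method can look like, here is one well-known option, the "average hash", sketched in plain Python. It is not the OpenCV experiment mentioned above. It works on an already-decoded grayscale image given as a list of rows; decoding and grayscale conversion (normally done with a library such as OpenCV or Pillow) are left out, and the requirement that the image dimensions be multiples of 8 is a simplification of this sketch. Images whose hashes differ in only a few of the 64 bits are likely near-duplicates; the threshold is a judgment call.

```python
def average_hash(pixels, size=8):
    """Return a size*size-bit perceptual hash of a grayscale image.

    `pixels` is a list of equal-length rows of brightness values; its height
    and width must be multiples of `size` to keep this sketch simple.
    """
    height, width = len(pixels), len(pixels[0])
    if height % size or width % size:
        raise ValueError("image dimensions must be multiples of size")
    bh, bw = height // size, width // size

    # Downsample: average each bh x bw block into one value.
    small = []
    for i in range(size):
        for j in range(size):
            block = [pixels[r][c]
                     for r in range(i * bh, (i + 1) * bh)
                     for c in range(j * bw, (j + 1) * bw)]
            small.append(sum(block) / len(block))

    # One bit per block: 1 if brighter than the overall mean.
    mean = sum(small) / len(small)
    bits = 0
    for value in small:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

An image and a copy enlarged 2x by duplicating each pixel produce identical hashes, and a uniform brightness shift leaves the hash unchanged, whereas an MD5 of the two files would differ completely.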
If you are willing to work outside of Lightroom, you may want to try something like https://sourceforge.net/projects/imgseek/ or https://tn123.org/simimages/ . I’ve not used these programs personally.
David Kotz says
This looks great – indeed, exactly what I have been looking for. Of course I discovered it only after writing my own tool, which has very similar behavior but runs strictly as a command-line bash script. (On the other hand, that means I can use it for anything, not just LR files.)
Do you plan to update this script for the latest Lightroom CC Classic?
It should work on the latest version of LR CC Classic.