Author Topic: Image "Similarity". Is this a metric in the database?  (Read 51 times)

hockeyrink

  • Member
  • **
  • Posts: 8
    • View Profile
I have a website, and many of the images are low-res. My hi-res imagery on the server has some issues where the website product image ("img1234.jpg") should have been "img1234-SKU.jpg", but ISN'T.

So when I did a search for the largest version of "img1234.jpg" from my webserver, it often got it wrong, finding a different version of "img1234.jpg".

I'd like to know if the image similarity feature could work this out for me. Like:
  • FIND the website's image name & similarity metric
  • COMPARE it to all other images of the same name, then
  • SORT by metric, THEN by filesize.

Is this possible in a TB database, or am I gonna have to build an ugly Bash script using ImageMagick?  :-[

Daan van Rooijen

  • Global Moderator
  • Sr. Member
  • *****
  • Posts: 832
    • View Profile
Re: Image "Similarity". Is this a metric in the database?
« Reply #1 on: 2019-08-14 22:51:08 »
When you open your database in Access, MDB Viewer or some other tool, you'll find fields named 'metric1' and 'metric2' in the Thumbnail table. But how those are used in similarity comparisons is probably only for Cerious to know.

As a user of the program, you could simply:

- Download your website's files
- Thumbnail all images
- Press Ctrl-F on any image that you suspect has differently-sized duplicates, and use the Image Similarity tab to locate them.

I'm volunteering as a moderator - I do not work for Cerious Software, Inc.

hockeyrink

« Reply #2 on: 2019-08-14 23:08:55 »
Gotcha. That's a reasonable starting point. Thanks for the pointer!

Daan van Rooijen

« Reply #3 on: 2019-08-14 23:58:40 »
Good, I hope it will help you fix the problem!

Of course, you could also use the "Edit | Find Similar" function (with a threshold setting of 5 or so), to find all different sets of similar images at one time. In the results list, you could Tag (press INS) all images that should be renamed to reflect their higher resolution. This would create a Tagged Images gallery that contains all images that need renaming. Maybe that's a faster method.

hockeyrink

« Reply #4 on: 2019-08-15 12:57:52 »
Hmm... Might be a plan if I can't automate this. I have literally 2500 images to review to make sure I've got the largest file of each.

Did some reading up on image similarities (pHash), and those "metric1" and "metric2" fields may be the key for me. Each seems to be a 512-bit field, which could be the result of a 64x64 image analysis (color & contrast maybe?). Then you are supposed to do something called a "Hamming distance" analysis, which is essentially "how many changes have to be made to FOO to match BAR?". The closer the similarity, the lower the Hamming distance.
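
For anyone following along, here is a minimal sketch of a bit-level Hamming distance in PHP. It assumes the two values are available as equal-length hex strings. The helper name bitHammingDistance() is made up for illustration, and nothing here is confirmed to be how ThumbsPlus itself compares images.

```php
<?php
// Sketch only: bit-level Hamming distance between two equal-length hex strings.
// bitHammingDistance() is a hypothetical helper, not a ThumbsPlus function.
function bitHammingDistance(string $hexA, string $hexB): int
{
    if (strlen($hexA) !== strlen($hexB) || strlen($hexA) % 2 !== 0
        || !ctype_xdigit($hexA) || !ctype_xdigit($hexB)) {
        throw new InvalidArgumentException('Expected two hex strings of equal, even length.');
    }

    // XOR on two PHP strings works byte by byte; each set bit marks a differing bit.
    $xor = hex2bin($hexA) ^ hex2bin($hexB);

    $distance = 0;
    foreach (str_split($xor) as $byte) {
        // Count the 1-bits in this XORed byte.
        $distance += substr_count(decbin(ord($byte)), '1');
    }
    return $distance;
}

echo bitHammingDistance('FF00', '0F00'), "\n"; // prints 4
```

XOR-ing first and then counting set bits means every differing bit adds exactly one to the distance, regardless of how the value is written out.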

I'll update the forum on how the process goes...

hockeyrink

« Reply #5 on: 2019-08-19 13:47:49 »
Results are promising. I've tested the Hamming distance between 3 images: the original (the one I want to find a larger version of), the LARGEST file with the same name (which is wrong), and a larger file with the same name that is correct (but smaller than the LARGEST). I've got some PHP code here that illustrates the difference.

I've trimmed the "0x" prefix and the extra zeros off the METRIC1 and METRIC2 fields from the ThumbsPlus database for these test images:

Original website image that I'm trying to find the largest local version of:
https://imgur.com/ieCuE3b

Largest local file I have with same filename (which is...very wrong):
https://imgur.com/DsVwOzy

Large(r) local file I have that is confirmed correct:
https://imgur.com/uMRCcMb
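
The trimming step could be scripted along these lines. This is only a sketch under two assumptions: that the "extra zeros" are padding at the end of the value, and that the useful part is 64 hex digits long (the length of the trimmed strings used for metric 1 in this thread). normalizeMetric() is a made-up helper name.

```php
<?php
// Sketch only: normalise a metric value exported from the database.
// normalizeMetric(), the 64-digit default and the "zeros are trailing padding"
// rule are assumptions for illustration, not documented ThumbsPlus behaviour.
function normalizeMetric(string $raw, int $hexDigits = 64): string
{
    $hex = strtoupper(trim($raw));

    // Drop a leading "0x" if the export tool added one.
    if (strncmp($hex, '0X', 2) === 0) {
        $hex = substr($hex, 2);
    }
    if (!ctype_xdigit($hex) || strlen($hex) < $hexDigits) {
        throw new InvalidArgumentException('Unexpected metric value: ' . $raw);
    }

    // Only discard the tail if it really is zero padding.
    $tail = substr($hex, $hexDigits);
    if (trim($tail, '0') !== '') {
        throw new InvalidArgumentException('Non-zero data beyond the expected length.');
    }
    return substr($hex, 0, $hexDigits);
}

echo normalizeMetric('0xABCD0000', 4), "\n"; // prints ABCD
```

Refusing to cut when the tail holds anything other than zeros guards against silently throwing away real data if the padding assumption turns out to be wrong.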

Hamming distance compares the generated thumbnail metrics as binary strings to see how many changes have to be made to one to match the other. Fewer changes = closer match. I've tested these both as HEX strings and as BINARY (hex2bin) strings:

Metric 1 checked as HEX:

Test image vs confirmed GOOD: 17
Test image vs confirmed BAD (same name): 51

Metric 1 checked as BIN:

Test image vs confirmed GOOD: 14
Test image vs confirmed BAD (same name): 29

Metric 2 checked as HEX:

Test image vs confirmed GOOD: 35
Test image vs confirmed BAD (same name): 111

Metric 2 checked as BIN:

Test image vs confirmed GOOD: 18
Test image vs confirmed BAD (same name): 58

Both BIN and HEX results seem to indicate METRIC2 offers a better detection in this case. At least this will give me a metric to safely programmatically say "THIS image is NOT like that image...at all!".

Here's the code, borrowed and modified from Nitin Mittal (for metric 1):
Code:
<?php
// PHP program to find the Hamming distance between
// two strings

// Function to calculate the Hamming distance.
// It counts the positions at which the two strings differ, one character
// at a time: hex input is compared per hex digit, and hex2bin() output
// is compared per byte (not per bit).
function hammingDist($str1, $str2)
{
    $i = 0;
    $count = 0;
    while (isset($str1[$i]))
    {
        if ($str1[$i] != $str2[$i])
            $count++;
        $i++;
    }
    return $count;
}

// Driver code; this is for img_2220.jpg
// str1 = website source
// str2 = largest local source that is similar
// str3 = simply largest source file (which is wrong)
$str1 = "FFFFE701C301C10199019917BD7FBD7FB97F397FB97FB37F837FC750FF00FFFF";
$str2 = "FFFFE701E301C101993B997FBDFF3DFF39FF39FF39FFB3FF83FFC77EFF00FFFF";
$str3 = "0003000F601740035FF7581F581F58175817F81F781F781F7C17FFFFFFFFFFFF";

// Raw byte strings for the "BIN" comparison
$str1b = hex2bin($str1);
$str2b = hex2bin($str2);
$str3b = hex2bin($str3);

// function call
echo nl2br("Metric 1 checked as HEX: \n");
echo nl2br("\nTest image vs confirmed GOOD: " . hammingDist($str1, $str2));
echo nl2br("\nTest image vs confirmed BAD (same name): " . hammingDist($str1, $str3));

// function call
echo nl2br("\n\nMetric 1 checked as BIN: \n");
echo nl2br("\nTest image vs confirmed GOOD: " . hammingDist($str1b, $str2b));
echo nl2br("\nTest image vs confirmed BAD (same name): " . hammingDist($str1b, $str3b));

// This code is contributed by nitin mittal.
?>
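
With a distance per candidate in hand, the "SORT by metric, THEN by filesize" step from the first post could look roughly like the sketch below. The rankCandidates() helper, the array keys ('path', 'distance', 'bytes') and the sample rows are all invented for illustration. Real values would have to come from the database and the file system.

```php
<?php
// Sketch only: order candidates by similarity first, then by file size.
// rankCandidates(), the array keys and the sample rows are all hypothetical.
function rankCandidates(array $candidates): array
{
    usort($candidates, function (array $a, array $b): int {
        // Lower distance (closer match) sorts first; on a tie, the larger file wins.
        return [$a['distance'], $b['bytes']] <=> [$b['distance'], $a['bytes']];
    });
    return $candidates;
}

$ranked = rankCandidates([
    ['path' => 'old/img1234.jpg',   'distance' => 40, 'bytes' => 900000],
    ['path' => 'web/img1234.jpg',   'distance' => 3,  'bytes' => 400000],
    ['path' => 'print/img1234.jpg', 'distance' => 3,  'bytes' => 650000],
]);

echo $ranked[0]['path'], "\n"; // prints print/img1234.jpg
```

In practice a distance cutoff would probably be wanted before the sort, so that a file like the "confirmed BAD" one never gets picked just because it is the biggest.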