Image "Similarity". Is this a metric in the database?

hockeyrink:
Results are promising. I've tested the Hamming distance between 3 images: the original I want to find a larger version of, the LARGEST file with the same name (which is the wrong image), and a larger file with the same name that is correct (but smaller than the LARGEST). I've got some PHP code here that illustrates the difference.

I've trimmed the "0x" prefix and the extra zeros off the METRIC1 and METRIC2 fields from the ThumbsPlus database for these test images:

Original website image that I'm trying to find the largest local version of:
https://imgur.com/ieCuE3b

Largest local file I have with same filename (which is...very wrong):
https://imgur.com/DsVwOzy

Large(r) local file I have that is confirmed correct:
https://imgur.com/uMRCcMb

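Hamming distance counts the positions at which two equal-length strings differ, that is, how many changes have to be made to one to match the other. Here it treats the metric generated for each thumbnail as a string of hex digits or of bytes. Fewer changes = closer match. I've tested both metrics as HEX (comparing the hex digits) and as BINARY (comparing the bytes after hex2bin):

Metric 1 checked as HEX: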

Test image vs confirmed GOOD: 17
Test image vs confirmed BAD (same name): 51

Metric 1 checked as BIN:

Test image vs confirmed GOOD: 14
Test image vs confirmed BAD (same name): 29

Metric 2 checked as HEX:

Test image vs confirmed GOOD: 35
Test image vs confirmed BAD (same name): 111

Metric 2 checked as BIN:

Test image vs confirmed GOOD: 18
Test image vs confirmed BAD (same name): 58

Both the BIN and HEX results seem to indicate that METRIC2 offers better detection in this case: the gap between the GOOD and BAD counts is wider (35 vs 111 and 18 vs 58, against 17 vs 51 and 14 vs 29 for METRIC1). At least this gives me a metric to safely say, programmatically, "THIS image is NOT like that image...at all!".
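One caveat on the numbers above: the HEX check counts differing hex digits and the BIN check counts differing bytes, so neither is a Hamming distance over individual bits. A minimal sketch of a bit-level version follows, assuming the metrics are equal-length hex strings with the "0x" prefix already trimmed. The function name bitHammingDistance is made up for illustration and is not part of ThumbsPlus.

```php
<?php
// Bit-level Hamming distance between two equal-length hex strings.
// Assumption: the "0x" prefix has already been trimmed, as in the post.
// bitHammingDistance is a hypothetical helper name.
function bitHammingDistance(string $hexA, string $hexB): int
{
    $a = hex2bin($hexA);
    $b = hex2bin($hexB);
    if ($a === false || $b === false) {
        throw new InvalidArgumentException('Input is not valid hex');
    }
    if (strlen($a) !== strlen($b)) {
        throw new InvalidArgumentException('Metrics must be the same length');
    }
    $distance = 0;
    for ($i = 0, $n = strlen($a); $i < $n; $i++) {
        // XOR leaves a 1 bit wherever the two bytes disagree.
        $diff = ord($a[$i]) ^ ord($b[$i]);
        // Count the set bits in that byte.
        while ($diff !== 0) {
            $distance += $diff & 1;
            $diff >>= 1;
        }
    }
    return $distance;
}

// Metric 1 from the post: website source vs. confirmed GOOD local file.
$test = "FFFFE701C301C10199019917BD7FBD7FB97F397FB97FB37F837FC750FF00FFFF";
$good = "FFFFE701E301C101993B997FBDFF3DFF39FF39FF39FFB3FF83FFC77EFF00FFFF";
echo bitHammingDistance($test, $good), "\n"; // prints 22
```

For that pair, 22 bits differ, against 17 differing hex digits and 14 differing bytes. Whether the bit-level count separates GOOD from BAD any better than the two checks above would need testing on more images.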

Here's the code, borrowed and modified from Nitin Mittal (for metric 1):

--- Code: ---<?php
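// PHP program to find the Hamming distance
// between two strings of equal length

// Count the positions at which the two strings differ.
// The comparison is per character: per hex digit for a hex
// string, per byte for the output of hex2bin().
function hammingDist($str1, $str2)
{
    $len = strlen($str1);
    if ($len !== strlen($str2)) {
        throw new InvalidArgumentException('Strings must be the same length');
    }
    $count = 0;
    for ($i = 0; $i < $len; $i++) {
        if ($str1[$i] !== $str2[$i]) {
            $count++;
        }
    }
    return $count;
}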

    // Driver code; these metrics are for img_2220.jpg
    // str1 = website source
    // str2 = largest local source that is similar
    // str3 = simply the largest source file (which is wrong)
    $str1 = "FFFFE701C301C10199019917BD7FBD7FB97F397FB97FB37F837FC750FF00FFFF";
    $str2 = "FFFFE701E301C101993B997FBDFF3DFF39FF39FF39FFB3FF83FFC77EFF00FFFF";
    $str3 = "0003000F601740035FF7581F581F58175817F81F781F781F7C17FFFFFFFFFFFF";

    $str1b = hex2bin($str1);
    $str2b = hex2bin($str2);
    $str3b = hex2bin($str3);


    // compare the metrics hex digit by hex digit
    echo nl2br ("Metric 1 checked as HEX: \n");
    echo nl2br ("\nTest image vs confirmed GOOD: " . hammingDist ($str1, $str2));
    echo nl2br ("\nTest image vs confirmed BAD (same name): " . hammingDist ($str1, $str3));
    // compare the metrics byte by byte
    echo nl2br ("\n\nMetric 1 checked as BIN: \n");
    echo nl2br ("\nTest image vs confirmed GOOD: " . hammingDist ($str1b, $str2b));
    echo nl2br ("\nTest image vs confirmed BAD (same name): " . hammingDist ($str1b, $str3b));


// This code is contributed by nitin mittal.
?>
--- End code ---
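To go from a pair of distances to a programmatic "this one, not that one", the same character-by-character count could be run over every same-name candidate, keeping the closest. A rough sketch follows. The function name closestMetric, the candidate labels, and the cutoff of 32 are all assumptions made up for illustration, and a usable cutoff would have to come from testing more images.

```php
<?php
// Return the candidate whose metric differs from the target in the fewest
// character positions, or null if none is within $maxDistance.
// $candidates maps a label (hypothetical, e.g. a file path) to a hex metric.
function closestMetric(string $target, array $candidates, int $maxDistance): ?array
{
    $best = null;
    foreach ($candidates as $label => $metric) {
        if (strlen($metric) !== strlen($target)) {
            continue; // cannot be compared position by position
        }
        $distance = 0;
        for ($i = 0, $n = strlen($target); $i < $n; $i++) {
            if ($target[$i] !== $metric[$i]) {
                $distance++;
            }
        }
        if ($distance <= $maxDistance
            && ($best === null || $distance < $best['distance'])) {
            $best = ['label' => $label, 'distance' => $distance];
        }
    }
    return $best;
}

// Metric 1 values from the post; 32 is an arbitrary illustrative cutoff.
$result = closestMetric(
    "FFFFE701C301C10199019917BD7FBD7FB97F397FB97FB37F837FC750FF00FFFF",
    [
        'good' => "FFFFE701E301C101993B997FBDFF3DFF39FF39FF39FFB3FF83FFC77EFF00FFFF",
        'bad'  => "0003000F601740035FF7581F581F58175817F81F781F781F7C17FFFFFFFFFFFF",
    ],
    32
);
echo $result['label'], " ", $result['distance'], "\n"; // prints "good 17"
```

Returning null when nothing falls under the cutoff keeps the "NOT like that image" case explicit, so a wrong file is not picked just because it has the matching name.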
