Messages - hockeyrink

Pages: [1]
1
Results are promising. I've tested the Hamming distance between 3 images: the original that I want to find a larger version of, the LARGEST (wrong) file with the same name, and a larger file with the same name that is correct (but smaller than the LARGEST). I've got some PHP code here that illustrates the difference.

I've trimmed the "0x" prefix and extra zeros off the METRIC1 and METRIC2 fields from the ThumbsPlus database for these test images:

Original website image that I'm trying to find the largest local version of:
https://imgur.com/ieCuE3b

Largest local file I have with same filename (which is...very wrong):
https://imgur.com/DsVwOzy

Large(r) local file I have that is confirmed correct:
https://imgur.com/uMRCcMb

Hamming distance compares the generated thumbnail metrics as binary data to see how many changes have to be made to one to match the other. Fewer changes = closer match. I've tested these both as HEX and BINARY numbers:

Metric 1 checked as HEX:

Test image vs confirmed GOOD: 17
Test image vs confirmed BAD (same name): 51

Metric 1 checked as BIN:

Test image vs confirmed GOOD: 14
Test image vs confirmed BAD (same name): 29

Metric 2 checked as HEX:

Test image vs confirmed GOOD: 35
Test image vs confirmed BAD (same name): 111

Metric 2 checked as BIN:

Test image vs confirmed GOOD: 18
Test image vs confirmed BAD (same name): 58

Both the BIN and HEX results seem to indicate METRIC2 offers better detection in this case. At least this will give me a metric to safely, programmatically say "THIS image is NOT like that image...at all!".

Here's the code, borrowed and modified from Nitin Mittal (for metric 1):
Code:
<?php
// PHP program to find the Hamming distance between two strings

// Function to calculate the Hamming distance:
// counts the positions at which two equal-length strings differ.
function hammingDist($str1, $str2)
{
    if (strlen($str1) !== strlen($str2)) {
        throw new InvalidArgumentException("Strings must be the same length");
    }
    $count = 0;
    for ($i = 0; $i < strlen($str1); $i++) {
        if ($str1[$i] !== $str2[$i]) {
            $count++;
        }
    }
    return $count;
}

// Driver code: this is for img_2220.jpg
// str1 = website source
// str2 = largest local source that is similar
// str3 = simply largest source file (which is wrong)
$str1 = "FFFFE701C301C10199019917BD7FBD7FB97F397FB97FB37F837FC750FF00FFFF";
$str2 = "FFFFE701E301C101993B997FBDFF3DFF39FF39FF39FFB3FF83FFC77EFF00FFFF";
$str3 = "0003000F601740035FF7581F581F58175817F81F781F781F7C17FFFFFFFFFFFF";

// hex2bin() returns raw byte strings, so these are compared byte by byte
$str1b = hex2bin($str1);
$str2b = hex2bin($str2);
$str3b = hex2bin($str3);

// Function calls: hex strings, compared character by character
echo nl2br("Metric 1 checked as HEX: \n");
echo nl2br("\nTest image vs confirmed GOOD: " . hammingDist($str1, $str2));
echo nl2br("\nTest image vs confirmed BAD (same name): " . hammingDist($str1, $str3));

// Function calls: binary strings
echo nl2br("\n\nMetric 1 checked as BIN: \n");
echo nl2br("\nTest image vs confirmed GOOD: " . hammingDist($str1b, $str2b));
echo nl2br("\nTest image vs confirmed BAD (same name): " . hammingDist($str1b, $str3b));

// This code is contributed by nitin mittal.
?>
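
One caveat on the figures above: hex2bin() returns a raw byte string, so the "BIN" comparison counts differing bytes and the "HEX" comparison counts differing hex characters. Neither is a count of differing bits. Below is a minimal sketch of a bit-level version, in case that turns out to be a more useful measure. bitHammingDist() is a made-up name, and nothing here is confirmed to be how ThumbsPlus itself compares its METRIC values.

```php
<?php
// Hypothetical helper (not part of ThumbsPlus or the borrowed code):
// bit-level Hamming distance between two equal-length hex strings.
function bitHammingDist(string $hexA, string $hexB): int
{
    if (strlen($hexA) !== strlen($hexB) || strlen($hexA) % 2 !== 0) {
        throw new InvalidArgumentException("Hex strings must have the same, even length");
    }
    if (!ctype_xdigit($hexA) || !ctype_xdigit($hexB)) {
        throw new InvalidArgumentException("Input is not valid hex");
    }
    // XOR of two PHP strings works byte by byte; each 1 bit marks a difference.
    $xor = hex2bin($hexA) ^ hex2bin($hexB);
    $count = 0;
    for ($i = 0; $i < strlen($xor); $i++) {
        // decbin() renders the byte as a string of 0s and 1s; count the 1s.
        $count += substr_count(decbin(ord($xor[$i])), "1");
    }
    return $count;
}

// METRIC1 values quoted in the post above.
$web  = "FFFFE701C301C10199019917BD7FBD7FB97F397FB97FB37F837FC750FF00FFFF";
$good = "FFFFE701E301C101993B997FBDFF3DFF39FF39FF39FFB3FF83FFC77EFF00FFFF";
$bad  = "0003000F601740035FF7581F581F58175817F81F781F781F7C17FFFFFFFFFFFF";

echo "Test image vs confirmed GOOD (bits): ", bitHammingDist($web, $good), "\n"; // 22
echo "Test image vs confirmed BAD (bits): ", bitHammingDist($web, $bad), "\n";
```

Because these strings are 64 hex characters, the largest possible bit-level distance here is 256, which gives a fixed scale for judging how far apart two images are.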

2
I've successfully connected TB10 to my SQL Server 2017, and it creates the data, but how can I access the data outside of TB10?

I'm doing some website-to-image reconciliations, and having the TB database available natively via a system-level ODBC connector is great (having imported copies of it previously), but having access to the latest "LIVE" data would be even better. I created the table using the "Thumbs9_mssql.sql" script and successfully connected TB10 to it.

I can see the database show up in my MS SQL Server Management Studio, and I can expand it to see the table names, but I CANNOT expand any table to show data. All queries come back as "successful", but with zero rows! I've tried other ODBC clients too, with no success. The "Properties / Permissions" tab doesn't show anything weird, and the permissions are the same as on the other 2 databases I have running on the server at the same time (one being the imported TB database, the other the webserver database). The only thing I can think of is that I must be missing some sort of username permission?

The only sure-fire way to access the data is through TB10 application. Why can't I connect to this dataset via ODBC with any other client?

edit: I did find mention of the alias for dbo called "ThumbsUser" in the guide7.pdf. No luck. Plus, I couldn't find a reference to that in the setup SQL script. Checked the "Security\Users\dbo" properties, and they match the properties I use in the other 2 databases.

edit2: Found that I *could* open the "ThumbsPlusDatabase" table... which reports exactly 0 thumbnail _files. BUH?!? TB10 says (under "Thumbnail Database Statistics") there are over 141,585 thumbnail records. Somehow, these seem to be looking at two different databases? Using the same 32-bit system ODBC connector settings? Weird. The digging continues...

edit3: FOUND IT. I must have done something wrong during configuration, as it dumped all the TB10 tables under the MASTER database. Grrr. Ok, now how to fix it without having to regen 3 hours of indexing.  >:(


3
Hmm... Might be a plan if I can't automate this. I have literally 2500 images to review to make sure I've got the largest file for each.

Did some reading up on image similarities (pHash), and those "metric1" and "metric2" fields may be the key for me. Each seems to be a 512-bit field, which could be the result of a 64x64 image analysis (color & contrast maybe?). Then you are supposed to do something called a "Hamming distance" analysis, which is essentially "how many changes have to be made to FOO to match BAR?". The closer the similarity, the lower the Hamming distance.

I'll update the forum on how the process goes...

4
Gotcha. That's a reasonable starting point. Thanks for the pointer!

5
I have a website, and many of the images are low-res. My hi-res imagery on the server has some issues where the website product image ("img1234.jpg") should have been "img1234-SKU.jpg", but ISN'T.

So when I did a search for the largest version of "img1234.jpg" from my webserver, it often got it wrong, finding a different version of "img1234.jpg".

I'd like to know if the image similarity feature could work this out for me. Like:
  • FIND the website's image name & similarity metric
  • COMPARE it to all other images of the same name, then
  • SORT by metric, THEN by filesize.

Is this possible in a TB database, or am I gonna have to build an ugly BASH script using ImageMagick?  :-[
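
Outside of TB, the three steps above could be sketched roughly like this in PHP, assuming the similarity metric for each file has already been pulled out of the database as a hex string. rankCandidates(), the array keys, the paths, the sizes and the shortened metric values are all made up for illustration; this is not a TB feature.

```php
<?php
// Character-by-character Hamming distance between equal-length strings.
function hammingDist(string $a, string $b): int
{
    if (strlen($a) !== strlen($b)) {
        throw new InvalidArgumentException("Strings must be the same length");
    }
    $count = 0;
    for ($i = 0; $i < strlen($a); $i++) {
        if ($a[$i] !== $b[$i]) {
            $count++;
        }
    }
    return $count;
}

// Hypothetical helper: order candidates by distance to the target metric
// (closest first), breaking ties by file size (largest first).
function rankCandidates(string $targetMetric, array $candidates): array
{
    foreach ($candidates as &$c) {
        $c["distance"] = hammingDist($targetMetric, $c["metric"]);
    }
    unset($c); // drop the reference left over from the loop
    usort($candidates, function (array $x, array $y): int {
        // Distance ascending, then size descending (note the swapped sizes).
        return [$x["distance"], $y["size"]] <=> [$y["distance"], $x["size"]];
    });
    return $candidates;
}

// Made-up data: the website image's metric, and three local files
// that all share the name img1234.jpg.
$target = "FFE701C3";
$candidates = [
    ["path" => "a/img1234.jpg", "metric" => "0003000F", "size" => 900000],
    ["path" => "b/img1234.jpg", "metric" => "FFE701E3", "size" => 400000],
    ["path" => "c/img1234.jpg", "metric" => "FFE701E3", "size" => 650000],
];

foreach (rankCandidates($target, $candidates) as $c) {
    echo $c["path"], " distance=", $c["distance"], " size=", $c["size"], "\n";
}
```

With this ordering the biggest file only wins among equally close matches, so a large but unrelated image with the same name ends up at the bottom of the list instead of the top.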

6
I LOVE the SQL query feature in TP10. My question is how can I edit the SQL query saved in the "Found Files" gallery so I can manipulate the string / source directory? I think I must be missing something obvious.

Perhaps this is more of a feature request. I'd like to be able to set my "Restrict search to:" selection to a manually entered location.

Now that I've looked for 15 minutes, typed in my question to the forum, I'm SURE to figure this out 30 seconds after posting my question... :o
(Yeah. Kludge fix is to just copy the SQL from the old saved "Found Files" to a new one)

7
Default action = "View Image"

Ah... here we go. Loaded up the old V7, and saw the "Equivalent to Type:" was set to Photoshop. Resetting it back to "(none)" / Image / Raster fixed it.

One last comment - is it my imagination that the database loads slower in V8 than V7? I mean, the thumbnails actually take a second to display in V8, whereas in V7 it's almost instant. Is there some optimization that I've yet to apply to the new database that causes this new lag?

Thanks,
Dave

8
Making the jump from V7 to V8, and for some stupid (user) reason, I can't double-click or ctrl-enter a thumbnail to have it display in large view. I'm searching through settings, but can't seem to locate where/what I did wrong to disable this functionality.

Please, can somebody point a finger to the right menu for me?

Thx,
Dave
