Provide which wiki an image suggestion is found on
Closed, Resolved · Public · 3 Estimated Story Points

Assigned To

Authored By

	• sdkim
	Feb 25 2021, 8:57 PM

Description

Context

Android wants the ability to provide the reason for an image suggestion. For example, see the mockup at the source below.

source: https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/Android/Add_an_image_MVP

As the lower part of the image suggestion describes:

Suggestion reason: Used in the same article on another language Wikipedia: German

The algorithm provides this in the raw data from the note column. There have been requests to change things like jawiki to Japanese Wikipedia.
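
The jawiki-to-"Japanese Wikipedia" request could be handled with a small lookup on the consuming side. The following is only a minimal sketch: `WIKI_NAMES`, `describe_wiki`, and `suggestion_reason` are hypothetical names, and the three mapping entries are illustrative, not something the API provides.

```python
from typing import Dict, List

# Hypothetical lookup from wiki database codes to display names.
# Illustrative entries only; a real client would need a complete, localized table.
WIKI_NAMES: Dict[str, str] = {
    "cswiki": "Czech Wikipedia",
    "dewiki": "German Wikipedia",
    "jawiki": "Japanese Wikipedia",
}


def describe_wiki(code: str) -> str:
    """Return a display name for a wiki code, falling back to the raw code."""
    return WIKI_NAMES.get(code, code)


def suggestion_reason(found_on: List[str]) -> str:
    """Build a reason string loosely modelled on the one in the mockup."""
    names = ", ".join(describe_wiki(code) for code in found_on)
    return f"Used in the same article on another language Wikipedia: {names}"
```

Falling back to the raw code means an unmapped wiki still produces a usable, if less friendly, reason string.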

Acceptance Criteria

Given I have made a request to the Image Suggestion API, I expect to receive a found_on for each image suggestion.

Example Response

[
  {
    "page": "Cat",
    "suggestions": [
      {
        "filename": "Cheetah.jpg",
        "source": "Wikipedia",
        "found_on": [
          "cswiki",
          "nlwiki",
          "zhwiki",
          "azbwiki",
          "dewiki",
          "viwiki"
        ],
        "confidence_rating": "string"
      }
    ]
  }
]

Open Questions

Out of scope

MediaSearch does not provide reasons for suggestions, i.e. which wikis an image was found on.

Details

Subject	Repo	Branch	Lines +/-
Return found_on data with image matching algorithm suggestions	mediawiki/services/image-suggestion-api	master	+23 -10
Deterministic randomized image suggestion results	mediawiki/services/image-suggestion-api	master	+266 -76
Adjust sqlite schema and code for found_on column in .tsv files	mediawiki/services/image-suggestion-api	master	+31 -29

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		JTannerWMF	T272872 [EPIC] Image Recommendations Android MVP
		Resolved		BPirkle	T275816 Provide which wiki an image suggestion is found on

Event Timeline

• sdkim created this task.Feb 25 2021, 8:57 PM

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptFeb 25 2021, 8:57 PM

• sdkim updated the task description. (Show Details)Feb 25 2021, 9:05 PM

Restricted Application added a subscriber: Stang. · View Herald TranscriptFeb 25 2021, 9:05 PM

• sdkim updated the task description. (Show Details)Feb 25 2021, 9:06 PM

• sdkim added subscribers: BPirkle, gmodena, Dbrant, JTannerWMF.Feb 25 2021, 9:14 PM

Cparle subscribed.Mar 10 2021, 9:46 AM

@sdkim I don't think passing back the raw data from the note column in the API response is as useful as it might be. This will be consumed by Android and our bot writers, and will need to be parsed/interpreted. Rather than doing multiple implementations of the parsing/interpreting code it'd be better for the PET (or even research) to do this upstream, and to return something like "found_on": [ "enwiki", "frwiki", ... ]

Hi @Cparle, "found_on" is a good idea. Would it also be possible to add a "found_on_filter" to choose specific wikis?

Restricted Application added a subscriber: alaa. · View Herald TranscriptMar 10 2021, 12:07 PM

• sdkim updated the task description. (Show Details)Mar 10 2021, 9:36 PM

• sdkim updated the task description. (Show Details)

• sdkim added a project: Image-Suggestion-API.Mar 12 2021, 5:18 PM

• sdkim moved this task from Backlog to v1 on the Image-Suggestion-API board.Mar 12 2021, 5:34 PM

• sdkim moved this task from v1 to Proof of Concept on the Image-Suggestion-API board.Mar 15 2021, 7:59 PM

@Cparle Does MediaSearch provide this per suggestion? This is available from ImageMatchAlgo, but I want to clarify whether MediaSearch provides it or not.

One nit: the example shows source: Wikidata and found_on: [cswiki, ...].
Information re which wiki an image was found on will only be available for source: wikipedia.
Thus found_on is a property of Wikipedia sources only.

For wikidata and commons sources the reason why an image was chosen is tautological. E.g., an image is chosen by ImageMatchAlgo because:

image was in the Wikidata item
image was selected at random from the Commons category linked in the Wikidata item
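
On the consuming side, the rule that found_on only applies to Wikipedia sources could look roughly like this. It is a sketch under assumptions: `wikis_found_on` is a hypothetical helper, and the field names `source` and `found_on` are taken from the example response in the task description.

```python
from typing import Any, Dict, List


def wikis_found_on(suggestion: Dict[str, Any]) -> List[str]:
    """Return found_on for Wikipedia-sourced suggestions, else an empty list.

    A missing or null found_on is treated the same as an empty one.
    """
    if str(suggestion.get("source", "")).lower() != "wikipedia":
        return []
    return list(suggestion.get("found_on") or [])
```

Treating a missing list and an empty list alike keeps the caller from having to care which of the two the API ends up returning for Wikidata and Commons sources.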

Does MediaSearch provide this per suggestion?

No. MediaSearch returns no data about whether an image is used on a wiki, and it's not on our roadmap to provide it. We could investigate if you need us to, but there will very likely be a performance cost.

• sdkim updated the task description. (Show Details)Mar 16 2021, 4:04 PM

@Dbrant are Android's needs met with the example response above? I vaguely remember you asking about a language code of some sort

• sdkim updated the task description. (Show Details)Mar 16 2021, 4:13 PM

@sdkim Yep, that looks good in the Wikipedia case. And if I understand correctly, if the image comes from the Wikidata entity, the source would be Wikidata? (with an empty or nonexistent found_on list?)

And if I understand correctly, if the image comes from the Wikidata entity, the source would be Wikidata? (with an empty or nonexistent found_on list?)

I think so. But right now we have source enums for "ima" or "ms", which might need to better reflect this.
Possibly considering an image_source (Wikipedia, Wikidata, Commons) and an algorithm_source (ImageMatchAlgo, MediaSearch)? But need to discuss with the team

• sdkim moved this task from Backlog to Ready on the Platform Team Workboards (Image Suggestion API) board.Mar 18 2021, 3:00 PM

• sdkim set the point value for this task to 3.Mar 18 2021, 4:14 PM

LGoto added a parent task: T272872: [EPIC] Image Recommendations Android MVP.Mar 18 2021, 4:31 PM

• sdkim renamed this task from Provide the reason for an image suggestion to Provide which wiki an image suggestion is found on.Mar 24 2021, 2:58 PM

We continue to have some confusion surrounding the word "source". We are currently using it to mean both Algorithm vs MediaSearch and to specify how the Algorithm identified a suggestion.

I strongly feel we need to change our language to disambiguate between these two meanings. What about:

suggestion_source: Image Matching Algorithm vs MediaSearch
image_source: Wikipedia/Wikidata/Commons

I like "suggestion_source" better than the "algorithm_source" proposed above, because "algorithm" is already an overloaded word to mean both the common usage of the word (a procedure for solving a problem) and as a shorthand name for Image Matching Algorithm. Plus we're already in a "suggestion" block in the response data in the spot where this piece of data is specified. With that said, I'll be agreeable to any term that disambiguates the two meanings of "source".

The task description currently says:

[
  {
    "page": "Cat",
    "suggestions": [
      {
        "filename": "Cheetah.jpg",
        "source": "Wikipedia",
        "found_on": [
          "cswiki",
          "nlwiki",
          "zhwiki",
          "azbwiki",
          "dewiki",
          "viwiki"
        ],
        "confidence_rating": "string"
      }
    ]
  }
]

The "source": "Wikipedia" seems out of place. This should be either "ima" or "ms" (or if we prefer different terms for specifying Image Matching Algorithm vs MediaSearch I'm okay with renaming those). But this field should not be used to specify anything related to the internal details of either of those things. That should be in its own nested block. What about:

[
  {
    "page": "Cat",
    "suggestions": [
      {
        "filename": "Cheetah.jpg",
        "suggestion_source": "ima",
        "confidence_rating": "string",
        "details": {
          "image_source": "Wikipedia",
          "found_on": [
            "cswiki",
            "nlwiki",
            "zhwiki",
            "azbwiki",
            "dewiki",
            "viwiki"
          ]
        }
      }
    ]
  }
]

In the above example, the format of the "details" block would be dependent on the "suggestion_source" value. Different suggestion sources might have very different available details.

Alternatively, we could transform suggestion_source from a string field into a block with the suggestion source name and a nested variable-format details sub-object.

[
  {
    "page": "Cat",
    "suggestions": [
      {
        "filename": "Cheetah.jpg",
        "confidence_rating": "string",
        "suggestion_source": {
          "name": "ima",
          "details": {
            "image_source": "Wikipedia",
            "found_on": [
              "cswiki",
              "nlwiki",
              "zhwiki",
              "azbwiki",
              "dewiki",
              "viwiki"
            ]
          }
        }
      }
    ]
  }
]
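
For consumers, the nested variant could be read into typed objects along these lines. This is a sketch only: the class and function names (`SuggestionSource`, `Suggestion`, `parse_suggestion`) are hypothetical, and the field names are assumed from the proposal above rather than from any agreed schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SuggestionSource:
    name: str  # e.g. "ima" or "ms"
    # Variable-format block; its keys depend on the value of name.
    details: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Suggestion:
    filename: str
    confidence_rating: str
    suggestion_source: SuggestionSource

    def found_on(self) -> List[str]:
        """Wikis the image was found on, or [] when the source has none."""
        return list(self.suggestion_source.details.get("found_on", []))


def parse_suggestion(raw: Dict[str, Any]) -> Suggestion:
    """Build a Suggestion from one entry of the "suggestions" array."""
    source = raw["suggestion_source"]
    return Suggestion(
        filename=raw["filename"],
        confidence_rating=raw["confidence_rating"],
        suggestion_source=SuggestionSource(
            name=source["name"],
            details=dict(source.get("details", {})),
        ),
    )
```

Keeping `details` as an untyped dictionary mirrors the point that different suggestion sources might have very different available details.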

Great points @BPirkle . I agree that we should be explicit about the suggestion source and image source.

[
  {
    "page": "Cat",
    "suggestions": [
      {
        "filename": "Cheetah.jpg",
        "suggestion_source": "ima",
        "confidence_rating": "string",
        "details": {
          "image_source": "Wikipedia",
          "found_on": [
            "cswiki",
            "nlwiki",
            "zhwiki",
            "azbwiki",
            "dewiki",
            "viwiki"
          ]
        }
      }
    ]
  }
]

I am personally a fan of this one but would be interested to hear @Cparle and @Dbrant 's thoughts?

Definitely agree about untangling "image source" from "suggestion source", and any of the proposed structures would work perfectly well for us, but I might actually lean towards @BPirkle's last suggested structure, which IMO is the most semantically accurate, i.e. putting source-specific details in the actual structure of the suggestion source.

I like the last one better as well. Considering the discussion in T277190: Return results in a randomized deterministic way, I'd actually prefer to wrap the entire response in a containing object, so we'd have somewhere to put the seed value, and any other fields we think of in the future. So something like:

{
  "seed": 12345,
  "pages": [
    {
      "page": "Cat",
      "suggestions": [
        {
          "filename": "Cheetah.jpg",
          "suggestion_source": "ima",
          "confidence_rating": "string",
          "details": {
            "image_source": "Wikipedia",
            "found_on": [
              "cswiki",
              "nlwiki",
              "zhwiki",
              "azbwiki",
              "dewiki",
              "viwiki"
            ]
          }
        },
        {
          <another suggestion>
        }
      ]
    },
    {
      <another page>
    }
  ]
}
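
To illustrate why returning the seed is useful: a caller who sends the same seed back can get the same ordering again. The sketch below uses Python's standard `random.Random` and is not the service's actual implementation; `shuffled_pages` and its parameters are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def shuffled_pages(pages: List[str], seed: Optional[int] = None) -> Tuple[int, List[str]]:
    """Return (seed, pages in a pseudo-random order determined by seed).

    When no seed is supplied, one is generated so it can be reported
    back to the caller and reused for a later request.
    """
    if seed is None:
        seed = random.randrange(2**31)
    rng = random.Random(seed)  # private generator, independent of global state
    result = list(pages)       # copy so the caller's list is left untouched
    rng.shuffle(result)
    return seed, result
```

Calling it twice with the same seed and the same input list yields the same order, which is the property the containing object's "seed" field is meant to expose.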

However, @Cparle said this in T277190:

I've been telling people that the format we have is fixed, and we'll be versioning changes, so I guess we should bump the version if we're changing the format

I had thought that v0 implied unstable, but I can only find discussion on that, not anywhere it was actually agreed to. And even that was related to the API Gateway, and we're not (yet) exposing the image suggestions service there. So maybe bumping the version is the right thing to do.

Who is actually hitting the service right now, and how much effect would a change have on them? Would we need to maintain (at least for a transition period) a v0 endpoint that produces the current format in addition to a v1 endpoint with the new format? A transition period is doable, but would be a bit more coding/testing than just switching to a new v1 endpoint.
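
If a transition period were needed, one option would be to generate the old shape from the new one rather than maintain two code paths. The sketch below is an assumption-heavy illustration: it assumes v0 is a bare list of pages with a flat "source" field holding "ima" or "ms", and v1 is the wrapped format proposed above; `to_v0` is a hypothetical helper, not existing code.

```python
from typing import Any, Dict, List


def to_v0(v1_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Downgrade an assumed v1 response to the assumed v0 shape.

    The seed and the per-suggestion details block have no v0
    counterpart in this sketch, so they are dropped.
    """
    pages = []
    for page in v1_response.get("pages", []):
        suggestions = [
            {
                "filename": s["filename"],
                "source": s["suggestion_source"],  # "ima" or "ms"
                "confidence_rating": s["confidence_rating"],
            }
            for s in page.get("suggestions", [])
        ]
        pages.append({"page": page["page"], "suggestions": suggestions})
    return pages
```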

Who is actually hitting the service right now, and how much effect would a change have on them? Would we need to maintain (at least for a transition period) a v0 endpoint that produces the current format in addition to a v1 endpoint with the new format?

Let me check with our bot writers ...

Who is actually hitting the service right now, and how much effect would a change have on them?

I talked to our test bot writers, and they're ok with us changing the format

BPirkle mentioned this in T277190: Return results in a randomized deterministic way.Mar 31 2021, 6:07 PM

Change 677067 had a related patch set uploaded (by BPirkle; author: BPirkle):

[mediawiki/services/image-suggestion-api@master] Deterministic randomized image suggestion results

https://gerrit.wikimedia.org/r/677067

gerritbot added a project: Patch-For-Review.Apr 6 2021, 2:27 AM

Change 677313 had a related patch set uploaded (by BPirkle; author: BPirkle):

[mediawiki/services/image-suggestion-api@master] Adjust sqlite schema and code for found_on column in .tsv files

https://gerrit.wikimedia.org/r/677313

Change 677313 merged by jenkins-bot:

[mediawiki/services/image-suggestion-api@master] Adjust sqlite schema and code for found_on column in .tsv files

https://gerrit.wikimedia.org/r/677313

BPirkle mentioned this in rMSIS72bc411498fb: Adjust sqlite schema and code for found_on column in .tsv files.Apr 8 2021, 4:33 PM

BPirkle added a subscriber: • nnikkhoui.Apr 8 2021, 9:17 PM

Change 677067 merged by jenkins-bot:

[mediawiki/services/image-suggestion-api@master] Deterministic randomized image suggestion results

https://gerrit.wikimedia.org/r/677067

BPirkle mentioned this in rMSISb8da5aca35b9: Deterministic randomized image suggestion results.Apr 13 2021, 8:29 PM

Maintenance_bot removed a project: Patch-For-Review.Apr 13 2021, 9:11 PM

Change 678970 had a related patch set uploaded (by BPirkle; author: BPirkle):

[mediawiki/services/image-suggestion-api@master] Return found_on data with image matching algorithm suggestions

https://gerrit.wikimedia.org/r/678970

gerritbot added a project: Patch-For-Review.Apr 13 2021, 9:40 PM

Change 678970 merged by jenkins-bot:

[mediawiki/services/image-suggestion-api@master] Return found_on data with image matching algorithm suggestions

https://gerrit.wikimedia.org/r/678970

BPirkle mentioned this in rMSISc71dbb8e415b: Return found_on data with image matching algorithm suggestions.Apr 14 2021, 3:42 PM

Maintenance_bot removed a project: Patch-For-Review.Apr 14 2021, 4:10 PM

BPirkle claimed this task.Apr 14 2021, 6:11 PM

Naike moved this task from Ready to In review on the Platform Team Workboards (Image Suggestion API) board.Apr 14 2021, 6:11 PM

• nnikkhoui moved this task from In review to Done on the Platform Team Workboards (Image Suggestion API) board.Apr 20 2021, 11:16 PM

• sdkim closed this task as Resolved.Jun 1 2021, 5:20 PM

Stang unsubscribed.Nov 13 2021, 11:31 PM

Meno25 removed a subscriber: • Jar.Jan 14 2023, 10:37 PM

Restricted Application added a subscriber: Ericliu1912. · View Herald TranscriptJan 14 2023, 10:37 PM
