Add an Image: filtering by suggestion "kind" or "confidence"
Open, Medium, Public

Description

User Story:

As a newcomer editing Wikipedia, I want to receive task suggestions that are fairly accurate and structured, so that I can get started and successfully edit on a mobile device.

Documentation:

https://www.mediawiki.org/wiki/Platform_Engineering_Team/Data_Value_Stream/Data_Gateway#Image_Suggestions
https://www.mediawiki.org/wiki/Help:Growth/Tools/Add_an_image
https://www.mediawiki.org/wiki/Growth/Personalized_first_day/Structured_tasks/Add_an_image

Background:

Structured tasks provide new editors with machine-generated suggestions and structure the edit in a way that helps more new account holders edit constructively (1).

We tested initial suggestions with our pilot wikis to ensure the tasks provide good suggestions the majority of the time. Task onboarding indicates that suggestions aren't always correct, and that's why editors are asked to review the suggestions. In other words, we don't expect the suggestions to be 100% accurate, but they should be good suggestions ~70% of the time.

However, as we looked into scaling "add an image" to more wikis, we completed an additional round of evaluation:
T366925: Evaluate image suggestions for a new set of Wikipedias.

When reviewing these suggestions, it seemed that certain suggestion "kinds" were less likely to be good: the istype-depicts and istype-commons-category suggestions were often lower quality. After removing these task kinds from the suggestion pool, we noticed that suggestions improved.

The downside of removing these suggestions is that the task pool is reduced dramatically.

Open questions:
  • Should we remove istype-depicts and istype-commons-category task kinds from certain wikis or all wikis?
  • Should these suggestions be removed from both the article-level and section-level tasks?
  • Should we simply allow for filtering based on confidence score?
  • Should we allow communities to adjust this within Community Configuration?
  • Should we allow communities to adjust the confidence level for "add an image" tasks?
Acceptance Criteria:

TBD

Details

Other Assignee
mfossati
Related Changes in GitLab:
  • Title: add concatenate and exists
    Reference: repos/structured-data/image-suggestions-data-pipelines!9
    Author: a-pizzata
    Source branch: T368987-fix-values-content
    Dest branch: main

Event Timeline

Moving to triaged for now; we want to work on this in the future, but we need to focus on Q1 priorities.

Notes from discussion:

Ideally, communities could adjust this via Community Configuration in case they notice poor suggestions of a particular "kind" or "confidence".

Chiming in with a suggested workflow that is currently actionable.
Given a new target Wikipedia:

  1. internally evaluate a sample of suggestions with https://alis-evaluation.toolforge.org/ as in T381930: Test the "Add an Image" Structured Task on a Representative Sample of Wikipedias
  2. the data gateway API provides kind as a list and confidence as a score for every suggestion. Determine which kinds, if any, to filter out, and a confidence threshold
  3. apply those filters after calling the API
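
The post-API filtering in step 3 could be sketched as follows. This is a minimal illustration, not the actual Growth code: the suggestion shape mirrors the documented API (kind is a list, confidence a score), while EXCLUDED_KINDS and the 0.7 threshold are assumed example values.

```python
# Sketch of step 3: client-side filtering of data-gateway suggestions.
# The dict shape follows the documented API; the excluded kinds and the
# threshold are illustrative assumptions.

EXCLUDED_KINDS = {"istype-depicts", "istype-commons-category"}
MIN_CONFIDENCE = 0.7  # assumed per-wiki threshold

def keep_suggestion(suggestion,
                    excluded_kinds=EXCLUDED_KINDS,
                    min_confidence=MIN_CONFIDENCE):
    """Return True if the suggestion survives kind and confidence filters."""
    if suggestion["confidence"] < min_confidence:
        return False
    # Drop a suggestion if any of its kinds is in the excluded set.
    return not (set(suggestion["kind"]) & excluded_kinds)

suggestions = [
    {"kind": ["istype-lead-image"], "confidence": 0.8},
    {"kind": ["istype-depicts"], "confidence": 0.7},
    {"kind": ["istype-commons-category", "istype-depicts"], "confidence": 0.7},
]
filtered = [s for s in suggestions if keep_suggestion(s)]
# Only the first suggestion survives both filters.
```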

We recently discussed optimizing the filtering step. This requires a change to how flags indicating whether a Wikipedia article has any suggestions are sent to search indices: given an article, update its recommendation.image and recommendation.image_section weighted tags with the value of the highest confidence score available.
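
The aggregation described above can be sketched in plain Python (field names are illustrative; the real pipeline does this in Spark): for each article, the weighted tag would carry the maximum confidence across its suggestions.

```python
# Sketch: collapse per-article suggestions to the highest confidence
# score, which would become the recommendation.image weighted-tag value.
# Field names are illustrative, not the pipeline's actual schema.

from collections import defaultdict

def max_confidence_per_article(suggestions):
    best = defaultdict(float)
    for s in suggestions:
        key = (s["wiki"], s["page_id"])
        best[key] = max(best[key], s["confidence"])
    return dict(best)

rows = [
    {"wiki": "abwiki", "page_id": 2441, "confidence": 0.8},
    {"wiki": "abwiki", "page_id": 2441, "confidence": 0.98},
    {"wiki": "abwiki", "page_id": 2455, "confidence": 0.8},
]
tags = max_confidence_per_article(rows)
# Article 2441 gets 0.98, the highest of its two suggestions.
```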

See https://wikimedia.slack.com/archives/C01DFVAQRGA/p1764065493353579?thread_ts=1763992906.169429&cid=C01DFVAQRGA

Currently the indexed search tags carry a hard-coded value named exists, but it could be something else. If deemed useful at search time, tags related to image recommendations could be made more granular by including the kind as separate tags.
At query time this would become (suggested syntax): hasrecommendation:image=istype-depicts.

  • hasrecommendation:image>0.8 -hasrecommendation:image=istype-depicts would filter recommendations with a score above 0.8 and without an istype-depicts kind.
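
Building such per-kind tag values could look like this hedged sketch. Note the kind|score encoding is only the suggestion being discussed here; the current contract uses a single exists|score value.

```python
# Hypothetical encoding of per-kind weighted-tag values, sketching the
# more granular alternative discussed above. The current contract uses
# one "exists|<score>" value instead; scores are ints scaled as in the
# indexed data (e.g. 700 for 0.7).

def encode_tag_values(kinds, score):
    """One value per kind, with the score appended after '|'."""
    return [f"{kind}|{score}" for kind in kinds]

values = encode_tag_values(["istype-depicts", "istype-commons-category"], 700)
# values == ["istype-depicts|700", "istype-commons-category|700"]
```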

As I understand it, but I might be wrong (looking to @mfossati to correct me if needed), the score already represents the source of the recommendation:

  • 0.9 = image is already used for that article on another Wikipedia
  • 0.8 = image is used as P18 on the linked Wikidata Item
  • 0.7 = image either has a matching commons category or matching depicts statement on commons

So, it should be enough if we are able to say hasrecommendation:image>=0.8 in our search query to exclude the lowest category.

the score already represents the source of the recommendation:

Roughly speaking, yes. It's a function of sources. ALIS and SLIS have different implementations, though.

  • 0.9 = image is already used for that article on another Wikipedia
  • 0.8 = image is used as P18 on the linked Wikidata Item
  • 0.7 = image either has a matching commons category or matching depicts statement on commons

This should be the correct list for ALIS:

  • 90 = Wikidata image
  • 80 = Commons category or Wikipedia lead image
  • 70 = Commons depicts statement
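
The ALIS mapping above could be captured as a simple lookup table. This is a sketch based solely on the list in this thread; per the comment above, the real score is a function of sources and differs between ALIS and SLIS.

```python
# Score-to-source lookup for ALIS, per the corrected list above.
# Illustrative only: the actual pipeline computes scores as a function
# of sources rather than from a static table.

ALIS_SCORE_SOURCES = {
    90: "Wikidata image",
    80: "Commons category or Wikipedia lead image",
    70: "Commons depicts statement",
}

def source_for_score(score):
    return ALIS_SCORE_SOURCES.get(score, "unknown")

# Excluding the lowest category, as suggested above with
# hasrecommendation:image>=0.8:
kept = {s: src for s, src in ALIS_SCORE_SOURCES.items() if s >= 80}
```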

So, it should be enough if we are able to say hasrecommendation:image>=0.8 in our search query to exclude the lowest category.

I agree, along the lines of T405059: Adapt hasrecommendation to filter by score and possibly rank by score.

This should be the correct list for ALIS:

  • 90 = Wikidata image
  • 80 = Commons category or Wikipedia lead image
  • 70 = Commons depicts statement

Oh, that is interesting, thank you for clarifying! For 80, I assume you mean "(Commons category or Wikipedia) lead image" and not "(Commons category) or (Wikipedia lead image)", correct?

For 80, I assume you mean "(Commons category or Wikipedia) lead image" and not "(Commons category) or (Wikipedia lead image)", correct?

Not sure what you mean here, but let me expand: given a Wikipedia article X (candidate for receiving suggestions) and its corresponding Wikidata item Q, Commons category means an image coming from Q's P373 property (see examples); Wikipedia lead image means the lead image of a sitelink of X, i.e., an equivalent article in another language.
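
For illustration, the Commons category (P373) of an item could be read out of a wbgetclaims-style response as follows. The payload below is a hand-written sample shaped like the Wikidata API output, not live data.

```python
# Sketch: extract the Commons category (P373) value from a Wikidata
# wbgetclaims-style JSON payload. sample_response is a hand-written
# sample, not an actual API response.

sample_response = {
    "claims": {
        "P373": [
            {"mainsnak": {"datavalue": {"value": "Gorilla"}}}
        ]
    }
}

def commons_category(claims_response):
    """Return the first P373 value, or None if the item has none."""
    claims = claims_response.get("claims", {}).get("P373", [])
    if not claims:
        return None
    return claims[0]["mainsnak"]["datavalue"]["value"]

# commons_category(sample_response) == "Gorilla"
```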

Let's take the example of Gorilla from that list of examples that you linked. What I'm asking is: does the algorithm suggest, with a score of 80,

  • A) only the one specific lead/infobox image of that category

or

  • B) any of the 40+ images that are directly in the category of Gorilla

or

  • C) something else

?

A) would make much more sense to me than B), because the images in the category still seem very "diverse", and many are much less ideal, or even unsuitable, as the first image to illustrate an article about gorillas than the one specific lead/infobox image of that category.

Merged the MR, waiting for tomorrow's run to validate.

ALIS ran correctly, and here are the results:

spark.sql(
    """
    SELECT
        tag,
        COUNT(*) AS cnt
    FROM analytics_platform_eng.image_suggestions_search_index_delta
    WHERE snapshot = '2026-03-16'
    GROUP BY tag
    """
).show()

+--------------------+--------+
|                 tag|count(1)|
+--------------------+--------+
|recommendation.image| 4041911|
+--------------------+--------+

spark.sql(
    """
    SELECT *
    FROM analytics_platform_eng.image_suggestions_search_index_delta
    WHERE snapshot = '2026-03-16'
       AND tag = 'recommendation.image'
    """
).show()

+-----------+--------------+-------+--------------------+------+----------+
|     wikiid|page_namespace|page_id|                 tag|values|  snapshot|
+-----------+--------------+-------+--------------------+------+----------+
|aawikibooks|             0|      1|recommendation.image| [970]|2026-03-16|
|     abwiki|             0|   2441|recommendation.image| [980]|2026-03-16|
|     abwiki|             0|   2443|recommendation.image| [980]|2026-03-16|
|     abwiki|             0|   2444|recommendation.image| [980]|2026-03-16|
|     abwiki|             0|   2445|recommendation.image| [980]|2026-03-16|
|     abwiki|             0|   2451|recommendation.image| [980]|2026-03-16|
|     abwiki|             0|   2455|recommendation.image| [800]|2026-03-16|
|     abwiki|             0|   2460|recommendation.image| [800]|2026-03-16|
|     abwiki|             0|   2521|recommendation.image| [800]|2026-03-16|
|     abwiki|             0|   2527|recommendation.image| [800]|2026-03-16|
|     abwiki|             0|   2528|recommendation.image| [800]|2026-03-16|

spark.sql(
    """
    SELECT
        min(values) AS min_values,
        max(values) AS max_values
    FROM analytics_platform_eng.image_suggestions_search_index_delta
    WHERE snapshot = '2026-03-16'
      AND tag = 'recommendation.image'
      AND NOT array_contains(values, '__DELETE_GROUPING__')
    """
).show()

+-----------+-----------+
|min(values)|max(values)|
+-----------+-----------+
|      [700]|      [990]|
+-----------+-----------+

The SLIS results also look as intended:

spark.sql(
    """
    SELECT
        tag,
        MIN(values) AS min_values,
        MAX(values) AS max_values
    FROM analytics_platform_eng.image_suggestions_search_index_full
    WHERE snapshot = '2026-03-16'
      AND NOT array_contains(values, '__DELETE_GROUPING__')
    GROUP BY tag
    """
).show(truncate=False)

+--------------------------------------+-----------------+-----------+
|tag                                   |min(values)      |max(values)|
+--------------------------------------+-----------------+-----------+
|image.linked.from.wikidata.p18        |[Q100000001|1000]|[Q99|1000] |
|image.linked.from.wikipedia.lead_image|[Q100000034|1]   |[Q99|43]   |
|recommendation.image                  |[700]            |[990]      |
|recommendation.image_section          |[710]            |[970]      |
+--------------------------------------+-----------------+-----------+

spark.sql(
    """
    SELECT
        tag,
        COUNT(*) AS cnt
    FROM analytics_platform_eng.image_suggestions_search_index_full
    WHERE snapshot = '2026-03-16'
      AND NOT array_contains(values, '__DELETE_GROUPING__')
    GROUP BY tag
    """
).show(truncate=False)

+--------------------------------------+--------+
|tag                                   |count(1)|
+--------------------------------------+--------+
|recommendation.image_section          |323501  |
|image.linked.from.wikidata.p18        |5871736 |
|image.linked.from.wikipedia.lead_image|7171983 |
|recommendation.image                  |4040426 |
+--------------------------------------+--------+

spark.sql(
    """
    SELECT
        tag,
        COUNT(*) AS cnt
    FROM analytics_platform_eng.image_suggestions_search_index_delta
    WHERE snapshot = '2026-03-16'
      AND NOT array_contains(values, '__DELETE_GROUPING__')
    GROUP BY tag
    """
).show(truncate=False)

+--------------------+--------+
|tag                 |count(1)|
+--------------------+--------+
|recommendation.image|4040426 |
+--------------------+--------+

@APizzata-WMF sorry I did not spot this earlier, but the value ["970"] is not correct: the pipeline unfortunately pushed the string 970 as the tag value instead of the score. The tag value must remain exists with the score appended after a |; the full string should look like ["exists|970"].
Is it possible to re-run the pipeline with such a fix applied? I'll re-ship the tags right after.
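
The required fix amounts to keeping the literal exists and appending the score after a | separator, e.g. (a minimal sketch of the encoding described above):

```python
# Sketch of the value-encoding fix: the tag value must be the literal
# "exists" with the score appended after "|", not the bare score.

def encode_value(score):
    return f"exists|{score}"

value = encode_value(970)
# value == "exists|970"; the faulty run emitted just "970" instead.
```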

Hey @dcausse I must have misunderstood and thought the exists| part could be removed. I can update the output of the tables to show the correct form and fix the code to show in the correct form from next run. Does this sound good to you?

Yes; ideally we should overwrite the 2026-03-16 snapshot with proper values so that we can re-ship the tags (all suggestions might have disappeared from the structured-tasks point of view). Please feel free to add me as a reviewer, thanks!

I think one of the causes of this confusion is the current contract between ALIS/SLIS and search. For historical reasons (it was the first non-search pipeline to push data to search indices), it uses search-internal formats that are very fragile (magic words like __DELETE_GROUPING__ and the internal tag/score string encoding). I filed T414099 to try to address some of this by updating the ALIS/SLIS pipeline to use a newer contract based on an event schema, which is what newer pipelines (e.g. revise-tone recommendations) are using.

I have created and validated all my update commands, and will shortly run them all and paste all the results here with the validations. As a next step we can just rerun the image_suggestions_weekly DAG, correct?

Yes, we can just clear the publish_page_change_weighted_tags task; it should then rerun.

Unfortunately, due to permissions on the delta and full tables, I was not able to update the faulty records. Here is the MR that should fix it; I will run it locally, test the results (will post here), and merge ASAP.

Tested in the dev env and the results are good:

spark.sql("""
select tag, count(*) from analytics_platform_eng.image_suggestions_search_index_full
where snapshot = '2026-03-16' and tag in ('recommendation.image', 'recommendation.image_section')
group by tag""").show(truncate=False)

+----------------------------+--------+
|tag                         |count(1)|
+----------------------------+--------+
|recommendation.image        |4040426 |
|recommendation.image_section|323501  |
+----------------------------+--------+

spark.sql("""
select tag, count(*) from apizzata.image_suggestions_search_index_full
where snapshot='2026-03-16' group by tag""").show(truncate=False)

+----------------------------+--------+
|tag                         |count(1)|
+----------------------------+--------+
|recommendation.image        |4040426 |
|recommendation.image_section|323501  |
+----------------------------+--------+

spark.sql("""
select tag, count(*) from analytics_platform_eng.image_suggestions_search_index_delta
where snapshot='2026-03-16' group by tag""").show(truncate=False)

+----------------------------+--------+
|tag                         |count(1)|
+----------------------------+--------+
|recommendation.image        |4041911 |
|recommendation.image_section|378379  |
+----------------------------+--------+

spark.sql("""
select tag, count(*) from apizzata.image_suggestions_search_index_delta
where snapshot='2026-03-16' group by tag""").show(truncate=False)

+----------------------------+--------+
|tag                         |count(1)|
+----------------------------+--------+
|recommendation.image        |4041911 |
|recommendation.image_section|378379  |
+----------------------------+--------+

spark.sql(""" select tag, max(values), min(values) from apizzata.image_suggestions_search_index_full
where snapshot='2026-03-16' 
group by tag""").show(truncate=False)

+----------------------------+------------+------------+
|tag                         |max(values) |min(values) |
+----------------------------+------------+------------+
|recommendation.image        |[exists|990]|[exists|700]|
|recommendation.image_section|[exists|970]|[exists|710]|
+----------------------------+------------+------------+

spark.sql(""" select tag, max(values), min(values) from apizzata.image_suggestions_search_index_delta
where snapshot='2026-03-16' 
group by tag""").show(truncate=False)

+----------------------------+------------+---------------------+
|tag                         |max(values) |min(values)          |
+----------------------------+------------+---------------------+
|recommendation.image        |[exists|990]|[__DELETE_GROUPING__]|
|recommendation.image_section|[exists|970]|[__DELETE_GROUPING__]|
+----------------------------+------------+---------------------+

spark.sql(""" select tag, max(values), min(values) from apizzata.image_suggestions_search_index_delta
where snapshot='2026-03-16' and !array_contains(values,'__DELETE_GROUPING__')
group by tag""").show(truncate=False)

+----------------------------+------------+------------+
|tag                         |max(values) |min(values) |
+----------------------------+------------+------------+
|recommendation.image        |[exists|990]|[exists|700]|
|recommendation.image_section|[exists|970]|[exists|710]|
+----------------------------+------------+------------+

Tomorrow I will find the best way to delete the bad data from the tables and rerun everything.

spark.sql("""
(
select * 
 from analytics_platform_eng.image_suggestions_search_index_delta where snapshot='2026-03-16'
 and tag = 'recommendation.image'
 limit 5)
 union all 
 (
select * 
 from analytics_platform_eng.image_suggestions_search_index_delta where snapshot='2026-03-16'
 and tag = 'recommendation.image_section'
 limit 5)
  """).show(truncate=False)

+-----------+--------------+-------+----------------------------+------------+----------+
|wikiid     |page_namespace|page_id|tag                         |values      |snapshot  |
+-----------+--------------+-------+----------------------------+------------+----------+
|aawikibooks|0             |1      |recommendation.image        |[exists|970]|2026-03-16|
|abwiki     |0             |2441   |recommendation.image        |[exists|980]|2026-03-16|
|abwiki     |0             |2443   |recommendation.image        |[exists|980]|2026-03-16|
|abwiki     |0             |2444   |recommendation.image        |[exists|980]|2026-03-16|
|abwiki     |0             |2445   |recommendation.image        |[exists|980]|2026-03-16|
|huwiki     |0             |1639080|recommendation.image_section|[exists|800]|2026-03-16|
|huwiki     |0             |1640392|recommendation.image_section|[exists|710]|2026-03-16|
|huwiki     |0             |1641209|recommendation.image_section|[exists|710]|2026-03-16|
|huwiki     |0             |1641486|recommendation.image_section|[exists|800]|2026-03-16|
|huwiki     |0             |1641865|recommendation.image_section|[exists|710]|2026-03-16|
+-----------+--------------+-------+----------------------------+------------+----------+


spark.sql("""
(
select * 
 from analytics_platform_eng.image_suggestions_search_index_full where snapshot='2026-03-16'
 and tag = 'recommendation.image'
 limit 5)
 union all 
 (
select * 
 from analytics_platform_eng.image_suggestions_search_index_full where snapshot='2026-03-16'
 and tag = 'recommendation.image_section'
 limit 5)
  """).show(truncate=False)

+--------------+--------------+-------+----------------------------+------------+----------+
|wikiid        |page_namespace|page_id|tag                         |values      |snapshot  |
+--------------+--------------+-------+----------------------------+------------+----------+
|cawiki        |0             |770438 |recommendation.image        |[exists|700]|2026-03-16|
|hywiki        |0             |770438 |recommendation.image        |[exists|800]|2026-03-16|
|zh_min_nanwiki|0             |770439 |recommendation.image        |[exists|980]|2026-03-16|
|simplewiki    |0             |770439 |recommendation.image        |[exists|800]|2026-03-16|
|simplewiki    |0             |770440 |recommendation.image        |[exists|800]|2026-03-16|
|lvwiki        |0             |1      |recommendation.image_section|[exists|710]|2026-03-16|
|swwiki        |0             |2      |recommendation.image_section|[exists|800]|2026-03-16|
|lbwiki        |0             |2      |recommendation.image_section|[exists|800]|2026-03-16|
|etwiki        |0             |2      |recommendation.image_section|[exists|800]|2026-03-16|
|dewiki        |0             |3      |recommendation.image_section|[exists|800]|2026-03-16|
+--------------+--------------+-------+----------------------------+------------+----------+

spark.sql("""
(
select * 
 from analytics_platform_eng.image_suggestions_search_index_delta where snapshot='2026-03-16' and array_contains(values,'__DELETE_GROUPING__')
 and tag = 'recommendation.image'
 limit 5)
 union all 
 (
select * 
 from analytics_platform_eng.image_suggestions_search_index_delta where snapshot='2026-03-16' and array_contains(values,'__DELETE_GROUPING__')
 and tag = 'recommendation.image_section'
 limit 5)
  """).show(truncate=False)

+-------+--------------+-------+----------------------------+---------------------+----------+
|wikiid |page_namespace|page_id|tag                         |values               |snapshot  |
+-------+--------------+-------+----------------------------+---------------------+----------+
|acewiki|0             |9587   |recommendation.image        |[__DELETE_GROUPING__]|2026-03-16|
|arwiki |0             |88243  |recommendation.image        |[__DELETE_GROUPING__]|2026-03-16|
|arwiki |0             |423129 |recommendation.image        |[__DELETE_GROUPING__]|2026-03-16|
|arwiki |0             |578188 |recommendation.image        |[__DELETE_GROUPING__]|2026-03-16|
|arwiki |0             |756871 |recommendation.image        |[__DELETE_GROUPING__]|2026-03-16|
|rowiki |0             |569548 |recommendation.image_section|[__DELETE_GROUPING__]|2026-03-16|
|rowiki |0             |573262 |recommendation.image_section|[__DELETE_GROUPING__]|2026-03-16|
|rowiki |0             |574800 |recommendation.image_section|[__DELETE_GROUPING__]|2026-03-16|
|rowiki |0             |576200 |recommendation.image_section|[__DELETE_GROUPING__]|2026-03-16|
|rowiki |0             |579720 |recommendation.image_section|[__DELETE_GROUPING__]|2026-03-16|
+-------+--------------+-------+----------------------------+---------------------+----------+

All looks good to me, @dcausse if you agree I will run the task publish_page_change_weighted_tags.

Awesome, thanks! Yes, please feel free to re-run this task.

The task finished running during the night, @dcausse do the numbers look good now?

Yes, all seems correct; we can now filter recommendations by score, e.g. hasrecommendation:image>0.98. Thanks!

Perfect! I will keep the ticket open for tomorrow's run; if everything looks good, I will close it.

@dcausse can you also confirm that the latest April run / data was successful? Thanks.

@Ahoelzl yes I can confirm the run scheduled on April 2nd did run properly.