
Identification of eligible image for Alt Text Flow C: Article Editor
Closed, Resolved, Public

Description

Background

For experiment group C (Article flow), we want to prompt users to add missing alt text after they have published an edit on any article containing an image in need of alt text.

In order to know when to show the prompt, we need to

  1. Check whether there is an eligible image in the article, and surface the first eligible image for the alt text prompt (this is what this task focuses on).
  2. Check whether an edit was eligible (any edit made by a logged-in user on an article in the mainspace).
  3. If there is an eligible image in that article, and the edit was eligible, show the Alt text prompt.

Sampling into control or group C should happen immediately after a user publishes an eligible edit from the main editor. If they are sorted into group C, they should see the alt text prompt.
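The three checks above can be sketched as a single decision function. This is only an illustration: the `Edit` type, its field names, and `should_show_prompt` are hypothetical, and the sampling itself is assumed to have already happened when the edit was published.

```python
from dataclasses import dataclass
from typing import Optional

MAIN_NAMESPACE = 0  # MediaWiki's main (article) namespace


@dataclass
class Edit:
    """Hypothetical summary of a just-published edit."""
    logged_in: bool
    namespace: int
    group: str  # "control" or "C", assigned right after publishing


def edit_is_eligible(edit: Edit) -> bool:
    # Any edit by a logged-in user on a mainspace article counts.
    return edit.logged_in and edit.namespace == MAIN_NAMESPACE


def should_show_prompt(edit: Edit, first_eligible_image: Optional[str]) -> bool:
    # Show the prompt only for group C, and only when both checks pass.
    return (
        edit_is_eligible(edit)
        and edit.group == "C"
        and first_eligible_image is not None
    )
```

`first_eligible_image` stands for whatever the image check returns (for example the wikitext of the first eligible file link), or `None` when nothing qualifies.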

Requirements
  • Location & wikitext of one eligible image in need of alt text is identified
  • Latency under 2 seconds for eligible image check and identification, to minimize the lag between when a user publishes the first edit, and when we show the prompt
  • Localized parameters should be supported
  • We should not suggest images to users for adding Alt text where any of the following are true
    • The image is not linked to its Commons page, and has “|link=|alt=” or the localized magic-word equivalent
    • The image is below 100x100 pixels as defined in the wikitext, indicating it is an icon.
    • The image has any ARIA accessibility attributes such as aria-hidden=true or role=presentation
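As a rough sketch of how the exclusion rules could be applied to a single file link, assuming English parameter names (`alt=`) and treating an image as an icon when every size given in the wikitext is under 100 pixels. The function names are made up for illustration, the ARIA rule is not covered because those attributes live in surrounding HTML rather than in the link, and captions containing nested links are not handled.

```python
import re

# Assumption: English keywords; a localized wiki would supply its own magic words.
ALT_PREFIXES = ("alt=",)
SIZE_RE = re.compile(r"^(\d+)?(?:x(\d+))?px$")  # 100px, x100px, 100x100px
MIN_SIDE = 100  # threshold from the requirements


def split_params(file_link: str) -> list:
    """Split '[[File:Name.jpg|a|b]]' into ['File:Name.jpg', 'a', 'b']."""
    inner = file_link.strip()[2:-2]
    return [part.strip() for part in inner.split("|")]


def is_icon(params: list) -> bool:
    """True when the wikitext gives a size and every given side is below MIN_SIDE."""
    for param in params:
        match = SIZE_RE.match(param)
        if match and (match.group(1) or match.group(2)):
            sides = [int(g) for g in match.groups() if g]
            return all(side < MIN_SIDE for side in sides)
    return False  # no size in the wikitext: not excluded by this rule


def needs_alt_text(file_link: str) -> bool:
    params = split_params(file_link)[1:]
    # An alt parameter that is present but empty (as in "|link=|alt=") counts as set.
    if any(param.startswith(ALT_PREFIXES) for param in params):
        return False
    return not is_icon(params)
```

For example, `needs_alt_text("[[File:Example.jpg|thumb|250px|A caption]]")` is true, while a `20x20px` icon or a link that already carries `alt=` is skipped.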

Open question:

  • Should we exclude images in templates (infoboxes, galleries, timelines, or math formulas) from our suggestions, since they require specific formatting?

References

The spike T344378 (Spike: How to obtain articles that have images with missing alt text) answered these questions:

  • Question 1: If we have the wikitext of an article, how do we tell if it has images with missing alt text?
  • Question 2: How do we insert alt text into an existing File link?
  • Question 3: How do we get a list and/or queue of articles that have images with missing alt text?
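For Question 2, one possible approach (not necessarily what the spike concluded) is to place the new parameter directly after the file name, so that a caption, if any, stays last. The sketch below assumes the English `alt` keyword and rejects alt text containing characters that would break the link syntax; `insert_alt_text` is a hypothetical helper name.

```python
def insert_alt_text(file_link: str, alt_text: str, alt_keyword: str = "alt") -> str:
    """Return file_link with '|alt=...' inserted right after the file name."""
    if not (file_link.startswith("[[") and file_link.endswith("]]")):
        raise ValueError("expected a [[File:...]] style link")
    # Pipes, link brackets and newlines would change how the link is parsed.
    if any(token in alt_text for token in ("|", "[[", "]]", "\n")):
        raise ValueError("alt text contains characters that break link syntax")
    target, has_params, rest = file_link[2:-2].partition("|")
    inner = f"{target}|{alt_keyword}={alt_text}"
    if has_params:
        inner += f"|{rest}"
    return f"[[{inner}]]"
```

A localized wiki would pass its own keyword through `alt_keyword` instead of the default.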

Event Timeline

HNordeenWMF renamed this task from Identification of eligible articles for Alt Text article flow C to Identification of eligible image for Alt Text Flow C: Article Editor. Jun 18 2024, 6:54 PM
HNordeenWMF updated the task description.
bvibber subscribed.

Taking this for feasibility spike. :)

(notes for alternate method using mostly client-side logic)

size has to be checked via a lookup, but this can be done easily after the listing

  • parse for local links in original and post-edit versions
  • find any that looks like file links, with or without alt
    • discard any already present
    • consider empty equivalent to set (present)
  • compare against the previous set to look only for additions
  • if we see obvious aria fixes in wrapper, strip it
  • if we find new images with no alt text:
    • check their sizes
    • discard small ones
  • need to be able to fetch localization lists from mediawiki source (namespaces, alt/link markers)
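A minimal sketch of the listing step in these notes: scan the wikitext for file links, keep nested links inside captions intact, and drop any link that already has an alt parameter (empty counts as set). The namespace names and the `alt` keyword are English-only assumptions here, and the function names are illustrative. External links inside captions are not handled.

```python
import re

ALT_PARAM_RE = re.compile(r"\|\s*alt\s*=")  # assumption: English keyword only


def find_file_links(wikitext: str, namespaces=("File", "Image")) -> list:
    """Return every [[Namespace:...]] link, including nested [[...]] in captions."""
    prefixes = [f"[[{name.lower()}:" for name in namespaces]
    links, i = [], 0
    while True:
        start = wikitext.find("[[", i)
        if start == -1:
            return links
        if not any(wikitext[start:start + len(p)].lower() == p for p in prefixes):
            i = start + 2
            continue
        depth, j = 0, start
        while j < len(wikitext):
            if wikitext.startswith("[[", j):
                depth += 1
                j += 2
            elif wikitext.startswith("]]", j):
                depth -= 1
                j += 2
                if depth == 0:
                    links.append(wikitext[start:j])
                    break
            else:
                j += 1
        # Skip past a completed link; step over an unterminated one.
        i = j if depth == 0 else start + 2


def links_missing_alt(wikitext: str) -> list:
    return [link for link in find_file_links(wikitext) if not ALT_PARAM_RE.search(link)]
```

The size lookup and the small-image filter from the notes would then run only on what `links_missing_alt` returns.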

Work for Brooke (?):

  • helper script in js to fetch the keywords out of mediawiki
  • prototype logic in js to run some tests
  • port to swift to stick in the app

Open questions:

  • do we deal with _only new additions in the edit_ or _all alt-less template-less images in the text as of this revision_?
    • the former is implementable by doing the extraction of links on both old & new and diffing them
  • do we need to exclude full image markup passed as wikitext into a template as a parameter?
    • to implement: exclude all the {{....}} stuff
  • do we need to check very small icon images when sizes aren't passed in the params?
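To make the first two open questions concrete, here is a sketch of both ideas: dropping everything inside {{...}} (with nesting) and diffing the file links of the old and new revisions. It uses a simplified regex for file links with English namespace names, and does not handle triple-brace parameters; all names are illustrative. Later in this thread the decision goes the other way (all alt-less images in the revision, not only additions), so this shows the "only new additions" option for comparison.

```python
import re

# Simplified: English namespaces, no nested links inside the file link.
FILE_LINK_RE = re.compile(r"\[\[(?:File|Image):[^\[\]]*\]\]", re.IGNORECASE)


def strip_templates(wikitext: str) -> str:
    """Remove everything inside {{...}}, including nested templates."""
    out, depth, i = [], 0, 0
    while i < len(wikitext):
        if wikitext.startswith("{{", i):
            depth += 1
            i += 2
        elif wikitext.startswith("}}", i) and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(wikitext[i])
            i += 1
    return "".join(out)


def added_file_links(old: str, new: str) -> list:
    """File links outside templates that appear in new but not in old."""
    before = set(FILE_LINK_RE.findall(strip_templates(old)))
    after = FILE_LINK_RE.findall(strip_templates(new))
    return [link for link in after if link not in before]
```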

Notes:

  • magic words fetcher is already present! should be able to use that

@bvibber Notes on how to use the magic words stuff:

  1. These methods give you some examples of how I'm using our MagicWordUtils struct. Note I also bundled the file namespace logic into this as well for simplicity at the time, so I use the term "MagicWordUtils" loosely here. Sorry if that adds confusion.
  2. MagicWordUtils references these generated json files (DE example). Those json files can be updated by running a command line tool. This can be done by selecting the UpdateLanguages scheme in Xcode and running. Note it will likely change additional non-magicwords .json files - I recommend reverting those changes and only committing the changes to the magic words json files.

Screenshot 2024-06-26 at 12.45.02 PM (82×683 px, 14 KB)

  3. If you need to change the languages script and capture more magic words, you can do so by updating the command line utility script.

Let us know if you have any issues or questions!

Hi @HNordeenWMF! We had some questions about the task description:

Check whether an edit was eligible (any non-image-related edit made by a logged-in user on an article in the mainspace).

By this do you mean, an edit is potentially eligible if it was made using our article editor, as opposed to going through the image recommendations flow? Or is it that we need to inspect the article editor edit, and confirm they didn't make any changes related to images throughout the wikitext, and only then is it eligible?

Location & wikitext of one eligible image in need of alt text is identified

For determining an eligible image, are we expected to only consider new images added in that last article edit? Or do we consider any article image in the wikitext missing alt text, even if it wasn't touched in that edit?

The image is below 100x100 pixels, indicating it is an icon.
do we need to check very small icon images when sizes aren't passed in the params?

There may be image wikitext that does not specify pixels; in that case we would need to do an additional fetch to Commons to confirm its size. Just making sure you need us to do this additional check. I'm not sure how common this situation will be, so sorry about the lack of guidance!

@bvibber Feel free to followup if I missed anything, thanks!

Thanks @bvibber and @Tsevener for connecting on this. I understand that the current plan is to execute Flow C's check for the image without using the linter, doing it all client-side instead.

By this do you mean, an edit is potentially eligible if it was made using our article editor, as opposed to going through the image recommendations flow? Or is it that we need to inspect the article editor edit, and confirm they didn't make any changes related to images throughout the wikitext, and only then is it eligible?

The second: it should be an article editor edit on an article in the main namespace, made by a logged-in editor. The only edits we really want to avoid are edits where someone has just added alt text to an image in the article (a very small possibility, I'm guessing). I thought that to avoid this situation, it would be easiest to exclude all edits to images, but what do you think?

For determining an eligible image, are we expected to only consider new images added in that last article edit? Or do we consider any article image in the wikitext missing alt text, even if it wasn't touched in that edit?

Any article image in the wikitext missing alt text, even if it wasn't touched in that edit. We actually expect this to be the most frequent situation: someone will have made a non-image related edit, and we suggest another way to improve that article that they demonstrated interest in.

There may be image wikitext that does not specify pixels; in that case we would need to do an additional fetch to Commons to confirm its size. Just making sure you need us to do this additional check. I'm not sure how common this situation will be, so sorry about the lack of guidance!

This was an additional check suggested by Shay, which we saw in other alt text research. It would help avoid suggesting that decorative icons are in need of alt text. If it's a medium-to-large lift to do the extra check, we can perform this check only when the pixel sizes are included in the wikitext, and forgo the additional Commons check for the experimental version, acknowledging that if this were ever built out permanently we would add the extra check against Commons.

I thought that to avoid this situation, it would be easiest to exclude all edits to images, but what do you think?

@HNordeenWMF Unfortunately I think it will be difficult to determine that they are making an edit to image wikitext, and just as hard to know if they are adjusting alt text within an image's wikitext. @bvibber Feel free to push back if I'm being too pessimistic.

Since we think this is a small possibility, can we scrap this requirement? If they happen to fill in all the alt texts on the page, they should still not see the popup, because the followup logic to fetch any images without alt text should come up empty. If they fill in only one and others remain, then they will see the popup.

OK agreed, if the logic to fetch the image is happening after they submit their edit, it should not be an issue. I'll update the ticket to remove that requirement.

Should we exclude images in templates due to the specific formatting required? (infoboxes, galleries, timelines, or math formulas from our suggestions as they have specific formats)

@HNordeenWMF Just a heads up, I've been playing with this library and wanted to answer this question. I think as long as the format is (loosely) [[{FileNamespace}:{filename}|{additional parameters}]], it will be detected by this logic even if it is within a template. For example, the map picture of Carrollton, TX on German Wikipedia was detected, even though it's in an infobox:

Screenshot 2024-08-14 at 5.22.54 PM (214×316 px, 29 KB)

If you dig into the source it looks like this:

{{Infobox Ort in den Vereinigten Staaten
| Name = Carrollton
| Stadtspitzname = 
| Bundesstaat = Texas
| County = Dallas County
| County2 = Denton County
| County3 = Collin County
| Bild1 = Carrollton July 2019 11 (Carrollton Square gazebo).jpg
| Bildgröße1 = 
| Bildbeschreibung1 = Carrollton Square
| Siegel = 
| Flagge = Flag of Carrollton, Texas.svg
| Karte = [[Datei:Dallas County Texas Incorporated Areas Carrollton highighted.svg|250px]]
...

So that [[Datei:Dallas County Texas Incorporated Areas Carrollton highighted.svg|250px]] was detected, but I think Carrollton July 2019 11 (Carrollton Square gazebo).jpg will not be detected.

I think this will be fine for now, and our logic should be able to properly add alt text to something like [[Datei:Dallas County Texas Incorporated Areas Carrollton highighted.svg|250px]], even if it's in an infobox. Just note that detection may feel a little inconsistent within templates. We will detect them as best we can, but may miss some if they are lacking the surrounding brackets.
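To illustrate why the bracketed link is found and the bare file name is not, here is a rough regex version of the loose [[{FileNamespace}:{filename}|{additional parameters}]] shape, run against a trimmed copy of the infobox above. This is not the library's actual implementation. The namespace list passed in is an assumption for German Wikipedia; in the app those names would come from the generated magic-word JSON files.

```python
import re


def file_link_pattern(namespace_names):
    """Build a regex for [[Namespace:filename|params]] using the given names."""
    names = "|".join(re.escape(name) for name in namespace_names)
    return re.compile(
        r"\[\[(?:" + names + r"):([^|\[\]]+)(?:\|([^\[\]]*))?\]\]",
        re.IGNORECASE,
    )


# Trimmed copy of the infobox from the comment above.
INFOBOX = """{{Infobox Ort in den Vereinigten Staaten
| Name = Carrollton
| Bild1 = Carrollton July 2019 11 (Carrollton Square gazebo).jpg
| Flagge = Flag of Carrollton, Texas.svg
| Karte = [[Datei:Dallas County Texas Incorporated Areas Carrollton highighted.svg|250px]]
}}"""

# Assumed namespace names for dewiki, plus the canonical English ones.
pattern = file_link_pattern(["Datei", "Bild", "File", "Image"])
matches = pattern.findall(INFOBOX)  # list of (filename, parameters) tuples
```

Only the Karte value matches, because Bild1 and Flagge pass bare file names without the surrounding brackets and namespace prefix.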

Note: I was playing with DE Wiki because they have more localized wikitext expectations. I realize DE Wiki will not actually participate in this experiment. :)

Ok @Tsevener I think that's fine as long as we're only adding alt text to images within templates that are already formatted like a typical image found in an article.

Can be tested in TestFlight 7.5.8 (3895).