Images (on articles, on file/category pages, and in the media viewer) should default to use their structured data alt text when available
Open, Needs Triage, Public

Description

Currently images use the filename as alt text on file/category pages, in the media viewer, and in article thumbnails where alt text is not manually specified. These should use a centrally provided default alt text once that's implemented (via T166094: Allow editors to provide default alt text on Wikimedia Commons file description pages; extra credit if the mechanism is easy to make work for an alternative, non-Wikibase-based implementation such as T21906: Allow default alternate text for images to be provided on the file description page too).

This would involve, at least:

  • making it possible to get the alt text of a File object. (Exposing its structured data more generally is an alternative, but for foreign repos we'd then have to deal with serialization/deserialization, which would probably mean requiring Wikibase on the client wiki. That is a bad outcome, so it's better to stick with just the alt text; doing so also shields clients from having to parse the exact structured data model we use, which might change over time.)
    • for LocalFile this should be straightforward via MCR
    • for ForeignAPIFile (InstantCommons) this would require exposing the data in the imageinfo API (the other option would be combining imageinfo with some Wikibase API, but since we can't rely on Wikibase being installed in all foreign image repos, that would make things a lot more complicated)
    • for ForeignDBFile, the options are: rely on MCR but deserialize the content object on the wiki where the thumbnail is used (seems scary); put the alt text in some dedicated place in the DB (page_props is used for similar things such as Wikidata descriptions, but alt text is multilingual so that wouldn't really work); or do a dedicated API call (we already do an HTTP request for fetching file descriptions, but those are cached in Varnish and the API isn't). None of those options sound good.
  • displaying the default alt text in ThumbnailImage::toHtml() if an explicit alt text is not passed in the options (and also probably depending on the custom-*-link options: if the image is used as a navigational or decorative element we might want to suppress the alt text), probably by passing it in File::transform() when creating the ThumbnailImage object.
  • making sure it's not overridden with something less specific (e.g. ImagePage::view() passes in the filename as alt parameter now)
  • exposing the data in the imageinfo API and using it in MediaViewer
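
The fallback order described in the list above could be sketched roughly as follows. This is a language-neutral illustration in Python, not MediaWiki code: `ThumbOptions`, `suppress_default` and `resolve_alt` are hypothetical names, and treating "suppress" as "fall back to today's filename behaviour" is an assumption, since the task leaves open what should be emitted in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThumbOptions:
    """Hypothetical stand-in for the options passed when rendering a thumbnail."""
    alt: Optional[str] = None        # explicit alt text from the caller, if any
    suppress_default: bool = False   # stands in for the custom-*-link cases


def resolve_alt(options: ThumbOptions, default_alt: Optional[str], filename: str) -> str:
    """Pick the alt text: explicit value, then central default, then filename."""
    # An explicit alt always wins, even an empty one.
    if options.alt is not None:
        return options.alt
    # Use the centrally provided default unless the caller suppressed it.
    if default_alt and not options.suppress_default:
        return default_alt
    # Assumption: otherwise keep the current behaviour (the filename).
    return filename


print(resolve_alt(ThumbOptions(), "A suspension bridge at dusk", "Bridge.jpg"))
print(resolve_alt(ThumbOptions(alt="Bridge.jpg"), "A suspension bridge at dusk", "Bridge.jpg"))
```

The second call illustrates the overriding problem: a caller that passes the filename as an explicit alt (as ImagePage::view() is said to do) hides the default, so such callers would need to stop passing it.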

Event Timeline

Restricted Application added a subscriber: Aklapper.
Jdforrester-WMF renamed this task from "Images should use Captions as alt text when available" to "Images (on articles, on file/category pages, and in the media viewer) should default to use their structured data caption as alt text when available". Jan 15 2019, 12:16 AM

Please no. Captions are not, generally, suitable for use as alt attribute text[1]. Please consult with screen reader users and/or accessibility professionals.

[1] and neither are file names.

As I mentioned during the office hours, it would be sensible to make use of structured data captions in the projects. But I obviously didn't make clear enough that there are two very different and distinct pieces of text associated with images in Wikipedias.

One piece of text is the alt text, which is read by a screen reader or displayed if images are turned off in a browser. This is an essential part of accessibility, and its purpose is to convey what the image looks like to those who cannot see it. If you have an image of a bridge, then the alt text will say that it is an image of a bridge.

The other piece of text is the caption, which is read by everybody, including screen readers (so it doesn't duplicate the alt text because we don't want screen readers to be repeating the same information). A caption should not describe the image - either you can see it (and don't need to be told what you can see) or you can see/hear the alt text (with the same result). The caption brings the observer's attention to a salient point in the image, or links it into the text of the article. If you have an image of a bridge, then the caption might say that it's a hybrid cable-stayed/suspension bridge in New York City, or that the image shows how the bridge looked in 2005 - things that you don't get just by looking at the image.

Alt text is consistent across all uses of the image and it would be useful to have that stored with the image on Commons as structured data.

Captions will vary according to the article and use within the article, although there will often be common elements that could be stored with the image on Commons as structured data.

What would be ideal is two distinct pieces of structured data on Commons: one for alt text; and one for the common elements of captions. We only have one piece of data at present: the structured data captions. If these are going to be used for captions, then you can't use them as alt text, and vice-versa. But you can't have a mix of the two: that would lead to confusion and almost certain rejection of the data by potential users.

I'm very aware of the fact that Captions isn't ideal for alt texts, but it's better than only having the file name or no alt text at all.

Better to include the alt text itself (T166094) in Structured Data.

I'm very aware of the fact that Captions isn't ideal for alt texts, but it's better than ... no alt text at all.

No, it isn't (feel free to support any counter argument with evidence).

Again: Please consult with screen reader users and/or accessibility professionals.

I agree: alt text and captions have such different purposes that they can't be interchanged.

I am not familiar with screen readers, nor am I a professional, but I can imagine how it works (read it in your car's satellite navigation voice ;D):

[[File:GPMonaco2018-17.jpg|Monaco GP 2018|alt = Monaco GP 2018]]
Follows an image depicting ... Monaco GP 2018 ... with a title ... Monaco GP 2018.

vs.

[[File:GPMonaco2018-13.jpg|Monaco GP 2018|alt = Daniel Ricciardo crossing the finish line in his Aston Martin formula car followed by two other cars driven by Sebastian Vettel and Lewis Hamilton]]
Follows an image depicting ... Daniel Ricciardo crossing the finish line in his Aston Martin formula car followed by two other cars driven by Sebastian Vettel and Lewis Hamilton ... with a title ... Monaco GP 2018.

(Example translated from my home wiki's article on Formula One and changed a little to be a good example of good vs. bad practice.)
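
The first pattern above, where the alt text merely repeats the caption, is the kind of thing a simple check could flag. Below is a rough Python sketch of such a check. The parser is a deliberate simplification that only handles links shaped like the two examples (file name, one caption, one `alt =` parameter), and the names `parse_file_link` and `alt_repeats_caption` are hypothetical.

```python
def parse_file_link(wikitext: str) -> dict:
    """Very simplified parser for links like [[File:name|caption|alt = text]]."""
    text = wikitext.strip()
    if not (text.startswith("[[") and text.endswith("]]")):
        raise ValueError("not a file link")
    parts = [p.strip() for p in text[2:-2].split("|")]
    link = {"file": parts[0], "caption": None, "alt": None}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep and key.strip() == "alt":
            link["alt"] = value.strip()
        else:
            # Assumption: any other parameter is treated as the caption.
            link["caption"] = part
    return link


def alt_repeats_caption(link: dict) -> bool:
    """True when a non-empty alt text says exactly what the caption says."""
    alt = (link["alt"] or "").casefold()
    caption = (link["caption"] or "").casefold()
    return bool(alt) and alt == caption


bad = parse_file_link("[[File:GPMonaco2018-17.jpg|Monaco GP 2018|alt = Monaco GP 2018]]")
good = parse_file_link(
    "[[File:GPMonaco2018-13.jpg|Monaco GP 2018|alt = Daniel Ricciardo crossing "
    "the finish line in his Aston Martin formula car followed by two other cars "
    "driven by Sebastian Vettel and Lewis Hamilton]]"
)
print(alt_repeats_caption(bad), alt_repeats_caption(good))
```

A check like this only catches exact repetition; whether an alt text actually describes the image still needs a human, ideally with input from screen reader users.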

Tgr renamed this task from "Images (on articles, on file/category pages, and in the media viewer) should default to use their structured data caption as alt text when available" to "Images (on articles, on file/category pages, and in the media viewer) should default to use their structured data alt text when available". Jan 5 2020, 4:24 AM
Tgr updated the task description.

Seems pretty clear there is no support for using the caption that way (correctly, in my opinion), so rewrote the task to be agnostic of specifically how the alt text is stored. (But it will probably be a dedicated Wikidata property, see T166094: Allow editors to provide default alt text on Wikimedia Commons file description pages.) That's the easy part; passing it around internally, in a way that works with our file repo architecture, seems surprisingly complex. See especially the ForeignDBFile part in the task description.

Tgr updated the task description.

Some good points from another task:

  1. Having a default alt attribute will increase the size of each img transclusion on pages, which seems like something to keep an eye on.
  2. Review... In much the same way as default content from Wikidata was a problem for en.wp, I suspect that default content for alt attributes from Commons might be a problem for that community. It would be the first text content from Commons that by default becomes part of wp articles, picked up by search engines etc. There is a possible vandalism angle there, and they may argue that Commons' review processes are not set up to detect that, so that might become a conflict point.
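
On the first point, a back-of-the-envelope way to gauge the size increase is to compare the markup with the filename as alt (the current fallback, per the task description) against the markup with a descriptive default. The Python sketch below does that for a stripped-down `<img>` tag. The helper names, the `thumb.jpg` source and the figure of 50 images per page are illustrative assumptions; real thumbnail markup carries more attributes, but those do not change the difference.

```python
from html import escape


def img_tag(src: str, alt: str) -> str:
    """Render a minimal <img> tag; real thumbnail markup has more attributes."""
    return f'<img src="{escape(src)}" alt="{escape(alt)}">'


def added_bytes(filename: str, default_alt: str, src: str = "thumb.jpg") -> int:
    """UTF-8 byte difference when the alt changes from filename to default_alt."""
    before = len(img_tag(src, filename).encode("utf-8"))
    after = len(img_tag(src, default_alt).encode("utf-8"))
    return after - before


filename = "GPMonaco2018-13.jpg"
default_alt = (
    "Daniel Ricciardo crossing the finish line in his Aston Martin formula car "
    "followed by two other cars driven by Sebastian Vettel and Lewis Hamilton"
)
per_image = added_bytes(filename, default_alt)
# 50 images per page is an arbitrary figure chosen for illustration.
print(per_image, "extra bytes per image;", per_image * 50, "for 50 such images")
```

Uncompressed, the overhead is simply the length of the default alt text minus the length of the filename, so it scales with how verbose the Commons alt texts turn out to be.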