Allow editors control of the page image
Open, Medium, Public

Description

Precursor

Since updating the page images algorithm is risky and we've hit issues in the past, T152252 should be resolved before work on this begins.

Story

PageImages should allow editors to specify the page image using a magic word (or perhaps in image markup). To be chosen as the lead image, the image must actually be visible in the article.

For example, the first image of the article may be in the second section. That image may be appropriate to accompany the second section. However, there may be a more appropriate lead image (representing the topic as a whole) later in the article.

T90914: Provide semantic wiki-configurable styles for media display is about other markup options for images.

User story

As an editor, I want to be able to select the page image for pages that do not have an assigned image
As an editor, I want to be able to select the page image for pages where the current image is inappropriate

Background

Last year, we changed page images so that no image outside the lead section of an article can be selected (T152115). This change was made because images in later sections were often displayed out of context and were not a good representation of the entire article. We would like to give editors the ability to select page images from the remaining images within the page, to increase the coverage of pages with page images. In addition, there may still be cases where an image appearing later in the page is more appropriate as the page image than the one selected from the lead section. This override would give editors control over selecting these images.

Acceptance criteria

  • Create markup that allows editors to assign any image present on the page as the page image
  • If multiple images are assigned, the first instance will be the image used
  • If the image assigned is no longer present on the page, we must default back to selecting the first image from the lead section or infobox
  • If a user-selected image is no longer available on the page, the default image for the page will be used
  • If the image the user selects is not available on the page, the user must see an error upon saving their edit
  • If the image the user selects is of resolution lower than the minimum requirements, the user must see an error upon saving their edit
  • Add page image sourcing as per T91683#4088612
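
A minimal sketch of how the save-time checks in the criteria above might fit together, written in Python rather than the extension's PHP. The Image type, validate_assignments and MIN_WIDTH are hypothetical names, and the threshold is a placeholder because the task does not state the real minimum resolution. The sketch also assumes that "first instance" means the first assignment that passes validation.

```python
from dataclasses import dataclass

# Hypothetical minimum width in pixels; the real requirement is not given in the task.
MIN_WIDTH = 120


@dataclass(frozen=True)
class Image:
    name: str
    width: int
    height: int


def validate_assignments(assigned, images_on_page):
    """Return (effective_name_or_None, errors) for the assigned image names.

    `assigned` lists the names given in the markup, in page order.
    `images_on_page` lists the Image objects actually present on the page.
    """
    by_name = {img.name: img for img in images_on_page}
    errors = []
    effective = None
    for name in assigned:
        img = by_name.get(name)
        if img is None:
            errors.append(f"{name} is not present on the page")
        elif img.width < MIN_WIDTH:
            errors.append(f"{name} is below the minimum resolution")
        elif effective is None:
            # Assumption: the first valid assignment wins, later ones are ignored.
            effective = name
    return effective, errors
```

When effective comes back as None, the caller would fall back to the default selection from the lead section or infobox, and the errors list would be shown to the editor on save.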

Markup

{{#pageimage:Foo.jpg}}

sets the page image to Foo.jpg if Foo.jpg is in the page.
If Foo.jpg is not in the page the command is ignored.

{{#pageimage:}}

does not blank the page image.

Note: It should be possible for the magic word to be used in place of an image like so:

[[File:{{#pageimage:Foo.jpg}}|thumb|Main image of the page]]

The algorithm

The algorithm will consider the page image provided by the user.

  • If the image is a bad choice (has a negative score) it is ignored. No feedback is given to the user.
  • If the image is not in the page it is not used. No feedback is given to the user.
  • If the image has a positive score, it is used regardless of whether there is an image with a higher score.
  • If the image name is an empty string, it has no impact on the image choice. No feedback is given to the user.
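
The rules above can be sketched roughly as follows. This is an illustration in Python, not the extension's code: choose_page_image and the scores mapping are hypothetical, the automatic choice is assumed to be the highest positive score, and a score of exactly zero (which the rules do not cover) is treated like a negative one.

```python
def choose_page_image(requested, scores):
    """Pick the page image given the editor's request.

    `requested` is the name passed to the magic word ("" if empty).
    `scores` maps each image name present in the page to its score.
    """
    # Assumed stand-in for the automatic choice: best positively scored image.
    positive = {name: score for name, score in scores.items() if score > 0}
    automatic = max(positive, key=positive.get) if positive else None

    if not requested:
        return automatic  # empty string: no impact on the choice
    score = scores.get(requested)
    if score is None:
        return automatic  # image is not in the page: request ignored
    if score <= 0:
        return automatic  # bad choice: request ignored (zero is an assumption)
    return requested      # used even if another image scores higher
```

For example, with scores of -10 for Icon.svg, 5 for Map.png and 2 for Portrait.jpg, requesting Portrait.jpg selects it even though Map.png scores higher, while requesting Icon.svg silently falls back to Map.png.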

Sign off steps

Note: Allowing images from outside is a risky change and outside the scope of this task, but we'll want to consider that as part of sign off.

Risk:

  • Currently we limit images to the lead section on Wikipedia. After these changes are in place, we may want to relax this. This adds risk, however, as it means we may expose many unsuitable images where previously we showed none.
  • There is a danger we may need to revert the change and run maintenance scripts to undo its effects. We should be prepared.
  • The page image choice bubbles into the services layer, so we'll also need to make them aware of the change.

Event Timeline

Hi and welcome @Hooman_Mallahzadeh! Note that having multiple images isn't covered by the user stories of this task and sounds like a separate enhancement request, for a separate ticket.

@Aklapper I created a subtask at T290434 and made it a child task of this task. Thanks.

For most non-Wikipedia wikis the automatic search is not restricted to the leading section (and is sometimes not suitable either). This restriction was introduced because the search algorithm was unsuitable. The result of the automatic search is often unclear to many authors, which is why it would be useful for authors to be able to select the page image of interest themselves.

Maybe it can be realized with an additional image syntax argument like [[File:abc.jpg|pageimage|mini|...]].

As mentioned in the Markup section above, we should use parser functions for this purpose, so we should implement the action of "changing the PageImage of an article" using syntax such as:

{{#pageimage:Foo.jpg}}  
or [[File:{{#pageimage:Foo.jpg}}|thumb|Main image of the page]]

This syntax seems reasonable.

Change 735061 had a related patch set uploaded (by Simon04; author: Simon04):

[mediawiki/extensions/PageImages@master] Exclude images with class=nopageimage

https://gerrit.wikimedia.org/r/735061

The proposed change picks up the idea from an earlier comment:

We already have a class, noviewer, for MediaViewer that removes images from that system. Consider whether there should be one way of image selection/removal rather than multiple...

Adding a class nopageimage to an image analogously excludes the corresponding image as a PageImages candidate. Using this technique, the issues T75921, T265713, T280535 could be resolved by adding class=nopageimage.

Does that provide sufficient control to the editor?

Maybe it would be more useful for authors to have only one class like pageimage, or markup for only one image, instead of the multiple use of nopageimage.

Adding a class nopageimage to an image analogously excludes the corresponding image as a PageImages candidate. Using this technique, the issues T75921, T265713, T280535 could be resolved by adding class=nopageimage.

Does that provide sufficient control to the editor?

That doesn't seem to solve the story of actually allowing selection of an image, seems to only allow unselecting one specific image?

Correct, but it's easy to do, while the other way around requires significant engineering (on an extension that currently doesn't have people responsible for it).

The question was just if that class option "provide sufficient control to the editor" - and I don't think so (with focus on "sufficient") based on the story criteria above, and also based on the way images are actually added to articles by editors, adding a class to them is not necessarily an easy step for a content editor.

OK, so it won't solve this ticket then. The question then becomes: do we still want it at all, or not? (It should get its own ticket in that case.)

Having the ability to unselect an image on a per-page basis with normal editing tools seems useful.

Agreed.
For a specific example, at https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-10-27 we are not getting the top-most image as the PageImages selection, but instead getting the 2nd image (see action=info). This is creating confusion.

I don't think this example is good. If for whatever reason the algorithm fails to select the desired image, one wants to force it to the desired image (for example, by using the proposed markup in the description). Otherwise, as an editor, what do you expect? Mark every image on the page except the first one with class=nopageimage? Or maybe mark the second one, save the page to see if it picks the first one, and if not, repeat the process (one save at a time) until it picks the desired one? Both options seem like terrible UX to me.

For decorative images, remember we have a [[MediaWiki:Pageimages-denylist]]

Change 736926 had a related patch set uploaded (by Simon04; author: Simon04):

[mediawiki/extensions/PageImages@master] Allow editors control of the page image

https://gerrit.wikimedia.org/r/736926

I think we should keep this simple - either the user defines the page image explicitly or the user marks page images not to use, rather than adding support for both. Using both adds some unnecessary complexity. Instead I'd suggest catering to the two use cases:

  1. Disable all page images in a page: Have a magic word DISABLE_PAGE_IMAGES which disables all page images on the page. This can be used in cases where there is no suitable page image but the article contains images.
  2. Allow a page image to be marked with pageimage class (as in your patch) to cater for the situation where the editor wants a certain image to be used.

The pageimage class raises a few implementation questions:

  1. What if the image is not suitable for a page image? e.g. it's got the wrong license or is really low resolution? Does the editor choice override the page image quality detection?

Suggested: In this case I'd suggest setting the article page image to nothing.

  2. What if more than one image is marked? Which is used in that circumstance?

Suggested: Use the first in the page.

The above should be documented on Extension:PageImages
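
To make the two use cases and the suggested answers concrete, here is a rough Python sketch. The PageImage type, its fields and resolve_page_image are hypothetical, and the fallback used when nothing is marked (the first suitable image) is only a stand-in for the existing algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageImage:
    name: str
    marked: bool = False   # carries the proposed pageimage class
    suitable: bool = True  # passes the licence and resolution checks


def resolve_page_image(images, disable_all=False):
    """Resolve the page image for `images`, given in page order."""
    if disable_all:
        # Use case 1: the DISABLE_PAGE_IMAGES magic word is present.
        return None
    marked = [img for img in images if img.marked]
    if marked:
        # Use case 2: the first marked image wins. If it is unsuitable,
        # the page gets no page image at all, as suggested above.
        first = marked[0]
        return first.name if first.suitable else None
    # Stand-in for the existing algorithm: first suitable image.
    for img in images:
        if img.suitable:
            return img.name
    return None
```

Note that an unsuitable marked image yields no page image rather than falling back to the automatic choice, which keeps the editor's intent ("use this one or nothing") explicit.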

@Jdlrobson That sounds great to me.
@Ciencia_Al_Poder You're right, my example was not a good one of the use-case for removing a single image. (I had just wanted to share the example I had recently encountered). Sorry for the confusing example!

Having a class to remove a specific image isn't necessarily a bad idea, especially if the image is from a template. Having a class to say "use this one!" and the magicword to use no images both seem like net positives as well.

Change 735061 abandoned by Simon04:

[mediawiki/extensions/PageImages@master] Exclude images with class=nopageimage

Reason:

Abandoning in favour of https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageImages/+/736926

https://gerrit.wikimedia.org/r/735061

Hi @simon04 please retag with Web-Team-Backlog (Kanbanana-FY-2021-22) when you need another review.

Change 736926 abandoned by Simon04:

[mediawiki/extensions/PageImages@master] Allow editors control of the page image

Reason:

According to https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageImages/+/736926/comment/f4bb6213_14b34b67/ this patch needs a complete rewrite. After 30min of reading through docs and various examples, I couldn't figure out which of the https://www.mediawiki.org/wiki/Manual:Hooks to apply how. I'm abandoning this issue altogether. Sorry. :/

https://gerrit.wikimedia.org/r/736926

One of the important requirements of the image selection is that the image must be in the article as otherwise this could be used for a non-obvious form of vandalism.

A magic word could be used to exclude a page image. A template could then exclude images it doesn't want to use. This is the best way to scale this to avoid millions of updates across lots of articles.

So I would suggest the solution looks as follows

  • Add a magic word/class {{#pageimage-ignore:File.jpg}}/ nopageimage that tells PageImages that File.jpg should not be used
  • Get PageImages to get all the names of the images that are prohibited
  • Get list of images in page, removing the prohibited one
  • Run the existing algorithm
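
The steps above can be sketched as a filter placed in front of the existing selection. This Python outline uses hypothetical names (select_with_exclusions, first_image), and first_image is only a placeholder for whatever the existing algorithm does.

```python
def select_with_exclusions(images_in_page, prohibited, existing_algorithm):
    """Drop prohibited images, then hand the rest to the existing algorithm.

    `images_in_page` lists image names in page order.
    `prohibited` lists names collected from the magic word or class.
    """
    prohibited_set = set(prohibited)
    candidates = [name for name in images_in_page if name not in prohibited_set]
    return existing_algorithm(candidates)


def first_image(candidates):
    """Placeholder for the existing algorithm: take the first candidate."""
    return candidates[0] if candidates else None
```

Because the filter only narrows the candidate list, a template can add its own icons to the prohibited list without any change to the pages that transclude it.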

One of the important requirements of the image selection is that the image must be in the article as otherwise this could be used for a non-obvious form of vandalism.

This is what is suggested in the task description. What's wrong with it to be completely ignored/rejected?

{{#pageimage:Foo.jpg}}

sets the page image to Foo.jpg if Foo.jpg is in the page.
If Foo.jpg is not in the page the command is ignored.

If, for whatever reason, the page has no images in the leading section, and editors want to display one specific image from the rest of the page that the algorithm does not choose, will they have to play whack-a-mole with every image in the page until the desired one appears as the page image? Note also that there's no way to tell which one is chosen during preview, which means the page will need to be saved and checked again for every test. Not to mention that an editor adding a new image has the potential to make it the new page image, which someone then needs to "exclude" again...
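
A small Python simulation of this concern, under stated assumptions: the pick function stands in for the real algorithm (here it simply takes the first remaining image), and each loop iteration represents one save in which the editor excludes whatever was chosen. The names saves_needed and first_image are hypothetical.

```python
def first_image(candidates):
    """Stand-in for the selection algorithm: take the first candidate."""
    return candidates[0] if candidates else None


def saves_needed(images, desired, pick):
    """Count the saves needed to reach `desired` using exclusion only."""
    excluded = set()
    saves = 0
    while True:
        chosen = pick([name for name in images if name not in excluded])
        if chosen == desired or chosen is None:
            return saves
        # The wrong image was chosen: exclude it and save the page again.
        excluded.add(chosen)
        saves += 1
```

With four images and the desired one last, three saves are needed under this stand-in, and inserting a new image at the top of the page later costs one more.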

Change 736926 restored by Jdlrobson:

[mediawiki/extensions/PageImages@master] Allow editors control of the page image

https://gerrit.wikimedia.org/r/736926

This is what is suggested in the task description. What's wrong with it to be completely ignored/rejected?

I never rejected the inclusion usecase. I simply said let's add the exclusion use case now.
The reason this task is taking so long (2015!) is that we are trying to create a perfect solution and as a result ending up with no solution. I see this all the time on Phabricator tickets.

The exclusion of an image is a far easier use case and problem to solve. It's been said multiple times in this thread. See T91683#7485861 (although note this latest suggestion is reconsidering the idea of having both) and previous comments for more details.

I suggest we address the exclusion problem first to have something we can actually use to address the problem of bad page images, which was almost done in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageImages/+/736926

We can iterate off that and add the inclusion case later but this task will no longer be in a paralyzed state and editors will actually have something useful to use to address the problem.

I think we should keep this simple - either the user defines the page image explicitly or the user marks page images not to use, rather than adding support for both. Using both adds some unnecessary complexity.

Jdlrobson you were absolutely right. Defining an image to be excluded is actively unwanted for several reasons
#1, as you said, having both creates unnecessary complexity. Unnecessary complexity in this code, unnecessary complexity for any other software that would need to understand the multiple methods, and unnecessary complexity for the community.
#2, as Ciencia Al Poder explained, an exclude option is so broken as to be substantially useless. An editor would need to run through the page defining all-but-one image as excluded, and then that kludge silently breaks when anybody adds a new image to the page.
#3 Dev time is a scarce resource. Bad "solutions" inhibit work on real solutions because (theoretically) the problem is already solved or mitigated. Any further work on the issue gets fatally de-prioritized. The existence of an exclude feature will actively undermine development of the feature we do need.

An exclude option does not address the user story or acceptance criteria, and we're literally better off without it.

My understanding of (some of) the discussion so far (from the last seven years):

  • The problem is that the wrong image is chosen sometimes.
  • Could we exclude transcluded images (i.e. to avoid things like icons in top-of-page message boxes)? No, because that would also exclude the main image in an infobox.
  • Has been a problem since ~2015.
  • The selection algorithm of PageImages is hard to understand for most contributors.
  • The MediaWiki:Pageimages-denylist is a per-wiki list of images that should never be chosen. It's not very widely known or used.
  • In February 2022 a new feature was added to PageImages, to allow an image in a page to be excluded by adding an HTML class of notpageimage. This solves lots of the existing issues.
  • The notpageimage system doesn't work if e.g. the first image on a page is excluded, but then later the page is rearranged and another image is added above it.
  • Would be better to just be able to specify which image is wanted.
  • Can Wikidata be queried for a good representative image (image (P18) property)? This is probably worth doing, but it's not enough because not all pages for a given item want to use the same image. Tracked in T95026.
  • It would be good to be able to specify the 'interest point' of an image, which should remain in frame if the image is to be cropped. (This is similar to what Wikidata-Page-Banner does with its origin parameter.)
  • It'd be possible to exclude some types of image by making regexes for exclusion, but this is complicated and not the wiki way.
  • It should be possible to say that there is no suitable image. This can only currently be done by adding notpageimage to every image.
  • Would be good to be able to retrieve a different page's page image, e.g. in a Lua module.
  • Adding a new parser function is not a good idea, because it has no UI, is not localizable, and doesn't centralize the definition of the image.
  • The chosen page image is currently shown in action=info.
  • Even manually-set images should be constrained by various rules, e.g. minimum resolution and aspect ratio.
  • {{#pageimage:}} should be used instead of {{pageimage:}} or {{PAGEIMAGE:}} to avoid confusion with templates.
  • Randomly choosing from the available appropriate images in a page is not worth doing (e.g. an article about an election might choose a controversial candidate's photo).
  • Other systems such as MediaViewer also want to be able to exclude images. Is there some commonality here?
  • Should a composite matrix image be created, e.g. like some photo gallery software does for albums? This is common on some Wikipedias (where the images are produced manually).
  • Should it be permitted to set a page image that isn't actually used on the page? If an image is not used, this becomes a vector for vandalism.
  • We shouldn't add both per-image exclusion and manual page-image-setting, because it's complicated.
  • Add new __DISABLE_PAGE_IMAGES__ magic word? (If there isn't going to be a way to set images.)
  • It needs to be easy to see which images are excluded and which included, otherwise a page might set an image but not have it displayed because some template is setting it as notpageimage.

The Select preview image proposal in the 2022 Community Wishlist Survey (and a similar one from 2021) has asked for the addition of the parser function for setting a page image.

I suggest that we should add {{#pageimage:}} now. Does this sound okay? Are there still objections? I think the main issues (and resolutions) are as follows:

  1. It adds additional complexity by having multiple different ways to include and exclude page images. Lots of people want this feature, and presumably are happy to accept the additional complexity.
  2. Vandalism will be easier. There are already lots of ways to vandalize a page (either directly or by editing images, templates, etc.), and it doesn't look like this feature would be very different to these. It's probably most similar to the {{SHORTDESC:}} function from Wikibase.

One last point of contention might be the form that this takes: should it be a parser function? (and if so, what capitalization?) or an additional class added to an existing image in a page?

Interesting summary of your readings in the last comment. Makes me wonder if there's a way to unify how we include and exclude images, but I guess that's tricky as we would need to update all the previous usages of excluded images.

Change 835079 had a related patch set uploaded (by Samwilson; author: Samwilson):

[mediawiki/extensions/PageImages@master] Add new {{#pageimage:}} parser function

https://gerrit.wikimedia.org/r/835079

Change 835295 had a related patch set uploaded (by Samwilson; author: Nik Gkountas):

[mediawiki/extensions/PageImages@master] refactor: move utility methods from ParserFileProcessingHookHandlers

https://gerrit.wikimedia.org/r/835295

@Samwilson thanks for summarizing so many moving pieces! It helps me digest all of the history/threads.

Just saw @Jdlrobson had concerns about the parser function due to some of the risks that you flagged Sam:

[please fix] We considered this a while back as one potential approach but we decided that if adding this we'd need to enforce that the image is actually inside the page.

Consider the following:

Since the page image is only visible at ?action=info this kind of vandalism could go undetected for some time, during which it would be shown to a variety of people inside apps and search.

We saw this problem before with Wikidata descriptions when we showed them on mobile web only.

The other challenge here, is what to do if the image in the page is removed, but the parser tag is not updated.

@Samwilson above you had asked about the possibility of adding a class instead -- what are some of the complexities/tradeoffs there?

Since the page image is only visible at ?action=info this kind of vandalism could go undetected for some time, during which it would be shown to a variety of people inside apps and search.

This won't be the case if the parser function also returns (outputs) the image name for using inside wiki markup, like the proposed example:

[[File:{{#pageimage:Foo.jpg}}|thumb|Main image of the page]]

Of course, another instance could be added inside a hidden fragment, like <div style="display:none">{{#pageimage:Vandalism.jpg}}</div>, but of course, the same applies if you add a vandalism image in a hidden element and then tag every other image with class=notpageimage, or even if such image is automatically picked by the current algorithm, something perfectly possible if it's added at the leading section of the page before the infobox.

The other challenge here, is what to do if the image in the page is removed, but the parser tag is not updated.

This approach will also solve this challenge.
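
A rough Python illustration of that idea: the parser function both records the requested image and outputs its name, so the same markup that sets the page image also displays it. The regular expression and expand_pageimage are hypothetical simplifications; real wikitext parsing in MediaWiki is considerably more involved.

```python
import re

# Simplified pattern for {{#pageimage:Name}}; nested templates are not handled.
PAGEIMAGE_RE = re.compile(r"\{\{#pageimage:([^{}|]*)\}\}")


def expand_pageimage(wikitext):
    """Return (expanded_wikitext, requested_image_or_None)."""
    requested = []

    def replace(match):
        name = match.group(1).strip()
        if name:
            requested.append(name)
        # Output the image name so it can be used inside [[File:...]].
        return name

    expanded = PAGEIMAGE_RE.sub(replace, wikitext)
    # Assumption: the first non-empty request on the page is the one used.
    return expanded, (requested[0] if requested else None)
```

If the surrounding [[File:...]] markup is removed from the page, the parser function goes with it, so the request cannot outlive the image.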

Next steps are being discussed in T319559.

For this part of the algorithm:

  • If the image is not in the page it is not used. No feedback is given to the user.

My use case is that I'd like a specific fallback image for some pages in a category that don't have any images, but I don't want to actually include the image in the page, so an option to turn off that requirement would be welcome.

Hi @Tiggleshorts I am interested in your use case. Where are you expecting this fallback image to show up? Fallbacks should be possible with the existing code. $wgPageImagesOpenGraphFallbackImage for example allows you to change the default image for the page when a link is shared to other websites.

Hi @Jdlrobson . After some thought it may be a bit more complex. I did see the case of setting $wgPageImagesOpenGraphFallbackImage , but that would set the fallback image for every page in the wiki right?

For my case, the main goal was to set pageimages for pages that didn't have an image (for the thumbnails in search), but the generic fallback would be too broad. So a fallback image for a category would be great.

I was thinking of setting this var to be transcluded for a specific category, even though the image is not in the actual page:
{{#pageimage:Foo.jpg}}

@Tiggleshorts but what are you using the page image for? Search results? Nearby articles? Something else?

@Jdlrobson Search results in my case - I recently upgraded my MediaWiki install to v.1.39 and the updated search which shows thumbnails of pages in search suggestions. I noticed most didn't have thumbnails as a lot of pages in my wiki don't have images. To better use this nice feature, editors have been adding an associated image to pages where practical.

Being able to set a fallback image for pages in a category (or just any page) would help populate images for thumbnails in a meaningful way if there is no specific image in the page.

@Tiggleshorts you can add a rule to MediaWiki:Common.css to style these:

e.g.

/* categories */
[href*="search=Category%3A"] .cdx-thumbnail__placeholder { background-image: url(/static/images/icons/categoryIcon.png); }
/* other placeholder */
.cdx-thumbnail__placeholder { background-image: url(/static/images/icons/placeholder.png); }
.cdx-thumbnail__placeholder svg { display: none; }

A configurable fallback wouldn't help you here, as different features have different fallback images. Hope that helps.

@Jdlrobson Thanks for the suggestion, I will try that out.

Change 835295 abandoned by Nik Gkountas:

[mediawiki/extensions/PageImages@master] refactor: move utility methods from ParserFileProcessingHookHandlers

Reason:

https://gerrit.wikimedia.org/r/835295

Change 835079 abandoned by Samwilson:

[mediawiki/extensions/PageImages@master] Add new {{#pageimage:}} parser function

Reason:

https://gerrit.wikimedia.org/r/835079