
Allow editors control of the page image
Open, NormalPublic

Description

Precursor

Since updating the page images algorithm is risky and we've hit issues in the past, T152252 should be resolved before work on this begins.

Story

PageImages should allow editors to specify the page image using a magic word (or maybe in image markup). To be eligible as the lead image, the image must actually be visible in the article.

For example, the first image of the article may be in the second section. That image may be appropriate to accompany the second section. However, there may be a more appropriate lead image (representing the topic as a whole) later in the article.

T90914: Provide semantic wiki-configurable styles for media display is about other markup options for images.

User story

As an editor, I want to be able to select the page image for pages that do not have an assigned image
As an editor, I want to be able to select the page image for pages where the current image is inappropriate

Background

Last year, we made a change to page images that prevented any image outside the lead section of an article from being selected as the page image (https://phabricator.wikimedia.org/T152115). This change was made because images from later sections were often displayed out of context and were not a good representation of the entire article. We would like to give editors the ability to select page images from the remaining images within the page, to increase the coverage of pages with page images. In addition, there might still be cases where an image appearing later in the page is more appropriate as the page image than the image selected from the lead section. This override would give editors control over selecting these images.

Acceptance criteria

  • Create markup that allows editors to assign any image present on the page as the page image
  • If multiple images are assigned, the first instance will be the image used
  • If the image assigned is no longer present on the page, we must default back to selecting the image as the first image from the lead section or infobox
  • If a user-selected image is no longer available on the page, the default image for the page will be used
  • If the image the user selects is not available on the page, the user must see an error upon saving their edit
  • If the image the user selects is of resolution lower than the minimum requirements, the user must see an error upon saving their edit
  • Add page image sourcing as per https://phabricator.wikimedia.org/T91683#4088612
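
The save-time checks in the criteria above could be sketched roughly as follows. This is only an illustration in Python, not PageImages code: the regular expression, the function names, the `MIN_WIDTH` value and the idea of passing image widths in a dictionary are all assumptions made for the example. Skipping empty assignments when looking for the "first instance" is also an assumption.

```python
import re
from typing import Dict, List, Optional

# Hypothetical pattern for the proposed {{#pageimage:...}} markup.
PAGEIMAGE_RE = re.compile(r"\{\{#pageimage:([^{}|]*)\}\}")

# Assumed minimum width in pixels; the task does not state the real threshold.
MIN_WIDTH = 120


def first_assigned_image(wikitext: str) -> Optional[str]:
    """Return the first non-empty assigned file name, or None if there is none."""
    for match in PAGEIMAGE_RE.finditer(wikitext):
        name = match.group(1).strip()
        if name:
            return name
    return None


def validate_on_save(wikitext: str, images_on_page: Dict[str, int]) -> List[str]:
    """Return the errors an editor would see on save.

    images_on_page maps each file name present on the page to its width in pixels.
    """
    errors: List[str] = []
    name = first_assigned_image(wikitext)
    if name is None:
        return errors
    if name not in images_on_page:
        errors.append(f"{name} is not present on the page")
    elif images_on_page[name] < MIN_WIDTH:
        errors.append(f"{name} is below the minimum resolution")
    return errors
```

If the assigned image later disappears from the page, the same lookup returns an error rather than an image, at which point the caller would fall back to the default selection.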

Markup

{{#pageimage:Foo.jpg}}

sets the page image to Foo.jpg if Foo.jpg is in the page.
If Foo.jpg is not in the page the command is ignored.

{{#pageimage:}}

does not blank the page image.

Note: It should be possible for the magic word to be used in place of an image like so:

[[File:{{#pageimage:Foo.jpg}}|thumb|Main image of the page]]

The algorithm

Will consider the page image provided by the user.

  • If the image is a bad choice (has a negative score) it is ignored. No feedback is given to the user.
  • If the image is not in the page it is not used. No feedback is given to the user.
  • If the image has a positive score, it is used regardless of whether there is an image with a higher score.
  • If the image is an empty string, it has no impact on the image choice. No feedback is given to the user.
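
A minimal sketch of the four rules above, in Python for illustration only. The function name, the `scores` dictionary (file name to score for every image in the page) and the `automatic_choice` argument are assumptions, not part of PageImages. The rules do not say what happens for a score of exactly zero; this sketch treats zero as "not positive" and falls back to the automatic choice.

```python
from typing import Dict, Optional


def choose_page_image(
    user_choice: str,
    scores: Dict[str, float],
    automatic_choice: Optional[str],
) -> Optional[str]:
    """Apply the editor's choice if it is usable, else keep the automatic one."""
    name = user_choice.strip()
    if not name:
        # Empty string: no impact on the image choice.
        return automatic_choice
    if name not in scores:
        # Image is not in the page: silently not used.
        return automatic_choice
    if scores[name] <= 0:
        # Negative score is a bad choice; zero is an assumption (see above).
        return automatic_choice
    # Positive score: used even if another image scores higher.
    return name
```

For example, with scores `{"Lead.jpg": 5.0, "Later.jpg": 2.0}` and an automatic choice of `Lead.jpg`, an editor choice of `Later.jpg` wins despite its lower score.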

Sign off steps

Note: Allowing images from outside is a risky change and is outside the scope of this task, but we'll want to consider that as part of sign off.

Risk:

  • Currently we limit images to the lead section on Wikipedia. After these changes are in place, we may want to relax this. This adds risk, however, as it means we may expose many unsuitable images where previously we showed none.
  • There is a danger we may need to revert the change and run maintenance scripts to revert it. We should be prepared.
  • The page image choice bubbles into the services layer, so we'll also need to make them aware of the change.

Event Timeline

I see no way in which that task blocks this one.

That was the solution which achieved broadest consensus, but sure, it's not the only option.

Alsee added a subscriber: Alsee. Edited Mar 15 2016, 11:10 PM

Any magicword needs to accept "none" as the image. This will override any automatic image selection and treat the page like an article with no images. This is needed in circumstances where none of the images in the article are remotely appropriate for representing the topic. A biography article may not have any available image of the person, but may contain an image of their spouse, or even something they fought against. (Someone opposed to pornography shouldn't be represented by a movie they tried to get banned, and a politician who lost an election shouldn't be represented by the politician who won.)

When the software rejects the first image (because it's non-free or strange height/width etc), and there is no magicword present to indicate a substitute, never blindly grab the second image. Treat it like an article with no images.

@Alsee correct.
Based on previous implementations for the Wikivoyage community's page banners we'd do something like:

{{PAGEIMAGE:}} would disable the pageimage for the current page.
{{PAGEIMAGE:Filepage title}} would force the provided file to be the page image for the current page
When neither of the above is present use automatic page image.
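
The three cases in this comment could be sketched like this (Python, for illustration only; the regular expression and function name are hypothetical). Note that here an empty `{{PAGEIMAGE:}}` disables the page image, which differs from the `{{#pageimage:}}` behaviour in the task description, where an empty value does not blank the image.

```python
import re
from typing import Optional

# Hypothetical pattern for the {{PAGEIMAGE:...}} form proposed in this comment.
PAGEIMAGE_RE = re.compile(r"\{\{PAGEIMAGE:([^{}|]*)\}\}")


def resolve_page_image(wikitext: str, automatic: Optional[str]) -> Optional[str]:
    """Return the forced file, None if disabled, or the automatic choice."""
    match = PAGEIMAGE_RE.search(wikitext)
    if match is None:
        # Markup absent: use the automatic page image.
        return automatic
    name = match.group(1).strip()
    # Empty value disables the page image; otherwise force the given file.
    return name or None
```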

Qgil removed a subscriber: Qgil. Mar 16 2016, 11:27 AM
MaxSem removed a subscriber: MaxSem. Mar 16 2016, 8:04 PM
Alsee added a comment. Edited Mar 16 2016, 10:30 PM

@Jdlrobson thanx. It's currently grabbing the next available image.

The biography for Fabia Drake is currently returning an image of Tallulah Bankhead, and there was a 50% chance it would have been an utterly random image of an ocean liner instead. Someone else came across a bio displaying a spouse.

  1. Make a parser function like {{#pageimage:File:Something or other.jpg}} that overrides the current automatic selection heuristics

This task is about setting a page image, but we may also want a parser function for getting (retrieving) a page image. For example, retrieving the page image of "White House" when viewing the "Barack Obama" article. In thinking about names for parser functions, we should be cognizant of this.

Jdlrobson renamed this task from Allow specifying lead image to Allow editors control of the page image. May 16 2016, 6:09 PM
Jdlrobson updated the task description.
Abbe98 added a subscriber: Abbe98. Jun 22 2016, 3:19 PM

A contributor shared some thoughts related to this task on MediaWiki.org. Cross-linking here in hopes someone with more insight could respond to the concerns on-wiki.

https://www.mediawiki.org/wiki/Topic:Tgognjiumq40ng1s

debt added a subscriber: debt. Apr 12 2017, 2:42 PM

While documenting, I did discover editors have some control...
https://www.mediawiki.org/wiki/Extension:PageImages#Can_I_exclude_certain_page_images.3F

I thought I'd add this in case it was useful information.

Tbayer added a subscriber: Tbayer. Jul 26 2017, 7:11 PM

T92457 raises the question of whether this should support images that are video stills (via the thumbtime parameter).

Some thoughts that we should fold up into the description to help get this ready.

  • What is the problem we're trying to solve?
    • Is it blacklisting certain images?
    • Is it allowing images not in the page (I think this could become a potential source of vandalism and unmaintainable)?
  • Currently editors can blacklist images inside a page e.g. https://en.m.wikipedia.org/wiki/MediaWiki:Pageimages-blacklist - this is not great as it applies globally and does not scale.
    • I'm curious why this was done this way. There are possibly technical reasons. We will need to explore.
    • It seems sane to move this blacklist
  • Both whitelist and blacklist approaches lead to some complexities we'll have to think about.
    • Edge cases:
      • What do we do if an editor chooses a bad page image e.g. an image that is 10px by 10px?
      • What if an editor chooses an image that is no longer on the page?
      • What happens if the page image choice is restricted to the lead section and an editor chooses one outside the lead section?
    • Do we create an editor warning when a user assigns an invalid page image to surface these to editors?
ovasileva updated the task description. Mar 14 2018, 5:50 PM

Something like the {{#setmainimage: }} could be used, as it's done with Extension:OpenGraphMeta. It returns the image, which can be used directly inside templates.

In some specific scenarios the extension may fail to choose the right image (for whatever internal parser priorities or interactions with other extensions), example in https://www.mediawiki.org/wiki/Topic:U9lt48mqqvawvj5p . This one wasn't obvious and blacklisting random images wouldn't be helpful.

Jdlrobson updated the task description. Mar 21 2018, 3:36 PM
Jdlrobson updated the task description. Mar 27 2018, 3:54 PM

In terms of giving more transparency to how this works, we could make a row show up on https://en.wikipedia.org/wiki/Offset_printing?action=info which explains how the page image was sourced.

ovasileva updated the task description. Edited Mar 29 2018, 10:00 AM

@Jdlrobson - we have current constraints on resolution, right?

https://m.mediawiki.org/wiki/Extension:PageImages#Image_choice

It's not clear how these constraints would apply to a user defined image. My preference would be to ignore them to keep implementation as simple as possible.

pajz added a subscriber: pajz. Edited Apr 24 2018, 6:23 AM
  • What is the problem we're trying to solve?
    • Is it blacklisting certain images?
    • Is it allowing images not in the page (I think this could become a potential source of vandalism and unmaintainable)?

I'd just add that it is also about excluding images that just don't make sense for a given page. E.g., we received word yesterday in OTRS from a Wikipedia reader, who pointed us to the picture choice for https://de.wikipedia.org/wiki/Parlamentarischer_Gesch%C3%A4ftsf%C3%BChrer (chief whip). The page contains only party logos. None of these make sense to illustrate the topic of chief whip. (To make matters worse, the algorithm picked the logo of a far-right party, causing considerable irritation.) So it should also be possible to just prevent the display of any image.

Jdlrobson added a comment. Edited Apr 24 2018, 4:36 PM

We tried to estimate this and landed on 8s, 13s and 20s. There's a lot of work and risk here. I recommend setting up a specific meeting or repurposing a grooming session specifically to talk through the data. I'll update the task, folding in the answers to the open questions. cc @ovasileva can you schedule this when you get back? (30 minutes should suffice.)

Also see comment above regarding "So it should also be possible to just prevent the display of any image." - should we adapt the current plan?

Jdlrobson updated the task description. Apr 24 2018, 4:57 PM

Markup
{{pageimage|Foo.jpg}}
sets the page image to Foo.jpg if Foo.jpg is in the page.
If Foo.jpg is not in the page the command is ignored.

Have you considered my comment on T91683#4059642? If the parser function returns the same input, it can be used *in place* of the image. For example: [[File:{{#pageimage:Foo.jpg}}|thumb|Main image of the page]]. This will also make it easier for people to use it for existing images on the page, and prevent someone from changing the image used on the page without also changing the pageimage contents.

Also please use {{#pageimage:Foo.jpg}} instead of {{pageimage|Foo.jpg}} to avoid it being confused with a template.
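
The pass-through idea in this comment can be sketched as follows. This is an illustration in Python, not how MediaWiki parser functions are actually registered: the regular expression and `expand_pageimage` are hypothetical names. The function replaces each `{{#pageimage:...}}` with its own argument, so the surrounding image markup still renders, and records the first non-empty argument as the chosen page image.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for the proposed {{#pageimage:...}} parser function.
PAGEIMAGE_RE = re.compile(r"\{\{#pageimage:([^{}|]*)\}\}")


def expand_pageimage(wikitext: str) -> Tuple[str, Optional[str]]:
    """Return (expanded wikitext, chosen page image or None)."""
    chosen: Optional[str] = None

    def replace(match: "re.Match[str]") -> str:
        nonlocal chosen
        name = match.group(1).strip()
        if chosen is None and name:
            # Record only the first non-empty assignment.
            chosen = name
        # Pass the argument through unchanged so it can sit inside [[File:...]].
        return name

    return PAGEIMAGE_RE.sub(replace, wikitext), chosen
```

Because the file name exists in only one place, editing the image in the markup and editing the page image choice become the same edit.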

Jdlrobson updated the task description. Apr 24 2018, 7:38 PM
Jdlrobson updated the task description. Apr 24 2018, 7:51 PM

@ovasileva any updates on this?

Sounds like we need some community consultation and a specification on how this should work, as there doesn't seem to be agreement on the smaller details. If this is important, I'd suggest we build a spec similar to how we did the summary endpoint and schedule that activity via a spike.

Serious problem case: Articles on upcoming elections may contain an image of each candidate. Page image grabs one political-candidate photo and uses it as the image for the election itself. Yikes!

The only workaround I can think of offhand is to move the infobox (and images) out of the lead section. But this issue is going to crop up on each new election article, until someone spots it. And even when someone spots the problem they likely won't know how to fix it.

Maybe a randomized selection would work? So if there are 3 candidates with 3 images it displays one at random?

Randomization wouldn't be much of an improvement. At minimum it still hurts us when some people think we are aligned with one candidate and other people think we are aligned with another candidate. I'll also note that the infobox may contain five or more candidates, some of whom are extremely fringe candidates with no chance of winning. We could literally select a Nazi.

The basic problem is that none of the images are remotely representative of the article as a whole. We either need some neutral image (which generally doesn't exist in these articles), or no image at all.