Images inside infoboxes are not being used as page image
Closed, ResolvedPublic

Description

pasted_file (150×488 px, 19 KB)

pasted_file (151×510 px, 18 KB)

There's a perfectly acceptable image in the infobox that the article should've used, so why is it selecting this weird graveyard map? [It's using the image from the navbox]

Event Timeline

Images on that page get parsed in a reeeally interesting order:

maxsem@tin$ mwscript eval.php enwiki

> $t=Title::newFromText('Shimon Peres');$wp=WikiPage::factory($t);$popts=new ParserOptions();$po=$wp->getParserOutput($popts);

> var_dump($po->getExtensionData('pageImages'));
array(51) {
  [0]=>
  array(7) {
    ["frame"]=>
    array(4) {
      ["link-title"]=> <zapped>
      ["alt"]=>
      string(19) "Page semi-protected"
      ["caption"]=>
      string(100) "This article is semi-protected to promote compliance with the policy on biographies of living people"
      ["title"]=>
      string(100) "This article is semi-protected to promote compliance with the policy on biographies of living people"
    }
    ["handler"]=>
    array(1) {
      ["width"]=>
      int(20)
    }
    ["horizAlign"]=>
    array(0) {
    }
    ["vertAlign"]=>
    array(0) {
    }
    ["filename"]=>
    string(18) "Padlock-silver.svg"
    ["fullwidth"]=>
    int(128)
    ["fullheight"]=>
    int(128)
  }
  [1]=>
  array(7) {
    ["frame"]=>
    array(3) {
      ["caption"]=>
      string(0) ""
      ["alt"]=>
      string(47) "Nation's Great Leaders Graves Map Mt. Herzl.png"
      ["title"]=>
      string(0) ""
    }
    ["handler"]=>
    array(1) {
      ["width"]=>
      int(350)
    }
    ["horizAlign"]=>
    array(0) {
    }
    ["vertAlign"]=>
    array(0) {
    }
    ["filename"]=>
    string(47) "Nation's_Great_Leaders_Graves_Map_Mt._Herzl.png"
    ["fullwidth"]=>
    int(1228)
    ["fullheight"]=>
    int(1515)
  }
  [2]=>
  array(7) {
    ["frame"]=>
    array(5) {
      ["frameless"]=>
      bool(false)
      ["upright"]=>
      string(1) "1"
      ["caption"]=>
      string(0) ""
      ["alt"]=>
      string(35) "Shimon Peres by David Shankbone.jpg"
      ["title"]=>
      string(0) ""
    }
    ["handler"]=>
    array(1) {
      ["width"]=>
      int(220)
    }
    ["horizAlign"]=>
    array(0) {
    }
    ["vertAlign"]=>
    array(0) {
    }
    ["filename"]=>
    string(35) "Shimon_Peres_by_David_Shankbone.jpg"
    ["fullwidth"]=>
    int(925)
    ["fullheight"]=>
    int(1273)
  }
  [3]=>
  array(7) {
    ["frame"]=>
    array(3) {
      ["alt"]=>
      string(0) ""
      ["caption"]=>
      string(24) "Shimon Peres's signature"
      ["title"]=>
      string(24) "Shimon Peres's signature"
    }
    ["handler"]=>
    array(2) {
      ["width"]=>
      int(128)
      ["height"]=>
      int(80)
    }
    ["horizAlign"]=>
    array(0) {
    }
    ["vertAlign"]=>
    array(0) {
    }
    ["filename"]=>
    string(26) "Shimon_Peres_Signature.svg"
    ["fullwidth"]=>
    int(317)
    ["fullheight"]=>
    int(74)
  }
 . . .

That explains why that image is being chosen; it's the first image that matches the criteria to be parsed. But why is an image that's almost right at the end of the article being parsed close to first? The infobox image is immediately after it.
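The recorded order is easier to see when the dump is reduced to its filenames. A minimal sketch, assuming the data has the shape shown in the dump above; the array here is a trimmed copy of the first four entries, not a live query. In the eval.php session the same call could be applied to `$po->getExtensionData('pageImages')`.

```php
<?php
// Trimmed copy of the first four entries from the dump above;
// only the keys needed for this illustration are kept.
$candidates = [
    ['filename' => 'Padlock-silver.svg', 'fullwidth' => 128, 'fullheight' => 128],
    ['filename' => "Nation's_Great_Leaders_Graves_Map_Mt._Herzl.png", 'fullwidth' => 1228, 'fullheight' => 1515],
    ['filename' => 'Shimon_Peres_by_David_Shankbone.jpg', 'fullwidth' => 925, 'fullheight' => 1273],
    ['filename' => 'Shimon_Peres_Signature.svg', 'fullwidth' => 317, 'fullheight' => 74],
];

// array_column() preserves array order, so this is the order in which the
// images were recorded, not necessarily their order in the wikitext.
$filenames = array_column($candidates, 'filename');

foreach ($filenames as $i => $name) {
    echo "$i: $name\n";
}
```

The graveyard map sits at index 1, ahead of the infobox portrait at index 2, even though it appears much later in the article.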

because it's free?

So is the image in the infobox, which has the advantage of not being the last image in the page. :-)

@bd808 had a hypothesis that it was something to do with the fact that the image is in an image map. I looked at a bunch of other articles with image maps, and a very large proportion of them did have the image map's image as the page image, irrespective of where the image map was in the article. I'm inclined to agree with his hypothesis that there's some weird interaction going on here that's causing this.

Hopefully T87336 will fix this, as it will mean that images in the lead section of the article get preference. We will track this in the meantime.

Jdlrobson renamed this task from "Page image is chosen seemingly at random on Shimon Peres article" to "Images inside infoboxes are not being used as page image". Nov 23 2016, 6:47 PM
Jdlrobson updated the task description.

because it's free?

So is the image in the infobox, which has the advantage of not being the last image in the page. :-)

Do you mean this image? https://en.wikipedia.org/wiki/File:Helen_Joanne_Cox.jpg#mw-jump-to-license
It's not free.

I was talking about Shimon Peres, which does have a free image in the infobox. The Jo Cox example was added after I wrote my comment. :-)

@Deskana thanks for clarifying. In that case the current example is misleading and maybe hiding the true cause of the problem.

@Jdlrobson why was the initial link changed to Jo Cox? These are two different cases.

@bmansurov I added a link, I didn't change it. At the time, after a little investigation, they looked to be the same problem, but given it's free I've removed it.

Here is what I learned so far:

  1. The template https://en.wikipedia.org/wiki/Template:Nation%27s_Great_Leaders_Plot,_Mount_Herzl makes use of #tag syntax to generate an image map.
  2. When the imagemap tag is parsed, it calls Parser::makeImage, which in turn runs the hook ParserMakeImageParams.
  3. PageImages listens to the ParserMakeImageParams hook and keeps a record of the images passed to the hook.
  4. Since the parser runs imagemap before parsing the images on the page, the image inside this tag is passed to PageImages before any other image that appears on the page.

To record images in the order they appear in the document, we can't rely on the ParserMakeImageParams hook.
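The ordering effect described in the steps above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not MediaWiki's parser, the file names are made up, and the `offset` field is hypothetical (this thread does not establish whether the real hook exposes a source position).

```php
<?php
// Toy model only: this is NOT MediaWiki's parser. It mimics the behaviour
// described above, where extension tags are expanded before ordinary
// image links, and a hook listener records each image as it is made.

$recorded = [];

// Hypothetical stand-in for a ParserMakeImageParams listener.
$onMakeImage = function (string $filename, int $offset) use (&$recorded): void {
    $recorded[] = ['filename' => $filename, 'offset' => $offset];
};

// Simplified page: each item is [kind, filename], listed in document order.
$page = [
    ['image', 'Infobox_portrait.jpg'],
    ['image', 'Signature.svg'],
    ['imagemap', 'Navbox_map.png'], // near the end of the page
];

// Pass 1: extension tags (such as imagemap) are handled first.
foreach ($page as $offset => [$kind, $filename]) {
    if ($kind === 'imagemap') {
        $onMakeImage($filename, $offset);
    }
}
// Pass 2: ordinary image links are handled afterwards.
foreach ($page as $offset => [$kind, $filename]) {
    if ($kind === 'image') {
        $onMakeImage($filename, $offset);
    }
}

$hookOrder = array_column($recorded, 'filename');

// One possible remedy (an assumption, not what PageImages does): keep each
// image's position in the source and sort by it before picking a candidate.
usort($recorded, fn(array $a, array $b): int => $a['offset'] <=> $b['offset']);
$documentOrder = array_column($recorded, 'filename');
```

In this model `$hookOrder` starts with the image map's file even though it is last on the page, while `$documentOrder` restores the order a reader would see.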

Nice detective work! So basically the image candidates are not returned in the order they appear in the page?

When we switch to images from the lead section only this will be less of a concern as it would only impact pages which have image maps in the lead (which I suspect is very rare).

Maybe we should ignore image maps altogether?

So basically the image candidates are not returned in the order they appear in the page?

Yes.

Can we merge this with T87336 or decline?
Technically this is doing the right thing. PageImages will find the best image based on width/height ratio, not on position.
T87336 will, however, make this problem go away.
Otherwise we might want to explore updating the ranking system to take into account whether it's in the lead (overkill given T87336).

I'm not sure if we should merge this with T87336, as the problem happens as described in T151276#2836821 and not because the ratio of the last image is better than the ratio of the first image.

We may decline this task once T87336 is resolved though.

If I understand the PageImages codebase correctly, the position of an image should only matter when an image that appears earlier in the list has the same score as one that appears later. We currently do not use section information in scores.

E.g. if image a is outside the lead but has score 20, and image c has score 20 but appears in the lead, image a will be used: [a:20, b:10, c:20]
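The tie-breaking behaviour in that example can be sketched as follows. `pickBestCandidate` is a hypothetical helper written for illustration, not code taken from PageImages; it only assumes that the highest score wins and that the earliest candidate is kept on a tie.

```php
<?php
// Hypothetical helper, not taken from the PageImages source: return the key
// of the highest-scoring candidate, keeping the earliest one on a tie.
function pickBestCandidate(array $scores): ?string {
    $bestName = null;
    $bestScore = null;
    foreach ($scores as $name => $score) {
        // Strict ">" means a later image with an equal score never
        // replaces an earlier one.
        if ($bestScore === null || $score > $bestScore) {
            $bestName = $name;
            $bestScore = $score;
        }
    }
    return $bestName;
}

$best = pickBestCandidate(['a' => 20, 'b' => 10, 'c' => 20]);
echo $best, "\n";
```

With the scores from the example, image a is selected because c only ties with it and comes later in the list.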

Jdlrobson claimed this task.

This is now resolved since we only load images from inside the lead section.