PageImages not compatible with webm files
Closed, Declined | Public | 2 Story Points

Description

Report at https://www.mediawiki.org/wiki/Topic:Scb851zpf3p2i230
Visit https://en.wikipedia.beta.wmflabs.org/wiki/Category:Page_Previews and hover over "Test video hovers". A black box shows.

Possible solutions

Allow editors to set a default thumbnail for videos

Blacklisting

  • Blacklist webm files from PageImages extension

Developer notes

There are 2 possible ways to do this. Inside LinksUpdateHookHandler::getUrlBlacklist or operating on LinksUpdateHookHandler::fetchFileMetadata

Signoff criteria


Needs investigation to consistently reproduce.

ovasileva lowered the priority of this task from Normal to Low.Jan 26 2017, 4:13 PM
ovasileva added a subscriber: ovasileva.

hovercards/pageimages should not be returning images. It seems the image displayed is actually the image appearing in the infobox of the article

Jdlrobson raised the priority of this task from Low to Normal.Jun 2 2017, 8:59 PM

hovercards/pageimages should not be returning images. It seems the image displayed is actually the image appearing in the infobox of the article

Not correct.

The image being used is a video. See https://de.wikipedia.org/wiki/Citizenfour?action=info

https://upload.wikimedia.org/wikipedia/commons/thumb/c/cd/CITIZENFOUR_%282014%29_trailer.webm/300px--CITIZENFOUR_%282014%29_trailer.webm.jpg is being chosen as the page image

This looks like a special case which would be best fixed by T91683 IMO
But we should check the PageImage extension doesn't use webm files as page images and blacklist if necessary.

@Jdlrobson - I agree, at least as a temporary solution prior to T91683: Allow editors control of the page image

Jdlrobson updated the task description. (Show Details)Jun 6 2017, 4:54 PM
ovasileva set the point value for this task to 2.Jun 6 2017, 5:06 PM
ovasileva updated the task description. (Show Details)Jun 22 2017, 1:39 PM
Jdlrobson renamed this task from PageImages should use the specified thumbnail-timestamp for video files to PageImages should blacklist webm files.Jun 22 2017, 7:09 PM
Tbayer added a subscriber: Tbayer.Jul 26 2017, 7:09 PM

@Jdlrobson - I agree, at least as a temporary solution prior to T91683: Allow editors control of the page image

Agree that blacklisting is OK as a temporary solution. But we should be aware that T91683 puts quite a lot of unnecessary work on editors. As the task description says, the proper solution is to support thumbtime in the preview directly. (Also, even the manual workaround option will depend on whether T91683 will support thumbtime itself.)

Jdlrobson added a comment.EditedJul 26 2017, 7:54 PM

I disagree. We should measure how many pages are impacted by this issue. My guess is low. I'd argue an article with only a video in the lead would probably benefit from an image too.

Tbayer added a comment.Sep 6 2017, 5:28 PM

I disagree. We should measure how many pages are impacted by this issue. My guess is low. I'd argue an article with only a video in the lead would probably benefit from an image too.

Not quite sure what "disagree" referred to exactly here. But it should be quite obvious that asking editors to replace the existing editorial choice (video still) with a different image means creating manual work that would be unnecessary if previews were able to handle that image format. I agree it may be worth measuring how many pages are affected, and if the number is very low, this solution (blacklisting webms and having editors replace the image) could be an acceptable tradeoff - but it would still be a tradeoff, requiring some amount of volunteer work to work around shortcomings of our code.

@Jdlrobson, @ovasileva, is some further discussion with @Tbayer necessary to work on this task?

ovasileva lowered the priority of this task from Normal to Low.Oct 10 2017, 12:29 PM
Jdlrobson updated the task description. (Show Details)Nov 15 2017, 7:46 PM
ovasileva moved this task from Backlog to Next Up on the Page-Previews board.Mar 15 2018, 3:55 PM
Krinkle added a subscriber: Krinkle.

Looks like this is still broken.


At https://en.wikipedia.org/wiki/Ready_Player_One#Plot, the link to Scoreboard is enhanced with a preview attempting to use a raw .webm file as target for <image href> inside an SVG, which creates a broken thumbnail.

Page summary API data (https://en.wikipedia.org/api/rest_v1/page/summary/Score_(game)):

"thumbnail": {
  "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/9/9e/Greg_gets_59m_high_score_in_Solipskier.webm/320px--Greg_gets_59m_high_score_in_Solipskier.webm.jpg",
  "width": 320,
  "height": 180
},
"originalimage": {
  "source": "https://upload.wikimedia.org/wikipedia/commons/9/9e/Greg_gets_59m_high_score_in_Solipskier.webm"
}

It seems the API does provide a thumbnail, but the JS code is deciding to use the original instead of restricting to thumbnails. Not sure why. There shouldn't be any scenario in which a consumer should use an original directly on a page.

Internally within MediaWiki, there are some edge cases where a small photo may not be worth re-scaling if it is already small and in a page/browser-compatible format, in which case MediaWiki is known to re-use the original file url and advertise it as a "thumbnail". That's fine for MediaWiki to do internally, but there should be no need for consumers to do that. (example 1, example 2)

But.. maybe the page summary API is breaking that, which would explain why Popups considers the originalimage?

Maybe the bug summary should be rephrased: the issue is that page previews attempt to load something that is not suitable as thumbnail.

ovasileva raised the priority of this task from Low to Normal.Apr 11 2018, 12:45 PM

@Krinkle - interesting...it's showing the thumbnail for me:

Regardless, might be a good time to fix this anyhow.

@ovasileva It likely depends on device pixel ratio. It only happens if dppx 2.0 or higher (aka "Retina" or "HiDPI") as otherwise Popups logic will probably use the thumbnail. See https://mydevice.io/ for example.

Should we use a whitelist (.gif, .jpg, .png, ...) instead of a blacklist? We recently had an issue with previews using .ogv files which rendered as a similar broken image (it seems to work now though).
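A minimal sketch of what such a whitelist check could look like on the client, assuming it is applied to the URL about to be placed in the preview. The function name and the exact set of extensions are hypothetical and not taken from Popups or PageImages.

```javascript
// Hypothetical allow-list; the exact set of extensions is an assumption.
const ALLOWED_EXTENSIONS = new Set(['gif', 'jpg', 'jpeg', 'png']);

// Returns true when the URL's path ends in one of the allowed extensions.
function isDisplayableImage(url) {
  const path = new URL(url).pathname;
  const match = /\.([a-z0-9]+)$/i.exec(path);
  return match !== null && ALLOWED_EXTENSIONS.has(match[1].toLowerCase());
}
```

With this check, a rendered video still (a URL ending in .webm.jpg) passes, while a raw .webm or .ogv original is rejected.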

brion added a comment.Apr 25 2018, 5:37 PM

Is there a problem preventing using still images from videos, specifically, or is there another reason y'all are trying to blacklist videos? And is it just videos or anything with extra parameters? Does this affect PDF, TIFF, and DjVu files with multiple pages, for instance, or SVG files rendered with a particular language?

Jdlrobson updated the task description. (Show Details)Apr 25 2018, 5:45 PM

@brion the issue is that the thumbnail of videos can sometimes be a blank screen e.g. https://en.m.wikipedia.org/wiki/File:CITIZENFOUR_(2014)_trailer.webm

The most ideal solution here would be for editors to be able to set specific thumbnails for video at a certain frame point. Any ideas on how to do that? I'd love for us to not have to touch PageImages here.

brion added a comment.Apr 25 2018, 5:55 PM

Editors can use the thumbtime parameter to set the thumbnail to load from a particular timestamp: https://www.mediawiki.org/wiki/Extension:TimedMediaHandler#Syntax_synopsis

A default thumbtime for the File: page can't currently be set, but any regular usages in articles can set them; for instance for

[[File:CITIZENFOUR (2014) trailer.webm|thumbtime=1:00]]

you'll get the more attractive thumbnail:

https://upload.wikimedia.org/wikipedia/commons/thumb/c/cd/CITIZENFOUR_%282014%29_trailer.webm/1280px-seek%3D60-CITIZENFOUR_%282014%29_trailer.webm.jpg

(Note the seek=60 in the filename in the place where there's blank on the default thumbnail.)
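To illustrate the naming pattern, here is a rough sketch that turns a thumbtime value into seconds and builds the last path segment of the thumbnail URL. The pattern is inferred only from the two URLs quoted in this task (default and seek=60). Both function names are hypothetical, and how TimedMediaHandler actually builds these names is not confirmed here.

```javascript
// Converts a thumbtime value such as "1:00", "0:01:00" or "60" to seconds.
function thumbtimeToSeconds(thumbtime) {
  return thumbtime
    .split(':')
    .reduce((total, part) => total * 60 + Number(part), 0);
}

// Builds the last path segment of a video thumbnail URL, following the
// pattern "<width>px-<seek parameter>-<file name>.jpg". The seek part is
// left empty when no thumbtime is given, which yields the double hyphen
// seen in the default thumbnail.
function videoThumbName(fileName, width, thumbtime) {
  const seek = thumbtime ? 'seek%3D' + thumbtimeToSeconds(thumbtime) : '';
  return width + 'px-' + seek + '-' + fileName + '.jpg';
}
```

For the file above, a width of 1280 and thumbtime=1:00 reproduce the 1280px-seek%3D60-... name, and omitting the thumbtime gives the default form with the blank.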

(I think there might be an old task open for providing a way to set a default to be used.)

@brion but the problem is https://en.wikipedia.beta.wmflabs.org/wiki/Special:ApiSandbox#action=query&format=json&prop=pageimages&titles=Test_video_hovers&piprop=thumbnail%7Cname%7Coriginal&pithumbsize=50&pilimit=50 returns the default thumbnail... essentially this code:

$file = wfFindFile( $fileName );
$thumb = $file->transform( [ 'width' => $size, 'height' => $size ] );

The thumbtime parameter only seems to apply in the wiki it's used on, so ideally we'd need some way to define the default thumb within the file page.
Do you have any sense of how difficult it would be to do that and how that could be done?

A default thumbtime for the File: page can't currently be set,

I think that would be the perfect solution here.

brion added a comment.Apr 25 2018, 6:32 PM

Ok, so looks like two possible routes here (which may work together nicely, as well):

  1. Provide a way to set a default thumb time that's used when no parameter is provided, which would then get used when generating the thumbs from PageImages.
  2. Have PageImages save any non-size parameters used alongside the filename, store them in the page props, and re-apply them when running the transform.
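The second route can be sketched roughly as follows. This is only an illustration of the data flow, written in JavaScript rather than as PageImages code. The function names, the record shape, and the list of size parameters are all assumptions.

```javascript
// Parameter names treated as size-related; this list is an assumption.
const SIZE_PARAMS = ['width', 'height'];

// Returns the record that would be stored for a page image: the file name
// plus every non-size parameter used where the file appears in the article.
function toStoredPageImage(fileName, usageParams) {
  const params = {};
  for (const [key, value] of Object.entries(usageParams)) {
    if (!SIZE_PARAMS.includes(key)) {
      params[key] = value;
    }
  }
  return { fileName, params };
}

// Merges the stored parameters with the size requested by the consumer,
// giving the parameter set that would be passed to the transform.
function toTransformParams(stored, size) {
  return { ...stored.params, width: size, height: size };
}
```

Under this sketch a thumbtime used in the article survives into the transform, while the article's own width is replaced by the size the consumer asks for.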

There's then the separate issue of whether the raw source file gets used in some circumstances, such as high devicePixelRatio?

The thumbtime parameter only seems to apply in the wiki it's used on, so ideally we'd need some way to define the default thumb within the file page.
Do you have any sense of how difficult it would be to do that and how that could be done?

Let's revive T22647 for that discussion! Original suggestion was something like a keyword or parser function to be used on the File: page and saved into metadata of some sort. That probably works, though we need to make sure the default time gets exported across wikis for Commons (including via InstantCommons).

Jdlrobson added a subscriber: Tgr.Apr 25 2018, 6:42 PM
Krinkle added a comment.EditedApr 28 2018, 1:38 AM

The suggestions and ideas around video thumbnails sound good, but I think we're getting distracted from the original issue. Even if we have customisable video thumbnails, it won't solve the underlying issue. The issue is that Popups sometimes uses the original file instead of the thumbnail. This is a bug. It must not do that.

The main problem described in this task is a video file being downloaded for an image element. This causes visible corruption (see F16907798), and needlessly consumes network/device resources.

There are also likely other ways original files may cause problems such as with raw SVGs (MediaWiki intentionally does not display user-uploaded SVGs directly on pages), and (paged) TIFF files, 3D files, and more.

The client should always use a thumbnail provided by the API, or at least a url that is derived from the thumbnail url (Popups sometimes increases the width based on devicePixelRatio, which is fine). It should pick either the default thumbnail, a larger variant, or no thumbnail.

That should be relatively easy to change.
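A rough sketch of that selection rule, assuming a page summary response with the thumbnail and originalimage fields shown in this task. The function name is hypothetical and this is not the actual Popups code. It also assumes the width prefix in the thumbnail file name can be rewritten to request a larger variant.

```javascript
// Picks the image URL for a preview from a page summary response.
// Only `thumbnail` is used as a source; `originalimage` serves solely as an
// upper bound on the requested width and is never returned.
function pickPreviewImage(summary, devicePixelRatio) {
  if (!summary.thumbnail) {
    return null; // no thumbnail available: show no image
  }
  const { source, width } = summary.thumbnail;
  let target = Math.round(width * devicePixelRatio);
  if (summary.originalimage && summary.originalimage.width) {
    target = Math.min(target, summary.originalimage.width);
  }
  if (target <= width) {
    return source; // the default thumbnail is large enough
  }
  // Thumbnail file names start with "<width>px-"; swap in the larger width.
  return source.replace(/\/\d+px-([^/]*)$/, '/' + target + 'px-$1');
}
```

The result is always the default thumbnail, a wider URL derived from it, or null, so a raw video or SVG original never ends up in the image element.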

Future

Alternatively, instead of the current client-side implementation, the Page Summary API could provide one or two extra thumb urls for common ratios. The MediaWiki imageinfo API that Page Summary uses already has batching to facilitate this.

MediaWiki does internally have a narrow set of scenarios in which it may expose a "thumbnailurl" field that matches the original. (Mostly for JPEGs as optimisation, up for debate at T67383). Popups will already handle that transparently like a thumbnail. But its own code shouldn't consider the original. originalimage may not even need to be in the Page Summary API?

The issue is that Popups sometimes uses the original file instead of the thumbnail. This is a bug. It must not do that.

I'm not aware of this happening. Can you qualify what you mean by "sometimes"? I've never seen this happen so would like to reproduce as this sounds like another bug. We use the thumbnail from https://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/summary/Test_video_hovers which comes from the mediawiki API.

Jdlrobson renamed this task from PageImages should blacklist webm files to PageImages not compatible with webm files.Apr 30 2018, 4:47 PM
Jdlrobson updated the task description. (Show Details)

I think the line in question is here (although I don't have an example handy): https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/Popups/+/master/src/gateway/rest.js#139.

If that's turning thumbnails into videos, that's news to me. We should file a separate ticket with replication steps.

@Jdlrobson, well, what I mean is that if you look at an example like:

// https://en.wikipedia.org/api/rest_v1/page/summary/Completing_the_square

{
  "type": "standard",
  "title": "Completing the square",
  "displaytitle": "Completing the square",
  "namespace": {
    "id": 0,
    "text": ""
  },
  "titles": {
    "canonical": "Completing_the_square",
    "normalized": "Completing the square",
    "display": "Completing the square"
  },
  "pageid": 303500,
  "thumbnail": {
    "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3d/Completing_the_square.ogv/320px--Completing_the_square.ogv.jpg",
    "width": 320,
    "height": 240
  },
  "originalimage": {
    "source": "https://upload.wikimedia.org/wikipedia/commons/3/3d/Completing_the_square.ogv",
    "width": 640,
    "height": 480
  },
  "lang": "en",
  "dir": "ltr",
  "revision": "836342689",
  "tid": "0f82bedc-3fa5-11e8-8178-ff9ff9b54684",
  "timestamp": "2018-04-14T05:31:14Z",
  "description": "technique used to solve a quadratic equation.",
  "content_urls": {
    "desktop": {
      "page": "https://en.wikipedia.org/wiki/Completing_the_square",
      "revisions": "https://en.wikipedia.org/wiki/Completing_the_square?action=history",
      "edit": "https://en.wikipedia.org/wiki/Completing_the_square?action=edit",
      "talk": "https://en.wikipedia.org/wiki/Talk:Completing_the_square"
    },
    "mobile": {
      "page": "https://en.m.wikipedia.org/wiki/Completing_the_square",
      "revisions": "https://en.m.wikipedia.org/wiki/Special:History/Completing_the_square",
      "edit": "https://en.m.wikipedia.org/wiki/Completing_the_square?action=edit",
      "talk": "https://en.m.wikipedia.org/wiki/Talk:Completing_the_square"
    }
  },
  "api_urls": {
    "summary": "https://en.wikipedia.org/api/rest_v1/page/summary/Completing_the_square",
    "edit_html": "https://en.wikipedia.org/api/rest_v1/page/html/Completing_the_square",
    "talk_page_html": "https://en.wikipedia.org/api/rest_v1/page/html/Talk:Completing_the_square"
  },
  "extract": "In elementary algebra, completing the square is a technique for converting a quadratic polynomial of the form",
  "extract_html": "<p>In elementary algebra, <b>completing the square</b> is a technique for converting a quadratic polynomial of the form</p><dl><dd><span class=\"mwe-math-element\"><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/126c6935d3dd9f1c1da0c388ca2799be4f6f237c\" class=\"mwe-math-fallback-image-inline\" aria-hidden=\"true\" style=\"vertical-align:-0.505ex;width:12.629ex;height:2.843ex;\" /></span></dd></dl>"
}

originalimage is a video.

Tbayer added a comment.May 2 2018, 3:04 AM

A default thumbtime for the File: page can't currently be set,

I think that would be the perfect solution here.

I don't think this would be a good solution.

First, it would create a lot of extra work for the editors who would have to transfer locally chosen thumbtimes to Commons as the default thumbtime, and update them there in case they are changed locally.

Second, different articles may require different thumbtimes - that's not a very exotic situation, in fact I just recently uploaded a webm myself where this was the case, see https://commons.wikimedia.org/wiki/File:Robert_Schumann_-_Fantasiest%C3%BCcke,_Op._73_(Narek_Hakhnazaryan,_Cello,_Roman_Rabinovich,_Piano,_2008).webm#globalusage (for the article https://en.wikipedia.org/wiki/Narek_Hakhnazaryan , a still focusing on one of the two players was used, but for the other two articles that would have been less suitable)

@Jdlrobson, well, what I mean is that if you look at an example like:

Splitting out that conversation into T193792

First, it would create a lot of extra work for the editors who would have to transfer locally chosen thumbtimes to Commons as the default thumbtime, and update them there in case they are changed locally.

I'm not sure it's true that it creates a lot of work.
We've only seen 1 case in the wild so far where the page image has been the initial screen and the initial screen of that video was a black screen. The default of the first frame would continue to work without any editor intervention, but it would give editors an option to override that where necessary. Articles would continue to be able to use thumbtime as far as I'm concerned. We should probably move this discussion to T22647. Anyhow, this seems a lot better than blacklisting webm files altogether which was the original plan and what's suggested here...?

Jdlrobson changed the task status from Open to Stalled.May 3 2018, 8:09 PM

I'm stalling this task until we've worked out whether T22647 is viable - let's talk about that there!

ovasileva closed this task as Declined.Jun 12 2018, 4:34 PM

Seems like T22647: Allow way to choose thumbnail frame for video on its File: description page is the way to move forward with this. Closing this.

...

First, it would create a lot of extra work for the editors who would have to transfer locally chosen thumbtimes to Commons as the default thumbtime, and update them there in case they are changed locally.

I'm not sure it's true that it creates a lot of work.
We've only seen 1 case in the wild so far where the page image has been the initial screen and the initial screen of that video was a black screen.

Well, but that's a very narrow interpretation of the issue that this task should have resolved. It is more appropriately described as "the initial screen is not a suitable page image".

There are >1500 uses of the thumbtime parameter on enwiki alone. (And presumably this number will grow as ongoing and future multimedia efforts increase content and the adoption of the relatively young thumbtime parameter spreads.) While the video may not generate the PageImages image in all of these cases, it's clearly a more widespread problem.

Seems like T22647: Allow way to choose thumbnail frame for video on its File: description page is the way to move forward with this. Closing this.

It doesn't solve the case described in T92457#4173454 (the same video used with two different thumbtimes in https://en.wikipedia.org/wiki/Narek_Hakhnazaryan and https://en.wikipedia.org/wiki/Fantasy_Pieces_for_Clarinet_and_Piano_(Schumann) ).

Well, but that's a very narrow interpretation of the issue that this task should have resolved.

Sure, but at least it provides an option for certain use cases. I also hope we'd agree this is better than blacklisting all webm files.

It doesn't solve the case described in T92457#4173454 (the same video used with two different thumbtimes...)

That's correct.

I think this task was always a bit unhelpfully vague, so I've opened a more specific task at T197839. I'm not sure if it's feasible for us to fix it, as page images only captures the title of the associated page image and would likely need quite a large re-architecture to support that but we'll see.

Well, but that's a very narrow interpretation of the issue that this task should have resolved.

Sure, but at least it provides an option for certain use cases. I also hope we'd agree this is better than blacklisting all webm files.

Right, that's true of course.

It doesn't solve the case described in T92457#4173454 (the same video used with two different thumbtimes...)

That's correct.

I think this task was always a bit unhelpfully vague, so I've opened a more specific task at T197839. I'm not sure if it's feasible for us to fix it, as page images only captures the title of the associated page image and would likely need quite a large re-architecture to support that but we'll see.

Thanks!