Page MenuHomePhabricator

Parsoid doesn't support image blacklist.
Open, NormalPublic

Description

The mediawiki software prevents the display of inline images when the image source appears in the list in [[MediaWiki:Bad_image_list]]. See GlobalFunctions.php::wfIsBadImage in mediawiki/core. There are existing parser tests for this, which use:

!!article
MediaWiki:bad image list
!!text

  • [[File:Bad.jpg]] except [[Nasty page]]

!!endarticle

to set up the bad image list.

Parsoid doesn't support this, and the core imageinfo API does not include whether the image is Bad or not. So if there is a bad image which is suppressed on the page in article view, it will suddenly appear when visual editor is started.

We probably need:

  1. To patch mediawiki/core to add the result of wfIsBadImage to imageinfo (actually, if the image is bad, we need to return the set of exception page title, so we can't use wfIsBadImage directly).
  1. To suppress the <img src> attribute if the queried image is bad for the current page context. (The resource attributes can probably stay.)

Version: unspecified
Severity: normal

Details

Reference
bz73581

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 3:52 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz73581.
cscott created this task.Nov 18 2014, 9:05 PM
Arlolra triaged this task as Normal priority.Nov 25 2014, 11:52 PM
Arlolra added a subscriber: Arlolra.

We can hide bad images with CSS if they are marked as such. An easy way to do this is to set a CSS class or typeof. Users can then override the CSS to opt out of the blacklist. Alternatively, we could implement that bit completely client-side, with CSS or JS supplied by an API.

In any case, lets avoid adding a different DOM structure (no src for example) just for blocking, as that will create extra complexity for clients.

cscott added a comment.Dec 1 2014, 9:11 PM

Some lightly-edited discussion from IRC:

subbu: cscott, gwicke makes a good point about stripping the src attribute from bad images ... what are the security issues you had in mind if the src was displayed? a compromise could be to add a src to a placeholder image (Foo.jpg, for ex.) for those bad images.

cscott: Use case is vulnerabilities in libpng (for example). We want to avoid letting the client load the image.

subbu: wont such images be removed from commons? i think bad images are for "bad" images .. vandalism or what is considered not suitable for display on a particular wiki.

cscott: perhaps. but the point is that the current PHP parser does not emit an <img> attribute for blacklisted images, while parsoid does, so i just want to be sure we think through the implications carefully. browsers will still load the <img> even if the CSS is set to display: none. Is that a problem? what if the image were very large, eg?

subbu: that can be handled by pointing src to a placeholder image perhaps? without having to strip src and thus have clients handle the case of a missing src attr?

cscott: i personally would rather have src be omitted than src be incorrect. but maybe gwicke disagrees?

gwicke: just for the bandwidth savings?

cscott: well, i'm trying to think of possible ways that griefers might abuse the parsoid output. i'm probably not as creative as our troll community

gwicke: we should certainly balance those against the cost of a different DOM structure. I think it would also be a feature to be able to opt out of the block list by setting some user CSS

cscott: sure, but if the client hasn't thought about blacklist implementation, i think it's safer to omit the blacklist than allow the naive client to maybe show the blocked image.

gwicke: keeping the same DOM structure avoids client complexity, lets users fix the image etc

cscott: in any case, if you want to target via CSS, you need a different CSS-accessible class. right now we are using mw:Error for both things, and distinguishing them only in data-mw. so we'd need to add mw:Bad as a typeof or a mw-bad-image class to let the CSS distinguish between broken images and blacklisted images

subbu: gwicke, cscott so, 3 options: (a) no src (b) placeholder src (c) leave src but use css. (b) and (c) lets clients not worry about special checks. (a) and (b) replicate php parser behavior. so, respecting wiki policy seems the safer path there.

cscott: no, PHP parser behavior does not have the browser load the referenced image, and does not make the referenced image available to scripts. it completely removes all traces of it from the output. display:none does *not* mean "do not load this image"; most browsers will load the image, regardless of display.

gwicke: cscott: http://timkadlec.com/2012/04/media-query-asset-downloading-results/ might be interesting to you

gwicke: no matter what you do, please keep the costs of irregular DOM structures in mind. maybe it's worth it, that's a matter of relative importance

cscott: sure. the options are (a) [typeof~="mw:Error"] and no src, vs (c) [typeof~="mw:Error"].mw-bad-image or something like that.

subbu: my current preference is to use a placeholder image for bad images just like a placeholder image for missing images.

cscott: i think with a placeholder you still need to tag with mw-bad-image or something like that. and it begs the question about what the src attribute should actually be.

subbu: yes, we still need the right mw:error typeof and error tag. i could be convinced by option (c) .. just not quite yet.

cscott: my point still stands: with the src attribute, we will be loading the image regardless of blacklist setting. i'm not certain that's a bad thing, necessarily -- we don't currently have image-size griefers, AFAIK -- i'm just pointing out that it's a change in behavior.

gwicke: bad image blocking doesn't help with image size attacks. you can use any image for those

cscott: yeah, i'm not sure this will be of interest to griefers. but i'm also not certain that i've thought of all the ways that "display: none" might be abused by griefers, either.

cscott: OTOH, I'm sympathetic to the "use CSS" argument, and I'm not sure how you'd target "empty src attribute" with CSS.

gwicke: img[src=""] ?

cscott: does that actually work? i'd need to test it. i certainly agree that img.mw-blacklist or something like that would be a more convenient & obvious selector. so perhaps we should agree to use a class or typeof tag, regardless of what we decide the src attribute should be

cscott: and to be clear, my argument for omitting the src attribute is mostly about being conservative wrt "security" (where by security i mean things vandals might abuse), not an actual attack. so i'm not -2 on including a src attribute, i'm just -0.

RoanKattouw: If [src=""] doesn't work for asserting an empty src attribute, you could use :not([src])
RoanKattouw: Or img { property: onevalue; } img[src] { property: othervalue; }
RoanKattouw: Avoiding :not() when possible using the latter strategy is always nicer

cscott: i don't think many browsers implement :not, though. and it has kinda funny semantics, IIRC.

RoanKattouw: support isn't that bad: http://caniuse.com/#search=%3Anot

cscott: and in any case i think we're agreed that matching "no src" is kinda awkward in CSS, so it would be better to use an explicit class or typeof.

RoanKattouw: Yes that would be much better

marcoil set Security to None.