Page MenuHomePhabricator

Parsoid doesn't support image allow/deny configuration
Closed, ResolvedPublic

Description

The mediawiki software prevents the display of inline images when the image source appears in the list in [[MediaWiki:Bad_image_list]]. See GlobalFunctions.php::wfIsBadImage in mediawiki/core. There are existing parser tests for this, which use:

!!article
MediaWiki:bad image list
!!text

  • [[File:Bad.jpg]] except [[Nasty page]]

!!endarticle

to set up the bad image list.

Parsoid doesn't support this, and the core imageinfo API does not include whether the image is Bad or not. So if there is a bad image which is suppressed on the page in article view, it will suddenly appear when visual editor is started.

We probably need:

  1. To patch mediawiki/core to add the result of wfIsBadImage to imageinfo (actually, if the image is bad, we need to return the set of exception page title, so we can't use wfIsBadImage directly).
  1. To suppress the <img src> attribute if the queried image is bad for the current page context. (The resource attributes can probably stay.)

Version: unspecified
Severity: normal

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 3:52 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz73581.
Arlolra triaged this task as Medium priority.Nov 25 2014, 11:52 PM
Arlolra subscribed.

We can hide bad images with CSS if they are marked as such. An easy way to do this is to set a CSS class or typeof. Users can then override the CSS to opt out of the blacklist. Alternatively, we could implement that bit completely client-side, with CSS or JS supplied by an API.

In any case, lets avoid adding a different DOM structure (no src for example) just for blocking, as that will create extra complexity for clients.

Some lightly-edited discussion from IRC:

subbu: cscott, gwicke makes a good point about stripping the src attribute from bad images ... what are the security issues you had in mind if the src was displayed? a compromise could be to add a src to a placeholder image (Foo.jpg, for ex.) for those bad images.

cscott: Use case is vulnerabilities in libpng (for example). We want to avoid letting the client load the image.

subbu: wont such images be removed from commons? i think bad images are for "bad" images .. vandalism or what is considered not suitable for display on a particular wiki.

cscott: perhaps. but the point is that the current PHP parser does not emit an <img> attribute for blacklisted images, while parsoid does, so i just want to be sure we think through the implications carefully. browsers will still load the <img> even if the CSS is set to display: none. Is that a problem? what if the image were very large, eg?

subbu: that can be handled by pointing src to a placeholder image perhaps? without having to strip src and thus have clients handle the case of a missing src attr?

cscott: i personally would rather have src be omitted than src be incorrect. but maybe gwicke disagrees?

gwicke: just for the bandwidth savings?

cscott: well, i'm trying to think of possible ways that griefers might abuse the parsoid output. i'm probably not as creative as our troll community

gwicke: we should certainly balance those against the cost of a different DOM structure. I think it would also be a feature to be able to opt out of the block list by setting some user CSS

cscott: sure, but if the client hasn't thought about blacklist implementation, i think it's safer to omit the blacklist than allow the naive client to maybe show the blocked image.

gwicke: keeping the same DOM structure avoids client complexity, lets users fix the image etc

cscott: in any case, if you want to target via CSS, you need a different CSS-accessible class. right now we are using mw:Error for both things, and distinguishing them only in data-mw. so we'd need to add mw:Bad as a typeof or a mw-bad-image class to let the CSS distinguish between broken images and blacklisted images

subbu: gwicke, cscott so, 3 options: (a) no src (b) placeholder src (c) leave src but use css. (b) and (c) lets clients not worry about special checks. (a) and (b) replicate php parser behavior. so, respecting wiki policy seems the safer path there.

cscott: no, PHP parser behavior does not have the browser load the referenced image, and does not make the referenced image available to scripts. it completely removes all traces of it from the output. display:none does *not* mean "do not load this image"; most browsers will load the image, regardless of display.

gwicke: cscott: http://timkadlec.com/2012/04/media-query-asset-downloading-results/ might be interesting to you

gwicke: no matter what you do, please keep the costs of irregular DOM structures in mind. maybe it's worth it, that's a matter of relative importance

cscott: sure. the options are (a) [typeof~="mw:Error"] and no src, vs (c) [typeof~="mw:Error"].mw-bad-image or something like that.

subbu: my current preference is to use a placeholder image for bad images just like a placeholder image for missing images.

cscott: i think with a placeholder you still need to tag with mw-bad-image or something like that. and it begs the question about what the src attribute should actually be.

subbu: yes, we still need the right mw:error typeof and error tag. i could be convinced by option (c) .. just not quite yet.

cscott: my point still stands: with the src attribute, we will be loading the image regardless of blacklist setting. i'm not certain that's a bad thing, necessarily -- we don't currently have image-size griefers, AFAIK -- i'm just pointing out that it's a change in behavior.

gwicke: bad image blocking doesn't help with image size attacks. you can use any image for those

cscott: yeah, i'm not sure this will be of interest to griefers. but i'm also not certain that i've thought of all the ways that "display: none" might be abused by griefers, either.

cscott: OTOH, I'm sympathetic to the "use CSS" argument, and I'm not sure how you'd target "empty src attribute" with CSS.

gwicke: img[src=""] ?

cscott: does that actually work? i'd need to test it. i certainly agree that img.mw-blacklist or something like that would be a more convenient & obvious selector. so perhaps we should agree to use a class or typeof tag, regardless of what we decide the src attribute should be

cscott: and to be clear, my argument for omitting the src attribute is mostly about being conservative wrt "security" (where by security i mean things vandals might abuse), not an actual attack. so i'm not -2 on including a src attribute, i'm just -0.

RoanKattouw: If [src=""] doesn't work for asserting an empty src attribute, you could use :not([src])
RoanKattouw: Or img { property: onevalue; } img[src] { property: othervalue; }
RoanKattouw: Avoiding :not() when possible using the latter strategy is always nicer

cscott: i don't think many browsers implement :not, though. and it has kinda funny semantics, IIRC.

RoanKattouw: support isn't that bad: http://caniuse.com/#search=%3Anot

cscott: and in any case i think we're agreed that matching "no src" is kinda awkward in CSS, so it would be better to use an explicit class or typeof.

RoanKattouw: Yes that would be much better

Note that there are also a number of related configuration variables (T254803), and parts of this feature (allowing external images, the "allow list" for external images, etc) aren't used in WMF production. It's possible that, rather than implement this in Parsoid, we should add a narrow hook for validating image URLs and move all the allow/deny logic into an extension (T254802). The extension could share most or all of the code between the Parsoid and core implementations by implementing the hook appropriately.

cscott renamed this task from Parsoid doesn't support image blacklist. to Parsoid doesn't support image allow/deny configuration.Jun 10 2020, 5:04 PM
ssastry removed a project: Parsing-Active-Work.

Change 771376 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Add badFile to standalone response

https://gerrit.wikimedia.org/r/771376

To patch mediawiki/core to add the result of wfIsBadImage to imageinfo (actually, if the image is bad, we need to return the set of exception page title, so we can't use wfIsBadImage directly).

This part was done in https://github.com/wikimedia/parsoid/commit/fafe7291fdb03b68a03e578b4b91a8cfc32d6fbe

Change 771376 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Add badFile to standalone response

https://gerrit.wikimedia.org/r/771376

Change 774513 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.16.0-a3

https://gerrit.wikimedia.org/r/774513

Change 774513 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.16.0-a3

https://gerrit.wikimedia.org/r/774513

Change 777450 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Add check for bad files

https://gerrit.wikimedia.org/r/777450

Change 778362 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] Fix bad file tests in parserTests

https://gerrit.wikimedia.org/r/778362

Change 777450 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Add check for bad files

https://gerrit.wikimedia.org/r/777450

Change 778362 merged by jenkins-bot:

[mediawiki/core@master] Fix bad file tests in parserTests

https://gerrit.wikimedia.org/r/778362

Change 779538 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] Move bad file tests to mediaParserTests

https://gerrit.wikimedia.org/r/779538

Change 784338 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump parsoid to 0.16.0-a6

https://gerrit.wikimedia.org/r/784338

Change 784341 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@wmf/1.39.0-wmf.8] Bump parsoid to 0.16.0-a6

https://gerrit.wikimedia.org/r/784341

Change 784338 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.16.0-a6

https://gerrit.wikimedia.org/r/784338

Change 784341 abandoned by Arlolra:

[mediawiki/vendor@wmf/1.39.0-wmf.8] Bump parsoid to 0.16.0-a6

Reason:

Will just ride the train next week

https://gerrit.wikimedia.org/r/784341

Change 785914 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] [WIP] Re-enabled bad file tests

https://gerrit.wikimedia.org/r/785914

Change 779538 merged by jenkins-bot:

[mediawiki/core@master] Move bad file tests to mediaParserTests

https://gerrit.wikimedia.org/r/779538

Change 785914 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Re-enabled bad file tests

https://gerrit.wikimedia.org/r/785914

Change 792236 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump parsoid to 0.16.0-a8

https://gerrit.wikimedia.org/r/792236

Change 792236 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.16.0-a8

https://gerrit.wikimedia.org/r/792236