Page MenuHomePhabricator

Security review for WikibaseMediaInfo
Closed, ResolvedPublic

Description

Please review the WikibaseMediaInfo extension for deployment to the WMF cluster. This extension will be enabled on Wikimedia Commons, and is part of the StructuredDataOnCommons project.

Please note, this deployment will not happen for some time as we are blocked on partial completion of Multi-Content-Revisions - we're shooting for a release in October, if you're scheduling reviews out that far.

Event Timeline

From the other task:

Was already done in T159709 unless there is anything that makes you think we should repeat it.

Ah, hmm. @Ramsey-WMF mentioned it. Will Security think we need one with the migration to MCR-based storage?

Review of d7c4404e607 (Oct 2, 2018) of WikibaseMediaInfo

Overall:

I'm concerned about a general lack of escaping in the html generation part of the code. In addition to the parts that must be escaped, I would strongly encourage escaping "safe" non-html values that are generated far away, so that it is easy to tell at a glance that the code is safe. For example, I would strongly encourage wrapping things like $this->languageDirectionalityLookup->getDirectionality( $languageCode ); in htmlspecialchars so that we can easily tell such things are safe without having to follow the various levels of indirection across many modules to verify its safety.

PHP security issues

  • "WikibaseMediaInfoHooks.php" line 229 - raw html messages not allowed - wikibasemediainfo-filepage-structured-data-heading
    • filepage-entitytermscaptionelement - probably things like langauge code and directionality are safe, but would be nice to escape so it can be seen at a glance.
    • "FilePageEntityTermsView.php" line 168 - wikibasemediainfo-entitytermsforlanguagelistview-caption used as raw html.
  • "WikibaseMediaInfoHooks.php" line 190 - You should call $page->getContext()->getOutput()->preventClickjacking(); as you are adding interaction to the image page.
  • "FilePageEntityTermsView.php" line 179 - It appears that labels aren't properly escaped, resulting in XSS if the user puts a script tag in a label

Javascript:

  • "CaptionsPanel.js" line 81 - wikibasemediainfo-filepage-caption-empty Raw html messages are not allowed
  • "CaptionsPanel.js" line 66 - the language name should always be text right? Can we just use .text() here instead of .html().
  • "CaptionsPanel.js" line 160 - wikibasemediainfo-filepage-caption-approaching-limit - Raw html messages are not allowed
  • "CaptionsPanel.js" line 185 - wikibasemediainfo-filepage-caption-too-long - Raw html is not allowed.
  • "CaptionsPanel.js" line 504 - Should (presumably) use .text() for error.detailedMessage
  • XSS in label caption handling (e.g. Create a label containing <script>alert(1);</script>). This appears to be a separate XSS from the the one on the php side.

Other non-security comments

  • I was under the impression this was going to use MCR. I'm not sure I understand how this is integrating into that. I don't see any MCR specific code here
  • "WikibaseMediaInfoHooks.php" line 187 - I think calling wfScript( 'api' ); would maybe be better to better isolate against any changes in mediawiki path handling.
  • "WikibaseMediaInfoHooks.php" line 313 - I'm kind of confused what the point of this is, if we can always programmatically determine what the entity is, why bother saving it to the db?

I was under the impression this was going to use MCR. I'm not sure I understand how this is integrating into that. I don't see any MCR specific code here

That landed five minutes ago, in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseMediaInfo/+/462743

Change 466531 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/extensions/WikibaseMediaInfo@master] Security fixes from T200279

https://gerrit.wikimedia.org/r/466531

Hey Brian - see https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseMediaInfo/+/466531

PHP security issues

"WikibaseMediaInfoHooks.php" line 229 - raw html messages not allowed - wikibasemediainfo-filepage-structured-data-heading

Fixed on line 175

"WikibaseMediaInfoHooks.php" line 190 - You should call $page->getContext()->getOutput()->preventClickjacking(); as you are adding interaction to the image page.

Called on line 195

filepage-entitytermscaptionelement - probably things like langauge code and directionality are safe, but would be nice to escape so it can be seen at a glance.

Fixed in MediaInfoEntityTermsView.php on the method starting on line 182

"FilePageEntityTermsView.php" line 168 - wikibasemediainfo-entitytermsforlanguagelistview-caption used as raw html.

Fixed in MediaInfoEntityTermsView.php on line 170

"FilePageEntityTermsView.php" line 179 - It appears that labels aren't properly escaped, resulting in XSS if the user puts a script tag in a label

Fixed in MediaInfoEntityTermsView.php on the method starting on line 182

Javascript:

"CaptionsPanel.js" line 81 - wikibasemediainfo-filepage-caption-empty Raw html messages are not allowed

Fixed line 132

"CaptionsPanel.js" line 66 - the language name should always be text right? Can we just use .text() here instead of .html().

The variable languageContent can contain just a string or a language selector. I've now escaped it where necessary when calling createTableRow

"CaptionsPanel.js" line 160 - wikibasemediainfo-filepage-caption-approaching-limit - Raw html messages are not allowed

Fixed line 211 (using .text())

"CaptionsPanel.js" line 185 - wikibasemediainfo-filepage-caption-too-long - Raw html is not allowed.

Fixed line 326 (using .text())

"CaptionsPanel.js" line 504 - Should (presumably) use .text() for error.detailedMessage

Fixed line 595 (using .text())

XSS in label caption handling (e.g. Create a label containing <script>alert(1);</script>). This appears to be a separate XSS from the the one on the php side.

Fixed using mw.html.escape on the arguments whenever createTableRow is called

Other non-security comments

I was under the impression this was going to use MCR. I'm not sure I understand how this is integrating into that. I don't see any MCR specific code here

See James's comment - MCR code is in the latest

"WikibaseMediaInfoHooks.php" line 187 - I think calling wfScript( 'api' ); would maybe be better to better isolate against any changes in mediawiki path handling.

Done (line 213)

"WikibaseMediaInfoHooks.php" line 313 - I'm kind of confused what the point of this is, if we can always programmatically determine what the entity is, why bother saving it to the db?

See https://phabricator.wikimedia.org/T205417

Change 466531 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Security fixes from T200279

https://gerrit.wikimedia.org/r/466531

Ok looks good. thanks for your patience on this, I know i was a bit delayed.

Small comment, you don't have to escape languageCode and direction arguments to createTableRow() because .attr() will do the escaping for you.