Page MenuHomePhabricator

Security review of FileAnnotations
Closed, ResolvedPublic

Description

Project Information

Description of the tool/project

Allows creating/editing/deleting/viewing annotations on file pages, with a structured JSON schema for storage instead of hacky and nasty wikitext/gadget storage and rendering.

Description of how the tool will be used at WMF

Presently the goal is to push it to the beta cluster for testing and exploration, then we'll hopefully push it to Commons and other wikis next quarter after polishing the features a little.

Dependencies

EventLogging, for schema validation

Has this project been reviewed before?

CR within the team and by @aaron =

Working test environment

N/A, currently only tested locally (I can fire up a labs instance if needed)

Post-deployment

Multimedia responsible team, @MarkTraceur can be lead maintainer.

Issues Found

Non-security

  • Unclear licensing (Some places say GPLv2 some say GPLv3)
  • Link in the namespace tab isn't red if file annotations page doesn't exist
  • Should the wikidata prop ids that are hard coded be config variables? Presumably those may change over time.
    • Note: Could be fixed if necessary, but until there's a problem, I don't think there's any need to change?
  • Missing hook for Code Editor extension to do syntax highlighting on JSON File_annotation pages
  • Loading OOUI-js on every page, even if unneeded. Is that ok performance wise?
    • Note: Could lazy-load? Could separate edit-able annotation loads?
  • The Sidebar has a higher z-index than the insert annotation dialog, which looks weird when they overlap. (On the other hand, we also do not want users to be able to overlap interface elements with inline css, so its probably preferable to move the dialog to the right instead of making it actually overlap the sidebar)
  • While the file annotation is a namespace, so it make sense to go with the other namespace links, I actually think it would make more sense in the drop-down links (with history, move page, etc), since usually we do not expect users to go to the file_annotations namespace page.
    • Design question, will wait for design review
  • Line 231 FileAnnotation.js: Edit summaries are not i18n-ized. (OTOH, I have no idea how to get a message in content language in JS, so perhaps not i18n-izing it is for the best.).
    • Not sure this is necessary, left undone
  • Line 429 of ApiFileAnnotations.php - Is this intentionally skipping the stuff that happens in ParserOutput::getText() ? If so, that's probably worth a comment. If not, its probably less confusing to get call ->getText() instead of mText.
  • Line 435 of ApiFileAnnotations.php - $dom->loadXML() will spew a bunch of warnings if fed invalid XML. MediaWiki can generate invalid XML in a bunch of cases, and probably will start to do so in even more cases as we move away from Tidy. Perhaps the errors should be suppressed.
  • Line 450 of ApiFileAnnotations.php - Should probably match these domains more carefully in case someone links to https://commons.wikimedia.org.some.other.domain.com. Similarly for https://foo.com?https://en.wikipedia.org/wiki/baz and https://bar.com?https://wikidata.org/Q123. This may sound far fetched, but something like https://commons.wikimedia.org/wiki/Special:Whatlinkshere/Category:Foo is probably a plausible false positive. (See also note in high severity section about renderWikipediaLink()).
  • Line 167 of ApiFileAnnotations.php - $this->getSafeCacheAsOfForUser( 'enwiki' ); doesn't look right given that it could be any wiki, not just enwiki.
  • ApiFileAnnotations.php - renderWdImage() does not handle the case where the image is missing on commons very gracefully.
    • Could probably fix this, but if there is a Wikidata claim for a Commons image that is not there, maybe it could be shunted as a content problem.

Low severity

  • ApiFileAnnotations.php - Oversight people will probably not be happy that revision deleted stuff may stay in the cache for a little while via renderWikipediaLink() and friends. But perhaps that is ok.
    • Not sure whether this matters...as likely (if not more likely) the content will be cached in a good state and never be rendered in a vandalised state
  • In the event something gets overloaded, and one of the sites we are sending GET requests to during the rendering of the annotation goes down, this will result in not caching the result, which might make availability problems become worse as now every view will result in a fetch. Perhaps failed requests should cache for a short amount of time to prevent a pile on effect.
    • I wasn't sure how to deal with this, but maybe @aaron has a thought.

High severity

  • Line 181 of ApiFileAnnotations.php - MediaWiki could be tricked into fetching an arbitrary domain since $language isn't sufficiently validated. This could be used as part of a DOS attack. If MediaWiki is not configured to use a proxy that limits internal data center requests, this could be used as a stepping stone to discover/attack services on the internal network.
  • ApiFileAnnotations.php line 435: When using DomDocument::loadXml, call it like $oldValue = libxml_disable_entity_loader( true ); $dom->loadXML( $xml ); libxml_disable_entity_loader( $oldValue ); See https://www.mediawiki.org/wiki/XML_External_Entity_Processing

Event Timeline

Review of FileAnnotation extension as of 87a8bba27045d3b1c7 (oct 13)

Overall looks good. My biggest concern is how validation of "url" render annotations are done.

Non-security

Feel free to ignore anything in this section if you disagree.

  • Unclear licensing (Some places say GPLv2 some say GPLv3)
  • Link in the namespace tab isn't red if file annotations page doesn't exist
  • Should the wikidata prop ids that are hard coded be config variables? Presumably those may change over time.
  • Missing hook for Code Editor extension to do syntax highlighting on JSON File_annotation pages
  • Loading OOUI-js on every page, even if unneeded. Is that ok performance wise?
  • The Sidebar has a higher z-index than the insert annotation dialog, which looks weird when they overlap. (On the other hand, we also do not want users to be able to overlap interface elements with inline css, so its probably preferable to move the dialog to the right instead of making it actually overlap the sidebar)
  • While the file annotation is a namespace, so it make sense to go with the other namespace links, I actually think it would make more sense in the drop-down links (with history, move page, etc), since usually we do not expect users to go to the file_annotations namespace page.
  • Line 231 FileAnnotation.js: Edit summaries are not i18n-ized. (OTOH, I have no idea how to get a message in content language in JS, so perhaps not i18n-izing it is for the best.).
  • Line 429 of ApiFileAnnotations.php - Is this intentionally skipping the stuff that happens in ParserOutput::getText() ? If so, that's probably worth a comment. If not, its probably less confusing to get call ->getText() instead of mText.
  • Line 435 of ApiFileAnnotations.php - $dom->loadXML() will spew a bunch of warnings if fed invalid XML. MediaWiki can generate invalid XML in a bunch of cases, and probably will start to do so in even more cases as we move away from Tidy. Perhaps the errors should be suppressed.
  • Line 450 of ApiFileAnnotations.php - Should probably match these domains more carefully in case someone links to https://commons.wikimedia.org.some.other.domain.com. Similarly for https://foo.com?https://en.wikipedia.org/wiki/baz and https://bar.com?https://wikidata.org/Q123. This may sound far fetched, but something like https://commons.wikimedia.org/wiki/Special:Whatlinkshere/Category:Foo is probably a plausible false positive. (See also note in high severity section about renderWikipediaLink()).
  • Line 167 of ApiFileAnnotations.php - $this->getSafeCacheAsOfForUser( 'enwiki' ); doesn't look right given that it could be any wiki, not just enwiki.
  • ApiFileAnnotations.php - renderWdImage() does not handle the case where the image is missing on commons very gracefully.

Low severity

  • ApiFileAnnotations.php - Oversight people will probably not be happy that revision deleted stuff may stay in the cache for a little while via renderWikipediaLink() and friends. But perhaps that is ok.
  • In the event something gets overloaded, and one of the sites we are sending GET requests to during the rendering of the annotation goes down, this will result in not caching the result, which might make availability problems become worse as now every view will result in a fetch. Perhaps failed requests should cache for a short amount of time to prevent a pile on effect.

High severity

  • Line 181 of ApiFileAnnotations.php - MediaWiki could be tricked into fetching an arbitrary domain since $language isn't sufficiently validated. This could be used as part of a DOS attack. If MediaWiki is not configured to use a proxy that limits internal data center requests, this could be used as a stepping stone to discover/attack services on the internal network.
  • ApiFileAnnotations.php line 435: When using DomDocument::loadXml, call it like $oldValue = libxml_disable_entity_loader( true ); $dom->loadXML( $xml ); libxml_disable_entity_loader( $oldValue ); See https://www.mediawiki.org/wiki/XML_External_Entity_Processing
Bawolff subscribed.

Change 317491 had a related patch set uploaded (by MarkTraceur):
Restrict language codes for Wikipedia annotations

https://gerrit.wikimedia.org/r/317491

Change 317493 had a related patch set uploaded (by MarkTraceur):
Disable external entities in XML processing

https://gerrit.wikimedia.org/r/317493

Those two patches should fix the high-severity issues. I think @aaron should chime in on the latter of the low-severity issues, since he largely contributed the caching code.

I'm not sure what to do about the oversight problem - it's always possible that a third party will pull a bad API request, and that's essentially what's happening here, except that we're the third party. enwiki oversighters probably shouldn't affect Commons caching. But maybe I'm wrong?

I'll work on the non-security stuff en masse.

Change 317493 merged by jenkins-bot:
Disable external entities in XML processing

https://gerrit.wikimedia.org/r/317493

I don't think this was a security issue, as the input we're parsing is trusted (output from Parser). But I agree it's good to disable external entities, just as a good practice.

Change 317491 merged by jenkins-bot:
Restrict language codes for Wikipedia annotations

https://gerrit.wikimedia.org/r/317491

Change 318308 had a related patch set uploaded (by MarkTraceur):
Fix minor problems found in security review

https://gerrit.wikimedia.org/r/318308

I don't think this was a security issue, as the input we're parsing is trusted (output from Parser). But I agree it's good to disable external entities, just as a good practice.

Yep. You are correct.

Change 318308 merged by jenkins-bot:
Fix minor problems found in security review

https://gerrit.wikimedia.org/r/318308

Bawolff claimed this task.

I'm going to mark this bug resolved. All the remaining stuff was just non-security potential suggestions anyways.