Page MenuHomePhabricator

Security review for ElectronPdfService Extension
Closed, ResolvedPublic

Description

Project Information

Description of the tool/project

The ElectronPdfService extension provides access to the Electron Service for browser based PDF rendering.

Description of how the tool will be used at WMF

The extension adds a new "Download as PDF" link to the MediaWiki sidebar. By clicking it, the actual page is rendered as PDF using the Electron PDF service. If the Collection extension is installed and provides another method for PDF rendering, clicking the "Download as PDF" link will provide a selection screen (see https://phabricator.wikimedia.org/T142201#2660206) to choose between the Electron rendered PDF and the rendering provided by the Collection extension.

Dependencies

Has this project been reviewed before?

Only review inside the WMDE-TechWish.

Working test environment

wfLoadExtension( 'ElectronPdfService' );
$wgElectronPdfService = [
	"serviceUrl" => "https://pdf-electron.wmflabs.org",
	"format" => "pdf",
	"key" => "secret",
	"pageUrl" => "https://en.wikipedia.org/wiki/Caipirinha"
];
  • In order to render a page, it needs to be publicly accessible. If testing on local machine, you need to add one in the "pageUrl" setting (see above).

Post-deployment

WMDE-TechWish

Related Objects

Event Timeline

@dpatrick So it looks like the date of the event has passed, has the review happened? or..?

@dpatrick So it looks like the date of the event has passed, has the review happened? or..?

We're finishing the review early this week and will release notes shortly.

Sorry for the delay in reviewing this. The extension looks good and code is easy to read.

Most of what I was checking for was XSS. The use of the OOUI methods made such issues much less likely, which is awesome to see :) Overall looks very good.

Non-security

Feel free to ignore anything in this section if you disagree

  • ElectronPdfServiceHooks::onSidebarBeforeOutput() - This returns false if it is not a view/purge action. Returning false prevents all other subscribers to this hook from running. I don't think that is wanted in this case.
  • In the switch statement in SpecialElectronPdf.php line 48 - the redirect case falls through to the default case. Well this doesn't overly matter since the page ends up a redirect, it could be a little confusing if someone was refactoring the code at a later date and didn't notice. I think the code would be more clear if that case also had a return.
  • Note there is a StreamFile class in core that you might be able to make use of here.
  • ext.ElectronPdfService.special.js - That should probably run in an onload event - the initial call to setSelected isn't going to do much if the js runs before the form is loaded (Javascript is often super heavily cached, so its entirely possible for JS to execute prior to most of the html page being loaded).

Minor

  • In ElectronPdfServiceHooks::generateSelectionScreenLink() - 'coll-download-url' => urlencode( $collectionUrl ) - This is double encoding the url.
  • Line 49 ElectronPdfServiceHooks - 'text' => $skin->msg( 'electronPdfService-sidebar-portlet-print-text' )->escaped(... (and similar lines)This is double escaping. BaseTemplate::makeLink will escape the text key before outputting.
  • It would be nice if writeToTempFile kept track of how much data it was writing, and had some sort of hard limit as a defense in depth mechanism in case there's ever a bug where the user can generate a gb big pdf file.
  • Its preferred to use TempFSFile::factory instead of tmpfile().
  • If the api key for the electron service is supposed to stay secret, it would be better to do a POST request with the key in the post body, as that way it is much less likely to end up in log files.
  • As a precautionary measure, maybe call $this->getRequest()->checkUrlExtension(); when doing pdf download.

Normal

  • SpecialElectronPdf::sendPdfToOutput() - The content-disposition header should be sanitized.
    • Web browsers are supposed to only listen to the header if the suggested filename is sane. However as a precaution I think we should refuse to set the header to anything where the first character is a period or anything containing a /
    • I wonder if we should use $title->getPrefixedDBKey() instead of ->getPrefixedText() for the pdf name (Underscores seem more natural than spaces in a file name, to me). On the other hand, with modern GUI interfaces, spaces in file names are rather common, so maybe its fine to stick with the spaces
    • I believe the filename paramter should be a quoted string in case it has spaces in it. (e.g. filename="Foo \"baz\".pdf")
    • For filenames that are non-ascii, according to http://greenbytes.de/tech/tc2231/ you have to do something like Content-disposition: inline; filename*=UTF-8''%E0%A4%AE%E0%A5%81%E0%A4%96%E0%A4%AA%E0%A5%83%E0%A4%B7%E0%A5%8D%E0%A4%A0.pdf

@Bawolff Thanks a lot for the review! We have tackled all of your comments. See below.

Sorry for the delay in reviewing this. The extension looks good and code is easy to read.

Most of what I was checking for was XSS. The use of the OOUI methods made such issues much less likely, which is awesome to see :) Overall looks very good.

Non-security

Feel free to ignore anything in this section if you disagree

  • ElectronPdfServiceHooks::onSidebarBeforeOutput() - This returns false if it is not a view/purge action. Returning false prevents all other subscribers to this hook from running. I don't think that is wanted in this case.

Done in T149772.

  • In the switch statement in SpecialElectronPdf.php line 48 - the redirect case falls through to the default case. Well this doesn't overly matter since the page ends up a redirect, it could be a little confusing if someone was refactoring the code at a later date and didn't notice. I think the code would be more clear if that case also had a return.

Done in T149774.

  • Note there is a StreamFile class in core that you might be able to make use of here.

Done in T149775.

  • ext.ElectronPdfService.special.js - That should probably run in an onload event - the initial call to setSelected isn't going to do much if the js runs before the form is loaded (Javascript is often super heavily cached, so its entirely possible for JS to execute prior to most of the html page being loaded).

Done in T149776.

Minor

  • In ElectronPdfServiceHooks::generateSelectionScreenLink() - 'coll-download-url' => urlencode( $collectionUrl ) - This is double encoding the url.

Done in T149777.

  • Line 49 ElectronPdfServiceHooks - 'text' => $skin->msg( 'electronPdfService-sidebar-portlet-print-text' )->escaped(... (and similar lines)This is double escaping. BaseTemplate::makeLink will escape the text key before outputting.

Done in T149778.

  • It would be nice if writeToTempFile kept track of how much data it was writing, and had some sort of hard limit as a defense in depth mechanism in case there's ever a bug where the user can generate a gb big pdf file.

Done in T149780.

  • Its preferred to use TempFSFile::factory instead of tmpfile().

Done in T149779.

  • If the api key for the electron service is supposed to stay secret, it would be better to do a POST request with the key in the post body, as that way it is much less likely to end up in log files.

Tracked in T149781. But as outlined in T149781#2771609 I did not find a way to include the accessKey parameter in a POST request. But I could have missed something - waiting for a reply there.

  • As a precautionary measure, maybe call $this->getRequest()->checkUrlExtension(); when doing pdf download.

Done in T149782.

Normal

  • SpecialElectronPdf::sendPdfToOutput() - The content-disposition header should be sanitized.
    • Web browsers are supposed to only listen to the header if the suggested filename is sane. However as a precaution I think we should refuse to set the header to anything where the first character is a period or anything containing a /
    • I wonder if we should use $title->getPrefixedDBKey() instead of ->getPrefixedText() for the pdf name (Underscores seem more natural than spaces in a file name, to me). On the other hand, with modern GUI interfaces, spaces in file names are rather common, so maybe its fine to stick with the spaces
    • I believe the filename paramter should be a quoted string in case it has spaces in it. (e.g. filename="Foo \"baz\".pdf")
    • For filenames that are non-ascii, according to http://greenbytes.de/tech/tc2231/ you have to do something like Content-disposition: inline; filename*=UTF-8''%E0%A4%AE%E0%A5%81%E0%A4%96%E0%A4%AA%E0%A5%83%E0%A4%B7%E0%A5%8D%E0%A4%A0.pdf

Done in T149773.

@dpatrick @Bawolff did you get the chance already to have another look at the code and would it be ok from your side to close the ticket?

I havent yet, and we've had some very high priority things over the last couple days which takes precedence. However if youve done everything on the list than it should be fine to deploy

Addshore claimed this task.