Page MenuHomePhabricator

Security Review For Parsoid-PHP
Open, LowPublic

Description

Project Information

Description of the tool/project

I will just point to https://www.mediawiki.org/wiki/Parsoid for now. I can follow up with any additional info as required.
But, specifically, the context for this is that we are porting Parsoid to PHP which will be integrated into core as a composer library and will run in-process on REST API (see T221738 ) requests made to MediaWiki. We want to deploy this in July/August.

Parsoid/JS (currently deployed on the Wikimedia cluster) is not exposed directly to the internet. All requests to it go through RESTBase exposed REST API for wikis (Ex: https://en.wikipedia.org/api/rest_v1/ ). But, with the Parsoid/PHP offering which will be integrated into core, we can similarly deploy to an internal cluster that is not directly accessible on the internet and disable it on the app cluster and elsewhere where the MediaWiki API is exposed to the internet.

Description of how the tool will be used at WMF

This will replace Parsoid/JS which is used by VisualEditor, Flow, ContentTranslation, Android App, Kiwix. Eventually, this will serve all page view and edit views.

Dependencies

List dependencies, or upstream projects that this project relies on.
The composer.json file in the repository is the authoritative source for this.

  • In production mode, all except two libraries are used by mediawiki core. The two new libraries are wikipeg and zest. wikipeg is a wikimedia fork of the pegjs project zest is a wikimedia port of zest.js repository. Both these libraries are developed / maintained by Wikimedia engineers.
  • In developer mode, all except one library are used by mediawiki core. The exception is alea which is a wikimedia port of alea.js. This port is by Wikimedia engineers and is also maintained by us.

Has this project been reviewed before?

Not that I know of

Working test environment

Please link or describe setup process for setting up a test environment.
Parser Tests
php bin/parserTests.php runs tests in all but the selser mode across all test files (~6K tests and finishes in ~55 secs on my laptop).
php bin/parserTests.php --wt2html --wt2wt --html2wt --html2html --selser auto runs tests in the specified modes across all test files (~26K tests and takes ~5m on my laptop)
Add the --quiet option to suppress verbose output.

All parser tests should be green.

PHP Unit Tests:

composer test will run phpcs, phpunit, and phan jobs.

Running in integrated / production mode
For both of these above scenarios (parser tests and unit tests), you don't need a working MediaWiki installation. You can run those standalone simply by checking out the Parosid repository and running those commands after installing the vendor modules.

But, to test and review the Parsoid REST API, you will need to run in MW-integration mode. https://github.com/wikimedia/parsoid/tree/T227209/extension has the instructions for running Parsoid in integrated mode.

Parsoid/JS API is documented on wiki. Parsoid/PHP supports all of those endpoints (there are still test failures on the API tests and Arlo is working through those).

Parsoid/PHP in integrated mode also exists on scandium. Assuming you have access to the server, you can hit the Parsoid API by logging to that server or via an ssh tunnel. We'll send the curl commands to access those endpoints if you need them.

Running Parsoid/PHP against an external wiki
From a security review point of view, Parsoid when run in this mode is simply yet another MediaWiki API client. I am including it here simply for completeness' sake.

For development and debugging purposes, Parsoid also supports accessing an external MediaWiki installation via its action API. Parsoid/JS doesn't require that external wiki to have installed the ParsoidBatchAPI extension, but Parsoid/PHP depends on that currently.

The bin/parse.js script lets you point Parsoid to an arbitrary mediawiki installation. bin/parse.php also supports this. bin/parse.php --help should be informative if you wish to run this.

Post-deployment

Name of team responsible for tool/project after deployment and primary contact.

Event Timeline

ssastry created this task.Jul 3 2019, 4:33 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 3 2019, 4:33 PM
sbassett triaged this task as Low priority.Jul 5 2019, 2:21 PM
sbassett renamed this task from {WIP] Security Review For Parsoid-PHP to [WIP] Security Review For Parsoid-PHP.Jul 9 2019, 8:59 PM
sbassett claimed this task.Jul 30 2019, 5:00 PM
sbassett moved this task from Backlog to In Progress on the Security-Team-Reviews board.
sbassett added subscribers: Reedy, mmarble, JBennett.
sbassett added a project: Restricted Project.Jul 31 2019, 3:49 PM
sbassett moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.
ssastry updated the task description. (Show Details)Aug 5 2019, 3:23 PM

@sbassett, @Reedy, I had said I would cut a security branch end of day today for you to start security review. But, I am going to do it Wednesday morning from the hackathon to give us one more day on Tuesday to round out and fix a few more things.

@ssastry - no problem, thanks for the update.

ssastry renamed this task from [WIP] Security Review For Parsoid-PHP to Security Review For Parsoid-PHP.Aug 19 2019, 10:29 AM
ssastry updated the task description. (Show Details)

@sbassett, @Reedy I overestimated my abilities to be able to get this done during Wikimania.:-) But, we fixed a number of bugs in Parsoid in the interim (including a few crashers) so that is a good thing.

As in the description, I have cut a T227209 branch of the Parsoid repository. This version is 99.4% green on parser tests and has also had some initial performance tweaking done as well. We will continue correctness and performance work on the master branch and at different points between now and deployment, we can provide you with an aggregated diff of changes relative to this branch as necessary.

Please contact @Arlolra if you need any additional clarifications between now and Aug 28th. I will be checking email sporadically between now and then, but otherwise will be away.

@Arlolra, could you please go over the description and update / correct anything in there that needs updating / clarification? Thanks!

ssastry updated the task description. (Show Details)Aug 19 2019, 10:34 AM

For development and debugging purposes, Parsoid also supports accessing an external MediaWiki installation via its action API. Parsoid/JS doesn't require that external wiki to have installed the ParsoidBatchAPI extension, but Parsoid/PHP depends on that currently (* TODO: Verify *).

Yeah, it currently depends on the BatchAPI to populate media info,
https://github.com/wikimedia/parsoid/blob/master/src/Config/Api/DataAccess.php#L155

Arlolra updated the task description. (Show Details)Aug 19 2019, 8:45 PM
Jcross added a subscriber: Jcross.Aug 20 2019, 5:35 PM

@Reedy - could use your eyes on this. Thank you!

@Reedy - could use your eyes on this. Thank you!

@Reedy Maybe ping @Arlolra when you are about to begin so that he could update the branch with the latest changes on master.

Hey @ssastry - thanks for cutting the security review branch. @Reedy and I will plan to review that soon and reach out to @Arlolra with any questions.

sbassett moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Thu, Aug 22, 2:41 PM

Update: @Reedy and I are having a look at this and T230140 and should have some reports soon. We're going to timebox to maybe two weeks or so - hopefully that doesn't push back too much on any targeted deployment dates.

Update: @Reedy and I are having a look at this and T230140 and should have some reports soon. We're going to timebox to maybe two weeks or so - hopefully that doesn't push back too much on any targeted deployment dates.

Thanks! That sounds good. Should I rebase the security branch with latest changes to master?

Thanks! That sounds good. Should I rebase the security branch with latest changes to master?

@ssastry - Sure, that'd be great.

Should I rebase the security branch with latest changes to master?

Done, https://phabricator.wikimedia.org/diffusion/GPAR/history/T227209/

sbassett updated the task description. (Show Details)Fri, Sep 20, 9:19 PM
ssastry updated the task description. (Show Details)Fri, Sep 20, 9:22 PM

@ssastry @Arlolra - I've been looking at the T227209 branch of parsoid this week. I wanted to list some assumptions and questions here in an attempt to constrain the scope of this review as much as possible while still ensuring I'm covering all relevant pieces.

  1. Unless I'm missing something, everything within /guides seems relevant to the older Node-based parsoid service. I assume these will be updated or removed, eventually?
  2. Assuming the above is correct, I think the best/easiest way for me to actually security-test Parsoid/PHP is with bin/parse.php, as described within the Working test environment section above. I'd also be interested in testing within MW-integration mode, though if this would be fairly redundant (and seems a temporary testing solution anyways), I wouldn't want to waste time testing against it as well. I suppose I'm most interested in the best/easiest way to test Parsoid/PHP's inherent functionality. If that's not accomplished via bin/parse.php, or that simply is not representative enough of how Parsoid/PHP will be used within the context of Wikimedia projects, please let me know.
  3. I'm not planning to analyze any PHP or JS dependencies from composer or npm. A quick composer security:check and npm audit show there aren't any reported vulnerable packages anyways. An npm outdated shows 29 packages, but these do not appear to have corresponding security issues. I'm also curious how relevant the dependencies within package.json/npm-shrinkwrap.json are for Parsoid/PHP - can you shed some light on that?
  4. I'm guessing that I can ignore most things in /lib, /tools and /extension (as /extension is a temporary solution for testing) - please let me know if this is incorrect. While the tests are interesting, at around ~26k total tests, it's kind of an oppressive volume to do anything with in the context of a security review, though I have given them a very high-level overview. Anyhow, I've mainly been reviewing the 195-ish php files within /src as that seems to be where the bulk of Parsoid/PHP's functionality lies.

If the above sounds good and there are no lingering issues or concerns, I should be able to have some meaningful results later next week.