Page MenuHomePhabricator

Create a web demo for phan taint-check
Closed, ResolvedPublic

Description

Perhaps https://phan.github.io/demo/ can be used for a start. The demo should do the exact same things, with two differences:

  • it should load taint-check
  • it should hide all non-taint-check issues.

Event Timeline

Tried this on toolforge. It's not straightforward, because:

  • We need to import a few dependencies (most notably ace editor) due to CSP
  • We need to fiddle with the demo source to make it work (see https://github.com/phan/demo/issues/11)
  • Most importantly, the demo is based on Emscripten, which is not installed there.
Daimona claimed this task.

https://taintcheck.toolforge.org/

Served you are, my young padawan.

https://taintcheck.toolforge.org/

Served you are, my young padawan.

For future reference, this was achieved by:

  • Forking https://phan.github.io/demo/
  • Applying some changes tracked on the fork repo on github:
    • Cosmetic changes (avoid external images, mention that it's a fork, mention taint-check, use a different example, etc.)
    • Backend changes (download taint-check as well, embed it in emscripten, etc.)
  • My fork on github is always up-to-date with the tool on toolforge
  • You can setup everything by running build.sh
    • On toolforge this took 74 minutes, so running it often is not exactly fun
  • I had to implant some horrible hacks in emscripten because it requires Python 3.6+ (toolforge has 3.5.3 and I didn't want to build python from source)
    • These hacks are mentioned in a MEMO file in the tool's $HOME

Change 643548 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[integration/config@master] Create image for building phan-taint-check-plugin demo

https://gerrit.wikimedia.org/r/643548

Change 643549 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[integration/config@master] Run new mw-tools-phan-demos-publish job postmerge

https://gerrit.wikimedia.org/r/643549

Change 643548 merged by jenkins-bot:
[integration/config@master] Create image for building phan-taint-check-plugin demo

https://gerrit.wikimedia.org/r/643548

Change 643549 merged by jenkins-bot:
[integration/config@master] Run new mw-tools-phan-demos-publish job postmerge

https://gerrit.wikimedia.org/r/643549

So... https://doc.wikimedia.org/mediawiki-tools-phan-SecurityCheckPlugin/master/demos/ exists! It'll get updated whenever a new commit is merged in phan-taint-check-plugin (takes ~15 min but we can speed it up a lot).

Except it seems to be getting blocked by CSP: Content Security Policy: The page’s settings blocked the loading of a resource at blob:https://doc.wikimedia.org/f8ce60d3-f6e6-4f98-b42b-ab1c401856fc (“script-src”).

Do we need to adjust a CSP rule for this?

Notes on how to speed up the build:

  • Keep the tarballs in $XDG_CACHE_HOME so we don't need to redownload them every time.
  • The emscripten build of PHP7 also seems safe to cache (invalidate on the PHP version or emscripten version changing?), which would probably be the biggest speedup if getting the invalidation right isn't too hard.

It would be nice to move this into the repo itself since it seems a bit tightly integrated with the manual file list. I *think* we could get away with only keeping copies of our files and then relying on the main phan/demo repo for the C and other stuff?

So... https://doc.wikimedia.org/mediawiki-tools-phan-SecurityCheckPlugin/master/demos/ exists! It'll get updated whenever a new commit is merged in phan-taint-check-plugin (takes ~15 min but we can speed it up a lot).

Except it seems to be getting blocked by CSP: Content Security Policy: The page’s settings blocked the loading of a resource at blob:https://doc.wikimedia.org/f8ce60d3-f6e6-4f98-b42b-ab1c401856fc (“script-src”).

Do we need to adjust a CSP rule for this?

So this would need to add blob: to script-src (blob: is not included by default with 'self'). You probably also need 'wasm-eval' as well.

Krinkle reopened this task as Open.EditedNov 26 2020, 3:31 AM
Krinkle added a subscriber: Krinkle.

Re-opening to track the above. Also, once done, might be good to add an entry to the https://doc.wikimedia.org/ index for easier (re)discovery of the exact URL.

So... https://doc.wikimedia.org/mediawiki-tools-phan-SecurityCheckPlugin/master/demos/ exists! It'll get updated whenever a new commit is merged in phan-taint-check-plugin (takes ~15 min but we can speed it up a lot).

Amazing, thank you!

Also, once done, might be good to add an entry to the https://doc.wikimedia.org/ index for easier (re)discovery of the exact URL.

+1

Aklapper renamed this task from Create a web demo to Create a web demo for phan taint-check.Dec 7 2020, 3:38 PM

So....I don't think what we're doing here is legal unfortunately. The PHP license is incompatible with the GPL. Normally this isn't a problem because it doesn't affect code that PHP executes, but here we've compiled GPL code with PHP code and distributing it as one wasm binary.

Can we have them as separate wasm binaries so they aren't being distributed together? That would actually be a pretty big win technically because then we're not recompiling PHP on every patch too.

Or if that's not possible and we still want a web demo, we'd need to relicense to Apache 2 or MIT.

So....I don't think what we're doing here is legal unfortunately.

Oh, that's a bummer.

but here we've compiled GPL code with PHP code and distributing it as one wasm binary.

I might be missing something but what code is licensed under the GPL here?

Can we have them as separate wasm binaries so they aren't being distributed together? That would actually be a pretty big win technically because then we're not recompiling PHP on every patch too.

Or if that's not possible and we still want a web demo, we'd need to relicense to Apache 2 or MIT.

I've killed the webservice for the time being.

but here we've compiled GPL code with PHP code and distributing it as one wasm binary.

I might be missing something but what code is licensed under the GPL here?

the taint-check-plugin, see https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/tools/phan/SecurityCheckPlugin/+/refs/heads/master/COPYING

but here we've compiled GPL code with PHP code and distributing it as one wasm binary.

I might be missing something but what code is licensed under the GPL here?

the taint-check-plugin, see https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/tools/phan/SecurityCheckPlugin/+/refs/heads/master/COPYING

Yuck, I thought I was missing something. Relicensing *might* be an option if everything else fails, but let's think about separating the binaries first. As you said, that would be hugely helpful. I'm not at all expert with emscripten though, so not sure about the feasibility.

@Legoktm I've just tried packaging php and taint-check separately. It should be doable if we:

  • Remove taint-check from the emcc invocation
  • Build PHP with -s FORCE_FILESYSTEM=1
  • Run file_packager, passing it the path to taint-check

This will generate a .data and a .js file for taint-check, which should be included in the web page before the main PHP file (docs).

I've just tried this on toolforge. After the usual hour it took to build PHP it errored out due to a stupid syntax error that I left in the build script. So I had to run the final commands manually. In the end the PHP part works, but taint-check doesn't because it cannot be found in the virtual FS. I'm currently trying to understand why.

Would you have some spare time to help me get this to doc.wm.org once I get it to work?

but taint-check doesn't because it cannot be found in the virtual FS. I'm currently trying to understand why.

So apparently the JS is not being injected correctly. What emcc does with --preload-file is packaging the file (which we're doing manually) and adding it to --pre-js. Loading the file separately as I tried earlier will not work (perhaps when php.js is loaded it erases everything created by taint-check). But we can still use --pre-js to make it work! This way we end up with:

  • taint-check.js and taint-check.data that contain taint-check
  • php.js, php.data, php.wasm which only contain PHP and phan; if --pre-js is used when generating these, it basically dumps the content of taint-check.js inside php.js. However, taint-check.js is basically just wiring code for taint-check.data, which is the actual executable.

Would this work? Pushing a PR shortly.

Done. There are also a couple of FIXMEs that I'd like to fix shortly, and I should also rebase my fork to upstream (but index.html was removed so it's not straightforward).

Change 644367 had a related patch set uploaded (by Daimona Eaytoy; author: Legoktm):

[integration/config@master] docker: Switch back to Daimona's repo for phan-taint-check demo

https://gerrit.wikimedia.org/r/644367

Change 680337 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[operations/puppet@production] Relax CSP rules for doc.wm.org/mediawiki-tools-phan-SecurityCheckPlugin

https://gerrit.wikimedia.org/r/680337

Change 680403 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/tools/phan/SecurityCheckPlugin@master] Update demo link

https://gerrit.wikimedia.org/r/680403

Change 681328 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/tools/phan/SecurityCheckPlugin@master] Add scripts for generating a phar archive

https://gerrit.wikimedia.org/r/681328

Change 681328 merged by jenkins-bot:

[mediawiki/tools/phan/SecurityCheckPlugin@master] Add scripts for generating a phar archive

https://gerrit.wikimedia.org/r/681328

Change 644367 merged by jenkins-bot:

[integration/config@master] docker: Switch back to Daimona's repo for phan-taint-check demo

https://gerrit.wikimedia.org/r/644367

Mentioned in SAL (#wikimedia-releng) [2021-04-30T19:21:54Z] <James_F> Docker: Publishing mediawiki-phan-taint-check-demo:0.1.1 for T257301

Change 684019 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[integration/config@master] dockerfiles: [mediawiki-phan-taint-check-demo] Require composer-scratch

https://gerrit.wikimedia.org/r/684019

Change 684019 merged by jenkins-bot:

[integration/config@master] dockerfiles: [mediawiki-phan-taint-check-demo] Provide composer

https://gerrit.wikimedia.org/r/684019

Change 693969 had a related patch set uploaded (by Jforrester; author: Jforrester):

[integration/config@master] jjb: [mw-tools-phan-demos-publish] Switch to image with composer

https://gerrit.wikimedia.org/r/693969

Change 693974 had a related patch set uploaded (by Jforrester; author: Jforrester):

[integration/config@master] dockerfiles: [mediawiki-phan-taint-check-demo] Add ext-dom/ext-intl/ext-mbstring

https://gerrit.wikimedia.org/r/693974

Change 693974 merged by jenkins-bot:

[integration/config@master] dockerfiles: [mediawiki-phan-taint-check-demo] Add ext-dom/ext-intl/ext-mbstring

https://gerrit.wikimedia.org/r/693974

Change 693990 had a related patch set uploaded (by Jforrester; author: Jforrester):

[integration/config@master] dockerfiles: [mediawiki-phan-taint-check-demo] Install ext-zip and unzip

https://gerrit.wikimedia.org/r/693990

Change 693990 merged by jenkins-bot:

[integration/config@master] dockerfiles: [mediawiki-phan-taint-check-demo] Install ext-zip and unzip

https://gerrit.wikimedia.org/r/693990

Change 693969 merged by jenkins-bot:

[integration/config@master] jjb: [mw-tools-phan-demos-publish] Switch to image with composer and PHP exts

https://gerrit.wikimedia.org/r/693969

OK, this now fails with:

00:07:57.746 emcc:WARNING: --llvm-lto ignored when using llvm backend
00:07:57.870 emcc: warning: EXTRA_EXPORTED_RUNTIME_METHODS is deprecated, please use EXPORTED_RUNTIME_METHODS instead [-Wdeprecated]
00:07:58.004 warning: undefined symbol: getdtablesize (referenced by top-level compiled C/C++ code)
00:07:59.271 warning: undefined symbol: popen (referenced by top-level compiled C/C++ code)
00:07:59.271 + cp out/php.wasm out/php.js out/php.data out/taint-check.js out/taint-check.data ..
00:08:35.956 cp: cannot stat 'out/php.data': No such file or directory
00:08:35.969 Build step 'Execute shell' marked build as failure

OK, this now fails with:

00:07:57.746 emcc:WARNING: --llvm-lto ignored when using llvm backend

This one also happens on toolforge, but seems not to affect the final result (the build script should be updated though).

00:07:57.870 emcc: warning: EXTRA_EXPORTED_RUNTIME_METHODS is deprecated, please use EXPORTED_RUNTIME_METHODS instead [-Wdeprecated]

I think I've never seen this one on toolforge, but this also seems a deprecation warning with no actual effect

00:07:58.004 warning: undefined symbol: getdtablesize (referenced by top-level compiled C/C++ code)
00:07:59.271 warning: undefined symbol: popen (referenced by top-level compiled C/C++ code)

Both seen on toolforge, I haven't investigated why they happen, but again, this shouldn't have any effect; especially because we pass ERROR_ON_UNDEFINED_SYMBOLS=0. (It might be related to the restrictions of running PHP from a web browser).

00:07:59.271 + cp out/php.wasm out/php.js out/php.data out/taint-check.js out/taint-check.data ..
00:08:35.956 cp: cannot stat 'out/php.data': No such file or directory

This is the actual issue. Yet I don't know why we're facing it. None of the issues above should abort the process. Let me check again.

So... It took me quite a while to run everything on toolforge; first of all, it was using emscripten 2.0.8, so I had to upgrade to 2.0.20. Which also required building python from source, since it requires python 3.6+ and monkey-patching it is becoming too hard (and toolforge has 3.5, T268438). I then re-ran the build script and got the same result as CI. Apparently, php.data stopped being created somewhere between emscripten 2.0.8 and 2.0.20. However, the demo still works without it, so I've updated the demo source to copy the file only if it exists (to support older emscripten).

I've just merged r694473 to run the postmerge job, and now the demo works like a charm!

There's still the CSP thing to fix (r680337), but AFAICS that only affects syntax checking for the ace editor. As such, we can already update the demo link IMHO.

@Legoktm I've just tried packaging php and taint-check separately. It should be doable if we:

  • Remove taint-check from the emcc invocation
  • Build PHP with -s FORCE_FILESYSTEM=1
  • Run file_packager, passing it the path to taint-check

This will generate a .data and a .js file for taint-check, which should be included in the web page before the main PHP file (docs).

I've just tried this on toolforge. After the usual hour it took to build PHP it errored out due to a stupid syntax error that I left in the build script. So I had to run the final commands manually. In the end the PHP part works, but taint-check doesn't because it cannot be found in the virtual FS. I'm currently trying to understand why.

Would you have some spare time to help me get this to doc.wm.org once I get it to work?

Sorry missed this earlier, but yes. What still needs doing? I'll merge the CSP thing shortly.

Sorry missed this earlier, but yes. What still needs doing? I'll merge the CSP thing shortly.

We're pretty much done! The demo on doc.wm.org works perfectly now, the CSP thing is only preventing live syntax check (which is not so important). Aside from that, the only remaining thing is to update the demo link in the readme, i.e. r680403.

Change 680403 merged by jenkins-bot:

[mediawiki/tools/phan/SecurityCheckPlugin@master] Update demo link

https://gerrit.wikimedia.org/r/680403

Daimona raised the priority of this task from Low to Medium.Oct 15 2022, 5:22 PM

Change 680337 merged by Dzahn:

[operations/puppet@production] doc: Relax CSP rules for taint-check-demo

https://gerrit.wikimedia.org/r/680337