Page MenuHomePhabricator

Phan-docker output (at least) triple-escapes HTML
Closed, ResolvedPublic

Description

See for instance https://integration.wikimedia.org/ci/job/mwext-php70-phan-docker/30971/console, where there are things like

<error line="115" severity="warning" message="Assigning array&amp;lt;string,array&amp;lt;string,\Flow\Model\UUID&amp;gt;&amp;gt; to property but \Flow\Model\UUID::$instances is \Flow\Model\UUID[][][]" source="PhanTypeMismatchProperty"/>

Entities are being double-escaped, which can result in poor readability.

Context:

docker run docker-registry.wikimedia.org/releng/mediawiki-phan:0.1.15 -m checkstyle
phan/phan 1.3.4

+ exec /srv/phan/vendor/bin/phan -d . -m checkstyle
<?xml version="1.0" encoding="ISO-8859-15"?>
<checkstyle version="6.5">
  <file name="Hooks.php">
    <error line="97" severity="error" message="Call to method launchTourByCookie from undeclared class \GuidedTourLauncher" source="PhanUndeclaredClassMethod"/>
  </file>
  <file name="includes/Api/ApiFlow.php">
    <error line="98" severity="error" message="Call to undeclared method \ApiBase::needsPage" source="PhanUndeclaredMethod"/>
    <error line="99" severity="error" message="Call to undeclared method \ApiBase::setPage" source="PhanUndeclaredMethod"/>
  </file>
...

Event Timeline

The phan -m checkstyle options is to make it generate a checkstyle report to stdout which is a XML document. That is why some characters end up being escaped.

I think the intent was to have Jenkins to process the checkstyle report and generate a nice / human friendly version out of it. That is show down in the console log when the plugin tries to find a file log/phan-issues for the report:

00:01:34.272 [CHECKSTYLE] Searching for all files in /srv/jenkins-workspace/workspace/mwext-php70-phan-docker that match the pattern log/phan-issues
00:01:34.272 [CHECKSTYLE] No files found. Configuration error?

The file is not found!

The phan-issues log file comes from MediaWiki core wrapper tests/phan/bin/phan but it is only used for a few repositories (mediawiki/extensions/Wikibase is one of them). When Phan is recent enough (most case), we should be able to pass it a file to write to: -m checkstyle -o phan-issues. And the file should probably be more explicitly named.

@hashar Thanks for the explanation! Indeed, the file name is not that explicative... Anyway, a couple of remarks:
1 - The output is actually triple-escaped, since for e.g. < we have: < (as pure HTML) ==> &lt; (shown as <) ==> &amp;lt; (shown as &lt;) ==> &amp;amp;lt; (shown as &amp;lt;). Here every double arrow represents an escaping phase, and three of those are performed.
2 - I think this is a relatively recent regression. At the moment I'm unable to find an output for docker-phan which: 1-fails, 2-is old enough not to have this bug and 3-hasn't been garbage deleted, but I don't recall the output being so messed up.

Daimona renamed this task from Phan-docker output (at least) double-escapes HTML to Phan-docker output (at least) triple-escapes HTML.Apr 15 2020, 7:01 PM

When Phan is recent enough (most case), we should be able to pass it a file to write to: -m checkstyle -o phan-issues. And the file should probably be more explicitly named.

Is it recent enough now? I should also note that currently, the only reference to "phan-issues" on codesearch is in mediawiki-extensions.yaml.

Daimona assigned this task to hashar.

Fixed by T259890

We had Phan emit checkstyle to the Jenkins console and we replaced that by text output.