Page MenuHomePhabricator

Run etsy/phan from jenkins for MediaWiki core
Closed, ResolvedPublic

Description

Within CirrusSearch we are starting to use a static analyzer for php, phan. This package can analyze php 5 but is run under php 7 and requires the extra php-ast extension to be added to php7. We would like to set this up to run in jenkins once we've cleaned up the code sufficiently, but it will require these packages to be installed to the runners.

Details

Related Gerrit Patches:

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 13 2016, 9:57 PM

Are there debs available for php7 and php7-ast that we can use on trusty or jessie?

There are debs for trusty, but they are not official. For many years now ppa:ondrej/php has been the go-to ppa for more up to date php packages than are available directly. This contains packages for php7 which are co-installable with php5. The down side is it might also trigger an update of the php5 packages (I havn't checked). We could alternatively build packages using the same packaging scripts to push into the wikimedia apt repo. I'm not sure which is the better idea.

After looking into it more i can't suggest using that ppa. It provides a a series of packages prefixed php5.6 and forces you to upgrade to them, which would break the installation of our own php packages.

I'm not sure how to check, but are the job runners on trusty or jessie? I'm going to guess from greg's comment they are on jessie :) Will spin up a VM with jessie and see how dotdeb plays with our own php packages.

Both trusty and jessie, we can utilize whatever works.

Had a chance to try out dotdeb's php on a jessie instance in labs. The php7.0-cli package does not appear to conflict in any way with the php5-cli packages that are already installed.

The dotdeb repository is missing the php-ast extension, so that would have to be either compiled onto the instances via puppet (doubtful) or packaged up internally?

Also i was going to see if dotdeb was interested in adding php-ast to their repo, but he already has a page (perhaps outdated) about it: https://www.dotdeb.org/2008/09/25/how-to-package-php-extensions-by-yourself/

EBernhardson added a comment.EditedMay 25 2016, 9:53 PM

After playing around with dotdeb on labs, it's possible but there are potential inconsistencies. After looking around more i like our most elegant solution might be to use docker containers, ala https://github.com/cloudflare/docker-phan

On a trusty instance in labs the basic process is (would need to be puppetized obviously):

sudo apt-get install docker-io
git clone https://github.com/cloudflare/docker-phan /srv/docker-phan
sudo adduser <username> docker
sudo -u <username> /src/docker-phan/build

This is very simple and causes no issues with interaction between php7 and the existing php installation, but does require that docker is installed. Once the image has been built (takes 5 to 10 minutes, should be built once per runner). This docker container is setup to have access to the external . After that phan can be run something like:

docker run -v /path/to/mediawiki/source:/mnt/src --rm cloudflare/phan:latest <phan arguments>

the specific arguments to phan might take some consideration. Within CirrusSearch we have been using a shell script to collect the list of files for phan to read. Simplest might be for extensions to implement some script where the command to run phan can be passed in as an argument, and then it runs that command with the appropriate arguments appended?

We have PHP7 now, the external repo we're using doesn't have php-ast yet, so we need to ask them to include it, or package/build it ourselves. I'll try the former.

hashar triaged this task as Low priority.Sep 26 2016, 11:11 AM
hashar moved this task from Untriaged to Backlog on the Continuous-Integration-Infrastructure board.

I did some poking around, ubuntu (16.04LTS) and debian (stretch, to be released next year) both have native php-ast packages. ondrej's ppa has php-ast packages for trusty and xenial. Sadly the DPA (for debian) does not yet contain php-ast. I've filed https://github.com/oerdnj/deb.sury.org/issues/470 to request debian packages be uploaded.

issue addressed, there are now php-ast packages in the DPA for jessie.

Change 315711 had a related patch set uploaded (by Legoktm):
contint: Install php7.0-ast for phan

https://gerrit.wikimedia.org/r/315711

Change 315711 merged by Gehel:
contint: Install php7.0-ast for phan

https://gerrit.wikimedia.org/r/315711

Change 325873 had a related patch set uploaded (by Legoktm):
Add experimental 'mediawiki-core-php70-phan' job

https://gerrit.wikimedia.org/r/325873

Change 325873 merged by jenkins-bot:
Add experimental 'mediawiki-core-php70-phan' job

https://gerrit.wikimedia.org/r/325873

In a few minutes, the phan job will be set up non-voting against all mw/core patches. Here are the next steps (as I see it) to making it voting:

  • Get puppet patch merged https://gerrit.wikimedia.org/r/#/c/325877/
  • Move phan job to nodepool jessie
  • Document on wiki how to run phan locally (clone it somewhere and add PHAN env variable)
  • announce to wikitech-l

Steps after that:

  • Move the version number out of CI config and into mw/core git repo
  • Move CirrusSearch's phan stuff over to jenkins, figure out and document how extensions can use it to
  • Does phan support checkstyle.xml or similar? If not, we should file upstream tasks asking for that.
  • ...profit!
Legoktm renamed this task from Install php7 and the php-ast extension so etsy/phan can be run from jenkins to Run etsy/phan from jenkins for MediaWiki core.Dec 8 2016, 9:58 AM

Puppet match merged.

Moving the version out of CI into mw/core might be difficult. Currently with composer if we were to add phan as a dev requirement composer can then only be run with php >= 7.0. There might be workarounds for that though.

I created a ticket (https://github.com/etsy/phan/issues/436) for phan to release a new version. 0.6.0 was cut in july and isn't capable of analyzing mediawiki core. Will see if they are willing to make an 0.7.0 release soon-ish.

I wrote a first draft of documentation, please adjust where it makes sense: https://www.mediawiki.org/wiki/Manual:Static_Analysis

I also copied over a script we are using in CirrusSearch to add per-line @suppress annotation support. This is in https://gerrit.wikimedia.org/r/325963

Quick response. morria over at phan cut an 0.7.0 release today. https://github.com/etsy/phan/issues/436

Moving the version out of CI into mw/core might be difficult. Currently with composer if we were to add phan as a dev requirement composer can then only be run with php >= 7.0. There might be workarounds for that though.

Yeah, I tried that too. We could do something hacky like...

km@km-tp ~/p/v/mediawiki> git diff
diff --git a/composer.json b/composer.json
index f57b9ce..62edb0b 100644
--- a/composer.json
+++ b/composer.json
@@ -94,6 +94,7 @@
                                "composer.local.json"
                        ],
                        "merge-dev": false
-               }
+               },
+               "phan-version": "0.7.0"
        }
 }
km@km-tp ~/p/v/mediawiki> cat composer.json | jq '.extra."phan-version"'
"0.7.0"

Mostly I'm concerned about the problem that occurs once REL1_29 is cut, if we upgrade the version for master, it might make the REL1_29 job fail. Putting the version in the repository avoids that.

nodepool doesn't have enough memory to run phan, so I'm moving it back to permanent slaves: https://integration.wikimedia.org/ci/job/mediawiki-core-php70-phan-jessie/2/console

hashar added a subscriber: hashar.Dec 8 2016, 9:09 PM

The Jessie permanent slaves have a single executor, so the high memory consumption will have no side effect.

I looked at the job that runs on permanent slaves https://integration.wikimedia.org/ci/job/mediawiki-core-php70-phan/ and tried to correlate its build with the Grafana memory spikes. Looks like the job goes to 1.5GBytes at least.

The Nodepool instances are m1.medium with 4GBytes of RAM, the same amount of memory available on permanent slaves. But the build chokes at ~ 1.2G?!!

+ ./tests/phan/bin/phan
mmap() failed: [12] Cannot allocate memory
mmap() failed: [12] Cannot allocate memory
mmap() failed: [12] Cannot allocate memory
mmap() failed: [12] Cannot allocate memory
mmap() failed: [12] Cannot allocate memory
PHP Fatal error:  Out of memory (allocated 1272971264) (tried to allocate 262144 bytes)
in /home/jenkins/workspace/mediawiki-core-php70-phan-jessie/ \
   src/vendor/etsy/phan/src/Phan/AST/Visitor/KindVisitorImplementation.php on line 22

I am surprised it fails at such a low level. I have looked at php7.0 -i and the memory_limit is -1. That is going to be not so fun to debug that and find the root cause :(

hashar added a comment.Dec 8 2016, 9:42 PM

Found the issue: E_TOO_MANY_CONCURRENT_FORKS

I have booted a Jessie instance in the Nodepool pool. Did the setup the same way the Jenkins jobs does (zuul-cloner, composer update...)

Phan bails out during the analyze phase and shows 1.2GB memory used.

$ PHAN=vendor/bin/phan ./tests/phan/bin/phan
   analyze ███░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░   5% 1185MB/1186MB

phan is run with 4 parallel job (-j 4) which does not fit in memory.


With j -1:

$ time PHAN=vendor/bin/phan ./tests/phan/bin/phan
   analyze ████████████████████████████████████████████████████████████ 100% 1416MB/1419MB
real    1m40.170s
user    1m33.876s
sys     0m3.072s

With j -2:

$ time PHAN=vendor/bin/phan ./tests/phan/bin/phan
   analyze ███████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  52% 1311MB/1314MB
real    1m5.599s
user    1m31.816s
sys     0m3.772s

With j -3

$ time PHAN=vendor/bin/phan ./tests/phan/bin/phan
   analyze ████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  33% 1276MB/1277MB
real    1m9.028s
user    1m34.156s
sys     0m4.420s

As a summary:

JobsElapsed (real)User CPU
11m40.170s1m33.876s
21m5.599s1m31.816s
31m9.028s1m34.156s

Looks like there is some benefit to run two tasks in parallel over a single one. Three tasks (and more) does not seem to yield any benefit. May I suggest we drop the parallel jobs from 4 to 2 ?

hashar added a comment.Dec 8 2016, 9:43 PM

Forgot, I got the progress/mem by passing to phan: --progress-bar

Change 326043 had a related patch set uploaded (by EBernhardson):
Use 1 processes instead of 4 for phan

https://gerrit.wikimedia.org/r/326043

The usage of 1GB per core is pretty excessive. I've uploaded a patch to change our default configuration to use 1 process, and added a way to pass through options from the commandline into phan so users can tune the number of processes for their situation with phans -j flag.

EBernhardson added a comment.EditedDec 8 2016, 10:25 PM

There was an earlier question about checkstyle, it looks like phan supports the following modes: 'text', 'json', 'csv', 'codeclimate', 'checkstyle', or 'pylint'

We can swap it around, the main thing we would need to adjust is I will have to update https://gerrit.wikimedia.org/r/#/c/325963/ to support per-line suppression for formats other than text. This probably isn't particularly hard, if anything it might be easier with structured input.

Example checkstyle output:

<?xml version="1.0" encoding="ISO-8859-15"?>
<checkstyle version="6.5">
  <file name="includes/libs/Xhprof.php">
    <error line="56" severity="error" message="Call to undeclared function \tideways_enable()" source="PhanUndeclaredFunction"/>
    <error line="74" severity="error" message="Call to undeclared function \tideways_disable()" source="PhanUndeclaredFunction"/>
  </file>
</checkstyle>

Okay awesome! I'll look into hooking it jenkins with checkstyle and moving it back to nodepool.

Change 326043 merged by jenkins-bot:
Use 1 processes instead of 4 for phan

https://gerrit.wikimedia.org/r/326043

https://gerrit.wikimedia.org/r/#/c/326043/ now let us pass extra arguments to phan. For the Jenkins job we can thus:

Pass j -2 which is known to work within a 4GB Nodepool instance and speed up the build.

Generate a checkstyle report via --output-mode checkstyle -o $WORKSPACE/log/checkstyle.xml. We can then get the Jenkins job to analyze the report and build some kind of report in the Jenkins web interface. For JJB add a publisher such as the one we use for MediaWiki phpcs job:

publishers:
 - checkstyle:
    pattern: 'log/checkstyle.xml'
    can-run-on-failed: true
    thresholds:
      failed:
        total-all: 1

Though that is probably already the case. Zuul has a tweak to change the URL reported back to Gerrit:

failure-pattern: 'https://integration.wikimedia.org/ci/job/{job.name}/{build.number}/checkstyleResult/'

\O/

Yep, sorry I set up the checkstyle stuff last night, and forgot to comment back on the bug :)

I tried adding a progress bar except it gets buffered until phan is finished so then it just outputs a wall of black, so it's not useful for jenkins.

Change 326877 had a related patch set uploaded (by Legoktm):
Make mediawiki-core-php70-phan-jessie voting

https://gerrit.wikimedia.org/r/326877

Change 326877 merged by jenkins-bot:
Make mediawiki-core-php70-phan-jessie voting

https://gerrit.wikimedia.org/r/326877

Change 326880 had a related patch set uploaded (by Legoktm):
Actually add 'mediawiki-core-php70-phan-jessie' to gate...

https://gerrit.wikimedia.org/r/326880

Change 326880 merged by jenkins-bot:
Actually add 'mediawiki-core-php70-phan-jessie' to gate...

https://gerrit.wikimedia.org/r/326880

Legoktm closed this task as Resolved.Dec 13 2016, 2:27 AM
Legoktm claimed this task.
greg awarded a token.Dec 13 2016, 5:52 PM