Add lint/CI to all wikimedia/discovery analytics repositories
Open, Normal · Public · 10 Story Points

Description

We have multiple repositories in Gerrit under wikimedia/discovery which seem to be active but lack CI configuration. It would be nice to have tests run automatically for each of them.

Repo | Notes
dashboard | Will soon be retired in favor of managing via Puppet (T161354)

- prince: R/Shiny-based dashboard for Wikipedia.org portal metrics (eligible for lint)
- rainbow: R/Shiny-based dashboard for search team metrics (eligible for lint)
- twilightsparql: R/Shiny-based dashboard for WDQS traffic & usage (eligible for lint)
- wetzel: R/Shiny-based dashboard for Maps usage (eligible for lint)
- wonderbolt: R/Shiny-based dashboard for externally referred traffic breakdown (eligible for lint)

Repo | Notes
experimental | https://discovery-experimental.wmflabs.org/, submodules on GitHub (bearloga/wmf-delphi and chelsyx/wmf-poultry); may be retired in favor of Puppet
golden | metric-retrieving codebase (SQL/Hive queries & R scripts) using Reportupdater (T150915); can't do tests because it'd require access to private data via stat1002, but eligible for lint checking
ortiz | R package with unit tests (via testthat); eligible for CI
polloi | R package of common functions used by Discovery Dashboards; eligible for lint
wmf | R package with some unit tests (via testthat); eligible for lint and maybe some CI

hashar created this task. Dec 21 2016, 12:49 PM
Restricted Application added a subscriber: Aklapper. Dec 21 2016, 12:49 PM
hashar updated the task description. Dec 21 2016, 1:07 PM

Updated the task description to better describe each of the repositories. Most are R applications without any tests, so I am not sure what CI can offer there. Maybe there is some R syntax checker/linter we could run.

ortiz and wmf have tests using testthat. If we want to run tests in CI, I guess we will need R and whatever package manager exists to magically download/install the dependencies.

discernatron has tests, but composer fails to install the dependencies (filed as T153859).

Quite a few of these are related to analysis, @mpopov and @chelsyx should comment on those.

The dashboard repositories are R/Shiny applications (e.g. http://discovery.wmflabs.org/metrics/) that we can't apply CI to.

wikimedia/discovery/golden is a set of data-retrieving scripts that we're currently migrating to Analytics' Reportupdater framework (T150915). We can't apply CI-for-software to it, although CI-but-for-data is something we plan to investigate in T145445.

wikimedia/discovery/polloi is an R package of miscellaneous functions shared by golden and the dashboards repos, primarily for returning UI elements. I'll think about what unit tests potentially make sense there.

wikimedia/discovery/dashboard and wikimedia/discovery/experimental repos are Vagrant configurations for our dashboards hosted on Labs. I've talked to Gehel and we might migrate them to use proper Puppet configs, but that is currently super low priority.

wikimedia/discovery/ortiz and wikimedia/discovery/wmf are R packages that have some unit tests that are run when we build the packages locally (which we do before submitting patches for review). I know that Travis can do package build checks, but for Jenkins I'd have to figure out the Jenkins R Plugin, and I don't even know whether Ops would be OK with me installing it via the Jenkins dashboard.

There's not much for Discovery-Analysis to do on this ticket, except getting R package CI up and running with Jenkins, but I'd need help there because I'm not dev-opsy enough to pioneer that effort by myself.

debt closed this task as Declined. Feb 14 2017, 9:21 PM
debt added a subscriber: debt.

Declining this ticket as there isn't much we can do with it in Discovery - based on the notes written here.

greg added a subscriber: greg. Feb 14 2017, 10:24 PM

> The dashboard repositories are R/Shiny applications (e.g. http://discovery.wmflabs.org/metrics/) that we can't apply CI to.
>
> wikimedia/discovery/golden is a set of data-retrieving scripts that we're currently migrating to Analytics' Reportupdater framework (T150915). We can't apply CI-for-software to it, although CI-but-for-data is something we plan to investigate in T145445.
>
> wikimedia/discovery/polloi is an R package of miscellaneous functions shared by golden and the dashboards repos, primarily for returning UI elements. I'll think about what unit tests potentially make sense there.

A shared library used by multiple user-facing tools sounds like it needs some tests. Also, at least a linter could be run against it.

> wikimedia/discovery/dashboard and wikimedia/discovery/experimental repos are Vagrant configurations for our dashboards hosted on Labs. I've talked to Gehel and we might migrate them to use proper Puppet configs, but that is currently super low priority.

Again, at least a basic syntax check/linter would probably be useful during code review.

Generally: let's get the code that we develop and serve to users to have tests. Passing up the opportunity to set up the test config for these repos now sends the message that these repos don't matter. Based on what you said, I don't think that's the case.

> wikimedia/discovery/ortiz and wikimedia/discovery/wmf are R packages that have some unit tests that are run when we build the packages locally (which we do before submitting patches for review).

Perfect, let's get those to run per patch-set submission!

> I know that Travis can do package build checks, but for Jenkins I'd have to figure out the Jenkins R Plugin, and I don't even know whether Ops would be OK with me installing it via the Jenkins dashboard.
>
> There's not much for Discovery-Analysis to do on this ticket, except getting R package CI up and running with Jenkins, but I'd need help there because I'm not dev-opsy enough to pioneer that effort by myself.

You don't need to worry about that part; that's what RelEng does for you when you use our infrastructure (Gerrit (or Differential) + CI). We'll help you get the right plugins installed on our shared Jenkins etc. RelEng maintains integration.wikimedia.org.

Do you still want to decline this task even though there are things that could be done?

@debt: Any reply to the last comment / reconsidering? :)

mpopov added a comment. Mar 1 2017, 4:51 PM

@Aklapper @debt: I'm OK with re-opening this ticket if @greg or someone can help us set it up. If we can get ortiz and wmf (which already have unit tests) to do CI checks on Gerrit, Chelsy and I can look into the other ones like polloi :)

greg added a comment. Mar 1 2017, 4:57 PM

We will :)

greg reopened this task as Open. Mar 1 2017, 7:57 PM

> @Aklapper @debt: I'm OK with re-opening this ticket if @greg or someone can help us set it up.

We will :)

debt moved this task from Needs triage to Up Next on the Discovery-Analysis board. Mar 2 2017, 9:12 PM
debt triaged this task as Normal priority.

Linking to the lintr package here for future reference: https://github.com/jimhester/lintr has instructions for configuring linters and adding it to Travis CI, which we could adapt to Jenkins when we get to it :)
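
A lint step along those lines could look roughly like the following. This is only a sketch, not the job that was later configured: `lint_package_ci` is a made-up helper name, it assumes lintr is already installed on the machine, and it treats any lint at all as a failure, which may be stricter than we want.

```shell
#!/bin/sh
# Sketch: fail a CI step when lintr reports any lints for an R package.
# lint_package_ci is a hypothetical helper name; assumes lintr is installed.

# $1 is the package directory (defaults to the current directory).
# The body runs in a subshell so the cd does not affect the caller.
lint_package_ci() (
    cd "${1:-.}" || exit 1
    # lint_package() lints the package in the working directory;
    # the exit status is 1 if it returned at least one lint, else 0.
    Rscript -e 'lints <- lintr::lint_package(); print(lints); quit(status = if (length(lints) > 0) 1 else 0)'
)

# Example: lint_package_ci ortiz
```

The exit status of `Rscript` is passed straight through, so a Jenkins shell step that calls the function would be marked failed whenever lints are found.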

mpopov renamed this task from Add CI to all wikimedia/discovery repositories that are active to Add lint/CI to all wikimedia/discovery analytics repositories. May 8 2017, 10:49 PM
mpopov claimed this task.
mpopov updated the task description.
mpopov set the point value for this task to 10.
mpopov removed a subscriber: Deskana.

Also for future reference: RStudio (the folks behind Shiny) are making a PhantomJS-based thing for automated testing of Shiny applications: https://rstudio.github.io/shinytest/

Shinytest uses snapshot-based testing strategy. The first time it runs a set of tests for an application, it performs some scripted interactions with the app and takes one or more snapshots of the application’s state. These snapshots are saved to disk so that future runs of the tests can compare their results to them.

To create tests, the easiest method is to use the recordTest() function. This launches the application in a web browser and records your interactions with the application. These interactions are saved in a .R file, and are run using the strategy described above.

The summary from my side:

  • we will start with wikimedia/discovery/ortiz (already has some unit tests)
  • I need a list of packages to install on the CI machines. Seems the puppet module shiny_server has a good starting list.
  • CI already has all the logic to connect to Gerrit, clone the repo, and add the patch. We then need a list of commands to run.
  • Since the package manager is embedded in R, it might make sense to use the Jenkins R plugin. This way the job would directly run the R commands.

To configure the jobs in Jenkins, we use the jenkins-jobs utility. It lets one write job definitions in a YAML-based DSL; these are processed, converted to Jenkins format, and pushed to the Jenkins instance to create or update the jobs.

There is a step-by-step tutorial on https://www.mediawiki.org/wiki/CI/JJB that covers everything, though it is geared toward production.

Would it make sense to set up a small Jenkins instance on which you can exercise / prototype? Probably one can spawn a labs instance and add the jenkins puppet class to it and maybe that is sufficient.

> The summary from my side:
>
>   • we will start with wikimedia/discovery/ortiz (already has some unit tests)
>   • I need a list of packages to install on the CI machines. Seems the puppet module shiny_server has a good starting list.

Debian packages: r-base, r-base-dev, r-recommended (see https://cloud.r-project.org/bin/linux/debian/ for more info). For the ortiz R package, the following command installs its dependencies:

R -e "install.packages(c('devtools', 'Rcpp', 'testthat'), repos = c(CRAN = 'https://cran.rstudio.com'))"

>   • CI already has all the logic to connect to Gerrit, clone the repo, and add the patch. We then need a list of commands to run.

Suppose we have the cloned repo and we're working in its parent directory. First, we build a source package:

R --no-site-file --no-environ --no-save --no-restore CMD build --no-resave-data --no-manual ortiz

(the latest version should yield an ortiz_0.0.3.tar.gz)

Then we check the built source (yes, I know, it's a little counter-intuitive to not just run R CMD check ortiz):

R --no-site-file --no-environ --no-save --no-restore --quiet CMD check --timings --no-manual ortiz_0.0.3.tar.gz

This will do a bunch of R's built-in checks and also run the unit tests and should return a status code of 0.
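
The two steps above can be wrapped in a small script so the tarball name doesn't have to be hardcoded. This is a minimal sketch under a couple of assumptions: `desc_field`, `tarball_name` and `build_and_check` are made-up helper names, and the package's DESCRIPTION file is assumed to have the usual single-line `Package:` and `Version:` fields.

```shell
#!/bin/sh
# Sketch: build an R source package and run R CMD check on the resulting tarball.
# desc_field, tarball_name and build_and_check are hypothetical helper names.

# Print the value of DESCRIPTION field $2 (e.g. Package) for package directory $1.
desc_field() {
    awk -v key="$2:" '$1 == key { print $2; exit }' "$1/DESCRIPTION"
}

# Print "<Package>_<Version>.tar.gz", the file name R CMD build produces.
tarball_name() {
    printf '%s_%s.tar.gz\n' "$(desc_field "$1" Package)" "$(desc_field "$1" Version)"
}

# Build from package directory $1, then check the tarball written to the
# current directory. A non-zero return means the build or the check failed.
build_and_check() {
    R --no-site-file --no-environ --no-save --no-restore \
        CMD build --no-resave-data --no-manual "$1" || return 1
    R --no-site-file --no-environ --no-save --no-restore --quiet \
        CMD check --timings --no-manual "$(tarball_name "$1")"
}

# Example, run from the directory that contains the cloned repo:
# build_and_check ortiz
```

Reading the name from DESCRIPTION means the job keeps working when the version is bumped past 0.0.3.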

>   • Since the package manager is embedded in R, it might make sense to use the Jenkins R plugin. This way the job would directly run the R commands.

To be perfectly honest I'm not even sure what benefits the Jenkins R plugin actually brings. Looking at its source code, it doesn't seem to really do much. But also I don't know enough about Jenkins.

> To configure the jobs in Jenkins, we use the jenkins-jobs utility. It lets one write job definitions in a YAML-based DSL; these are processed, converted to Jenkins format, and pushed to the Jenkins instance to create or update the jobs.
>
> There is a step-by-step tutorial on https://www.mediawiki.org/wiki/CI/JJB that covers everything, though it is geared toward production.

Thanks! I'll try going through it and ping you.

> Would it make sense to set up a small Jenkins instance on which you can exercise / prototype? Probably one can spawn a labs instance and add the jenkins puppet class to it and maybe that is sufficient.

I have no idea. Maybe? Wouldn't a Jenkins instance need to get hooked into Gerrit/Zuul?

Thank you @mpopov! I am going to do the provisioning on the CI instance and craft a very basic job. I guess we can then iterate from there.

> > Would it make sense to set up a small Jenkins instance on which you can exercise / prototype? Probably one can spawn a labs instance and add the jenkins puppet class to it and maybe that is sufficient.
> I have no idea. Maybe? Wouldn't a Jenkins instance need to get hooked into Gerrit/Zuul?

We can set up a job that just gets triggered on demand and does a git pull :-] It was a random idea really; maybe it is easier to Just Do It™.

Change 362107 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[wikimedia/discovery/polloi@master] Fix spline smoothing and add tests

https://gerrit.wikimedia.org/r/362107

Change 362309 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] (DO NOT SUBMIT) experimental R based job

https://gerrit.wikimedia.org/r/362309

I have played with R most of the afternoon/evening and have read a bunch of docs. It still terribly confuses me.

I have created a very basic Jenkins job https://integration.wikimedia.org/ci/job/ortiz-test-jessie/20/console and it fails with my terrible install command:

> install.packages(".", repos="http://cran.us.r-project.org", type="source", dependencies=TRUE)
Installing package into ‘/srv/jenkins-workspace/workspace/ortiz-test-jessie/Rpackages’
(as ‘lib’ is unspecified)
Warning message:
package ‘.’ is not available (for R version 3.1.1)

What I am trying to achieve is to install from the git workspace and get the Imports/Suggests from CRAN.

Note: I have set R_LIBS_USER="$(pwd)/Rpackages"
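
For reference, when `repos` is set, `install.packages()` treats its first argument as a package name to look up in that repository, which is why `.` is reported as not available. One way to get the behaviour described above is to let devtools resolve the dependencies from the DESCRIPTION file and then install the workspace with `R CMD INSTALL`. The following is a sketch only, not what the job ended up using: `install_from_workspace` is a made-up helper name, and it assumes devtools is already installed on the CI machine and that the current directory is the package checkout.

```shell
#!/bin/sh
# Sketch: install a package's CRAN dependencies into a workspace-local library,
# then install the package itself from the checked-out source.
# install_from_workspace is a hypothetical helper name; assumes devtools is installed.

# Runs in a subshell so the exported R_LIBS_USER does not leak to the caller.
install_from_workspace() (
    # Same library location as in the note above.
    R_LIBS_USER="$(pwd)/Rpackages"
    export R_LIBS_USER
    mkdir -p "$R_LIBS_USER" || exit 1
    # dependencies = TRUE pulls in Depends, Imports, LinkingTo and Suggests.
    Rscript -e "options(repos = c(CRAN = 'https://cloud.r-project.org')); devtools::install_deps('.', dependencies = TRUE)" || exit 1
    # R CMD INSTALL accepts a source directory; --library selects the target library.
    R CMD INSTALL --library="$R_LIBS_USER" .
)

# Example, run from the root of the cloned package repo:
# install_from_workspace
```

R only adds `R_LIBS_USER` to the library path when the directory exists, hence the `mkdir -p` before anything is installed.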

Change 362107 merged by Chelsyx:
[wikimedia/discovery/polloi@master] Fix spline smoothing and add tests

https://gerrit.wikimedia.org/r/362107

Change 363337 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] (WIP) packages for R

https://gerrit.wikimedia.org/r/363337

Change 364000 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[wikimedia/discovery/ortiz@master] Add lint checking & update doc formatting

https://gerrit.wikimedia.org/r/364000

Change 362309 merged by jenkins-bot:
[integration/config@master] R based job for wikimedia/discovery/ortiz

https://gerrit.wikimedia.org/r/362309

Change 366045 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Rename ortiz job to remove -jessie suffix

https://gerrit.wikimedia.org/r/366045

Change 366045 merged by jenkins-bot:
[integration/config@master] Rename ortiz job to remove -jessie suffix

https://gerrit.wikimedia.org/r/366045

Change 366170 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[operations/puppet@production] Move R-related code from shiny_server to separate module

https://gerrit.wikimedia.org/r/366170

Change 366170 merged by Gehel:
[operations/puppet@production] Move R-related code from shiny_server to separate module

https://gerrit.wikimedia.org/r/366170