Add lint/CI to all wikimedia/discovery analytics repositories
Closed, Declined | Public | 10 Estimated Story Points

Description

We have multiple repositories in Gerrit under wikimedia/discovery which seem to be active but lack CI configuration. It would be nice to have tests run automatically for each of them.

| Repo | Notes |
| --- | --- |
| dashboard | Will soon be retired in favor of managing via Puppet (T161354) |
| - prince | R/Shiny-based dashboard for Wikipedia.org portal metrics (eligible for lint) |
| - rainbow | R/Shiny-based dashboard for search team metrics (eligible for lint) |
| - twilightsparql | R/Shiny-based dashboard for WDQS traffic & usage (eligible for lint) |
| - wetzel | R/Shiny-based dashboard for Maps usage (eligible for lint) |
| - wonderbolt | R/Shiny-based dashboard for externally referred traffic breakdown (eligible for lint) |

| Repo | Notes |
| --- | --- |
| experimental | https://discovery-experimental.wmflabs.org/, submodules on Github (bearloga/wmf-delphi and chelsyx/wmf-poultry); may be retired in favor of Puppet |
| golden | metric-retrieving codebase (SQL/Hive queries & R scripts) using Reportupdater (T150915); can't do tests because it'd require access to private data via stat1002, but eligible for lint checking |
| ortiz | R package with unit tests (via testthat); eligible for CI |
| polloi | R package of common functions used by Discovery Dashboards; eligible for lint |
| wmf | R package with some unit tests (via testthat); eligible for lint and maybe some CI |

Event Timeline

Updated the task description to better describe each of the repositories. Most are R applications without any tests, so I am not sure what CI can offer there. Maybe there is some R syntax checker/linter we could run.

ortiz and wmf have tests using testthat. If we want to run tests in CI, I guess we will need R and whatever package manager exists to magically download/install the dependencies.

discernatron has tests, but composer fails to install its dependencies (filed as T153859).

Quite a few of these are related to analysis, @mpopov and @chelsyx should comment on those.

The dashboard repositories are R/Shiny applications (e.g. http://discovery.wmflabs.org/metrics/) that we can't apply CI to.

wikimedia/discovery/golden is a set of data-retrieving scripts that we're currently migrating to use Analytics' Reportupdater framework (T150915), which we can't apply CI-for-software to, although CI-but-for-data is something we plan to investigate in T145445.

wikimedia/discovery/polloi is an R package of miscellaneous functions shared by golden and the dashboards repos, primarily for returning UI elements. I'll think about what unit tests potentially make sense there.

wikimedia/discovery/dashboard and wikimedia/discovery/experimental repos are Vagrant configurations for our dashboards hosted on Labs. I've talked to Gehel and we might migrate them to use proper Puppet configs, but that is currently super low priority.

wikimedia/discovery/ortiz and wikimedia/discovery/wmf are R packages that have some unit tests that are run when we build the packages locally (which we do before submitting patches for review). I know that Travis can do package build checks but I'd have to figure out Jenkins R Plugin, which I don't even know if Ops would be OK with me installing it via the Jenkins dashboard.

There's not much for Discovery-Analysis to do on this ticket, except getting R package CI up and running with Jenkins, but I'd need help there because I'm not dev-opsy enough to pioneer that effort by myself.

debt subscribed.

Declining this ticket as there isn't much we can do with it in Discovery - based on the notes written here.

> The dashboard repositories are R/Shiny applications (e.g. http://discovery.wmflabs.org/metrics/) that we can't apply CI to.

> wikimedia/discovery/golden is a set of data-retrieving scripts that we're currently migrating to use Analytics' Reportupdater framework (T150915), which we can't apply CI-for-software to, although CI-but-for-data is something we plan to investigate in T145445.

> wikimedia/discovery/polloi is an R package of miscellaneous functions shared by golden and the dashboards repos, primarily for returning UI elements. I'll think about what unit tests potentially make sense there.

A shared library used by multiple user-facing tools sounds like it needs some tests. Also, at least a linter could be run against it.

> wikimedia/discovery/dashboard and wikimedia/discovery/experimental repos are Vagrant configurations for our dashboards hosted on Labs. I've talked to Gehel and we might migrate them to use proper Puppet configs, but that is currently super low priority.

Again, at least a basic syntax/linter would probably be useful during code-review.

Generally: Let's get the code that we develop and serve to users to have tests. Passing up the opportunity to set up the test config for these repos now sends the message that these repos don't matter. Based on what you said, I don't think that's the case.

> wikimedia/discovery/ortiz and wikimedia/discovery/wmf are R packages that have some unit tests that are run when we build the packages locally (which we do before submitting patches for review).

Perfect, let's get those to run per patch-set submission!

> I know that Travis can do package build checks but I'd have to figure out Jenkins R Plugin, which I don't even know if Ops would be OK with me installing it via the Jenkins dashboard.

> There's not much for Discovery-Analysis to do on this ticket, except getting R package CI up and running with Jenkins, but I'd need help there because I'm not dev-opsy enough to pioneer that effort by myself.

You don't need to worry about that part; that's what RelEng does for you when you use our infrastructure (Gerrit (or Differential) + CI). We'll help you get the right plugins installed on our shared Jenkins, etc. RelEng maintains integration.wikimedia.org.

Do you still want to decline this task even though there are things that could be done?

@debt: Any reply to the last comment / reconsidering? :)

@Aklapper @debt: I'm OK with re-opening this ticket if @greg or someone can help us set it up. If we can get ortiz and wmf (which already have unit tests) to do CI checks on Gerrit, Chelsy and I can look into the other ones like polloi :)

> @Aklapper @debt: I'm OK with re-opening this ticket if @greg or someone can help us set it up.

We will :)

debt triaged this task as Medium priority. (Mar 2 2017, 9:12 PM)
debt moved this task from Needs triage to Up Next on the Discovery-Analysis board.

Linking to the lintr package here for future reference: https://github.com/jimhester/lintr has instructions for configuring linters and adding it to Travis CI, which we could adapt to Jenkins when we get to it :)
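
As a rough illustration of how lintr could be run outside Travis, the sketch below wraps `lintr::lint_package()` in a shell function that a CI job might call. The function name `run_lint` and the `RSCRIPT` override are hypothetical, not part of any existing job. It assumes lintr is already installed on the build machine and treats any reported lint as a failure, which a real job may want to relax.

```shell
# Sketch only: lint an R package directory and fail when lintr reports anything.
# RSCRIPT is a hypothetical override so the interpreter path can be swapped.
run_lint() {
    pkg_dir="$1"
    # lint_package() returns a list of lints; a non-empty list maps to exit status 1.
    "${RSCRIPT:-Rscript}" -e "lints <- lintr::lint_package('$pkg_dir'); print(lints); quit(status = as.integer(length(lints) > 0))"
}
```

From inside a cloned repository this would be called as `run_lint .`, and the job would pass only when the command exits with status 0.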

mpopov renamed this task from "Add CI to all wikimedia/discovery repositories that are active" to "Add lint/CI to all wikimedia/discovery analytics repositories". (May 8 2017, 10:49 PM)
mpopov claimed this task.
mpopov updated the task description.
mpopov set the point value for this task to 10.
mpopov removed a subscriber: Deskana.

Also for future reference: RStudio (the folks behind Shiny) are making a PhantomJS-based thing for automated testing of Shiny applications: https://rstudio.github.io/shinytest/

Shinytest uses a snapshot-based testing strategy. The first time it runs a set of tests for an application, it performs some scripted interactions with the app and takes one or more snapshots of the application’s state. These snapshots are saved to disk so that future runs of the tests can compare their results to them.

To create tests, the easiest method is to use the recordTest() function. This launches the application in a web browser and records your interactions with the application. These interactions are saved in a .R file, and are run using the strategy described above.

The summary from my side:

  • we will start with wikimedia/discovery/ortiz (already has some unit tests)
  • I need a list of packages to install on the CI machines. Seems the puppet module shiny_server has a good starting list.
  • CI already has all the logic to connect to Gerrit, clone the repo, and add the patch. We then need a list of commands to run.
  • Since the package manager is embedded in R, it might make sense to use the Jenkins R plugin. This way the job would directly run the R commands.

To configure the jobs in Jenkins, we use the jenkins-jobs utility. It lets one write job definitions in a YAML-based DSL, which are then processed, converted to Jenkins format, and used to create/update the jobs on the Jenkins instance.

There is a step-by-step tutorial on https://www.mediawiki.org/wiki/CI/JJB that covers everything. Though it is geared toward production.

Would it make sense to set up a small Jenkins instance on which you can exercise / prototype? Probably one can spawn a labs instance and add the jenkins puppet class to it and maybe that is sufficient.

> The summary from my side:
>
> • we will start with wikimedia/discovery/ortiz (already has some unit tests)
> • I need a list of packages to install on the CI machines. Seems the puppet module shiny_server has a good starting list.

Debian packages: r-base, r-base-dev, r-recommended (see https://cloud.r-project.org/bin/linux/debian/ for more info). For the ortiz R package, the following command installs its dependencies:

R -e "install.packages(c('devtools', 'Rcpp', 'testthat'), repos = c(CRAN = 'https://cran.rstudio.com'))"

> • CI already has all the logic to connect to Gerrit, clone the repo, and add the patch. We then need a list of commands to run.

Suppose we have the cloned repo and we're operating directly above it. First, we build a source package:

R --no-site-file --no-environ --no-save --no-restore CMD build --no-resave-data --no-manual ortiz

(latest version should yield an ortiz_0.0.3.tar.gz)

Then we check the built source (yes, I know, it's a little counter-intuitive to not just run R CMD check ortiz):

R --no-site-file --no-environ --no-save --no-restore --quiet CMD check --timings --no-manual ortiz_0.0.3.tar.gz

This will do a bunch of R's built-in checks and also run the unit tests and should return a status code of 0.
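
To avoid hardcoding the version in the tarball name, a CI job could read it from the package's DESCRIPTION file. The following is a minimal sketch, assuming DESCRIPTION has ordinary single-line `Package:` and `Version:` fields. The function names `pkg_tarball_name` and `build_and_check` are hypothetical, and the R flags are copied from the commands above.

```shell
# Print "<Package>_<Version>.tar.gz" as read from <dir>/DESCRIPTION.
pkg_tarball_name() {
    desc="$1/DESCRIPTION"
    name=$(awk '/^Package:/ { print $2; exit }' "$desc")
    version=$(awk '/^Version:/ { print $2; exit }' "$desc")
    printf '%s_%s.tar.gz\n' "$name" "$version"
}

# Build the source tarball, then check it. Run from the directory that
# contains the cloned repo; a non-zero return should fail the CI job.
build_and_check() {
    pkg_dir="$1"
    tarball=$(pkg_tarball_name "$pkg_dir") || return 1
    R --no-site-file --no-environ --no-save --no-restore CMD build --no-resave-data --no-manual "$pkg_dir" || return 1
    R --no-site-file --no-environ --no-save --no-restore --quiet CMD check --timings --no-manual "$tarball"
}
```

With the repo cloned into `ortiz`, the job would call `build_and_check ortiz` and rely on the return status.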

> • Since the package manager is embedded in R, it might make sense to use the Jenkins R plugin. This way the job would directly run the R commands.

To be perfectly honest I'm not even sure what benefits the Jenkins R plugin actually brings. Looking at its source code, it doesn't seem to really do much. But also I don't know enough about Jenkins.

> To configure the jobs in Jenkins, we use the jenkins-jobs utility. It lets one write job definitions in a YAML-based DSL, which are then processed, converted to Jenkins format, and used to create/update the jobs on the Jenkins instance.

> There is a step-by-step tutorial on https://www.mediawiki.org/wiki/CI/JJB that covers everything. Though it is geared toward production.

Thanks! I'll try going through it and ping you.

> Would it make sense to set up a small Jenkins instance on which you can exercise / prototype? Probably one can spawn a labs instance and add the jenkins puppet class to it and maybe that is sufficient.

I have no idea. Maybe? Wouldn't a Jenkins instance need to get hooked into Gerrit/Zuul?

Thank you @mpopov! I am going to do the provisioning on the CI instance and craft a very basic job. I guess we can then iterate from there.

>> Would it make sense to set up a small Jenkins instance on which you can exercise / prototype? Probably one can spawn a labs instance and add the jenkins puppet class to it and maybe that is sufficient.

> I have no idea. Maybe? Wouldn't a Jenkins instance need to get hooked into Gerrit/Zuul?

We can get a job that just gets triggered on demand and does a git pull :-] It was a random idea really, maybe it is easier to Just do it ™.

Change 362107 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[wikimedia/discovery/polloi@master] Fix spline smoothing and add tests

https://gerrit.wikimedia.org/r/362107

Change 362309 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] (DO NOT SUBMIT) experimental R based job

https://gerrit.wikimedia.org/r/362309

I have played with R most of the afternoon/evening and have read a bunch of docs. It still terribly confuses me.

I have created a very basic Jenkins job https://integration.wikimedia.org/ci/job/ortiz-test-jessie/20/console and it fails with my terrible install command:

> install.packages(".", repos="http://cran.us.r-project.org", type="source", dependencies=TRUE)
Installing package into ‘/srv/jenkins-workspace/workspace/ortiz-test-jessie/Rpackages’
(as ‘lib’ is unspecified)
Warning message:
package ‘.’ is not available (for R version 3.1.1)

What I am trying to achieve is to install from the git workspace and get the Imports/Suggests from cran.

Note: I have set R_LIBS_USER="$(pwd)/Rpackages"
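
One possible way around this, sketched under assumptions: with `repos` set, `install.packages(".")` looks for a CRAN package literally named `.`, so fetching the dependencies and installing the local source need to be separate steps. The sketch assumes devtools is already installed on the CI machine. The function names `deps_expr` and `install_from_workspace` are hypothetical.

```shell
# Print the R expression that installs the dependencies declared in the
# DESCRIPTION of the package at path $1, from the CRAN mirror $2.
deps_expr() {
    printf "devtools::install_deps('%s', dependencies = TRUE, repos = c(CRAN = '%s'))\n" \
        "$1" "${2:-https://cloud.r-project.org}"
}

# Install dependencies, then the package itself, into a workspace-local library.
install_from_workspace() {
    lib="$(pwd)/Rpackages"
    mkdir -p "$lib"   # R ignores library paths that do not exist at startup
    R_LIBS_USER="$lib" R -e "$(deps_expr .)" || return 1
    R_LIBS_USER="$lib" R CMD INSTALL --library="$lib" .
}
```

Run from the root of the git workspace, `install_from_workspace` would leave both the CRAN dependencies and the package under `Rpackages`.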

Change 362107 merged by Chelsyx:
[wikimedia/discovery/polloi@master] Fix spline smoothing and add tests

https://gerrit.wikimedia.org/r/362107

Change 363337 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] (WIP) packages for R

https://gerrit.wikimedia.org/r/363337

Change 364000 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[wikimedia/discovery/ortiz@master] Add lint checking & update doc formatting

https://gerrit.wikimedia.org/r/364000

Change 362309 merged by jenkins-bot:
[integration/config@master] R based job for wikimedia/discovery/ortiz

https://gerrit.wikimedia.org/r/362309

Change 366045 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Rename ortiz job to remove -jessie suffix

https://gerrit.wikimedia.org/r/366045

Change 366045 merged by jenkins-bot:
[integration/config@master] Rename ortiz job to remove -jessie suffix

https://gerrit.wikimedia.org/r/366045

Change 366170 had a related patch set uploaded (by Bearloga; owner: Bearloga):
[operations/puppet@production] Move R-related code from shiny_server to separate module

https://gerrit.wikimedia.org/r/366170

Change 366170 merged by Gehel:
[operations/puppet@production] Move R-related code from shiny_server to separate module

https://gerrit.wikimedia.org/r/366170

On integration-r-lang01.integration.eqiad.wmflabs I have noticed that every Puppet run spams the following notices:

Notice: /Stage[main]/Profile::Rlang::Dev/R_lang::Cran[lintr]/Exec[package-lintr]/returns: executed successfully
Notice: /Stage[main]/Profile::Rlang::Dev/R_lang::Cran[testthat]/Exec[package-testthat]/returns: executed successfully
Notice: /Stage[main]/R_lang/R_lang::Cran[openssl]/Exec[package-openssl]/returns: executed successfully
Notice: /Stage[main]/R_lang/File[/usr/local/lib/R/site-library]/mode: mode changed '0755' to '0770'
Notice: /Stage[main]/R_lang/R_lang::Cran[xml2]/Exec[package-xml2]/returns: executed successfully
Notice: /Stage[main]/R_lang/R_lang::Cran[curl]/Exec[package-curl]/returns: executed successfully
Notice: /Stage[main]/R_lang/R_lang::Cran[devtools]/Exec[package-devtools]/returns: executed successfully

The reason is that Puppet's r_lang::cran does not actually install anything. Taking the command from puppet agent -tv --debug, I ran:

$ /usr/bin/R -e "install.packages('devtools', repos = c(CRAN = 'https://cloud.r-project.org'), lib = '/usr/local/lib/R/site-library')"
Warning: unable to access index for repository https://cloud.r-project.org/src/contrib
Warning message:
package ‘devtools’ is not available (for R version 3.1.1) 
> 

And obviously /usr/local/lib/R/site-library is empty. If I change the repos from HTTPS to HTTP, it works fine. What I suspect is that install.packages depends on another package to be installed in order to support HTTPS.

Another nitpick: on Debian, the R library defaults to '/usr/local/lib/R/site-library', so we should not have to pass it explicitly.

> And obviously /usr/local/lib/R/site-library is empty. If I change the repos from HTTPS to HTTP, it works fine. What I suspect is that install.packages depends on another package to be installed in order to support HTTPS.

That's really weird! I've never had problems downloading and installing packages from a CRAN mirror via HTTPS with a fresh copy of R. The only times I've had issues have been on WMF machines when I've forgotten to set the http[s]_proxy env variables in my .bashrc

I guess I should create a fresh instance and verify again. Maybe integration-r-lang-01 is in a bad shape :-(
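
A possible explanation, offered as an assumption rather than a confirmed diagnosis: R releases before 3.2.0 did not handle https:// URLs with the default download method on Linux, and the instance above reports R 3.1.1. If that is the cause, switching R to an external downloader such as wget (which must be installed) may be enough. The sketch below only builds the R expression; the function names are hypothetical.

```shell
# Return 0 (true) when the given R version (e.g. "3.1.1") is older than 3.2.0.
r_older_than_3_2() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -lt 2 ]; }
}

# Print an R expression that installs CRAN package $1 over HTTPS, forcing
# the wget download method when the R version in $2 is older than 3.2.0.
install_expr() {
    pkg="$1"
    r_version="$2"
    if r_older_than_3_2 "$r_version"; then
        printf "options(download.file.method = 'wget'); "
    fi
    printf "install.packages('%s', repos = c(CRAN = 'https://cloud.r-project.org'))\n" "$pkg"
}
```

The result could be passed to R as `R -e "$(install_expr devtools 3.1.1)"`. The running version can be read with `Rscript -e 'cat(as.character(getRversion()))'`.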

It looks like we're pretty much done with this from a proof of concept, @mpopov will take one more look.

@mpopov is the current job working properly? Should we move it to the gate-and-submit queue so it has to pass before patches can be merged? I'm probably going to refactor the existing jobs over to our new docker-based system but wanted to make sure it's actually useful and works before doing so.

Change 363337 merged by Dzahn:
[operations/puppet@production] contint: profile, role, and packages for R language

https://gerrit.wikimedia.org/r/363337

So I set up a rough lintr job, used on the WMDE WDCM repo, that runs in a Docker container.
Might be an idea to see if we can use that here?

> @mpopov is the current job working properly? Should we move it to the gate-and-submit queue so it has to pass before patches can be merged? I'm probably going to refactor the existing jobs over to our new docker-based system but wanted to make sure it's actually useful and works before doing so.

Sorry, I really dropped the ball on following up on this. What's the current status? It's not clear whether it worked on https://gerrit.wikimedia.org/r/#/c/364000/

My team was in disarray and restructuring when the bulk of the work happened on this (hence the delayed responses and lack of CR) and now we don't have the need or the bandwidth for this. We will continue testing changes locally as we have been doing for years, although these days we're not even actively working on any of the repos/packages listed.

I'm sorry it didn't work out.

:(

Regardless, it's always our goal to have basic lint checks running for all repositories that we depend upon. If your team has no bandwidth for this, feel free to remove your team projects from this. Hopefully an interested volunteer will want to take on R support.

Change 538265 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Remove r-lang / ortiz job

https://gerrit.wikimedia.org/r/538265

Mentioned in SAL (#wikimedia-releng) [2019-09-20T13:23:00Z] <hashar> Deleting Jenkins agent integration-r-lang-01 (unused) # T153856

Re declining: we did a quick experiment two years ago, but it never materialized. Maybe later we can revisit using Docker containers and pairing with people knowledgeable about R and its test/package infrastructure.

The unused infrastructure and jobs added overhead even though they were not used, so I have removed all the experimental bits from the CI configuration (unconfigured from Zuul, removed the job in Jenkins, removed the agent and the WMCS instance).

Change 538265 merged by jenkins-bot:
[integration/config@master] Remove r-lang / ortiz job

https://gerrit.wikimedia.org/r/538265

Change 364000 merged by Bearloga:
[wikimedia/discovery/ortiz@master] Add lint checking & update doc formatting

https://gerrit.wikimedia.org/r/364000