Move Parsoid and RESTBase testing from Travis CI to our Jenkins
Closed, Declined (Public)

Description

We're currently using Travis (https://travis-ci.org/wikimedia/restbase) to run our tests. While this is convenient, it is really slow and is starting to cause a backlog of pull requests that must wait for a test box to become available for testing.

Let's investigate moving this to our own hardware (e.g. the existing Jenkins infrastructure, or our own Travis cluster) so that we can allocate test boxes readily/on-demand.

Event Timeline

Jdouglas raised the priority of this task from to Normal.
Jdouglas updated the task description. (Show Details)
Jdouglas added a project: RESTBase.
Jdouglas changed Security from none to None.
Jdouglas added a subscriber: Jdouglas.

Some things we'll need on testing machines:

  • Node/nvm
  • cassandra

+1, this would be really useful.

Not a small task, alas. I'll try adding in the QA team.

I have removed the Jenkins tag since it is meant for Tasks that affect Jenkins itself. Continuous-Integration would be enough.

We have Trusty instances in labs that come with node.js 0.10.25, probably good enough for now. To get nvm support, maybe file a subtask and sync with Timo about it.

For Cassandra, I guess it is just a matter of adding the package to the contint labs slaves. It can be added to modules/contint/manifests/packages/labs.pp. Since the labs instances are reused, you would need a script to set up the Cassandra database before running the tests, then a script to tear it down after the build has completed. Beware of race conditions when two jobs run on the same instance and share the same DB.
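
One way to sidestep the shared-DB race is to give every build its own keyspace. A minimal sketch in Python, assuming the Jenkins-provided JOB_NAME, BUILD_NUMBER and EXECUTOR_NUMBER environment variables are available; `run_cql` is a hypothetical callable standing in for whatever actually sends CQL to Cassandra (cqlsh, a driver session, etc.), and the helper names are illustrative only:

```python
import contextlib
import re


def keyspace_name(env):
    """Build a per-build keyspace name from Jenkins environment variables."""
    # Executor and build number come first so they survive truncation.
    raw = "ci_{}_{}_{}".format(
        env.get("EXECUTOR_NUMBER", "0"),
        env.get("BUILD_NUMBER", "0"),
        env.get("JOB_NAME", "local"),
    )
    # Keyspace names are limited to alphanumerics/underscores and 48 chars.
    return re.sub(r"[^A-Za-z0-9_]", "_", raw).lower()[:48]


@contextlib.contextmanager
def temporary_keyspace(run_cql, env):
    """Create a keyspace before the tests and drop it afterwards."""
    name = keyspace_name(env)
    run_cql(
        "CREATE KEYSPACE IF NOT EXISTS {} WITH replication = "
        "{{'class': 'SimpleStrategy', 'replication_factor': 1}}".format(name)
    )
    try:
        yield name
    finally:
        # Teardown runs even if the test run raises.
        run_cql("DROP KEYSPACE IF EXISTS {}".format(name))
```

A job would wrap its test run in `with temporary_keyspace(run_cql, os.environ) as ks:` and pass `ks` to the test suite. Two jobs on the same instance run on different executors, so their keyspaces would not collide. Port conflicts are not addressed by this.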

@hashar, we talked about the issues around race conditions, port conflicts etc before. Right now it might be simpler for us to spin up an LXC container and run npm test inside of it, rather than trying to manually work around a lack of isolation between tests.

We should solve this problem generically. I think we should seriously look into container / vm based CI options like travis. They have a Docker backend that we could probably use if we can do some plumbing to integrate with zuul instead of github.
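
To make the container idea concrete, here is a rough sketch of a per-build invocation, using Docker rather than raw LXC. It assumes Docker is installed on the build host and that a `node:0.10` image is available; the function names and the `/src` mount point are illustrative and not part of any existing CI setup.

```python
import subprocess


def container_test_command(checkout_dir, image="node:0.10"):
    """Return a `docker run` argv that runs `npm test` in a throwaway container."""
    return [
        "docker", "run",
        "--rm",                                # remove the container on exit
        "-v", "{}:/src".format(checkout_dir),  # mount the checkout
        "-w", "/src",                          # run from the checkout
        image,
        "npm", "test",
    ]


def run_tests(checkout_dir):
    """Run the tests; a non-zero return value means the build failed."""
    return subprocess.call(container_test_command(checkout_dir))
```

Each build gets a fresh filesystem and network namespace, so port conflicts and leftover state between jobs stop being a concern. Services such as Cassandra would still need to be started inside the container or alongside it.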

ssastry moved this task from Backlog to Testing on the Parsoid board. Dec 20 2014, 1:06 AM
Krinkle removed a subscriber: Krinkle. Jan 8 2015, 11:17 AM
Krinkle added a subscriber: Krinkle.
Krinkle added a comment (edited). Jan 8 2015, 11:25 AM

The generic solution for test isolation is T47499. Note that, once implemented, we could (though I'm not yet convinced we'll have to) adopt a .travis.yml-like convention as test entry point. Most of Travis' infrastructure is open-source:

They also provide an on-site Enterprise solution: http://blog.travis-ci.com/2014-12-19-introducing-travis-ci-enterprise/.

For the moment, let's try and fulfil this request by adding the necessary infrastructure to our labs slaves in general (we already have node v0.10 on Ubuntu Trusty). And in cases where concurrency within a node is not feasible within the test suite, we can still parallelise as we have multiple slaves.

Krinkle renamed this task from Move testing to our own hardware to Move Parsoid testing from Travis CI to our Jenkins. Jan 8 2015, 11:28 AM
GWicke added a comment (edited). Jan 8 2015, 3:59 PM

@Krinkle, this issue was / is about restbase testing, which requires cassandra. At this point we'll probably also continue to use travis at least on a mirror (as it's easy to do, and has tools like coveralls.io), but it would be nice to be able to run the same tests in Jenkins and thus use the WMF infrastructure.

GWicke renamed this task from Move Parsoid testing from Travis CI to our Jenkins to Move Parsoid and RESTBase testing from Travis CI to our Jenkins. Jan 8 2015, 4:00 PM

Possibly we could get cassandra added to the current CI slaves but the setup/teardown is going to be a bit cumbersome. CiviCRM has a similar issue, they would like to use a MySQL database as a backend.

I am currently busy with T47499: [EPIC] Run CI jobs in disposable VMs, I think we should wait for it before adding more craziness to our Jenkins / CI infra.

Jdouglas added a comment (edited). Jan 13 2015, 4:38 PM

It might be worth noting that Travis CI slowness hasn't been an issue recently. It only occasionally crops up when the team is working on a lot of inter-dependent things in parallel, causing us to wait around for Travis builds before we can push dependencies through the pipeline.

Since we're working on isolated/parallelizable tasks right now, there isn't a pressing need to move away from Travis.

Point in favor of setting up our own infrastructure: Travis is currently in a bad state, and all we can do is wait.

https://travis-ci.org/wikimedia/restbase/jobs/48088065

hashar added a comment. Feb 5 2015, 9:03 AM

Following on discussions I had last week with Gabriel Wicke and Adam Wight (for CiviCRM testing T86374).

We can get one or more dedicated labs instances that you can configure however you want via puppet, and we can add them as slaves. Then configure a job that invokes a test entry point (ex: npm test) and only runs on those instances.

The only requirement should be applying the puppet class role::ci::slave::labs::common, which sets up the requirements to be able to add the instance to the Jenkins master.

I propose to get the labs project quota bumped a bit and create an integration-restbase01.eqiad.wmflabs instance then add you guys to the labs project so you can set it up via puppet and add whatever you need.

marcoil moved this task from Testing to Backlog on the Parsoid board. Feb 13 2015, 12:50 PM

@hashar, this still sounds fairly complex and labor-intensive. How many VMs would we need to maintain? One per repository? Would we have to manually manage node versions (currently testing most repos against 0.10 and 0.12, hoping for iojs)? What would the per-repository setup process be?

If there is a way to generalize this without us maintaining our own jenkins slaves etc then I would very much prefer that. Did you see the comments I left at https://www.mediawiki.org/wiki/Talk:Continuous_integration/Architecture/Isolation?

GWicke lowered the priority of this task from Normal to Low. Mar 15 2015, 5:36 PM
GWicke moved this task from Backlog to In progress on the RESTBase board. Mar 17 2015, 8:24 PM
GWicke moved this task from In progress to Blocked / others on the RESTBase board.

As an update, this is the current status: We are testing each commit in restbase and various modules (like https://github.com/gwicke/restbase-mod-table-cassandra and https://github.com/wikimedia/restbase-mod-table-sqlite) against node 0.10, iojs 2.5, node 0.12, node 4.1. Speed has improved significantly since switching to the container infrastructure, but there are still occasionally periods where the tests take a couple minutes to start.

We are also getting coveralls code coverage reports for our most important repositories, which has kept us motivated to keep coverage up.

Overall, I'd say testing is currently working reasonably well.

jayvdb added a subscriber: jayvdb. Oct 8 2015, 4:16 AM

You might like to look at http://codecov.io ; IMO it is better in every regard. See T74863 and T96601.
What would be even better is if WMF had its own 'coverage' server, like https://github.com/localytics/shamer

GWicke added a comment (edited). Oct 8 2015, 11:44 PM

@jayvdb: http://codecov.io looks interesting, thanks for the pointer!

I'd really like to see a coverage diff annotation between revisions (pointing out the specific code lines newly covered / no longer covered in each file). Does http://codecov.io provide something like that?

If you are looking for coverage overlaid on the diff, I *think* that is what this page is showing: https://codecov.io/github/wikimedia/wikipedia-ios/commit/f5e7393c5d3a54a43d219ead25a88707ff0206e5 . @BGerstle-WMF might be able to confirm that.

However I see https://github.com/codecov/support/issues/20 is still open.

This is pretty close, but I'm not sure if a change to tests only (without changes in tested code) would really be highlighted as a "coverage diff" in all tested files.

The overview of changes across files is already available in coveralls (ex: https://coveralls.io/builds/3792631), but the thing I'm really missing is the ability to locate the lines with changed coverage.

Krinkle removed a subscriber: Krinkle. Feb 23 2016, 4:51 PM
ssastry moved this task from Backlog to Testing on the Parsoid board. Dec 18 2017, 10:09 PM
Pchelolo closed this task as Declined. Jul 17 2019, 4:50 PM
Pchelolo added a subscriber: Pchelolo.

For now I don't think it's worth doing. Development is continuing on GitHub, and for deployment the tests will now be run in Jenkins.