
[RFC] Optional Travis integration for Jenkins
Closed, Declined · Public

Description

We currently have several projects that cannot be tested with our current Jenkins test infrastructure and, as a consequence, host their primary code repositories on GitHub:

  • iOS Mobile app
    • missing OS X platform; it was suggested that travis as is isn't enough and that Sauce Labs might be able to provide that
    • signed builds with a private key we don't want to give is not something that travis can support either, see T114569 for that

An alternative to allow these apps to be hosted on Wikimedia infrastructure (gerrit, eventually phabricator) is to allow travis integration with jenkins as an optional service.

npm-travis is a tool which will trigger Travis builds from NPM by pushing to a throwaway branch, which is then cleaned up after the tests complete. It integrates well with the Gerrit access control mechanism: the "Travis Bot" user can be granted push access only, and only to branches prefixed with npm-travis/, so it cannot be used to push changes to the master or deployment branches.
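The access rule described above can be sketched as a simple prefix check. This is a minimal illustration, not npm-travis's actual code: the `refs/heads/` ref form, the function names, and the abbreviated-hash naming scheme are all assumptions made for the example.

```python
# Hypothetical sketch of the "push only to npm-travis/ branches" rule.
# BOT_PREFIX assumes the usual refs/heads/<branch> form for branch refs.
BOT_PREFIX = "refs/heads/npm-travis/"


def throwaway_ref(commit_sha: str) -> str:
    """Build a throwaway branch ref for a commit (naming scheme is assumed)."""
    return BOT_PREFIX + commit_sha[:8]


def bot_may_push(ref: str) -> bool:
    """Allow the bot to push only to non-empty names under the prefix."""
    return ref.startswith(BOT_PREFIX) and len(ref) > len(BOT_PREFIX)
```

With such a rule, a push to `refs/heads/master` or a deployment branch is rejected, while a push to a throwaway ref under the prefix is allowed.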

This isn't a replacement for our jenkins test infrastructure, but it allows us to accommodate oddball repositories without taxing our infrastructure team or resorting to offsite repository hosting.

There are WIP patches for integrating npm-travis with our jenkins infrastructure (https://gerrit.wikimedia.org/r/173045, https://gerrit.wikimedia.org/r/173046) but they seem to be blocked on policy disagreements. This RFC aims to resolve the policy issues.

Mailing list discussion at https://lists.wikimedia.org/pipermail/wikitech-l/2015-October/083446.html


These can be done on WM CI as the required features are available:

  • RESTBase -- Cassandra is used in production, it can be set up in WM CI
  • mw-ocg-latexer -- requires LaTeX from PPA, image utilities (jpegtran). This can be done with the VMs that are spawned per job run by the WM CI.
  • citoid - requires zotero; jenkins only does jshint checking. The mail already says travis doesn't help here and that it would be nice to be able to spawn a VM, which is now possible.

Event Timeline

cscott created this task. Oct 1 2015, 8:36 PM
cscott raised the priority of this task to Needs Triage.
cscott updated the task description.
cscott added a project: TechCom-RFC.
cscott added a subscriber: cscott.
Restricted Application added a subscriber: Aklapper. Oct 1 2015, 8:36 PM

Change 173046 had a related patch set uploaded (by Cscott):
Add travis-set-env.sh helper for travis integration.

https://gerrit.wikimedia.org/r/173046

Change 173045 had a related patch set uploaded (by Cscott):
Experimental travis integration.

https://gerrit.wikimedia.org/r/173045

cscott updated the task description. Oct 1 2015, 8:42 PM
cscott set Security to None.
cscott updated the task description.
cscott updated the task description.
greg added a comment. Oct 1 2015, 8:46 PM

The bulk of it:

The default location for code hosting and code-review is on the WMF-hosted platform (currently Gerrit, soon to be Phabricator); exceptions to this default fall into two categories:

  • When the Continuous Integration services provided by WMF Release Engineering are not sufficient for the project (example: iOS app building and testing)
  • When the project is maintained by community members or staff in their volunteer capacity or if it was started before the staff member joined the Foundation (example: gdnsd).

This addresses the iOS/RESTBase issues/concerns (explicitly).

To be extra explicit: That document linked was written only about code-review, which I think is actually what you're trying to do here (ie: keep code review in Gerrit, but call out to Travis for certain jobs not supported by WMF CI Infrastructure). There is not a similar document about that (yet).

cscott renamed this task from Optional Travis integration for Jenkins to [RFC] Optional Travis integration for Jenkins. Oct 1 2015, 8:47 PM

Yes, I'm trying to explicitly limit the scope of that first exception by expanding the capabilities of WMF's CI services.

jayvdb added a subscriber: jayvdb. Oct 2 2015, 1:20 AM

Yes please! Pywikibot would love this capability.

Except for OS X the requirements needed are also needed in the production cluster, so I see no problems with also having them around on CI slaves.

gallium:~$ grep '<Ref 0x.* deletes npm-travis-.*' /var/log/zuul/zuul.log.2015-09-29                                                                                                                                                                                 
2015-09-29 13:42:31,910 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/bundler, <Ref 0x7fc00a8d2950 deletes npm-travis-c09fade2 from c09fade2e4ff0a0a683c6742d14d699b378e7b2f to <Pipeline post>                                          
2015-09-29 13:54:56,199 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/text_renderer, <Ref 0x7fc00bafbcd0 deletes npm-travis-1a517c88 from 1a517c88444d4e1f39aaf654ddfc368651b3dff5 to <Pipeline post>                                    
2015-09-29 14:03:11,126 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/zim_renderer, <Ref 0x7fc008953750 deletes npm-travis-fe90cd26 from fe90cd261a150bd2c282e7300ac65666f28665ad to <Pipeline post>                                     
2015-09-29 14:09:55,403 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/zim_renderer, <Ref 0x7fc00939aa90 deletes npm-travis-2b336a38 from 2b336a38c83a77c525e7f056eff67ed3c93ab662 to <Pipeline post>                                     
2015-09-29 14:10:09,526 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/zim_renderer, <Ref 0x7fc00a67a550 deletes npm-travis-c4700bd0 from c4700bd097c04b651ddbde2202626ea8be38f950 to <Pipeline post>                                     
2015-09-29 14:10:16,844 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/zim_renderer, <Ref 0x7fc00a48bbd0 deletes npm-travis-e893e00c from e893e00c514d6285d84453a1f2b013b8b1686a7a to <Pipeline post>                                     
2015-09-29 14:24:13,054 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/zim_renderer, <Ref 0x7fc009b189d0 deletes npm-travis-fdca59c1 from fdca59c1b6560a4ec864dc0264425199d4bfdea8 to <Pipeline post>                                     
2015-09-29 14:24:44,649 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/latex_renderer, <Ref 0x7fc00802c250 deletes npm-travis-3a3420dd from 3a3420dd648952d095fcf6683baf9910de29b9a2 to <Pipeline post>                                   
2015-09-29 14:24:52,661 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/latex_renderer, <Ref 0x7fc00a104950 deletes npm-travis-f21680a9 from f21680a9aeb64829c7b16d9cdc56a7f7b1a33409 to <Pipeline post>                                   
2015-09-29 14:27:07,314 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/zim_renderer, <Ref 0x7fc009cf4ad0 deletes npm-travis-8199e6ce from 8199e6ce97b205b78337bbd7390c6c9f238a6e54 to <Pipeline post>                                     
2015-09-29 14:41:37,067 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/zim_renderer, <Ref 0x7fc00bf6f450 deletes npm-travis-77269390 from 77269390ce9c4513a65b5b78b7f813e39479bf4e to <Pipeline post>                                     
2015-09-29 14:47:29,217 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/zim_renderer, <Ref 0x7fc008947990 deletes npm-travis-8cda89db from 8cda89dbd63317629f0654d5fe9ca6990e15eaae to <Pipeline post>                                     
2015-09-29 15:24:34,017 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/bundler, <Ref 0x7fc0099344d0 deletes npm-travis-2c962c96 from 2c962c969fcb3721f3b2e39eff23149b0468a856 to <Pipeline post>                                          
2015-09-29 15:32:39,549 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/bundler, <Ref 0x7fc00ac65d90 deletes npm-travis-63bf7f78 from 63bf7f785d828ea8a48e433238ce0ed290a45730 to <Pipeline post>                                          
2015-09-29 15:39:16,699 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/text_renderer, <Ref 0x7fc0080461d0 deletes npm-travis-492feb9b from 492feb9b1bdebda7af090f63250dc6cb9fa8cacc to <Pipeline post>                                    
2015-09-29 15:56:39,822 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/bundler, <Ref 0x7fc008b40510 deletes npm-travis-bbbc79ac from bbbc79acd4fc6b8d098b7aa4f50cd9423d030700 to <Pipeline post>                                          
2015-09-29 17:51:28,988 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/bundler, <Ref 0x7fc0093c4490 deletes npm-travis-60bbe357 from 60bbe35755b765a4fc05afdee0db241e04214f4c to <Pipeline post>                                          
2015-09-29 21:08:09,173 INFO zuul.Scheduler: Adding mediawiki/extensions/Collection/OfflineContentGenerator/bundler, <Ref 0x7fc008a2ba50 deletes npm-travis-70652f00 from 70652f00e15401ff8c1331bc62167b624f25034f to <Pipeline post>

Not a new occurrence.

For mw-ocg-latexer what hinders you from adding the required tools to Wikimedia CI instead of using travis?

Except for OS X the requirements needed are also needed in the production cluster, so I see no problems with also having them around on CI slaves.

FYI, we have a Mac Mini on an internal network for building, testing, & deploying merges to master. Adding it to CI infrastructure is a possibility, but—as we've discussed previously—not trivial to maintain. That said, using our own Mac Mini does enable us to (eventually) run different kinds of tests which aren't always practical to run on Travis due to VM performance (e.g. UI tests). We could even run them on physical devices, but it's probably preferable to use a provider like Sauce Labs for that. Maintaining OS X machines is one thing, having someone maintain test devices plugged into them is an order of magnitude more labor-intensive—not to mention requiring them to be co-located with the machines.

cscott updated the task description. Oct 2 2015, 5:23 PM
cscott added a comment. Oct 2 2015, 5:28 PM

@JanZerebecki -- I'm not sure what your excerpt from the zuul log is intended to mean? Yes, I've been using npm-travis on the mw-ocg-* repos for some time now. If you look through the logs you'll probably find entries dating back to 2014. The log entries look like what I'd expect to see when npm-travis cleans up its working branch.

And for mw-ocg-latexer, the main issue is that Extension:Math and mw-ocg-latexer require different versions of the tex tools, and so they can't both be tested on the same CI hardware, IIRC. CI staff didn't want to touch the latex install on the CI machines for fear of breaking Extension:Math.

As the Wikimedia CI now can provision a new VM that is used for only one job, it is possible to have different globally installed software inside that VM for different jobs. (See https://www.mediawiki.org/wiki/Continuous_integration/Architecture/Isolation which is now implemented sufficiently for it to run the job that tests the CI config.)

greg added a comment. Oct 2 2015, 6:06 PM

Correct, the limitations on different versions should be addressed by the Isolation project that is going very well.

See the two CI related quarterly goals here (in Q1 and Q2):
https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201516Year
(we changed the name of the goal after talking with Lila and others to CI Scaling/"Reduce CI wait time").

Except for OS X the requirements needed are also needed in the production cluster, so I see no problems with also having them around on CI slaves.

FYI, we have a Mac Mini on an internal network for building, testing, & deploying merges to master. Adding it to CI infrastructure is a possibility, but—as we've discussed previously—not trivial to maintain.

I'm not sure anymore what our exact conclusion was. I don't think we made exact notes. So maybe we need to revisit this. Is there any reason this would still be needed if connecting the CI to Sauce Labs would work?

@JanZerebecki we'd still need to cut release builds, which we're currently preferring to execute on our own hardware

greg added a comment. Oct 2 2015, 9:13 PM

@JanZerebecki we'd still need to cut release builds, which we're currently preferring to execute on our own hardware

...due to signing with private keys we don't want to just throw around anywhere :)

cscott added a comment. Oct 2 2015, 9:21 PM

WRT CI Isolation -- yes, eventually our CI team can reimplement all of travis. But should we?

This RFC proposes an escape mechanism to allow us to cut corners where necessary and outsource so we can concentrate on core mission tasks. The travis team will always be better/faster at building travis than we will be at copying them. OS X support is one example. The zotero instance for Citoid is another. Sure, perhaps we can eventually tackle some of the low-hanging fruit and bring more of the travis-using jobs in house. That would be great!

But are we actually going to commit to complete feature parity with travis?

greg added a comment. Oct 2 2015, 9:41 PM

WRT CI Isolation -- yes, eventually our CI team can reimplement all of travis. But should we?
...
But are we actually going to commit to complete feature parity with travis?

Red herring/false dichotomy. No one is saying we should reimplement all of travis. I'm just telling you what we're doing now and what is coming very soon (next quarter).

Also, you seem to be arguing against a strawman somewhere that we'll never allow a subset of jobs on travis; this is demonstrably false.

cscott added a comment (edited). Oct 2 2015, 9:54 PM

Sorry, @greg, I guess we have a misunderstanding. That "strawman" is precisely what this RFC is about: my patches to add the integration between jenkins and npm-travis were C-2'ed because "[m]aking Wikimedia CI depend on travis will not be accepted." So https://gerrit.wikimedia.org/r/173045 and https://gerrit.wikimedia.org/r/173046 are my "demonstrably true" proofs that my strawman is not invented.

This RFC exists to settle the policy question. If we all agree that "a subset of jobs [can run] on travis", then we can easily resolve this RFC --- and go back to working out the technical issues with the patches above to make this happen.

(Unless of course we *do* want to allow optional integration with travis but by some other means, *not* npm-travis. That discussion would be in scope here, I suppose. But I don't think anyone is actually proposing that?)

greg added a comment. Oct 2 2015, 10:01 PM

Sorry, @greg, I guess we have a misunderstanding. That "strawman" is precisely what this RFC is about: my patches to add the integration between jenkins and npm-travis were C-2'ed because "[m]aking Wikimedia CI depend on travis will not be accepted." So https://gerrit.wikimedia.org/r/173045 and https://gerrit.wikimedia.org/r/173046 are my "demonstrably true" proofs that my strawman is not invented.

I just find it odd that you decided to start an RFC on the topic, with unhelpful (and untrue) criticisms of WMF RelEng work, instead of simply talking with WMF RelEng about it. I guess you were just in RFC mode (I haven't looked at your others yet).

I made separate tasks for the iOS Mobile app requirements. It is not fully clear whether the WM CI has the features that they need, so we should try to find out. It was also said that travis is not sufficient, so travis integration wouldn't help. But let's see what we learn in T114570: ability to run CI for iOS Mobile app. The other cases mentioned here can all be done in the WM CI.

Yes please! Pywikibot would love this capability.

What are you missing for pywikibot?

cscott added a comment. Oct 3 2015, 2:24 PM

@greg Well, I'm sorry it's been taken personally, and yes I was clearly in RFC mode. But the RFC process is also a reasonable way to get broader input on technical issues, since it should be obvious from some of the task editing here that some folks clearly feel "we'll never run a subset of jobs on travis".

I don't want to engage in a task editing war here, nor a gerrit patch war. I'd prefer to wait for an RFC meeting, and I'll accept whatever conclusions come from it.

hashar added a comment. Oct 5 2015, 1:16 PM

I have seen the npm-travis change and vetoed it on the basis that relying on Gerrit/GitHub replication to trigger a build in Travis and then reporting back to Gerrit is too long of a pipe. Additionally, that relies on a couple of third parties I am not confident to support for the WMF CI infrastructure. I am 100% sure it will add burden to the little less than 2 full-time-equivalent folks we are.

I have repeatedly informed the WMF that we are short on resources, but other priorities are deemed more important (I am not judging). So I am extremely picky when it comes to adding more complexity to an already complicated workflow.

That being said, let's look at the projects mentioned in the task detail:

iOS mobile

The iOS mobile app is a unique case. Brian Gerstle in March/April identified all the requirements and wrote down the basis of an MVP that would use Gerrit/WMF Jenkins/CI. We met during the Lyon hackathon in a very enterprise-like meeting (i.e. managers were around), and after a month or so we eventually agreed that WMF could not maintain an iOS ecosystem (Mac machines, Xcode, etc.). So they migrated to GitHub / Travis / some 3rd party offering Mac testing.

RESTBase

RESTBase -- Cassandra is used in production, it can be set up in WM CI

I cannot remember the details of when the project started. But I think Gabriel started it on his private GitHub account and then moved it to the wikimedia account. I am not sure what is/was the reason to have it on GitHub instead of Gerrit.

As you said, we can set up Cassandra on the CI Jenkins slaves. Up until recently, the blockers I had in mind were:

  • the Jenkins slaves being permanent and rather small. Hence adding more backend services starves memory and disk space.
  • lack of a way to easily set up and tear down the services

Nowadays we have Nodepool, which spawns fresh Jessie instances for us. If we figure out how to spawn Cassandra for jobs that need it, it is essentially solved. The job will run on a fresh instance, spawn Cassandra, use whatever default settings and run the tests happily.

OCG Latex

mw-ocg-latexer -- requires LaTeX from PPA, image utilities (jpegtran). This can be done with the VMs that are spawned per job run by the WM CI.

Surely it can. I guess it runs on production and thus we must have the appropriate LaTeX package in apt.wikimedia.org to install it on CI slaves. We can run the job on any of Precise, Trusty or Jessie to match the production setup.

jpegtran is installed by the Debian package libjpeg-progs, so it is a one-line change in the puppet.git module for contint.

Citoid

citoid - requires zotero; jenkins only does jshint checking. The mail already says travis doesn't help here and that it would be nice to be able to spawn a VM, which is now possible.

It only does JSHint because nobody cared to get the tests running. Zotero seems to have the same requirements as Cassandra, i.e. identifying the jobs that depend on it and spawning the service.

Summary

Let's exclude iOS, which is its own little ivory tower due to the closed environment (Apple/Mac).

The others will depend on adding the relevant puppet recipes in the contint module so we can get the backend enabled. Ideally we will want the service to be shut down on start, and figure out a way to dynamically spawn the service on demand. Jan told me systemd is able to listen on a TCP port and spawn the service when it sees traffic; that would be rather nice to have.
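The "listen on a port, spawn the service on first traffic" idea can be illustrated in plain Python. This is only a sketch of the concept under stated assumptions, not systemd's socket activation itself (systemd hands the listening socket over to the spawned service, which this example does not do); `serve_on_demand` and `start_service` are hypothetical names.

```python
import socket
import threading


def serve_on_demand(start_service, host="127.0.0.1", port=0):
    """Listen on a TCP port and call start_service() on the first connection.

    Returns the bound port and an Event that is set once the service started.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))  # port=0 lets the OS pick a free port
    listener.listen(1)
    bound_port = listener.getsockname()[1]
    started = threading.Event()

    def wait_for_traffic():
        conn, _addr = listener.accept()  # blocks until a client connects
        start_service()                  # the backend is spawned only now
        started.set()
        conn.close()
        listener.close()

    threading.Thread(target=wait_for_traffic, daemon=True).start()
    return bound_port, started
```

Until a job actually connects, nothing but the idle listener consumes resources on the instance, which is the property that makes this attractive for small CI slaves.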

cscott added a comment. Oct 5 2015, 3:50 PM

I understand that CI resources are scarce; I'm just saying that turning on npm-travis would in the end require fewer resources than all the things you're describing here to avoid doing so.

greg added a comment. Oct 5 2015, 4:14 PM

@cscott: You only describe what you see as things that'd be "fixed" with Travis integration while not taking a critical look into what costs we'd incur (stability, maintainability, etc). So your conjecture that "(we) would in the end require fewer resources" is only that, conjecture. For example, the issue for ocg-latex itself is pretty easy to solve.

cscott added a comment. Oct 5 2015, 7:02 PM

It is indeed "pretty easy to solve" -- I just run npm run travis locally before I C+2 any patch. And other projects just use github.

Sorry to be blunt, but it's pretty clear we're at an impasse here. The arguments are getting repetitive. You can cherry pick specific features and say that "it should be easy to solve", but the fact is: they aren't easy, since I wrote npm-travis fully a year ago and the "easy problems" are still the same and haven't been solved since that time.

And I'm not saying that *all* projects should use travis, or that *any* project should use travis "forever". I'm just trying to provide one more arrow in our CI quiver that could be used to bridge some gaps in our CI capabilities, so we don't waste our limited CI effort on corner cases. We could enable it in jenkins and then work hard to ensure that no projects actually have to use it, or use it only for a short period of time. That's what I would prefer to happen.

But it's rather poor form to simultaneously say that we shouldn't need to use it, then not actually have enough CI resources to implement the alternatives. I have provided patches and working code. Projects *already* have working travis.yml files, we're ready to go. I think those arguing the other side should post some patches to buttress their points. @hashar has listed a good set of issues if you want to start working on that.

Or you know, you could just turn on the thing that already works.

greg added a comment. Oct 5 2015, 7:09 PM

It is indeed "pretty easy to solve" -- I just run npm run travis locally before I C+2 any patch. And other projects just use github.

Right, which has a string of dependencies which we can't control.

Sorry to be blunt, but it's pretty clear we're at an impasse here. The arguments are getting repetitive.

Ditto.

or use it only for a short period of time. That's what I would prefer to happen.

I think I can safely say (again) that I agree in principle to this.

But it's rather poor form to simultaneously say that we shouldn't need to use it, then not actually have enough CI resources to implement the alternatives.

I think you are continually confusing one person who doesn't work for WMF Release Engineering with Release Engineering. The examples of when WMF RelEng was involved in the decision making directly with the team (iOS) the outcome was them using Travis.

I have provided patches and working code. Projects *already* have working travis.yml files, we're ready to go. I think those arguing the other side should post some patches to buttress their points. @hashar has listed a good set of issues if you want to start working on that.

Which is what we're doing...

Or you know, you could just turn on the thing that already works.

Only after a reasonable evaluation happens. There are always trade-offs, including, but not limited to: ToU/ToS implications for our community, effects on our ability to safely trust and deploy code from third-parties, and how many hops of responsibility are added.

greg added a comment. Oct 5 2015, 7:11 PM

So, summary: For this RFC to go forward, as any RFC, there should be a breakdown of costs/benefits by the proposer.

cscott added a comment. Oct 5 2015, 7:18 PM

FWIW, a "travis bot" like the existing jenkins-bot could also be run as an unsupported service on labs. My assumption is that the issue isn't with jenkins integration per se, it's the idea of using a third-party test service. But if I'm wrong about that, I'm happy to discuss alternate integration approaches from a technical perspective.

I think you are continually confusing one person who doesn't work for WMF Release Engineering with Release Engineering. The examples of when WMF RelEng was involved in the decision making directly with the team (iOS) the outcome was them using Travis.

It is certainly the case that there is ambiguity about when folks are speaking ex cathedra and when they are just opining. C-0, C-1, versus C-2, if you will.

You asked before why I submitted this as an RFC; this is the reason. We can all hash out all the arguments, but there is an official process for an RFC committee to review the issues and reach a decision. But...

Only after a reasonable evaluation happens. There are always trade-offs, including, but not limited to: ToU/ToS implications for our community, effects on our ability to safely trust and deploy code from third-parties, and how many hops of responsibility are added.

+1 to this. My goal creating this RFC was to participate in this evaluation, and offer some working alternatives for consideration. If there is a better process for this, let me know.

cscott added a comment. Oct 5 2015, 7:40 PM

So, summary: For this RFC to go forward, as any RFC, there should be a breakdown of costs/benefits by the proposer.

{{Citation needed}}

That's not a step I see on https://www.mediawiki.org/wiki/Requests_for_comment/Process

The burden of proof is on both sides in any rational argument; pushing it on the proposer unfairly advantages the status quo ante.

But sure, I'll play:

Costs to adopting npm-travis:

  1. cscott will continue to maintain his npm-travis toy in his copious free time.
  2. When travis goes down, some projects may not be able to review code, or will have to use manual submit.
  3. Jenkins will be able to push to temporary branches prefixed with npm-travis/ in git. One might imagine some scenario where this is used to confuse the user about the most recent version of a software package.

Benefits:

  1. Efforts on supporting corner cases can be scaled back. In particular, no more WMF resources need to be expended on supporting the testing needs of mw-ocg-latexer, citoid, or restbase. (Perhaps benefits to iOS app testing as well.)
  2. Citoid and restbase can move back to WMF hosted git and code-review infrastructure. (Perhaps benefits to iOS as well.)

On the other side, if we don't allow travis for testing, we end up putting more pressure on the Isolation project and adding a bunch of perhaps-otherwise-unnecessary features, just to chase travis' feature set for corner cases. I can't quantify the engineering-hours required for this, but it's not zero.

If there is agreement that a certain need won't be met by WM CI, like the iOS case, then I don't have anything against using something like npm-travis for that exception only.


  1. Efforts on supporting corner cases can be scaled back. In particular, no more WMF resources need to be expended on supporting the testing needs of mw-ocg-latexer, citoid, or restbase. (Perhaps benefits to iOS app testing as well.)

This is not true, as the effort put into using travis is also "resources" expended. See the above argument about status quo.

The burden of proof is on both sides in any rational argument; pushing it on the proposer unfairly advantages the status quo ante.

(status quo ante instead of status quo? Before what? Before we started to prefer free software? Are you portraying that as ante bellum?)

If we look at how the testing needs of mw-ocg-latexer are met by travis, we see that someone put work into instructing it to run a few apt commands before executing tests. The same work could be done for WM CI. So if one were to argue that because someone already put the work into configuring travis it should be preferred, that would be arguing for the status quo. I agree non-free software shouldn't be given that unfair advantage; let's try for a more rational discussion.


The main cost of allowing npm-travis is that it would make it easier to work on travis instead of WM CI. Which causes the status quo to change towards more travis use, even if those needs could be met by WM CI.

Also a cost of travis is that it costs money, the free offer is usually maxed out once a day. Money that is then supporting more non-free software, so we would be giving money to work against our cause.

Which leads us to a benefit of spending effort on WM CI: it helps our cause.


But it's rather poor form to simultaneously say that we shouldn't need to use it, then not actually have enough CI resources to implement the alternatives.

Patches to WM CI can be sent in by everyone, the ldap/wmf group has permission to change Jenkins jobs. Patches sent in for the WM CI are merged and deployed regularly. In fact the waiting times are much shorter than for mediawiki/core. But still many repositories do not have any CI configured. Some people paid by Wikimedia work on making more repositories use travis, even though the exact job needed for it is available in the WM CI.

I think you are continually confusing one person who doesn't work for WMF Release Engineering with Release Engineering. The examples of when WMF RelEng was involved in the decision making directly with the team (iOS) the outcome was them using Travis.

@greg Is it the job of WMF Release Engineering to configure CI for repositories, and not every engineer's? (Like adding and merging linter, entry point, unit test configuration to repos. Like sending in patches to jjb and zuul configuration to enable e.g. the phpunit or tox job on a repo.)

jayvdb added a comment. Oct 8 2015, 4:58 AM

Yes please! Pywikibot would love this capability.

What are you missing for pywikibot?

Lifting of restrictions placed on the project.

The Jenkins code linting checks were deactivated for non-whitelisted users since January. T87169. It looks like that might be resolved soon.

We've been told our test suite mustn't contact live wikis from Jenkins, which means a very large proportion of the test suite isn't run in Jenkins. We have a little more work to do to be able to work around this restriction by using a dedicated test site on labs: T100802.

However, that will still be a long way short of the Travis testing, which includes running the whole test suite against six Wikimedia production wikis, Wikia, MusicBrainz, and WMF beta and test wikis (previously also Orain, until they went under), selected tests targeting specific wikis as necessary to test certain features, and testing our 'wiki detection' logic against ~40 non-WMF wikis (involves only a few API calls for each wiki).

And we also test on Win32 ( https://ci.appveyor.com/project/wikimedia/pywikibot-core/history ), which a gerrit->github branch would trigger for us, but is beyond the scope of this RFC.

jayvdb added a comment. Oct 8 2015, 5:14 AM

Also a cost of travis is that it costs money, the free offer is usually maxed out once a day. Money that is then supporting more non-free software, so we would be giving money to work against our cause.

Is WMF paying Travis-CI so "we" can exceed the 'free' quota allocated to the 'wikimedia' account?

If it is, this RFC is a bit misleading if it doesn't explain those costs.

If not, we only need to manage how to spread the free quota among the projects needing/wanting/choosing to use it.

One way to ease the pressure on the WMF travis costs/quota is for the npm-travis bot to trigger the builds using the developers' GitHub accounts, i.e. when I upload to gerrit, the npm-travis bot could recognise that it is uploaded by GitHub's jayvdb, and if I have given the npm-travis bot permission to push into my repo, the travis build is run against my travis quota, thus not affecting the WMF quota.
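The routing rule suggested here can be sketched in a few lines. This is a hypothetical illustration only: npm-travis does not necessarily offer such a feature, and the function name, the `opted_in` set, and the `"wikimedia"` default are assumptions taken from the wording above.

```python
def pick_build_account(uploader, opted_in, shared_account="wikimedia"):
    """Return the GitHub account whose Travis quota a build should count against.

    uploader: GitHub username matched to the Gerrit uploader (assumed mapping).
    opted_in: usernames that granted the bot push access to their own fork.
    """
    if uploader in opted_in:
        return uploader      # build runs against the developer's quota
    return shared_account    # otherwise fall back to the shared quota
```

For example, `pick_build_account("jayvdb", {"jayvdb"})` selects the developer's own account, while an uploader who has not opted in falls back to the shared one.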

tstarling moved this task from Inbox to Under discussion on the TechCom-RFC board.Oct 14 2015, 9:00 PM
DStrine moved this task from (unused) to Backlog on the TechCom-RFC board.Mar 2 2016, 10:26 PM
RobLa-WMF changed the task status from Open to Stalled.Mar 2 2016, 10:26 PM
RobLa-WMF added a subscriber: RobLa-WMF.

move it to "blocked" per E146. cscott, what's the priority of this?

Change 173045 abandoned by Hashar:
Experimental travis integration.

Reason:
Abandoning again, can be restored if needed later. The discussion on T114421 is basically: no we are not going to rely on Travis.

https://gerrit.wikimedia.org/r/173045

The lack of pre-merge CI happening on the pywikibot repo is causing regular regressions, and was discussed briefly on the pywikibot list : https://lists.wikimedia.org/pipermail/pywikibot/2016-March/009412.html

Currently, those giving +2 need to schedule their own Travis builds, which they evidently are not doing, since the regressions are happening thick and fast.
If this isn't fixed, either:

  • We need to set up a tool that emulates "Travis integration for Jenkins", i.e. automatically polls Gerrit looking for any changeset that passes the very limited Jenkins tests, triggering a Travis build for it and then posting a comment back on the Gerrit task.
  • We give up on using Wikimedia SCM/SCR/CI, and use Github as our main repo so that committers have the information they need at hand before approving merges.
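The first option could look roughly like the Python sketch below. It is only an outline under stated assumptions: `poll_once`, `trigger_build` and `post_comment` are hypothetical names, and fetching the change list (for example with a Gerrit query for open changes that Jenkins has verified) is left to the caller. The two things taken from Gerrit itself are that its REST responses start with a `)]}'` line that must be dropped before JSON decoding, and that change objects carry `_number` and, when requested, `current_revision`.

```python
import json
from typing import Callable, Dict, Iterable, List, Set

# Gerrit prepends this line to JSON responses to prevent XSSI attacks.
GERRIT_XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body: str):
    """Decode a Gerrit REST response, dropping its anti-XSSI prefix."""
    if body.startswith(GERRIT_XSSI_PREFIX):
        body = body[len(GERRIT_XSSI_PREFIX):]
    return json.loads(body)


def poll_once(changes: Iterable[Dict], seen: Set[str],
              trigger_build: Callable[[Dict], str],
              post_comment: Callable[[Dict, str], None]) -> List[str]:
    """Trigger one build per patch set not handled yet and report back.

    trigger_build and post_comment are supplied by the caller
    (hypothetical hooks for Travis and Gerrit respectively).
    """
    triggered = []
    for change in changes:
        # Key on change number plus revision so a new patch set on the
        # same change is built again.
        key = f"{change['_number']}:{change['current_revision']}"
        if key in seen:
            continue
        seen.add(key)
        build_url = trigger_build(change)
        post_comment(change, f"Travis build started: {build_url}")
        triggered.append(key)
    return triggered
```

Keeping the network calls behind the two callbacks means the polling logic can be tested without touching Gerrit or Travis; a real bot would call `poll_once` on a timer and persist `seen` between runs.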

Would solving this like said in T100802#1718355 fix the issue for you?

Would solving this like said in T100802#1718355 fix the issue for you?

No. See my previous answer to you earlier in this task.

@jayvdb Well, the npm-travis tool works well, and I've used it for several projects. Since ops doesn't want to do this the easy way, I'd suggest that turning off the automatic jenkins V+2 and writing a standalone bot that runs the npm tests and submits V+2 for your project is the way forward.

Currently I lack knowledge of the exact restrictions you spoke of. Is there any record about or related to it?

If the person who pronounced those restrictions doesn't step forward and, as you said, they were communicated verbally, we could just assume they don't exist and detail how pywikibot is tested and how those tests should be executed in the Wikimedia CI. To start:

I don't know the pywikibot testsuite nor how it is run. What steps are necessary to run its integration tests against a specific remote wiki?

mmodell added a subscriber: mmodell.Apr 4 2016, 4:17 PM

Change 282323 had a related patch set uploaded (by Hashar):
contint: move npmtravis out of prod slave

https://gerrit.wikimedia.org/r/282323

Currently I lack knowledge of the exact restrictions you spoke of. Is there any record about or related to it?

There are lots of misc. records of it, but it also involved communications with @hashar & @Legoktm on IRC.

http://comments.gmane.org/gmane.comp.python.pywikipediabot.general/14345
http://marc.info/?t=140837699900004&r=1&w=2

As you may recall, until recently it was necessary for a CI team member to add a job to CI, and we didn't obtain approval for running the test suite on CI until we had enhanced the test suite so that the Jenkins job could disable the tests that used the network. There are probably some additional records in the tasks from that era, and maybe in Gerrit changes under the projects pywikibot-core or CI.
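For readers unfamiliar with that arrangement, a minimal Python sketch of the general technique follows. It is not the actual pywikibot implementation: the `TEST_NO_NETWORK` variable, the `requires_network` decorator and both test classes are hypothetical, and only the standard `unittest` skip mechanism is assumed.

```python
import os
import unittest


def network_disabled(environ=os.environ) -> bool:
    """True when the (hypothetical) TEST_NO_NETWORK switch is set to 1."""
    return environ.get("TEST_NO_NETWORK", "0") == "1"


def requires_network(test_item):
    """Skip the decorated test or test class when network use is disabled."""
    return unittest.skipIf(network_disabled(),
                           "network access disabled")(test_item)


class OfflineTest(unittest.TestCase):
    """Runs everywhere, including a CI job that forbids network access."""

    def test_title_capitalisation(self):
        self.assertEqual("main page".capitalize(), "Main page")


@requires_network
class LiveWikiTest(unittest.TestCase):
    """Placeholder for tests that would contact a live wiki."""

    def test_placeholder(self):
        # A real test would make an API request to a wiki here.
        self.assertTrue(True)
```

A CI job could then export `TEST_NO_NETWORK=1` before running `python -m unittest`, so that only the offline tests execute, while a run elsewhere with the variable unset would exercise both classes.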

As you will recall from T87169, until recently we even had syntax checking disabled for non-whitelisted users.

Regarding the rest of your comment, I've created T132138: Perform full test suite using Wikimedia CI.

Change 282323 merged by BBlack:
contint: move npmtravis out of prod slave

https://gerrit.wikimedia.org/r/282323

RobLa-WMF mentioned this in Unknown Object (Event).May 4 2016, 7:33 PM
RobLa-WMF triaged this task as Low priority.Jun 8 2016, 7:13 PM

Belated priority update discussed in E187: RFC Meeting: triage meeting (2016-05-25, #wikimedia-office) (see log at P3179).

21:41:50 <robla> T114421
[...]
21:42:00 <robla> also cscott :-)
21:42:35 <cscott> i still use my npm-travis tool, but ops seems violently opposed.
21:42:54 <cscott> containerized test jobs is always right around the corner, and will eliminate any need for travis, i'm told.
21:43:14 <robla> cscott: is that going to be in perpetual "agree to disagree" state, or do you see a way forward?
21:43:56 <cscott> well, i don't have the appetite to fight that particular battle myself. it seems that teams are just going around ops and using travis if they need it.
21:44:22 <cscott> that seems dysfunctional, but it's a dysfunction i'm not in a good place to address. it's not entirely clear that the RFC process is a good way to address it either.
21:44:22 <gwicke> cscott: is *ops* really opposed? I only see releng on the task
21:44:27 <cscott> so i'd say status "stalled"
21:45:02 <robla> priority is "low" for ArchCom, I think....it may even be one that we take ourselves off of
21:45:26 <gwicke> hehe
21:45:26 <cscott> yeah.

hashar closed this task as Declined.Sep 26 2016, 11:02 AM

There is no champion for it and almost all use cases are covered by our current CI setup.

IIRC one of the main reasons for using Travis was to get Debian packages installed on the WMF CI slaves. For the various Services, Marko has created a puppet define that is used to declare the Debian packages needed for development, which then get installed on the WMF CI infra. For example:

modules/graphoid/manifests/packages.pp
class graphoid::packages {
    service::packages { 'graphoid':
        pkgs     => ['libcairo2', 'libgif4', 'libjpeg62-turbo', 'libpango1.0-0'],
        dev_pkgs => ['libcairo2-dev', 'libgif-dev', 'libpango1.0-dev',
                     'libjpeg62-turbo-dev'],
    }
}

That will get the dev packages installed on the CI box for Graphoid. That was the main complaint iirc.

If some corner cases are still not covered, we can reopen it and sprint on a solution. I am closing this task again.

Change 173046 abandoned by Krinkle:
Add travis-set-env.sh helper for travis integration.

Reason:
This repo isn't used for much (if anything). Closing out old patches.

https://gerrit.wikimedia.org/r/173046