
Provide a reliable test environment that mimics production for running integration tests
Open, Medium, Public


Popups has browser tests. They run in mwext-mw-selenium-jessie e.g.

These were passing even though the code we had written was not compatible with jQuery 2. It seems the environment they run in enables $wgUsejQueryThree (per
Interestingly, when this is disabled, our browser tests still pass in the job even though they fail locally, so jQuery 3 must be getting enabled somewhere else.
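One way to answer "where is it being enabled?" is to resolve the settings layer by layer and record which layer last assigned each variable. The sketch below models that idea in Python. The layer names and values are hypothetical and illustrative only; they are not a description of how the actual job is configured.

```python
from typing import Any, Dict, List, Tuple

# A settings layer: (human-readable source name, settings it assigns).
# Later layers override earlier ones, the way successive PHP config
# files overwrite the same global.
Layer = Tuple[str, Dict[str, Any]]


def resolve_with_provenance(layers: List[Layer]) -> Dict[str, Tuple[Any, str]]:
    """Return {setting: (final value, name of the layer that set it last)}."""
    resolved: Dict[str, Tuple[Any, str]] = {}
    for layer_name, settings in layers:
        for key, value in settings.items():
            resolved[key] = (value, layer_name)
    return resolved


# Hypothetical layers for a CI job (names and values are illustrative only):
# the job disables the flag, but a later include turns it back on.
ci_layers: List[Layer] = [
    ("defaults", {"wgUsejQueryThree": False}),
    ("job settings", {"wgUsejQueryThree": False}),
    ("extra include", {"wgUsejQueryThree": True}),
]

value, source = resolve_with_provenance(ci_layers)["wgUsejQueryThree"]
print(f"wgUsejQueryThree = {value} (last set by: {source})")
# prints: wgUsejQueryThree = True (last set by: extra include)
```

Dumping this kind of provenance from the job's effective configuration would point directly at the file that re-enables the flag.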

Because the Beta Cluster also enables $wgUsejQueryThree, as do our local environments, we failed to notice that we'd introduced code that would cause a UBN (T174724) until the train had finished rolling out (on a Friday).

This feels like a big lesson for us, and a reminder that we expect the Beta Cluster (or some other environment) to reliably mimic the code we run in production, so that these integration tests are meaningful.

How can we prevent this from happening again?
Why are the browser tests passing when they should not be? Is our job not working correctly?

Event Timeline

Change 375377 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[integration/config@master] Add a Jenkins job for Popups browser tests

(pulling in to add preventative measures on the back of T174724)

> This seems like a big lesson learned for us and a reminder that the beta cluster or some other environment is expected to reliably mimic the code we run on production for the benefit of running these integration tests....

To me it sounds like the beta cluster is doing what it's supposed to do. $wgUsejQueryThree is already enabled on some production wikis, and I think @Krinkle is close to doing a wider rollout. If Popups is incompatible with jQuery 3, then it's already broken on group0 wikis, plwiki, svwiki, and all nl wikis, *in production*.

Ignore what I said, I read the linked task backwards.

Thanks for creating this task. From my perspective, I see this as a general question of how we want to do quality assurance and verification for weekly deployments and SWAT, and how (or whether) Beta Cluster fits into that picture.

First and foremost, we need to embrace that there are many different factors to this. I don't think it's realistic or appropriate for any "one" environment to be a "truly" realistic testing environment for production. Whether it's "stock mediawiki" (as used in most Jenkins jobs), "MediaWiki-Vagrant", "Beta cluster", or even "", there are bound to be differences.


Logically, the only way to reliably test changes in an environment that is not "production" is if the environment you're testing is what will "literally" become production. In other words: containers. Well, not necessarily, but at least some concept in which we have stable images that capture a fixed snapshot of the entire MediaWiki deployment (MediaWiki core, extensions, skins, wmf-config). That is, an image of sorts that is addressable over HTTP via different hostnames, and that can then be moved to production through a promotion process.

This could be done without containers, and would be useful even if the "image" is limited to "just" the MediaWiki deployment. E.g. /srv/mediawiki, without PHP, HHVM, or Apache. Of course, having everything included would be nicer (from a testing perspective), but the immediate problem is reliable testing of MediaWiki deployment code changes. HHVM, PHP-config, Apache, Apache-config, etc. are all managed by Puppet for now, which might be hard to migrate.
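A minimal sketch of the "stable image" idea, assuming the snapshot is fully described by pinning each component (core, extensions, skins, wmf-config) to a commit. Hashing that pinned manifest yields an identifier that changes only when some component changes, so whatever was tested under a given identifier is exactly what gets promoted. The commit hashes, the 7-character length, and the `promote` helper are placeholders for illustration, not existing deployment tooling.

```python
import hashlib
import json
from typing import Dict, Optional


def snapshot_id(manifest: Dict[str, str], length: int = 7) -> str:
    """Derive a short, deterministic ID from a {component: commit} manifest."""
    # Canonical JSON (sorted keys, no whitespace) so insertion order is irrelevant.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:length]


def promote(stages: Dict[str, Optional[str]], image: str, source: str, target: str) -> None:
    """Point `target` at `image`, but only if `source` is already running it."""
    if stages.get(source) != image:
        raise ValueError(f"{image} is not what {source} runs; refusing to promote")
    stages[target] = image


# Placeholder manifest: real commits would come from the deployment tooling.
manifest = {
    "mediawiki/core": "aaaa111",
    "mediawiki/extensions/Popups": "bbbb222",
    "mediawiki/skins/Vector": "cccc333",
    "operations/mediawiki-config": "dddd444",
}

image = snapshot_id(manifest)
stages: Dict[str, Optional[str]] = {"staging": image, "production": None}
promote(stages, image, source="staging", target="production")
print(stages)
```

The key design choice is that promotion moves an identifier, never a rebuilt copy, so production can only ever receive a snapshot that some earlier stage actually ran.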

The ongoing effort towards using Docker images and Kubernetes will hopefully reach MediaWiki at some point, but that's orthogonal.

  • Stock MediaWiki (as used in Jenkins jobs for unit testing):
    • Default settings. No farm (one wiki). No wmf-config.
  • MediaWiki-Vagrant:
    • Default settings (mostly)
    • Has multi-wiki farm (but maintained separately, with no production-like wikis or settings).
    • Some wmf extensions, with some wmf-like settings (but no "wmf" role to enable them all, and there couldn't be one unless one first decides which wiki to mimic)
  • Beta Cluster
    • Production settings (with overrides)
    • Production wiki farm (note: only a subset of wikis)
    • Production extension list (with overrides)
    • Production settings (with overrides, mostly like enwiki)
    • Production wiki farm (note: only one wiki, mostly like enwiki)
    • Production extension list (with overrides, mostly like enwiki)

Beta Cluster is pretty close to a reliable test environment. There are some infrastructure and backend service differences (T87220), but those are being worked on. The main problem is the overrides. Specifically, how we satisfy conflicting testing needs:

  1. Feature A wants to gradually roll itself out. First to all Beta Cluster wikis, then to production. This makes sense as you'd want to know how it acts on enwiki-like configuration in Beta before trying in prod.
  2. Feature B wants to deploy individual changes to all wikis (mw code, or puppet), but first test them on the Beta Cluster before going to production.

Use case 2 typically takes a few hours at most. Use case 1, however, can take weeks. Given that we usually have at least half a dozen or so type 1 changes ongoing, testing type 2 changes is hard, because Beta differs from the production wikis in all those ways.


Ideally, we'd have a way to separate long-term Beta overrides from testing changes that are happening "now". For use case 2, you'd want a way to view Beta without any temporary overrides; basically, a view that syncs back to the production equivalent of any given wiki.

A few ideas come to mind:

  1. Create a Staging Cluster (or "Alpha Cluster"), separate from Beta. This cluster would be like Beta Cluster without overrides. Instead, it would be a place to quickly test changes to mediawiki, mediawiki-config, or puppet before applying them to production. (Would it also run near-prod wmf branches?)
  2. Instantiable Beta Cluster. (Originally pitched by @Ryan_Lane?) Beta Cluster could be parameterised sufficiently that the strings "beta" and "" only appear in one place in Puppet and mediawiki-config. People would be able to spin up a group of VMs like Beta Cluster to test one-off changes before going to production. The current "Beta Cluster" would become a persistent version of this for long-term testing (and would run master-branch instead of wmf-branch). This would be ideal and worth considering regardless of what we decide for this task, because it makes it accessible to everyone, would work locally (outside Wikimedia Cloud), and would allow for Puppet-level and cluster-level experimentation.
  3. A simplified version of Staging Cluster could potentially be implemented as a simple run-time variation of Beta Cluster. E.g. the same way we group together overrides for specific wikis, and apply them based on host name, we could group together all temporary overrides and skip them if viewing through "" instead of "".
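Idea #3 could be sketched as follows, assuming each Beta override is tagged as either long-term or temporary, and that a second hostname suffix selects the "no temporary overrides" view. The `.beta.example` / `.staging.beta.example` suffixes and `exampleLongTermSetting` are hypothetical stand-ins (the real hostnames are elided above), and the production values are illustrative.

```python
from typing import Any, Dict, List, NamedTuple


class Override(NamedTuple):
    setting: str
    value: Any
    temporary: bool  # True for in-progress rollouts (use case 1)


# Hypothetical hostname suffix for the "no temporary overrides" view.
STAGING_SUFFIX = ".staging.beta.example"


def effective_settings(host: str, prod: Dict[str, Any], overrides: List[Override]) -> Dict[str, Any]:
    """Production settings plus Beta overrides, minus temporary ones on the staging view."""
    staging_view = host.endswith(STAGING_SUFFIX)
    settings = dict(prod)  # copy, so the production dict is never mutated
    for override in overrides:
        if override.temporary and staging_view:
            continue  # skip in-progress rollouts so this view tracks production
        settings[override.setting] = override.value
    return settings


# Illustrative values only.
prod = {"wgUsejQueryThree": False, "exampleLongTermSetting": "prod"}
overrides = [
    Override("exampleLongTermSetting", "beta", temporary=False),
    Override("wgUsejQueryThree", True, temporary=True),
]

print(effective_settings("en.wikipedia.beta.example", prod, overrides))
print(effective_settings("en.wikipedia.staging.beta.example", prod, overrides))
```

Long-term overrides still apply on both hostnames; only the temporary group is skipped, which is what lets use case 2 test against something close to production without undoing use case 1.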
  4. Have "Test wikis" in Beta Cluster. Alternatively, instead of creating a different Beta, we could turn Beta into this and change the way we do long-term testing in Beta. E.g. instead of applying long-term testing of config changes and new extensions to all Beta Cluster wikis at once, apply them only to a test wiki in Beta Cluster (e.g. ). The policy would be to apply it to all wikis (or any non-test wiki) in Beta only if you're minutes/hours away from doing the same in production (and to expire or roll it back if you decide otherwise).
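The policy in idea #4 is mechanical enough to lint. A sketch, assuming each override records which wikis it targets, when the matching production change is planned, and an expiry date. The `BetaOverride` fields, the override names, and the 24-hour reading of "minutes/hours away" are all assumptions, not an existing convention.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Assumed reading of "minutes/hours away" from the production change.
PROD_WINDOW = timedelta(hours=24)


@dataclass
class BetaOverride:
    name: str
    scope: str                          # "testwiki" or "all"
    prod_deploy_at: Optional[datetime]  # planned production change, if any
    expires_at: datetime


def policy_violations(overrides: List[BetaOverride], now: datetime) -> List[str]:
    """Describe every override that breaks the 'test wiki first' policy."""
    problems: List[str] = []
    for o in overrides:
        if now >= o.expires_at:
            problems.append(f"{o.name}: expired; roll back or renew")
        if o.scope == "all":
            # A production change that is due within the window (or already
            # done) justifies applying the override to every Beta wiki.
            imminent = o.prod_deploy_at is not None and o.prod_deploy_at <= now + PROD_WINDOW
            if not imminent:
                problems.append(f"{o.name}: on all Beta wikis with no imminent prod change")
    return problems


now = datetime(2017, 9, 4, 12, 0)
overrides = [
    BetaOverride("new-extension", "testwiki", None, now + timedelta(days=14)),
    BetaOverride("long-rollout", "all", None, now + timedelta(days=14)),
    BetaOverride("config-swat", "all", now + timedelta(hours=2), now + timedelta(days=1)),
    BetaOverride("stale-test", "testwiki", None, now - timedelta(days=1)),
]
for problem in policy_violations(overrides, now):
    print(problem)
```

Run on a schedule, a check like this would flag exactly the long-running, all-wikis overrides that make Beta drift from production.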

In the long term, this might all be made obsolete by the "image promotion" idea, e.g. something like en.wikipedia.appserver-4e18f56.staging.wmnet. But that could take a while. Something simple (like #3 or #4 above) might be worth considering in the meantime.
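With image promotion, the image identifier could be carried in the hostname, as in the example en.wikipedia.appserver-4e18f56.staging.wmnet. Below is a sketch of how a router might split such a name. It assumes the layout is always `<lang>.<project>.appserver-<image>.<environment>.wmnet`, a pattern inferred from that single example rather than a defined scheme; `Target` and `parse_host` are hypothetical names.

```python
import re
from typing import NamedTuple


class Target(NamedTuple):
    wiki: str         # e.g. "en.wikipedia"
    image: str        # snapshot identifier, e.g. "4e18f56"
    environment: str  # e.g. "staging"


# Assumed layout: <lang>.<project>.appserver-<hex id>.<environment>.wmnet
HOST_RE = re.compile(
    r"^(?P<wiki>[a-z0-9-]+\.[a-z]+)\.appserver-(?P<image>[0-9a-f]+)\.(?P<env>[a-z]+)\.wmnet$"
)


def parse_host(host: str) -> Target:
    """Split a snapshot hostname into wiki, image ID and environment."""
    match = HOST_RE.match(host.lower())
    if match is None:
        raise ValueError(f"unrecognised host: {host}")
    return Target(match["wiki"], match["image"], match["env"])


print(parse_host("en.wikipedia.appserver-4e18f56.staging.wmnet"))
# prints: Target(wiki='en.wikipedia', image='4e18f56', environment='staging')
```

Because the image ID is part of the address, several snapshots could be reachable side by side, and testing a specific one is just a matter of choosing the hostname.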

Thanks @Krinkle for laying out all that background for everyone. I don't have any real edits to that :)

I agree with both parts of your conclusion: any work here (on the current Beta Cluster infrastructure) will most likely be obsoleted by the "streamlined service delivery"/"deployment pipeline" work (naming is hard), AND #3 or #4 are relatively easy next-step options. Given the former, we should scope and estimate any solutions before beginning.

#3 has been going around in my head, and in the RelEng team planning/'what does the future look like?' meetings, for a while. Adding to what you've said above, there is also the possibility of using multiversion in Beta Cluster to provide even more variants, if that makes sense:

| | prod settings | beta/long-running/use case 1 settings |
| --- | --- | --- |
| master | integration: what @Jdlrobson proposed? | volatile: aka the current Beta Cluster |
| current_wmf | pre-prod: most like prod | transitions (name help!): an environment to track in-progress transition work |

I'm adding this complexity to illustrate one possible code/test/deploy workflow:

  1. unit tests per patchset in Gerrit: stock MediaWiki environment, à la the current Jenkins unit tests
  2. "continuously" run integration tests against integration
    • "continuously" interval = however long it takes to do the update + how long it takes to run unit+@integration tests
  3. use pre-prod as a pre-train sanity check (i.e. update current_wmf right before we deploy to group0 on Tuesday)
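The three-step workflow above amounts to a chain of gates: a change only moves on once the previous stage has passed, and the "continuously" interval is just update time plus test time. A minimal sketch; the stage names follow the list, while the check functions and the timing figures are placeholders, not real test suites or measurements.

```python
from typing import Callable, List, Tuple

# A stage is (name, check); check(change) returns True if the change passes.
Stage = Tuple[str, Callable[[str], bool]]


def run_pipeline(change: str, stages: List[Stage]) -> List[str]:
    """Run stages in order, stop at the first failure, return the stages passed."""
    passed: List[str] = []
    for name, check in stages:
        if not check(change):
            break
        passed.append(name)
    return passed


def continuous_interval(update_minutes: float, test_minutes: float) -> float:
    """How often 'continuously' can be: environment update time plus test run time."""
    return update_minutes + test_minutes


# Placeholder checks standing in for real test suites.
stages: List[Stage] = [
    ("unit tests (stock MediaWiki, per patchset)", lambda change: True),
    ("integration tests (integration)", lambda change: "jquery2-incompatible" not in change),
    ("pre-train sanity check (pre-prod)", lambda change: True),
]

print(run_pipeline("popups: jquery2-incompatible refactor", stages))
print(continuous_interval(update_minutes=10, test_minutes=25))
```

In this model a change like the one behind T174724 would stop at the integration stage instead of reaching the train.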

But that ^ whole thing is probably too far out of scope given the current work we're doing for SSD/pipeline. :/

Change 381212 had a related patch set uploaded (by Zfilipin; owner: Zfilipin):
[mediawiki/extensions/Popups@master] Run Selenium Ruby tests daily targeting Beta cluster

Change 375377 abandoned by Zfilipin:
Add Jenkins job for browser tests

Implemented as two separate commits.