[EPIC] (WIP) End-to-end tests and deploys
Closed, Invalid (Public)

Assigned To

None

Authored By

	greg
	Mar 14 2016, 10:38 PM

Description

This is a Work In Progress proposal that the Release-Engineering-Team is working through. It is not finished and should not be considered an official plan at this time.

End-to-end tests and deploys

General hope

That we can find a way to use the browser (end-to-end) tests to increase confidence in what we are deploying, beyond what they provide now.

Current status

We have some end-to-end tests that run on patch submission thanks to Dan's work on the https://integration.wikimedia.org/ci/job/mwext-mw-selenium/ job. These are generally voting and provide useful/actionable feedback to developers.

But we also have our "daily" browser tests (which are, I believe, a superset of the ones that run on patch submission). These run against the Beta Cluster, and that itself introduces problems (code updating during a test run, for one). Their failures are also impossible to tie directly to a specific problematic commit.

Proposals

Generally: use the browser tests as targeted runs against the group0 wikis after the new branch has been cut. Test failures for core or any extension during the run will be escalated to project owners and, barring resolution of the failure by the owner, may lead to some sort of rollback action (details of this depend on how we refactor the branching process).

All proposals have these pre-reqs:

Increase adoption of end-to-end tests pre-merge (voting, mwext-mw-selenium)
- Note: one way of increasing adoption is to improve the performance of the tests, so adopting smoke tests that can fail early and prevent the running of more expensive tests could help.
- T130037: Implement a smoke + parallel strategy for running end-to-end tests
Keep daily browser tests
- Continue the work on developer self-maintenance of the tests
Set up enough testwikis (iow: expand group0 with more testwikis) to thoroughly test centralauth/sessionmanager/authmanager interwiki behavior.
Get to a place where we can consistently run all e2e tests against group0 without failure
Move group0 deploy to Monday. Keep group1/group2 where they are (Wed/Thurs).
- A sub-proposal here is to move group0 to Thursday the week before. Main downside is it would increase the age of code being deployed to production by about 5 days.
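
The smoke + parallel idea from the pre-reqs above (T130037) could look roughly like the following. This is only a sketch under assumptions: tests are modelled as plain callables that return True on pass, and `run_suite`, `smoke`, `full`, and `workers` are hypothetical names, not part of mwext-mw-selenium or any existing job.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Assumption: a test is a callable that returns True when it passes.
Test = Callable[[], bool]


def run_suite(smoke: Dict[str, Test], full: Dict[str, Test],
              workers: int = 4) -> Dict[str, bool]:
    """Run cheap smoke tests first; run the expensive tests in parallel
    only if every smoke test passed."""
    results: Dict[str, bool] = {}
    for name, test in smoke.items():
        results[name] = test()
        if not results[name]:
            # Fail early: the expensive tests are never started.
            return results
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(test) for name, test in full.items()}
        for name, future in futures.items():
            results[name] = future.result()
    return results
```

With this shape, a failing smoke test returns feedback after one cheap check, which is the performance gain the note is after.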

Proposal 1

The "Freeze until Green" proposal. This is the simplest proposal wrt where we are now.

Cut new branch as we do now
Deploy to group0 on Monday
Run all e2e tests
If any test fails, freeze until all are green
- Fixes are driven by code owners; they can revert or fix as needed
After green, go to group1 and group2

Positive: this puts the onus on the developers to fix this fast. They have roughly 48 hours (if we deploy Monday morning to group0 and then have group1 scheduled for Wednesday).
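
The gate in this proposal is simple enough to sketch. The function names and the result format (test name mapped to pass/fail) are hypothetical, and treating an empty result set as "freeze" is an assumption of this sketch, not something the proposal specifies.

```python
from typing import Dict, List


def next_deploy_step(e2e_results: Dict[str, bool]) -> str:
    """Return "promote" if every e2e test passed on group0, else "freeze"."""
    # Assumption: no results at all means no evidence, so stay frozen.
    if e2e_results and all(e2e_results.values()):
        return "promote"  # continue to group1, then group2
    return "freeze"


def failing_tests(e2e_results: Dict[str, bool]) -> List[str]:
    """List the failing tests so they can be escalated to code owners."""
    return sorted(name for name, passed in e2e_results.items() if not passed)
```

For example, `next_deploy_step({"login": True, "edit": False})` returns `"freeze"`, and `failing_tests` on the same input returns `["edit"]`.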

Proposal 2

The long-lived branch version of the above.

Main difference is that we need to figure out a way of not pulling problematic code from the group0 branch to group1. This is hard because of cross-repo dependencies (common ones are MobileFrontend and Flow, VE and Flow, etc). I've yet to hear a sane proposal here (not that one doesn't exist, just, please do share if you have it!).
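
To illustrate why cross-repo dependencies make this hard (not as a proposed solution), here is a sketch of how holding back one failing repo can drag others with it. The function name, the `depends_on` map, and the repo names in the usage note are all hypothetical; no real dependency directions are implied.

```python
from typing import Dict, Set


def repos_to_hold_back(failing: Set[str],
                       depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Return the failing repos plus every repo that depends on one of
    them, directly or transitively."""
    held = set(failing)
    changed = True
    while changed:
        changed = False
        for repo, deps in depends_on.items():
            # A repo whose dependency is held back must be held back too.
            if repo not in held and deps & held:
                held.add(repo)
                changed = True
    return held
```

With `depends_on = {"ExtB": {"ExtA"}, "ExtC": {"ExtB"}, "ExtD": set()}`, a failure in `ExtA` alone holds back `ExtA`, `ExtB`, and `ExtC`. This sketch also ignores the reverse problem, where a repo that moves forward no longer works with an older version of something that was held back.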

Proposal 3

All of the same pre-reqs as above, but:
We don't pull new updates into group0 from repos whose tests are failing; to get even that far, a repo has to have its basic tests passing.
- This is basically the requirement that we need if we start to allow some projects to be post-merge reviewed. So, really, the requirement is: all your tests passing AND all code is reviewed.
Then we pull those green repos into group0
Run the end-to-end tests as before
Figure out a way of pulling only the repos that don't break the end-to-end tests into group1 and group2
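
The entry requirement for group0 in this proposal (all tests passing AND all code reviewed) could be expressed as a simple filter. `RepoStatus`, its fields, and `eligible_for_group0` are hypothetical names for illustration; how the two flags would actually be computed is left open here, as it is in the proposal.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RepoStatus:
    name: str
    tests_passing: bool      # the repo's own basic tests are green
    all_code_reviewed: bool  # nothing post-merge is still awaiting review


def eligible_for_group0(repos: List[RepoStatus]) -> List[str]:
    """Return the names of repos whose updates may be pulled into group0."""
    return [r.name for r in repos if r.tests_passing and r.all_code_reviewed]
```

A repo with green tests but unreviewed code is excluded, which matches the "AND" in the requirement above. The last step, choosing what moves on to group1 and group2, is not covered by this sketch.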