MediaWiki / MediaWiki 1.28
Event Details
20-23 June 2016
Other Deployments:
T136971: MW-1.28.0-wmf.6 deployment blockers T137492: MW-1.28.0-wmf.8 deployment blockers
Subject | Repo | Branch | Lines +/- | |
---|---|---|---|---|
Group0 to 1.28.0-wmf.7 | operations/mediawiki-config | master | +5 -5 |
rOMWC Wikimedia - MediaWiki Config | |||
rOMWCa3549bb0e3f6 Revert "all wikis to 1.28.0-wmf.7" |
Status | Assigned | Task |
---|---|---|
Resolved | mmodell | T136973 MW-1.28.0-wmf.7 deployment blockers |
Resolved | aaron | T138550 1.28.0-wmf.7 save time regression |
Resolved | Krinkle | T138586 Skin stylesheet no longer unaffected by broken Common.css as of 1.28.0-wmf.7 |
Resolved | Krinkle | T138931 JavaScript crashes frequently due to stashEdit calls |
Mentioned in SAL [2016-06-21T12:30:39Z] <hashar> T136973 started cut of branch wmf/1.28.0-wmf.7
Due to a conflict with personal duties, I can't conduct the train. Since I was sick yesterday, we already had Mukunda as backup for the branch cut and Tyler for the actual deployment. We agreed I would cut the branch (it is in progress) and Tyler confirmed he would be able to handle the group0/group1 switches.
Mentioned in SAL [2016-06-21T13:15:09Z] <hashar> T136973 applied all security patches to 1.28.0-wmf.7
Mentioned in SAL [2016-06-21T13:48:51Z] <hashar@tin> Started scap: testwiki to 1.28.0-wmf.7 T136973
Mentioned in SAL [2016-06-21T13:53:09Z] <hashar@tin> scap aborted: testwiki to 1.28.0-wmf.7 T136973 (duration: 04m 17s)
Mentioned in SAL [2016-06-21T13:53:45Z] <hashar@tin> Started scap: testwiki to 1.28.0-wmf.7 (take two) T136973
Mentioned in SAL [2016-06-21T13:55:20Z] <hashar@tin> scap aborted: testwiki to 1.28.0-wmf.7 (take two) T136973 (duration: 01m 35s)
Mentioned in SAL [2016-06-21T13:55:35Z] <hashar@tin> Started scap: testwiki to 1.28.0-wmf.7 (take three) T136973
scap to testwiki fails though:
14:06:23 Started scap: (no message)
14:06:47 Copying to tin.eqiad.wmnet from deployment.eqiad.wmnet
14:06:47 Started rsync common
14:08:43 Finished rsync common (duration: 01m 55s)
14:08:44 Started l10n-update
14:08:44 Updating ExtensionMessages-1.28.0-wmf.6.php
14:08:45 Updating LocalisationCache for 1.28.0-wmf.6 using 4 thread(s)
14:09:20 Generating JSON versions and md5 files
14:09:21 Bootstrapping l10n cache for 1.28.0-wmf.7
14:09:22 Last output:
Warning: require_once(/etc/mediawiki/WikitechPrivateSettings.php): failed to open stream: No such file or directory in /srv/mediawiki-staging/wmf-config/wikitech.php on line 183
Fatal error: require_once(): Failed opening required '/etc/mediawiki/WikitechPrivateSettings.php' (include_path='/srv/mediawiki-staging/php-1.28.0-wmf.7:/usr/local/lib/php:/usr/share/php') in /srv/mediawiki-staging/wmf-config/wikitech.php on line 183
14:09:22 Finished l10n-update (duration: 00m 37s)
14:09:22 Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 242, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/main.py", line 304, in main
    return super(Scap, self).main(*extra_args)
  File "/usr/lib/python2.7/dist-packages/scap/main.py", line 46, in main
    self._before_cluster_sync()
  File "/usr/lib/python2.7/dist-packages/scap/main.py", line 326, in _before_cluster_sync
    version, wikidb, self.verbose, self.config)
  File "/usr/lib/python2.7/dist-packages/scap/utils.py", line 303, in context_wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/scap/tasks.py", line 525, in update_localization_cache
    lang='en', quiet=True)
  File "/usr/lib/python2.7/dist-packages/scap/tasks.py", line 477, in _call_rebuildLocalisationCache
    'quiet': '--quiet' if quiet else ''
  File "/usr/lib/python2.7/dist-packages/scap/utils.py", line 303, in context_wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/scap/utils.py", line 400, in sudo_check_call
    raise subprocess.CalledProcessError(proc.returncode, cmd)
CalledProcessError: Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="labtestwiki" --outdir="/tmp/scap_l10n_87423667" --threads=4 --lang en --quiet' returned non-zero exit status 255
14:09:22 scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="labtestwiki" --outdir="/tmp/scap_l10n_87423667" --threads=4 --lang en --quiet' returned non-zero exit status 255 (duration: 02m 58s)
In other words, rebuildLocalisationCache.php --wiki="labtestwiki" fails because /etc/mediawiki/WikitechPrivateSettings.php is not present on tin.
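The last frame of the traceback shows scap raising subprocess.CalledProcessError once the mwscript child process returns a non-zero exit status (PHP's CLI typically exits with 255 on a fatal error). A minimal sketch of that pattern follows. It is not scap's actual code: check_call_sketch is a hypothetical helper, and the child command is a stand-in that simply exits with 255.

```python
import subprocess
import sys


def check_call_sketch(cmd):
    """Run cmd and raise CalledProcessError on a non-zero exit status.

    Hypothetical helper mirroring the pattern visible in the traceback;
    it is not scap's sudo_check_call.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd, output=output)
    return output


# Stand-in for the failing mwscript call: a child process that exits with 255.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(255)"]

try:
    check_call_sketch(failing_cmd)
    status = 0
except subprocess.CalledProcessError as err:
    status = err.returncode

print("exit status:", status)
```

The point is only that the Python-side error is a wrapper: the real cause is whatever made the child process exit non-zero, here the missing private settings file.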
Wrong testwiki:
- "labtestwiki": "php-1.28.0-wmf.6",
+ "labtestwiki": "php-1.28.0-wmf.7",
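The change moved labtestwiki rather than testwiki. One way to catch this before syncing is to compare the wiki-to-version mapping before and after the edit and list which wikis actually moved. A minimal sketch, assuming the mapping is a flat dictionary shaped like the lines in the diff; changed_wikis and the testwiki entries are hypothetical and only there for illustration.

```python
def changed_wikis(old, new):
    """Return {wiki: (old_version, new_version)} for entries whose version differs."""
    return {
        wiki: (old.get(wiki), version)
        for wiki, version in new.items()
        if old.get(wiki) != version
    }


# Hypothetical before/after mappings; only the labtestwiki values come from the diff.
before = {"testwiki": "php-1.28.0-wmf.6", "labtestwiki": "php-1.28.0-wmf.6"}
after = {"testwiki": "php-1.28.0-wmf.6", "labtestwiki": "php-1.28.0-wmf.7"}

changes = changed_wikis(before, after)
for wiki, (old_version, new_version) in sorted(changes.items()):
    print(f"{wiki}: {old_version} -> {new_version}")
```

With these inputs only labtestwiki is reported, which is the mismatch noted above: the intended target was testwiki.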
Deploying wmf.7 to group2 wikis likely caused a pretty big regression in save timing (https://grafana.wikimedia.org/dashboard/db/save-timing). We are back to wmf.7 on group1 only for the time being.
I would blame stashEdit. The rate of API POST requests went from 22-25k per minute to 40k.
On https://grafana.wikimedia.org/dashboard/db/api-requests a list lets you select the API module to filter on (edit or stashedit), and the graph at the bottom shows the distribution of times by percentile.
The edit module is barely impacted. The stashedit 75p roughly doubled, from ~700 ms to 1.3 seconds.
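For readers less familiar with the "75p" notation: it is the value below which 75% of the measured request times fall. A minimal sketch of how such a figure and the size of the regression could be computed, assuming the "~700" above is in milliseconds; the sample data is made up and p75 and regression_ratio are hypothetical helpers, not what the dashboard runs.

```python
import statistics


def p75(samples_ms):
    """75th percentile: the third of the three quartile cut points."""
    return statistics.quantiles(samples_ms, n=4, method="inclusive")[2]


def regression_ratio(before_ms, after_ms):
    """How many times slower the 'after' figure is than the 'before' figure."""
    return after_ms / before_ms


# Made-up timings, 0..100 ms, just to show the percentile call.
samples = list(range(0, 101))
print(p75(samples))

# The figures quoted above, with ~700 assumed to mean 700 ms.
print(round(regression_ratio(700, 1300), 2))
```

On the quoted figures the ratio comes out a little under 2, which matches the "doubled" description.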
My intuition before I actually sleep is that the save-timing board takes stashedit into account, which regressed, and I don't think that one has a user-facing effect. API calls to edit show a flat line.
Rollback is probably the safest yeah :-} We would want a new blocking task and figure out who knows about stashEdit.
Current state is:
The stylesheet issue that was discovered overnight is solved (T138586).
Wiki versions:
group0 | 1.28.0-wmf.7 |
group1 | 1.28.0-wmf.7 |
rest | 1.28.0-wmf.6 |
The train is now blocked on T138550: 1.28.0-wmf.7 save time regression. We are going to leave it as is over the weekend so people can try to figure out the root cause.
If a fix is available on Monday, we will push 1.28.0-wmf.7 to all wikis and then resume the usual train with the 1.28.0-wmf.8 cut on Tuesday.
Otherwise, we will most probably freeze the train and postpone the next branch for a week.
RelengTeam has its weekly meeting on Monday at 4pm UTC, and we will definitely discuss this and make a decision.
Mentioned in SAL [2016-06-28T20:09:29Z] <twentyafterfour> deploying https://gerrit.wikimedia.org/r/#/c/296440/ to hopefully unblock wmf.7 deployments. refs T138550, T136973
Mentioned in SAL [2016-06-28T20:09:52Z] <twentyafterfour@tin> Synchronized php-1.28.0-wmf.7/extensions/AbuseFilter/: deploying https://gerrit.wikimedia.org/r/#/c/296440/ refs T138550, T136973 (duration: 02m 06s)
Mentioned in SAL [2016-06-28T20:24:28Z] <twentyafterfour@tin> rebuilt wikiversions.php and synchronized wikiversions files: once again rolling back to wmf.6 refs T136973 T138550
Mentioned in SAL [2016-06-28T21:24:51Z] <twentyafterfour> deploying wmf.7 yet again, once CI finishes testing https://gerrit.wikimedia.org/r/#/c/296464/ refs T138550 T136973
Mentioned in SAL [2016-06-28T21:31:47Z] <twentyafterfour@tin> Synchronized php-1.28.0-wmf.7/extensions/AbuseFilter/: deploy https://gerrit.wikimedia.org/r/#/c/296464/ refs T138550 T136973 (duration: 00m 36s)
Change 295339 abandoned by Jforrester:
Group0 to 1.28.0-wmf.7
Reason:
Didn't get used.