Status	Assigned	Task
Resolved	mmodell	T136973 MW-1.28.0-wmf.7 deployment blockers
Resolved	aaron	T138550 1.28.0-wmf.7 save time regression
Resolved	Krinkle	T138586 Skin stylesheet no longer unaffected by broken Common.css as of 1.28.0-wmf.7
Resolved	Krinkle	T138931 JavaScript crashes frequently due to stashEdit calls

Event Timeline

greg created this task. Jun 3 2016, 7:08 PM

Restricted Application added subscribers: Zppix, Aklapper. Jun 3 2016, 7:08 PM

greg mentioned this in T136971: MW-1.28.0-wmf.6 deployment blockers. Jun 3 2016, 7:08 PM

greg triaged this task as Medium priority. Jun 3 2016, 7:15 PM

greg added a project: Release-Engineering-Team (Deployment-Blockers).

greg added a project: Release. Jun 3 2016, 7:16 PM

Per the Releng meeting, I am conducting this one.

greg mentioned this in T137492: MW-1.28.0-wmf.8 deployment blockers. Jun 9 2016, 8:53 PM

greg updated the task description.

Mentioned in SAL [2016-06-21T12:30:39Z] <hashar> T136973 started cut of branch wmf/1.28.0-wmf.7

Due to a conflict with personal duties, I can't conduct the train. Since I was sick yesterday, we already had Mukunda as backup for the branch cut and Tyler for the actual deployment. We agreed that I would cut the branch (in progress), and Tyler confirmed he would be able to handle the group0/group1 switches.

Mentioned in SAL [2016-06-21T13:15:09Z] <hashar> T136973 applied all security patches to 1.28.0-wmf.7

Change 295339 had a related patch set uploaded (by Hashar):
Group0 to 1.28.0-wmf.7

https://gerrit.wikimedia.org/r/295339

gerritbot added a project: Patch-For-Review. Jun 21 2016, 1:16 PM

Mentioned in SAL [2016-06-21T13:48:51Z] <hashar@tin> Started scap: testwiki to 1.28.0-wmf.7 T136973

Mentioned in SAL [2016-06-21T13:53:09Z] <hashar@tin> scap aborted: testwiki to 1.28.0-wmf.7 T136973 (duration: 04m 17s)

Mentioned in SAL [2016-06-21T13:53:45Z] <hashar@tin> Started scap: testwiki to 1.28.0-wmf.7 (take two) T136973

Mentioned in SAL [2016-06-21T13:55:20Z] <hashar@tin> scap aborted: testwiki to 1.28.0-wmf.7 (take two) T136973 (duration: 01m 35s)

Mentioned in SAL [2016-06-21T13:55:35Z] <hashar@tin> Started scap: testwiki to 1.28.0-wmf.7 (take three) T136973

The scap to testwiki fails, though:

14:06:23 Started scap: (no message)
14:06:47 Copying to tin.eqiad.wmnet from deployment.eqiad.wmnet
14:06:47 Started rsync common
14:08:43 Finished rsync common (duration: 01m 55s)
14:08:44 Started l10n-update
14:08:44 Updating ExtensionMessages-1.28.0-wmf.6.php
14:08:45 Updating LocalisationCache for 1.28.0-wmf.6 using 4 thread(s)
14:09:20 Generating JSON versions and md5 files
14:09:21 Bootstrapping l10n cache for 1.28.0-wmf.7
14:09:22 Last output:
Warning: require_once(/etc/mediawiki/WikitechPrivateSettings.php): failed to open stream: No such file or directory in /srv/mediawiki-staging/wmf-config/wikitech.php on line 183
Fatal error: require_once(): Failed opening required '/etc/mediawiki/WikitechPrivateSettings.php' (include_path='/srv/mediawiki-staging/php-1.28.0-wmf.7:/usr/local/lib/php:/usr/share/php') in /srv/mediawiki-staging/wmf-config/wikitech.php on line 183
14:09:22 Finished l10n-update (duration: 00m 37s)
14:09:22 Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 242, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/main.py", line 304, in main
    return super(Scap, self).main(*extra_args)
  File "/usr/lib/python2.7/dist-packages/scap/main.py", line 46, in main
    self._before_cluster_sync()
  File "/usr/lib/python2.7/dist-packages/scap/main.py", line 326, in _before_cluster_sync
    version, wikidb, self.verbose, self.config)
  File "/usr/lib/python2.7/dist-packages/scap/utils.py", line 303, in context_wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/scap/tasks.py", line 525, in update_localization_cache
    lang='en', quiet=True)
  File "/usr/lib/python2.7/dist-packages/scap/tasks.py", line 477, in _call_rebuildLocalisationCache
    'quiet': '--quiet' if quiet else ''
  File "/usr/lib/python2.7/dist-packages/scap/utils.py", line 303, in context_wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/scap/utils.py", line 400, in sudo_check_call
    raise subprocess.CalledProcessError(proc.returncode, cmd)
CalledProcessError: Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="labtestwiki" --outdir="/tmp/scap_l10n_87423667" --threads=4 --lang en  --quiet' returned non-zero exit status 255
14:09:22 scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="labtestwiki" --outdir="/tmp/scap_l10n_87423667" --threads=4 --lang en  --quiet' returned non-zero exit status 255 (duration: 02m 58s)

That is, rebuildLocalisationCache.php --wiki="labtestwiki" fails because /etc/mediawiki/WikitechPrivateSettings.php is not present on tin.

Wrong testwiki; the wikiversions change bumped labtestwiki:

-    "labtestwiki": "php-1.28.0-wmf.6",
+    "labtestwiki": "php-1.28.0-wmf.7",
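A quick sanity check before syncing could catch this kind of slip. The sketch below is a hypothetical helper, not something scap provides: it compares two wikiversions mappings (dbname to branch directory, the format shown in the diff above) and flags any wiki that changed but was not the intended target. Only the labtestwiki values come from the diff; the testwiki entry and the function names are illustrative assumptions.

```python
import json

# Hypothetical before/after snapshots of wikiversions.json (dbname -> branch dir).
# The labtestwiki values match the diff above; testwiki is illustrative only.
BEFORE = json.loads('{"labtestwiki": "php-1.28.0-wmf.6", "testwiki": "php-1.28.0-wmf.6"}')
AFTER = json.loads('{"labtestwiki": "php-1.28.0-wmf.7", "testwiki": "php-1.28.0-wmf.6"}')


def changed_wikis(before, after):
    """Return {dbname: (old, new)} for every wiki whose branch changed."""
    changes = {}
    for dbname in sorted(set(before) | set(after)):
        old, new = before.get(dbname), after.get(dbname)
        if old != new:
            changes[dbname] = (old, new)
    return changes


def unintended_changes(changes, intended):
    """Return the wikis that moved even though they were not meant to."""
    return sorted(set(changes) - set(intended))


if __name__ == "__main__":
    changes = changed_wikis(BEFORE, AFTER)
    for dbname, (old, new) in changes.items():
        print(f"{dbname}: {old} -> {new}")
    unexpected = unintended_changes(changes, {"testwiki"})
    if unexpected:
        print("Unexpected wikis switched:", ", ".join(unexpected))
```

Run against the snapshots above, it reports labtestwiki as switched even though only testwiki was intended, which is the situation described here.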

Deploying wmf.7 to the group2 wikis likely caused a pretty big regression in save timing (https://grafana.wikimedia.org/dashboard/db/save-timing). Going back to wmf.7 on group1 only for the time being.

I would blame stashEdit. The rate of API POST requests went from 22-25k per minute to 40k per minute.

On https://grafana.wikimedia.org/dashboard/db/api-requests, a list lets you select the API module to filter on (edit or stashedit), and the graph at the bottom shows the distribution of response times by percentile.

The edit module is barely affected. The stashedit p75 roughly doubled, from ~700 ms to 1.3 seconds.
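To put those figures in relative terms, here is a small calculation. It only restates the approximate numbers quoted in this thread (22-25k to 40k POST requests per minute; stashedit p75 from ~700 ms to 1.3 s) and derives nothing new from the dashboards.

```python
# Approximate figures quoted in the comments above, read off Grafana.
post_rate_before = (22_000, 25_000)  # API POST requests per minute, before
post_rate_after = 40_000             # API POST requests per minute, after

stashedit_p75_before_ms = 700
stashedit_p75_after_ms = 1300


def relative_increase(before, after):
    """Fractional increase from `before` to `after` (0.5 means +50%)."""
    return (after - before) / before


low = relative_increase(post_rate_before[1], post_rate_after)   # measured from 25k
high = relative_increase(post_rate_before[0], post_rate_after)  # measured from 22k
p75_ratio = stashedit_p75_after_ms / stashedit_p75_before_ms

print(f"API POST rate: +{low:.0%} to +{high:.0%}")  # API POST rate: +60% to +82%
print(f"stashedit p75: x{p75_ratio:.2f}")           # stashedit p75: x1.86
```

So the POST rate rose by roughly 60-82%, and "doubled" for the stashedit p75 is closer to a 1.86x increase.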

My intuition, before I actually go to sleep, is that the save-timing board takes into account stashedit, which regressed, and I don't think that one has a user-facing effect. API calls to edit show a flat line.

Rollback is probably the safest option, yeah :-} We would want a new blocking task, and to figure out who knows about stashEdit.

thcipriani added a subtask: T138550: 1.28.0-wmf.7 save time regression. Jun 23 2016, 10:24 PM

thcipriani mentioned this in T138550: 1.28.0-wmf.7 save time regression. Jun 23 2016, 10:26 PM

Peachey88 added subtasks: T138579: nl.wikisource CSS loading is broken, T138537: checkuserwiki doesn't have any skin. Jun 24 2016, 11:49 AM

wmf.7 needs to be fully rolled back due to the CSS/skin loading issues.

hashar added a subtask: T138578: [Regression] en.wikipedia.org Mobile main page no longer being transformed. Jun 24 2016, 12:10 PM

dr0ptp4kt mentioned this in T138578: [Regression] en.wikipedia.org Mobile main page no longer being transformed. Jun 24 2016, 12:16 PM

dr0ptp4kt added a subtask: T138585: Spike of "unknown" errors experienced by users of UploadWizard after wmf.7 deployment. Jun 24 2016, 12:27 PM

Mholloway subscribed. Jun 24 2016, 12:29 PM

Moved a bunch of apparently related blockers to a new task, T138586.

matmarex added a subtask: T138585: Spike of "unknown" errors experienced by users of UploadWizard after wmf.7 deployment. Jun 24 2016, 1:29 PM

Krinkle closed subtask T138586: Skin stylesheet no longer unaffected by broken Common.css as of 1.28.0-wmf.7 as Resolved. Jun 24 2016, 2:05 PM

Current state is:

The stylesheet issue that was discovered overnight is solved (T138586).

Wiki versions:

group0	1.28.0-wmf.7
group1	1.28.0-wmf.7
rest	1.28.0-wmf.6

The train is now blocked on the save time regression (T138550). We are going to leave things as they are over the weekend so people can attempt to figure out the root cause.

If a fix is available on Monday, we will push 1.28.0-wmf.7 to all wikis and then resume the usual train, with 1.28.0-wmf.8 cut on Tuesday.

Otherwise, we will most probably freeze the train and postpone the next branch by a week.

The Releng team has its weekly meeting on Monday at 4pm UTC, and we will definitely discuss this and make a decision there.

I have posted the above status update to both wikitech-l and engineering lists.

Jdforrester-WMF subscribed. Jun 24 2016, 4:26 PM

mmodell claimed this task. Jun 27 2016, 4:17 PM

mmodell added a subscriber: thcipriani.

greg removed a subtask: T138585: Spike of "unknown" errors experienced by users of UploadWizard after wmf.7 deployment. Jun 27 2016, 10:31 PM

Mentioned in SAL [2016-06-28T20:09:29Z] <twentyafterfour> deploying https://gerrit.wikimedia.org/r/#/c/296440/ to hopefully unblock wmf.7 deployments. refs T138550, T136973

Mentioned in SAL [2016-06-28T20:09:52Z] <twentyafterfour@tin> Synchronized php-1.28.0-wmf.7/extensions/AbuseFilter/: deploying https://gerrit.wikimedia.org/r/#/c/296440/ refs T138550, T136973 (duration: 02m 06s)

mmodell added a commit: rOMWCa3549bb0e3f6: Revert "all wikis to 1.28.0-wmf.7". Jun 28 2016, 8:21 PM

Mentioned in SAL [2016-06-28T20:24:28Z] <twentyafterfour@tin> rebuilt wikiversions.php and synchronized wikiversions files: once again rolling back to wmf.6 refs T136973 T138550

Mentioned in SAL [2016-06-28T21:24:51Z] <twentyafterfour> deploying wmf.7 yet again, once CI finishes testing https://gerrit.wikimedia.org/r/#/c/296464/ refs T138550 T136973

Mentioned in SAL [2016-06-28T21:31:47Z] <twentyafterfour@tin> Synchronized php-1.28.0-wmf.7/extensions/AbuseFilter/: deploy https://gerrit.wikimedia.org/r/#/c/296464/ refs T138550 T136973 (duration: 00m 36s)