This stopped abruptly 21 days ago:
Tue, Mar 31
Mon, Mar 30
Tentatively assigning to dduvall per IRC discussion
Is option 1 the only option that won't require additional glue code between the moving parts? That is, does a merge into the origin repo's master branch constitute a deployment on its own, with no additional glue needed?
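For context, a minimal sketch of the kind of glue the other options would imply, assuming a plain git post-receive hook on the origin repo (the hook body and the deploy command are hypothetical placeholders, not the actual tooling):

    #!/bin/bash
    # Hypothetical post-receive hook: treat a push to master as a deploy trigger.
    # git feeds "<oldrev> <newrev> <refname>" lines on stdin, one per updated ref.
    while read oldrev newrev refname; do
        if [ "$refname" = "refs/heads/master" ]; then
            /srv/deployment/deploy.sh "$newrev"   # placeholder deploy step
        fi
    done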
Fri, Mar 27
Thu, Mar 26
Wed, Mar 25
We have linting switched off for this for the moment and plan on moving these plugins back into scap core Soon™. Lowering priority for the time being.
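For reference, skipping lint for the plugin files could look something like this, assuming a flake8-based lint step (the plugin path is a placeholder for wherever the plugins actually live):

    # A minimal sketch: run lint over the tree but exclude the plugin directory.
    flake8 --exclude='scap/plugins' .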
FWIW, the pausing and the error are two different pieces. That is, the pause is the cdb rebuild, IIRC. The error is due to sudo differences between beta and prod.
Mon, Mar 23
Thank you @Aklapper
Fri, Mar 20
Hi @Soda, sorry this took so long :(
Wed, Mar 18
Tue, Mar 17
Mon, Mar 16
Added wmf.23 as a blocker for wmf.24. If wmf.23 isn't fully deployed by the wmf.24 Tuesday train window, we should still go ahead and cut the branch for wmf.24, but NOT deploy it to any wikis. This unblocks folks who are waiting until wmf.24 is cut to merge to master, but spares us the confusion of 3 branches in production, which I definitely want to avoid.
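For clarity, "cut but don't deploy" roughly means creating the release branch without promoting it anywhere. A sketch in plain git terms, assuming the wmf/1.35.0-wmf.24 branch naming (the actual train tooling automates this and does more):

    # Cut the new release branch from master without deploying it anywhere.
    git fetch origin
    git push origin origin/master:refs/heads/wmf/1.35.0-wmf.24
    # No promotion step follows: the branch exists, but serves no wikis yet.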
Also, the changelog entries that mention "cache" are... many: https://www.mediawiki.org/wiki/MediaWiki_1.35/wmf.23/Changelog
I'm trying to figure out if this is an actual code change or a memcached server problem with bad timing. Or maybe it's a code change that causes memcached to have a bad time? Does anyone have a theory there?
Fri, Mar 13
FWIW, I ran some tests today with apachebench and the patchset on this task; in my rudimentary testing, it didn't seem to make requests significantly slower or load significantly higher:
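Roughly the kind of invocation I mean, with a placeholder target and request counts (the URL and numbers here are illustrative, not the exact run):

    # apachebench: 1000 requests at concurrency 10 against a test URL (placeholders).
    ab -n 1000 -c 10 https://test.example.org/wiki/Main_Page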
Thu, Mar 12
Updated chart, calling this complete.
This hasn't happened today afaict:
Wed, Mar 11
Updated after the team meeting with the actual blockers.
Tue, Mar 10
Lowering priority since, according to the access logs, this hasn't happened once in the hour I've spent investigating.
Looking specifically for upload-pack errors that happened over https in the past 30 days (that's what zuul would be using).
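The kind of search I mean, assuming standard combined-format access logs for the git HTTP frontend (the log path and the status-field position are assumptions):

    # Grep rotated access logs for git-upload-pack requests that returned 5xx.
    zgrep 'git-upload-pack' /var/log/apache2/access.log* \
        | awk '$9 ~ /^5/ {print}'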
Strangely, I'm not seeing the graphs in gerrit monitoring report a rise in error rate (they count 4xx as well as 5xx, so maybe it's lost in the noise). The logs also aren't showing a massive increase in the error rate. @Lucas_Werkmeister_WMDE, did you run into this on one repo in particular? One job in particular?
Sun, Mar 8
Back after a restart. I didn't get an alert aside from this task. Filed as T247186.