Sun, May 20
ChronologyProtector uses MySQLMasterPos, which works with either a GTID-based master position or the old binlog-based one.
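To illustrate the abstraction (a conceptual sketch in Python, not MediaWiki's actual PHP code): a position object can carry either coordinate style, and the only question ChronologyProtector needs answered is whether a replica has reached the position the client last wrote at.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MasterPos:
    # GTID style: the set of executed transaction IDs;
    # binlog style: a (file, offset) pair. Exactly one is set.
    gtids: Optional[frozenset] = None
    binlog: Optional[tuple] = None

    def reached_by(self, replica: "MasterPos") -> bool:
        """Has the replica executed everything up to this position?"""
        if self.gtids is not None and replica.gtids is not None:
            return self.gtids <= replica.gtids  # subset: all GTIDs replayed
        if self.binlog is not None and replica.binlog is not None:
            # Assumes binlog file names compare in rotation order.
            return replica.binlog >= self.binlog
        return False  # incomparable coordinate styles
```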
Sat, May 19
I will add to this list any other development policies I find on wikitech (and check them), as it could be useful to have a single entry point for our development policies.
To explain my reasoning further: mcrouter needs a non-negligible amount of memory to run, as it maintains an internal queue of messages whenever you use something like AllFastRoute or any other route handle that distributes keys. This means it can use a significant amount of memory from time to time, and I'd prefer to avoid having any process with a variable memory footprint on the memcached nodes.
Please see https://phabricator.wikimedia.org/T192771 which has a lot of considerations about the mcrouter architecture in production.
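For context on the queueing behaviour mentioned above, this is roughly what an AllFastRoute setup looks like (expressed here as a Python dict for readability; mcrouter's actual config is plain JSON, and the pool names and addresses are illustrative). AllFastRoute fans each request out to all children without waiting for their replies, which is exactly why in-flight messages accumulate in mcrouter's internal queue.

```python
import json

# Illustrative mcrouter config fragment: every request is sent to all
# children; replies are not awaited, so undelivered messages are queued.
config = {
    "pools": {
        "mc-eqiad": {"servers": ["10.64.0.1:11211", "10.64.0.2:11211"]},
        "mc-codfw": {"servers": ["10.192.0.1:11211", "10.192.0.2:11211"]},
    },
    "route": {
        "type": "AllFastRoute",
        "children": ["PoolRoute|mc-eqiad", "PoolRoute|mc-codfw"],
    },
}
print(json.dumps(config, indent=2))
```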
Wed, May 16
The X-Powered-By header is actually useful to us: it lets us discern which engine rendered a page, be it HHVM or PHP.
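A quick way to check this from the outside (a sketch using Python's requests library; the URL is just an example, and the exact header value depends on server configuration):

```python
import requests

# HHVM typically reports something like "HHVM/3.x"; a missing header (or a
# "PHP/7.x" value, if expose_php is enabled) points at Zend PHP / php-fpm.
resp = requests.head("https://en.wikipedia.org/wiki/Main_Page")
print(resp.headers.get("X-Powered-By", "(no X-Powered-By header)"))
```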
Tue, May 15
Things to watch out for:
I would suggest we do NOT disable/depool anything but the obvious outlier in the databases (we already know that timeouts on the databases would cause a serious outage, because of bugs in MediaWiki).
An empty string should do the trick; or (better) you could convert that whole thing to use systemd::service instead, as proposed in the TODO.
Mon, May 14
The following servers:
Mon, May 7
After some tests:
Sun, May 6
The decision to migrate WMF production back to PHP 7.x was made long ago, and is not something we'd have done by choice: after the 3.24 release, the HHVM platform no longer guarantees full PHP compatibility, with differences that will make it very hard for anyone to write code that works on both.
Wed, May 2
Is anyone working on this? If not, I guess this should be expedited to enable us to test running the maintenance scripts on php 7 in production as well, as hhvm is dog slow at running cli scripts and I see this as a priority.
The compiler has little to do with @EddieGP's request, which seems sensible, and has to do with the Jenkins permissions. I am not even a Jenkins administrator anymore - this ticket must be handled by the Release Engineering team.
Mon, Apr 30
Dumps are already partially running on php 7 and have been thoroughly tested in the past months, so I'd leave them out of the equation.
After some consideration, I see three options moving forward:
Thu, Apr 26
The queue is getting back to a normal size, and job production is almost what I'd expect. Unfreezing writes solved the issue; I'll restore the correct state of redis replication once the queue has shrunk enough.
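For reference, restoring that replication state is a one-liner per replica (a sketch with redis-py; hosts and ports are placeholders, not the actual production topology):

```python
import redis

# Re-point the jobqueue replica at its master once the queue has drained.
replica = redis.Redis(host="rdb-replica.example.wmnet", port=6379)
replica.slaveof("rdb-master.example.wmnet", 6379)  # resume replication
print(replica.info("replication")["role"])  # should now report "slave"
```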
Tue, Apr 24
The rolling restart of all memcacheds is done. This ticket might be considered resolved.
Mon, Apr 23
Apr 19 2018
I would strongly suggest that any system that wants to archive GeoIP data from MaxMind should create its own repository of data and NOT use puppet for it in any way.
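As a sketch of what such a repository could look like (paths are illustrative; /usr/share/GeoIP is where geoipupdate conventionally writes its databases), even a small dated-snapshot script run from cron would be an improvement over keeping the data in puppet:

```python
import datetime
import pathlib
import shutil

# Snapshot today's MaxMind databases into a dated archive directory,
# independent of puppet. Both paths are illustrative.
src = pathlib.Path("/usr/share/GeoIP")
dest = pathlib.Path("/srv/geoip-archive") / datetime.date.today().isoformat()
dest.mkdir(parents=True, exist_ok=True)
for db in src.glob("*.mmdb"):
    shutil.copy2(db, dest / db.name)
print(f"archived {len(list(dest.iterdir()))} databases to {dest}")
```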
We've had three full mdadm checkarray runs since we merged the change in February, and no alert has gone off in the meantime. I would be inclined to consider this successful.
Apr 18 2018
Just to clarify, this is bare-bones support for Kubernetes. In theory, it would be nice to gather all the information about the services we have to configure from the Kubernetes API itself. That would mean being able to dynamically define new pools, something that is currently only done at startup. I think that should be left for a later time, as it would require deeper changes to pybal's architecture.
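To make the idea concrete, dynamically discovering pool membership could look something like this (a sketch using the official Kubernetes Python client; pybal itself is Twisted-based, so the real integration would be asynchronous, and the namespace and names here are illustrative):

```python
from kubernetes import client, config, watch

# Watch Endpoints objects and derive pool membership from them, instead of
# defining pools statically at startup.
config.load_kube_config()
v1 = client.CoreV1Api()

w = watch.Watch()
for event in w.stream(v1.list_namespaced_endpoints, namespace="default"):
    ep = event["object"]
    addrs = [
        addr.ip
        for subset in (ep.subsets or [])
        for addr in (subset.addresses or [])
    ]
    # A dynamic pool manager would diff `addrs` against the current pool
    # and pool/depool backend servers accordingly.
    print(event["type"], ep.metadata.name, addrs)
```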
Apr 17 2018
I performed various functional tests in deployment-prep.
Apr 16 2018
@Niedzielski interestingly, when requesting the /summary/precambrian page, I see a successful request to the API cluster, so the error is not a 503 on the part of MediaWiki, but rather some error in the API query or in the data.
All the main servers have been replaced with stretch VMs; the only one still turned on is deployment-mediawiki06, which was used for some auditing. I'll resolve the task now and permanently delete the old instances later this week.
Enwiki finished its run at 14:40 UTC on Saturday, April 14th.
Apr 13 2018
Apr 12 2018
Apr 11 2018
What are the blockers for the use of PHP7?
Apr 10 2018
I built and uploaded two packages for 0.37.0, for both jessie and stretch. I will try to document the build process and automate it as much as possible.
As far as @ema and I could determine when we looked into it, there is no way to do this better in puppet 4.x, so that notify is there for a good reason. It's a hack, and it would be nice to be able to remove it, but I don't think this is low-hanging fruit by any means.
See my comments - I like the general idea but I think the behaviour of the code should be changed.
Also a note on beta not running php7: when we migrated to HHVM it was made very clear to me and to Ori that we could not use beta for testing the migration; so I'm assuming we'll have to do the same this time as well and do the tests in production.
Apr 9 2018
Since I've noticed a 45% speed increase when running the updateCollation.php script with php 7.0 versus HHVM, I'm temporarily setting up mw1338 to run the scripts; I will stop the videoscaler and puppet there for the time being.
Please note that since we can't run more than one wiki in parallel per section, most wikis will complete their own work within an hour, but each starts only after the queue of wikis ahead of it has finished. So expect the migration to be over in ~ 2 days on s2 and ~ 1.5 days on the other sections, with the notable exception of s1 (enwiki) - see https://phabricator.wikimedia.org/T189295#4115676 for a rough estimation of the total number of rows per section.
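To illustrate the arithmetic (a toy model with invented per-wiki runtimes, not the real figures):

```python
# Wikis in a section run strictly one after another, so a section finishes
# after the SUM of its wikis' runtimes, even though each wiki alone needs
# under an hour. All runtimes below are made up for illustration.
sections = {
    "s2": [0.9, 0.8, 0.7] + [0.5] * 80,   # hours per wiki
    "s5": [0.6, 0.5] + [0.4] * 70,
}
for name, runtimes in sections.items():
    print(f"{name}: ~{sum(runtimes) / 24:.1f} days of serialized runs")
```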
Total number of rows to sort through per shard: