Thu, Sep 13
@Legoktm I'm ok delaying this into next quarter, or even the one after that; but I think PHP 7.2 is indeed a possibility: there are packages that should be easy to backport, so if support in MediaWiki is there by next quarter, I'd be happy to work on this :)
Wed, Sep 12
Ok, I think I found the issue:
Tue, Sep 11
So the only config difference is that in codfw we call the MediaWiki API via HTTPS, while in eqiad we call it via HTTP. There are some subtle differences in how we do it, which might explain why wikitech would fail via HTTPS: it's not hosted on the main cluster.
AIUI, the reason why we're not using MySQL (which would probably fit this storage model as well as, if not better than, Cassandra) is just that we don't have libraries and abstractions for accessing MySQL from our nodejs services. Is that correct?
Mon, Sep 10
We're probably not going to get to the stretch goals; it should also be noted that MediaWiki itself is still not ready to run on PHP 7.2, so we don't really have an alternative: we need to stick to 7.0 for now.
Fri, Sep 7
Thu, Sep 6
The problem is that the labswebtest machines are configured to use labstestwiki, and that we configured them to use the global mcrouter rather than their local nutcracker, which doesn't make any sense.
Clearly I just forgot to merge a change at the time of the mcrouter rollout, sorry about that.
Thu, Aug 23
Wed, Aug 22
Aug 17 2018
Aug 16 2018
For the record: I removed the file (still on disk at /srv/cassandra-a/commitlog/CommitLog-5-1530620590775.log.bak) once I noticed it was all zeroes.
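A minimal sketch of the "all zeroes" check mentioned above, assuming a plain byte-level scan is acceptable (the chunked read is just one straightforward way to do it; this is not the command actually used):

```python
def is_all_zeroes(path, chunk_size=1 << 20):
    """Return True if every byte in the file is 0x00."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached EOF without finding a nonzero byte.
                return True
            if chunk.count(0) != len(chunk):
                # At least one byte is nonzero.
                return False

# Example (path taken from the comment above):
# is_all_zeroes("/srv/cassandra-a/commitlog/CommitLog-5-1530620590775.log.bak")
```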
Aug 14 2018
We also need internal requests to be traced, so I would assume every service needs to generate a request ID whenever it receives a request that has none.
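The idea above can be sketched as a small helper: reuse the caller-supplied ID when present, otherwise mint a fresh one, so every internal hop carries the same trace ID. The `X-Request-Id` header name and the `ensure_request_id` helper are illustrative assumptions, not the actual service code:

```python
import uuid

REQUEST_ID_HEADER = "X-Request-Id"  # assumed header name, for illustration

def ensure_request_id(headers):
    """Return (request_id, outgoing_headers) for an incoming request.

    Reuses the caller-supplied ID when present; generates a fresh UUID
    otherwise, so the trace effectively starts at this service.
    """
    request_id = headers.get(REQUEST_ID_HEADER) or str(uuid.uuid4())
    outgoing = dict(headers)
    outgoing[REQUEST_ID_HEADER] = request_id
    return request_id, outgoing
```

Downstream calls would then send `outgoing` so the same ID propagates across services.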
Aug 13 2018
There was one case of failure - one of the dumps scripts failed (see T201772). I would call this a success!
Aug 7 2018
Sometimes we get 503 spikes from a cache_misc application like Phabricator or Gerrit; knowing the origin of the 5xxs in broad categories ("public traffic for the sites" vs "miscellanea") was very useful IMHO. Do we have a way to preserve that information?
Aug 3 2018
I have a few comments on this topic. Specifically:
Looking at the modules tagged php on puppetforge:
Aug 2 2018
We ended up generating the dsh lists in production from etcd, which works as a solution without requiring scap to know about the details. I think we can close this ticket.
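A hedged sketch of the transformation described above: turning pool data fetched from etcd into a dsh host list. The JSON shape (hosts keyed by name, with a "pooled" flag) is an assumption for illustration only; the real conftool/etcd schema may differ:

```python
def dsh_list_from_pool(pool):
    """Return the sorted hostnames whose state is pooled == "yes"."""
    return sorted(
        host for host, attrs in pool.items()
        if attrs.get("pooled") == "yes"
    )

# Hypothetical pool data, as it might be decoded from an etcd value:
pool = {
    "mw1261.eqiad.wmnet": {"pooled": "yes"},
    "mw1262.eqiad.wmnet": {"pooled": "no"},
}
# dsh_list_from_pool(pool) -> ["mw1261.eqiad.wmnet"]
```

The resulting list would then be written out as one hostname per line, which is the format dsh expects.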
Aug 1 2018
Jul 31 2018
@Krenair I think I will just reproduce the patches I made to the mediawiki_test environment in the main one; that looks safer given we already know those patches are ok.
I guess we should add a command line switch to jump between the two behaviours.
Hi, I'm not sure I understand what behaviour you would prefer.
Jul 26 2018
Jul 17 2018
Jul 13 2018
Jul 9 2018
Jul 6 2018
Closing as declined as we've removed the redis-based jobqueue.
Jul 5 2018
Jul 4 2018
Jul 2 2018
+1 to the overall plan; I'd like to see dates attached to the various steps now, so that we can have a clear schedule.
Jul 1 2018
Jun 27 2018
Reopened as this is still not fixed, see https://wikitech.wikimedia.org/wiki/Incident_documentation/20180626-LoadBalancers
Jun 25 2018
Jun 20 2018
We also need @greg's approval for adding people to deployers.
Jun 19 2018
This seemed to be an issue with the SmartArray controller; a simple hard reboot fixed it.
So there isn't much I can do right now; the situation has recovered. I agree it isn't reasonable to keep the caches for old versions, but we can manage the situation.
Jun 18 2018
So after some reasoning: