@Dzahn do you know the status of this? can we mark it as resolved?
Version 1.10.0-1~wmf1 has been deployed to deployment-mediawiki-09 and deployment-mediawiki-07. Please let me know if it works as it should so we can proceed with the canaries in prod.
Tue, Nov 19
@ssastr php-fpm will be restarted during scap deployments only when a server's free opcache is below 100MB. I can check the code to see if there is an exception to that.
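For illustration only, the rule described above boils down to a single threshold check. This is a rough sketch and not the actual scap code: the function name and the way the free opcache figure is obtained are hypothetical, and only the 100MB threshold comes from the comment.

```python
# Hypothetical sketch of the restart rule; NOT the real scap implementation.
OPCACHE_FREE_THRESHOLD_MB = 100  # threshold quoted in the comment above


def should_restart_php_fpm(opcache_free_mb: float,
                           threshold_mb: float = OPCACHE_FREE_THRESHOLD_MB) -> bool:
    """Return True only when free opcache is strictly below the threshold."""
    return opcache_free_mb < threshold_mb


if __name__ == "__main__":
    # Example values are made up for demonstration.
    for free_mb in (250.0, 100.0, 42.5):
        print(free_mb, should_restart_php_fpm(free_mb))
```

Whether the real check treats exactly 100MB as "below" is an assumption here; the sketch uses a strict comparison.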
Fri, Nov 15
@Theklan We suspect it is something on the production side since we have noticed this behaviour in the past. Moreover, this is not the only wiki where templates are changed, but it manages to stand out in our investigations, which shouldn't happen. Nevertheless, thanks for letting us know :)
Thu, Nov 14
Currently we have 1.9.0 on releases.w.o and on the servers. Please upload the new version on releases, and I can go ahead with the build.
Wed, Nov 13
I have downtimed some of the alerts, but the downtime will expire in a couple of hours from now
@Dzahn I have left only phab* and scandium, can you take care of them? :)
I am not sure this is related, but we get many alerts of
Tue, Nov 12
@Krinkle Please let me know if the issue on mwdebug1002 persists, as the server has been reimaged :)
Mon, Nov 4
Fri, Nov 1
We really need to figure out what to do with the elastic1025 alert. I think it has been alerting even more aggressively lately
Thu, Oct 31
@Krinkle we are looking into it, tx
Please ping if there are more things to be done for this task :)
Cherry picked and packaged. Please ping when you test it so I can upload it :)
Wed, Oct 30
@Mholloway Can we mark this as resolved?
I couldn't reproduce this slowness (from Europe) right after the task was opened. I am marking it as resolved for now; please reopen if you experience it again, thank you!
Tue, Oct 29
Mon, Oct 28
Sun, Oct 27
Looks like it started on Friday ~12:00-13:00 UTC
Fri, Oct 25
I believe this is done :)
Removing HHVM is continuing under T229792
Thu, Oct 24
I will take a look tomorrow, sorry for delaying this
Wed, Oct 23
Frankly, I am still trying to find out. It had some sort of a pattern for a bit, but then it didn't. What I have been noticing during the day is that our appserver POST request average suddenly doubles or triples for a while, and then goes back to "normal". I was just looking for a second opinion on this; I will keep monitoring it. T235872 concerns GET requests, so it is another mystery we should solve.
mw1317 will be reimaged, but not yet. We will keep it around (but off production) until someone can have a closer look
ps1-a6-eqiad is shown as down in Icinga. I believe that is expected?
Oct 22 2019
It appears we are also having fetch errors, possibly due to timeouts as well, mostly on two servers where we have enabled ats-be: cp1075 and cp4027
@hashar I will look into it, sorry for not getting to it sooner
Oct 21 2019
Server will be reimaged, I will ping here when it is done :)
@Joe should we Resolve this?
Oct 20 2019
This is polluting our messages on mw* servers as well:
Oct 18 2019
Oct 17 2019
I will roll out to production tomorrow. The process is to upload the package to our wikimedia-stretch repo, update all servers, and then do a rolling restart of php-fpm.
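As a rough sketch of the last step only: a rolling restart means restarting hosts in small batches instead of all at once, so that most of the fleet keeps serving traffic. The helper names, the batch size, and the example hostnames below are hypothetical and are not the tooling actually used; the `restart` callback stands in for whatever really restarts php-fpm on a host.

```python
# Hypothetical sketch of a rolling restart; not the actual rollout tooling.
from typing import Callable, Iterator, List, Sequence


def batches(hosts: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


def rolling_restart(hosts: Sequence[str],
                    restart: Callable[[str], None],
                    batch_size: int = 2) -> List[List[str]]:
    """Restart hosts one batch at a time and return the batches in order."""
    done: List[List[str]] = []
    for batch in batches(hosts, batch_size):
        for host in batch:
            restart(host)  # placeholder for the real php-fpm restart
        done.append(batch)
    return done


if __name__ == "__main__":
    # Example hostnames are made up for demonstration.
    example_hosts = ["mw-a", "mw-b", "mw-c", "mw-d", "mw-e"]
    rolling_restart(example_hosts, restart=lambda h: print("restarting", h))
```

A real rollout would also wait for each batch to become healthy before moving on; that check is left out of the sketch.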
Oct 16 2019
mw* servers will be reimaged as part of T229792, this is resolved
Since we will be reimaging all mw* servers, those packages will actually need to be removed from snapshot*, deploy*, mwmaint* and labweb*