Sun, Jan 19
Fri, Jan 17
@Krinkle I backported https://github.com/krakjoe/apcu/pull/384/commits/c8849b0894712d0ede63552fb99ca8dc4d3d884f and packaged it in php-apcu_5.1.17+4.0.11-1+0~20190217111312.9+stretch~1.gbp192528+wmf2. You can find the package in your home directory on deployment-mediawiki-07 and install it. If we are happy with it, we can proceed with rolling it out to the canaries and then production.
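For reference, a minimal sketch of what installing and verifying it could look like on deployment-mediawiki-07 (the `_amd64.deb` filename suffix and the `php7.2-fpm` service name are assumptions, not taken from the task):

```
# Install the backported package from the home directory (arch suffix assumed)
sudo dpkg -i ~/php-apcu_5.1.17+4.0.11-1+0~20190217111312.9+stretch~1.gbp192528+wmf2_amd64.deb

# Verify the loaded extension version
php -r 'var_dump(phpversion("apcu"));'

# Restart PHP-FPM so running workers pick up the new extension (service name assumed)
sudo systemctl restart php7.2-fpm
```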
Thu, Jan 16
Yeah, we need a total of at least 4 api and 4 app canary servers in codfw. In eqiad, our canary app (5) and api (4) servers are actually in the same rack; we can spread them out a bit when we install the new servers.
Wed, Jan 15
@thcipriani as per our discussion, we can consider merging and then testing, first with file syncs and then on the train. How does that sound?
Tue, Jan 14
@Krinkle, I am fine with merging any time, as soon as you have tested it :)
@Jclark-ctr Can you provide a date that is convenient for you for racking these? Thank you!
Mon, Jan 13
prometheus-trafficserver-tls-exporter.service initially failed to start on both cp3065 and cp3061 after reboot
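For anyone triaging this later, a sketch of the standard systemd checks (only the unit name comes from the line above; the race-with-trafficserver theory is a guess):

```
# Inspect why the exporter failed on boot
systemctl status prometheus-trafficserver-tls-exporter.service
journalctl -b -u prometheus-trafficserver-tls-exporter.service

# If it simply started before trafficserver was ready, a manual restart should recover it
sudo systemctl restart prometheus-trafficserver-tls-exporter.service
```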
Fri, Jan 10
Thu, Jan 9
Hey @Jclark-ctr! Do we have an update on when those servers will be ready? Thank you!
Tue, Jan 7
There was a bug in the monitoring script; I will reopen this in case it misbehaves again.
Mon, Jan 6
Sun, Jan 5
Given that we have installed the redis servers on stretch, I think we can mark this as resolved
Dec 20 2019
Dec 19 2019
Dec 18 2019
@Jclark-ctr if you feel this will work better, we are happy. Either way, this racking is still better than the original one (30 servers in D 5). Thank you!
Dec 16 2019
[06:58] <_joe_> !log clearing apcu across multiple api servers to allow metrics to be collected again (task coming soon)
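Since APCu lives in the PHP-FPM master's shared memory, restarting php-fpm is one way to clear it across servers; a hypothetical sketch (host names and service name are placeholders, and the method actually used may have differed):

```
# Hypothetical: clear APCu on a list of API servers by restarting PHP-FPM
for host in mw-api-01 mw-api-02; do                 # placeholder host names
  ssh "$host" 'sudo systemctl restart php7.2-fpm'   # service name assumed
done
```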
It appears that the processing backlog started increasing on 11 Dec between 21:15 and 21:30 UTC
Dec 13 2019
Dec 12 2019
I think we can mark this as resolved, our solution seems to be working. If something breaks, we will open a new task or reopen this one. Thank you!
First of all, we found that this fatal error is present on mwdebug1001, not only on mwdebug1002 as we had been assuming in this task.
Dec 11 2019
Dec 9 2019
@Pcoombe Yes, they had visited Canada about a month ago; that explains it :)
Dec 7 2019
Dec 6 2019
@bd808 is this task still valid?
Dec 5 2019
@brion thank you! You can mark this as resolved if there is nothing else to be done
That would be lovely, thank you!
@wiki_willy given that we are responsible for this delay, we would like to check with your team on the earliest date that would work for you, so that we can in turn plan around it. Sorry for the delays on our end. I would like to avoid requesting a date that is unrealistic, given that xmas is around the corner. Thank you!
@Jclark-ctr Do we have an estimate of when we will be able to have those servers racked?
@brion It looks like all our videoscalers were lacking VP9 codec support. What do you think we should do?
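Assuming the videoscalers encode with ffmpeg, a quick check for VP9 support would be to list the available encoders (a sketch; a libvpx-enabled build should show libvpx-vp9):

```
# Empty output here would mean the ffmpeg build has no VP9 encoder
ffmpeg -hide_banner -encoders | grep -i vp9
```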
Dec 4 2019
@bd808 Thank you very much for the ping. I can deal with destroying the jessie instances next week; would that work?
@Krinkle I am afraid we are still seeing those errors in https://logstash.wikimedia.org/goto/dd9770ce7a81dc3c7614e03cf7b9de15
Dec 2 2019
@elukey Hey Luca, I think I will need one too :) Thank you very much