I like black too, but per https://black.readthedocs.io/en/stable/installation_and_usage.html it is tied to having Python 3.6 installed.
Add codfw to the mix as well; no reason to limit this to eqiad. Everything else LGTM
Mon, Dec 17
Fri, Dec 14
Oops, closed this by mistake. Re-opened, feel free to close when the issue is indeed resolved.
Thu, Dec 13
Chart merged; it is now available at https://releases.wikimedia.org/charts/
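In case anyone wants to pull from it, a minimal sketch (the local repo alias "wikimedia" is my choice here, not an established name):

```
# add the chart repository under a local alias and refresh the index
helm repo add wikimedia https://releases.wikimedia.org/charts/
helm repo update
```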
The migration uncovered a number of issues in graphoid that make it worthwhile to consider a Code Stewardship request. That has been filed as T211881; stalling this until it is resolved.
Stalling until T211811 is done
I've had to deannotate the zotero namespace with commands like the one below.
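(The actual command did not survive in this comment; a minimal sketch of the kind of command meant, with a made-up annotation key — the trailing "-" is what tells kubectl to remove the annotation:)

```
# remove an annotation from the zotero namespace; the key below is hypothetical
kubectl annotate namespace zotero example.com/some-annotation-
```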
Wed, Dec 12
Tue, Dec 11
I've slightly amended the patch to remove the now-defunct sc-admins group and merged it per the SRE meeting's approval. Resolving this.
Wed, Dec 5
FWIW we've had a number of minor outages and alerts resulting in increased latency for results. The corresponding graph can be seen at https://grafana.wikimedia.org/dashboard/db/restbase-external-overview?panelId=17&fullscreen&orgId=1&from=1544017415835&to=1544026700314
Tue, Dec 4
I'll do so, thanks.
Mon, Dec 3
I can reproduce it as well. Received sizes and execution times are not consistent, ranging from a few hundred bytes to a couple of megabytes and a few seconds, respectively. This, and more importantly the test done above by @fgiunchedi, indicates something going awry in the communication between varnish and swift.
Fri, Nov 30
I just did a quick check on the wikimedia-stretch image for this.
Thu, Nov 29
I am afraid we can't really change it. It's been at 06:25 (UTC in our case) forever and people expect that; changing it would break those expectations. Note that this is true for all services and software, and it hasn't really caused an issue for a long time. So we should do a better job of surfacing and fixing the issues, not change the logrotate schedule.
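For context, the 06:25 comes from the stock Debian /etc/crontab entry that fires the daily cron jobs (logrotate among them) on machines without anacron:

```
# excerpt from a stock Debian /etc/crontab
25 6    * * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
```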
Does this mean we have a hard deadline of 2019-04-01 for completing the migrations? Or, per the "I can backport security fixes for a while", do we have a couple more months? The current goal is that by July 2019 all scb services, restbase (and probably aqs as well), proton, and parsoid will be in kubernetes. That will leave turnilo and aphlict, I guess.
Wed, Nov 28
This has already been implemented (albeit not in celery but in redis). https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/394022/2/modules/ores/manifests/redis.pp
@Cmjohnson, I think we can proceed with this. I just tried to reimage the server, but mgmt is not responding.
Tue, Nov 27
@akosiaris for ores2008
ores2001, 2*ganeti, 15*mw
cc @akosiaris to find out what specific actions need to be taken for Ores and Ganeti
The box is reimaged and up and running. megacli sees the controller and the disks.
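For reference, checks of this sort would be along these lines (standard MegaCli invocations; the exact binary name varies by install):

```
megacli -AdpAllInfo -aALL    # controller/adapter details
megacli -PDList -aALL        # physical disks and their state
megacli -LDInfo -Lall -aALL  # logical drives (RAID arrays)
```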
Resolved the wrong task. I meant to resolve T196477.
Child task resolved, resolving this as well
I've helped with the debugging. Starting from apertium, it was clear something automated was POSTing a lot of requests to it. They turned out to be mostly for the rus|bel langpair, but that was a red herring, as it was just the snapshot in time I looked at. Moving from apertium to cxserver, it became clear something was POSTing to the /v2/translate endpoint. The requests I noted there were mostly for another language pair, ca|oc, but again that was just a snapshot in time. Then a VM IP caught my eye, one belonging to wcdo.wcdo.eqiad.wmflabs. I jumped into said VM and stopped a process that was clearly hitting the cxserver API heavily.
Mon, Nov 26
Should we also merge https://gerrit.wikimedia.org/r/#/c/integration/zuul/+/465324/ and release 2.5.1-wmf6? I'll upload it to apt.wikimedia.org and I would rather do this only once.
Fri, Nov 23
Finally deployed to production
Tue, Nov 20
Mon, Nov 19
Nov 17 2018
This has now been deployed to the kubernetes staging cluster.
Nov 16 2018
Upgrade done, resolving
FWIW, I'll echo @Ladsgroup and @fgiunchedi. Having the data is obviously useful. Representing it in grafana, on the other hand, is probably not so practical. I also have my doubts as to whether a graph would help identify the culprits of load spikes, mostly due to the nature of the service, but I may be at fault here.
Nov 15 2018
Nov 14 2018
I think we should support multiple tags per image (docker does support that anyway, and they cost next to nothing at the registry level AFAIK).
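As a quick illustration of why extra tags are cheap (image and registry names below are made up): a tag is just a named pointer to an image digest, so pushing a second tag re-uses the layers already in the registry.

```
# tag the same local image twice and push both; the second push only
# uploads the manifest, since all layers already exist in the registry
docker tag foo:build-42 registry.example.org/foo:build-42
docker tag foo:build-42 registry.example.org/foo:latest
docker push registry.example.org/foo:build-42
docker push registry.example.org/foo:latest
```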
Nov 13 2018
IIRC this is because of the expand_data directive in https://github.com/wikimedia/puppet/blob/production/modules/puppetmaster/files/production.hiera.yaml#L8
Nov 9 2018
Upgrade completed successfully. I also checked with a SELECT * FROM version of the two SQL statements shown in https://community.otrs.com/security-advisory-2018-09-security-update-for-otrs-framework/ and no results were returned, so no issues there.
@Papaul I'd say ignore it. That system plus disk shelf/array is scheduled for decommissioning, to be replaced by backup2001 (T196477). The data on it is a copy of the data from helium, so we aren't going to lose anything if more disks fail. There is no point in maintaining it. After talking with @MoritzMuehlenhoff on IRC, it seems we can do a fresh reinstall of backup2001/backup1001 next week with the new stretch point release, set up the service on them, and then decommission this.
Nov 8 2018
For what it's worth, the upstream task is https://github.com/celery/celery/issues/3500. Apparently closed as WONTFIX.
Nov 7 2018
Nov 6 2018
Nov 5 2018
FWIW, metrics_host: in config-vars.yaml, which is used by scap to build config.yaml, specifically the
This helps generate F27065389.
Nov 2 2018