Tue, Jul 17
Fri, Jul 13
Mon, Jul 9
Fri, Jul 6
Closing as declined, since we've removed the Redis-based jobqueue.
Thu, Jul 5
Wed, Jul 4
Mon, Jul 2
+1 to the overall plan; I'd like to see dates attached to the various steps now, so that we can have a clear schedule.
Sun, Jul 1
Wed, Jun 27
Reopened as this is still not fixed, see https://wikitech.wikimedia.org/wiki/Incident_documentation/20180626-LoadBalancers
Mon, Jun 25
Jun 20 2018
We also need @greg's approval for adding people to deployers.
Jun 19 2018
This seemed to be an issue with the SmartArray controller; a simple hard reboot fixed it.
So there isn't much I can do right now; the situation has recovered. I agree it's not reasonable to keep the caches for the old versions, but we can manage the situation.
Jun 18 2018
So after some reasoning:
The problem comes from https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/437864/
At a quick glance, neither MediaWiki-generated logs nor syslog-generated ones show this issue. I can't find anything relevant in the SAL, but I'll try to dig deeper.
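For anyone retracing this, a sketch of the kind of search involved; the grep pattern and date window below are placeholders, not the actual error string:

```
# Search the host's syslog files for the symptom
grep -i 'PLACEHOLDER-error-pattern' /var/log/syslog*

# Same search over the journal, bounded to the relevant window
journalctl --since '2018-06-17' --until '2018-06-19' | grep -i 'PLACEHOLDER-error-pattern'
```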
mwdebug2001 now has 8 GB free, but one has to wonder why /srv/mediawiki needs 22 GB of space; I guess we're doing something wrong there.
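In case someone wants to reproduce the measurement, a sketch of the commands; nothing here is specific to our setup:

```
# Free space on the filesystem holding /srv/mediawiki
df -h /srv

# Total usage, then a per-subdirectory breakdown sorted by size
du -sh /srv/mediawiki
du -sh /srv/mediawiki/* 2>/dev/null | sort -h
```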
Done. You should be able to access the corresponding resources.
Specifically, it would be useful to use the permissions of another person on your team as a blueprint ("I need the same level of access as X" would help us pin down exactly which permissions you need).
@MSantos while we wait to understand the specific accesses you need, can you please read and sign the L3 document, so that I can proceed to create your user and add you to the LDAP group for WMF employees?
See also T196547, where the discussion should probably continue.
I created a namespace called ci that you can deploy to using helm, as long as you use the kubeconfig /etc/kubernetes/ci-staging.config. That file is readable by contint-admins and by the jenkins-slave user, so the pipeline should be able to deploy to that namespace using helm.
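For whoever picks this up, a minimal sketch of a deploy against that namespace; the release and chart names are made up, and the flags assume Helm 2 syntax:

```
# Point helm/kubectl at the staging cluster via the ci-staging kubeconfig
export KUBECONFIG=/etc/kubernetes/ci-staging.config

# Install a chart into the ci namespace (release/chart names are hypothetical)
helm install --namespace ci --name ci-smoke-test ./my-chart

# Check what ended up running there
kubectl get pods --namespace ci
helm list --namespace ci
```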
@elukey, since you did the work of removing the submodule, will you do the honours?
@elukey is this still ongoing? It's still open with High priority.
@herron any news on this? I'm assigning the ticket to you, as you have an open patch for it.
@Krinkle I prepared a patch to use the auto_prepend_file on all appservers, not just the canaries. Should we deploy it once the deployment freeze is over?
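For context, auto_prepend_file is the php.ini directive that runs a script before every request. A quick way to check what a server is currently set to; the wmf-config path in the comment is illustrative only, not the actual file:

```
# Show the effective auto_prepend_file value for the CLI SAPI
php -i | grep auto_prepend_file

# In a php.ini / conf.d snippet the setting looks like this (path hypothetical):
# auto_prepend_file = /srv/mediawiki/wmf-config/PhpAutoPrepend.php
```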
I will merge this change once we're out of the deployment freeze.
Hi @Tarrow, you are indeed already part of the WMDE LDAP group, but not the NDA one, which is what you need.
I actually think the best way to handle this is to redirect stderr and stdout to files, and to logrotate them properly.
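A minimal sketch of what I have in mind; the daemon name and paths are made up:

```
# Redirect the daemon's stdout and stderr to separate log files
/usr/local/bin/some-daemon >> /var/log/some-daemon/out.log 2>> /var/log/some-daemon/err.log

# Matching logrotate stanza; copytruncate means we don't have to signal
# the daemon to reopen its file descriptors after rotation
cat > /etc/logrotate.d/some-daemon <<'EOF'
/var/log/some-daemon/*.log {
    daily
    rotate 7
    compress
    missingok
    copytruncate
}
EOF
```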