Aug 5 2019
Jul 29 2019
merged and applied
Jul 26 2019
@thcipriani you can launch the pipeline again and it should work. However, a better fix is to change the limits in the blubber default values in the chart: 1m is not a realistic CPU minimum.
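For reference, Kubernetes measures CPU in cores, and the m suffix means millicores (1000m = 1 core), so a 1m value is one thousandth of a core. A minimal Python sketch of that conversion; cpu_to_cores is a hypothetical helper, and it only handles plain decimals and the m suffix, not the full Kubernetes quantity grammar:

```python
def cpu_to_cores(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "1m", "250m" or "2" to cores.

    Sketch only: handles plain decimals and the "m" (millicore) suffix.
    """
    if quantity.endswith("m"):
        # "m" means millicores: 1000m == 1 core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


print(cpu_to_cores("1m"))    # 0.001
print(cpu_to_cores("250m"))  # 0.25
```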
@thcipriani it is granular per namespace, so you can submit a CR with changed values anytime. I will bump those values and reference this Phab task so you can see how it is done.
@greg thanks for following up on this. I would definitely like to have a retrospective about it, and there are some leftovers, such as creating Phab tasks.
Jul 25 2019
Jul 24 2019
Keeping this task open, but we can mark iteration 1 as completed, with the exception of using Envoy to proxy between Redis instances. Right now, if the Redis server goes down, the registry will go down too, because its health checks will fail.
The package was built and uploaded a long time ago.
As a result of this issue, registries in the passive DC (currently eqiad) are set to read-only mode: they accept pulls but not pushes of new images.
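In Docker Registry HTTP API v2 terms, a pull only reads content (GET/HEAD on manifests and blobs), while a push writes it (POST, PATCH and PUT for blob uploads, PUT for manifests). A minimal sketch of the rule a read-only registry applies, assuming a hypothetical allowed_in_read_only helper; it illustrates the behaviour described above, not the registry's actual implementation:

```python
# HTTP methods used by the Docker Registry HTTP API v2 for pulls
# (GET/HEAD on /v2/, manifests and blobs). Pushes use POST, PATCH and PUT
# for blob uploads and PUT for manifests; DELETE removes content.
READ_ONLY_METHODS = frozenset({"GET", "HEAD"})


def allowed_in_read_only(method: str) -> bool:
    """Return True if a request with this method is a read (pull) request."""
    return method.upper() in READ_ONLY_METHODS


for m in ("GET", "HEAD", "POST", "PATCH", "PUT", "DELETE"):
    print(m, allowed_in_read_only(m))
```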
Jul 23 2019
This has been deployed via the previously discussed DNS artifact.
The main issue is notifying changes to the deployment object department, not helmfile. AFAICT, helmfile is working as intended.
Jul 22 2019
Jul 19 2019
@cchen as stated in https://wikitech.wikimedia.org/wiki/Production_shell_access we need your public SSH key. This key shouldn't be the same one you use to access Gerrit or WMCS.
I ran a complete pull of all images and tags in our registry (results are in the attached file).
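The comment doesn't say how the image list was built. One way to enumerate every image:tag reference is the Docker Registry HTTP API v2 catalog (/v2/_catalog) and tags (/v2/<name>/tags/list) endpoints. A minimal sketch, assuming anonymous read access over HTTPS and a catalog small enough to need no pagination; list_references and fetch_json are hypothetical helpers:

```python
import json
import urllib.request
from typing import Callable, List


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body (no auth, no pagination handling)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def list_references(registry: str,
                    get: Callable[[str], dict] = fetch_json) -> List[str]:
    """Return "registry/repo:tag" for every tag in every repository."""
    refs: List[str] = []
    # /v2/_catalog lists repositories; large registries paginate this
    # (n/last query parameters and a Link header), which is ignored here.
    catalog = get(f"https://{registry}/v2/_catalog")
    for repo in catalog.get("repositories", []):
        # /v2/<name>/tags/list returns {"name": ..., "tags": [...]};
        # "tags" may be null for a repository without tags.
        tags = get(f"https://{registry}/v2/{repo}/tags/list").get("tags") or []
        refs.extend(f"{registry}/{repo}:{tag}" for tag in tags)
    return refs
```

Each returned reference can then be fed to docker pull; the get parameter makes it easy to swap in an authenticated or paginating fetcher.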
Jul 18 2019
This also fixes docker-registry.wikimedia.org/releng/composer-test-hhvm:0.2.6-s1 @Nikerabbit
I've uploaded the missing layers from a backup; it works for me now.
Jul 17 2019
@Halfak thanks for the patch
As long as @RStallman-legalteam comes back with a positive result, the clinic duty person will move this forward (this week that's me).
It seems that container synchronization is broken and the Swift container in eqiad doesn't hold the same data as the one in codfw. Swift is eventually consistent, so let's wait and see whether the sync does its job over the weekend. If it doesn't get restored, the best action plan I can think of right now is:
Jul 16 2019
After rescuing blobs from the ms-fe2005 backup, pulling images seems to be fixed. I don't see any errors doing:
Base images wikimedia-jessie and wikimedia-stretch, plus the affected production images
List of affected images
Uploaded a new image today (coredns) and rechecked the same way @fgiunchedi did; it seems to be working \o/ so resolving this issue.
Jul 12 2019
Jul 11 2019
Thanks for the audit @fgiunchedi !
Jul 10 2019
Jul 9 2019
Jul 5 2019
After further testing, it seems that in order to use helmfile we need to set some environment variables, for example:
HELM_HOME=/etc/helm KUBECONFIG=/etc/kubernetes/zotero-staging.config helmfile diff
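The same invocation can be wrapped in a script so the variables are always set. A minimal Python sketch, assuming helmfile is on PATH and that /etc/helm and the zotero-staging kubeconfig are the right values for the target environment; helmfile_env and helmfile_diff are hypothetical helpers:

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def helmfile_env(kubeconfig: str, helm_home: str = "/etc/helm",
                 base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with HELM_HOME and KUBECONFIG set."""
    env = dict(os.environ if base is None else base)
    env["HELM_HOME"] = helm_home
    env["KUBECONFIG"] = kubeconfig
    return env


def helmfile_diff(kubeconfig: str) -> int:
    """Run `helmfile diff` with HELM_HOME/KUBECONFIG set; return its exit code."""
    result = subprocess.run(["helmfile", "diff"], env=helmfile_env(kubeconfig))
    return result.returncode
```

Calling helmfile_diff("/etc/kubernetes/zotero-staging.config") runs the same command as above, without relying on the caller's shell to export the variables.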
Jul 3 2019
Pending some documentation to help people migrate, this is essentially done.
Jun 28 2019
Jun 26 2019
Jun 25 2019
Jun 21 2019
+1 to what @Joe said. There are some challenges with that approach, because some Go projects and libraries require the very latest Go version, so it could add the prerequisite of packaging golang itself to be used as a build dependency.
Jun 20 2019
works for me using python 2.7 and docker==3.7.2