Move proton to use TLS only
Closed, Resolved · Public

Description

  • Add TLS support to the deployment chart
  • Enable TLS on k8s in production
  • Add additional LVS endpoint configuration
  • Switch services to use the TLS LVS
  • Remove non-TLS LVS endpoint configuration
  • Remove the non-TLS k8s service
  • Remove proton VMs
  • Remove all proton puppet configuration related to the old, non-k8s infra

Event Timeline

Change 607536 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] proton: Switch restbase production to TLS

https://gerrit.wikimedia.org/r/607536

jcrespo triaged this task as Medium priority. Jul 8 2020, 10:44 AM

Change 607536 abandoned by Alexandros Kosiaris:
[operations/puppet@production] proton: Switch restbase production to TLS

Reason:
Done differently in https://gerrit.wikimedia.org/r/c/operations/puppet/+/610720

https://gerrit.wikimedia.org/r/607536

Change 610789 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] proton: Set LVS level OpenAPI checks on TLS

https://gerrit.wikimedia.org/r/610789

Change 610789 merged by Alexandros Kosiaris:
[operations/puppet@production] proton: Set LVS level OpenAPI checks on TLS

https://gerrit.wikimedia.org/r/610789

Change 610855 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] proton: Amend prometheus-statsd config

https://gerrit.wikimedia.org/r/610855

Change 610855 merged by jenkins-bot:
[operations/deployment-charts@master] proton: Amend prometheus-statsd config

https://gerrit.wikimedia.org/r/610855

Change 627541 had a related patch set uploaded (by JMeybohm; owner: JMeybohm):
[operations/puppet@production] lvs: Remove proton non-TLS endpoint from LVS 1/2

https://gerrit.wikimedia.org/r/627541

Change 627542 had a related patch set uploaded (by JMeybohm; owner: JMeybohm):
[operations/puppet@production] lvs: Remove proton non-TLS endpoint from LVS 2/2

https://gerrit.wikimedia.org/r/627542

Change 627857 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] proton-http: stop monitoring the endpoint

https://gerrit.wikimedia.org/r/627857

Change 627858 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] proton: remove non-https endpoint

https://gerrit.wikimedia.org/r/627858

Change 627859 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] proton: remove conftool-data

https://gerrit.wikimedia.org/r/627859

Change 627860 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] proton: remove the ganeti VMs from puppet

https://gerrit.wikimedia.org/r/627860

Change 627861 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] proton: remove all puppet code, other references to the non-k8s service

https://gerrit.wikimedia.org/r/627861

Change 627857 merged by Giuseppe Lavagetto:
[operations/puppet@production] proton-http: stop monitoring the endpoint

https://gerrit.wikimedia.org/r/627857

Change 627541 abandoned by JMeybohm:
[operations/puppet@production] lvs: Remove proton non-TLS endpoint from LVS 1/2

Reason:
Done with https://gerrit.wikimedia.org/r/c/operations/puppet/+/627857

https://gerrit.wikimedia.org/r/627541

Change 627858 merged by JMeybohm:
[operations/puppet@production] proton: remove non-https endpoint

https://gerrit.wikimedia.org/r/627858

Mentioned in SAL (#wikimedia-operations) [2020-09-22T14:09:15Z] <jayme> restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255868 T255877

Mentioned in SAL (#wikimedia-operations) [2020-09-22T14:11:21Z] <jayme> restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255868 T255877

Mentioned in SAL (#wikimedia-operations) [2020-09-22T14:12:01Z] <jayme> running ipvsadm -D -t 10.2.2.19:1970; ipvsadm -D -t 10.2.2.21:24766 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255868 T255877

Mentioned in SAL (#wikimedia-operations) [2020-09-22T14:12:40Z] <jayme> running ipvsadm -D -t 10.2.1.19:1970; ipvsadm -D -t 10.2.1.21:24766 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255868 T255877
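
For readers unfamiliar with the cleanup logged in the SAL entries above: in ipvsadm, `-D` deletes a virtual service and `-t` selects a TCP service by `address:port`. The sketch below only builds the same command lines from the addresses quoted in the log; it does not run them. The helper name `ipvsadm_delete_commands` and the `EQIAD`/`CODFW` constants are hypothetical, introduced here for illustration and not part of any Wikimedia tooling.

```python
import shlex


def ipvsadm_delete_commands(services):
    """Build one `ipvsadm -D -t <ip:port>` argv list per TCP virtual service.

    Hypothetical helper for illustration: it only constructs the commands
    and never executes them.
    """
    commands = []
    for service in services:
        host, _, port = service.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"expected ip:port, got {service!r}")
        commands.append(["ipvsadm", "-D", "-t", service])
    return commands


# Addresses copied from the SAL entries above (non-TLS proton services).
EQIAD = ["10.2.2.19:1970", "10.2.2.21:24766"]
CODFW = ["10.2.1.19:1970", "10.2.1.21:24766"]

for cmd in ipvsadm_delete_commands(EQIAD):
    print(shlex.join(cmd))
```

Keeping each command as an argument list rather than a single string avoids shell quoting problems if it is later passed to something like `subprocess.run`; actually removing a service requires root on the LVS host.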

Change 627542 abandoned by JMeybohm:
[operations/puppet@production] lvs: Remove proton non-TLS endpoint from LVS 2/2

Reason:
Done with https://gerrit.wikimedia.org/r/c/operations/puppet/+/627858

https://gerrit.wikimedia.org/r/627542

Change 627859 merged by Alexandros Kosiaris:
[operations/puppet@production] proton: remove conftool-data

https://gerrit.wikimedia.org/r/627859

JMeybohm updated the task description.

Change 627860 merged by Alexandros Kosiaris:
[operations/puppet@production] proton: remove the ganeti VMs from puppet

https://gerrit.wikimedia.org/r/627860

cookbooks.sre.hosts.decommission executed by akosiaris@cumin1001 for hosts: proton1001.eqiad.wmnet

  • proton1001.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox
  • COMMON_STEPS (FAIL)
    • Failed to run the sre.dns.netbox cookbook: Cumin execution failed (exit_code=2)
    • Not all affected DC(s) have been migrated to automatic DNS, a manual patch to the operations/dns repository is required

ERROR: some step on some host failed, check the bolded items above

Change 627861 merged by Alexandros Kosiaris:
[operations/puppet@production] proton: remove all puppet code, other references to the non-k8s service

https://gerrit.wikimedia.org/r/627861

cookbooks.sre.hosts.decommission executed by akosiaris@cumin1001 for hosts: proton1002.eqiad.wmnet

  • proton1002.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox
  • COMMON_STEPS (FAIL)
    • Failed to run the sre.dns.netbox cookbook: Cumin execution failed (exit_code=2)
    • Not all affected DC(s) have been migrated to automatic DNS, a manual patch to the operations/dns repository is required

ERROR: some step on some host failed, check the bolded items above

Change 631398 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/dns@master] Remove proton{1,2}00{1,2}

https://gerrit.wikimedia.org/r/631398

cookbooks.sre.hosts.decommission executed by akosiaris@cumin1001 for hosts: proton2001.codfw.wmnet

  • proton2001.codfw.wmnet (PASS)
    • Downtimed host on Icinga
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.codfw.wmnet to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.codfw.wmnet to Netbox
  • COMMON_STEPS (WARN)
    • Not all affected DC(s) have been migrated to automatic DNS, a manual patch to the operations/dns repository is required

cookbooks.sre.hosts.decommission executed by akosiaris@cumin1001 for hosts: proton2002.codfw.wmnet

  • proton2002.codfw.wmnet (PASS)
    • Downtimed host on Icinga
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.codfw.wmnet to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.codfw.wmnet to Netbox
  • COMMON_STEPS (WARN)
    • Not all affected DC(s) have been migrated to automatic DNS, a manual patch to the operations/dns repository is required

Change 631398 merged by Alexandros Kosiaris:
[operations/dns@master] Remove proton{1,2}00{1,2}

https://gerrit.wikimedia.org/r/631398

akosiaris updated the task description.

All old stuff has been removed; I'll resolve this.