Description
etherpad.wikimedia.org (etherpad1001) is currently running Debian jessie.
Details
Status | Assigned | Task
---|---|---
Resolved | MoritzMuehlenhoff | T224549 Track remaining jessie systems in production
Resolved | Dzahn | T224580 Migrate etherpad1001 to Buster
Resolved | Dzahn | T243475 vm request for etherpad1002
Event Timeline
The following packages are used by the puppet role but so far missing on buster:
- prometheus-etherpad-exporter
- etherpad-lite
Change 566467 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/debs/prometheus-etherpad-exporter@master] Rebuild for Buster T224580
Change 566467 merged by Muehlenhoff:
[operations/debs/prometheus-etherpad-exporter@master] Rebuild for Buster T224580
Mentioned in SAL (#wikimedia-operations) [2020-01-22T08:45:12Z] <moritzm> upload prometheus-etherpad-exporter 0.2 to buster-wikimedia T224580
Change 566517 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/debs/etherpad-lite@master] Rebuild for buster
Change 566517 merged by Alexandros Kosiaris:
[operations/debs/etherpad-lite@master] Rebuild for buster
Mentioned in SAL (#wikimedia-operations) [2020-01-22T14:18:54Z] <akosiaris> upload etherpad-lite_1.7.5-3 to apt.wikimedia.org buster-wikimedia/main T224580
Change 566628 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add IP for etherpad1002
Change 566629 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add etherpad-new.wikimedia.org
Change 566631 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: add etherpad1002 to netboot/partman
Change 566634 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: add etherpad role to etherpad1002
Change 566635 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: remove etherpad1001
Change 566636 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] trafficserver/cache: add etherpad-new -> etherpad1002
Change 566637 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] trafficserver/cache: switch backend for etherpad to etherpad1002
Change 566638 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] remove etherpad-new.wikimedia.org
Change 566631 merged by Dzahn:
[operations/puppet@production] install_server: add etherpad1002 to netboot/partman
Change 566629 merged by Dzahn:
[operations/dns@master] add etherpad-new.wikimedia.org
Change 566634 merged by Dzahn:
[operations/puppet@production] site: add etherpad role to etherpad1002
Change 566899 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] ssl: update TLS cert for etherpad, added etherpad1002
Change 566899 merged by Dzahn:
[operations/puppet@production] ssl: update TLS cert for etherpad, add etherpad1002, etherpad-new
Change 566906 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] switch discovery record for etherpad from 1001 to 1002
Change 566636 merged by Dzahn:
[operations/puppet@production] trafficserver/cache: add etherpad-new -> etherpad1002
@akosiaris @Muehlenhoff Here it is on buster as "etherpad-new". https://etherpad-new.wikimedia.org/p/aXjrQTK8PD6bjj9TqK4Q
Hm, etherpad keeps a lot of data in memory (because of ueberDB, see https://github.com/ether/etherpad-lite/issues/2826) before flushing it to the database; it's not written to be scaled out. With two instances serving the same pads from one database, each can overwrite the other's writes, so people using this URL might end up causing database corruption. Since this seems to work, it's probably best to finish the migration as soon as possible and kill this public URL.
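To illustrate why two instances in front of one database are risky, here is a minimal Python sketch of a generic write-back cache. It assumes each instance loads a record once, buffers edits in memory, and later flushes its whole cached value. `WriteBackCache`, `get`, `set`, and `flush` are hypothetical names; this is a simplified model, not ueberDB's actual API or flush logic, and real pads span several records, so real-world effects may be messier than one lost edit.

```python
class WriteBackCache:
    """Hypothetical in-memory write-back cache in front of a shared dict 'database'."""

    def __init__(self, db):
        self.db = db
        self.cache = {}     # key -> value held in this instance's memory
        self.dirty = set()  # keys modified since the last flush

    def get(self, key):
        # Load from the database only on a cache miss; afterwards this
        # instance never sees changes written by other instances.
        if key not in self.cache:
            self.cache[key] = self.db.get(key, "")
        return self.cache[key]

    def set(self, key, value):
        # Buffer the write in memory only.
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Write every dirty key back, blindly overwriting what is stored.
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()


db = {"pad:example": "hello"}
old = WriteBackCache(db)  # stands in for the instance behind the old URL
new = WriteBackCache(db)  # stands in for the instance behind the new URL

# Both instances load the same pad, then users edit it on each one.
old.set("pad:example", old.get("pad:example") + " from old")
new.set("pad:example", new.get("pad:example") + " from new")

old.flush()
new.flush()  # the last flush wins; the edit made via `old` is silently lost
print(db["pad:example"])  # hello from new
```

Neither instance is buggy on its own; the lost update comes purely from each one trusting its private in-memory copy, which is why stopping the old instance before switching over avoids the problem.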
Change 566638 merged by Alexandros Kosiaris:
[operations/dns@master] remove etherpad-new.wikimedia.org
Mentioned in SAL (#wikimedia-operations) [2020-01-24T09:29:43Z] <akosiaris> disable and mask etherpad-lite on etherpad1002 to avoid corruption issues. T224580
I've removed the DNS record and stopped and masked the service on etherpad1002 for now. Since we proved it works, let's just move over to etherpad1002.eqiad.wmnet, stopping etherpad1001 beforehand (to avoid the issues I alluded to). etherpad is best effort anyway, so even an extended downtime is OK.
Pads that, per the logs, have been accessed on https://etherpad-new.wikimedia.org:
- 90D1o-quuUNWqCrt0CIV
- WMCS-2019-06-25
- WMCS-2020-01-22
- WMCS-2020-02-04
- WMCS-2020-02-05
- aXjrQTK8PD6bjj9TqK4Q
If any of those were also accessed on https://etherpad.wikimedia.org at the same time, their content is undefined.
Change 566637 merged by Alexandros Kosiaris:
[operations/puppet@production] trafficserver/cache: switch backend for etherpad to etherpad1002
Change 566635 merged by Alexandros Kosiaris:
[operations/puppet@production] site: remove etherpad1001
@Dzahn, I've merged the remaining changes required to complete the migration. etherpad.wikimedia.org now uses etherpad1002. I checked a couple of pads and everything seems fine; hopefully we have no corruption issues. etherpad1001 is now removed from site.pp, and I've removed the etherpad-lite Debian package from it. I've also -2ed the discovery record change because of the issue above about the software not supporting scaling out. I guess what's left is to decommission and delete that VM.
Change 566906 abandoned by Dzahn:
switch discovery record for etherpad from 1001 to 1002
Change 567113 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: remove etherpad1001 from DHCP
Change 567113 merged by Dzahn:
[operations/puppet@production] install_server: remove etherpad1001 from DHCP
Change 567115 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] remove etherpad1001.eqiad.wmnet
cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: etherpad1001.eqiad.wmnet
- etherpad1001.eqiad.wmnet (FAIL)
- Downtimed host on Icinga
- No management interface found (likely a VM)
- Wiped bootloaders
- Shutdown issued. Verify it manually, verification not yet supported
- Set Netbox status on VM not yet supported: manual intervention required
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
ERROR: some step on some host failed, check the bolded items above
Mentioned in SAL (#wikimedia-operations) [2020-01-24T22:21:51Z] <mutante> shutting down etherpad1001 - service fully migrated to etherpad1002 - running decom cookbook on ganeti VM (T224580)
Mentioned in SAL (#wikimedia-operations) [2020-01-24T22:31:44Z] <mutante> ganeti1003 - sudo gnt-instance remove etherpad1001.eqiad.wmnet (T224580)
Change 567115 merged by Dzahn:
[operations/dns@master] remove etherpad1001.eqiad.wmnet
@akosiaris Excellent! Thanks for all that.
I shouldn't have merged the varnish change that actually enabled etherpad-new.wm.org before letting you review it, though. As you say, it is best effort, and I'm glad we could keep all the existing pads; hopefully there is no corruption.
I finished the decom of the VM and removed it from DHCP and DNS after running gnt-instance remove.