Reimage thorium to Debian Stretch. This will require careful planning, since most of our websites will be unavailable during the reimage.
Description
Details
Status | Assigned | Task
---|---|---
Resolved | elukey | T192642 Upgrade Analytics infrastructure to Debian Stretch
Resolved | Ottomata | T192641 Reimage thorium to Debian Stretch
Resolved | Ottomata | T202011 Move internal sites hosted on thorium to ganeti instance(s)
Resolved | Dzahn | T202013 eqiad: (3) VM request for internal analytics web sites
Resolved | akosiaris | T202559 Allow ganeti instance inside of the Analytics VLAN; move analytics-tool* to it and change IPs.
Event Timeline
Thinking out loud :)
During the last offsite we wondered whether our websites could be served by more than one host, so that we could tolerate failures. As far as I can remember, everything on thorium is stateless, so we could think about:
- replacing it with two ganeti instances outside the Analytics VLAN, with an LVS endpoint in front of them (which Varnish would call).
- repurposing thorium for other needs, such as a standby database for analytics1003.
If these ideas are too crazy I'll shut up :)
Hmm, good idea in general! The only issue is https://analytics.wikimedia.org/datasets/ and wikistats 1.0, both of which need a lot of space.
Good points. In theory wikistats 1.0 should go away soon, right? The datasets are indeed a problem; I'll try to think of a solution :)
I don't think wikistats 1.0 will ever go away, will it? Erik might stop updating it, but I think it will stay online forever.
Luca and I just discussed this and decided that we should upgrade thorium to Stretch anyway, and think about moving the sites elsewhere later.
Change 458174 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Temporarily removing thorium from netboot.cfg
Change 458174 merged by Ottomata:
[operations/puppet@production] Temporarily removing thorium from netboot.cfg
Mentioned in SAL (#wikimedia-analytics) [2018-09-05T13:40:28Z] <ottomata> reimaging thorium to debian stretch (this will cause an announced {stats,analytics}.wm.org downtime!) - T192641
Mentioned in SAL (#wikimedia-operations) [2018-09-05T13:40:35Z] <ottomata> reimaging thorium to debian stretch (this will cause an announced {stats,analytics}.wm.org downtime!) - T192641
Script wmf-auto-reimage was launched by otto on neodymium.eqiad.wmnet for hosts:
thorium.eqiad.wmnet
The log can be found in /var/log/wmf-auto-reimage/201809051340_otto_10146_thorium_eqiad_wmnet.log.
Change 458176 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use stretch for thorium
Change 458176 merged by Ottomata:
[operations/puppet@production] Use stretch for thorium
Completed auto-reimage of hosts:
['thorium.eqiad.wmnet']
Of which those FAILED:
['thorium.eqiad.wmnet']
Script wmf-auto-reimage was launched by otto on neodymium.eqiad.wmnet for hosts:
thorium.eqiad.wmnet
The log can be found in /var/log/wmf-auto-reimage/201809051351_otto_12066_thorium_eqiad_wmnet.log.
Completed auto-reimage of hosts:
['thorium.eqiad.wmnet']
Of which those FAILED:
['thorium.eqiad.wmnet']
Script wmf-auto-reimage was launched by otto on neodymium.eqiad.wmnet for hosts:
thorium.eqiad.wmnet
The log can be found in /var/log/wmf-auto-reimage/201809051352_otto_12248_thorium_eqiad_wmnet.log.
There is cronspam from: Cron <root@stat1006> /usr/local/bin/published-datasets-sync -q
rsync: stat "/published-datasets-rsynced/stat1006/archive/public-datasets/all/cross_wiki/.editor_month.tsv.gz.gkVHiO" (in srv) failed: No such file or directory (2)
and earlier it included:
rsync: failed to connect to thorium.eqiad.wmnet (10.64.53.26): No route to host (113)
That made me search for thorium in Phab and end up here.
Cron <root@thorium> /usr/local/bin/hardsync -t /srv /srv/published-datasets-rsynced/* /srv/analytics.wikimedia.org/datasets 2>&1 > /dev/null
cp: cannot stat '/srv/published-datasets-rsynced/stat1006/periodic/reports/metrics/page-creation/pagecreations_main_bots/.mrwiktionary.tsv.b028qZ': No such file or directory