
Migrate planet servers to bullseye or bookworm
Closed, ResolvedPublic

Event Timeline

Change 964176 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/puppet@production] WIP: planet: Update for rawdog v3 on bookworm

https://gerrit.wikimedia.org/r/964176

LSobanski triaged this task as Medium priority. Oct 9 2023, 6:06 AM
LSobanski raised the priority of this task from Medium to High.
LSobanski moved this task from Incoming to Backlog on the collaboration-services board.

Change 964176 merged by Dzahn:

[operations/puppet@production] planet: Update for rawdog v3 on bookworm

https://gerrit.wikimedia.org/r/964176

Puppet code started by Legoktm has been amended and merged.

Debian package has been built and added to WMF apt repo.

T281219#9346444

Change 976854 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site: add planet[12]003 to role

https://gerrit.wikimedia.org/r/976854

Change 976854 merged by Dzahn:

[operations/puppet@production] site: add planet[12]003 to planet role

https://gerrit.wikimedia.org/r/976854

@Legoktm I set up new VMs in production on bookworm, applied the updated Puppet role, and built and installed your package.

The issue I ran into when running an update was:

NameError: name 'load_plugins' is not defined

plugindirs is a configuration directive

elif l[0] == "plugindirs":
    for dir in parse_list(l[1]):
        load_plugins(dir, self)

Things work if I remove the "plugindirs" config line from the global config file at /etc/rawdog/config. I am doing that in Puppet.
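The failure mode can be reproduced in miniature: the config handler still dispatches on `plugindirs`, but the function it calls is not defined in the installed module. The sketch below is a hypothetical stand-alone handler, not rawdog's actual code; `apply_directive`, its `load_plugins` parameter, and the example path are illustrative assumptions. It shows one defensive option, which is to skip the directive with a warning when no plugin loader is available rather than raise a NameError.

```python
# Hypothetical, simplified config handling; not rawdog's real implementation.
from typing import Callable, List, Optional


def parse_list(value: str) -> List[str]:
    """Split a whitespace-separated config value into items."""
    return value.split()


def apply_directive(
    key: str,
    value: str,
    load_plugins: Optional[Callable[[str], None]] = None,
) -> List[str]:
    """Handle one config line and return any warnings produced."""
    warnings: List[str] = []
    if key == "plugindirs":
        if load_plugins is None:
            # No plugin loader available: ignore the directive instead of crashing.
            warnings.append("plugindirs ignored: plugin loading is unavailable")
        else:
            for directory in parse_list(value):
                load_plugins(directory)
    return warnings


# The path below is a made-up example value, not taken from the real config.
print(apply_directive("plugindirs", "/example/plugins"))
```

Removing the directive from the config, as done here in Puppet, avoids the code path entirely and needs no change to the package.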

Dzahn changed the task status from Open to In Progress. Nov 27 2023, 10:39 PM

Change 977796 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] planet: only use plugindirs config setting on buster

https://gerrit.wikimedia.org/r/977796

Change 977796 merged by Dzahn:

[operations/puppet@production] planet: only use plugindirs config setting on buster

https://gerrit.wikimedia.org/r/977796

The issue above is fixed. Test updates, for example with the French planet feeds, now fail with:

Nov 27 23:29:54 planet1003 rawdog[39961]:     rc = feed.update(self, now, config, articles, content)
Nov 27 23:29:54 planet1003 rawdog[39961]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Nov 27 23:29:54 planet1003 rawdog[39961]:   File "/usr/lib/python3/dist-packages/rawdog/rawdog.py", line 447, in update
Nov 27 23:29:54 planet1003 rawdog[39961]:     if len(responses) > 0:
Nov 27 23:29:54 planet1003 rawdog[39961]:        ^^^^^^^^^^^^^^
Nov 27 23:29:54 planet1003 rawdog[39961]: TypeError: object of type 'NoneType' has no len()
Nov 27 23:29:54 planet1003 systemd[1]: planet-update-fr.service: Main process exited, code=exited, status=1/FAILURE
Nov 27 23:29:54 planet1003 systemd[1]: planet-update-fr.service: Failed with result 'exit-code'.
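The traceback shows `len()` being called on a value that is `None`. Below is a minimal sketch of the kind of guard that avoids this, assuming `responses` can be `None` when a fetch yields nothing. The helper name `has_responses` is hypothetical, and this is not necessarily the change made in the real patch.

```python
from typing import Optional, Sequence


def has_responses(responses: Optional[Sequence[object]]) -> bool:
    """Return True only when responses is a non-empty sequence."""
    # None and empty sequences are both treated as "nothing to process".
    return responses is not None and len(responses) > 0


print(has_responses(None))     # False
print(has_responses([]))       # False
print(has_responses(["200"]))  # True
```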

Change 979169 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] planet: add ensure parameter allowing to disable update jobs

https://gerrit.wikimedia.org/r/979169

Change 979169 merged by Dzahn:

[operations/puppet@production] planet: add ensure parameter allowing to disable update jobs

https://gerrit.wikimedia.org/r/979169

@Legoktm I think I fixed it. I applied a patch and built version 3.0.2, and now I can update feeds.

I created a merge request at: https://gitlab.wikimedia.org/legoktm/rawdog/-/merge_requests/2

Change 981623 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] planet: enable update timers in planet1003

https://gerrit.wikimedia.org/r/981623

Change 981623 merged by Dzahn:

[operations/puppet@production] planet: enable update timers in planet1003

https://gerrit.wikimedia.org/r/981623

Change 982156 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/dns@master] Switch planet to bookworm VM backends

https://gerrit.wikimedia.org/r/982156

Change 982157 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site: remove buster VMs from planet regex

https://gerrit.wikimedia.org/r/982157

Change 982484 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] planet: enable feed updates on both new VMs

https://gerrit.wikimedia.org/r/982484

After some more debugging, fixing privileges on some state files, and removing a broken feed from the cs language version, our httpbb tests now pass on both the old and the new machines:

[deploy1002:~] $ httpbb /srv/deployment/httpbb-tests/planet/test_planet.yaml --hosts=planet1002.eqiad.wmnet
Sending to planet1002.eqiad.wmnet...
PASS: 19 requests sent to planet1002.eqiad.wmnet. All assertions passed.

[deploy1002:~] $ httpbb /srv/deployment/httpbb-tests/planet/test_planet.yaml --hosts=planet1003.eqiad.wmnet
Sending to planet1003.eqiad.wmnet...
PASS: 19 requests sent to planet1003.eqiad.wmnet. All assertions passed.

Looks like we can switch :)
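The comparison of the two runs above could also be scripted as a pre-switch gate. The sketch below parses only the PASS summary line format shown in the output above; the function names `passed_hosts` and `safe_to_switch` are hypothetical, and any host without a PASS line is simply treated as not passing, since the format of failure output is not shown here.

```python
import re
from typing import Dict

# Matches only the PASS summary line format seen in the output above.
PASS_LINE = re.compile(
    r"^PASS: (\d+) requests sent to (\S+?)\. All assertions passed\.$"
)


def passed_hosts(output: str) -> Dict[str, int]:
    """Map each host that has a PASS summary line to its request count."""
    results: Dict[str, int] = {}
    for line in output.splitlines():
        match = PASS_LINE.match(line.strip())
        if match:
            results[match.group(2)] = int(match.group(1))
    return results


def safe_to_switch(output: str, old: str, new: str) -> bool:
    """True when both hosts passed and ran the same number of requests."""
    results = passed_hosts(output)
    return old in results and results.get(old) == results.get(new)


SAMPLE = """\
Sending to planet1002.eqiad.wmnet...
PASS: 19 requests sent to planet1002.eqiad.wmnet. All assertions passed.
Sending to planet1003.eqiad.wmnet...
PASS: 19 requests sent to planet1003.eqiad.wmnet. All assertions passed.
"""

print(safe_to_switch(SAMPLE, "planet1002.eqiad.wmnet", "planet1003.eqiad.wmnet"))
```

Requiring equal request counts guards against a run that passed only because part of the test file was skipped on one host.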

Change 982156 merged by Dzahn:

[operations/dns@master] Switch planet to bookworm VM backends

https://gerrit.wikimedia.org/r/982156

Change 982484 merged by Dzahn:

[operations/puppet@production] planet: enable feed updates on both new VMs

https://gerrit.wikimedia.org/r/982484

Mentioned in SAL (#wikimedia-operations) [2023-12-12T22:43:52Z] <mutante> planet2003 -manually upgrade rawdog package to 3.0.2 T348392

Change 982489 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/dns@master] planet: switch to new eqiad backend

https://gerrit.wikimedia.org/r/982489

Change 982489 merged by Dzahn:

[operations/dns@master] planet: switch to new eqiad backend

https://gerrit.wikimedia.org/r/982489

We are now serving all lang versions from a bookworm VM in eqiad.

planet.discovery.wmnet is an alias for planet1003.eqiad.wmnet.
root@planet1003:/# tail -f /var/log/apache2/*.log

I can't help but notice that en.planet.wikimedia.org looks a little broken right now (it has posts from Sumana from long ago). Is that possibly related to this change, or is the timing just coincidental?

That's indeed a bit weird, but she also notes in a banner at the top that these articles were published years ago.

I commented out Sumana's feed in the config and ran the update service. The en.planet start page shows more recent posts now.

(Also fixed a permission issue on the rss20.xml files.)

cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: planet1002.eqiad.wmnet

  • planet1002.eqiad.wmnet (FAIL)
    • Downtimed host on Icinga/Alertmanager
    • Host steps raised exception: Error while performing request to RAPI

ERROR: some step on some host failed, check the bolded items above

cookbooks.sre.hosts.decommission executed by dzahn@cumin2002 for hosts: planet1002.eqiad.wmnet

  • planet1002.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster eqiad to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster eqiad to Netbox

Change 983207 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] planet: remove support for buster

https://gerrit.wikimedia.org/r/983207

cookbooks.sre.hosts.decommission executed by dzahn@cumin2002 for hosts: planet2002.codfw.wmnet

  • planet2002.codfw.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox

Change 982157 merged by Dzahn:

[operations/puppet@production] site: remove buster VMs from planet regex

https://gerrit.wikimedia.org/r/982157

This is done! Planet is now hosted on new bookworm VMs and the buster VMs have been deleted.

Change 983207 merged by Dzahn:

[operations/puppet@production] planet: remove support for buster

https://gerrit.wikimedia.org/r/983207