
Find a replacement for RSS aggregator for planet.wikimedia.org
Closed, Resolved, Public

Description

Some early tests in https://phabricator.wikimedia.org/T280989#7030571 have shown that the RSS aggregator we're currently using for planet.wikimedia.org is no longer developed and was never ported to Python 3. As such, we eventually need to find a replacement.

Details

Related Changes in Gerrit:
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Port to Python 3 and other improvements | legoktm/rawdog!1 | legoktm | py3 | main

Event Timeline

On IRC you mentioned a GitHub link that seemed promising.

Thanks! Will look at that (later).

Change 898982 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] planet: if on bullseye, install package contents via puppet

https://gerrit.wikimedia.org/r/898982

I very much appreciate and depend upon the planet service so happy to spend some time working on it if it would be useful...

https://github.com/rubys/venus/issues/37 points to https://github.com/feedreader/pluto which is written in Ruby and still seems to be actively maintained.

pluto hasn't had a commit since Oct 2020 :(

ptlink last had a commit 7 months ago (https://salsa.debian.org/wouter/ptlink), but knowing the author it's probably stable enough, though it's not packaged in Debian AFAICT. It is Perl, which (in my impression) is less preferred at Wikimedia, but not a dealbreaker.

I skimmed the buster version of the rawdog code; I'd estimate it would take about a week to port to Python 3 if we could rip out features Wikimedia doesn't use. I'd also be willing to write a new planet that addresses stuff like T207244 if I'm allowed to do it in Rust.

> I'd also be willing to write a new planet that addresses stuff like T207244 if I'm allowed to do it in Rust.

That would be fantastic :-) If it only uses rustc and crates found in bookworm, that would be perfect, but given the static ELF nature of Rust binaries it's also not a problem to build on sid and then import (that's also what we do with various Go codebases at the moment).

Change 898982 abandoned by Dzahn:

[operations/puppet@production] planet: if on bullseye, install package contents via puppet

Reason:

https://gerrit.wikimedia.org/r/898982

https://gitlab.wikimedia.org/legoktm/planet is what I have so far: the basic structure is in place and it works, but it's really rough. The README outlines the remaining work in rough priority order.

Dzahn lowered the priority of this task from Medium to Low. Jul 11 2023, 5:23 PM
Dzahn edited projects, added collaboration-services; removed SRE.

This will be revisited at the end of October.

Then we will decide whether to use Legoktm's solution or the solution used by Debian; if for some reason neither is possible, we will have to shut it down.

One of these options should then be completed by the end of 2023.

Dzahn removed Dzahn as the assignee of this task. Jul 11 2023, 5:24 PM

I don't have as much time as I'd like before the October deadline, so I ended up spending ~2 hours porting rawdog to Python 3.11, and it's mostly there. There's still a bit more to do (I need to fix the tidy integration and move the rss plugin in-tree), plus some other improvements, like using a TOML file for configuration and maybe jinja2 templates. That'll be done in time, and then we can incrementally rewrite it in Rust at our leisure :)

There are still some TODOs left in https://gitlab.wikimedia.org/legoktm/rawdog, but in my limited local testing it seems to work. I'll do another pass in a few days (I also want to add some Python CI tools) before uploading to apt.wikimedia.org. I also pushed up a puppet patch that should make all the necessary changes (but that too needs more work).

I amended Legoktm's puppet change in Gerrit so that it was a no-op on existing buster servers while adding support for the new version. Once I had confirmed the no-op in production, I merged it.

The next step is to get the package built.

Then we can create new VMs on bookworm and try things out there.

I pulled from the GitLab repo onto build2001 and was able to successfully build the package.

Then I uploaded all files to apt1001, edited the .changes file to use "bookworm" rather than "unstable" as the distribution name, and imported the package into bookworm-wikimedia with reprepro.
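
The `.changes` edit can be scripted. A minimal sketch, assuming a plain-text `.changes` file; the sample below is trimmed and illustrative (only the package name and version 3.0.0 come from this task). It rewrites only the unindented `Distribution:` field and leaves the indented changelog excerpt under `Changes:` alone. If the file is GPG-signed, any edit invalidates the signature.

```python
import re

# Trimmed, illustrative .changes content; real files carry more fields.
SAMPLE_CHANGES = """Format: 1.8
Source: rawdog
Version: 3.0.0
Distribution: unstable
Changes:
 rawdog (3.0.0) unstable; urgency=medium
"""

# Matches only an unindented "Distribution:" line, so continuation lines
# of the Changes field (which start with a space) are never touched.
DIST_FIELD = re.compile(r"^Distribution:.*$", re.MULTILINE)


def set_distribution(changes_text: str, new_dist: str) -> str:
    """Return changes_text with its Distribution field set to new_dist."""
    new_text, count = DIST_FIELD.subn(
        lambda _m: f"Distribution: {new_dist}", changes_text, count=1
    )
    if count != 1:
        raise ValueError("no Distribution: field found")
    return new_text


if __name__ == "__main__":
    print(set_distribution(SAMPLE_CHANGES, "bookworm"))
```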

[apt1001:~] $ sudo -E reprepro ls rawdog
rawdog | 3.0.0 | bookworm-wikimedia | amd64, source
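
To check the import from a script rather than by eye, the `reprepro ls` line above can be split on `|`. This sketch assumes only the four-column layout visible in that output (package, version, codename, comma-separated architectures); the type and function names are hypothetical.

```python
from typing import NamedTuple


class RepoEntry(NamedTuple):
    package: str
    version: str
    codename: str
    architectures: tuple[str, ...]


def parse_reprepro_ls(output: str) -> list[RepoEntry]:
    """Parse 'package | version | codename | arch, arch' lines."""
    entries = []
    for line in output.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 4:
            continue  # skip blank or unrelated lines
        package, version, codename, archs = parts
        arch_list = tuple(a.strip() for a in archs.split(",") if a.strip())
        entries.append(RepoEntry(package, version, codename, arch_list))
    return entries


if __name__ == "__main__":
    sample = "rawdog | 3.0.0 | bookworm-wikimedia | amd64, source"
    print(parse_reprepro_ls(sample))
```

Fed the captured output of `reprepro ls rawdog` (for example via `subprocess.run(..., capture_output=True, text=True)`), this would let a check assert that the expected version landed in the expected codename.
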

Dzahn raised the priority of this task from Low to Medium. Nov 20 2023, 7:34 PM

Meanwhile, we have 2 new bookworm VMs with the planet puppet role applied and the new package installed.

I had to make a small change to remove the "plugindir" config line when not on buster. Details in T348392.
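
The actual change was made in Puppet (see T348392). Purely as an illustration of the rule "keep `plugindir` only on buster", here it is in Python with a hypothetical function name and made-up sample lines (only the directive name `plugindir` comes from this task):

```python
def filter_config(lines: list[str], os_codename: str) -> list[str]:
    """Keep a 'plugindir' directive only when the host runs buster."""
    if os_codename == "buster":
        return list(lines)
    # Compare the first whitespace-separated token so that a line such as
    # "plugindir /some/path" is dropped but other directives are kept.
    return [line for line in lines if line.split()[:1] != ["plugindir"]]


# Illustrative config lines; paths and the second directive are hypothetical.
SAMPLE = ["plugindir /hypothetical/plugins", "maxarticles 200", ""]

if __name__ == "__main__":
    print(filter_config(SAMPLE, "bookworm"))  # plugindir line dropped
    print(filter_config(SAMPLE, "buster"))    # returned unchanged
```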

After that I could run a test feed update, but ran into this:

Nov 27 23:29:54 planet1003 rawdog[39961]:     rc = feed.update(self, now, config, articles, content)
Nov 27 23:29:54 planet1003 rawdog[39961]:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Nov 27 23:29:54 planet1003 rawdog[39961]:   File "/usr/lib/python3/dist-packages/rawdog/rawdog.py", line 447, in update
Nov 27 23:29:54 planet1003 rawdog[39961]:     if len(responses) > 0:
Nov 27 23:29:54 planet1003 rawdog[39961]:        ^^^^^^^^^^^^^^
Nov 27 23:29:54 planet1003 rawdog[39961]: TypeError: object of type 'NoneType' has no len()
Nov 27 23:29:54 planet1003 systemd[1]: planet-update-fr.service: Main process exited, code=exited, status=1/FAILURE
Nov 27 23:29:54 planet1003 systemd[1]: planet-update-fr.service: Failed with result 'exit-code'.

^ I fixed this error and built version 3.0.2 of the package.
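
The traceback shows `len()` being called on `None` at line 447 of `rawdog.py`. The exact fix in 3.0.2 isn't described here (the merge request has it); one common way to guard such a check, sketched with a hypothetical helper name, is:

```python
from typing import Any, Optional


def has_responses(responses: Optional[list[Any]]) -> bool:
    """True if at least one response came back; None counts as none."""
    # `if len(responses) > 0:` raises TypeError when responses is None.
    # Truthiness treats None and an empty list the same way.
    return bool(responses)


if __name__ == "__main__":
    try:
        len(None)
    except TypeError as exc:
        print(exc)  # object of type 'NoneType' has no len()
    print(has_responses(None), has_responses([]), has_responses(["200 OK"]))
```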

I can now successfully update feeds, so this should be all done.

Merge request is at https://gitlab.wikimedia.org/legoktm/rawdog/-/merge_requests/2

Optimistically closing this as resolved.

Also it's now a duplicate of T348392 :) yay

We found a replacement. Implementing it is also tracked in T348392.