
stewards1001 / stewards2001: automatically subscribe stewards to mailman lists (was: Enable API access for Mailman3)
Open, Stalled, Low, Public

Description

The stewards* production virtual machines were created (T344164) to facilitate onboarding stewards to, and offboarding them from, their roles; see the parent task. One of the major bottlenecks when onboarding or offboarding stewards is granting or revoking access to a number of mailing lists, such as:

  • stewards-l
  • global-sysops
  • global-renamers
  • checkuser-l
  • stewards-usergroup

I would like to wire membership management for those lists into the tool I'm working on. However, Mailman3 doesn't have an easily accessible API. According to a discussion I had with @Ladsgroup about Mailman's API, the API exists, but it is restricted to the production network and requires a secret that gives the client access to everything in Mailman.

Since the onboarding tool for stewards resides in production (at the moment, stewards1001 and stewards2001), I believe it might be possible to provision the secret to those machines and use the API to manage those lists automatically. Is that a correct assumption? Do we need to do anything else to make that possible? Is there a different way to automate list membership management in this case?
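For context, here is a minimal sketch of what reading a list's membership through the Mailman 3 core REST API could look like from a production host. It assumes Mailman core's default REST location (`http://localhost:8001/3.1`), HTTP basic auth with the REST admin credentials (the "secret" mentioned above), and the `lists/<list_id>/roster/member` resource. The host, the credentials, and the helper names are placeholders, not the actual production setup.

```python
import base64
import json
import urllib.request

# Assumption: Mailman core's default REST location; production values will differ.
DEFAULT_BASE_URL = "http://localhost:8001/3.1"


def list_id_from_address(fqdn_listname: str) -> str:
    """Convert 'stewards-l@lists.wikimedia.org' to Mailman's list_id form
    'stewards-l.lists.wikimedia.org'."""
    local, _, domain = fqdn_listname.partition("@")
    return f"{local}.{domain}"


def build_roster_request(fqdn_listname, user, password, base_url=DEFAULT_BASE_URL):
    """Build (but do not send) a GET request for a list's member roster."""
    url = f"{base_url}/lists/{list_id_from_address(fqdn_listname)}/roster/member"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_member_emails(request) -> set[str]:
    """Send the request and return the subscribed addresses (needs network access)."""
    with urllib.request.urlopen(request) as resp:
        data = json.load(resp)
    # An empty roster may omit the "entries" key, hence the default.
    return {entry["email"].lower() for entry in data.get("entries", [])}
```

Because the REST credentials grant access to every list, anything built this way would need the secret to stay on the production hosts, which is exactly the concern raised above.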

Details

Related Changes in Gerrit:
Repo               Branch      Lines +/-
operations/puppet  production  +5 -3
operations/puppet  production  +1 -1
operations/puppet  production  +3 -3
operations/puppet  production  +16 -2
operations/puppet  production  +18 -1
operations/puppet  production  +6 -0
operations/puppet  production  +62 -13
operations/puppet  production  +1 -1
operations/puppet  production  +1 -1
operations/puppet  production  +14 -1
operations/puppet  production  +1 -1
operations/puppet  production  +2 -2
operations/puppet  production  +2 -2
operations/puppet  production  +7 -7
operations/puppet  production  +15 -0
operations/puppet  production  +4 -1
operations/puppet  production  +17 -0
operations/puppet  production  +11 -0
operations/puppet  production  +5 -0
operations/puppet  production  +30 -2
operations/puppet  production  +2 -1
Related Changes in GitLab:
Title: roles: remove stewards-usergroup@lists.wikimedia.org from the steward role
Reference: repos/stewards/onboarding-system!1; Author: jjmc89; Source branch: steward-remove-wmsug; Dest branch: main

Event Timeline


Change #1022193 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] lists: start a class for automating certain subscriptions

https://gerrit.wikimedia.org/r/1022193

Change #1023505 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/puppet@production] stewards-onboarder: Add mediawiki_api to the config

https://gerrit.wikimedia.org/r/1023505

Change #1023505 merged by Dzahn:

[operations/puppet@production] stewards-onboarder: Add mediawiki_api to the config

https://gerrit.wikimedia.org/r/1023505

Change #1022193 merged by Dzahn:

[operations/puppet@production] lists: start a class for automating certain subscriptions

https://gerrit.wikimedia.org/r/1022193

Change #1031565 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] stewards: add rsync server, let lists primary host pull data

https://gerrit.wikimedia.org/r/1031565

Change #1031565 merged by Dzahn:

[operations/puppet@production] stewards: add rsync server, let lists primary host pull data

https://gerrit.wikimedia.org/r/1031565

We now have an rsync server running on both steward machines that allows the primary list server to pull data from /srv/exports.

Change #1032844 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] stewards: make rsync server listen on IPv6 as well, not just 0.0.0.0

https://gerrit.wikimedia.org/r/1032844

Change #1032844 merged by Dzahn:

[operations/puppet@production] stewards: make rsync server listen on IPv6 as well, not just 0.0.0.0

https://gerrit.wikimedia.org/r/1032844

Change #1032872 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] lists: add timer to sync data from stewards hosts

https://gerrit.wikimedia.org/r/1032872

Change #1032872 merged by Dzahn:

[operations/puppet@production] lists: add timer to sync data from stewards hosts

https://gerrit.wikimedia.org/r/1032872

@Urbanecm lists1001 now syncs (pulls) /srv/exports from the active stewards machine. The next step would be another timer that actually runs commands to sync list subscribers using that data.

[lists1001:~] $ sudo systemctl start stewards_subscriber_data_sync
[lists1001:~] $ sudo systemctl status stewards_subscriber_data_sync
● stewards_subscriber_data_sync.service - copy exported subscriber data from steward machines ()
   Loaded: loaded (/lib/systemd/system/stewards_subscriber_data_sync.service; static; vendor preset: enabled)
   Active: inactive (dead) since Mon 2024-05-20 17:41:47 UTC; 1s ago
     Docs: https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
  Process: 912 ExecStart=/usr/bin/rsync --address lists1001.wikimedia.org -ap rsync://stewards1001.eqiad.wmnet/steward-data-export-dir /srv/exports (code=exited, status=0/SUCCESS)
 Main PID: 912 (code=exited, status=0/SUCCESS)

May 20 17:41:47 lists1001 systemd[1]: Starting copy exported subscriber data from steward machines ()...
May 20 17:41:47 lists1001 systemd[1]: stewards_subscriber_data_sync.service: Succeeded.

Change #1034137 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] lists/stewards: add timer to run mailman syncmembers for stewards-l

https://gerrit.wikimedia.org/r/1034137

Change #1034137 merged by Dzahn:

[operations/puppet@production] lists/stewards: add timer to run mailman syncmembers for stewards-l

https://gerrit.wikimedia.org/r/1034137

Change #1036719 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] lists::automation: don't try to write to logfile from command

https://gerrit.wikimedia.org/r/1036719

Change #1036719 merged by Dzahn:

[operations/puppet@production] lists::automation: don't try to write to logfile from command

https://gerrit.wikimedia.org/r/1036719

Change #1036722 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] lists::automation: fix heredoc syntax, remove double quotes

https://gerrit.wikimedia.org/r/1036722

Change #1036722 merged by Dzahn:

[operations/puppet@production] lists::automation: fix heredoc syntax, remove double quotes

https://gerrit.wikimedia.org/r/1036722

Change #1036727 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] lists::automation: double quote end text to enable interpolation

https://gerrit.wikimedia.org/r/1036727

Change #1036727 merged by Dzahn:

[operations/puppet@production] lists::automation: double quote end text to enable interpolation

https://gerrit.wikimedia.org/r/1036727

Change #1036731 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] lists::automation: add missing spaces before line breaks

https://gerrit.wikimedia.org/r/1036731

Change #1036731 merged by Dzahn:

[operations/puppet@production] lists::automation: add missing spaces before line breaks

https://gerrit.wikimedia.org/r/1036731

@Urbanecm We now have another timer on lists1001 that runs the sync command for stewards-l, but as dry-run initially.

[lists1001:~] $ sudo systemctl status stewards_subscriber_list_sync
● stewards_subscriber_list_sync.service - sync stewards lists members with imported subscriber data
   Loaded: loaded (/lib/systemd/system/stewards_subscriber_list_sync.service; static; vendor preset: enabled)
   Active: inactive (dead) since Tue 2024-05-28 22:07:23 UTC; 3s ago
     Docs: https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
  Process: 1450 ExecStart=/usr/bin/mailman-wrapper syncmembers -n /srv/exports/mailman_list/lists.wikimedia.org/stewards-l stewards-l@lists.wikimedia.org (code=exited, status=0/SUCCESS)
 Main PID: 1450 (code=exited, status=0/SUCCESS)

The export file is rsynced from the stewards* machine(s) and contains the same subscribers that are already on stewards-l, so the simulated run reports "Nothing to do":

May 28 22:07:23 lists1001 mailman-wrapper[1450]: Nothing to do
May 28 22:07:23 lists1001 systemd[1]: stewards_subscriber_list_sync.service: Succeeded.
May 28 22:07:23 lists1001 systemd[1]: Started sync stewards lists members with imported subscriber data.
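Conceptually, the dry run compares the rsynced export file against the list's current roster. Below is a minimal sketch of that comparison. It assumes the export holds one address per line (optionally as `Name <address>`), that blank lines and `#` comments are ignored, and that addresses compare case-insensitively. It is an illustration only, not Mailman's own `syncmembers` code.

```python
from email.utils import parseaddr


def read_export(lines):
    """Parse export lines into a set of lowercase addresses.

    Assumption: one entry per line, either 'user@example.org' or
    'Display Name <user@example.org>'; blank lines and '#' comments are skipped.
    """
    addresses = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        _, addr = parseaddr(line)
        if addr:
            addresses.add(addr.lower())
    return addresses


def plan_sync(desired, current):
    """Return (to_add, to_remove) as sorted lists, like a dry run would report."""
    current = {a.lower() for a in current}
    to_add = sorted(desired - current)
    to_remove = sorted(current - desired)
    return to_add, to_remove
```

When both returned lists are empty, the two sides already match, which corresponds to the "Nothing to do" case in the log above.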

Are the subscribers for checkuser-l, global-renamers and stewards-usergroup different from the subscribers of stewards-l or should they all have the same members?

LSobanski lowered the priority of this task from Medium to Low. Jun 24 2024, 3:35 PM

Let's chat about the next steps. Also see the question about the subscribers above. Cheers!

Dzahn changed the task status from Open to Stalled. Jul 1 2024, 5:58 PM

Per IRC chat: currently this is stalled on T368836 which should be resolved first.

Change #1052188 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/puppet@production] lists::automation: Update stewards-l in real mode

https://gerrit.wikimedia.org/r/1052188

Urbanecm changed the task status from Stalled to Open. Jul 4 2024, 11:06 PM (edited)

Unstalling, as the repo has been created.

@Dzahn Let's move this forward in a bit – sorry for the delay & silence on my end. Would it be possible to:

  1. disable dry-run for stewards-l and run it for real (hopefully https://gerrit.wikimedia.org/r/1052188 does that, but feel free to abandon that patch if it doesn't do the trick); however, let's hold off on that for a couple of days, as there is a last-minute procedural question to resolve,
  2. get a dry run for global-sysops, global-renamers and stewards-usergroup (a private Phabricator paste should be fine for the results)?

Once we get the dry runs, we should be able to review them on the stewards end, and then we can hopefully move forward with managing more lists. I'm still working on getting all of the members for checkuser-l, as that includes users from all Wikimedia wikis, but hopefully I will be able to do that by the end of the week.

On a related note: I'm curious how logging could/should work. I can use Mailman UI access to see the current membership, but it is hard to tell which changes were made by the system and which by a human. Since the actual changes happen on another server, I suspect logging could be difficult (although Logstash might be an option?). This can definitely be left for another task if it is too complex for now; just wondering.

@Dzahn: I populated the users db with checkusers as well, so checkuser-l should now be ready for a dry run as well.

Reassigning to @Dzahn. Once the dry runs are available, happy to take over to review the diffs.


{P66165}

Change #1053399 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] mailman3: add defined type to sync list members (WIP)

https://gerrit.wikimedia.org/r/1053399

@Urbanecm https://gerrit.wikimedia.org/r/c/operations/puppet/+/1053399 creates a defined type to sync the members of any list.

Then it uses it to create a timer for each of the lists above, with an "each"-loop over an array.

The default would be to dry-run first.
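The Puppet change itself is not reproduced here. As an illustration of the same idea (one sync command per list, dry run unless explicitly disabled), here is a Python sketch that rebuilds the `ExecStart` line from the stewards-l timer shown earlier. The `/srv/exports/mailman_list/<domain>/<list>` path layout is taken from that output. The function name, the `dry_run` parameter, and the list array are hypothetical stand-ins for the Puppet defined type and its loop.

```python
import shlex

MAILMAN_WRAPPER = "/usr/bin/mailman-wrapper"
EXPORT_ROOT = "/srv/exports/mailman_list"
DOMAIN = "lists.wikimedia.org"

# Hypothetical: the lists named in this task; the real array lives in Puppet.
LISTS = ["stewards-l", "global-sysops", "global-renamers", "stewards-usergroup", "checkuser-l"]


def sync_command(list_name, dry_run=True, domain=DOMAIN):
    """Return the argv for one list's sync; dry-run (-n) is the default."""
    argv = [MAILMAN_WRAPPER, "syncmembers"]
    if dry_run:
        argv.append("-n")  # -n/--no-change: report what would change, modify nothing
    argv += [f"{EXPORT_ROOT}/{domain}/{list_name}", f"{list_name}@{domain}"]
    return argv


if __name__ == "__main__":
    for name in LISTS:
        print(shlex.join(sync_command(name)))
```

Defaulting to `dry_run=True` means a newly added list can never be modified until someone opts it in, which matches the "dry-run first" approach described above.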

Change #1052188 abandoned by Urbanecm:

[operations/puppet@production] lists::automation: Update stewards-l in real mode

Reason:

this will be done differently once I526e77bdb7ceaee9b0054a3b02d18d4c89775cc8 gets merged, no point in keeping

https://gerrit.wikimedia.org/r/1052188

Change #1053399 merged by Dzahn:

[operations/puppet@production] mailman3: defined type to sync list members, create timers for each list

https://gerrit.wikimedia.org/r/1053399

Change #1054388 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] mailman3: add missing whitespace in sync_list_members

https://gerrit.wikimedia.org/r/1054388

Change #1054388 merged by Dzahn:

[operations/puppet@production] mailman3: add missing whitespace in sync_list_members

https://gerrit.wikimedia.org/r/1054388

jjmc89 updated https://gitlab.wikimedia.org/repos/stewards/onboarding-system/-/merge_requests/1

roles: remove stewards-usergroup@lists.wikimedia.org from the steward role

https://gitlab.wikimedia.org/repos/stewards/users/-/merge_requests/1 starts to address the differences in the dry run. See comments in P66165 and users.yaml.

Change #1054610 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] lists: ensure list member sync only happens on the active server

https://gerrit.wikimedia.org/r/1054610

Change #1054610 merged by Dzahn:

[operations/puppet@production] lists: ensure list member sync only happens on the active server

https://gerrit.wikimedia.org/r/1054610


Thanks! I merged the request; it all looks good.

@Dzahn @JJMC89's update to users.yaml above should fix a lot of differences reported in P66165. Would you mind running the dry run for global-sysops, global-renamers, stewards-usergroup and checkuser-l again, and pasting it as a new paste, so that we can go through the remaining differences?

It'd be great if the paste could be visible to acl*stewards (where all stewards are), but I can adjust the visibility as needed once it is pasted.

Thanks for your help here!

@Urbanecm No problem. See {P67224}. This paste has a custom policy so only acl*stewards can read it and only I can edit it.

Thanks! We'll review again and let you know :).

urbanecm merged https://gitlab.wikimedia.org/repos/stewards/onboarding-system/-/merge_requests/1

roles: remove stewards-usergroup@lists.wikimedia.org from the steward role

Mentioned in SAL (#wikimedia-operations) [2024-10-09T15:23:34Z] <mutante> stewards* - rebooting machines - T351202

Dzahn changed the task status from Open to Stalled. Jan 29 2025, 8:18 PM

@Urbanecm What's the latest here? Anything waiting for me or my team here? Would you consider this stalled for now?


Hey! Thanks for the question. The problem I'm running into is limited visibility into what the actual changes are. As the sync runs on the lists servers, I do not know what actions I am taking, or whether I am accidentally removing or adding someone. While people should be using the YAML file instead of manual additions, there is always a chance of mistakes, and there should be a way to notice them.

However, I am unsure how to deal with that. An ideal solution would probably be manipulating the list membership directly via an API or similar (as with GitLab or Phabricator), but that doesn't appear to be easily possible. If you have any thoughts on this, they would be much appreciated.

Would it be useful to run dry-run syncs that don't actually modify things but show what _would_ happen and then email those out to you (a group)?


That would definitely be an improvement, but it would be better to have it closer to real time. However, I recognise that might be challenging.

A general note on this: the biggest remaining problem is the "it is very hard to see what you have just done" problem. When running the sync, the onboarder tool doesn't tell you what changes are being made (compared to the current list membership), mostly because it has no way to find out. Here is how the GitLab integration currently behaves (for comparison, and as a description of the ideal end result):

== Updating gitlab_group
ERROR:root:Failed to add derhexer to repos/stewards, GitLab account does not exist.
ERROR:root:Failed to add j89 to repos/stewards, GitLab account does not exist.
ERROR:root:Failed to add stryn to repos/stewards, GitLab account does not exist.
ERROR:root:Failed to add xxblackburnxx to repos/stewards, GitLab account does not exist.
ERROR:root:Failed to add sotiale to repos/stewards, GitLab account does not exist.
ERROR:root:Failed to add adrianr to repos/stewards, GitLab account does not exist.
INFO:root:Skipping urbanecm, their access level is not managed.
INFO:root:Skipping group_2915_bot_b704c4ff5e79a1e0504714318bb1f542, their access level is not managed.
INFO:root:Skipping group_2915_bot_0336196b03d5a24b195837d1f1ee4635, their access level is not managed.
INFO:root:Removing urbanecmtest from repos/stewards, no longer authorised

I can see what changes the tool is making, and if I see something that definitely shouldn't happen, I can intervene almost immediately. This is a bit complicated by the failures (the tool cannot _create_ GitLab accounts for people), but that is another problem. For comparison, here is how the Mailman integration behaves:

== Updating mailman_list

I do not see anything at all, and the only way I can audit my own actions is to manually download the list membership before doing anything (assuming I'm a list admin on Mailman's side) and then compare it with the export. This is a lot of work. Moreover, it requires people to actually be Mailman list admins, which increases the probability that someone accidentally makes an update on Mailman's side, further contributing to this problem.
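For illustration only, the mailman_list step could report changes in the same style as the GitLab step, provided the tool could read the list's current roster (which is exactly what is missing today). The sketch below uses Python's standard `logging` module, whose default format produces `INFO:root:...` lines like the GitLab output above. The function name and message wording are hypothetical.

```python
import logging


def report_list_changes(list_address, desired, current):
    """Log and return one message per planned change, GitLab-step style.

    Assumption: `desired` and `current` are sets of lowercase addresses;
    obtaining `current` requires read access to Mailman.
    """
    messages = []
    for addr in sorted(desired - current):
        messages.append(f"Adding {addr} to {list_address}")
    for addr in sorted(current - desired):
        messages.append(f"Removing {addr} from {list_address}, no longer authorised")
    if not messages:
        messages.append(f"No changes for {list_address}")
    for msg in messages:
        logging.info(msg)
    return messages


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)  # default format: LEVEL:logger:message
    print("== Updating mailman_list")
    report_list_changes(
        "stewards-l@lists.wikimedia.org",
        desired={"alice@example.org", "bob@example.org"},
        current={"alice@example.org", "carol@example.org"},
    )
```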

In theory, one might argue that anyone with this level of admin access should (somehow) know about the Onboarding system and use it instead of making manual changes. While that might be true in principle, it is challenging to achieve, especially in the initial period, when the Onboarding system is a very new way of managing Mailman list membership. I fully expect people to fall back on the old way (a manual change on Mailman's side) at first, and since nothing on Mailman's side tells them "No", mistakes can happen very easily.

Since I happen to be the sole list admin for stewards-l (all other list admins ceased to be stewards), I decided it is a good enough time to start with that list, and I filled T351202 for now. However, this is going to be a blocker for pretty much any other list, especially big lists like checkuser-l.

In addition to what I wrote above, note that even for stewards-l, there is a non-trivial risk of someone accidentally making a change in Mailman, even though there is no other list owner. This is because Mailman3 has administrators too, and while the Trust and Safety team (one of the admins) probably wouldn't remove someone from stewards-l, it can still happen (and if it does, the T&S team should fully expect their change to stay). I decided the risk is low enough for stewards-l, as T&S would probably alert the stewards before making changes (and/or route the change request through us), but the risk is present regardless.

For all other lists, we need a solution to this problem. Unfortunately, the only one I can see is to create an intermediate service that would sit between the actual API on Mailman's side and stewards1001 and authenticate updates using its own credentials (so that stewards1001 would have access only to the lists it is supposed to manage, rather than all of them). I can start writing this if we decide it is the right solution, but I would presumably need access to Mailman itself for that (which I currently do not have).
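To make the proposed intermediate service more concrete: its core would be an authorization check that only forwards Mailman REST calls touching an allowlisted list. Below is a minimal sketch of that check. It assumes the caller uses Mailman-style paths such as `/3.1/lists/<list_id>/roster/member` and `/3.1/members`, and that the request body has already been parsed into a dict. The allowlist contents, the permitted methods, and the function name are hypothetical design choices, not an existing service.

```python
import re

# Hypothetical allowlist: the lists named in the task description.
ALLOWED_LIST_IDS = frozenset({
    "stewards-l.lists.wikimedia.org",
    "global-sysops.lists.wikimedia.org",
    "global-renamers.lists.wikimedia.org",
    "checkuser-l.lists.wikimedia.org",
    "stewards-usergroup.lists.wikimedia.org",
})

# Roster reads carry the list in the path; subscriptions carry it in the body.
ROSTER_PATH = re.compile(r"^/3\.1/lists/(?P<list_id>[^/]+)/roster/member$")


def is_allowed(method, path, body=None):
    """Decide whether the proxy may forward this request to Mailman core."""
    if method == "GET":
        match = ROSTER_PATH.match(path)
        return bool(match) and match.group("list_id") in ALLOWED_LIST_IDS
    if method == "POST" and path == "/3.1/members":
        # Subscriptions name the target list in the request body.
        return (body or {}).get("list_id") in ALLOWED_LIST_IDS
    # Everything else (deletions, settings, other lists) is denied by default;
    # a DELETE of /3.1/members/<id> would first need a lookup of the member's list.
    return False
```

Denying by default keeps the proxy's blast radius small: even with its own broad Mailman credentials, it can only ever act on the allowlisted lists.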

@Dzahn, I would appreciate your thoughts on this, so that we can unblock this and move it forward.

@Urbanecm What I could realistically do here is to automate sending emails to you (or any address or list of addresses) with the output of the dry-run on the lists server. And/or the actual run, the output of the mailman-wrapper command. Would that be a fix for now so you know what would happen or has happened?
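Here is a sketch of the emailing idea: wrap the `mailman-wrapper syncmembers` output for one list into a message for a configurable recipient. It assumes the command's per-address output lines start with `[ADD]` or `[DEL]`, as recent Mailman 3 versions print (verify against the installed version). It uses only the standard library's `email.message.EmailMessage`. The sender and recipient addresses and the function names are placeholders. Actually sending the message would go through the host's MTA (for example via `smtplib`), which is left out here.

```python
from email.message import EmailMessage


def summarize_sync_output(output):
    """Split syncmembers output into added and deleted entries.

    Assumption: change lines look like '[ADD] <entry>' or '[DEL] <entry>';
    anything else (e.g. 'Nothing to do') is ignored.
    """
    added, deleted = [], []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("[ADD] "):
            added.append(line[len("[ADD] "):])
        elif line.startswith("[DEL] "):
            deleted.append(line[len("[DEL] "):])
    return added, deleted


def build_report(list_address, output, dry_run, sender, recipient):
    """Build a plain-text email describing what the sync did or would do."""
    added, deleted = summarize_sync_output(output)
    mode = "dry run" if dry_run else "applied"
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[{mode}] {list_address}: +{len(added)} / -{len(deleted)}"
    body = [f"Member sync for {list_address} ({mode})", ""]
    body += [f"added:   {e}" for e in added] or ["added:   (none)"]
    body += [f"removed: {e}" for e in deleted] or ["removed: (none)"]
    msg.set_content("\n".join(body) + "\n")
    return msg
```

Putting the add/remove counts in the subject line lets recipients spot an unexpected removal without opening every report.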

Change #1128551 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] lists::automation: explain how this can sync mailman list members

https://gerrit.wikimedia.org/r/1128551

Change #1128551 merged by Dzahn:

[operations/puppet@production] lists::automation: explain how this can sync mailman list members

https://gerrit.wikimedia.org/r/1128551

Change #1128564 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] mailman: list sync, add option to mail changes to an admin

https://gerrit.wikimedia.org/r/1128564

Change #1128564 merged by Dzahn:

[operations/puppet@production] mailman: list sync, add option to mail changes to an admin

https://gerrit.wikimedia.org/r/1128564

Change #1134000 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] lists: send email to meta admin when steward list members are synced

https://gerrit.wikimedia.org/r/1134000

Change #1134001 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] mailman3: fix quoting in mail_cmd for sync_list_members

https://gerrit.wikimedia.org/r/1134001

Change #1134001 merged by Dzahn:

[operations/puppet@production] mailman3: fix quoting in mail_cmd for sync_list_members

https://gerrit.wikimedia.org/r/1134001

Change #1134000 merged by Dzahn:

[operations/puppet@production] lists: send email to meta admin when steward list members are synced

https://gerrit.wikimedia.org/r/1134000

Change #1134013 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] mailman3: remove superfluous double quotes in sync_list_members

https://gerrit.wikimedia.org/r/1134013

Change #1134013 merged by Dzahn:

[operations/puppet@production] mailman3: remove superfluous double quotes in sync_list_members

https://gerrit.wikimedia.org/r/1134013

Dzahn removed Dzahn as the assignee of this task. Dec 11 2025, 5:18 PM
Dzahn added a subscriber: Urbanecm_WMF.

@Urbanecm_WMF T351202#10626638 was still an open question. Let me or my team know when/if you want to go back to this.

@Dzahn I have added the email addresses of the new stewards to the users.yaml document, but they are not being subscribed. I tried a few things over SSH to load the changes, but that doesn't seem to have worked. Have I done something wrong or missed something? Cc @Urbanecm; I did try to contact them privately as well, but they appear to be unavailable for now, and I am unable to add the new stewards manually as Urbanec is currently the only list admin.


It seems the userdb token (profile::stewards::userdb_gitlab_token) expired. I put a new one at /home/urbanecm/userdb_token.secret and used it to pull the users repository manually. @Dzahn, can you update it, please?

@Urbanecm I have updated the userdb_gitlab_token with the value provided, as requested.

On the next Puppet run:

Notice: /Stage[main]/Profile::Stewards/Git::Clone[repos/stewards/users]/Exec[git_set_origin_repos/stewards/users]/returns: executed successfully (corrective)

@EPIC It's all supposed to be handled by configuration management (puppet). Manual changes will mostly be overridden after a few minutes.