Page MenuHomePhabricator

Update make-toolforge-user-list.py
Open, Needs TriagePublic

Description

The script used to produce the list of email addresses for the annual survey expects there to be a local 'labswiki' database. That used to work on a cloudweb but now that wikitech is a real wiki we need another way to determine the opt-outs.

https://gitlab.wikimedia.org/repos/cloud/wmcs/wmcs-survey-mail-list

Is this something we can get from the wikireplicas, and move this to toolforge? Or is the opt-out flag redacted?

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

The --optout flag is for the SendBulkEmails php script.

Bryan's script above only extracts the list of users into a file that is passed to the php script.

The labswiki database that used to be on m5-master is now in the s6 cluster. Developer accounts are completely disconnected from Wikitech following the SUL migration starting last October, so any script that was using the labswiki database will need to find some new way to look at the LDAP directory which is the canonical storage instead.

The part where we used to check the Wikitech account's "Allow other users to email me" preference (disablemail) in make-toolforge-user-list.py already uses the labswiki database from the wiki replicas (no, it just shells out to sql but did that against the production db), but it is questionable usability now that labswiki users are SUL users and not Developer accounts.

make-opt-out-list.py should still work as it is based on taking the names on the https://wikitech.wikimedia.org/wiki/Annual_Toolforge_Survey/Opt_out page and mapping them to cn values in the LDAP directory.

Basically the workflow that is based on email addresses that we came up with when using qualtrics should still work. The original workflow that maps into labswiki users and the SendBulkEmails.php maintenance script is should probably be deleted because there is no stable mapping from Developer accounts to wiki accounts anymore.

I've gotten the email list of toolforge and cloudvps users via ldap(using project-tools and project-bastion groups).
I've removed duplicates and also taken out those on the optout list.

I still have 6K+ users. This is way more than the usual <3K users we often contact for the survey.
I want to make sure I don't spam others.

I can also take out obvious ones like root@wmflabs.org, root@wmcloud.org etc.
What other emails should I be filtering out from the list?

bd808@tools-bastion-14.tools.eqiad1:~$ ldap -b cn=project-bastion,ou=groups,dc=wikimedia,dc=org | grep member: | wc -l
5558
bd808@tools-bastion-14.tools.eqiad1:~$ ldap -b cn=project-tools,ou=groups,dc=wikimedia,dc=org | grep member: | wc -l
2917

The 5,558 count in the bastion project is a little surprising to me, but I suppose we never really remove anyone from that group, even if their account has been locked at some point or we haven't seen them login in 10 years.

This is way more than the usual <3K users we often contact for the survey.

When we were pulling the list from Wikitech's user table we were missing some of the Toolforge maintainers and Cloud VPS members who had never attached their Developer accounts there, but this does look like a lot more than the 2553 folks we sent the 2022 survey to.

Past counts I got from meta:

  • 2018: 1722
  • 2019: 1857
  • 2020: 1825 <-- seems weird that this one went down
  • 2021: 2388
  • 2022: 2553

One of the things I wondered about was missing lots of users who had opt-ed out of email contact in the past via their Wikitech preferences. That doesn't seem like a big problem however:

wikiadmin2023@10.192.9.8(labswiki)> select count(*) from user_properties where up_property='disablemail' and up_value=1;
+----------+
| count(*) |
+----------+
|      223 |
+----------+
1 row in set (0.005 sec)

Ah ha! The https://gitlab.wikimedia.org/repos/cloud/wmcs/wmcs-survey-mail-list/-/blob/main/make-cloudvps-email-list.py script only gathered emails for folks with the projectadmin role (replaced by the member role). The method of taking everyone from the bastion project according to LDAP is a much wider group.

There are 1293 users in eqiad1 with the 'member' role. It's easy for me (or you) to dump the list of userids ("openstack role assignment list --role member") but correlating the usernames back to their email addresses will take a few lines of code.

Thanks!
Getting the email from the username is the easy part for me.
After some discussions with Bryan, I'm working to tweak the make-cloudvps-email-list script to get the list.

I've gotten past an error I was getting earlier so let's see how it goes...

I might come back to ask for the above 1293 users.

I've gotten the cloud vps email list now.

Once I'm done, I will update the wiki with the new process.

Thank you all.

The total list is now 3279.

I've started sending them out.