
Support with steps to access Toolforge user data
Closed, Resolved · Public

Description

The Design team at WMF is planning to conduct user research of Toolforge users. For the study, we need to recruit participants.

The data required is

  • Toolforge user name
  • Date when the Toolforge account was created
  • Associated SUL wiki account (if available)

The request is not for the data itself, but for a writeup of how this data can be accessed.

Event Timeline

The Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific project tag to this task. Thanks!

Assigning this to Bryan after a Slack conversation with him.

The data you are looking for exists in the LDAP directory that stores Developer account information. This is the same LDAP directory that is used in the Cloud VPS and Toolforge projects to provide Unix account information and ssh public keys to virtual machines. It is also the same LDAP directory that backs the Wikimedia IDM (Bitu) and Wikimedia Developer SSO Portal (CAS-SSO) projects.

The LDAP directory is open for inspection and queries from the Cloud VPS, Toolforge, and WMF internal networks. I tend to access it from Toolforge more than anywhere else myself. I have documented some helpers I use at https://wikitech.wikimedia.org/wiki/User:BryanDavis/LDAP.

The specific attributes you are after are uid (shell account name), cn (legacy Wikitech username), createTimestamp (account creation date), wikimediaGlobalAccountId & wikimediaGlobalAccountName (SUL account), and mail (email address). They are found on objectClass=posixAccount entries that are members of the cn=project-tools,ou=groups,dc=wikimedia,dc=org group.

Using the shell aliases from User:BryanDavis/LDAP, you could request that information from a Toolforge bastion with something like:

bd808@tools-bastion-12$ alias ldap='ldapsearch -xLLL -P 3 -E pr=5000/noprompt -o ldif-wrap=no -b"dc=wikimedia,dc=org"'
bd808@tools-bastion-12$ alias un64='awk '\''BEGIN{FS=":: ";c="base64 -d"}{if(/\w+:: /) {print $2 |& c; close(c,"to"); c |& getline $2; close(c); printf("%s:: \"%s\"\n", $1, $2); next} print $0 }'\'''
bd808@tools-bastion-12$ ldap '(&(objectClass=posixAccount)(memberOf=cn=project-tools,ou=groups,dc=wikimedia,dc=org))' uid cn createTimestamp wikimediaGlobalAccountId wikimediaGlobalAccountName mail | un64
(...snip...)

dn: uid=bd808,ou=people,dc=wikimedia,dc=org
uid: bd808
cn: BryanDavis
createTimestamp: 20130729163514Z
wikimediaGlobalAccountId: 12874
wikimediaGlobalAccountName: BryanDavis
mail: bdavis@wikimedia.org

(...snip...)

This shows my Developer account information:

  • dn: uid=bd808,ou=people,dc=wikimedia,dc=org -- the "dn" is the primary key for an LDAP record.
  • uid: bd808 -- "uid" in our environment is the account's shell name.
  • cn: BryanDavis -- "cn" in our environment is the account's "common name", which was also the user's Wikitech account name prior to October 2024.
  • createTimestamp: 20130729163514Z -- this Developer account was created 2013-07-29 16:35:14 UTC.
  • wikimediaGlobalAccountId: 12874 -- OAuth verified associated SUL account id.
  • wikimediaGlobalAccountName: BryanDavis -- OAuth verified associated SUL account username. Note that the username may have changed via a global account rename since being stored in LDAP. The SUL account id is invariant and can be used to find the current account name in the centralauth.globaluser database table.
  • mail: bdavis@wikimedia.org -- The Developer account's email address.
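
The createTimestamp value uses the LDAP generalized time layout (YYYYMMDDHHMMSS followed by Z for UTC). If you need it as a real date for sorting or filtering, something like the following should work. This is only a minimal sketch: parse_ldap_timestamp is a hypothetical helper name, and it assumes every value has second precision and a trailing Z like the example above.

```python
from datetime import datetime, timezone


def parse_ldap_timestamp(value: str) -> datetime:
    """Parse an LDAP generalized time string such as '20130729163514Z'."""
    # Assumption: second precision and a literal trailing "Z" meaning UTC.
    parsed = datetime.strptime(value, "%Y%m%d%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)


created = parse_ldap_timestamp("20130729163514Z")
print(created.isoformat())  # prints 2013-07-29T16:35:14+00:00
```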

The command line tools are handy for quick lookups, but the output format can be challenging to work with if you want to create a CSV or TSV data file or drive other automation. In that case I tend to write small Python scripts that use the ldap3 library to access the LDAP directory. Here is an example adapted from a script I recently wrote to support the Wikitech SUL migration:

example.py
import ldap3
import yaml

# Toolforge bastions and Kubernetes containers have this file at runtime
with open("/etc/ldap.yaml") as f:
    cfg = yaml.safe_load(f)
conn = ldap3.Connection(cfg["servers"], auto_bind=True, read_only=True)
base = "ou=people,{basedn}".format(basedn=cfg["basedn"])
selector = "(&{})".format(
    "".join(
        [
            "(objectClass=posixAccount)",
            "(memberOf=cn=project-tools,ou=groups,dc=wikimedia,dc=org)",
        ]
    )
)

r = conn.extend.standard.paged_search(
    base,
    selector,
    attributes=[
        "uid",
        "cn",
        "createTimestamp",
        "wikimediaGlobalAccountId",
        "wikimediaGlobalAccountName",
        "mail",
    ],
    paged_size=256,
    time_limit=5,
    generator=True,
)
for user in r:
    # Do interesting stuff with the Developer account here
    print(user)
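
For the CSV/TSV use case, the print(user) line could be swapped for something like the sketch below. The names FIELDS, flatten, and write_tsv are hypothetical helpers, not part of ldap3. The sketch assumes each item yielded by paged_search is a dict with "type" and "attributes" keys, and that an attribute value may arrive as either a single value or a list. The sample data stands in for real search results so the block runs without an LDAP connection.

```python
import csv
import io

# Column order for the output file; matches the attributes requested above.
FIELDS = [
    "uid",
    "cn",
    "createTimestamp",
    "wikimediaGlobalAccountId",
    "wikimediaGlobalAccountName",
    "mail",
]


def flatten(value):
    """Collapse an attribute value into one string for a single TSV cell."""
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        return ";".join(str(v) for v in value)
    return str(value)


def write_tsv(entries, out):
    """Write one row per search result entry and return the row count."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    count = 0
    for entry in entries:
        # Skip anything that is not an actual entry (e.g. search references).
        if entry.get("type") != "searchResEntry":
            continue
        attrs = entry.get("attributes", {})
        # Accounts without a linked SUL account end up with empty cells.
        writer.writerow([flatten(attrs.get(name)) for name in FIELDS])
        count += 1
    return count


# Stand-in for the generator returned by conn.extend.standard.paged_search().
sample = [
    {
        "type": "searchResEntry",
        "dn": "uid=bd808,ou=people,dc=wikimedia,dc=org",
        "attributes": {
            "uid": ["bd808"],
            "cn": ["BryanDavis"],
            "createTimestamp": "20130729163514Z",
            "wikimediaGlobalAccountId": "12874",
            "wikimediaGlobalAccountName": "BryanDavis",
            "mail": ["bdavis@wikimedia.org"],
        },
    },
    {"type": "searchResRef", "uri": ["ldap://example.invalid"]},
]

buf = io.StringIO()
rows = write_tsv(sample, buf)
print(buf.getvalue(), end="")
```

In the real script you would pass r instead of sample and an open file instead of buf.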

To run this script on Toolforge I would typically use some tool account I have access to (like https://toolsadmin.wikimedia.org/tools/id/bd808-test) and run things from inside a webservice python3.11 shell session:

$ ssh login.toolforge.org
$ become $MY_TOOL_NAME
$ webservice python3.11 shell
tools.MY_TOOL_NAME@shell-1234567890:~$ python3 -m venv venv
tools.MY_TOOL_NAME@shell-1234567890:~$ ./venv/bin/pip3 install ldap3 pyyaml
tools.MY_TOOL_NAME@shell-1234567890:~$ vim example.py
  # Paste in the script
  # :wq
tools.MY_TOOL_NAME@shell-1234567890:~$ ./venv/bin/python3 example.py

All that information was super helpful @bd808! We are able to get the data required, thanks again.