
Database table backing https://tools.wmflabs.org/admin/tools listing not being updated properly
Open, High · Public · BUG REPORT

Description

I created a tool called 'rmstats' 2 days ago (7/27). I set a bunch of metadata fields for it (title, description, link to source code, tags) at https://toolsadmin.wikimedia.org/tools/id/rmstats.

I would expect my tool to appear in the list of all hosted tools at https://tools.wmflabs.org/admin/tools, but it does not. (Ctrl+F for "Colin M" or "rmstats" gives 0 results.) It also doesn't appear in Hay's tool directory (https://tools.wmflabs.org/hay/directory/#/search/rmstats).

(The original title of my tool was set to "RM Stats", which I later realized duplicated the title of a different tool. In case that was the cause of the issue, I changed my tool's title to "RM Stats Visualizer", but that does not seem to have fixed the issue.)

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jul 29 2019, 6:15 PM
bd808 added a subscriber: bd808.

It looks like an LDAP directory hiccup killed the script that runs on tools-sge-services-03 to populate the database that the admin tool reads tool information from:

$ sudo journalctl -u updatetools --no-pager
-- Logs begin at Mon 2019-07-29 10:10:17 UTC, end at Mon 2019-07-29 22:03:03 UTC. --
Jul 29 10:13:47 tools-sge-services-03 systemd[1]: [/lib/systemd/system/updatetools.service:13] Invalid user/group name or numeric ID, ignoring: tools.admin
Jul 29 10:13:47 tools-sge-services-03 systemd[1]: [/lib/systemd/system/updatetools.service:14] Invalid user/group name or numeric ID, ignoring: tools.admin
Jul 29 10:13:47 tools-sge-services-03 systemd[1]: Started toolforge updatetools service, update tools and user tables.
Jul 29 10:13:47 tools-sge-services-03 systemd[1]: updatetools.service: Main process exited, code=exited, status=1/FAILURE
Jul 29 10:13:47 tools-sge-services-03 systemd[1]: updatetools.service: Unit entered failed state.
bd808 renamed this task from Newly created tool not listed at https://tools.wmflabs.org/admin/tools to Database table backing https://tools.wmflabs.org/admin/tools listing not being updated properly. · Jul 29 2019, 11:34 PM

And in working on this I have managed to empty out the backing database table entirely, and thus far I have not been able to get it to refill as expected. The script that does this maintenance expects Python's pwd.getpwall() function to return a list of every user on the system. On the server where this runs (tools-sge-services-03) that expectation is failing: pwd.getpwall() seems to be returning only the contents of /etc/passwd and ignoring the /etc/nsswitch.conf configuration, which says that both the local file and sss (LDAP connector) should be queried.
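
One way to confirm this kind of mismatch is to compare enumeration against individual lookups. The following is only a minimal sketch, not the updatetools script itself; the helper name resolvable_but_not_enumerated is made up for illustration. It reports names that pwd.getpwnam() can resolve but that pwd.getpwall() leaves out.

```python
import pwd


def resolvable_but_not_enumerated(names):
    """Return the names NSS resolves one by one but getpwall() omits.

    Hypothetical helper for illustration; not part of updatetools.
    """
    # Everything the enumeration call returns on this host.
    enumerated = {entry.pw_name for entry in pwd.getpwall()}
    hidden = []
    for name in names:
        try:
            pwd.getpwnam(name)  # direct lookup through NSS
        except KeyError:
            continue  # not resolvable at all, so not relevant here
        if name not in enumerated:
            hidden.append(name)
    return hidden


if __name__ == "__main__":
    # "tools.admin" is the account named in the log above.
    print(resolvable_but_not_enumerated(["tools.admin"]))
```

A non-empty result on the affected host would suggest that direct lookups reach the directory while enumeration does not.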

bd808 claimed this task. · Jul 29 2019, 11:38 PM
bd808 triaged this task as High priority.

I'm now wondering how the tools table was not emptied already. I think it may be because the script was not starting at all, due to the "Invalid user/group name" error from systemd. When I ran it manually as the tools.admin user, the pwd.getpwall() call in the script returned only the local /etc/passwd contents and not the LDAP directory contents. This is a "feature" of sssd, which disables enumeration by default.

This script needs to be rewritten to talk directly to the LDAP directory instead of (ab)using the pwd family of functions.

bd808 added a comment. · Jul 30 2019, 4:28 AM

I have a working version of the updatetools script that uses LDAP directly instead of Python's pwd and grp wrappers around NSS lookups. Using this I have repopulated the database tables. I also decided to fix T164971: Broken unicode characters / invalid UTF-8 on Tool Labs index while working on this.
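
For readers unfamiliar with the approach, here is a rough sketch of what "using LDAP directly" could look like. It is not the code from the patch: the entry shape, the attribute names (cn, gidNumber, member), the "tools." prefix (inferred from the tools.admin account name), and the injected search callable are all assumptions made for illustration. A real script would obtain the entries from an LDAP client library such as ldap3 or python-ldap. The decode helper shows one possible way to avoid invalid UTF-8 in the output, which may or may not match how T164971 was fixed.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical shape of one LDAP entry: attribute name -> list of raw bytes.
Entry = Dict[str, List[bytes]]

TOOL_PREFIX = "tools."  # assumption, inferred from the "tools.admin" account


def decode(value: bytes) -> str:
    """Decode an attribute value, replacing invalid UTF-8 instead of raising."""
    return value.decode("utf-8", errors="replace")


def list_tools(search: Callable[[], Iterable[Entry]]) -> List[dict]:
    """Build one row per tool from the directory entries that `search` yields."""
    rows = []
    for entry in search():
        name = decode(entry["cn"][0])
        if not name.startswith(TOOL_PREFIX):
            continue  # skip entries that are not tool accounts
        rows.append({
            "name": name[len(TOOL_PREFIX):],
            "id": int(decode(entry["gidNumber"][0])),
            "maintainers": sorted(decode(m) for m in entry.get("member", [])),
        })
    return rows


def fake_search() -> Iterable[Entry]:
    """Stand-in for a real LDAP query; the values are made up."""
    yield {"cn": [b"tools.rmstats"], "gidNumber": [b"50001"],
           "member": [b"uid=example,ou=people"]}
    yield {"cn": [b"some-other-group"], "gidNumber": [b"500"]}


if __name__ == "__main__":
    print(list_tools(fake_search))
```

Passing the query in as a callable keeps the row-building logic independent of the directory, so it can be exercised without a live LDAP server.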

Change 526309 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolforge: modernize updatetools script

https://gerrit.wikimedia.org/r/526309

Change 526309 merged by Bstorm:
[operations/puppet@production] toolforge: modernize updatetools script

https://gerrit.wikimedia.org/r/526309

bd808 added a subscriber: Bstorm. · Aug 7 2019, 9:10 PM

Systemd has decreed that "." is an invalid character in a username (in contradiction to the POSIX standard). This is keeping the systemd unit for this script from starting. After talking about various options on IRC with @Bstorm, I think the "best" fix here is to move this script into the Tool-admin tool itself as a grid job or custom Kubernetes deployment.

bd808 moved this task from Triage to In Progress on the Toolforge board. · Aug 8 2019, 3:27 AM

Ah, ok, so we didn't fix that, per se. It's just the way it is right now, regarding my comment on the other task :)

Change 542777 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolforge: Remove updatetools script

https://gerrit.wikimedia.org/r/542777

Change 542777 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: Remove updatetools script

https://gerrit.wikimedia.org/r/542777

Mentioned in SAL (#wikimedia-cloud) [2019-10-14T09:26:44Z] <arturo> cleaned-up updatetools from tools-sge-services nodes (T229261)

The remaining piece of work here is getting the current updatetools.py script, which is running as a cron job under the admin tool, into version control somewhere, along with docs on deploying and running it.