Page MenuHomePhabricator

List of TWL users
Closed, ResolvedPublic5 Estimated Story Points

Description

I need the list of 2 cohorts of TWL users.

  • All users who logged in to TWL in the past 12 months.
  • All users who logged in to TWL in Jan, Feb and Mar of 2025.

A useful bifurcation to have on this data would be those who logged in and then used a resource, either via proxy or by successfully getting access to resource that's not in library bundle.

We need only the usernames. Please share this data privately.

Event Timeline

jsn.sherman subscribed.

A useful bifurcation to have on this data would be those who logged in and then used a resource, either via proxy or by successfully getting access to resource that's not in library bundle.

I don't understand what the two sides of this "split" are; could you clarify?

I'm trying to get a sense if there are users who logged in but didn't actually access any content. This is the split.

  1. Users who logged in and accessed a partner's website via proxy or applied for access to a non-bundle partner collection.
  2. Users who logged in but didn't access any partner collections.
Kgraessle set the point value for this task to 5.May 13 2025, 3:07 PM
Kgraessle moved this task from To be estimated to Estimated on the Moderator-Tools-Team board.

We talked about this in estimation, and we're looking at combining a couple of separate queries:

  • TWLight query for users who logged in during the specified time frames
  • TWLight query for users who submitted applications during specified time frames
  • ezprozy logsearch for users who accessed proxied resources during specified time frames

The TWLight queries would be run directly on the server, and could potentially be added as admin views.
The ezproxy logsearch will probably require that we amend our logsearch scripts which engineers run locally.

DMburugu triaged this task as Medium priority.May 27 2025, 4:11 PM
DMburugu moved this task from Estimated to Kanban on the Moderator-Tools-Team board.

@sjvipin I'm pulling this off of our board for now to reflect reality: we had more work on the board than we could accomplish at full capacity and we also have people out this week and are doing hiring (which ties up engineer time).

Kgraessle moved this task from In Progress to Ready on the Moderator-Tools-Team (Kanban) board.
Kgraessle subscribed.
jsn.sherman changed the task status from Open to In Progress.Jun 12 2025, 1:30 AM
jsn.sherman claimed this task.
jsn.sherman moved this task from Ready to In Progress on the Moderator-Tools-Team (Kanban) board.
This comment was removed by jsn.sherman.

I wanted to run my approach past @Scardenasmolinar as I think I have a reasonable approximation of the requested data:

select users_editor.wp_username, if(
    -- has application status APPROVED = 2 SENT = 4
    (applications_application.status = 2 or applications_application.status = 4), 'true', 'false') as has_approved_apps,
    -- has authorizations that were:
    if((
      -- created no more than a year ago or
      users_authorization.date_authorized >= date_sub(now(),interval 1 year) or
      -- expired no more than a year ago or
      users_authorization.date_expires >= date_sub(now(),interval 1 year) or
      -- are currently active (eg. have currently associated partners)
      count(users_authorization_partners.id) > 0
    ), 'true', 'false') as has_current_auths
from auth_user join users_editor on auth_user.id = users_editor.user_id
-- left outer join used to grab apps for approved_apps virtual column
left outer join applications_application on users_editor.id = applications_application.editor_id
-- left outer join used to grab auths for current_auths virtual column
left outer join users_authorization on auth_user.id = users_authorization.user_id
-- left outer join used to grab auth partners for current_auths virtual column
left outer join users_authorization_partners on users_authorization.id = users_authorization_partners.authorization_id
-- limit to people who logged in within the last year
where auth_user.last_login >= date_sub(now(),interval 1 year) group by users_editor.wp_username;

Instead of grabbing ezproxy log data (which is not guaranteed to be complete anyway), I'm checking on whether or not users have any approved apps (though I'm not checking the date on those) and also if they have any auths that would have been valid in the selected period.
This seems to return reasonable results, but I wanted another person pretty familiar with the TWL db to lay eyes on this complex query.

This query looks good! The only caveat I can find is that we don't know if these users lost bundle eligibility (and therefore access to the library) in the year.

Yeah, that was intentional on my part. I was trying to count everybody who had valid access for some part of the year.

This query looks good! The only caveat I can find is that we don't know if these users lost bundle eligibility (and therefore access to the library) in the year.

@sjvipin It occurs to me that can't really narrow down the logins to to a timebox like this:

All users who logged in to TWL in Jan, Feb and Mar of 2025.

We have the last login date, which lets us set a start date but not an end date. If we want this kind of monthly breakdown, then we should run a "who logged in this month" query every month and retain it until it's time to use it in a report like this. That won't help with this request, but would help moving forward. Please let me know if you have questions.

@sjvipin It occurs to me that can't really narrow down the logins to to a timebox like this:

All users who logged in to TWL in Jan, Feb and Mar of 2025.

We have the last login date, which lets us set a start date but not an end date. If we want this kind of monthly breakdown, then we should run a "who logged in this month" query every month and retain it until it's time to use it in a report like this. That won't help with this request, but would help moving forward. Please let me know if you have questions.

Noting that @sjvipin and I talked about this off thread and he agreed to the approach.