Page MenuHomePhabricator

Create a table to store which users have access to IPInfo, and the timestamp when access was granted [L]
Closed, ResolvedPublic

Description

Background

Following T264150: User needs to request access to IP information [L] users with the 'ipinfo' right must check a box to indicate that they will use the tool only for anti-abuse purposes. This will be stored as a user preference. Users with both the right and the preference set will be able to use the tool.

Problem

We may need to ask users to re-permission periodically. This means we need to know who has access to the tool and how long they have had access for.

This will be particularly important once users other than checkusers have access to this tool. (Checkusers have already agreed to https://meta.wikimedia.org/wiki/Confidentiality_agreement_for_nonpublic_information)

We also need to be able to tell the difference between someone changing their user preference to give up access to the tool, vs someone having their access revoked.

Solution

See T263756#7469467 and T263756#7471822 (tl;dr: Use the logging table).

To do this we will need a new table in the IPInfo extension that stores the user name, the timestamp when they first gained access, and whether they have had their access revoked. If a user unsets the preference, their row can be deleted from this table.

Proposed schema: T263756#6525579
Notes on the process: T263756#7391325

Acceptance criteria

  • Gain DBA approval for the new table. We'll need to provide the table structure and estimates of the following information (from T260372#6482475):
  • size of the table (number of rows expected)
  • expected growth per year (number of rows)
  • Expected writes to the table (per minute, per hour...per day, any of those are ok).
  • Expected amount of reads
  • Can this table be public or private (so we know if it can be replicated to our public cloud infra or it needs to be filtered)
  • The release plan for the feature (are there specific wikis you'd like to test first etc)
  • Create the new table

Event Timeline

To do this we will need a new table in the IPInfo extension that stores the user name and the timestamp when they set the user preference. If a user unsets the preference, their row can be deleted from this table.

@Tchanders perhaps instead store as the preference value the time it was set?

To do this we will need a new table in the IPInfo extension that stores the user name and the timestamp when they set the user preference. If a user unsets the preference, their row can be deleted from this table.

@Tchanders perhaps instead store as the preference value the time it was set?

I was just about to ask why we wouldn't do that... I think we could have a hook handler for the preference that converts a "check" to the datetime (if it's being changed). When we check the preference (in the API) we'll also check the value of the preference.

Niharika triaged this task as Medium priority.Sep 30 2020, 5:26 AM
Niharika updated the task description. (Show Details)

To do this we will need a new table in the IPInfo extension that stores the user name and the timestamp when they set the user preference. If a user unsets the preference, their row can be deleted from this table.

@Tchanders perhaps instead store as the preference value the time it was set?

I was just about to ask why we wouldn't do that... I think we could have a hook handler for the preference that converts a "check" to the datetime (if it's being changed). When we check the preference (in the API) we'll also check the value of the preference.

A couple notes:

  • We need to be able to track which users had access at any given time if an incident comes up. Per Legal requirements.
  • We need to have a way to blacklist some users from gaining access to this feature if their access was revoked because of an incident. We'll need to record that somewhere.

@Tchanders said she'll update this task so I'll leave that to her.

To do this we will need a new table in the IPInfo extension that stores the user name, the timestamp when they first gained access, and whether they have had their access revoked.

I wonder if we should store events in the sense of something like this:

  • user_id
  • event_type (an int, which is really an enum)
  • timestamp

This way we could have an event for "Granted Access", "Revoked Access", or "Blocked Access". Looking at the most recent event would tell us what to do as far as showing the info or allowing access to be granted again.

This task is on hold until late Q3/early Q4.

phuedx renamed this task from Store which users have access to IPInfo, and the timestamp when access was granted to Create a table to store which users have access to IPInfo, and the timestamp when access was granted.Sep 29 2021, 4:11 PM
phuedx added a subscriber: phuedx.

☝️ per today's AHT: Estimation and Planning meeting.

ARamirez_WMF renamed this task from Create a table to store which users have access to IPInfo, and the timestamp when access was granted to Create a table to store which users have access to IPInfo, and the timestamp when access was granted [L].Sep 29 2021, 4:13 PM

Following on from yesterday's AHT: Estimation and Planning meeting, here are some example table.json files:

The process of building schema change patches (which encompasses patches that add tables) is documented here: https://www.mediawiki.org/wiki/Manual:Schema_changes

Change 726686 had a related patch set uploaded (by TsepoThoabala; author: TsepoThoabala):

[mediawiki/extensions/IPInfo@master] Create a table to store which users have access to IPInfo, and the timestamp when access was granted

https://gerrit.wikimedia.org/r/726686

Based on what we've talked about is this the correct understanding of the situation? From what I can tell, we need to record:

  • user enables their own access
  • user disable their own access
  • user gains "basic" access
  • user gains "full" access
  • user has (all? some?) access revoked
  • user has (all? some?) access re-enabled (?)

If so, given the number of events, is (int)event_type still how we want to proceed? A solution in the past has been to add a performer in addition to the event type, so that it can be checked against in the case of someone else changing access. In a similar vein, it might be necessary to add a level column? So that basic vs full access/revokes can be distinguished.

Based on what we've talked about is this the correct understanding of the situation? From what I can tell, we need to record:

  • user enables their own access
  • user disable their own access
  • user gains "basic" access
  • user gains "full" access
  • user has (all? some?) access revoked
  • user has (all? some?) access re-enabled (?)

Also

  • User is granted the right to enable the tool - the ipinfo right
  • User has the right to enable the tool revoked

A solution in the past has been to add a performer in addition to the event type, so that it can be checked against in the case of someone else changing access.

Good point. Since a user can have their access changed, we'll need to log the performer, if there is one.

In a similar vein, it might be necessary to add a level column? So that basic vs full access/revokes can be distinguished.

This could either be encoded as part of the enum, e.g. GRANT_BASIC_ACCESS and GRANT_FULL_ACCESS, or we could include a context field that could store data associated with the logline. I don't see the enum becoming too unwieldy though so my preference would be the former (but it's only a preference).

TThoabala added a subscriber: TThoabala.

Based on a discussion @Niharika and I had w/Legal, this table should log enough information so that there remains a historical record of who had access/accessed information. The following log line was suggested:

“User A (admin/steward level access) checked User B” (at X timestamp?)

Which means it should log:

  • performer
  • target
  • time
  • access level (as of writing, full or basic)

We didn't get to talk in-depth about revocation but it sounded like, and @Niharika if you could confirm or deny I'd appreciate it, the log line would be something like:

"User A revoked access for User B at X timestamp for Y reason and Z time"

Which would mean logging:

  • performer
  • target
  • time
  • reason*
  • duration*

*these would be optional. The same log line could be used for the user giving themself access "User A agreed to ToS at X timestamp"
There wasn't explicit clarification about the reason and duration usages, but they came up while we were spitballing so I'm putting it here for reference just in case.

AFAICT all of these could be stored in the logging table (see https://www.mediawiki.org/wiki/Manual:Logging_table) with the following mapping:

logging table field
performerlog_actor_id
targetlog_user_text
timelog_timestamp
access level, durationlog_params

If applicable, we could insert the reason into the comment table (see https://www.mediawiki.org/wiki/Manual:Comment_table) and set the logging.log_comment_id to the relevant value. This is already done automatically by ManualLogEntry::insert().

Logging to Special:Log is a very-nearly-almost exhaustive manual entry on logging to the table/allowing users to see the table on Special:Log. Further, if Legal were to request that the log is only visible to users with a specific right, we can configure this using $wgLogRestrictions (see https://www.mediawiki.org/wiki/Manual:$wgLogRestrictions).


Edit

I should add that I should have spotted this sooner. Sorry.

We didn't get to talk in-depth about revocation but it sounded like, and @Niharika if you could confirm or deny I'd appreciate it, the log line would be something like:
"User A revoked access for User B at X timestamp for Y reason and Z time"

This looks about right. Agree that the reason and duration may be optional but the others would be required. We also discussed the retention of the logs and their impact on storing this revocation of access. We settled on retaining the logs forever in the meeting we were in but if the logs become too big and unwieldy we might want to reconsider that.

Being bold and moving this to Done: Q2 2021-22 based on our discussion during yesterday's AHT RTL Stand-up & Check-in meeting.

Niharika claimed this task.