
Decide how we want to store how we log which users have access to IP Info [8 hours]
Closed, Resolved, Public

Description

The product requirements are outlined in T264150: User needs to request access to IP information [L].

The list of questions to answer, and what the table needs to do, are outlined in T263756: Create a table to store which users have access to IPInfo, and the timestamp when access was granted [L]

Details

Due Date
Oct 28 2020, 4:00 AM

Event Timeline

I'm not sure why a new table is necessary, see T263756#6501992

Niharika renamed this task from Design table for storing which users have access to the IPInfo feature to Decide how we want to store how we log which users have access to IP Info. Oct 7 2020, 4:42 PM
Niharika updated the task description. (Show Details)
Niharika renamed this task from Decide how we want to store how we log which users have access to IP Info to Decide how we want to store how we log which users have access to IP Info [8 hours]. Oct 7 2020, 4:48 PM
ARamirez_WMF changed the subtype of this task from "Task" to "Deadline". Oct 8 2020, 10:14 PM
ARamirez_WMF set Due Date to Oct 20 2020, 4:00 AM.
ARamirez_WMF changed Due Date from Oct 20 2020, 4:00 AM to Oct 28 2020, 4:00 AM. Oct 21 2020, 6:37 PM

I'm not sure why a new table is necessary, see T263756#6501992

(This is referring to a suggestion that we simply store the preference as a timestamp instead of a bool.)

The main reason for a new table is to be able to store history about who had access and when, and to be able to record whether a user had their access revoked.

These requirements come from T264150: User needs to request access to IP information [L]:

Things to note:

  • T&S/Legal reserve the right to revoke a user’s access permissions in case of abuse
  • If a user’s permission is revoked by us, they should not be able to activate it again
  • Legal would like us to capture who had access to IP Info at any given time in case an incident occurs
  • There is a possibility that users might need to regain access periodically (TBD)

Brief outline

User groups with the ipinfo right can access the IPInfo feature. However, for legal reasons, we need users who actually use the feature to sign an agreement first, and this signature may need to be repeated periodically. We also need to be able to revoke access from individual users, in case of abuse. We also anticipate that users may want to enable and disable the feature at different times.

For these reasons, users with the ipinfo right will also have an ipinfo user preference, which they can enable or disable at any time (T263752). When they enable it, they will need to sign the agreement (via checking a box). Legal should be able to revoke their access, which sets their preference to disabled and prevents the user from re-enabling it.

We also want to be able to log all of these events, so that we can see who had access to the tool at any particular time.

Investigation

I think an ipinfo_log table along the lines of David's suggestion from T263756#6525579 should work (we should rename the fields in accordance with the table naming conventions):

user_id
event_type (an int, which is really an enum)
timestamp

Event types (as concepts)

  • Toggle preference on
  • Toggle preference off
  • Legal revokes access
  • ?Legal restores access (see below)
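As a rough illustration of the proposed table and event types, here is a minimal sketch using Python's built-in sqlite3 module. The integer values of the enum, the column types, the 14-character timestamp string and the helper names create_log_table and log_event are all assumptions made for the example; only the table name and the three fields come from the suggestion above.

```python
import sqlite3
from enum import IntEnum


class EventType(IntEnum):
    # Hypothetical integer values; the proposal only says "an int, which is really an enum".
    PREFERENCE_ON = 1
    PREFERENCE_OFF = 2
    ACCESS_REVOKED = 3
    ACCESS_RESTORED = 4  # only needed if restores are logged instead of deleting revoke rows


def create_log_table(conn):
    # Column types are assumed for illustration, not an agreed schema.
    conn.execute(
        """
        CREATE TABLE ipinfo_log (
            user_id INTEGER NOT NULL,
            event_type INTEGER NOT NULL,
            timestamp TEXT NOT NULL
        )
        """
    )


def log_event(conn, user_id, event_type, timestamp):
    # Append-only: every event becomes a new row.
    conn.execute(
        "INSERT INTO ipinfo_log (user_id, event_type, timestamp) VALUES (?, ?, ?)",
        (user_id, int(event_type), timestamp),
    )


conn = sqlite3.connect(":memory:")
create_log_table(conn)
log_event(conn, 42, EventType.PREFERENCE_ON, "20201007164200")
rows = conn.execute("SELECT user_id, event_type, timestamp FROM ipinfo_log").fetchall()
```

Storing the event type as an int keeps rows small, which may matter if the table grows with the number of users who toggle the preference.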

How each requirement from T264150 is met

  • T&S/Legal reserve the right to revoke a user’s access permissions in case of abuse

When access is revoked:

  • Insert a row with the "Legal revokes access" type
  • Toggle preference off
  • Insert a row with the "Toggle preference off" type (this should happen automatically when the preference is toggled off)
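The revocation steps above could be sketched roughly as follows. This is an in-memory illustration only: the prefs dict, the log list and the function names set_preference and revoke_access are hypothetical stand-ins for the real preference storage and log table.

```python
REVOKED = "legal_revokes_access"
PREF_ON = "toggle_preference_on"
PREF_OFF = "toggle_preference_off"


def set_preference(prefs, log, user_id, enabled, timestamp):
    # Changing the preference always logs the matching toggle event,
    # so callers never have to remember to do it themselves.
    prefs[user_id] = enabled
    log.append((user_id, PREF_ON if enabled else PREF_OFF, timestamp))


def revoke_access(prefs, log, user_id, timestamp):
    # 1. Record the revocation itself.
    log.append((user_id, REVOKED, timestamp))
    # 2. Toggle the preference off; 3. the "off" row is logged as a side effect.
    set_preference(prefs, log, user_id, False, timestamp)


prefs = {7: True}
log = []
revoke_access(prefs, log, 7, "20201021183700")
```

Routing the toggle through one function is what makes the "Toggle preference off" row appear automatically.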

I think we should have a way to restore access, in case access is incorrectly revoked for whatever reason.

When access is restored:

  • Delete any rows with the "Legal revokes access" type (...Or insert a row with the "Legal restores access" type, if we need to keep a record that it was revoked)

  • If a user’s permission is revoked by us, they should not be able to activate it again

When a user tries to enable the preference:

  • Check for rows with type "Legal revokes access" (...And "Legal restores access" if we decided not to delete "Legal revokes access" rows)
  • Disallow if any are found (...Or if the one with the most recent timestamp is "Legal revokes access")
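A minimal sketch of that check, assuming the variant where "Legal restores access" rows are inserted rather than revoke rows deleted. The function name can_enable, the string event names and the tuple layout (user_id, event_type, timestamp) are hypothetical; timestamps are assumed to be fixed-width strings so that string comparison matches chronological order.

```python
REVOKED = "legal_revokes_access"
RESTORED = "legal_restores_access"


def can_enable(log, user_id):
    # Collect only this user's revoke/restore decisions.
    decisions = [
        (timestamp, event_type)
        for (uid, event_type, timestamp) in log
        if uid == user_id and event_type in (REVOKED, RESTORED)
    ]
    if not decisions:
        return True  # never revoked
    # The most recent decision by timestamp wins.
    latest = max(decisions, key=lambda decision: decision[0])
    return latest[1] == RESTORED


log = [
    (1, REVOKED, "20201001000000"),
    (1, RESTORED, "20201005000000"),
    (2, REVOKED, "20201002000000"),
]
```

With this sample log, user 1 was revoked and later restored, user 2 is still revoked, and any user without revoke rows is allowed. If revoke rows were deleted on restore instead, the check would reduce to "no revoke rows exist".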

@Prtksxna @Niharika I think it would be helpful if we could have a UI for revoking/restoring access rather than changing the DB directly (e.g. via a maintenance script) each time.
@Prtksxna We should have a way to communicate to the user why they couldn't set the preference, if they had their access revoked

  • Legal would like us to capture who had access to IP Info at any given time in case an incident occurs

We could work this out from the events in the log table:

  • Following "User toggles preference on", the user has access
  • Following "User toggles preference off", the user does not have access

  • There is a possibility that users might need to regain access periodically (TBD)

We could check the timestamp of the most recent "Toggle preference on" row. When we would run this check depends on how the periodic renewal needs to work.
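Both lookups (who had access at a given time, and how old the most recent "Toggle preference on" row is) could be derived from the log along these lines. This is a sketch under assumptions: the function names, the string event names and the max_age parameter are hypothetical, and the renewal period itself is still TBD. It also relies on a revocation logging a "Toggle preference off" row, as described above.

```python
from datetime import datetime, timedelta

PREF_ON = "toggle_preference_on"
PREF_OFF = "toggle_preference_off"


def had_access_at(log, user_id, when):
    # Look at this user's toggle events up to and including the moment of interest.
    toggles = [
        (timestamp, event_type)
        for (uid, event_type, timestamp) in log
        if uid == user_id and event_type in (PREF_ON, PREF_OFF) and timestamp <= when
    ]
    if not toggles:
        return False  # never enabled before that time
    return max(toggles, key=lambda toggle: toggle[0])[1] == PREF_ON


def has_current_agreement(log, user_id, now, max_age):
    # True only if the most recent "on" toggle is no older than max_age.
    enabled = [ts for (uid, event_type, ts) in log if uid == user_id and event_type == PREF_ON]
    if not enabled:
        return False
    return now - max(enabled) <= max_age


log = [
    (5, PREF_ON, datetime(2020, 10, 1)),
    (5, PREF_OFF, datetime(2020, 10, 10)),
    (5, PREF_ON, datetime(2020, 10, 20)),
]
```

For example, had_access_at(log, 5, datetime(2020, 10, 15)) is False because the latest toggle before that date is "off", while the same call for datetime(2020, 10, 25) is True.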

Answering questions from T263756

  • size of the table (number of rows expected)
  • expected growth per year (number of rows)
  • Expected writes to the table (per minute, per hour...per day, any of those are ok).
  • Expected amount of reads

These will all depend on how many people have access. @Niharika do we know this yet?

I'd expect this to change over time - presumably it would increase a lot once IPs were masked.

  • Can this table be public or private (so we know if it can be replicated to our public cloud infra or it needs to be filtered)

I believe this table would need to be private, as it contains details of user preferences?

  • The release plan for the feature (are there specific wikis you'd like to test first etc)

@Niharika would know more about this.

Thanks for the investigation @Tchanders! This is great.

Brief outline

User groups with the ipinfo right can access the IPInfo feature. However, for legal reasons, we need users who actually use the feature to sign an agreement first, and this signature may need to be repeated periodically. We also need to be able to revoke access from individual users, in case of abuse. We also anticipate that users may want to enable and disable the feature at different times.

For these reasons, users with the ipinfo right will also have an ipinfo user preference, which they can enable or disable at any time (T263752). When they enable it, they will need to sign the agreement (via checking a box). Legal should be able to revoke their access, which sets their preference to disabled and prevents the user from re-enabling it.

We also want to be able to log all of these events, so that we can see who had access to the tool at any particular time.

Investigation

I think an ipinfo_log table along the lines of David's suggestion from T263756#6525579 should work (we should rename the fields in accordance with the table naming conventions):

user_id
event_type (an int, which is really an enum)
timestamp

Event types (as concepts)

  • Toggle preference on
  • Toggle preference off
  • Legal revokes access
  • ?Legal restores access (see below)

How each requirement from T264150 is met

  • T&S/Legal reserve the right to revoke a user’s access permissions in case of abuse

When access is revoked:

  • Insert a row with the "Legal revokes access" type
  • Toggle preference off
  • Insert a row with the "Toggle preference off" type (this should happen automatically when the preference is toggled off)

I think we should have a way to restore access, in case access is incorrectly revoked for whatever reason.

When access is restored:

  • Delete any rows with the "Legal revokes access" type (...Or insert a row with the "Legal restores access" type, if we need to keep a record that it was revoked)

  • If a user’s permission is revoked by us, they should not be able to activate it again

When a user tries to enable the preference:

  • Check for rows with type "Legal revokes access" (...And "Legal restores access" if we decided not to delete "Legal revokes access" rows)
  • Disallow if any are found (...Or if the one with the most recent timestamp is "Legal revokes access")

@Prtksxna @Niharika I think it would be helpful if we could have a UI for revoking/restoring access rather than changing the DB directly (e.g. via a maintenance script) each time.

I am hesitant about us building this UI because we don't know how frequently this might be needed. If I were to guess, I would say it would probably happen once every 6 months. We can make a task for building a UI and wait on it until we have more clarity on that.

@Prtksxna We should have a way to communicate to the user why they couldn't set the preference, if they had their access revoked

I am envisioning the preference would appear disabled in Special:Preferences and we can show text underneath explaining why.

  • Legal would like us to capture who had access to IP Info at any given time in case an incident occurs

We could work this out from the events in the log table:

  • Following "User toggles preference on", the user has access
  • Following "User toggles preference off", the user does not have access

  • There is a possibility that users might need to regain access periodically (TBD)

We could check the timestamp of the most recent "Toggle preference on" row. When we would run this check depends on how the periodic renewal needs to work.

Answering questions from T263756

  • size of the table (number of rows expected)
  • expected growth per year (number of rows)
  • Expected writes to the table (per minute, per hour...per day, any of those are ok).
  • Expected amount of reads

These will all depend on how many people have access. @Niharika do we know this yet?

Per @Prtksxna and my discussion, we think that the feature will be accessible to all autoconfirmed users and above. We will reserve the more private information for admins and checkusers only. But this does not mean that the feature will be of interest to all those users. We don't have a good way to estimate how many users might be interested in seeing this information for patrolling purposes. Any ideas?

I'd expect this to change over time - presumably it would increase a lot once IPs were masked.

Yep, I agree.

  • Can this table be public or private (so we know if it can be replicated to our public cloud infra or it needs to be filtered)

I believe this table would need to be private, as it contains details of user preferences?

I would think so.

  • The release plan for the feature (are there specific wikis you'd like to test first etc)

@Niharika would know more about this.

Current plan is beta cluster first. Yet to determine a set of pilot wikis.

I am hesitant about us building this UI because we don't know how frequently this might be needed. If I were to guess, I would say it would probably happen once every 6 months. We can make a task for building a UI and wait on it until we have more clarity on that.

Ok - we could make a maintenance script in the meantime.

Per @Prtksxna and my discussion, we think that the feature will be accessible to all autoconfirmed users and above. We will reserve the more private information for admins and checkusers only. But this does not mean that the feature will be of interest to all those users. We don't have a good way to estimate how many users might be interested in seeing this information for patrolling purposes. Any ideas?

@cwylo might have some thoughts? We can also ask the DBAs what they think, at the point when we ask for the new table. All autoconfirmed users could be millions of rows though, so we might need to take that into consideration.

Niharika changed the subtype of this task from "Deadline" to "Task". Dec 8 2020, 5:10 AM

@ARamirez_WMF: Hi, the Due Date set for this open task is more than three months ago. Can you please either update or reset the Due Date (by clicking Edit Task), or set the status of this task to resolved in case this task is done? Thanks!