CU 2.0: Logging for Special:Investigate - Adding the data
Closed, Resolved | Public | 5 Estimated Story Points | Mar 12 2020

Description

Goal

We want to add logging to Special:Investigate so there is a record for every lookup made by a checkuser for someone's private information.

Acceptance criteria

Adding to the log:
  • In the initial input form, generate a new log entry for every username/IP being looked up.
  • As the user drills down into the data by clicking the Show all N IPs for this user and Show all N users on this IP, there is a new log entry created for the check with the same reason as they used in the initial form. (Will be done after T244817 and T244816)
  • As discussed previously, we will be using the cul_type (thank God that column is not an ENUM) column to indicate that the edits are coming from the new special page, so we only display those in the log page for Special:Investigate.
Displaying the logs:
  • There is a button on the right side of the Special:Investigate header to access the logs - Logs (T247516)

image.png (192×2 px, 59 KB)

  • That leads to Special:InvestigateLog (T247516) with the page title - Investigation logs (T247024)
  • There is a link under the page title to go back to the main form - Switch to Investigate form (T247516)
  • On the log page, log entries appear as: (T247024)
00:00, 1 January 2020, CheckUser looked up information for 1.1.1.1 (talk | contribs | block) (Reason for investigation)
00:01, 1 January 2020, CheckUser looked up information for Foobar (talk | contribs | block) (Reason for investigation)
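
The first criterion and the display format above can be sketched roughly as follows. This is an illustrative Python model only, not the extension's PHP code; the function names `build_log_entries` and `format_entry` and the dict layout are hypothetical.

```python
from datetime import datetime, timezone


def build_log_entries(checkuser, targets, reason, when):
    """Return one log entry per target; every entry shares the same reason."""
    return [
        {"timestamp": when, "checkuser": checkuser, "target": target, "reason": reason}
        for target in targets
    ]


def format_entry(entry):
    """Render an entry in the display format listed in the acceptance criteria."""
    ts = entry["timestamp"]
    # Build the date by hand so the day of the month is not zero-padded.
    stamp = f"{ts:%H:%M}, {ts.day} {ts:%B} {ts.year}"
    return (
        f"{stamp}, {entry['checkuser']} looked up information for "
        f"{entry['target']} (talk | contribs | block) ({entry['reason']})"
    )


when = datetime(2020, 1, 1, 0, 0, tzinfo=timezone.utc)
entries = build_log_entries(
    "CheckUser", ["1.1.1.1", "Foobar"], "Reason for investigation", when
)
lines = [format_entry(entry) for entry in entries]
```

Submitting two targets in the form yields two entries, which matches the "one entry per username/IP" requirement.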

Notes

  • There will be follow-up ticket(s) to add some filters to the logs like there are right now.
  • There will be follow-up ticket(s) to have some messaging in the UI to let the user know when we are logging something.

Event Timeline

Niharika triaged this task as Medium priority. Feb 19 2020, 7:26 PM
Niharika created this task.

@Niharika will this be a separate log or will we use the existing log?

There will be a new special page for the log, associated with Special:Investigate so we can keep track of the lookups happening through the new page.

Ah! ok, so we'll create a new database table(s) for this. Cool beans.

Another new table? I wonder if we'll run into resistance for that?

Maybe we could "tag" the logs from the new tool and keep them in the same DB, he said without looking at the schema.

The current log is one entry per target. I assume we'll now have one entry with multiple targets. Then again, we could aggregate the tagged log entries, which might be better anyways.

> Another new table? I wonder if we'll run into resistance for that?

I would expect little to no resistance as the new table is for an extension, not MW core. And it won't be storing any private data, just the logs. I don't think anyone would object.

Cool. I wasn't aware of the distinctions. That's good news.

@Tchanders pointed out that there is an existing "type" field on the cu_log table:

-- String indicating the type of query, may be:
-- "useredits", "userips", "ipedits", "ipusers", "ipedits-xff", "ipusers-xff"
cul_type varbinary(30) not null,

I think we could use that to indicate that this is an investigation log?
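
A minimal sketch of that idea, using an in-memory SQLite table as a stand-in for cu_log. The type value `investigate` is hypothetical (the thread does not name one), and the columns other than cul_type are a simplified assumption rather than the real schema.

```python
import sqlite3

INVESTIGATE_TYPE = "investigate"  # hypothetical cul_type value for the new page

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cu_log (
        cul_id INTEGER PRIMARY KEY,
        cul_target_text TEXT NOT NULL,  -- simplified stand-in for the target columns
        cul_reason TEXT NOT NULL,
        cul_type TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO cu_log (cul_target_text, cul_reason, cul_type) VALUES (?, ?, ?)",
    [
        ("1.1.1.1", "Barfoo", "ipedits"),
        ("Foobar", "Barfoo", "userips"),
        ("Foobar", "Reason for investigation", INVESTIGATE_TYPE),
    ],
)


def investigate_log(conn):
    """Entries written by the new special page only."""
    return conn.execute(
        "SELECT cul_target_text, cul_reason FROM cu_log "
        "WHERE cul_type = ? ORDER BY cul_id",
        (INVESTIGATE_TYPE,),
    ).fetchall()


def classic_log(conn):
    """Entries written by the existing check types."""
    return conn.execute(
        "SELECT cul_target_text, cul_reason FROM cu_log "
        "WHERE cul_type != ? ORDER BY cul_id",
        (INVESTIGATE_TYPE,),
    ).fetchall()
```

Because both kinds of entry live in one table, each log page only needs a different filter on cul_type.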

Todos before we can estimate this:

  • Check in with Trust and Safety about what we log here
  • Ensure the schema will not change after we've decided what to log
  • Check in with DBAs about creating a new table

I kind of like the idea of using the existing logging table because then as the user adds additional targets during the course of their investigation, those targets can be added as individual log entries, rather than modifying the existing log entry.

I like this idea of using the same table with a specific cul_type. I agree with David that it provides flexibility and continuity for whatever tools, people, processes might be using the data in this table.

I suppose there's a question about polluting the table before this is officially released. Is there a way to mitigate that? Or is it not a problem to begin with?

The new special page allows you to look up multiple usernames and IPs at once and, when T244816 and T244817 are done, checkusers will be able to add usernames and IPs to an ongoing investigation. It might be more helpful to log all the lookups made in one record, indicating to the end user that they were all part of the same investigation. This relates to the idea @jwang mentioned about being able to uniquely identify cases from the logs. Is it possible to achieve this if we reuse the existing table?

Yes. We could either add a new column to the existing database table (cul_case or something like that) or we could build that into the design of a new table. Do you know if it is time-consuming to get a new column added to a database table?
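
As a rough illustration of the first option, here is a sketch that adds a case column to a simplified stand-in table and groups entries by it. Only the name cul_case comes from the comment above; the other columns, the helper functions, and the use of UUIDs as case identifiers are assumptions.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cu_log ("
    "cul_id INTEGER PRIMARY KEY, "
    "cul_target_text TEXT NOT NULL, "
    "cul_reason TEXT NOT NULL)"
)
# Option 1: add a nullable column to the existing table.
conn.execute("ALTER TABLE cu_log ADD COLUMN cul_case TEXT")


def log_targets(conn, case_id, targets, reason):
    """Write one row per target, all tagged with the same case identifier."""
    conn.executemany(
        "INSERT INTO cu_log (cul_target_text, cul_reason, cul_case) VALUES (?, ?, ?)",
        [(target, reason, case_id) for target in targets],
    )


def targets_by_case(conn):
    """Group logged targets by case so an investigation can be read as a unit."""
    grouped = {}
    rows = conn.execute(
        "SELECT cul_case, cul_target_text FROM cu_log ORDER BY cul_id"
    )
    for case_id, target in rows:
        grouped.setdefault(case_id, []).append(target)
    return grouped


case_a = str(uuid.uuid4())
log_targets(conn, case_a, ["UserA", "UserB"], "Barfoo")
log_targets(conn, case_a, ["1.1.1.1"], "Barfoo")  # target added later, same case
case_b = str(uuid.uuid4())
log_targets(conn, case_b, ["Foobar"], "Other reason")
```

This keeps one row per target while still letting a reader see which lookups belonged to the same investigation.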

I think having a "case" sounds like a wonderful idea. It would allow us to remove the tokens too (if we want).

> In T245662#5916567, @dbarratt wrote:
> Do you know if it is time-consuming to get a new column added to a database table?

Yes, it can be. This is especially true when it means adding a column to a database with many rows. The locking that occurs is very challenging to manage on a hot DB like ours and with our replication needs. However, this is an extension so maybe the expectations are different. I'd look to @Mooeypoo for more cogent guidance. Caveat: I'm not an expert ;)

> I think having a "case" sounds like a wonderful idea. It would allow us to remove the tokens too (if we want).

This smells a bit like scope creep though I like the idea. Given that this is an extension, adding a new table shouldn't be too challenging and we could have this functionality live there. But, if adding a column on the table is way easier than I know about, we could do that instead.

> Todos before we can estimate this:
>   • Check in with Trust and Safety about what we log here

T&S gave their thumbs up to changing the log from being specific about what was retrieved to a more generic version.
Currently:

00:00, 1 January 1970, CheckUser got edits for 1.1.1.1 (talk | block) (Barfoo)
00:00, 1 January 1970, CheckUser got users for 1.1.1.1 (talk | block) (Barfoo)
00:00, 1 January 1970, CheckUser got IP addresses for Foobar (talk | contribs | block) (Barfoo)

New log:

00:00, 1 January 1970, CheckUser looked up information for 1.1.1.1 (talk | contribs | block) (Barfoo)
00:00, 1 January 1970, CheckUser looked up information for Foobar (talk | contribs | block) (Barfoo)

>   • Ensure the schema will not change after we've decided what to log
>   • Check in with DBAs about creating a new table

Looking at the above comments, it seems to me that we can go with the existing table since we aren't changing that much. Let's discuss this more in the planning meeting.

Niharika set the point value for this task to 5. Mar 5 2020, 7:18 PM
Niharika renamed this task from CU 2.0: Logging for Special:Investigate to CU 2.0: Logging for Special:Investigate - Adding the data. Mar 5 2020, 8:09 PM

@Niharika can we have a separate ticket for displaying the "Logs" button at the top right? Or maybe group it with the ticket for "new investigation"? I can't find that task if one already exists.

Yeah - feel free to split it out. There's a task for New investigation button here: T242945: CheckUser 2.0: Provide a way to start a new investigation [xsmall].

Change 578972 had a related patch set uploaded (by Tchanders; owner: Tchanders):
[mediawiki/extensions/CheckUser@master] SpecialInvestigate: Add log entries when investigation is performed

https://gerrit.wikimedia.org/r/578972

@Niharika Should the investigation be re-logged when a check user filters out targets?

Example: A check user starts an investigation with UserA and UserB. A log entry is added for each of UserA and UserB at this point. On the Compare tab, the check user then filters out UserB. Do we think of this as starting a new investigation with UserA alone, in which case we should add a log entry for UserA again? Or do we think of it as continuing the previous investigation, so we shouldn't log anything at this point?

The purpose of the logging is to essentially document the fact that a user's private information was accessed by the checkuser. So we only need to log whenever a new target is added. Filtering out a target should not add any logs.
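
The rule can be sketched as a comparison against the set of targets already logged in the investigation. This is a hypothetical Python illustration; `targets_to_log` and the way logged targets are tracked are assumptions, not the extension's actual code.

```python
def targets_to_log(already_logged, current_targets):
    """Return targets that still need a log entry, in order, without repeats."""
    unique_targets = dict.fromkeys(current_targets)  # de-duplicates, keeps order
    return [target for target in unique_targets if target not in already_logged]


logged = set()

# Initial form submission: both targets are new, so both are logged.
initial = targets_to_log(logged, ["UserA", "UserB"])
logged.update(initial)

# UserB is filtered out on the Compare tab: nothing new was accessed.
after_filter = targets_to_log(logged, ["UserA"])
logged.update(after_filter)

# A new IP is added to the investigation: only that IP is logged.
after_add = targets_to_log(logged, ["UserA", "1.1.1.1"])
logged.update(after_add)
```

Filtering only ever shrinks the target list, so it never produces a target outside the logged set and never writes an entry.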

> @Niharika can we have a separate ticket for displaying the "Logs" button at the top right? Or maybe group it with the ticket for "new investigation"? I can't find that task if one already exists.

> Yeah - feel free to split it out. There's a task for New investigation button here: T242945: CheckUser 2.0: Provide a way to start a new investigation [xsmall].

Have made a new task for the links - T247516. Also annotated the acceptance criteria of this task, to make clear which ones go with which task.

ARamirez_WMF changed the subtype of this task from "Task" to "Deadline".

Change 578972 merged by jenkins-bot:
[mediawiki/extensions/CheckUser@master] SpecialInvestigate: Add log entries when investigation is performed

https://gerrit.wikimedia.org/r/578972

dom_walden added a subscriber: dom_walden.
  • In the initial input form, generate a new log entry for every username/IP being looked up.

Tested investigating usernames, IPs (v4 and v6) and ranges, including multiples of each.

It gets correctly recorded (including Reason) in Special:InvestigateLog, one row per username/IP/range.

  • As discussed previously, we will be using the cul_type (thank God that column is not an ENUM) column to indicate that the edits are coming from the new special page, so we only display those in the log page for Special:Investigate.

The log entries do not appear in Special:CheckUserLog.

We don't record new log entries when you use filters. This makes sense, as you are not actually retrieving any new data about users.

Displaying the logs:

Tested as part of T247024.