
CU 2.0: Eventlogging for Special:Investigate [8Hours]
Closed, Resolved · Public · 3 Estimated Story Points · Due: Jun 2 2020

Description

Goal

This task is about instrumenting Special:CheckUser so we can gather the data we need to track adoption and make necessary changes to the tool.

Metrics

We want to gather anonymous data, so we won't be capturing any usernames or IPs. This data will be captured for each wiki that Special:Investigate is deployed on.

# | What we want to know | What can we track | Log event
1 | Adoption for the new special page | How often is Special:Investigate accessed? |
2 | Adoption for the new special page | How often is Special:CheckUser accessed? |
3 | Technical limits | How many users does an investigation start with? |
4 | Technical limits | How many users are under investigation by the time the investigation ends? |
5 | Technical limits | How many records are fetched per investigation? |
6 | User experience | How often did we display incomplete results to the user? |
7 | User experience | How long did a user spend waiting for the results? |
8 | User experience | How often did a user quit the investigation (page close or back button) while the results were being generated? |
9 | Feature usage for the tabs | Over the course of an investigation, how much time did a user spend on Preliminary check versus Compare versus Timeline? OR how often did they access the individual tabs over the course of the investigation? |
10 | Feature usage for the filters | Over the course of an investigation, how many times did a user use filters, and which ones? |
11 | Feature usage for the highlights | How many times did a user pin a highlight? |
12 | Feature usage for blocking | (contingent on T248530) How often did the block feature get used and how many users were blocked? |
13 | Reliance on external websites for IP information | Which tools under the IP address (T250290) were clicked and how many times? |
Open question:
  • How can we track when an investigation "ends"?

Details

Due Date
Jun 2 2020, 7:00 PM

Event Timeline

Niharika triaged this task as Medium priority. (Mar 4 2020, 10:19 PM)
Niharika created this task.
ARamirez_WMF renamed this task from CU 2.0: Eventlogging for Special:Investigate to CU 2.0: Eventlogging for Special:Investigate [8Hours]. (Apr 29 2020, 4:53 PM)
ARamirez_WMF set the point value for this task to 3.
Niharika added a subscriber: dbarratt.

@dbarratt Added a column for the event we want to log, per your suggestion.

ARamirez_WMF changed the subtype of this task from "Task" to "Deadline".
Niharika changed Due Date from May 12 2020, 4:00 AM to May 19 2020, 4:00 AM. (May 7 2020, 3:32 PM)
Niharika changed Due Date from May 19 2020, 4:00 AM to May 19 2020, 7:00 PM.
  • How can we track when an investigation "ends"?

We discussed this in our engineering meeting - it's difficult for a few reasons. We can't detect a tab close, but even if we could, it wouldn't necessarily indicate that the investigation was over, since the URL for this investigation could be reused until it expired. Since the feature is new and only users with high engagement use Special:Investigate, we concluded that the best option could be to have a temporary "Done" button (with a message to indicate that clicking it helps analytics).

I think we should be able to answer most questions with the schemas below. I'm not sure whether the best practice is to use fewer schemas or multiple, simpler schemas, but here's an approach that uses fewer.

Schemas

If I've understood https://www.mediawiki.org/wiki/Extension:EventLogging/Guide correctly, the fields of https://meta.wikimedia.org/wiki/Schema:EventCapsule will be added to each record without us specifying them in the schema; this includes the timestamp and the user's IP and UA.

Back-end schema

property | type | description | required
event | string/enum: 'submit', 'query', 'end', 'block' | 'submit' if the user submitted a form, 'query' if a new results page is loaded, 'end' if the user clicks the 'end' button, 'block' if showing block results | true
targetsCount | int | Number of targets in the request (if nonzero, that means the original form was submitted) | false (present if event is 'submit')
excludeTargetsCount | int | Number of excluded targets in the request | false (present if event is 'submit')
relevantTargetsCount | int | Number of targets that an action is performed on. (NB this can't be calculated from the request.) | false (present if event is 'end', 'block' or 'query')
tab | string/enum: 'preliminary', 'compare', 'timeline' | The current tab | false (present if event is 'query')
startTime | int | Start time for events with a duration | false (present if event is 'query')
resultsCount | int | Number of result rows | false (present if event is 'query')
resultsIncomplete | boolean | Whether the results were incomplete | false (present if event is 'query')
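
For illustration, here are two payloads that would conform to the proposed back-end schema, written as JavaScript object literals purely for readability. The field names come from the table above; the values are invented.

```
// Illustrative only - field names are from the proposed back-end schema,
// values are made up.

// 'submit': the checkuser submitted the initial form with three targets
// and one excluded target.
const submitEvent = {
	event: 'submit',
	targetsCount: 3,
	excludeTargetsCount: 1
};

// 'query': a new page of Compare results was rendered.
const queryEvent = {
	event: 'query',
	relevantTargetsCount: 3,
	tab: 'compare',
	startTime: 1589900000000, // when the request started; waiting time = capsule timestamp - startTime
	resultsCount: 250,
	resultsIncomplete: false
};
```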

Front-end schema

property | type | description | required
event | string/enum: 'pin', 'tool' | 'pin' if the user clicked on a highlight pin, 'tool' if the user clicked on a tool link | true

Implementation details

The 'submit' event could be captured in SpecialInvestigate::onSubmit; 'query' could be captured in SpecialInvestigate::addTabContent; 'end' when the user clicks an 'end' button, and 'block' when the user makes a block, after we know which blocks were successfully made.

The front-end events could be captured in the click handlers, and since neither navigates away from the page, this should be straightforward.
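
As a rough sketch of what those handlers might look like: the schema name 'SpecialInvestigateInteraction' and the CSS selectors below are invented for illustration; mw.loader.using, mw.eventLog.logEvent and jQuery are the existing MediaWiki/EventLogging APIs this assumes.

```
// Sketch only: the selectors and the schema name are hypothetical.
mw.loader.using( 'ext.eventLogging' ).then( function () {
	// 'pin': the user clicked a highlight pin.
	$( '.ext-checkuser-investigate-highlight-pin' ).on( 'click', function () {
		mw.eventLog.logEvent( 'SpecialInvestigateInteraction', { event: 'pin' } );
	} );

	// 'tool': the user clicked one of the external tool links under an IP address.
	$( '.ext-checkuser-investigate-tool-link' ).on( 'click', function () {
		mw.eventLog.logEvent( 'SpecialInvestigateInteraction', { event: 'tool' } );
	} );
} );
```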

Questions 1 and 2

Since I don't have analytics access, I'm not sure whether questions 1 and 2 can just be answered from the data we already capture. @cwylo @jwang Would you know this?

Also, do we want to know how many people visit these URLs (even if they don't have access); how many people land on these pages who can access them; or how many people actually submit a form on these pages? The proposed schemas can answer the last one for Special:Investigate (specifically targetsCount is nonzero only when the first form is submitted).

Questions 3 to 13

For questions that ask what happened per investigation, we may be able to reconstruct whether records came from the same investigation by using their timestamps and IPs...
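
To make that concrete, here is a rough post-processing sketch (in JavaScript only for consistency with the other snippets here; in practice this would more likely be a query against the EventLogging tables). It groups back-end events by the capsule IP and starts a new "investigation" whenever the gap between consecutive events exceeds a cutoff. The field names (ip, timestamp) and the 30-minute cutoff are assumptions.

```
// Sketch: group event records into "investigations" by client IP plus an
// inactivity cutoff. `records` items are assumed to have { ip, timestamp }
// taken from the event capsule; the 30-minute cutoff is arbitrary.
function groupIntoInvestigations( records, maxGapMs ) {
	maxGapMs = maxGapMs || 30 * 60 * 1000;

	const sorted = records.slice().sort( function ( a, b ) {
		return a.timestamp - b.timestamp;
	} );
	const openByIp = {};
	const investigations = [];

	sorted.forEach( function ( record ) {
		const open = openByIp[ record.ip ];
		if ( open && record.timestamp - open.lastSeen <= maxGapMs ) {
			// Same IP, within the cutoff: treat as the same investigation.
			open.events.push( record );
			open.lastSeen = record.timestamp;
		} else {
			// New IP, or too long a gap: start a new investigation.
			const investigation = { events: [ record ], lastSeen: record.timestamp };
			openByIp[ record.ip ] = investigation;
			investigations.push( investigation );
		}
	} );

	return investigations;
}
```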

Question 3: How many users does an investigation start with?

  • answered by targetsCount

Question 4: How many users are under investigation by the time the investigation ends?

  • answered by relevantTargetsCount

Question 5: How many records are fetched per investigation?

  • resultsCount tells us how many result rows were fetched for each new page of results

Question 6: How often did we display incomplete results to the user?

  • answered by resultsIncomplete

Question 7: How long did a user spend waiting for the results?

  • we can time how long the tab takes to load using startTime and the timestamp

Question 8: How often did a user quit the investigation (page close or back button) while the results were being generated?

  • unsolved - this is a bit tricky, but perhaps we could check whether an unload event fires before the DOMContentLoaded event? A rough sketch follows.
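
Very roughly, something like the snippet below could approximate that. An 'abandon' event is not part of the schemas above and the schema name is again hypothetical; it also assumes the EventLogging client can still get the event out while the page is being unloaded (e.g. via navigator.sendBeacon).

```
// Sketch only: 'abandon' and the schema name are hypothetical.
mw.loader.using( 'ext.eventLogging' ).then( function () {
	// In practice this flag would be set when the results table has actually
	// rendered; DOMContentLoaded is used here only as a stand-in.
	let resultsLoaded = document.readyState !== 'loading';

	document.addEventListener( 'DOMContentLoaded', function () {
		resultsLoaded = true;
	} );

	window.addEventListener( 'pagehide', function () {
		if ( !resultsLoaded ) {
			mw.eventLog.logEvent( 'SpecialInvestigateInteraction', { event: 'abandon' } );
		}
	} );
} );
```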

Question 9: Over the course of an investigation how much time did a user spend on Preliminary check versus Compare versus Timeline? OR how often did they access the individual tabs over the course of the investigation?

  • answered by tab, startTime and timestamp

Question 10: Over the course of an investigation how many times did a user use filters and which ones?

  • answered by excludeTargetsCount
  • we'd need something similar for time filters, once they are added

Question 11: How many times did a user pin a highlight?

  • answered by event (front-end schema)

Question 12: How often did the block feature get used and how many users were blocked?

  • answered by relevantTargetsCount (passing the number of targets successfully blocked)

Question 13: Which tools under the IP address were clicked and how many times?

  • answered by event (front-end schema)
ARamirez_WMF changed Due Date from May 19 2020, 7:00 PM to Jun 2 2020, 7:00 PM. (May 21 2020, 3:12 AM)

Since I don't have analytics access, I'm not sure whether questions 1 and 2 can just be answered from the data we already capture. @cwylo @jwang Would you know this?

I don't believe this is already captured. They do capture pageview data for some special pages, but not for sensitive ones. I did a quick check on the Pageviews tool and it seems Special:CheckUser does not have any data; Special:Version does, for instance.

Also, do we want to know how many people visit these URLs (even if they don't have access); how many people land on these pages who can access them; or how many people actually submit a form on these pages? The proposed schemas can answer the last one for Special:Investigate (specifically targetsCount is nonzero only when the first form is submitted).

The last one is fine: how many times they submit a form on these pages. Can we also track that for Special:CheckUser?

Can we also track that for Special:CheckUser?

We can, although I'd imagine there are likely to be multiple submits per investigation, since the form has to be resubmitted for every type of check (users from this IP, IPs from this user, checking more than one target, etc). On the other hand, the Special:Investigate form is only submitted once at the start of an investigation. (A checkuser could start an investigation again for some reason, but it seems less likely.) That might make these numbers difficult to compare. I'm not sure how to solve this...

Following a discussion about this, we decided to compare usage of Special:Investigate and Special:CheckUser using the cu_log table. We talked about dropping the 'submit' event from the schema, but it seems we need to keep it for tracking the count of targets and excluded targets.

This comment was removed by jwang.