Page MenuHomePhabricator

Instrument Special:Investigate using EventLogging
Closed, ResolvedPublic3 Estimated Story Points

Description

Gather data on the adoption and usage of Special:Investigate.

The questions we would like to answer, and how these can be answered by the schema, are outlined in T246951.

The schema is here: https://meta.wikimedia.org/wiki/Schema:SpecialInvestigate

Which actions to log

The schema defines the following actions to be logged:

  • submit
  • query
  • end[1]
  • block
  • pin[2]
  • tool[2]

[1] Logging the 'end' action is dependent on T256888: Add 'End investigation' button to Special:Investigate. As discussed with @Niharika, there are a number of questions to resolve on that task, so logging that action is not included in this task.

[2] These are logged from the client, so we'll add this in a separate task: T257031

Event Timeline

Restricted Application added a subscriber: Aklapper. ยท View Herald TranscriptJun 17 2020, 3:44 PM
ARamirez_WMF set the point value for this task to 3.Jun 17 2020, 4:13 PM
Tchanders updated the task description. (Show Details)Jul 1 2020, 5:08 PM
Tchanders added a subscriber: Niharika.

Change 609150 had a related patch set uploaded (by Tchanders; owner: Tchanders):
[mediawiki/extensions/CheckUser@master] WIP Add instrumentation for Special:Investigate

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/ /609150

Tchanders added a subscriber: dbarratt.

@dbarratt This is good to be reviewed now. It looks like front-end logging is more complicated, and will need to be handled in the WikimediaEvents extension. Since that will require an extra patch and wasn't considered when we estimated this, I've separated that into a separate task: T257031

Tchanders updated the task description. (Show Details)Jul 3 2020, 10:00 AM
Tchanders updated the task description. (Show Details)Jul 8 2020, 12:33 PM

Change 609150 merged by jenkins-bot:
[mediawiki/extensions/CheckUser@master] Add instrumentation for Special:Investigate

https://gerrit.wikimedia.org/r/609150

@Tchanders I find when I submit the "Filters" form during an investigation, the targetsCount parameter is always 0. Is this deliberate?

I notice when POSTing the Filters form, the targets parameter is blank.

In https://meta.wikimedia.org/wiki/Schema:SpecialInvestigate the description for targetsCount is: "Number of targets in the request (if nonzero, that means the original form was submitted). Present if action is 'submit' or 'block'." I am not sure I understand this properly. Should this instead say ...(if zero...?

Can you trace the event back to the original form that was submitted (if that would be even necessary)?

I also seeing if I add a new target to the investigation or submit a filter, two "query" action log events get generated.

I only see one when I first submit an investigation.

For example, adding an IP to an investigation (clicking "Show all users on this IP")

{
  "timestamp": 1594731654,
  "revision": 20261100,
  "uuid": "3b031aec33145b749f39ba228053881d",
  "wiki": "wiki",
  "schema": "SpecialInvestigate",
  "seqId": 38,
  "event": {
    "targetsCount": 1,
    "excludeTargetsCount": 0,
    "action": "submit"
  },
  "userAgent": "Mozilla/5.0 (X11; Linux ppc64le; rv:68.0) Gecko/20100101 Firefox/68.0",
  "webHost": "localhost:8888",
  "recvFrom": "b2742f8e73f1"
}
{
  "timestamp": 1594731654,
  "revision": 20261100,
  "uuid": "d6b66e3d292255718403a6005663bf82",
  "wiki": "wiki",
  "schema": "SpecialInvestigate",
  "seqId": 39,
  "event": {
    "queryTime": 15,
    "resultsCount": 1,
    "relevantTargetsCount": 1,
    "tab": "compare",
    "action": "query",
    "resultsIncomplete": false
  },
  "userAgent": "Mozilla/5.0 (X11; Linux ppc64le; rv:68.0) Gecko/20100101 Firefox/68.0",
  "webHost": "localhost:8888",
  "recvFrom": "b2742f8e73f1"
}
{
  "timestamp": 1594731654,
  "revision": 20261100,
  "uuid": "26b23b67dd2a5b19b5b3b50b76ac966c",
  "wiki": "wiki",
  "schema": "SpecialInvestigate",
  "seqId": 40,
  "event": {
    "queryTime": 15,
    "resultsCount": 1,
    "relevantTargetsCount": 2,
    "tab": "compare",
    "action": "query",
    "resultsIncomplete": false
  },
  "userAgent": "Mozilla/5.0 (X11; Linux ppc64le; rv:68.0) Gecko/20100101 Firefox/68.0",
  "webHost": "localhost:8888",
  "recvFrom": "b2742f8e73f1"
}

@Tchanders I find when I submit the "Filters" form during an investigation, the targetsCount parameter is always 0. Is this deliberate?

This was intentional. The idea was that you could tell whether it was the original form, based on whether targetsCount is 0. However, I hadn't accounted for adding more targets, in which case targetsCount will also be nonzero (it'll be 1).

In https://meta.wikimedia.org/wiki/Schema:SpecialInvestigate the description for targetsCount is: "Number of targets in the request (if nonzero, that means the original form was submitted). Present if action is 'submit' or 'block'." I am not sure I understand this properly. Should this instead say ...(if zero...?

The documentation is incorrect, for the reasons above... I'll fix it.

Can you trace the event back to the original form that was submitted (if that would be even necessary)?

I was assuming we'd use the identifying data plus timestamps if we ever wanted to do that.

I also seeing if I add a new target to the investigation or submit a filter, two "query" action log events get generated.

I'm not sure what's going on here. Adding an IP should only call SpecialInvestigate::addTabContent once, which should only call SpecialInvestigate::logQuery once. The fact that relevantTargetsCount increments looks like two IPs are getting added, but there's only one submit event. Was there definitely no second submit event?

dom_walden added a comment.EditedJul 27 2020, 8:10 AM

I also seeing if I add a new target to the investigation or submit a filter, two "query" action log events get generated.

I'm not sure what's going on here. Adding an IP should only call SpecialInvestigate::addTabContent once, which should only call SpecialInvestigate::logQuery once. The fact that relevantTargetsCount increments looks like two IPs are getting added, but there's only one submit event. Was there definitely no second submit event?

I am pretty sure there is no second submit.

What I am seeing when adding a new target:

  • First, a POST to ?title=Special:Investigate/IPs_%26_User_agents&token=..., which returns a 302
  • Then a GET to the same URL (but I think a different token)

I see this locally and on testwiki.

Perhaps this accounts for the two query events.

I notice whenever I refresh the page it fires another query event.

Weird that the two query events have different parameters, though.

What I am seeing when adding a new target:

  • First, a POST to ?title=Special:Investigate/IPs_%26_User_agents&token=..., which returns a 302
  • Then a GET to the same URL (but I think a different token)

I believe the POST should only be updating the token and performing the redirect, I don't think it is performing a query... so that request shouldn't be logged (afaik).

I am pretty sure there is no second submit.

What I am seeing when adding a new target:

  • First, a POST to ?title=Special:Investigate/IPs_%26_User_agents&token=..., which returns a 302
  • Then a GET to the same URL (but I think a different token)

I see this locally and on testwiki.

Perhaps this accounts for the two query events.

Having look into this properly, @dom_walden is correct about what's happening - filed as T259032.

I notice whenever I refresh the page it fires another query event.

Weird that the two query events have different parameters, though.

It makes sense that this is happening now given the above task - in short we shouldn't be making that first query.

Now, when submitting a new target or filter on an investigation, one "submit" and one "query" event is logged, for example:

{
  "event": {
    "action": "submit",
    "targetsCount": 1,
    "excludeTargetsCount": 0
  },
  "schema": "SpecialInvestigate",
  "revision": 20261100,
  "clientValidated": true,
  "wiki": "wiki",
  "webHost": "dev.wiki.local.wmftest.net:8080",
  "userAgent": "Mozilla/5.0 (X11; Linux ppc64le; rv:68.0) Gecko/20100101 Firefox/68.0"
}
...
{
  "event": {
    "action": "query",
    "relevantTargetsCount": 2,
    "tab": "compare",
    "resultsCount": 3,
    "resultsIncomplete": false,
    "queryTime": 106
  },
  "schema": "SpecialInvestigate",
  "revision": 20261100,
  "clientValidated": true,
  "wiki": "wiki",
  "webHost": "dev.wiki.local.wmftest.net:8080",
  "userAgent": "Mozilla/5.0 (X11; Linux ppc64le; rv:68.0) Gecko/20100101 Firefox/68.0"
}

Also when launching a new investigation one "submit" and one "query" is logged (similar to above).

When switching to another tab (or next page on the same tab) one "query" event is logged. This is an example when switching to the Preliminary Check tab:

{
  "event": {
    "action": "query",
    "relevantTargetsCount": 2,
    "tab": "preliminary-check",
    "resultsCount": 3,
    "resultsIncomplete": false,
    "queryTime": 150
  },
  "schema": "SpecialInvestigate",
  "revision": 20261100,
  "clientValidated": true,
  "wiki": "wiki",
  "webHost": "dev.wiki.local.wmftest.net:8080",
  "userAgent": "Mozilla/5.0 (X11; Linux ppc64le; rv:68.0) Gecko/20100101 Firefox/68.0"
}

Note: When the number of results is greater than the user's results limit, resultsCount is one more than the number of results shown. To check whether or not to paginate results, the database queries get 1 more row than the user's limit. We only show the limited number of rows in the results, but resultsCount counts the number of rows returned by the query.

When submitting a block via Special:InvestigateBlock one "block" event is submitted, such as:

{
  "event": {
    "action": "block",
    "targetsCount": 2,
    "relevantTargetsCount": 2
  },
  "schema": "SpecialInvestigate",
  "revision": 20261100,
  "clientValidated": true,
  "wiki": "wiki",
  "webHost": "dev.wiki.local.wmftest.net:8080",
  "userAgent": "Mozilla/5.0 (X11; Linux ppc64le; rv:68.0) Gecko/20100101 Firefox/68.0"
}
Niharika closed this task as Resolved.Jul 30 2020, 6:09 PM

Note: When the number of results is greater than the user's results limit, resultsCount is one more than the number of results shown. To check whether or not to paginate results, the database queries get 1 more row than the user's limit. We only show the limited number of rows in the results, but resultsCount counts the number of rows returned by the query.

Thanks for catching this. I don't foresee this being an issue but it's good to know.