
Generate report via logstash of pl.wikipedia thanks-notification rate limiting
Closed, Resolved | Public | 5 Estimated Story Points

Description

Look at the logstash of thanks-notification throttling on pl.wikipedia (since June 30, 2017)

  • How many users per day hit the threshold
  • How many of these users have 0 edits vs. 1+ edits
  • How else can we determine if these accounts are Wikinger (the harasser responsible for the Thanks spam) vs 'good' accounts?

Event Timeline

dbarratt changed the visibility from "Custom Policy" to "Public (No Login Required)".

I don't see any reason this should be private, so making public.

dbarratt set the point value for this task to 5. Aug 31 2017, 5:06 PM

I think T169268#3556789 was the reason that this task was set private.

I think T169268#3556789 was the reason that this task was set private.

I see, well let me clarify:

How many users per day hit the threshold

This will be a number and not include any personally identifiable information.

How many of these users have 0 edits vs. 1+ edits

This will also be a number and not include any personally identifiable information.

How else can we determine if these accounts are Wikinger (the harasser responsible for the Thanks spam) vs 'good' accounts?

I don't know what this would be, but it looks like it would be some sort of filter on the first number to produce a lower number.

I think research into aggregated personal information should be public so everyone is aware of what data is being queried and how we are using that data.

Regardless, if the results of that query should remain private, we can always send the results directly to only those who need to see it.

I just wanted to point it out because of

I don't see any reason this should be private, so making public.

so that no mistakes happen. It doesn't necessarily align with my opinion about this.

dbarratt changed the task status from Open to Stalled. Sep 3 2017, 10:02 PM

I am able to access https://logstash.wikimedia.org now, but I'll need to use Logstash's API in order to get the stats you're looking for, and I'm not sure how to do that (it doesn't seem to be in the documentation anywhere).

Apparently the logs can also be accessed directly on mwlog1001.eqiad.wmnet in the folder /srv/mw-log/

dbarratt changed the task status from Stalled to Open. Sep 4 2017, 3:29 PM

I figured out how to access the Logstash API and updated the docs.
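For anyone trying to reproduce this: Logstash data is commonly stored in Elasticsearch, so a search request body in the Elasticsearch query DSL is one plausible shape for such a query. The sketch below only builds the request body and sends nothing. The helper name `build_throttle_query`, the field names `wiki` and `channel`, and the channel value are assumptions for illustration (they are not taken from this task or from the updated docs), and it assumes those fields are indexed for exact-match lookups. `@timestamp` is the standard Logstash event timestamp field.

```python
import json


def build_throttle_query(wiki, since, channel, size=1000):
    """Build an Elasticsearch search body for throttle log events.

    All field names except "@timestamp" are hypothetical and must be
    adjusted to whatever the real log events contain.
    """
    return {
        "size": size,
        # Oldest events first, so per-day grouping can stream in order.
        "sort": [{"@timestamp": {"order": "asc"}}],
        "query": {
            "bool": {
                # "filter" clauses match without affecting relevance scoring.
                "filter": [
                    {"term": {"wiki": wiki}},
                    {"term": {"channel": channel}},
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        },
    }


# Example values are placeholders, not the real channel name.
body = build_throttle_query("plwiki", "2017-06-30", "example-throttle-channel")
print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the JSON body of a search request; which endpoint and credentials to use depends on the deployment and is not covered here.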

Here's your report. The first and last day will be incomplete.

+------------+------+--------------+------------------------------+-------------------------------------+
| Date       | Hits | Unique Users | Unique Users with Zero Edits | Unique Users with One or More Edits |
+------------+------+--------------+------------------------------+-------------------------------------+
| 2017-08-05 | 2    | 1            | 1                            | 0                                   |
| 2017-08-07 | 1    | 1            | 0                            | 1                                   |
| 2017-08-08 | 8    | 6            | 3                            | 3                                   |
| 2017-08-10 | 8    | 1            | 1                            | 0                                   |
| 2017-08-13 | 2    | 2            | 0                            | 2                                   |
| 2017-08-25 | 1    | 1            | 0                            | 1                                   |
| 2017-08-28 | 3    | 3            | 1                            | 2                                   |
| 2017-09-02 | 1    | 1            | 1                            | 0                                   |
| 2017-09-03 | 1    | 1            | 0                            | 1                                   |
+------------+------+--------------+------------------------------+-------------------------------------+

Days that are missing had zero hits.

I wrote a small script that aggregates this data from a logstash search. I can put the source code somewhere if anyone would like to audit how I came up with this.
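Until the actual script is posted, here is a minimal sketch of how such a per-day aggregation could work, for anyone who wants to check the column definitions. It is not the script used for the report above. The record keys `timestamp`, `user` and `edit_count` are assumed names, the edit count is assumed to have been looked up separately for each user, and the sample records are made up.

```python
from collections import defaultdict


def aggregate_hits(records):
    """Summarise throttle hits per calendar day.

    Each record is a dict with the (assumed) keys:
      timestamp  - ISO 8601 string, e.g. "2017-01-01T10:00:00Z"
      user       - user name that hit the rate limit
      edit_count - that user's edit count
    """
    per_day = defaultdict(lambda: {"hits": 0, "users": {}})
    for rec in records:
        day = rec["timestamp"][:10]  # the YYYY-MM-DD prefix
        bucket = per_day[day]
        bucket["hits"] += 1
        # Keyed by user, so repeated hits count as one unique user.
        bucket["users"][rec["user"]] = rec["edit_count"]

    rows = []
    for day in sorted(per_day):
        users = per_day[day]["users"]
        zero = sum(1 for count in users.values() if count == 0)
        rows.append({
            "date": day,
            "hits": per_day[day]["hits"],
            "unique_users": len(users),
            "zero_edits": zero,
            "one_or_more_edits": len(users) - zero,
        })
    return rows


# Made-up sample data, not taken from the real logs.
sample = [
    {"timestamp": "2017-01-01T10:00:00Z", "user": "A", "edit_count": 0},
    {"timestamp": "2017-01-01T10:05:00Z", "user": "A", "edit_count": 0},
    {"timestamp": "2017-01-01T11:00:00Z", "user": "B", "edit_count": 12},
    {"timestamp": "2017-01-02T09:00:00Z", "user": "B", "edit_count": 12},
]
rows = aggregate_hits(sample)
for row in rows:
    print(row)
```

With this grouping, the two edit-count columns always sum to the unique-user column for each day, and days without any hits produce no row at all.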

Thank you, David. I'll continue the conversation on the parent ticket.

I wrote a small script that aggregates this data from a logstash search. I can put the source code somewhere if anyone would like to audit how I came up with this.

It could be useful in the future. Can you put it into a Phabricator paste and link it here? (https://phabricator.wikimedia.org/paste/)