Page MenuHomePhabricator

Bring back abuse_filter_history view
Open, NormalPublic

Description

Was dropped for T123895, should be brought back carefully.

Event Timeline

yuvipanda raised the priority of this task from to Needs Triage.
yuvipanda updated the task description. (Show Details)
yuvipanda added a project: Cloud-Services.
yuvipanda added a subscriber: yuvipanda.
Restricted Application added subscribers: StudiesWorld, Aklapper. · View Herald TranscriptJan 18 2016, 10:22 PM
mxn added a subscriber: mxn.Jan 19 2016, 8:22 AM
chasemp triaged this task as Normal priority.May 31 2016, 3:14 PM
Steinsplitter awarded a token.
Krenair added a comment.EditedMar 5 2019, 11:17 AM

Since I was asked about this last night by user meshvogel (no phab?) on IRC, this resources should help someone write a patch:
Need to submit a change to gerrit against this file: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/templates/labs/db/views/maintain-views.yaml
Using https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/AbuseFilter/+/refs/heads/master/abusefilter.tables.sql#63 as the original schema
<Krenair> you'd probably need to respect afh_deleted, and join against abuse_filter to check filter visibility?
There might be additional security checks required that I don't know about. It may be worth reading the AbuseFilter code to understand the correct restrictions that should be applied.

bd808 moved this task from Backlog to Wiki replicas on the Data-Services board.Mar 5 2019, 4:16 PM

Now I'm on phabricator as well (had forgotten to associate it with the ldap account).
I'll be willing to have a try at writing a patch since I really do need the table for a research project.

The description of this ticket also signals we should be mindful of whatever happened in T123895, however I don't have the rights to view it. Can someone point me to where I can request to be given corresponding rights?

Change 498773 had a related patch set uploaded (by Meshvogel; owner: Meshvogel):
[operations/puppet@production] db::views: Bring back abuse_filter_history table

https://gerrit.wikimedia.org/r/498773

Okay, I tried writing a patch as it is. I took care that hidden filters' comments and patterns are not replicated. Otherwise, nothing potentially privacy disturbing came to my mind. I'm happy to hear your feedback (and have the table back^^)

Bstorm added a subscriber: Bawolff.Apr 4 2019, 2:53 PM
Bstorm added a subscriber: Bstorm.
bd808 added a subscriber: bd808.

Adding Security-Team to the tags here and explicitly pinging @Bawolff as the person who had been helping make determinations of public/private data in the Wiki Replica views. Fundamentally we need an official evaluation of what data in the abuse_filter_history table is ok to make public and what if any data needs to be either nulled in the sanitarium layer or masked in the view layer.

Daimona added a subscriber: Daimona.May 9 2019, 3:46 PM

(Copying my comment from gerrit)

I don't really know how things are usually done for quarry, but this change isn't convincing.
It's definitely super-useful to have this table available on quarry (thanks Meshvogel!), given how many times I haven't been able to run some queries on quarry due to its absence.
However, the situation for private filters on-wiki is much different: there, we just hide everything. For instance, nonprivileged people cannot associate filter IDs and names, cannot tell who last changed a filter and when, etc.
I also have to note that there are some inconsistencies, see T21005. Moreover, if we have a tech vandal who's clever enough to scrape quarry in order to retrieve private filters data, then it's likely that filters won't be of much help against them anyway.
Nevertheless, I'd be happier with some more restrictions. Which ones exactly, I don't know. To be paranoid, I can think of (very mild) ways to exploit almost all of those columns, aside from the filter id.

In short, I think this needs further discussion on phab before moving on.

bd808 added a subscriber: JBennett.Jul 16 2019, 2:40 AM

Adding Security-Team to the tags here and explicitly pinging @Bawolff as the person who had been helping make determinations of public/private data in the Wiki Replica views. Fundamentally we need an official evaluation of what data in the abuse_filter_history table is ok to make public and what if any data needs to be either nulled in the sanitarium layer or masked in the view layer.

@JBennett is there anyone on the Security-Team that can investigate/review exposing new tables and columns in the Wiki Replicas these days?

@bd808 - there'd been some long-running (and kind of incomplete) work on this here, I believe: T103011, particularly T103011#3536648. And some more work here: T169097. Regarding r498773, specifically, it looks like @Bawolff had labeled that Safe to replicate but requires view-based redaction. I'll plan to book some time today or tomorrow with @JFishback_WMF to further review.

@bd808 - there'd been some long-running (and kind of incomplete) work on this here, I believe: T103011, particularly T103011#3536648. And some more work here: T169097. Regarding r498773, specifically, it looks like @Bawolff had labeled that Safe to replicate but requires view-based redaction. I'll plan to book some time today or tomorrow with @JFishback_WMF to further review.

Should you have any doubt, please ask. Regarding my comment above, I'm still unsure about what to do here, since we have way too many levels of data protection:

  • In the WebUI we hide everything to nonprivileged people
  • The API may have different restrictions (either broader, stricter, or depending on the field)
  • We plan to make some of this data public anyway (that should be T21005)

For the abuse_filter_history, I'd definitely be fine with hiding the whole row if afh_flags contains 'hidden', if we need a "quick" solution. More specifically, some data should never ever be available (pattern and comments, mostly). The rest of the data can probably be made visible. I'm a bit unsure about that, mainly due to the discrepancies above, the fact that several (I think 3) user rights are involved in determining what to show, and the fact that I can't recall exactly what is shown to who.

I'd definitely be fine with hiding the whole row if afh_flags contains 'hidden'

I'm a little ignorant on parts of this, but is that feasible within maintain-views.yaml? Seems not to support conditional logic like this on its face. So where would we need to control this?

Anyhow, I've booked some time with @JFishback_WMF tomorrow to review. I imagine that might result in us posting more questions here or on the patch set :)

I'd definitely be fine with hiding the whole row if afh_flags contains 'hidden'

I'm a little ignorant on parts of this, but is that feasible within maintain-views.yaml? Seems not to support conditional logic like this on its face. So where would we need to control this?

Yep, that's what the patch is already doing using LOCATE + SQL conditional.

Anyhow, I've booked some time with @JFishback_WMF tomorrow to review. I imagine that might result in us posting more questions here or on the patch set :)

Sure, I should be around as long as my timezone allows it. In case you'll need a quick response during the review, we can try to find another medium.

sbassett added a comment.EditedJul 16 2019, 7:24 PM

Yep, that's what the patch is already doing using LOCATE + SQL conditional.

Ah, ok, so the afh_pattern and afh_comments cols are of the most concern here.

Sure, I should be around as long as my timezone allows it. In case you'll need a quick response during the review, we can try to find another medium.

Ok, I might reach out to you on irc to maybe set something up. Might be the best approach given your AF domain knowledge :) Thanks.

Ah, ok, so the afh_pattern and afh_comments cols are of the most concern here.

Yes, I believe this is one of the few things everyone would agree with.

Ok, I might reach out to you on irc to maybe set something up. Might be the best approach given your AF domain knowledge :) Thanks.

Sure, feel free to :-) Note that I should be available here and on IRC at the same times, so it depends on what time are you meeting at. Either way, I should be around for most of the day UTC+2.

Per conversations w/ @Daimona and @JFishback_WMF today, provided soft +1 with qualifiers on r498773.