
Adding a "Potential COI" alert to the feed
Open, Needs Triage · Public · 5 Story Points

Description

As a Page Curation user, I want to know if a new page was created by a user who triggered Abuse Filter 148 or 149, so that I can determine whether there may be COI or notability issues for me to investigate.

Acceptance Criteria:
• In the page curation list, an article should be flagged if the user triggers Abuse Filter 148 or 149 (due to a correlation between page name and username)

Background: The page curation list should show if a new page could have a potential COI or notability issue because the creator is closely connected to the subject. It should either detect the tag or apply its own check based on signals such as the username. For example, a page called "ExampleIncorporated" was created by a user called "JohnatExampleInc". A matching program could be used to detect whether COI might be a problem with the page.
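To illustrate what such a matching program might look like, here is a minimal sketch in Python. It is not how Abuse Filter 148 or 149 work; it swaps in a longest-shared-substring comparison from the standard library's `difflib`. The function names and the `min_run` threshold are hypothetical and would need tuning against real data.

```python
from difflib import SequenceMatcher


def longest_shared_run(username: str, title: str) -> str:
    """Return the longest run of characters shared by both strings, ignoring case."""
    a = username.casefold()
    b = title.casefold()
    # autojunk=False keeps difflib from discarding frequent characters.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


def looks_related(username: str, title: str, min_run: int = 6) -> bool:
    """Hypothetical heuristic: flag when the shared run is at least min_run long."""
    return len(longest_shared_run(username, title)) >= min_run


print(longest_shared_run("JohnatExampleInc", "ExampleIncorporated"))
print(looks_related("JohnatExampleInc", "ExampleIncorporated"))
```

A looser comparison like this catches the "JohnatExampleInc" example, where neither name fully contains the other, at the cost of more false positives on short or common words.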

This could potentially be accomplished simply by flagging any new article where a user triggered Abuse Filter 148 or 149, which already make matches between usernames and titles of articles/content of edits: (Please see https://en.wikipedia.org/w/index.php?title=Special:AbuseLog&wpSearchFilter=148 and https://en.wikipedia.org/w/index.php?title=Special:AbuseLog&wpSearchFilter=149).

I'm not certain whether the above filters trigger on pages in the draft space (I don't see any draft space pages in the lists at the moment). If they don't, either draft space should be added to the existing abuse filters, or new abuse filters should be made for draft space, so that the AfC new pages feed gets the same benefit from flagging potential COIs.

Note that this should be a filterable criterion as well.

Requested here: https://en.wikipedia.org/wiki/Wikipedia:Page_Curation/Suggested_improvements#81._Adding_a_%22Potential_COI%22_alert_to_the_feed

Event Timeline

Restricted Application added a project: Growth-Team. · Oct 23 2018, 3:37 PM
Restricted Application added a subscriber: Aklapper.
ifried added a subscriber: ifried. · Jul 17 2019, 8:50 PM
kostajh moved this task from Inbox to Triaged but Future on the Growth-Team board. · Jul 18 2019, 1:07 AM
kostajh added a subscriber: kostajh.

Growth-Team will not have time to work on this in the short-to-medium term.

ifried updated the task description. · Jul 23 2019, 11:55 PM

Is there any way to avoid hardcoding the filters that trigger this alert? Could it feed from a template, a page, or some other method that would let en wiki editors maintain this rather than WMF? That way, if 148 and 149 get supplemented or supplanted, we're not locked into old information.

I was going to advise against using AbuseFilter at all. We probably shouldn't have that kind of dependency in Page Curation, especially the reliance on the implementation of a filter which we can't control (later it could get re-purposed but we still assume it's COI-related).

Those two filters do rudimentary checks -- just remove special characters from the username (asterisks, parentheses, etc.) and see if it's in the article title or external links, and vice-versa. We should be able to do this on our own, and similarly limit the checks to "newcomers" via User::getExperienceLevel(). There will of course be false positives, but this is why it's flagged as a "possible issue", just like the ORES and copyvio checks.
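As a rough illustration of the check described above, here is a minimal sketch in Python. It is not the filters' actual rule set: the thread only says that special characters such as asterisks and parentheses are removed, so stripping every non-alphanumeric character is an assumption, as are the function names, the `min_length` guard, and the `"newcomer"` string standing in for the result of `User::getExperienceLevel()`.

```python
import re

# Assumption: strip everything that is not a letter or digit. The filters'
# exact character list is not given in this thread.
_NON_ALNUM = re.compile(r"[\W_]+")


def normalize(text: str) -> str:
    """Drop special characters and whitespace, then fold case."""
    return _NON_ALNUM.sub("", text).casefold()


def username_matches_title(username: str, title: str, min_length: int = 4) -> bool:
    """True if the normalized username is in the title, or vice versa.

    min_length is a hypothetical guard against very short names matching
    almost any title.
    """
    user = normalize(username)
    page = normalize(title)
    if len(user) < min_length or len(page) < min_length:
        return False
    return user in page or page in user


def should_flag(username: str, title: str, experience_level: str) -> bool:
    """Limit the check to newcomers, as suggested above.

    experience_level stands in for whatever User::getExperienceLevel()
    returns; the "newcomer" value is assumed here.
    """
    return experience_level == "newcomer" and username_matches_title(username, title)


print(should_flag("Acme (Corp)*", "History of Acme Corp", "newcomer"))
```

Plain containment like this misses partial overlaps such as "JohnatExampleInc" and "ExampleIncorporated", which is one source of the false negatives to expect alongside the false positives.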

@MusikAnimal That works too and makes sense to me (but I am going to add the idea of wiki ownership over how the tags are applied for some future wishlist because I do think it's a good idea overall).

ifried set the point value for this task to 5. · Aug 1 2019, 5:30 PM
ifried added a comment. · Edited · Aug 1 2019, 10:51 PM

@Insertcleverphrasehere @Barkeep49 We've been able to discuss this ticket as a team. We can most likely take on this work, if there are some adjustments to the scope.

Here is what we propose:

  1. We indicate if there is a match between the username and article title.
  2. We don’t indicate if there is a match between the username and external links in the article. This is due to technical complexities, which would make it difficult to consistently and accurately provide useful data. We came to this conclusion after discussing username + link matching in greater depth. If you would like more technical details, we can certainly share them.
  3. Since this work will specifically check one form of potential abuse, we think this feature should be renamed. Rather than calling it “Potential COI” alert, we can call it “Username in Article Title.”

With this in mind, we have two questions for you:

  1. If we go with this proposal, will this be satisfactory? Or do you feel that it’s not useful in its current scope?
  2. If we go with this proposal, do you prefer that we only check new users (i.e. the current behavior of AbuseFilter 148) or all non-autopatrolled users? If we choose the latter, this may give the “Username in Article Title” some additional functionality that is not found in the current AbuseFilters.

Thanks!

This will need some community feedback. I will post the query to WT:NPR, but please feel free to do so yourself. When MMiller was working on this he was active there, which I know was rather appreciated.

@Barkeep49 Great suggestion. I just posted the proposal to the Page Curation and New Pages Feed Improvements Talk page. I look forward to the feedback. Thanks!

It's better than nothing for sure. If the links edit filter part can't be done it can't be done I suppose. Support the name change as a result.

@Insertcleverphrasehere Is it though? At some point we'll have too many indicators and clutter. COI is definitely worthy of inclusion. Is this pared-down version also worthy? I'm less sure. I'm curious what the discussion at https://en.wikipedia.org/wiki/Wikipedia_talk:New_pages_patrol/Reviewers says.

Yes, it's better than nothing and very useful, but I am also concerned that every new piece of page meta information we start adding to the entries in the feed will introduce clutter and add to a patroller's bewilderment. I'm thinking here in terms of banner blindness, a phenomenon well researched by information scientists, well before the advent of the Internet.
I suggest the info be displayed as:

'Potential issues: COI'

The addition of many more snippets of information may also lead to a slowing down of the loading/rendering of the feed.

Since this ticket is broader in scope, I have created a new ticket to specifically address the 'Username in title' alternative: T233115