Investigation into AntiSpoof maintenance [4H]
Closed, ResolvedPublic

Assigned To

Authored By

	Niharika
	Oct 13 2020, 6:54 PM

Description

We've been asked to be code stewards for AntiSpoof extension. Before we take on this responsibility, we should do research to understand what the extension does and assess how much investment it would be to maintain this extension.

We would also want to look into https://www.mediawiki.org/wiki/Equivset as part of this investigation because AntiSpoof heavily relies on it.

Related Objects

Mentioned Here: T65217: Augment our AntiSpoof normalization data with Unicode/CLDR data
T174197: Split off AntiSpoof equivset generation and string normalization into its own library
T178010: missing character equivalencies: ÈÉÊẼÌÍÏÓÒÔÕ∅Q̃ÚŰÜŨ
T179834: Make sure mappings from AntiSpoof were all moved over to the new Equivset library
T185154: AbuseFilter (and dependencies): code stewardship review
T191736: Bundle AntiSpoof extension with MediaWiki
T246353: Investigate and mitigate trivial bypass to AntiSpoof

Event Timeline

Niharika triaged this task as Medium priority. Oct 13 2020, 6:54 PM

Niharika created this task.

Restricted Application added a subscriber: Aklapper. Oct 13 2020, 6:54 PM

ARamirez_WMF renamed this task from Investigation into AntiSpoof maintenance to Investigation into AntiSpoof maintenance [4H]. Oct 14 2020, 4:37 PM

Niharika renamed this task from Investigation into AntiSpoof maintenance [4H] to Investigation into AntiSpoof maintenance. Oct 14 2020, 4:38 PM

Niharika renamed this task from Investigation into AntiSpoof maintenance to Investigation into AntiSpoof maintenance [4H].

Niharika added a project: AntiSpoof.

Niharika updated the task description.

Niharika moved this task from Triage/To be Estimated to Cards ready for development on the Anti-Harassment board. Oct 21 2020, 4:21 PM

Just to give some context, AntiSpoof has two main use cases (which don't always align perfectly well):

It is used by AbuseFilter to help filter for vandalism in edits.
It is used to prevent the registration of usernames that closely match existing usernames.
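
As a rough illustration of the second use case, here is a minimal sketch of equivalence-based username matching. The tiny `EQUIV` table and the `normalize` and `is_spoof` helpers are hypothetical and invented for this example; they are not the actual Equivset data or the AntiSpoof API, which are written in PHP and cover far more characters.

```python
# Hypothetical, hand-picked equivalence table (an assumption for illustration).
# The real Equivset data set is much larger and is maintained separately.
EQUIV = {
    "0": "O",        # digit zero looks like capital O
    "1": "I",        # digit one looks like capital I
    "\u0430": "A",   # Cyrillic small letter a
    "\u0435": "E",   # Cyrillic small letter ie
    "\u043e": "O",   # Cyrillic small letter o
}


def normalize(name: str) -> str:
    """Map each character to a representative; unmapped characters are uppercased."""
    return "".join(EQUIV.get(ch, ch.upper()) for ch in name)


def is_spoof(candidate: str, existing_names) -> bool:
    """Return True if the candidate normalizes to the same string as any existing name."""
    taken = {normalize(name) for name in existing_names}
    return normalize(candidate) in taken


if __name__ == "__main__":
    print(is_spoof("B0b", ["Bob"]))        # digit zero in place of "o"
    print(is_spoof("B\u043eb", ["Bob"]))   # Cyrillic "o" in place of Latin "o"
    print(is_spoof("Alice", ["Bob"]))
```

The same normalized form could in principle be compared against edit text instead of usernames, which is roughly where the two use cases diverge: a mapping that is aggressive enough to catch vandalism may be too aggressive for blocking registrations.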

ARamirez_WMF moved this task from Cards ready for development to The Letter Song on the Anti-Harassment board. Nov 5 2020, 6:19 PM

ARamirez_WMF edited projects, added Anti-Harassment (The Letter Song); removed Anti-Harassment.

ARamirez_WMF moved this task from The Letter Song to Cards ready for development on the Anti-Harassment board.

ARamirez_WMF edited projects, added Anti-Harassment; removed Anti-Harassment (The Letter Song).

ARamirez_WMF moved this task from Cards ready for development to The Letter Song on the Anti-Harassment board. Nov 23 2020, 7:16 PM

ARamirez_WMF edited projects, added Anti-Harassment (The Letter Song); removed Anti-Harassment.

STran claimed this task. Nov 24 2020, 9:26 PM

STran moved this task from Ready 🎬 (ONLY IF YOU HAVE NO MORE CODE TO REVIEW) to In Progress 💪 on the Anti-Harassment (The Letter Song) board.

I looked into both AntiSpoof and Equivset and, to a lesser degree, AbuseFilter (since it uses Equivset as well).

I looked at GitHub's metrics for the codebases (commit frequency, lines added/removed over time, and what the commits were).
For AntiSpoof:
https://github.com/wikimedia/mediawiki-extensions-AntiSpoof/graphs/commit-activity
https://github.com/wikimedia/mediawiki-extensions-AntiSpoof/graphs/code-frequency
https://github.com/wikimedia/mediawiki-extensions-AntiSpoof/commits/master

For Equivset:
https://github.com/wikimedia/Equivset/graphs/commit-activity
https://github.com/wikimedia/Equivset/graphs/code-frequency
https://github.com/wikimedia/Equivset/commits/master

Based on these charts, neither library sees much activity (possibly a result of not having code stewards?), but at the code level both seem to be doing mostly what they're expected to do. Before 2017, AntiSpoof and Equivset were the same thing (https://phabricator.wikimedia.org/T174197). AntiSpoof in its current iteration is a wrapper around Equivset functionality and as such is a fairly simple library. Equivset was split out of AntiSpoof and has apparently not seen any major changes in how it functions (the bump in 2018 is the addition of math character mappings: https://github.com/wikimedia/Equivset/commit/4464b4454b48fe2d79b6b84f0810394d6db6b776). Looking at their Phabricator projects, I can also see that (compared to, say, AbuseFilter) there aren't any critical or high-priority issues with either library.

AntiSpoof phab project: https://phabricator.wikimedia.org/project/profile/257/
Equivset phab project: https://phabricator.wikimedia.org/project/profile/3068/

There was an earlier code stewardship review that covered AbuseFilter, AntiSpoof, and Equivset (https://phabricator.wikimedia.org/T185154). Everyone agreed that AbuseFilter was very critical, but there was not necessarily an opinion on AntiSpoof or Equivset.

Since then, the statuses for AntiSpoof/Equivset have not changed much. There are 11 open tasks in Equivset (https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-5i5bluezh4htet5xjkd4&statuses=open()&group=none&order=newest#R) and 28 open tasks (27 if you exclude this investigation) in AntiSpoof (https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-sx5kds2srrtinusmybsj&statuses=open()&group=none&order=newest#R).

In AntiSpoof, 8 issues need triage, 8 are medium priority, and 11 are low priority. 5 issues in AntiSpoof also have the Equivset tag and mostly relate to deciding whether or not to add new character equivalencies. AntiSpoof will (eventually?) be bundled with MediaWiki (https://phabricator.wikimedia.org/T191736). In Equivset, 6 issues need triage, 2 are medium priority, and 3 are low priority.

Skimming both backlogs, it seems like there's discussion on whether or not AntiSpoof is calibrated correctly, what to add to Equivset, and whether or not Equivset is the correct system for AntiSpoof (I see confusables.txt floated a lot, and there is some interesting discussion here: https://phabricator.wikimedia.org/T65217). As far as I can tell, none of this discussion has led to any further investigation. It seems like we agree that it will not be a perfect system, but we haven't agreed on what sort of imperfect we want. I'd like to bring up this point from Huji as a concern (https://phabricator.wikimedia.org/T246353#6215366): since we don't have robust tests, it would be difficult to make changes and know they didn't cause regressions (https://phabricator.wikimedia.org/T179834).
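
To make the testing concern concrete, here is a minimal sketch of a "golden" regression check for a character mapping. Everything in it (`make_normalizer`, `find_regressions`, the sample tables and golden cases) is hypothetical and only illustrates the idea; it is not how Equivset or AntiSpoof are currently tested.

```python
from typing import Callable, Dict, List, Tuple


def make_normalizer(table: Dict[str, str]) -> Callable[[str], str]:
    """Build a normalizer from a mapping; unmapped characters are uppercased."""
    def normalize(text: str) -> str:
        return "".join(table.get(ch, ch.upper()) for ch in text)
    return normalize


def find_regressions(
    normalize: Callable[[str], str], golden: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (input, expected, actual) for every golden case whose output changed."""
    failures = []
    for raw, expected in golden.items():
        actual = normalize(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures


# Hypothetical baseline mapping and the outputs recorded from it.
BASE_TABLE = {"0": "O", "1": "I"}
GOLDEN = {"b0b": "BOB", "al1ce": "ALICE"}

# A hypothetical edit that remaps "1" to "L" silently changes an existing result.
EDITED_TABLE = {**BASE_TABLE, "1": "L"}

if __name__ == "__main__":
    print(find_regressions(make_normalizer(BASE_TABLE), GOLDEN))
    print(find_regressions(make_normalizer(EDITED_TABLE), GOLDEN))
```

With recorded outputs like these checked in, a proposed change to the equivalence data would at least show which existing normalizations it alters, even if deciding whether each change is desirable still needs human judgment.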

I was curious how (and if) changes to Equivset would affect AbuseFilter, since kaldari mentions that Equivset is more tailored toward AbuseFilter than AntiSpoof (https://phabricator.wikimedia.org/T246353#6533626). I tried doing a search based on tags, but that didn't surface much. However, reading through the comments on some of these issues, it seems like if we could improve the filtering system, we would be making moderators' lives easier (example from ptwiki: https://phabricator.wikimedia.org/T178010#3680739) and possibly improving performance (some discussion around that: https://phabricator.wikimedia.org/T185154#3928483).

Also potentially of interest, AbuseFilter is undergoing an overhaul: https://phabricator.wikimedia.org/project/view/4939/

I think there could be work, mostly focused on improving code health (tests, regressions, and performance), but it doesn't seem like these things have become critical pain points (yet?).

STran moved this task from In Progress 💪 to Code Review 🔍 on the Anti-Harassment (The Letter Song) board. Nov 26 2020, 6:25 AM

@kaldari We discussed this as a team today and agreed to take on code stewardship for AntiSpoof and Equivset. Equivset is also a dependency for AbuseFilter and there is a higher potential for bugs/feature requests with it. Flagging that in case there might be bugs/feature requests that we are unable to take on because of our existing workloads.

I've updated the Developers/Maintainers list.

kaldari moved this task from Code Review 🔍 to Done: Q2 (2020-2021) ✅ on the Anti-Harassment (The Letter Song) board. Dec 9 2020, 11:52 PM

Niharika closed this task as Resolved. Dec 10 2020, 2:24 PM
