Page MenuHomePhabricator

Cannot add a previously used change tag to an abuse filter
Open, NormalPublic

Description

Wikidata has had adding invalid links tag. Since August 21, this tag has become unused by a change of a filter. Now, I want to add this tag to filter 97 but I'm not able to. The error message is One or more of the tags you specified is not valid. Tags should be short, they should not contain special characters, and they should not be reserved by other software. Try choosing a new tag name (abusefilter-edit-bad-tags).

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 23 2017, 12:17 PM
Anomie removed a subscriber: Anomie.Aug 23 2017, 4:38 PM

While I created PS1 there, PS1 was limited to just validating the characters used in the tag name. Unfortunately later revisions (in which I had no involvement) to that patch went well beyond that.

[Wc9aqgpAMFoAAETe2kEAAAAS] 2017-09-30 08:50:46: Kritická výjimka typu „Wikimedia\Rdbms\DBQueryTimeoutError“ is another result.

I had the same problem after moving our wiki from one host to another (importing stuff externally, no database dump available), and had the same problem importing filters from the old wiki... Since I had database access on the new wiki I had to use tags with different names, and then ran an update on the table to change them back to what they should be, because the system didn't let me use the same tag names as the old wiki were using...

I'm wondering if it checks if the tag is in use, but it's not currently being used on any filter, it doesn't let you use it even if it was used by a filter before.

[Wc9aqgpAMFoAAETe2kEAAAAS] 2017-09-30 08:50:46: Kritická výjimka typu „Wikimedia\Rdbms\DBQueryTimeoutError“ is another result.

A database query timeout has occurred. 
Error: 2062 Read timeout is reached (10.64.48.25)
#0 /srv/mediawiki/php-1.31.0-wmf.1/includes/libs/rdbms/database/Database.php(979): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#1 /srv/mediawiki/php-1.31.0-wmf.1/includes/libs/rdbms/database/Database.php(1361): Wikimedia\Rdbms\Database->query(string, string)
#2 /srv/mediawiki/php-1.31.0-wmf.1/includes/changetags/ChangeTags.php(1390): Wikimedia\Rdbms\Database->select(string, array, array, string, array)
#3 /srv/mediawiki/php-1.31.0-wmf.1/includes/libs/objectcache/WANObjectCache.php(1012): Closure$ChangeTags::tagUsageStatistics(boolean, integer, array, NULL)
#4 /srv/mediawiki/php-1.31.0-wmf.1/includes/libs/objectcache/WANObjectCache.php(914): WANObjectCache->doGetWithSetCallback(string, integer, Closure$ChangeTags::tagUsageStatistics;2024, array)
#5 /srv/mediawiki/php-1.31.0-wmf.1/includes/changetags/ChangeTags.php(1404): WANObjectCache->getWithSetCallback(string, integer, Closure$ChangeTags::tagUsageStatistics;2024, array)
#6 /srv/mediawiki/php-1.31.0-wmf.1/includes/changetags/ChangeTags.php(1019): ChangeTags::tagUsageStatistics()
#7 /srv/mediawiki/php-1.31.0-wmf.1/extensions/AbuseFilter/includes/Views/AbuseFilterViewEdit.php(72): ChangeTags::canCreateTag(string)
#8 /srv/mediawiki/php-1.31.0-wmf.1/extensions/AbuseFilter/includes/Views/AbuseFilterViewEdit.php(195): AbuseFilterViewEdit->isAllowedTag(string)
#9 /srv/mediawiki/php-1.31.0-wmf.1/extensions/AbuseFilter/includes/special/SpecialAbuseFilter.php(115): AbuseFilterViewEdit->show()
#10 /srv/mediawiki/php-1.31.0-wmf.1/includes/specialpage/SpecialPage.php(522): SpecialAbuseFilter->execute(string)
#11 /srv/mediawiki/php-1.31.0-wmf.1/includes/specialpage/SpecialPageFactory.php(578): SpecialPage->run(string)
#12 /srv/mediawiki/php-1.31.0-wmf.1/includes/MediaWiki.php(287): SpecialPageFactory::executePath(Title, RequestContext)
#13 /srv/mediawiki/php-1.31.0-wmf.1/includes/MediaWiki.php(851): MediaWiki->performRequest()
#14 /srv/mediawiki/php-1.31.0-wmf.1/includes/MediaWiki.php(523): MediaWiki->main()
#15 /srv/mediawiki/php-1.31.0-wmf.1/index.php(43): MediaWiki->run()
#16 /srv/mediawiki/w/index.php(3): include(string)
#17 {main}

on /wiki/Special:AbuseFilter/97 on wikidatawiki.

Huji added a subscriber: Huji.Dec 30 2017, 4:12 PM

Even when a tag is actively being used by another filter, why should it not be available to another filter? Why shouldn't multiple filters be able to assign the same tag?

I just faced this trouble when testing OOUI tag selector on my wiki. Surely, this is something annoying. The culprit is canAddTagsAccompanyingChange called here. There could be some reasons not to have the same tag applied by different filters at the same time, but I see no reason not to allow users to re-add it again if it was disabled. Actually, there's a tricky way to do it, these are the steps:

  • Create a filter which uses the tag 'testingtag' and save it
  • Edit the filter removing the tag and save. Now it can't be re-added.
  • If you go to Special:tags you'll notice that 'testingtag' is marked as disabled.
  • Manually activate it, it'll became manually applied
  • Go back to the filter; now you can re-add 'testingtag', which will be considered both manual and automatic
  • Going back to Special:tags you can disable it again, leaving only the automatic part.

This should be avoided, and instead we should perform better checks.

BTW, https://gerrit.wikimedia.org/r/#/c/421487/ is stalled until we decide what to do with this. Right now, it makes no sense to provide a list of already defined tags (which isn't really straightforward to generate), and depending on how we decide to fix this, that one will need a specific change. In any case, in deciding how to fix this behaviour we need to make sure that nothing won't break (i.e. there must be a reason if currently this isn't possible).

TTO added subscribers: Anomie, TTO.Apr 19 2018, 12:10 AM

Even when a tag is actively being used by another filter, why should it not be available to another filter? Why shouldn't multiple filters be able to assign the same tag?

It's not set up this way on purpose - it's a bug, hence this task :)

The culprit is canAddTagsAccompanyingChange called here.

I'm pretty sure this is not the right function to be calling. It's intended to check whether users can manually add tags accompanying a change. Perhaps the username parameter being optional has caused confusion - this is intended for the case when you don't have a specific user in mind, but you still need to check whether a generic non-blocked user with suitable permissions can add a particular set of tags. But in any case, the AbuseFilter is software, not a user, so it should not be calling this function.

This makes sense. However I'm not sure about the reason of adding that check: what undesired behaviour is it intended to avoid? Is there some problem somewhere else in the code if we omit that check? To me this isn't clear, and I'd also like to add @Mattflaschen-WMF who wrote the patch.

TTO added a comment.Apr 19 2018, 8:42 AM

This was added by @Anomie in rEABFcec8352e5b8f: Improve tag name validation, wasn't it? Or are we talking about different things?

@TTO Anomie abandoned the commit after PS1. It was then restored and deeply changed by Mattflaschen.

TTO added a comment.Apr 19 2018, 9:46 AM

Ah, I see. I knew there was a sordid history here (the original patch goes back to 2015). In any case, Matt Flaschen's account is disabled, so I don't think he will be commenting here.

@TTO Right. Actually, I find it quite tricky to properly solve the problem here, but my proposal is to rework the way AbuseFilter stores used tag in the following way.
Currently, when fetching used tags we run a query for existing filters which have tagging enabled and aren't deleted, and return the list of so gathered tags. This way, if a tag has been removed from its filter, it won't be listed.
Consequently, when we run AbuseFilterViewEdit->isAllowedTag, this is the situation:

  • $canAddStatus is false because the tag can't be manually added;
  • $alreadyDefinedTags uses the above hook which doesn't include deactivated tags, so the tag isn't listed and the status isn't good;
  • $canCreateTagStatus is then obviously false because the tag already exists.

So the tag is considered invalid. What should happen is that the second check passes and the tag is considered valid. To achieve this, fetchAllTags isn't enough. We need to store used tags somewhere else (some DB table or external storage, like we do for var dumps), and use such storage in fetchAllTags. Not only this will allow more flexibility (= tag reuse), but will probably make fetchAllTags quicker, since it'll only have to query a dedicated and presumably smaller table.
Apart from the change itself (which shouldn't be hard, but quite fragile), we need to decide when tags should be removed from such table (if they should be) and of course adapt existing methods to the new table without breaking anything. I'll try to give it a look in the next days.

P.S. I don't know whether canAddTagsAccompanyingChange should be used here, but now I see it's not its fault and it shouldn't cause any harm to leave it in place, right?

I tried to figure out where we may store our data. External storage doesn't seem suitable, while DB would be an overkill: basically we only need a single cell to store a list of tags. I couldn't find any existing AF table where we may add a dedicated column, so we'd need to create a new table with a single row and a single column. As I was saying, an overkill. However I'm not aware of any alternative, and I'd be glad to hear some. Also note that this would then need DBA approval, as well as a maintenance script to populate the new table.

Daimona claimed this task.Apr 22 2018, 11:53 AM
Daimona triaged this task as Normal priority.
Daimona added a project: DBA.

Actually we also need some other info (at least enabled and global), so the table can't be a single-row one. I'm writing this patch and hope to have it ready in a few days. Adding DBA since we're adding a table and need approval.

TTO added a comment.EditedApr 23 2018, 1:39 AM

Let's just take a step back here: what problem are we trying to solve here? It seems that we need to allow tags used by other abuse filters, but not tags that are manually defined or in use by other extensions or the MediaWiki software.

If the DBAs are happy to allow the queries in fetchAllTags to be run more frequently, I wouldn't advise changing that function too drastically. We would simply need to check as follows:

  1. Is the tag name valid? If not, bail out.
  2. Is the tag is in use by an existing abuse filter? If so, it is OK.
  3. Is the tag not used by an existing abuse filter and listed in ChangeTags::listDefinedTags()? If so, it is not OK. Otherwise, the tag is OK.

No new table needed (unless the DBAs are unhappy about the queries). The call to canAddTagsAccompanyingChange seems misguided, and even the call to canCreateTag is very doubtful (the comment for that function is "Is it OK to allow the user to create this tag?", but AbuseFilter is not a user).

@TTO Not exactly. I agree until (3.), but IMHO there's another check (to be implemented in 2.):

  • Was the tag ever used by an abuse filter? If so, the tag is OK.

The first 3 checks are the ones currently performed, but fetchAllTags won't return a tag which was previously used by AF. I think this is some kind of intentional technical debt, looking at the comment "this is a pretty awful hack". To retrieve all AF-defined tags, we should query abuse_filter_history, which would introduce a performance drop. Or, better, we may store tags in a separate table to track them. I should be able to send a patch tonight, in order to have some concrete basis to build on.

TTO added a comment.Apr 23 2018, 6:15 AM
  • Was the tag ever used by an abuse filter? If so, the tag is OK.

Is that really needed, though? Wouldn't tags that were previously used by an abusefilter, but not currently used by one, get approved by the "Otherwise, the tag is OK" at the very end? Can you provide me a scenario where, running through the procedure, the extra check you're suggesting would be required?

@TTO Yes, it's the main problem for which this task was created :-) However, I didn't clearly understand your point (3.) at first. I just realized that maybe we don't need the extra table. I'll do do some tests later and will let you know.

TTO added a comment.Apr 23 2018, 6:37 AM

@TTO Yes, it's the main problem for which this task was created :-)

I meant a scenario in my new, proposed procedure :)

Glad it makes sense now.

@TTO Indeed :-) However, I changed my mind again. We're currently switching AF to OOUI, and in the editing interface we'd like to provide a list of available tags to apply. With your method, this isn't possible (you cannot list every non-existent tag), while with a separate table it would be enough to list its content. For this specific problem, the 2 solutions are equivalent, they only differ in how things are processed but give the same result (although mine requires a new table). Again, I'll hopefully send the patch later, so that we can discuss looking at that

Adding DBA since we're adding a table and need approval.

If the DBAs are happy to allow the queries in fetchAllTags to be run more frequently,

Could you please provide me to pointers to the code or previous conversations, I am missing lots of context here. Please update the description with the concrete proposed solution, or mark it as "still in discussion" if there is disagreements.

@jcrespo we don't have any code yet, but I'll send my patch later in the afternoon. Then we'll have to discuss about the approach, see if the proposed one is fine and then summarize it here. So yes, this is still in discussion.

TTO added a comment.Apr 23 2018, 12:00 PM

With your method, this isn't possible (you cannot list every non-existent tag)

Regardless of which method you use, if a tag doesn't exist, how can you list it? Are you saying it should be possible to pick old tags that were used by deleted AbuseFilters, or by previous revisions of existing AbuseFilters? That has a number of issues, including that unwanted and misspelt tags would be forever part of the list with no way to remove them.

What I think we really need in the UI is a dropdown list of currently available tags (the same as would be returned by a new, revamped fetchAllTags per my previous comment), combined with an input box where a new tag name can be typed. The input box may be behind a button or "Other..." option in the dropdown (or similar).

But, yes, let's see a patch.

TTO added a comment.Apr 23 2018, 12:04 PM

@jcrespo One of the things we are talking about is the queries in this function. How would you feel if that function was called every time Special:AbuseFilter/* was GETted by a user with relevant permissions (mostly sysops, global sysops, stewards and so on)?

@TTO Right. What if we intersect such dedicated table with the list of tags in special:tags? Unused tags would still be included and if users want to permanently delete a tag they might just delete it there.

TTO added a comment.Apr 23 2018, 12:23 PM

If you code it up I will be able to let you know whether it makes sense :)

Change 428409 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] [WIP] Add DB table to store tags and allow their reuse

https://gerrit.wikimedia.org/r/428409

@TTO There we go :) My thoughts are summarized in commit message. The patch itself is still quite WIPpy, but I decided to upload it in order to have something concrete to look at.

Daimona moved this task from Backlog to Change tags on the AbuseFilter board.Aug 26 2018, 2:10 PM