
Edit filter issue still prevents publishing after being fixed
Open, Medium, Public

Description

As reported in this comment, a user was prevented from translating the Amazon Aurora article from English to Spanish because an edit filter on Spanish Wikipedia prevents the insertion of links pointing to Amazon. This was expected. However, after the user removed the links to Amazon, they were still not allowed to publish.

Inspecting the translation with the translation debugger (id: 1396515) shows no links to Amazon, nor any references that might contain them (all links shown in the image for the user's final content appear to be internal):

[Screenshot: translation debugger view of translation 1396515]

Event Timeline

Pginer-WMF created this task.
Pginer-WMF moved this task from Needs Triage to Bugs on the ContentTranslation board.

What was the timestamp of the last attempt to save to eswiki? This log entry shows the last SBL hit:

2021-10-12T14:57:36 Fanmixco (talk | contribs) caused a spam blacklist hit on Amazon Aurora by attempting to add http://aws.amazon.com/rds/aurora/.

You mentioned the edit filter: are you using that term generically, or referring to the AbuseFilter extension? (For what it is worth, I see no hits on the eswiki AF for this user.)

Looking at the screenshot in the linked thread, almost the very first line (a link in the infobox) also says "amazon.com". Have they tried publishing this after making sure they did not use this term?

Looking at the screenshot some more, they may be trying to publish the "machine translation" of section 0, which includes this link. Can they try translating this section manually instead, without including the SBL link?

[Screenshot: section 0 in the translation debugger]

For additional context, the debugger shows three columns with different information:

  • Source article. The original content from the source article.
  • Machine Translation. The initial machine translation used as a starting point.
  • User translation. The contents after the user edited the initial translation. This is the content that gets published when the user clicks on the "publish" button.

If you look at the screenshot, the "User translation" column is empty for the infobox, so the user had removed that element in the last saved version.
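
The publishing behaviour described above can be sketched with a simplified model. This is only an illustration under stated assumptions: `DebuggerRow` and `content_to_publish` are hypothetical names, not Content Translation's actual code, and the sample rows are made up. If only the "User translation" column reaches the publish step, an infobox the user emptied contributes nothing, so its Amazon link should never be sent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DebuggerRow:
    """Hypothetical model of one row in the translation debugger."""
    source: str               # "Source article" column
    machine_translation: str  # "Machine Translation" column (starting point)
    user_translation: str     # "User translation" column ("" if the user removed it)


def content_to_publish(rows: List[DebuggerRow]) -> str:
    """Join only the user translations; the other columns are never published."""
    return "\n".join(row.user_translation for row in rows if row.user_translation)


# Illustrative rows: the MT kept the Amazon link in the infobox, the user removed it.
rows = [
    DebuggerRow(
        source="{{Infobox software | website = http://aws.amazon.com/rds/aurora/ }}",
        machine_translation="{{Ficha de software | sitio web = http://aws.amazon.com/rds/aurora/ }}",
        user_translation="",
    ),
    DebuggerRow(
        source="Amazon Aurora is a relational database service.",
        machine_translation="Amazon Aurora es un servicio de base de datos relacional.",
        user_translation="Amazon Aurora es un servicio de base de datos relacional.",
    ),
]

print(content_to_publish(rows))
# Amazon Aurora es un servicio de base de datos relacional.
```

If the real payload differs from what this model predicts, that points at the step where the contents are prepared, not at the columns themselves.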

Looking at the timestamps in the debugger, it seems the initial machine translation was added on October 4, and the modified version removing the link was saved on October 12 at 14:56. According to the data you provided, that is before the last save attempt that triggered the abuse filter one minute later (October 12 at 14:57).

[Screenshot (2021-10-19): translation timestamps in the debugger]
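
The timing argument can be checked with a small calculation. This is a sketch that assumes both times are in UTC and, because the debugger shows only minutes, that the last save happened at 14:56:00.

```python
from datetime import datetime, timezone

# Last save of the edited translation according to the debugger (October 12, 14:56).
# Assumption: UTC, and seconds set to :00 because only minutes are shown.
last_user_save = datetime(2021, 10, 12, 14, 56, tzinfo=timezone.utc)

# Spam blacklist hit from the eswiki log entry, assumed to be UTC as well.
sbl_hit = datetime.fromisoformat("2021-10-12T14:57:36").replace(tzinfo=timezone.utc)

gap = sbl_hit - last_user_save
print(last_user_save < sbl_hit)  # True
print(gap)                       # 0:01:36
```

Under these assumptions the blocked attempt came after the link had already been removed from the saved translation.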

This message from the user seems to indicate that there was an attempt to publish after removing the links.

I think "edit filter" is the more recent name for the abuse filter. As you can see on the linked page, the edit filter information links to the AbuseFilter special page.

Thanks for the update. To be clear, though: in this instance, do you have any reason to think this is about the AbuseFilter (the "Edit filter", or Especial:FiltroAntiAbusos on eswiki)? The error message references the spam filter, and that hit is recorded in the eswiki SBL log.

The immediate hypotheses that came to mind are the following (in order of what I'd expect to be more likely):

  • Content Translation is sending content for publishing that is not the latest version shown in the debugger. That is, some Amazon links from a previous version are leaking into the content to be published (or some content is rendered as a link), triggering the abuse filter.
  • The abuse filter definition has an issue. Since the subject of the article is Amazon, some mentions of it may match a regular expression intended for links.
  • There is actually a link to Amazon hidden somewhere in the contents that we are not seeing.
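
To help separate the first and third hypotheses from the second, one could scan the wikitext being published for external links that a blacklist entry would catch. The sketch below is only loosely modelled on how spam-blacklist entries are matched against the host part of external URLs. The entry `amazon\.com` is a hypothetical stand-in (the actual eswiki rule is not quoted in this task), and the URL extraction is a rough approximation, not MediaWiki's parser.

```python
import re
from typing import List

# Hypothetical blacklist entry; the real eswiki rule is not shown in this task.
BLACKLIST_ENTRIES = [r"amazon\.com"]

# Loose model of blacklist matching: the entry must occur in the host part,
# right after the scheme and "//", so plain mentions of "Amazon" are ignored.
BLACKLIST_RE = re.compile(
    r"(?:https?:)?//+[a-z0-9_\-.]*(?:" + "|".join(BLACKLIST_ENTRIES) + r")",
    re.IGNORECASE,
)

# Rough extractor for external URLs in wikitext (bare or inside [url label]).
URL_RE = re.compile(r'(?:https?:)?//[^\s\[\]<>"|{}]+', re.IGNORECASE)


def blacklisted_links(wikitext: str) -> List[str]:
    """Return the external URLs in `wikitext` that the entries above would match."""
    return [url for url in URL_RE.findall(wikitext) if BLACKLIST_RE.match(url)]


published = "Amazon Aurora es un servicio de [[Amazon Web Services]] (amazon.com)."
leaked = published + " [http://aws.amazon.com/rds/aurora/ Sitio oficial]"

print(blacklisted_links(published))  # []
print(blacklisted_links(leaked))     # ['http://aws.amazon.com/rds/aurora/']
```

Under this model, a non-empty result on the real payload would support the first or third hypothesis, while an empty result would point back at the filter definition.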

Once users press "publish", Content Translation follows the same process as when a page is created from scratch, but there may be a glitch in the preceding step, where the contents are prepared to generate the wikitext. So I think the issue most likely lies in how Content Translation processes the contents to be published, but we cannot rule out an issue in the abuse filter definition for which this content is an edge case.
With all the above data, I think the next step should be for the engineers on our team to take a closer look at the publishing process for this content.
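
One way to test the first hypothesis would be to compare the content actually sent for publishing with the latest user translation shown in the debugger. A minimal sketch, assuming both are available as plain wikitext; `unexpected_additions` and the sample strings are hypothetical.

```python
import difflib
from typing import List


def unexpected_additions(latest_saved: str, to_publish: str) -> List[str]:
    """Lines present in the publish payload but absent from the latest saved
    user translation (hypothetical check on plain wikitext)."""
    diff = difflib.ndiff(latest_saved.splitlines(), to_publish.splitlines())
    # ndiff prefixes lines that appear only in the second sequence with "+ ".
    return [line[2:] for line in diff if line.startswith("+ ")]


latest = "Amazon Aurora es un servicio de base de datos relacional.\n== Historia =="
payload = "{{Ficha de software\n| sitio web = http://aws.amazon.com/rds/aurora/\n}}\n" + latest

print(unexpected_additions(latest, payload))
# ['{{Ficha de software', '| sitio web = http://aws.amazon.com/rds/aurora/', '}}']
```

An empty result would mean the payload matches what the debugger shows; any extra lines would indicate content leaking from an earlier version.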