
Variables old_wikitext and new_wikitext are blank in Page namespace
Closed, Resolved · Public

Description

Variables old_size, new_size, and size_diff are all 0, and edit_diff is also blank; I assume these follow from the blank wikitext variables.

This causes a high number of false positives for page-blanking filters.

See, for example, https://en.wikisource.org/wiki/Special:AbuseLog/110096
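To illustrate why blank text variables produce blanking false positives, here is a minimal sketch in Python. It is not AbuseFilter code: the function names and the filter condition are hypothetical, and the size semantics (byte length of the wikitext) are an assumption.

```python
def derived_variables(old_wikitext: str, new_wikitext: str) -> dict:
    """Derive the size variables named in the report from the two texts.

    Assumption: sizes are byte lengths of the UTF-8 encoded wikitext.
    """
    old_size = len(old_wikitext.encode("utf-8"))
    new_size = len(new_wikitext.encode("utf-8"))
    return {
        "old_size": old_size,
        "new_size": new_size,
        "size_diff": new_size - old_size,
    }


def looks_like_blanking(variables: dict) -> bool:
    # Hypothetical filter condition: the edit leaves the page empty.
    return variables["new_size"] == 0


# When both texts arrive blank, every derived value collapses to 0,
# so an ordinary edit is indistinguishable from a page blanking.
broken = derived_variables("", "")
healthy = derived_variables("abc", "abcdef")
```

With blank inputs, `looks_like_blanking(broken)` is true no matter what the user actually saved, which matches the false positives reported here.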

Details

Related Gerrit Patches:
mediawiki/extensions/ProofreadPage : wmf/1.33.0-wmf.23 : Use getText instead of getNativeData
mediawiki/extensions/ProofreadPage : master : Use getText instead of getNativeData

Event Timeline

Restricted Application added a subscriber: Aklapper. · Mar 28 2019, 3:14 PM
beleg_tal updated the task description. · Mar 28 2019, 3:15 PM
beleg_tal updated the task description.
Daimona triaged this task as High priority. · Mar 28 2019, 3:20 PM
Daimona added a subscriber: Daimona.

If old_wikitext and new_wikitext are blank, any other text-related variable will be wrong. I'd like to investigate it, but I have to say that I don't know what the Page namespace is for, or where it comes from.

Alright, so ProofreadPage is currently incompatible with AbuseFilter, because it uses its own content model, and doesn't tell AbuseFilter how to turn it into plain text. Will fix it shortly, hopefully.

Change 499798 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/ProofreadPage@master] Use getText instead of getNativeData

https://gerrit.wikimedia.org/r/499798

Daimona claimed this task. · Mar 28 2019, 3:52 PM
Daimona raised the priority of this task from High to Unbreak Now!.

Dang, I was wrong! The problem here is rEABF324d0e6aa3ed9c3f6924960692111839f2e3c8d5: ProofreadPage still used getNativeData but didn't define getText, so AbuseFilter stopped receiving its text. Given that filters behave unpredictably in the PP namespace, that this could result in unfair actions against the user (e.g. a block), and that the fix is very simple, I'm marking this as a train blocker + UBN.
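The mismatch described in this comment can be modelled with a small Python sketch. This is an illustration under assumptions, not the actual PHP code: all class and function names are hypothetical, and the idea that the subclass stores its data in separate fields is assumed for the sake of the example.

```python
class TextContent:
    """Simplified stand-in for a text content base class (hypothetical)."""

    def __init__(self, text: str = "") -> None:
        self._text = text

    def get_native_data(self) -> str:
        # Older accessor.
        return self._text

    def get_text(self) -> str:
        # Newer accessor; reads the base field directly.
        return self._text


class StructuredContent(TextContent):
    """Content type that overrides only the older accessor (assumed layout)."""

    def __init__(self, header: str, body: str, footer: str) -> None:
        super().__init__("")  # the base text field stays empty
        self.header = header
        self.body = body
        self.footer = footer

    def get_native_data(self) -> str:
        return "\n".join([self.header, self.body, self.footer])


class FixedStructuredContent(StructuredContent):
    """Same content type with the newer accessor defined as well."""

    def get_text(self) -> str:
        return "\n".join([self.header, self.body, self.footer])


def text_for_filter(content: TextContent) -> str:
    # Stand-in for a consumer that has switched to the newer accessor.
    return content.get_text()


before_fix = text_for_filter(StructuredContent("h", "b", "f"))
after_fix = text_for_filter(FixedStructuredContent("h", "b", "f"))
```

Here `before_fix` is an empty string because the consumer reads the inherited accessor, while `after_fix` is `"h\nb\nf"`. The point of the sketch is only that a caller moving to the newer accessor silently gets blank text from any subclass that overrides just the older one.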

Restricted Application added subscribers: Liuxinyu970226, TerraCodes. · Mar 28 2019, 3:52 PM

Change 499801 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/ProofreadPage@wmf/1.33.0-wmf.23] Use getText instead of getNativeData

https://gerrit.wikimedia.org/r/499801

Is this issue related to T219371?

No, this is just a regression coming from the AF patch linked above.

Change 499798 merged by jenkins-bot:
[mediawiki/extensions/ProofreadPage@master] Use getText instead of getNativeData

https://gerrit.wikimedia.org/r/499798

Ankry added a subscriber: Ankry. · Mar 28 2019, 5:54 PM

Does this need back-porting to wmf.23?

Change 499801 merged by Jforrester:
[mediawiki/extensions/ProofreadPage@wmf/1.33.0-wmf.23] Use getText instead of getNativeData

https://gerrit.wikimedia.org/r/499801

Mentioned in SAL (#wikimedia-operations) [2019-03-28T18:32:36Z] <jforrester@deploy1001> Synchronized php-1.33.0-wmf.23/extensions/ProofreadPage/includes/Index/IndexContent.php: ProofreadPage: Fix AbuseFilter UBN T219514 (duration: 00m 57s)

Ankry added a comment. · Mar 28 2019, 7:58 PM

It seems that this fix also fixed T219371.
So they were related...

As for T219371: my patch fixed it, but the cause wasn't the AbuseFilter commit above. Instead, I think some other part of the code (related to action=raw) was changed to use getText instead of getNativeData and broke in the same way.