
touch.py makes empty edits on ProofreadPage pages
Open, Low, Public

Description

Wikisource needs to touch 000000s of files across multiple languages. As I understand it, the edit summary is currently in English only, and it was suggested to me that it should be possible for the edit summary to be in the language of the wiki, i.e. the -lang setting.

I could, of course, be missing something in the manual, if there is something I am meant to set when running the scripts.


Event Timeline

Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Jul 8 2018, 5:28 AM
Dvorapa added a subscriber: Dvorapa. Jul 8 2018, 7:53 AM

touch.py should never edit the page, so it should never use the summary. Only old pages (last edited before 2007) can sometimes be edited, as they can contain trailing newlines, which touch.py removes. BTW, you can probably use the -summary:"Something" parameter if you really need to.
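The trailing-newline case can be illustrated with a small Python model. It assumes that the server trims trailing whitespace from the submitted wikitext on every save, using the default character set of PHP's `rtrim()`; the function names are hypothetical and are not part of Pywikibot or MediaWiki.

```python
# Characters stripped by PHP's rtrim() with its default character list.
# Assumption: the save path applies this trimming to the submitted wikitext.
PHP_RTRIM_CHARS = " \t\n\r\0\x0b"


def normalize_on_save(text: str) -> str:
    """Hypothetical model of the server-side trimming done on save."""
    return text.rstrip(PHP_RTRIM_CHARS)


def null_edit_changes_text(stored_text: str) -> bool:
    """Return True if re-saving the unchanged text would still alter it."""
    return normalize_on_save(stored_text) != stored_text


# A page last saved before the trimming applied may still end in newlines ...
old_page = "Some transcribed text.\n\n"
# ... while a page saved more recently has already been trimmed.
recent_page = "Some transcribed text."

print(null_edit_changes_text(old_page))     # True
print(null_edit_changes_text(recent_page))  # False
```

Under this model, a "null" edit turns into a real revision only when the stored text differs from its trimmed form, which is why only old pages are affected.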

Dvorapa closed this task as Declined. Jul 8 2018, 7:56 AM

If you find any page that touch.py edits using that summary, please file a new task, as it should never happen.

A touch is an edit, and as there has been an underlying change in the schema that Wikisource pages use, it will in essence be an edit. FWIW, the interaction between ProofreadPage and MediaWiki has changed in how pages are handled. Also, Wikisource transcription processes often have pages that have not been edited for that many years, so please don't be hasty about the age of a page; there are more uses than just Wikipedias.

Billinghurst reopened this task as Open. Jul 8 2018, 7:59 AM

Ahem. You may not wish to do the task, but that does not make it unnecessary, or yours to close. Thanks for that unilateral decision.

Please see T198470 for the task being managed.

https://en.wikisource.org/w/index.php?title=Special:Contributions/Wikisource-bot&offset=20180708054415&target=Wikisource-bot
for examples of edits.
05:43, 8 July 2018 . . (-118) . . Page:A Book of Dartmoor.djvu/56 (Pywikibot touch edit)
05:43, 8 July 2018 . . (0) . . Page:A History of Italian Literature - Garnett (1898).djvu/3 (Pywikibot touch edit)
05:43, 8 July 2018 . . (0) . . Page:A Book of Dartmoor.djvu/55 (Pywikibot touch edit)
05:43, 8 July 2018 . . (0) . . Page:A History of Italian Literature - Garnett (1898).djvu/2 (Pywikibot touch edit)
05:42, 8 July 2018 . . (0) . . Page:A Book of Dartmoor.djvu/54 (Pywikibot touch edit)
05:42, 8 July 2018 . . (0) . . Page:A History of Italian Literature - Garnett (1898).djvu/1 (Pywikibot touch edit)

https://sv.wikisource.org/w/index.php?title=Special:Bidrag/Wikisource-bot&offset=20180708054415&target=Wikisource-bot
for examples of edits.
05:43, 8 July 2018 . . (0) . . Sida:NicodemusTessin dy dagbok 1688.djvu/103 (Pywikibot touch edit)
05:43, 8 July 2018 . . (-124) . . Sida:Danska och norska läsestycken.djvu/143 (Pywikibot touch edit)
05:43, 8 July 2018 . . (-118) . . Sida:Myrberg GT t2.png (Pywikibot touch edit)
05:42, 8 July 2018 . . (-124) . . Sida:Danska och norska läsestycken.djvu/142 (Pywikibot touch edit)
05:42, 8 July 2018 . . (-118) . . Sida:Myrberg GT 307.png (Pywikibot touch edit)

Is -summary global? It is not listed at https://www.mediawiki.org/wiki/Manual:Pywikibot/Global_Options
How does one know which of the options are global? I will try that, thanks.

This is weird: the change has removed 122 bytes, but the diff is empty.

Dvorapa renamed this task from "i18n of message in touch.py" to "touch.py makes empty edits on ProofreadPage pages". Jul 8 2018, 8:30 AM
Dvorapa edited projects, added Pywikibot, Pywikibot-Scripts; removed Pywikibot-i18n.

No i18n: the touch.py bot should never make edits like this. It should be fixed directly in the code, not by adding workarounds like i18n. Thank you for the examples of edits.

Ankry added a subscriber: Ankry. Jul 8 2018, 9:16 AM

> This is weird: the change has removed 122 bytes, but the diff is empty.

Actually, only the byte counter was updated: in the current version, the size of the hidden page header is not counted. Only the size of the visible/editable data is reported for pages with the proofread-page content model. And this change is an edit, as it creates a new revision to store the updated value. I do not think such a database update can be made directly in code. There are also changes in other tables related to this edit.

And yes, this case is similar to the stripping of trailing spaces/newlines by a null edit.
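The "empty diff, nonzero byte count" effect can be reproduced with a toy model. It assumes, following Ankry's explanation, that the stored text is unchanged and only the size counter changed from counting a hidden header plus the body to counting the body alone. The header markup and variable names are illustrative and do not describe ProofreadPage's actual storage format.

```python
def byte_size(text: str) -> int:
    """Page length in bytes, i.e. the length of the UTF-8 encoding."""
    return len(text.encode("utf-8"))


# Hypothetical Page: namespace content split into a hidden header and the
# visible body. The header markup is illustrative only.
hidden_header = '<pagequality level="1" user="Example" />'
body = "Transcribed text of the scanned page."

# The touch edit re-saves exactly the same parts, so a text diff is empty.
old_revision = (hidden_header, body)
new_revision = (hidden_header, body)
diff_is_empty = old_revision == new_revision

# Assumption: the old revision's size included the hidden header, while the
# new revision's size counts only the body.
old_size = byte_size(hidden_header + body)
new_size = byte_size(body)
delta = new_size - old_size

print(diff_is_empty)  # True
print(delta)          # -40, the byte length of hidden_header
```

In this model the negative delta equals the header's size, which is the kind of pattern the -118 and -124 entries in the contribution lists suggest, but the real numbers depend on the actual stored markup.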

There are several cases where touch.py actually edits the page, which suggests it’s much easier to address this on the Pywikibot side. For example, apart from the cases above, I have encountered this when editing outdated translated pages (MediaWiki-extensions-Translate). I don’t understand what the downsides would be. Too much work for developers to put the translation in the code? It doesn’t look like it, but I can claim the task. Too much complexity? The interwiki bot already has i18n, and it can still be used widely. Too much work for the translators? Nothing has to be translated; the message can be left untranslated if translators don’t want to do it.

Mpaa added a subscriber: Mpaa. Aug 9 2018, 3:34 PM

I do not see any downside. If you would like to take the task, no issues on my side.

> There are several cases where touch.py actually edits the page, which suggests it’s much easier to address this on the Pywikibot side. For example, apart from the cases above, I have encountered this when editing outdated translated pages (MediaWiki-extensions-Translate). I don’t understand what the downsides would be. Too much work for developers to put the translation in the code? It doesn’t look like it, but I can claim the task. Too much complexity? The interwiki bot already has i18n, and it can still be used widely. Too much work for the translators? Nothing has to be translated; the message can be left untranslated if translators don’t want to do it.

Please file a new bug about the Translate behavior if you feel it is needed.

> This is weird: the change has removed 122 bytes, but the diff is empty.

> Actually, only the byte counter was updated: in the current version, the size of the hidden page header is not counted. Only the size of the visible/editable data is reported for pages with the proofread-page content model. And this change is an edit, as it creates a new revision to store the updated value. I do not think such a database update can be made directly in code. There are also changes in other tables related to this edit.
> And yes, this case is similar to the stripping of trailing spaces/newlines by a null edit.

To summarize: this is server behavior, and Pywikibot can do nothing about it, as there is no difference between the old and the new text of the page. What remains to be done in this task?

> There are several cases where touch.py actually edits the page, which suggests it’s much easier to address this on the Pywikibot side. For example, apart from the cases above, I have encountered this when editing outdated translated pages (MediaWiki-extensions-Translate). I don’t understand what the downsides would be. Too much work for developers to put the translation in the code? It doesn’t look like it, but I can claim the task. Too much complexity? The interwiki bot already has i18n, and it can still be used widely. Too much work for the translators? Nothing has to be translated; the message can be left untranslated if translators don’t want to do it.

> Please file a new bug about the Translate behavior if you feel it is needed.

No, I don’t feel so. I feel that Pywikibot should finally accept that this behavior exists. Who knows how many other extensions work like this? Should all of them really be fixed one by one, even if it takes weeks to create the patch? Isn’t it much easier to just allow i18n of this message?

Change 487578 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/i18n@master] [i18n] Translations for touch edits

https://gerrit.wikimedia.org/r/487578

Change 487580 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [i18n] Use touch edit summary from twn

https://gerrit.wikimedia.org/r/487580

Xqt triaged this task as Low priority. Feb 2 2019, 12:15 PM
Xqt removed a project: Patch-For-Review.
Xqt added a project: Pywikibot-i18n. Feb 2 2019, 2:39 PM
Xqt added a subscriber: Xqt.

I accept localizing the edit summary string and made the patches above. Please feel free to review them or make a proposal to change the summary string. It can also be changed later at twn.

The reason for i18n of the touch edit summary is that the null edit made by the bot may lead to real changes on the target pages. This is nothing Pywikibot can prevent, because the changes come from MediaWiki itself. But we can provide a localized summary string that could explain the issue if such an unexpected edit occurs. Bot owners are the first to be asked about this problem, and we should support them as much as possible.
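The fallback behavior a localized summary needs can be sketched as follows. In Pywikibot the lookup goes through `pywikibot.i18n.twtranslate()`, with messages maintained at translatewiki.net; the dictionary, the message key `pywikibot-touch` and the `localized_summary()` function below are a simplified, hypothetical stand-in. Only the English text is taken from the edits shown in this task.

```python
# Hypothetical message catalogue: message key -> language code -> text.
# Only the English string comes from this task; real translations would
# be maintained at translatewiki.net.
MESSAGES = {
    "pywikibot-touch": {
        "en": "Pywikibot touch edit",
    },
}


def localized_summary(key: str, lang: str,
                      fallbacks: tuple[str, ...] = ("en",)) -> str:
    """Return the message for ``lang``, else the first available fallback."""
    translations = MESSAGES[key]
    for code in (lang, *fallbacks):
        if code in translations:
            return translations[code]
    raise KeyError(f"no translation of {key!r} for {lang!r} or its fallbacks")


# A wiki without a translation (here sv) still gets a summary, in English.
print(localized_summary("pywikibot-touch", "sv"))  # Pywikibot touch edit
```

The fallback means untranslated languages keep today's English summary, which matches the point that nothing has to be translated for the change to be safe.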

Dalba added a subscriber: Dalba. Feb 13 2019, 12:48 PM

Just curious, does -purge -forcerecursivelinkupdate also have a similar issue? If not, why don't you use that? Doesn't it do the same job?

Xqt added a comment. Feb 13 2019, 1:29 PM

> Just curious, does -purge -forcerecursivelinkupdate also have a similar issue? If not, why don't you use that? Doesn't it do the same job?

As I understand it, purging the cache isn't the same as a null edit.

Change 487578 merged by jenkins-bot:
[pywikibot/i18n@master] [i18n] Translations for touch edits

https://gerrit.wikimedia.org/r/487578

> Just curious, does -purge -forcerecursivelinkupdate also have a similar issue? If not, why don't you use that? Doesn't it do the same job?

> As I understand it, purging the cache isn't the same as a null edit.

Purge just renews the cached version of the page. A null edit also updates the link tables (and categories), etc. A purge with forced link update, as @Dalba suggests, should be a valid substitute for a null edit, without editing the page and causing errors like this one.

Xqt added a comment. Feb 13 2019, 3:23 PM

> Purge just renews the cached version of the page. A null edit also updates the link tables (and categories), etc. A purge with forced link update, as @Dalba suggests, should be a valid substitute for a null edit, without editing the page and causing errors like this one.

In that case we could just rewrite Page.touch() to purge with that forcerecursivelinkupdate parameter, couldn't we?

Change 487580 merged by jenkins-bot:
[pywikibot/core@master] [i18n] Use touch edit summary from twn

https://gerrit.wikimedia.org/r/487580

I've put my understanding in a table below.

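Effect                                                  | Null edit | Purge | Purge w/ forcelinkupdate | Purge w/ forcerecursivelinkupdate
Clear page's server cache                               | yes       | yes   | yes                      | yes
Rebuild the page                                        | yes       | yes   | yes                      | yes
Update links tables                                     | yes       | no    | yes                      | yes
Update links tables for pages that transclude the page  | no        | no    | no                       | yes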

Refs: en:Wikipedia:Purge, mw:Manual:Purge, and mw:API:Purge

Based on that, a null edit is the equivalent of a purge with forcelinkupdate. A purge with forcerecursivelinkupdate is equivalent to what a null edit did before circa July 2013.
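The purge variants compared here map onto the `forcelinkupdate` and `forcerecursivelinkupdate` parameters of `action=purge` in the MediaWiki Action API (mw:API:Purge). The sketch below only builds the request parameters and sends nothing; the `purge_params()` helper and its `mode` argument are hypothetical. As far as we know, the API expects purge requests to be sent as HTTP POST.

```python
from urllib.parse import urlencode

# Link-table update requested by each purge variant (per mw:API:Purge).
LINK_UPDATE_FLAGS = {
    "none": {},                                      # plain purge
    "links": {"forcelinkupdate": "1"},               # equivalent of a null edit
    "recursive": {"forcerecursivelinkupdate": "1"},  # also transcluding pages
}


def purge_params(titles: list[str], mode: str = "links") -> dict[str, str]:
    """Build action=purge parameters for the given titles and update mode."""
    if mode not in LINK_UPDATE_FLAGS:
        raise ValueError(f"unknown mode {mode!r}")
    params = {
        "action": "purge",
        "format": "json",
        "titles": "|".join(titles),  # multiple titles are pipe-separated
    }
    params.update(LINK_UPDATE_FLAGS[mode])
    return params


# Form-encoded body for a single page, using the default "links" mode.
query = urlencode(purge_params(["Page:A Book of Dartmoor.djvu/56"]))
print(query)
```

Defaulting to `"links"` reflects the equivalence stated above and avoids the cost of the recursive update unless it is explicitly requested.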

Note: forcerecursivelinkupdate appears to be expensive.

For reference, en:User:Joe's Null Bot uses purge with forcelinkupdate.

Perhaps @Tgr or @Anomie could verify that my understanding is correct or point out other differences.

Tgr added a comment. Feb 14 2019, 5:03 AM

Yeah, normal edits cause recursive update, null edits only update the links for the edited page. There are minor internal differences between a null edit and a purge with forced links update (e.g. title cache gets reset), none of that should be user-visible. IMO it makes more sense to use purges by default (you might need to fall back to null edits for old wikis or ones where the write API has been disabled).
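The "purge by default, fall back to null edits" approach could be structured as below. The purge and null-edit operations are passed in as callables, and `PurgeUnavailable` is a hypothetical exception standing in for whatever error a real client raises; none of these names are Pywikibot API.

```python
from typing import Callable


class PurgeUnavailable(Exception):
    """Hypothetical error: the wiki cannot purge with a forced link update."""


def refresh_links(
    title: str,
    purge: Callable[[str], None],
    null_edit: Callable[[str], None],
) -> str:
    """Refresh a page's link tables, preferring a purge over a null edit.

    Returns the name of the method that was used.
    """
    try:
        purge(title)  # e.g. action=purge with forcelinkupdate
        return "purge"
    except PurgeUnavailable:
        # Fallback for the cases mentioned: old wikis, or a disabled write API.
        null_edit(title)  # may create a real revision, so it needs a summary
        return "null edit"


def unsupported_purge(title: str) -> None:
    """Fake purge that always fails, used to exercise the fallback."""
    raise PurgeUnavailable(title)


edited = []
print(refresh_links("Main Page", lambda t: None, edited.append))     # purge
print(refresh_links("Main Page", unsupported_purge, edited.append))  # null edit
print(edited)  # ['Main Page']
```

In this sketch only the fallback path can create a revision, so it is the only path where the edit summary, localized or not, would ever be shown.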

> Wikisource needs to touch 000000s of files across multiple languages.

That should probably have its own bug report... extensions are expected to work correctly without constant bot maintenance.

Mpaa added a comment. Feb 14 2019, 8:55 PM

> > Wikisource needs to touch 000000s of files across multiple languages.

> That should probably have its own bug report... extensions are expected to work correctly without constant bot maintenance.

It was a specific need due to a design change (see T200118), now solved. No constant bot maintenance is needed.