
Transferbot.py script could stop for many reasons, is it possible to just skip and continue ?
Open, Low · Public · Feature

Description

When transferring a list of pages with transferbot.py, the script can stop for many reasons, such as:

  • Presence of {{nobots}} or {{bots|deny}} template
  • Illegal characters (not allowed in page names) were found in the page title: # < > [ ] _ { | } (possible when transferbot reads page titles from a text file)
  • Page is too long or contains too many images (100+, either as classic [[File: calls or in <gallery>), and we get the error message "Maximum retries attempted without success."
  • Page has been deleted after script was launched
  • Content has been blocked by SpamBlackList Extension (if installed)
  • History is too big, which displays the message: pywikibot.exceptions.OtherPageSaveError: Edit to page en:Edithistory:Draft:Sandbox failed: The content you supplied exceeds the article size limit of 2048 kilobytes
  • Another reason?

When one of these problems occurs, none of the remaining pages are transferred. Is it possible to just skip the problematic page(s) and continue with the remaining ones?
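The requested behaviour can be illustrated with a minimal sketch. It does not use Pywikibot at all: `transfer_all` and `transfer_one` are hypothetical names, and `transfer_one` stands in for whatever the script does for a single page. Catching every `Exception` is a simplification made for the illustration only; a real fix would more likely handle specific errors.

```python
from typing import Callable, Iterable, List, Tuple


def transfer_all(
    titles: Iterable[str],
    transfer_one: Callable[[str], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Try every title; record failures and keep going instead of stopping."""
    done: List[str] = []
    skipped: List[Tuple[str, str]] = []
    for title in titles:
        try:
            transfer_one(title)
        except Exception as error:  # broad on purpose: any failure means "skip"
            skipped.append((title, str(error)))
            print('Skipped {0}: {1}'.format(title, error))
            continue
        done.append(title)
    return done, skipped
```

Returning the skipped titles together with their error messages lets the operator review the problematic pages after the run.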

Event Timeline

Restricted Application added subscribers: Liuxinyu970226, Aklapper. May 19 2019, 12:21 PM

It should be easy to solve :)

Xqt triaged this task as Low priority. May 19 2019, 3:23 PM
Xqt changed the subtype of this task from "Task" to "Feature Request".
Xqt added a project: good first task.
Mh-3110 added a subscriber: Mh-3110. Edited May 19 2019, 5:14 PM

Hi, will be working on this task.
Thanks

Mh-3110 claimed this task. May 19 2019, 5:14 PM

Hi @Nicolas_NALLET, what do you define as "illegal characters"? Could you please provide some examples?
Thanks

Hi @Mh-3110, I mean characters that are not allowed in page names:

# < > [ ] _ { | }

Thanks
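When titles come from a text file, they could be screened for these characters before any request is made. The sketch below is an illustration only: the character set is copied from the list in this task, and `find_forbidden_chars` and `split_titles` are hypothetical helpers, not Pywikibot functions. Whether every listed character (the underscore in particular) is rejected on a given wiki is an assumption that has not been checked here.

```python
from typing import Iterable, List, Tuple

# Characters listed in this task as not allowed in page names.
FORBIDDEN_TITLE_CHARS = frozenset('#<>[]_{|}')


def find_forbidden_chars(title: str) -> List[str]:
    """Return the forbidden characters present in a title, sorted."""
    return sorted({char for char in title if char in FORBIDDEN_TITLE_CHARS})


def split_titles(titles: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate usable titles from those containing forbidden characters."""
    valid: List[str] = []
    invalid: List[str] = []
    for title in titles:
        if find_forbidden_chars(title):
            invalid.append(title)
        else:
            valid.append(title)
    return valid, invalid
```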

@Nicolas_NALLET, thanks. What I want to point out here is that, since no page is supposed to contain such forbidden characters, we will handle this case the same way as a page that does not exist. The same applies when a page has been deleted after the script was launched.
The fix will look like this:

if not targetpage.exists():
    pywikibot.output(
        'Page {0} doesn\'t exist'.format(
            page.title(as_link=True)
        )
    )
    continue

cc @Dvorapa

Yeah, this is a standard solution. For nobots, there is a nice botMayEdit function in page.py. I don't know how to detect SpamBlackList, though. Unfortunately, the memory issues when saving are still not properly handled in Pywikibot.

Thanks @Dvorapa. I don't know how to detect SpamBlackList either.
For nobots, the fix is below:

if targetpage.exists() and overwrite:
    if not page.botMayEdit():
        pywikibot.output(
            'Page {0} is not editable by bots'.format(
                page.title(as_link=True)
            )
        )
        continue
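
The existence and nobots checks discussed so far could also be gathered in one small helper that returns a reason for skipping. This is only a sketch under stated assumptions: `skip_reason` is a hypothetical name, the objects passed in merely need `exists()` and `botMayEdit()` methods (which Pywikibot pages provide), and checking the target page for bot permission and skipping an existing target when overwrite is off are choices made for this illustration, not necessarily what the patch does.

```python
from typing import Optional


def skip_reason(source, target, overwrite: bool) -> Optional[str]:
    """Return why a page should be skipped, or None if it can be transferred.

    `source` and `target` only need exists() and botMayEdit() methods.
    """
    if not source.exists():
        return "source page doesn't exist"
    if target.exists():
        if not overwrite:
            # Assumption: an existing target is left alone without overwrite.
            return 'target page exists and overwrite is off'
        if not target.botMayEdit():
            return 'target page is not editable by bots'
    return None
```

Inside the per-page loop, a non-None result would be logged and followed by `continue`.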
Mh-3110 updated the task description. May 31 2019, 4:03 PM

@Dvorapa, what do you think about creating a subtask for each case:
- A subtask for the presence of {{bots}} or {{nobots}} templates
- A subtask for pages that do not exist (illegal characters in the title, page deleted after script launch)
- A subtask for SpamBlackList

That way, I can already submit fixes for the first 2 cases.

Thanks

I think there is no need to split the task. Just create patches for individual parts and assign this task to them. Or just create subtasks for those we can not solve right now.

You can submit as many fixes to one task as you want.

Change 513704 had a related patch set uploaded (by Mh-3110; owner: Mahuton):
[pywikibot/core@master] Transferbot.py: Make script continues when some errors occur

https://gerrit.wikimedia.org/r/513704

Change 513705 had a related patch set uploaded (by Mh-3110; owner: Mahuton):
[pywikibot/core@master] Transfertbot.py: Make script continues when some errors occur

https://gerrit.wikimedia.org/r/513705

Dvorapa added a comment. Edited May 31 2019, 10:49 PM

Hi @Mh-3110, you should keep the same Change-Id line (Change-Id: Ibbe5afaa3b9511e7370dce6685eea9f0e5952cac) in your commit message so that you don't create a whole new patch for every small change.

Also make sure you are pushing only one commit (using git commit --amend).

:)

Hi @Dvorapa, yeah, my bad! I only realized it after pushing.
Thanks

Change 513705 abandoned by Xqt:
Transfertbot.py: Make script continues when some errors occur

Reason:
In favour of the other PS

https://gerrit.wikimedia.org/r/513705

Change 513704 merged by jenkins-bot:
[pywikibot/core@master] [FEAT]Transferbot.py: Make script continue when some errors occur

https://gerrit.wikimedia.org/r/513704

Okay, now only the fixes for SpamBlacklist and overly long pages are needed. Thanks @Mh-3110 for your work!
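
For the remaining size problem, one possible direction is to measure the text before saving. The sketch below is an assumption-laden illustration: `exceeds_size_limit` is a hypothetical helper, the 2048 kilobyte figure is taken from the error message quoted in the description, and it is assumed that the limit applies to the UTF-8 encoded size of the text. A wiki may be configured with a different limit.

```python
# Limit quoted in the error message of this task (kilobytes).
MAX_ARTICLE_SIZE_KB = 2048


def exceeds_size_limit(text: str, limit_kb: int = MAX_ARTICLE_SIZE_KB) -> bool:
    """Return True if the UTF-8 encoded text is larger than the limit."""
    return len(text.encode('utf-8')) > limit_kb * 1024
```

A page whose generated history text exceeds the limit could then be skipped with a warning instead of raising OtherPageSaveError.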

Thank you both @Dvorapa and @Xqt for your help!