Allow wikisourcetext.py to recreate pages
Closed, Declined (Public)

Description

The Pywikibot framework contains the script wikisourcetext.py which creates Wikisource pages for an index including the OCR from the underlying PDF. Detail: https://github.com/wikimedia/pywikibot-core/blob/master/scripts/wikisourcetext.py

Although this script has a -force option, it does not allow recreating or overwriting such pages once they have been created. However, this would be an important feature in cases where the underlying PDF has been modified.

The current workaround requires either moving the pages, which is cumbersome, or deleting them.

Event Timeline

Such an application may be problematic. What would prevent a user from overwriting proofread and validated pages? I would much prefer that pages were deleted separately by another process and then re-added with this tool. It is not difficult for an admin to use the massdelete gadget to remove a list of subpage files.

@Aschroet: Is this task about proposing a change to Pywikibot code? If so, please add the corresponding project tag to this task; otherwise the Pywikibot maintainers will never see it. Thanks!

Xqt triaged this task as Low priority. Jun 19 2018, 4:24 PM

"Although this script has a -force option, it does not allow recreating or overwriting such pages once they have been created. However, this would be an important feature in cases where the underlying PDF has been modified."

Could someone point out the problem to me? The doc says:

-force:        overwrite existing pages;
               default is False; valid only if '-ocr' is selected.

@Xqt: I see this as a desirable feature. As mentioned above, I would NOT want a bot to overwrite proofread pages. I would prefer that this is closed with no action.

Should we remove the -force option because it doesn’t work as expected? I never used this script.

"@Xqt: I see this as a desirable feature. As mentioned above, I would NOT want a bot to overwrite proofread pages. I would prefer that this is closed with no action."

This cannot happen. Only non-existing pages or Not Proofread pages are treated.
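To make the rule above concrete, here is a minimal sketch of the decision as described in this thread, combined with the documented meaning of -force. It is not the script's actual code: the names PageState, is_treated and will_write are hypothetical, and the numeric quality level 1 for "Not Proofread" is assumed from the ProofreadPage extension's usual numbering.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ProofreadPage quality level for "Not Proofread".
NOT_PROOFREAD = 1


@dataclass
class PageState:
    """Hypothetical stand-in for a Wikisource Page: namespace page."""
    exists: bool
    quality_level: Optional[int] = None  # None when the page does not exist


def is_treated(page: PageState) -> bool:
    """Only non-existing pages or Not Proofread pages are treated."""
    return (not page.exists) or page.quality_level == NOT_PROOFREAD


def will_write(page: PageState, ocr: bool, force: bool) -> bool:
    """Decide whether a page would be saved, per the rule discussed here."""
    if not is_treated(page):
        # Proofread or validated pages are never touched.
        return False
    if not page.exists:
        # Missing pages are always created.
        return True
    # Existing Not Proofread pages are overwritten only with -ocr and -force.
    return ocr and force


if __name__ == "__main__":
    print(will_write(PageState(exists=False), ocr=False, force=False))
    print(will_write(PageState(True, NOT_PROOFREAD), ocr=True, force=True))
    print(will_write(PageState(True, 3), ocr=True, force=True))
```

Under these assumptions, a page at a higher quality level is skipped regardless of -force, which addresses the concern about overwriting proofread work.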

This script cannot solve the requested use case:

"Although this script has a -force option, it does not allow recreating or overwriting such pages once they have been created. However, this would be an important feature in cases where the underlying PDF has been modified."

The script has no capability to access the file content if the page exists. It can only do it indirectly via the preload feature, but for this to be possible, the page must not exist, i.e. a deletion is needed.

Per Billinghurst