Page MenuHomePhabricator

APISite.loadpageinfo may discard changes to page content when information was not loaded
Closed, ResolvedPublicBUG REPORT

Description

Demonstrated on listpages.py.

scripts/listpages.py
page_target.text = text
page_target.save(summary=summary)
  • Page.text.setter updates the Page._text attribute with the new content
  • Page.save calls Page._save internally
  • Page._save calls Page._cosmetic_changes_hook
  • Page._cosmetic_changes_hook calls Page.content_model (new after rPWBC94a3f39a038a87e7516e55a253423609694d5342)
  • Page.content_model cannot use cached information because it hasn't been loaded yet (when -overwrite is used), so it calls APISite.loadpageinfo -> APISite._update_page -> pywikibot.data.api.update_page
  • pywikibot.data.api.update_page does del page.text which unconditionally deletes Page._text (the text saved in page_target.text = text)

The last step is the problem: APISite.loadpageinfo deletes the useful content.

Workarounds:

  • (for users) disable cosmetic changes
  • (for developers) always completely initialize the page object before updating its content

Quick fix: revert rPWBC94a3f39a038a87e7516e55a253423609694d5342 (or just remove the content_model check).

Event Timeline

matej_suchanek created this task.

I made the quick fix revert to publish a new release because this bug is inside a widely used code part

Xqt claimed this task.

see follow-up Task T260489

see follow-up Task T260489

Fine with it but I believe this task is not resolved. The actual problem is not loading content model during save but loadpageinfo discarding _text.

Change 620503 had a related patch set uploaded (by Matěj Suchánek; owner: Matěj Suchánek):
[pywikibot/core@master] [bugfix] Change meaning of BasePage.text

https://gerrit.wikimedia.org/r/620503

matej_suchanek removed Xqt as the assignee of this task.
matej_suchanek moved this task from Backlog to Needs Review on the Pywikibot board.

Fine with it but I believe this task is not resolved. The actual problem is not loading content model during save but loadpageinfo discarding _text.

Yes sure, we can have two tasks: One for the underlying problem menioned here, one for the cc on wikitext only (probably soleved with reverting the revert after this patch here has been merged.

Xqt claimed this task.

Change 620503 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] Change meaning of BasePage.text

https://gerrit.wikimedia.org/r/620503

Xqt reopened this task as Open.EditedAug 27 2020, 1:26 PM

Reopened due to failing tests after this patch:
https://travis-ci.org/github/wikimedia/pywikibot/builds/721646028

I guess most of them are related to ProofreadPage

Change 622811 had a related patch set uploaded (by Matěj Suchánek; owner: Matěj Suchánek):
[pywikibot/core@master] [bugfix] Clean up ProofreadPage.text

https://gerrit.wikimedia.org/r/622811

Change 622811 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] Clean up ProofreadPage.text

https://gerrit.wikimedia.org/r/622811

The tests are still failing because ProofreadPage doesn't like the dummy foobar text.
Anyone having a dummy sample for ProofreadPage?

Change 622838 had a related patch set uploaded (by Matěj Suchánek; owner: Matěj Suchánek):
[pywikibot/core@master] [WIP] Clean up and fix ProofreadPage and tests

https://gerrit.wikimedia.org/r/622838

Change 622838 merged by jenkins-bot:
[pywikibot/core@master] Clean up and fix ProofreadPage and tests

https://gerrit.wikimedia.org/r/622838

The build doesn't fail on this anymore and the following now works as expected:

>>> page = pywikibot.Page(pywikibot.Site('en', 'wikipedia'), 'Dummy')
>>> page.text = 'dummy'
>>> page.exists()
True
>>> page.text
>>> 'dummy'

I believe we can close it now.