
`allrevisions=False` in XmlDump returns earliest revision, not latest
Closed, Resolved, Public

Description

`xmlreader.XmlDump` has a boolean parameter, `allrevisions`, documented as "If True, parse all revisions instead of only the latest one." When it is False, `.parse()` calls `._parse_only_latest()`. However, there are two issues:

  1. As written, `._parse_only_latest()` yields the first matching revision it finds without comparing its `revisionid` against the other candidates. Both in the test suite and in a separate wiki dump of my own, this returns the earliest revision of the page, not the latest.
  2. The accompanying test, `tests.xml_reader_tests.test_XmlDumpFirstRev()`, explicitly checks that the first revision is loaded. Although the test currently passes, it should instead match the documented behavior and check that the latest revision is returned.
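To illustrate the difference, here is a minimal sketch using only the standard library's `xml.etree.ElementTree`, not pywikibot. The XML fragment is hypothetical and omits the export namespace that real dumps carry, and `first_revision()` / `latest_revision()` are illustrative helpers, not `XmlDump` methods. The first mirrors the reported behavior of taking the first match; the second compares revision ids as the issue suggests.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment shaped like one <page> of a MediaWiki
# XML export. Revision ids are deliberately out of document order so that
# "highest id" and "last in the file" give different answers.
DUMP = """
<page>
  <title>Example</title>
  <revision><id>101</id><text>first draft</text></revision>
  <revision><id>205</id><text>second draft</text></revision>
  <revision><id>150</id><text>middle edit</text></revision>
</page>
""".strip()


def first_revision(page):
    """Mimic the reported behavior: return the first <revision> encountered."""
    rev = page.find("revision")
    return int(rev.findtext("id")), rev.findtext("text")


def latest_revision(page):
    """Compare every candidate's id and return the revision with the highest one."""
    rev = max(page.findall("revision"), key=lambda r: int(r.findtext("id")))
    return int(rev.findtext("id")), rev.findtext("text")


page = ET.fromstring(DUMP)
print(first_revision(page))   # (101, 'first draft')
print(latest_revision(page))  # (205, 'second draft')
```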

PR to follow shortly!

Event Timeline

Change 934432 had a related patch set uploaded (by ElSeiver; author: ElSeiver):

[pywikibot/core@master] Make `allrevisions=False` in XmlDump return latest revision

https://gerrit.wikimedia.org/r/934432

The implementation has existed for 15 years and this patch is a breaking change, but the documentation does look wrong.

See https://mediawiki.org/wiki/Special:Code/pywikipedia/6739

I propose implementing all three variants: the first revision, the latest revision, and all revisions.
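One way the three proposed variants could look is sketched below. `Revision`, `select_revisions()` and its `mode` values `"first"`, `"latest"` and `"all"` are hypothetical names chosen for illustration; they are not pywikibot's API, and the interface that was actually merged may differ. The sketch picks the first and latest revision by lowest and highest revision id rather than by position in the dump.

```python
from typing import Iterable, List, NamedTuple


class Revision(NamedTuple):
    """Hypothetical minimal revision record (pywikibot's XmlEntry has many more fields)."""
    revisionid: int
    text: str


def select_revisions(revisions: Iterable[Revision], mode: str = "latest") -> List[Revision]:
    """Return the revisions of one page selected according to ``mode``.

    "first"  -> only the revision with the lowest id
    "latest" -> only the revision with the highest id
    "all"    -> every revision, in input order
    """
    revs = list(revisions)
    if mode == "all":
        return revs
    if mode not in ("first", "latest"):
        raise ValueError(f"unknown mode: {mode!r}")
    if not revs:
        return []
    pick = min if mode == "first" else max
    return [pick(revs, key=lambda r: r.revisionid)]


# Hypothetical history of a single page.
history = [Revision(101, "first draft"), Revision(205, "second draft"), Revision(150, "middle edit")]
print(select_revisions(history, "first"))     # [Revision(revisionid=101, text='first draft')]
print(select_revisions(history, "latest"))    # [Revision(revisionid=205, text='second draft')]
print(len(select_revisions(history, "all")))  # 3
```

Keeping `"latest"` as the default in this sketch matches what the `allrevisions=False` docstring promised; a caller relying on the old first-revision behavior would have to ask for `"first"` explicitly.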

Merged as https://gerrit.wikimedia.org/r/934432; the link to this task must have been lost at some point.

Xqt claimed this task.
Xqt reassigned this task from Xqt to ElSeiver.