Outdated page / corrupt data in enwiki-NS0-20230220-ENTERPRISE-HTML.json.tar.gz
Closed, Resolved (Public)

Description

I've dumped one page entry from the enwiki-NS0-20230220-ENTERPRISE-HTML.json.tar.gz dump to P45730. The wikitext and HTML are clearly from different revisions: the HTML is a redirect and the wikitext is an article. The revision information and comment point to revid 1117754399, but that is not even the latest edit, which is 1117754425.

I haven't looked further to see if other cases exist yet.

Event Timeline

I found another outdated/mismatched entry: P45731. If I had to guess, it's not fully updating pages after they become redirects, but it's still updating their HTML?
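For anyone who wants to check whether other cases exist, here is a minimal sketch of a scan for entries where the wikitext and the HTML disagree about being a redirect. It rests on assumptions that are not confirmed in this ticket: that the archive contains newline-delimited JSON, that each entry exposes `name`, `version.identifier`, `article_body.wikitext` and `article_body.html`, and that redirect HTML carries a `rel="mw:PageProp/redirect"` link. The function names are hypothetical, and the check only catches the redirect/article kind of mismatch described above, not stale revisions in general.

```python
import json
import re
import tarfile

# Assumption: redirect wikitext starts with "#REDIRECT" (case-insensitive).
REDIRECT_WIKITEXT = re.compile(r"^\s*#REDIRECT\b", re.IGNORECASE)
# Assumption: redirect HTML contains this link relation.
REDIRECT_HTML_MARKER = 'rel="mw:PageProp/redirect"'


def is_mismatched(entry):
    """Return True when only one of wikitext/HTML looks like a redirect."""
    body = entry.get("article_body") or {}
    wikitext = body.get("wikitext") or ""
    html = body.get("html") or ""
    wikitext_is_redirect = bool(REDIRECT_WIKITEXT.match(wikitext))
    html_is_redirect = REDIRECT_HTML_MARKER in html
    return wikitext_is_redirect != html_is_redirect


def find_mismatches(lines):
    """Yield (page name, revision id) for each mismatched JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if is_mismatched(entry):
            version = entry.get("version") or {}
            yield entry.get("name"), version.get("identifier")


def iter_dump_lines(path):
    """Stream decoded lines from every regular file inside a .tar.gz dump."""
    with tarfile.open(path, "r:gz") as archive:
        for member in archive:
            fileobj = archive.extractfile(member)
            if fileobj is None:  # directories and other non-file members
                continue
            for raw in fileobj:
                yield raw.decode("utf-8")
```

Usage would look something like `for name, revid in find_mismatches(iter_dump_lines("enwiki-NS0-20230220-ENTERPRISE-HTML.json.tar.gz")): print(name, revid)`. Streaming line by line avoids unpacking the whole archive to disk.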

Hey, thanks for noting. We'll look into this.

This issue was addressed some time ago as part of broader fixes related to data consistency in the dumps. To validate this, we checked a recent dump and did not find similar cases.

Since this appears to have been resolved already and no longer reproduces in current dumps, this ticket, which had remained open pending confirmation, can now be marked as resolved.