
Dump HTML is from a different revision than the wikitext and revision reported in the metadata
Open, Needs Triage, Public


While debugging our analysis, we found that the latest dewiki dump includes some articles whose HTML was rendered from a different revision than the one reported in the metadata and used to pull the wikitext.

Example: Saugarten. Its metadata shows that the record was pulled for revision 237928057. The wikitext column correctly matches this revision; however, the HTML is older, as can be seen from its prologue:

<!DOCTYPE html>\n<html prefix=\"dc: mw:\" about=\"\">
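
To find other affected records, a check along these lines might help. It is a minimal sketch, assuming the HTML is Parsoid-style output whose `<html about="...">` attribute holds a URL ending in `/revision/<id>`. The function names `html_revision` and `is_stale` are hypothetical, and how the HTML and metadata revision are read from the dump is left to the caller.

```python
import re
from typing import Optional

# Assumption: the rendered revision appears in the <html about="..."> attribute
# as a URL ending in /revision/<id>. Quotes may be backslash-escaped, as they
# are in the snippet quoted above.
_ABOUT_REV = re.compile(r'<html\b[^>]*?\babout=\\?"[^"\\]*/revision/(\d+)')


def html_revision(html: str) -> Optional[int]:
    """Return the revision id found in the HTML prologue, or None if absent."""
    # The <html> tag sits at the top of the document, so only the head is scanned.
    match = _ABOUT_REV.search(html[:2048])
    return int(match.group(1)) if match else None


def is_stale(html: str, metadata_revision: int) -> bool:
    """True when the HTML names a revision that differs from the metadata."""
    rendered = html_revision(html)
    return rendered is not None and rendered != metadata_revision
```

Records for which `html_revision` returns `None` carry no revision in the prologue and would need to be inspected separately; this sketch does not count them as mismatches.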