Consider the following sequence of events in the context of Portuguese Wikipedia:
- March 2018: São Carlos was moved to São Carlos (desambiguação)
- March 2018: Discussão:São Carlos was moved to Discussão:São Carlos (desambiguação)
- March 2018: São Carlos (São Paulo) was moved to São Carlos
- March 2018: Discussão:São Carlos (São Paulo) was moved to Discussão:São Carlos
- April 2020: São Carlos was moved to São Carlos (São Paulo)
- April 2020: Discussão:São Carlos was moved to Discussão:São Carlos (São Paulo)
- April 2020: São Carlos (desambiguação) was moved to São Carlos
- April 2020: Discussão:São Carlos (desambiguação) was moved to Discussão:São Carlos
- April 2020: Labels were extracted from ptwiki-20200301-pages-meta-history*.xml*.bz2, including these:
{"timestamp": "20081220171253", "project": "marca de projeto", "wp10": "3", "page_title": "S\u00e3o Carlos"}
{"timestamp": "20171109163710", "project": "marca de projeto", "wp10": "4", "page_title": "S\u00e3o Carlos"}
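Both timestamps appear in the history of Discussão:São Carlos (São Paulo), the talk page of the city article. The dump predates the April 2020 moves, so the labels carry the title that article had at the time, São Carlos.
However, when the text was extracted from ptwiki's API, it came from São Carlos, which is now a disambiguation page, and not from the article São Carlos (São Paulo), to which the labels refer.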
I don't know how often this mismatch between the text and the labels happens in the full datasets, but it should be fixed.
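
One possible way to detect and correct such cases is to replay the page moves made after the dump date and map each label's page_title to the title the page has now. The following is only a minimal sketch of that idea, not the pipeline's actual code: the `Move` type, the `resolve_current_title` helper, and the day and time parts of the move timestamps are assumptions made for illustration (this report only gives the month of each move).

```python
from typing import Iterable, NamedTuple


class Move(NamedTuple):
    """One page move (hypothetical structure, for illustration only)."""
    timestamp: str  # MediaWiki-style YYYYMMDDHHMMSS; sorts lexicographically
    old_title: str
    new_title: str


def resolve_current_title(title: str, as_of: str, moves: Iterable[Move]) -> str:
    """Follow the moves made after `as_of` to find the page's current title."""
    for move in sorted(moves, key=lambda m: m.timestamp):
        # Only moves newer than the snapshot can rename the page we are tracking.
        if move.timestamp > as_of and move.old_title == title:
            title = move.new_title
    return title


# Day and time values are placeholders; only the month is known from the report.
MOVES = [
    Move("20180301000000", "São Carlos", "São Carlos (desambiguação)"),
    Move("20180301000001", "São Carlos (São Paulo)", "São Carlos"),
    Move("20200401000000", "São Carlos", "São Carlos (São Paulo)"),
    Move("20200401000001", "São Carlos (desambiguação)", "São Carlos"),
]

DUMP_DATE = "20200301000000"  # from ptwiki-20200301-pages-meta-history

current_title = resolve_current_title("São Carlos", DUMP_DATE, MOVES)
print(current_title)
```

With these placeholder timestamps, the label title São Carlos resolves to São Carlos (São Paulo), which is the article the labels actually describe. Matching on a stable identifier such as the page ID or revision ID, if the labels carried one, would likely be more robust than replaying moves by title.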