For example, according to the on-wiki logs, the page "Jeff Caldwell (soccer)" on enwiki was deleted three times, restored once, and then moved.
But mediawiki_history only records the last two events (the restore and the move), and the move is actually recorded as a creation. It also doesn't include any of the initial creations.
select event_type, event_timestamp, event_user_text, page_title, page_title_historical, page_id
from wmf.mediawiki_history
where event_entity = "page"
  and wiki_db = "enwiki"
  and (page_title_historical = "Jeff_Caldwell_(soccer)" or page_title = "Jeff_Caldwell_(soccer)")
  and snapshot = "2018-08"

Result:

  event_type  event_timestamp        event_user_text  page_title                             page_title_historical   page_id
0 create      2018-07-19 13:00:57.0  Freefalling660   Freefalling660/Jeff_Caldwell_(soccer)  Jeff_Caldwell_(soccer)  57939448
1 restore     2018-07-31 17:33:57.0  Hut 8.5          Freefalling660/Jeff_Caldwell_(soccer)  Jeff_Caldwell_(soccer)  57939448
mediawiki_page_history records many more events, but it contains several duplicate rows, and I find its schema much more confusing. (I'm including only the query, since the result is too long to print here.)
select page_id, page_id_artificial, page_title, page_title_historical,
       start_timestamp, end_timestamp, caused_by_event_type, caused_by_user_id
from wmf.mediawiki_page_history
where wiki_db = "enwiki"
  and (page_title_historical = "Jeff_Caldwell_(soccer)" or page_title = "Jeff_Caldwell_(soccer)")
  and snapshot = "2018-08"
order by start_timestamp asc
limit 1000
As another example, the page "Accidente ferroviario de Cerrillos de 1956" on eswiki has had quite a few events, but it has no page events at all in mediawiki_history (and the same is true of mediawiki_page_history).
select event_type, event_timestamp, event_user_text, page_id
from wmf.mediawiki_history
where event_entity = "page"
  and wiki_db = "eswiki"
  and (page_title_historical = "Accidente ferroviario de Cerrillos de 1956" or page_title = "Accidente ferroviario de Cerrillos de 1956")
  and snapshot = "2018-08"
Is the data supposed to be this unreliable? Shouldn't mediawiki_history and mediawiki_page_history be consistent with each other?
On the wiki page, I see a note from almost a year ago saying that "History of pages with complex delete/restore patterns is on purpose not yet correctly worked. Will happen after Wikistats-2 release", but I feel these issues are bigger than that note implies.