Page MenuHomePhabricator

page_creation_timestamp not always correct in mediawiki_history
Open, HighPublic

Description

Not sure if this is related to T205594 but in my work on T213597 I found that page_creation_timestamp in wmf.mediawiki_history is not always correct:

USE wmf;
SELECT
  wiki_db,
  IF(event_timestamp = page_creation_timestamp, 'matches', 'does not match') AS initial_rev_page_creation_ts,
  COUNT(1) AS n_pages
FROM mediawiki_history
WHERE snapshot = '2018-12'
  AND wiki_db IN('enwiki', 'commonswiki')
  AND event_entity = 'revision'
  AND revision_parent_id = 0 -- initial rev
  AND NOT revision_is_deleted
GROUP BY wiki_db, IF(event_timestamp = page_creation_timestamp, 'matches', 'does not match');

So looks like it affects ~0.2% of pages on Commons and 1.2% of English Wikipedia pages (0.7% of articles specifically):

wiki_dbdoes not matchmatchesproportion of total
commonswiki134788692741590.194%
enwiki572631464578941.218%

Examples

USE wmf;
SELECT
  page_id, page_title,
  event_timestamp, page_creation_timestamp
FROM mediawiki_history
WHERE snapshot = '2018-12'
  AND wiki_db = 'commonswiki'
  AND event_entity = 'revision'
  AND revision_parent_id = 0 -- initial rev
  AND page_namespace = 6
  AND NOT revision_is_deleted
  AND event_timestamp != page_creation_timestamp
LIMIT 100;
page_idpage_titleevent_timestamppage_creation_timestamprevision history linkfirst entry in revision history
2721713U.S._Territorial_Acquisitions.en.alt1.jpg2007-09-10 16:27:25.02012-06-16 12:04:19.0revision history16:27, 10 September 2007
30269399UN_General_Assembly_Resolution_66_(1).pdf2013-12-21 08:19:34.02013-12-23 17:36:24.0revision history08:19, 21 December 2013