
search sorted by creation date missing some items
Closed, Resolved · Public

Description

This search result should contain the pages "Wikipedia:Village pump (technical)/Archive 2" and "Wikipedia:Village pump (technical)/Archive 9", but it does not. I've checked their histories and logs and can't find anything out of the ordinary.

Event Timeline

Graham87 created this task. Feb 7 2019, 7:27 AM
Restricted Application added a subscriber: Aklapper. Feb 7 2019, 7:27 AM
Restricted Application added a project: Discovery-Search. Feb 7 2019, 8:33 AM
dcausse triaged this task as Normal priority. Feb 7 2019, 8:54 AM
dcausse moved this task from needs triage to elastic / cirrus on the Discovery-Search board.
dcausse added a subscriber: dcausse.

According to https://en.wikipedia.org/w/api.php?action=query&format=json&prop=cirrusbuilddoc%7Ccirrusdoc&titles=Wikipedia%3AVillage_pump_(technical)%2FArchive_9&formatversion=2 the create_timestamp field is not yet present in the index for this page. I suspect our reindex process missed this page, since I believe we waited 2 months before announcing that this new field was available (2 months being our estimate of how long a full refresh takes).
We'll have to investigate the logs to determine what happened, but since it's a 2-month process, clues may be hard to find.
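
For anyone who wants to repeat this check on other pages, here is a minimal sketch in Python. It only builds the same API URL as above and inspects an already-parsed JSON response. The helper names are made up for illustration, and the response layout (query.pages[].cirrusdoc[].source) is an assumption based on the formatversion=2 output, so verify it against a live response before relying on it.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_cirrusdoc_url(title: str) -> str:
    """Build the query URL used above for one page title (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "cirrusbuilddoc|cirrusdoc",
        "titles": title,
        "formatversion": "2",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def indexed_doc_has_field(response: dict, field: str = "create_timestamp") -> bool:
    """Return True if any indexed document in the response carries `field`.

    Assumes the parsed response looks like
    {"query": {"pages": [{"cirrusdoc": [{"source": {...}}]}]}}.
    """
    pages = response.get("query", {}).get("pages", [])
    for page in pages:
        for doc in page.get("cirrusdoc", []):
            if field in doc.get("source", {}):
                return True
    return False


# Hand-written sample shaped like the assumed response, not real API output.
sample = {
    "query": {
        "pages": [
            {
                "title": "Wikipedia:Village pump (technical)/Archive 9",
                "cirrusdoc": [{"source": {"title": "Village pump (technical)/Archive 9"}}],
            }
        ]
    }
}

print(build_cirrusdoc_url("Wikipedia:Village pump (technical)/Archive 9"))
print(indexed_doc_has_field(sample))
```

Fetching the URL and decoding the JSON is left out on purpose, so the sketch runs without network access. A page whose indexed document lacks create_timestamp would presumably be left out of a search sorted by creation date, which matches the symptom reported here.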

Change 488967 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@master] saneitizer: Increment loop id when restarting

https://gerrit.wikimedia.org/r/488967

Ainali added a subscriber: Ainali. Feb 7 2019, 10:53 PM

Change 488967 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] saneitizer: Increment loop id when restarting

https://gerrit.wikimedia.org/r/488967

EBernhardson moved this task from elastic / cirrus to Current work on the Discovery-Search board.

Kind of good news / bad news. The good news is the patch is merged and will deploy this week. The bad news is the bug was in the process that backfills old properties, like the somewhat recently added page creation date. It's basically going to take 2 more months before these new property sorts take all pages into account.

Checked the current query. As expected (it hasn't been two full months yet), there are still ~7 pages that match the example query and have not yet been reindexed. Expecting these 7 to be reindexed sometime in the next 4-5 weeks.

The example query now returns appropriate results. It seems the processes involved here are all working as intended, so calling this complete.

debt closed this task as Resolved. Apr 19 2019, 8:59 PM
debt added a subscriber: debt.

"all good things take time" -- indexing for the win! :)