
Category graph includes deleted categories
Closed, Duplicate · Public



The category Category:Breakthrough_Prize_winners was deleted at 09:37, 19 June 2019, but it is still included in the category graph.

The data even includes categories deleted back in March (Category:Recipients of the Jeton de Vermeil).

Event Timeline

Restricted Application added a subscriber: Aklapper.
Smalyshev triaged this task as Medium priority. Aug 8 2019, 5:40 AM

Looks like there's some problem with deletion handling. E.g.,_2006 has been deleted and is listed as deleted in the enwiki-20190826-daily.sparql.gz dump, but it is still present in the database. Strangely enough, the log shows the file was processed successfully, yet the results are not there. Will investigate further.
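For anyone who wants to reproduce the "deleted but still present" check, here is a minimal sketch that builds a SPARQL ASK query for a single category. It is an illustration only: the helper name `build_presence_check` is hypothetical, and the assumption that a category is identified by its page URL should be verified against the actual categories namespace before relying on it.

```python
# Hypothetical helper: not part of MediaWiki or the WDQS tooling.
_UNSAFE = set('<>"{}|^`\\')


def build_presence_check(category_iri: str) -> str:
    """Return an ASK query that is true if any triple has the IRI as its subject."""
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if not category_iri or any(ch in _UNSAFE or ch.isspace() for ch in category_iri):
        raise ValueError(f"not a safe IRI: {category_iri!r}")
    return f"ASK {{ <{category_iri}> ?p ?o }}"


if __name__ == "__main__":
    # Assumed IRI form (category page URL); adjust if the graph uses another scheme.
    print(build_presence_check(
        "https://en.wikipedia.org/wiki/Category:Breakthrough_Prize_winners"
    ))
```

If the query returns true for a category that the daily dump lists as deleted, the delete did not take effect.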

Looks like the DELETE SPARQL clauses that the daily dump generates are wrong... Weird that I hadn't noticed it.
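For reference, a well-formed SPARQL 1.1 delete for a list of categories could look roughly like the output of the sketch below. This is not the clause the MediaWiki dump script generates, nor the content of the fix; `build_delete_update` and the IRI form are assumptions made for illustration.

```python
from typing import Iterable

# Hypothetical helper: not the code from the MediaWiki maintenance script.
_UNSAFE = set('<>"{}|^`\\')


def _safe_iri(iri: str) -> str:
    """Return the IRI unchanged, or raise if it cannot be written as <iri>."""
    if not iri or any(ch in _UNSAFE or ch.isspace() for ch in iri):
        raise ValueError(f"not a safe IRI: {iri!r}")
    return iri


def build_delete_update(category_iris: Iterable[str]) -> str:
    """Build a DELETE/WHERE update removing every triple whose subject is a listed category."""
    refs = [f"<{_safe_iri(iri)}>" for iri in category_iris]
    if not refs:
        raise ValueError("no categories to delete")
    values = " ".join(refs)
    return (
        "DELETE { ?category ?p ?o }\n"
        "WHERE {\n"
        f"  VALUES ?category {{ {values} }}\n"
        "  ?category ?p ?o .\n"
        "}"
    )


if __name__ == "__main__":
    print(build_delete_update(
        ["https://en.wikipedia.org/wiki/Category:Breakthrough_Prize_winners"]
    ))
```

Note that this sketch only removes triples where the category is the subject. Triples that point to the deleted category from elsewhere would need a second pattern.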

Change 532824 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/core@master] Fix categories detele SPARQL clause

After the patch is merged and deployed, the categories DB needs to be reloaded according to the procedure here:

I recommend doing it on wdqs1009 or wdqs1010 and then copying categories.jnl to the other servers. Since categories are updated daily (see the blazegraph cron), start the procedure early enough that the DB can be copied to all servers before the next daily update. Since the DB is small, copying it to all servers within a single day should not be a problem.
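The timing constraint can be sanity-checked with simple arithmetic. The sketch below assumes the journal is copied to one server after another; the function name, the copy duration, the server count, and the cron time are all hypothetical placeholders, not values from the real setup.

```python
from datetime import datetime, timedelta


def copy_finishes_in_time(
    start: datetime,
    next_update: datetime,
    per_host_copy: timedelta,
    host_count: int,
) -> bool:
    """Return True if a sequential copy to all hosts ends before the next daily update."""
    finish = start + per_host_copy * host_count
    return finish <= next_update


if __name__ == "__main__":
    # All values below are made up for illustration.
    next_update = datetime(2019, 9, 2, 0, 0)   # assumed time of the daily cron run
    per_host = timedelta(minutes=20)           # assumed copy time per server
    print(copy_finishes_in_time(datetime(2019, 9, 1, 8, 0), next_update, per_host, 10))
    print(copy_finishes_in_time(datetime(2019, 9, 1, 22, 0), next_update, per_host, 10))
```

With these placeholder numbers, a start at 08:00 leaves enough room, while a start at 22:00 would overlap the daily update.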

Change 532824 merged by jenkins-bot:
[mediawiki/core@master] Fix categories detele SPARQL clause

Smalyshev added a subscriber: Smalyshev.
dcausse added a subscriber: dcausse.

Merged into T246568, which is where we'll announce that the full reload has been done.