
Ghost entries in bgwiki.categorylinks; wanted categories affected
Closed, Resolved (Public)

Description

Author: b.manolov

Description:
There are two categories on bgwiki (http://bg.wikipedia.org) that are listed as wanted categories but do not contain any real pages. This happens because the database table bgwiki.categorylinks contains entries for these categories whose cl_from column does not refer to any existing page_id in the table bgwiki.page.

The two categories are "Транс-нептунови_обекти" (Trans-Neptunian objects) and "Катедрала" (Cathedral). Here is an extract from the table bgwiki.categorylinks:

+---------+------------------------+----------------------------+---------------------+
| cl_from | cl_to                  | cl_sortkey                 | cl_timestamp        |
+---------+------------------------+----------------------------+---------------------+
|   50238 | Транс-нептунови_обекти | Харон (спътник)            | 2005-04-28 05:54:32 |
|   50239 | Транс-нептунови_обекти | Харон (спътник)            | 2005-04-28 05:54:32 |
|   53857 | Катедрала              | Парижката Света Богородица | 2005-05-27 16:48:23 |
+---------+------------------------+----------------------------+---------------------+

These ghost entries should be deleted manually from the table bgwiki.categorylinks, because there is no way to remove them through the MediaWiki interface.
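The manual cleanup requested here can be sketched with Python's built-in sqlite3 module against a throwaway in-memory database. This is a minimal illustration under stated assumptions: the schema is heavily simplified (the real MediaWiki tables have many more columns), the function name delete_ghost_links and the 'Example_page' / 'Example_category' rows are hypothetical, and only the three ghost rows come from the extract above.

```python
import sqlite3

def delete_ghost_links(conn: sqlite3.Connection) -> int:
    """Delete categorylinks rows whose cl_from matches no page_id; return the count."""
    cur = conn.execute(
        """
        DELETE FROM categorylinks
        WHERE cl_from > 0
          AND NOT EXISTS (
              SELECT 1 FROM page WHERE page.page_id = categorylinks.cl_from
          )
        """
    )
    conn.commit()
    return cur.rowcount

# Throwaway database with a simplified, hypothetical version of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_sortkey TEXT);
    -- Made-up live page and its valid category link.
    INSERT INTO page VALUES (1, 0, 'Example_page');
    INSERT INTO categorylinks VALUES (1, 'Example_category', 'Example page');
    -- Ghost rows from the report: no page has these page_ids.
    INSERT INTO categorylinks VALUES (50238, 'Транс-нептунови_обекти', 'Харон (спътник)');
    INSERT INTO categorylinks VALUES (50239, 'Транс-нептунови_обекти', 'Харон (спътник)');
    INSERT INTO categorylinks VALUES (53857, 'Катедрала', 'Парижката Света Богородица');
    """
)
print(delete_ghost_links(conn))  # 3
print(conn.execute("SELECT cl_from FROM categorylinks").fetchall())  # [(1,)]
```

NOT EXISTS expresses the same "no matching page" condition as a LEFT JOIN ... IS NULL check. On a live wiki a hand-written DELETE bypasses MediaWiki entirely, so it would only be run against a backed-up database.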


Version: unspecified
Severity: trivial
URL: http://bg.wikipedia.org/wiki/Special:Wantedcategories

Details

Reference
bz12168

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 9:59 PM)
bzimport set Reference to bz12168.

Sounds like it needs a refresh. It may already be cleared up by now; this was reported a while ago...

b.manolov wrote:

Currently there are 25 entries in the bgwiki.categorylinks table with a cl_from key which doesn't correspond to any page_id from the page table. Their timestamps lie between 2005-04-28 and 2005-06-01. Hopefully they will not become evergreens. :-)

Only two of the categories referenced by these entries do not exist, so only those two show up on the Wanted Categories list. These are our old friends, Транс-нептунови_обекти and Катедрала.
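Why only the non-existent categories appear can be seen from a rough approximation of what Special:Wantedcategories computes: category names that are linked from categorylinks but have no page in the Category namespace (namespace number 14). The sketch below is an assumption-laden approximation, not MediaWiki's actual query or code; the simplified schema, the function name wanted_categories, and the Example_* rows and page_id 60000 are hypothetical, while the Cyrillic ghost rows come from the report.

```python
import sqlite3

NS_CATEGORY = 14  # namespace number of MediaWiki's Category: namespace

def wanted_categories(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """List linked category names that have no page in the Category namespace."""
    return conn.execute(
        """
        SELECT cl_to, COUNT(*) AS links
        FROM categorylinks
        LEFT JOIN page
          ON page.page_namespace = ? AND page.page_title = categorylinks.cl_to
        WHERE page.page_id IS NULL
        GROUP BY cl_to
        ORDER BY cl_to
        """,
        (NS_CATEGORY,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_sortkey TEXT);
    -- Made-up rows: one article and one existing category page.
    INSERT INTO page VALUES (1, 0, 'Example_page'), (2, 14, 'Example_category');
    INSERT INTO categorylinks VALUES (1, 'Example_category', 'Example page');
    -- Ghost rows: cl_from matches no page_id.
    INSERT INTO categorylinks VALUES (60000, 'Example_category', 'Ghost');
    INSERT INTO categorylinks VALUES (50238, 'Транс-нептунови_обекти', 'Харон (спътник)');
    INSERT INTO categorylinks VALUES (50239, 'Транс-нептунови_обекти', 'Харон (спътник)');
    INSERT INTO categorylinks VALUES (53857, 'Катедрала', 'Парижката Света Богородица');
    """
)
print(wanted_categories(conn))
# [('Катедрала', 1), ('Транс-нептунови_обекти', 2)]
```

The ghost row pointing at Example_category does not surface, because that category page exists; the join only checks whether the category has a page, not whether the linking page is real.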

A quick SQL query to check:

SELECT c.* FROM categorylinks c
LEFT JOIN page ON cl_from = page_id
WHERE page_id IS NULL AND cl_from > 0;
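The check query is a LEFT JOIN anti-join: the join keeps every categorylinks row, and page_id IS NULL keeps only those rows for which no page matched. A minimal sketch runs it unchanged against a toy SQLite database; the simplified schema and the Example_* rows are hypothetical, and only the 53857 row comes from the report.

```python
import sqlite3

# Simplified, hypothetical stand-ins for MediaWiki's page and categorylinks tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_sortkey TEXT);
    INSERT INTO page VALUES (1, 0, 'Example_page');
    INSERT INTO categorylinks VALUES (1, 'Example_category', 'Example page');
    INSERT INTO categorylinks VALUES (53857, 'Катедрала', 'Парижката Света Богородица');
    """
)

# The check query from the report: rows with no matching page get NULL page
# columns from the LEFT JOIN, and the WHERE clause keeps exactly those rows.
ORPHAN_QUERY = """
    SELECT c.* FROM categorylinks c
    LEFT JOIN page ON cl_from = page_id
    WHERE page_id IS NULL AND cl_from > 0
"""

orphans = conn.execute(ORPHAN_QUERY).fetchall()
print(orphans)  # [(53857, 'Катедрала', 'Парижката Света Богородица')]
```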

I am currently running the refreshLinks script against bgwiki. Unfortunately, this will take at least a couple of hours to run. Once it is finished, I will close out this ticket.

(Currently running script in a screen session on Hume)

mysql> SELECT c.* FROM categorylinks c
    -> LEFT JOIN page ON cl_from = page_id
    -> WHERE page_id IS NULL AND cl_from > 0;

Empty set (5.98 sec)

Looks like this was fixed.