
hewiki's categorylinks shown as not empty though it is; purging does not help
Closed, ResolvedPublic

Description

The category used for speedy deletion on the Hebrew Wikipedia is shown as not empty, although it has been empty for a few days, and purging the category page doesn't change this.

Based on the database query (למחיקה means "for deletion"):

select * from categorylinks where cl_to like '%למחיקה%'

the files are:
https://he.wikipedia.org/wiki/%D7%A7%D7%95%D7%91%D7%A5:Sg_logo.PNG (deleted on October 9)
https://he.wikipedia.org/wiki/%D7%A7%D7%95%D7%91%D7%A5:KKS_Lech_Pozna%C5%84.gif (deleted on October 14)

Restoring, modifying, and re-deleting the above files doesn't help.

Event Timeline

eranroz raised the priority of this task from to Needs Triage.
eranroz updated the task description.
eranroz added a project: acl*sre-team.
eranroz added subscribers: eranroz, IKhitron.
Aklapper renamed this task from Database corruption in hewiki (categorylinks) to hewiki's categorylinks shown as not empty though it is; purging does not help.Oct 16 2015, 11:11 AM

Referring to LinksDeletionUpdate.php: are there cleanup triggers configured on hewiki? Are they enabled?
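As far as I understand, MediaWiki does this cleanup in PHP (the LinksDeletionUpdate class) rather than through database triggers, so if that update is skipped or lost, the categorylinks rows simply stay behind. The following is a hypothetical model using Python's sqlite3: the two-column tables and the `delete_page` helper are simplified assumptions, not MediaWiki code.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Simplified stand-ins for MediaWiki's page and categorylinks tables.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT NOT NULL);
        CREATE TABLE categorylinks (cl_from INTEGER NOT NULL, cl_to TEXT NOT NULL);
        INSERT INTO page VALUES (695697, 'Example.png');
        INSERT INTO categorylinks VALUES (695697, 'Speedy');
        """
    )
    return conn


def delete_page(conn: sqlite3.Connection, page_id: int, cleanup_links: bool = True) -> None:
    """Hypothetical helper: delete a page and, optionally, its category links.

    cleanup_links=False mimics a links-deletion step that was skipped or lost.
    """
    with conn:  # one transaction: both deletes commit together or not at all
        conn.execute("DELETE FROM page WHERE page_id = ?", (page_id,))
        if cleanup_links:
            conn.execute("DELETE FROM categorylinks WHERE cl_from = ?", (page_id,))


def link_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM categorylinks").fetchone()[0]


healthy, broken = make_db(), make_db()
delete_page(healthy, 695697)
delete_page(broken, 695697, cleanup_links=False)
print(link_count(healthy), link_count(broken))  # 0 1
```

In the "broken" case the page is gone but its categorylinks row remains, which matches the symptom in this task.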

It's empty now. Purging the category page would never help anyway; you'd need to purge every file listed in it.

Categorylinks isn't empty.
{{PAGESINCATEGORY:ויקיפדיה: למחיקה מהירה}} returns 2 (the category name means "Wikipedia: for speedy deletion").
select * from categorylinks where cl_to like '%למחיקה%';

The pages associated with those categorylinks rows are deleted, so it looks like there are no members, but there are.
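To illustrate how a category can look empty while still having rows, here is a minimal sketch using Python's sqlite3 with simplified, hypothetical versions of the page and categorylinks tables (only page_id, cl_from, and cl_to; the real schema has more columns). The IDs are illustrative. Counting raw categorylinks rows and counting only rows whose page still exists give different answers.

```python
import sqlite3

CATEGORY = "ויקיפדיה:_למחיקה_מהירה"


def build_demo_db() -> sqlite3.Connection:
    """Toy, simplified stand-ins for MediaWiki's page and categorylinks tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT NOT NULL);
        CREATE TABLE categorylinks (
            cl_from INTEGER NOT NULL,
            cl_to   TEXT NOT NULL,
            PRIMARY KEY (cl_from, cl_to)
        );
        """
    )
    # Two stale rows: their pages were deleted, so no matching page row exists.
    conn.executemany(
        "INSERT INTO categorylinks (cl_from, cl_to) VALUES (?, ?)",
        [(695697, CATEGORY), (510006, CATEGORY)],
    )
    return conn


def raw_member_count(conn: sqlite3.Connection, category: str) -> int:
    # Counts every categorylinks row, including ones left behind by deleted pages.
    return conn.execute(
        "SELECT COUNT(*) FROM categorylinks WHERE cl_to = ?", (category,)
    ).fetchone()[0]


def live_member_count(conn: sqlite3.Connection, category: str) -> int:
    # Counts only rows whose cl_from still points at an existing page.
    return conn.execute(
        "SELECT COUNT(*) FROM categorylinks JOIN page ON cl_from = page_id "
        "WHERE cl_to = ?",
        (category,),
    ).fetchone()[0]


conn = build_demo_db()
print(raw_member_count(conn, CATEGORY), live_member_count(conn, CATEGORY))  # 2 0
```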

The hewiki master shows 7 rows:

mysql:wikiadmin@db1062 [hewiki]> select * from categorylinks where cl_to like '%למחיקה%'
    -> ;
+---------+-------------------------------------------+--------------------------------------------+---------------------+-------------------+--------------+---------+
| cl_from | cl_to                                     | cl_sortkey                                 | cl_timestamp        | cl_sortkey_prefix | cl_collation | cl_type |
+---------+-------------------------------------------+--------------------------------------------+---------------------+-------------------+--------------+---------+
|  408236 | ויקיפדיה:_למחיקה_מהירה                    | GFATOTAL.GIF                               | 2015-10-16 09:40:25 |                   | uppercase    | file    |
|  510006 | ויקיפדיה:_למחיקה_מהירה                    | KKS LECH POZNAŃ.GIF                        | 2015-10-14 17:35:39 |                   | uppercase    | file    |
|  695697 | ויקיפדיה:_למחיקה_מהירה                    | SG LOGO.PNG                                | 2015-10-08 19:24:25 |                   | uppercase    | file    |
|  933942 | ויקיפדיה:_למחיקה_מהירה                    | CSKABASKET.JPG                             | 2015-10-16 07:38:49 |                   | uppercase    | file    |
| 1201221 | ויקיפדיה:_למחיקה_מהירה                    | הפועל ירושלים כדורסל.JPG                   | 2015-10-17 12:21:10 |                   | uppercase    | file    |
| 1319470 | ויקיפדיה:_למחיקה_מהירה                    | SG LOGO.PNG                                | 2015-10-16 04:19:34 |                   | uppercase    | file    |
| 1319471 | ויקיפדיה:_למחיקה_מהירה                    | KKS LECH POZNAŃ.GIF                        | 2015-10-16 04:22:06 |                   | uppercase    | file    |
+---------+-------------------------------------------+--------------------------------------------+---------------------+-------------------+--------------+---------+
7 rows in set (1.31 sec)

mysql:wikiadmin@db1062 [hewiki]>

Most or all of the cl_from values point to deleted pages, e.g.:

select * from categorylinks left join page on cl_from=page_id where cl_to like '%למחיקה%';
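The LEFT JOIN query returns every matching categorylinks row, with the page columns NULL where the page no longer exists. Adding `page_id IS NULL` narrows it to the orphaned rows only. A minimal sketch in Python's sqlite3 with simplified, hypothetical tables; the filter is standard SQL, but this was not run against hewiki.

```python
import sqlite3


def find_orphans(conn: sqlite3.Connection, pattern: str) -> list[int]:
    # LEFT JOIN keeps every categorylinks row; page columns are NULL when the
    # referenced page no longer exists, so filtering on IS NULL isolates orphans.
    rows = conn.execute(
        """
        SELECT cl_from
        FROM categorylinks
        LEFT JOIN page ON cl_from = page_id
        WHERE cl_to LIKE ? AND page_id IS NULL
        ORDER BY cl_from
        """,
        (pattern,),
    ).fetchall()
    return [cl_from for (cl_from,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT NOT NULL);
    CREATE TABLE categorylinks (cl_from INTEGER NOT NULL, cl_to TEXT NOT NULL);
    """
)
conn.execute("INSERT INTO page VALUES (1201221, 'Example.jpg')")  # hypothetical surviving page
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [(408236, "ויקיפדיה:_למחיקה_מהירה"), (1201221, "ויקיפדיה:_למחיקה_מהירה")],
)
print(find_orphans(conn, "%למחיקה%"))  # [408236]
```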

What is needed from ops here if anything?

Added MediaWiki-JobQueue, removed operations. Per a comment from Aaron, this should be a duplicate.

Please clean up the categorylinks rows that MW should have deleted, e.g.:

delete from categorylinks where cl_from not in (select page_id from page)

(CAUTION! I didn't test it)

Once this is cleaned up, we can follow up:

  • Either it will not occur again, meaning it was caused by one of the recent outages (hopefully)
  • Or it will come back, and we will know for sure it is a software bug or missing triggers
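Because the DELETE statement above was not tested, here is a sketch that exercises it on toy data in Python's sqlite3 and compares it with a NOT EXISTS formulation. The schema and rows are simplified assumptions; on MySQL the behaviour should be the same for these two statements, but this is not a substitute for testing against a real replica.

```python
import sqlite3

DELETE_NOT_IN = "DELETE FROM categorylinks WHERE cl_from NOT IN (SELECT page_id FROM page)"
# Equivalent here because page_id is never NULL. NOT EXISTS also avoids the NOT IN
# pitfall where a single NULL in the subquery makes the predicate never true.
DELETE_NOT_EXISTS = (
    "DELETE FROM categorylinks "
    "WHERE NOT EXISTS (SELECT 1 FROM page WHERE page.page_id = categorylinks.cl_from)"
)


def run_cleanup(statement: str) -> list[tuple[int, str]]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT NOT NULL);
        CREATE TABLE categorylinks (cl_from INTEGER NOT NULL, cl_to TEXT NOT NULL);
        INSERT INTO page VALUES (1, 'Kept.jpg');
        INSERT INTO categorylinks VALUES (1, 'Speedy'), (2, 'Speedy'), (3, 'Speedy');
        """
    )
    with conn:  # commit the DELETE as one transaction
        conn.execute(statement)
    return conn.execute("SELECT cl_from, cl_to FROM categorylinks").fetchall()


print(run_cleanup(DELETE_NOT_IN))      # [(1, 'Speedy')]
print(run_cleanup(DELETE_NOT_EXISTS))  # [(1, 'Speedy')]
```

Only the row whose page still exists survives in both cases.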


This shouldn't be needed. When the refreshLinks.php maintenance script runs on the wiki, it will clean these rows up. Note that it runs on a regular cron for all wikis:

https://github.com/wikimedia/operations-puppet/blob/812f280d16acfe3083259e8dfa7ce12ebf71da87/modules/mediawiki/manifests/maintenance/refreshlinks.pp#L28-L29
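A periodic job that removes links from nonexistent pages can be sketched as a batched delete. This is a hypothetical illustration only: `delete_orphans_in_batches`, the batch size, and the toy schema are assumptions, and it is not how refreshLinks.php is actually implemented.

```python
import sqlite3


def delete_orphans_in_batches(conn: sqlite3.Connection, batch_size: int = 2) -> int:
    """Hypothetical periodic cleanup: walk cl_from in ascending order and delete
    rows whose page no longer exists, using one small transaction per batch."""
    deleted, last_seen = 0, -1
    while True:
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT DISTINCT cl_from FROM categorylinks WHERE cl_from > ? "
                "ORDER BY cl_from LIMIT ?",
                (last_seen, batch_size),
            )
        ]
        if not ids:
            return deleted
        last_seen = ids[-1]
        placeholders = ",".join("?" * len(ids))
        with conn:  # short transaction per batch
            cur = conn.execute(
                f"DELETE FROM categorylinks WHERE cl_from IN ({placeholders}) "
                "AND cl_from NOT IN (SELECT page_id FROM page)",
                ids,
            )
            deleted += cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT NOT NULL);
    CREATE TABLE categorylinks (cl_from INTEGER NOT NULL, cl_to TEXT NOT NULL);
    INSERT INTO page VALUES (933942, 'Example.jpg');
    INSERT INTO categorylinks VALUES
        (408236, 'Speedy'), (510006, 'Speedy'), (933942, 'Speedy'), (1319470, 'Speedy');
    """
)
deleted = delete_orphans_in_batches(conn, batch_size=2)
remaining = conn.execute("SELECT cl_from FROM categorylinks").fetchall()
print(deleted, remaining)  # 3 [(933942,)]
```

Small batches keep each transaction short, which matters on a replicated production database; the trade-off is more round trips.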

7th November

(Note that this would likely be a problem on all wikis.)

eranroz claimed this task.