
missing database entries in categorylinks table on dewiki db
Closed, Invalid, Public

Description

My bot searches dewiki for articles without categories, but the database returns wrong results because of missing entries:

mysql -hs5.labsdb -vve "select * from dewiki_p.categorylinks where cl_from=8389350";

select * from dewiki_p.categorylinks where cl_from=8389350

Empty set (0.00 sec)

But the article has had four categories since 9 September 2014, 21:28:40 (see http://de.wikipedia.org/w/index.php?title=Eberhard_R%C3%B6ssler&diff=133880251&oldid=133879668), so four rows are missing.
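For context, a bot looking for uncategorized articles typically runs an anti-join of page against categorylinks, so any page whose categorylinks rows are missing is reported as uncategorized. The following is a minimal sketch of that logic, assuming Python's built-in sqlite3 in place of the MariaDB replicas and heavily simplified, hypothetical versions of the two tables; the actual query used by the bot in this report is not known.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for MediaWiki's page and
# categorylinks tables (the real schemas have many more columns).
SCHEMA = """
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title TEXT NOT NULL,
    page_is_redirect INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE categorylinks (
    cl_from INTEGER NOT NULL,
    cl_to TEXT NOT NULL
);
"""

# Anti-join: main-namespace, non-redirect pages without any categorylinks row.
UNCATEGORIZED = """
SELECT p.page_id, p.page_title
FROM page AS p
LEFT JOIN categorylinks AS cl ON cl.cl_from = p.page_id
WHERE p.page_namespace = 0
  AND p.page_is_redirect = 0
  AND cl.cl_from IS NULL
ORDER BY p.page_id
"""


def find_uncategorized(conn):
    """Return (page_id, page_title) for every page lacking a category row."""
    return conn.execute(UNCATEGORIZED).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO page (page_id, page_namespace, page_title) VALUES (?, 0, ?)",
        [(1, "Categorized"), (8389350, "Eberhard_Rössler")],
    )
    # Only page 1 gets a (hypothetical) row; page 8389350's rows are "missing".
    conn.execute("INSERT INTO categorylinks VALUES (1, 'Beispiel')")
    print(find_uncategorized(conn))  # prints [(8389350, 'Eberhard_Rössler')]
```

With this pattern, missing categorylinks rows and genuinely uncategorized pages are indistinguishable, which is why the bot's results depend entirely on the table being complete.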

And there is no replication lag (replag):
mysql -hs5.labsdb -ve "select rc_timestamp from dewiki_p.recentchanges order by rc_timestamp desc limit 1";
+----------------+
| rc_timestamp   |
+----------------+
| 20140911122626 |
+----------------+
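The rc_timestamp check works because the newest recentchanges row visible on the replica must already have replicated. A minimal Python sketch of turning it into a number, assuming the 14-digit UTC timestamp format shown in the output; the function names and the check time are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

# MediaWiki stores timestamps such as rc_timestamp as 14-digit UTC strings.
MW_TS_FORMAT = "%Y%m%d%H%M%S"


def parse_mw_timestamp(ts: str) -> datetime:
    """Parse a MediaWiki timestamp like '20140911122626' as an aware UTC datetime."""
    return datetime.strptime(ts, MW_TS_FORMAT).replace(tzinfo=timezone.utc)


def replag_upper_bound(latest_rc_timestamp: str, now: Optional[datetime] = None) -> float:
    """Seconds between the newest replicated change and `now`.

    This is only an upper bound on replication lag: if nobody edited
    recently, the gap is large even when the replica is fully caught up.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - parse_mw_timestamp(latest_rc_timestamp)).total_seconds()


if __name__ == "__main__":
    checked_at = datetime(2014, 9, 11, 12, 27, 26, tzinfo=timezone.utc)  # hypothetical
    print(replag_upper_bound("20140911122626", now=checked_at))  # prints 60.0
```

On a busy wiki such as dewiki, a small value here is good evidence that replication is current; a large value alone does not prove lag.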

Please add the missing entries to the categorylinks table and prevent such errors in the future.


Version: unspecified
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=72226
https://bugzilla.wikimedia.org/show_bug.cgi?id=71084

Details

Reference
bz70711

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 3:52 AM
bzimport added a project: Cloud-VPS.
bzimport set Reference to bz70711.

Sync in progress. The cause is not yet confirmed; https://mariadb.atlassian.net/browse/MDEV-6551 is one possibility.

Very interested to hear if anyone observes this with recent (less than one week old) data.

Today this happened again with http://de.wikipedia.org/wiki/Matw%C3%A9_Middelkoop (pageid: 8468612)

The article has categories and there is no replag, but there are no entries in the categorylinks table. Purging and null edits do not help.

Is this still an issue after the changes effected by @Springle?

Same problem again after two rollbacks:

$ mysql -hs5.labsdb -vvve "select page_id, page_latest, cl_to from dewiki_p.page left join dewiki_p.categorylinks on page_id=cl_from where page_id IN(1887,5976)";
--------------
select page_id, page_latest, cl_to from dewiki_p.page left join dewiki_p.categorylinks on page_id=cl_from where page_id IN(1887,5976)
--------------

+---------+-------------+-------+
| page_id | page_latest | cl_to |
+---------+-------------+-------+
|    1887 |   142365806 | NULL  |
|    5976 |   142365230 | NULL  |
+---------+-------------+-------+
2 rows in set (0.00 sec)

Same result on s1 and s3.

Both pages have categories in the current revision shown in the page_latest column.
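One way to confirm such a report independently of the link tables is to compare the category links in the current revision's wikitext with the cl_to rows. A rough Python sketch, assuming the wikitext has already been fetched; it only sees literal [[Kategorie:Name]] or [[Category:Name]] links, not categories added by templates, so it can under-report. The function names and sample text are hypothetical.

```python
import re

# Matches [[Kategorie:Name]] or [[Category:Name|sort key]]. A leading colon
# ([[:Kategorie:Name]]) is a plain link, not a categorization, and is skipped.
CATEGORY_LINK = re.compile(
    r"\[\[\s*(?:Kategorie|Category)\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]",
    re.IGNORECASE,
)


def categories_in_wikitext(wikitext):
    """Return category names in cl_to form (spaces become underscores)."""
    names = set()
    for match in CATEGORY_LINK.finditer(wikitext):
        name = match.group(1).replace(" ", "_")
        # Assumes first-letter capitalization of titles (the MediaWiki default).
        names.add(name[:1].upper() + name[1:])
    return names


def missing_categorylinks(wikitext, cl_to_rows):
    """Categories linked in the text but absent from the categorylinks rows."""
    return categories_in_wikitext(wikitext) - set(cl_to_rows)


if __name__ == "__main__":
    text = "Text [[Kategorie:Beispiel]] [[Kategorie:Noch ein Beispiel|Sortkey]]"
    print(sorted(missing_categorylinks(text, [])))
    # prints ['Beispiel', 'Noch_ein_Beispiel']
```

A non-empty result for a page whose latest revision is old would point at genuinely missing rows rather than at a pending update.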

And currently again:

$ mysql -hs5.labsdb -vvve "select page_id, page_latest, cl_to from dewiki_p.page left join dewiki_p.categorylinks on page_id=cl_from where page_id IN(79660,165384,4948096)";
+---------+-------------+-------+
| page_id | page_latest | cl_to |
+---------+-------------+-------+
|   79660 |   142745685 | NULL  |
|  165384 |   142747515 | NULL  |
| 4948096 |   142743883 | NULL  |
+---------+-------------+-------+
3 rows in set (0.00 sec)

@Merl: That query produces exactly the same result in production - whatever the issue you expect may be, it is not related to replication.

jcrespo raised the priority of this task from Low to Needs Triage.
jcrespo moved this task from Triage to Backlog on the DBA board.
jcrespo added a subscriber: jcrespo.

I've checked all the queries given by the user (note: one returns no results because that article does not exist) and found identical results on production and on the labsdb* hosts.

Either there was a problem that was resolved long ago or, more likely, the user did not take into account that the jobs updating data such as "What links here" run asynchronously, independently of replication.
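Given that explanation, a bot can reduce false positives by skipping pages that were edited very recently, since their categorylinks rows may not have been written by the job queue yet. A minimal sketch, assuming the last-edit time of each candidate is already known (for example from the timestamp of the page_latest revision); the one-hour grace period and the function name are hypothetical and not taken from this task.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical grace period; the real job-queue delay varies with load.
DEFAULT_GRACE = timedelta(hours=1)


def stable_candidates(candidates, now, grace=DEFAULT_GRACE):
    """Keep page IDs from (page_id, last_edit) pairs older than `grace`.

    Pages edited more recently may simply not have had their link tables
    (categorylinks etc.) refreshed by the asynchronous job queue yet.
    """
    return [page_id for page_id, last_edit in candidates if now - last_edit >= grace]


if __name__ == "__main__":
    now = datetime(2014, 9, 11, 12, 0, tzinfo=timezone.utc)  # hypothetical check time
    candidates = [
        (1887, now - timedelta(minutes=5)),  # edited moments ago: wait
        (5976, now - timedelta(hours=2)),    # old enough to report
    ]
    print(stable_candidates(candidates, now))  # prints [5976]
```

Pages that are still listed after the grace period are the ones worth reporting as possibly missing rows.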