Page MenuHomePhabricator

Missing linktarget rows on wiki replicas
Closed, InvalidPublic

Description

MariaDB [enwiki_p]> select * from linktarget where lt_namespace=10 and lt_title="19th_Avenue/M_Ocean_View_Project";
Empty set (0.003 sec)

Here's the same query against a production replica:

wikiadmin2023@db1196.eqiad.wmnet(enwiki)> select * from linktarget where lt_namespace=10 and lt_title="19th_Avenue/M_Ocean_View_Project";
+--------+--------------+----------------------------------+
| lt_id  | lt_namespace | lt_title                         |
+--------+--------------+----------------------------------+
| 747361 |           10 | 19th_Avenue/M_Ocean_View_Project |
+--------+--------------+----------------------------------+
1 row in set (0.031 sec)

The page in question is here: https://en.wikipedia.org/wiki/Template:19th_Avenue/M_Ocean_View_Project

In the query that I'm debugging there's like 150 other missing rows, but I'm only looking at a small section of the template namespace, so it could be more widespread.

I started digging into this because of weird issues related to a database report (https://en.wikipedia.org/wiki/Wikipedia_talk:Database_reports#Wikipedia:Database_reports/Uncategorized_templates_missed_its_weekly_update) in which results were dramatically changing day-by-day, which suggests that one replica fell out of sync or something, but if I read puppet correctly, there's only one backing server for analytics (clouddb1017) and one for web (clouddb1013), so it wouldn't be that...but regardless, there's still a discrepancy between the replica and production.

Event Timeline

The views are identical on both clouddb1013 and clouddb1017 and match what's in Puppet:

root@clouddb1013:s1[(none)]> show create table enwiki_p.linktarget\G
*************************** 1. row ***************************
                View: linktarget
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`viewmaster`@`%` SQL SECURITY DEFINER VIEW `enwiki_p`.`linktarget` AS select `enwiki`.`linktarget`.`lt_id` AS `lt_id`,`enwiki`.`linktarget`.`lt_namespace` AS `lt_namespace`,`enwiki`.`linktarget`.`lt_title` AS `lt_title` from `enwiki`.`linktarget` where exists(select 1 from `enwiki`.`templatelinks` where `enwiki`.`templatelinks`.`tl_target_id` = `enwiki`.`linktarget`.`lt_id` limit 1) or exists(select 1 from `enwiki`.`pagelinks` where `enwiki`.`pagelinks`.`pl_target_id` = `enwiki`.`linktarget`.`lt_id` limit 1)

root@clouddb1017:s1[(none)]> show create table enwiki_p.linktarget\G
*************************** 1. row ***************************
                View: linktarget
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`viewmaster`@`%` SQL SECURITY DEFINER VIEW `enwiki_p`.`linktarget` AS select `enwiki`.`linktarget`.`lt_id` AS `lt_id`,`enwiki`.`linktarget`.`lt_namespace` AS `lt_namespace`,`enwiki`.`linktarget`.`lt_title` AS `lt_title` from `enwiki`.`linktarget` where exists(select 1 from `enwiki`.`templatelinks` where `enwiki`.`templatelinks`.`tl_target_id` = `enwiki`.`linktarget`.`lt_id` limit 1) or exists(select 1 from `enwiki`.`pagelinks` where `enwiki`.`pagelinks`.`pl_target_id` = `enwiki`.`linktarget`.`lt_id` limit 1)

Per https://en.wikipedia.org/wiki/Special:WhatLinksHere/Template:19th_Avenue/M_Ocean_View_Project there are no current transclusions for that page which means no templatelinks rows, and the pagelinks migration is still in progress (per T345733):

mysql:research@dbstore1003.eqiad.wmnet [enwiki]> select * from pagelinks where pl_namespace = 10 and pl_title = '19th_Avenue/M_Ocean_View_Project';
+----------+--------------+----------------------------------+-------------------+--------------+
| pl_from  | pl_namespace | pl_title                         | pl_from_namespace | pl_target_id |
+----------+--------------+----------------------------------+-------------------+--------------+
| 23394133 |           10 | 19th_Avenue/M_Ocean_View_Project |                 4 |         NULL |
| 36311056 |           10 | 19th_Avenue/M_Ocean_View_Project |                 2 |         NULL |
| 59967623 |           10 | 19th_Avenue/M_Ocean_View_Project |                11 |         NULL |
| 68456107 |           10 | 19th_Avenue/M_Ocean_View_Project |                 2 |         NULL |
| 71927542 |           10 | 19th_Avenue/M_Ocean_View_Project |                 4 |         NULL |
| 71927548 |           10 | 19th_Avenue/M_Ocean_View_Project |                 4 |         NULL |
+----------+--------------+----------------------------------+-------------------+--------------+
6 rows in set (0.003 sec)

So that explains why you would see those rows, but doesn't explain those fluctuations.

I didn't realize the view only surfaces linktarget rows if there's a corresponding *links row, that feels weird to me but makes sense. I'm not sure why it's fluctuating either, but I think we can partially/mostly chalk this up to a busted/nonsensical query, which I've now fixed to not do that.