
Potential templatelinks data integrity issue on Tool Labs' enwiki_p
Closed, Resolved · Public

Description

With https://en.wikipedia.org/w/index.php?title=Talk:Buddy_Baker&diff=675416729&oldid=353868410, this talk page was changed to no longer transclude the {{BLP}} (biographies of living people) template.

If you look at the list of templates currently used on the page at https://en.wikipedia.org/w/index.php?title=Talk:Buddy_Baker&action=edit, "Template:BLP" is not listed.

However, enwiki_p's templatelinks on Tool Labs seems to think it's still there:

MariaDB [enwiki_p]> select tl_namespace, tl_title from page join templatelinks on tl_from = page_id where page_namespace = 1 and page_title = 'Buddy_Baker' and tl_namespace = 10 and tl_title = 'BLP';
+--------------+----------+
| tl_namespace | tl_title |
+--------------+----------+
|           10 | BLP      |
+--------------+----------+
1 row in set (0.01 sec)

Can someone please check the enwiki master and verify that this is a discrepancy between the replicated databases available on Labs and production?

Event Timeline

MZMcBride raised the priority of this task to Needs Triage.
MZMcBride updated the task description.
MZMcBride added projects: Toolforge, DBA.
Restricted Application added a subscriber: Aklapper.

From the enwiki master:

mysql:wikiadmin@db1052 [enwiki]> select tl_namespace, tl_title from page join templatelinks on tl_from = page_id where page_namespace = 1 and page_title = 'Buddy_Baker' and tl_namespace = 10 and tl_title = 'BLP';
Empty set (0.00 sec)

db1055 returns empty set, but the three labsdb hosts all give the result shown in the task description. I was wondering about db1069 (the host between db1055 and labsdbs in the replication tree), but it looks like only ops can connect there.

jcrespo triaged this task as Medium priority.
jcrespo moved this task from Triage to In progress on the DBA board.
jcrespo set Security to None.

I checked the checksums, and this table was checked and fixed on 10 August. The fact that this happened again, but only on db1069 and below, means that the filtering done for privacy is not replication-safe. I will check the table again and fix all the errors I can find, but this will keep happening until we replace the filtering with something else and/or switch to ROW-based replication.
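
For readers unfamiliar with the idea, a checksum comparison of this kind can be sketched roughly as follows. This is only an illustration: the thread does not say which tool was used, and the chunking by tl_from, the helper names (chunk_checksums, differing_chunks, make_db), and the sample rows are all assumptions. In-memory SQLite databases stand in for the master and the replica so that the sketch runs on its own.

```python
import hashlib
import sqlite3


def chunk_checksums(conn, chunk_size=1000):
    """Return {chunk_id: hex digest} for templatelinks, chunked by tl_from."""
    hashers = {}
    rows = conn.execute(
        "SELECT tl_from, tl_namespace, tl_title FROM templatelinks "
        "ORDER BY tl_from, tl_namespace, tl_title"
    )
    for tl_from, tl_namespace, tl_title in rows:
        # Rows whose tl_from falls in the same range share one checksum.
        chunk_id = tl_from // chunk_size
        hasher = hashers.setdefault(chunk_id, hashlib.sha1())
        hasher.update(f"{tl_from}\t{tl_namespace}\t{tl_title}\n".encode("utf-8"))
    return {chunk_id: h.hexdigest() for chunk_id, h in hashers.items()}


def differing_chunks(master, replica, chunk_size=1000):
    """List the chunk ids whose checksums differ between the two hosts."""
    a = chunk_checksums(master, chunk_size)
    b = chunk_checksums(replica, chunk_size)
    return sorted(cid for cid in a.keys() | b.keys() if a.get(cid) != b.get(cid))


def make_db(rows):
    """Hypothetical stand-in for a database host: an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE templatelinks "
        "(tl_from INTEGER, tl_namespace INTEGER, tl_title TEXT)"
    )
    conn.executemany("INSERT INTO templatelinks VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    # Made-up page ids; the replica carries one stale row the master lacks.
    master = make_db([(1, 10, "Foo"), (2500, 10, "Bar")])
    replica = make_db([(1, 10, "Foo"), (2500, 10, "Bar"), (2500, 10, "BLP")])
    print(differing_chunks(master, replica))
```

Only the chunks reported as different would then need a row-by-row comparison, which keeps the check cheap on a table as large as templatelinks.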

This should be fixed now, but I am performing another checksum just to be sure.

MariaDB [enwiki_p]> select tl_namespace, tl_title from page join templatelinks on tl_from = page_id where page_namespace = 1 and page_title = 'Buddy_Baker' and tl_namespace = 10 and tl_title = 'BLP';
Empty set (0.00 sec)

This looks much better now. Thanks!

Do the other tables need to be checked as well? pagelinks, categorylinks, etc.?

I am building an automatic checker in T104459. Sadly, I can only work on that when there is nothing more important. This should also help: T109179.
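
Until such a checker exists, the per-page query from the task description could be extended to the other link tables along these lines. This is a rough sketch and not the checker from T104459: the helper names (link_rows, compare_page, make_db) and the page id are hypothetical, and in-memory SQLite databases stand in for the master and the Labs replica. The column names are those of the MediaWiki link tables at the time of this task.

```python
import sqlite3

# Link tables to compare: table -> (source page id column, columns to compare).
LINK_TABLES = {
    "templatelinks": ("tl_from", ("tl_namespace", "tl_title")),
    "pagelinks": ("pl_from", ("pl_namespace", "pl_title")),
    "categorylinks": ("cl_from", ("cl_to",)),
}


def link_rows(conn, table, page_id):
    """Fetch the set of link rows that one page has in one table."""
    from_col, cols = LINK_TABLES[table]
    # Table and column names come from the fixed dict above, never from input.
    # "?" is SQLite's placeholder; MySQL drivers typically use "%s" instead.
    sql = f"SELECT {', '.join(cols)} FROM {table} WHERE {from_col} = ?"
    return set(conn.execute(sql, (page_id,)).fetchall())


def compare_page(master, replica, page_id):
    """Return, per table, the rows present on only one of the two hosts."""
    report = {}
    for table in LINK_TABLES:
        m = link_rows(master, table, page_id)
        r = link_rows(replica, table, page_id)
        if m != r:
            report[table] = {
                "only_on_replica": sorted(r - m),
                "only_on_master": sorted(m - r),
            }
    return report


def make_db(templatelinks=(), pagelinks=(), categorylinks=()):
    """Hypothetical stand-in for a database host with the three link tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE templatelinks (tl_from INTEGER, tl_namespace INTEGER, tl_title TEXT)")
    conn.execute("CREATE TABLE pagelinks (pl_from INTEGER, pl_namespace INTEGER, pl_title TEXT)")
    conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
    conn.executemany("INSERT INTO templatelinks VALUES (?, ?, ?)", templatelinks)
    conn.executemany("INSERT INTO pagelinks VALUES (?, ?, ?)", pagelinks)
    conn.executemany("INSERT INTO categorylinks VALUES (?, ?)", categorylinks)
    return conn


if __name__ == "__main__":
    # Page id 42 and the "Example" rows are made up for the demonstration.
    master = make_db(templatelinks=[(42, 10, "Example")])
    replica = make_db(templatelinks=[(42, 10, "Example"), (42, 10, "BLP")])
    print(compare_page(master, replica, 42))
```

An empty report means the two hosts agree for that page; any entry under only_on_replica is a stale row of the kind shown in the task description.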