Database replication issues with deleted pages (affecting Tool Labs and Analytics Store)
Closed, Resolved · Public

Description

It seems that some pages which have been deleted from English Wikipedia still exist in the page tables on some database replicas:

Production:

mysql:wikiadmin@db1089 [enwiki]> select * from page where page_namespace = 0 AND page_title LIKE 'BatissForever';
Empty set (0.00 sec)

Tool Labs:

MariaDB [enwiki_p]> select * from page where page_namespace = 0 AND page_title LIKE 'BatissForever';
+----------+----------------+---------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+
| page_id  | page_namespace | page_title    | page_restrictions | page_counter | page_is_redirect | page_is_new | page_random    | page_touched   | page_links_updated | page_latest | page_len | page_content_model |
+----------+----------------+---------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+
| 49995280 |              0 | BatissForever |                   |            0 |                0 |           0 | 0.638177166992 | 20160331025325 | 20160330185410     |   712718594 |      212 | wikitext           |
+----------+----------------+---------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+

Analytics Store:

mysql:research@analytics-store.eqiad.wmnet [enwiki]> select * from page where page_namespace = 0 AND page_title LIKE 'BatissForever';
+----------+----------------+---------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
| page_id  | page_namespace | page_title    | page_restrictions | page_counter | page_is_redirect | page_is_new | page_random    | page_touched   | page_links_updated | page_latest | page_len | page_content_model | page_lang |
+----------+----------------+---------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
| 49995280 |              0 | BatissForever |                   |            0 |                0 |           0 | 0.638177166992 | 20160331025325 | 20160330185410     |   712718594 |      212 | wikitext           | NULL      |
+----------+----------------+---------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+

However, s1-analytics-slave apparently matches Production and does not show this discrepancy:

mysql:research@s1-analytics-slave.eqiad.wmnet [enwiki]> select * from page where page_namespace = 0 AND page_title LIKE 'BatissForever';
Empty set (0.02 sec)

Some other pages that also appear on Tool Labs, but not in Production: Cyclone_Victor(1986), Mercurialsoft_inc
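
For anyone who wants to look for further stray rows, the comparison above can be automated as a set difference between two hosts. The following is only a minimal sketch: it uses in-memory SQLite tables as stand-ins for the real replicas, and the helper names (`make_replica`, `page_keys`, `extra_rows`) and the `Example_page` row are hypothetical. Only the BatissForever row is taken from the output above.

```python
import sqlite3


def make_replica(rows):
    """Build an in-memory stand-in for one host's page table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page (page_id INTEGER PRIMARY KEY, "
        "page_namespace INTEGER, page_title TEXT)"
    )
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)", rows)
    return conn


def page_keys(conn, namespace=0):
    """Return the set of (page_id, page_title) pairs in one namespace."""
    cur = conn.execute(
        "SELECT page_id, page_title FROM page WHERE page_namespace = ?",
        (namespace,),
    )
    return set(cur.fetchall())


def extra_rows(reference, replica, namespace=0):
    """Rows present on the replica but absent from the reference host."""
    return sorted(page_keys(replica, namespace) - page_keys(reference, namespace))


# Example_page is a made-up row; BatissForever is the row reported in this task.
production = make_replica([(1, 0, "Example_page")])
tool_labs = make_replica(
    [(1, 0, "Example_page"), (49995280, 0, "BatissForever")]
)

drift = extra_rows(production, tool_labs)
print(drift)  # [(49995280, 'BatissForever')]
```

Against real hosts the two key sets would have to be fetched in batches rather than in one query, since the enwiki page table is far too large to pull in a single pass.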

Event Timeline

FWIW, this doesn't seem to be a lag issue as all the pages affected (that I've found) were deleted about a year ago.

kaldari triaged this task as High priority. May 24 2017, 3:20 AM

Marking high priority since this is affecting current research efforts.

jcrespo claimed this task.

This is a known issue that was already fixed on the new servers: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database/Replica_drift

root@labsdb1009[(none)]> use enwiki
Database changed
root@labsdb1009[enwiki]> select * from page where page_namespace = 0 AND page_title LIKE 'BatissForever';
Empty set (0.00 sec)

labsdb1001 was reimported from dbstore1001, so that would explain the similarities. Please report the drift, together with this query, on task T138967.

Just for the record of this ticket, Jaime kindly fixed it on: T138967#3288156

This is a known issue that was already fixed on the new servers: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database/Replica_drift

...
(Thanks for the explanations! Watching the main task now.)

labsdb1001 was reimported from dbstore1001, so that would explain the similarities.

Just to understand this remark, what's the relation between dbstore1001 and analytics-store? (I thought the latter was an alias for dbstore1002, not ...001.)

Also, are we reasonably confident that s1-analytics-slave/db1047 is unaffected by T138967 and should match production currently?

what's the relation between dbstore1001 and analytics-store

I mistakenly meant dbstore1002, the analytics store.

Taking production as 100% reliable, analytics-store is 98%, because it uses TokuDB and is not in read-only mode. The old labsdb hosts are 23% reliable; the new labsdb hosts are 99% reliable.

To clarify my previous statement: analytics hosts are considered part of production, and they are regularly checked with tools to ensure integrity. See for example T162807. Those tools do not work with labs because of the sanitization/filtering process and the special replication required.
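
As a rough illustration of how integrity checks of this kind generally work (this is not the tooling referenced in T162807, only a generic sketch of chunked checksum comparison), each host hashes its rows per primary-key range and only the ranges whose digests differ need a closer look. The table layout, chunk size, and function names below are assumptions made for the example, and SQLite stands in for the real servers.

```python
import hashlib
import sqlite3


def make_table(rows):
    """Build an in-memory stand-in for one host's table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT)"
    )
    conn.executemany("INSERT INTO page VALUES (?, ?)", rows)
    return conn


def chunk_checksums(conn, chunk_size):
    """Map the start of each primary-key range to a digest of its rows."""
    sums = {}
    query = "SELECT page_id, page_title FROM page ORDER BY page_id"
    for page_id, title in conn.execute(query):
        start = (page_id // chunk_size) * chunk_size
        digest = sums.setdefault(start, hashlib.sha256())
        digest.update(f"{page_id}\x00{title}\n".encode("utf-8"))
    return {start: d.hexdigest() for start, d in sums.items()}


def differing_chunks(master, replica, chunk_size=100):
    """Return the range starts whose digests differ between the two hosts."""
    a = chunk_checksums(master, chunk_size)
    b = chunk_checksums(replica, chunk_size)
    return sorted(s for s in a.keys() | b.keys() if a.get(s) != b.get(s))


# All rows here are made up for the example.
master = make_table([(1, "A"), (2, "B"), (150, "C")])
replica = make_table(
    [(1, "A"), (2, "B"), (150, "C"), (250, "Deleted_on_master")]
)

mismatches = differing_chunks(master, replica)
print(mismatches)  # [200]
```

A comparison like this assumes both hosts are meant to hold identical rows. On labs the sanitization/filtering step deliberately changes or removes data, so digests would differ even on a healthy replica, which is consistent with the limitation described above.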