
Data missing from June 11/12 on s3.labsdb
Closed, Resolved · Public

Description

A fellow Wikipedian noticed while using this tool on Tool Labs that there are some articles of his missing in the list.

After some inspection, I found that about 19 hours' worth of revisions are missing from the revision table on the Labs replica (s3.labsdb) around June 12, 2015 (from about 8:20 pm on June 11 to about 3:20 pm on June 12).

MariaDB [srwiki_p]> select max(rev_timestamp) as max_rev from revision where rev_timestamp like '20150611%';
+----------------+
| max_rev        |
+----------------+
| 20150611202101 |
+----------------+
MariaDB [srwiki_p]> select min(rev_timestamp) as min_rev from revision where rev_timestamp like '20150612%';
+----------------+
| min_rev        |
+----------------+
| 20150612152251 |
+----------------+
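
The size of the gap follows directly from the two timestamps returned above. A minimal sketch of that arithmetic, assuming the usual 14-digit `YYYYMMDDHHMMSS` layout of `rev_timestamp`; the helper name `gap_between` is made up for illustration:

```python
from datetime import datetime, timedelta

# Assumed layout of rev_timestamp: 14 digits, YYYYMMDDHHMMSS.
MW_FORMAT = "%Y%m%d%H%M%S"


def gap_between(last_before: str, first_after: str) -> timedelta:
    """Return the time elapsed between two rev_timestamp strings."""
    start = datetime.strptime(last_before, MW_FORMAT)
    end = datetime.strptime(first_after, MW_FORMAT)
    return end - start


if __name__ == "__main__":
    # Values copied from the two query results above.
    print(gap_between("20150611202101", "20150612152251"))  # prints 19:01:50
```

That is just over 19 hours, matching the estimate in the description.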

I tested this with srwiki, but apparently, it's the case with all the databases on s3.labsdb. Other servers don't seem to be affected.

Event Timeline

dungodung raised the priority of this task from to Needs Triage.
dungodung updated the task description.
dungodung added a project: Cloud-Services.
dungodung added a subscriber: dungodung.
Restricted Application added a subscriber: Aklapper. Oct 14 2015, 8:47 PM
Krenair added a subscriber: Krenair.
chasemp triaged this task as Low priority. Nov 30 2015, 4:42 PM
chasemp added a subscriber: chasemp.
jcrespo moved this task from Triage to Backlog on the DBA board. Mar 17 2016, 5:12 PM
jcrespo moved this task from Backlog to In progress on the DBA board. Mar 19 2016, 2:14 PM
jcrespo moved this task from In progress to Next on the DBA board. Apr 1 2016, 4:17 PM

T133469 concerns a very similar issue, but on an s7 wiki. I know that the DNS setup in Labs encourages use of the same underlying server for both of those:

krenair@tools-bastion-03:~$ host s3.labsdb
s3.labsdb has address 10.64.37.5
krenair@tools-bastion-03:~$ host s7.labsdb
s7.labsdb has address 10.64.37.5
krenair@tools-bastion-03:~$ host 10.64.37.5
5.37.64.10.in-addr.arpa domain name pointer labsdb1003.eqiad.wmnet.
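
The same check can be scripted for several shard aliases at once. This is only a sketch: `group_by_address` is a hypothetical helper, and because the `*.labsdb` names resolve only from inside Labs, the demo uses a stand-in resolver filled with the addresses from the transcript above instead of doing real DNS lookups.

```python
import socket
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_address(
    hostnames: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, List[str]]:
    """Group hostnames by the IPv4 address they resolve to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in hostnames:
        groups[resolve(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Stand-in resolver using the answers shown in the `host` output above.
    # Inside Labs, omit the second argument to use real DNS lookups.
    transcript = {"s3.labsdb": "10.64.37.5", "s7.labsdb": "10.64.37.5"}
    for address, names in group_by_address(transcript, transcript.__getitem__).items():
        print(address, names)
```

Aliases that end up in the same group share a server, though as the reply below notes, sharing a server does not by itself mean they share a replication problem.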

So perhaps the issue is wider in scope than just s3, but also covering s7, and potentially also s5/s6 (which are also typically served by labsdb1003)?

> So perhaps the issue is wider in scope than just s3, but also covering s7, and potentially also s5/s6 (which are also typically served by labsdb1003)?

No, this only affected the s3 shard; it was a replication problem for that shard only. Now, I cannot guarantee the accuracy of other shards, that is why I am reimporting all, starting from enwiki/s1. As I said, reimporting all tables one by one can take weeks (months?) because of the Labs filtering, but it is happening now.

Anomie added a subscriber: Anomie. Apr 28 2016, 2:21 PM

> Now, I cannot guarantee the accuracy of other shards, that is why I am reimporting all, starting from enwiki/s1.

I can confirm there was some sort of similar issue on the enwiki replica.

tools.anomiebot@tools-bastion-03:~$ sql enwiki 'select * from page where page_id>=50271944 and page_id<=50275119;'
page_id	page_namespace	page_title	page_restrictions	page_counter	page_is_redirect	page_is_new	page_random	page_touched	page_links_updated	page_latest	page_len	page_content_model
50271944	10	2011-12_Football_League_Two_table		0	1	1	0.848858636718	20160422091535	20160422091539	716545485	141	wikitext
50275119	10	1968-69_in_English_football		0	1	1	0.567951926164	20160422124451	NULL	716566974	129	wikitext

That should have output 3069 lines, not just two. Also,

MariaDB [enwiki_p]> select rev_id, rev_page, rev_timestamp from revision where rev_id>=716545495 and rev_id<=716566961;
+-----------+----------+----------------+
| rev_id    | rev_page | rev_timestamp  |
+-----------+----------+----------------+
| 716545495 | 12264894 | 20160422091542 |
| 716561763 | 50201281 | 20160422115519 |
| 716563248 | 50201281 | 20160422121036 |
| 716566961 |  8368016 | 20160422124443 |
+-----------+----------+----------------+
4 rows in set (0.00 sec)

There should be 20997 rows there, not just four.
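
Holes like this can be spotted mechanically by looking for jumps in an id column. The sketch below uses a hypothetical helper, `id_gaps`, on the four `rev_id` values returned above. It counts unused ids, which is only an upper bound on missing rows, since not every id in a range necessarily corresponds to a row in the table (the expected row count quoted above is 20997).

```python
from typing import Iterable, List, Tuple


def id_gaps(ids: Iterable[int], min_missing: int = 1) -> List[Tuple[int, int, int]]:
    """Return (previous_id, next_id, ids_absent_between) for each jump.

    Only jumps with at least `min_missing` absent ids are reported.
    """
    ordered = sorted(ids)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        missing = cur - prev - 1
        if missing >= min_missing:
            gaps.append((prev, cur, missing))
    return gaps


if __name__ == "__main__":
    # rev_id values copied from the query result above.
    present = [716545495, 716561763, 716563248, 716566961]
    for prev, cur, missing in id_gaps(present):
        print(f"{missing} ids absent between {prev} and {cur}")
    print("total absent ids:", sum(m for _, _, m in id_gaps(present)))  # 21463
```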

jcrespo moved this task from Next to In progress on the DBA board. May 4 2016, 9:50 AM
jcrespo closed this task as Resolved. Dec 21 2016, 5:30 PM
jcrespo claimed this task.

These are the results on the new replica service servers:

root@localhost[srwiki_p]> select max(rev_timestamp) as max_rev, min(rev_timestamp) as min_rev from revision where rev_timestamp like '20150611%';
+----------------+----------------+
| max_rev        | min_rev        |
+----------------+----------------+
| 20150611235958 | 20150611000406 |
+----------------+----------------+
1 row in set (0.00 sec)

For enwiki:

master:

root@localhost[enwiki]> select count(*) from page where page_id>=50271944 and page_id<=50275119;
+----------+
| count(*) |
+----------+
|     2999 |
+----------+
1 row in set (0.01 sec)

new replica servers:

root@localhost[enwiki_p]> select count(*) from page where page_id>=50271944 and page_id<=50275119;
+----------+
| count(*) |
+----------+
|     2999 |
+----------+
1 row in set (0.00 sec)
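
The master/replica comparison above can be automated by running the same count on both connections and diffing the results. The following is a sketch only: it uses in-memory SQLite databases with toy data as stand-ins for the real servers, and the helper names (`count_in_range`, `make_demo_db`, `missing_on_replica`) are invented for illustration. With a MariaDB driver the placeholder style would typically be `%s` rather than `?`.

```python
import sqlite3
from typing import Iterable


def count_in_range(conn: sqlite3.Connection, lo: int, hi: int) -> int:
    """Count page rows whose page_id lies in [lo, hi]."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM page WHERE page_id >= ? AND page_id <= ?",
        (lo, hi),
    )
    return cur.fetchone()[0]


def make_demo_db(page_ids: Iterable[int]) -> sqlite3.Connection:
    """Build a toy in-memory database holding only a page_id column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO page (page_id) VALUES (?)", [(p,) for p in page_ids])
    return conn


def missing_on_replica(master, replica, lo: int, hi: int) -> int:
    """Return how many more rows the master has than the replica in [lo, hi]."""
    return count_in_range(master, lo, hi) - count_in_range(replica, lo, hi)


if __name__ == "__main__":
    # Toy data, not taken from the real servers.
    master = make_demo_db(range(1, 11))
    replica = make_demo_db([1, 2, 9, 10])
    print(missing_on_replica(master, replica, 1, 10))  # prints 6
```

A result of zero for a range, as in the two counts above, means the replica holds as many rows as the master there.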

I consider this solved. Documentation and an announcement on how the new system will work will arrive soon (please be patient). Please ask if you really need the fixed data sooner.