No data after 20170517193000 available via Quarry from tables (recentchanges, revision, logging) for several Mediawiki databases (svwiki_p, fiwiki_p, nowiki_p, ...)
Closed, Resolved (Public)

Authored By

	Larske
	May 18 2017, 11:01 PM

Description

When using Quarry to retrieve data from MediaWiki database tables, no data after 20170517193000 is available in the recentchanges, revision or logging tables for several databases xxwiki_p (where xx = sv, fi, no, nl, pl, tr, it, pt, ...).
For other databases yywiki_p (where yy = en, de, ft, es, da, ru, et, la, lt, lv, nn, rp, ceb, ja, ...), data is available as usual.

SQL:
USE svwiki_p;
SELECT NOW(), rc_timestamp
FROM recentchanges
ORDER BY rc_timestamp DESC
LIMIT 5;

returns only this:

NOW(),rc_timestamp
2017-05-18T22:54:38,20170517192932
2017-05-18T22:54:38,20170517192932
2017-05-18T22:54:38,20170517192932
2017-05-18T22:54:38,20170517192921
2017-05-18T22:54:38,20170517192921

i.e. data for more than 27 hours is missing.
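
The gap can be checked directly from the first row of the output above. A minimal sketch, assuming the NOW() column is formatted as shown and rc_timestamp uses the 14-digit YYYYMMDDHHMMSS form; the function names are illustrative and not part of Quarry:

```python
from datetime import datetime


def parse_mw_timestamp(ts: str) -> datetime:
    """Parse a 14-digit timestamp such as '20170517192932'."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def lag_hours(now_iso: str, newest_ts: str) -> float:
    """Hours between the server's NOW() and the newest row's timestamp."""
    now = datetime.strptime(now_iso, "%Y-%m-%dT%H:%M:%S")
    return (now - parse_mw_timestamp(newest_ts)).total_seconds() / 3600


# Values copied from the first row of the query output above.
print(round(lag_hours("2017-05-18T22:54:38", "20170517192932"), 2))
```

For the sample row this comes to a little over 27.4 hours, consistent with the "more than 27 hours" figure.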

Edit:
It seems the databases are now gradually being updated. (I only checked the recentchanges table in svwiki_p.)
15 minutes ago the most recent data was 16.1 hours old; right now it is 14.6 hours old.
At that pace the catch-up will be complete, and things back to normal, in 2-3 hours.
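
The estimate follows from simple arithmetic: the lag shrank by 1.5 hours in a quarter of an hour. A rough sketch, assuming the replica keeps catching up at a constant rate (which may not hold in practice); the function name is illustrative:

```python
def estimate_catch_up_hours(lag_before_h: float, lag_after_h: float,
                            interval_h: float) -> float:
    """Estimate hours until the lag reaches zero at a constant recovery rate."""
    # Hours of lag recovered per hour of wall-clock time.
    recovered_per_hour = (lag_before_h - lag_after_h) / interval_h
    if recovered_per_hour <= 0:
        raise ValueError("lag is not shrinking; no estimate possible")
    return lag_after_h / recovered_per_hour


# 16.1 h of lag fifteen minutes ago, 14.6 h now.
print(estimate_catch_up_hours(16.1, 14.6, 0.25))
```

With these two measurements the result falls between 2 and 3 hours.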

It would be interesting to know why this affected some language versions but not others.

Related Objects

Mentioned Here: T140788: Labs databases rearchitecture (tracking)

Event Timeline

Larske created this task. May 18 2017, 11:01 PM

Restricted Application added a subscriber: Aklapper. May 18 2017, 11:01 PM

Larske renamed this task from No data after 20170517193000 available via Quarry from tables (recentchanges, revisions, logging) for several Mediawiki databases (svwiki_p, fiwiki_p, nowiki_p, ...) to No data after 20170517193000 available via Quarry from tables (recentchanges, revision, logging) for several Mediawiki databases (svwiki_p, fiwiki_p, nowiki_p, ...). May 18 2017, 11:03 PM

Larske updated the task description.

Larske updated the task description. May 19 2017, 8:13 AM

You can check the replication lag at https://tools.wmflabs.org/replag/ (or better, directly by querying the heartbeat_p.heartbeat table). With the current infrastructure, lag cannot be avoided whenever something in production changes the structure of the tables (a schema change). That is going to change with the new architecture planned in T140788.

In particular, there was an ongoing production schema change on s2, which includes the following projects: https://noc.wikimedia.org/db.php#tabs-2 According to my monitoring, the schema change has finished and replication should now catch up.

Note that only one server (c1) was affected. c3 was unaffected and could have been used temporarily. The aim of the new architecture is not to avoid lag (that is not possible), but to switch the active server to a non-lagged one transparently.

Thanks for the prompt response and the explanation of what was going on. The s2 cluster now seems to be fully updated.

I highly recommend that your code integrate some kind of check against the heartbeat_p.heartbeat table to produce warnings or user notices when appropriate. More on that: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Identifying_lag
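
One way such a check might look, as a sketch only. It assumes heartbeat_p.heartbeat exposes `shard` and `lag` (seconds) columns, that the caller supplies a DB-API cursor using the `%s` parameter style, and that a 5-minute threshold is acceptable; all three are assumptions to verify against the help page linked above, and the function names are hypothetical:

```python
from typing import Optional

# Assumed column names (shard, lag); check them against the linked help page.
HEARTBEAT_QUERY = "SELECT lag FROM heartbeat_p.heartbeat WHERE shard = %s"


def fetch_lag_seconds(cursor, shard: str) -> Optional[float]:
    """Read the reported lag for one shard (e.g. 's2') via a DB-API cursor."""
    cursor.execute(HEARTBEAT_QUERY, (shard,))
    row = cursor.fetchone()
    return None if row is None else float(row[0])


def lag_notice(lag_seconds: Optional[float],
               threshold_seconds: float = 300.0) -> Optional[str]:
    """Return a user-facing warning, or None when the replica looks current."""
    if lag_seconds is None:
        return "Replication lag is unknown; results may be stale."
    if lag_seconds > threshold_seconds:
        hours = lag_seconds / 3600
        return f"Replica is about {hours:.1f} hours behind; recent edits may be missing."
    return None
```

A tool would call `lag_notice(fetch_lag_seconds(cursor, "s2"))` before running its real query and show the message, if any, next to the results.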

Lag will also eventually show up in our graphs, in places like https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=labsdb1001 (but not at the moment).
