
adywiki and jamwiki are missing the associated *_p databases with appropriate views
Closed, Resolved · Public

Description

The maintain-replicas.pl script is used to create a number of views on labsdb. This script has not been run for some new labs replicas (including adywiki) and thus they lack those views.

$ sql adywiki_p
ERROR 1049 (42000): Unknown database 'adywiki_p'
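
The symptom above (a replica database present without its `_p` counterpart) can be checked for many wikis at once. The following is a minimal sketch, not part of the original report: the function name `missing_public_dbs` is hypothetical, and both lists are assumed to have been gathered beforehand (for example from `SHOW DATABASES` output).

```python
def missing_public_dbs(wiki_dbs, existing_dbs):
    """Return wikis whose replica exists but whose <name>_p database does not.

    wiki_dbs:     names of the wikis to check (assumed input)
    existing_dbs: database names present on the replica host (assumed input)
    """
    existing = set(existing_dbs)
    return [
        db for db in wiki_dbs
        if db in existing and f"{db}_p" not in existing
    ]


if __name__ == "__main__":
    wikis = ["enwiki", "adywiki", "jamwiki"]
    present = ["enwiki", "enwiki_p", "adywiki", "jamwiki"]
    print(missing_public_dbs(wikis, present))
```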

Event Timeline

MaxSem created this task. May 11 2016, 6:14 PM
Restricted Application added subscribers: Zppix, Aklapper. May 11 2016, 6:14 PM
Krenair added a subscriber: Krenair.

Am guessing this needs ops to run maintain-replicas.pl

Dzahn added a subscriber: Dzahn. May 11 2016, 11:07 PM

it's a "labsdb" thing afaict.

one hit in all of wikitech: "springle: ran operations/software maintain-replicas.pl and fedtables.pl on labsdbs"

Dzahn changed the task status from Open to Stalled. May 11 2016, 11:13 PM
debt added a subscriber: debt. May 13 2016, 3:05 PM

Hi!

Checking on the progress of this ticket...

We're waiting for this to be looked at / fixed before we can update the wikipedia.org portal stats.

Thanks!

Dzahn removed a subscriber: Dzahn. May 13 2016, 3:16 PM

Checking on the progress of this ticket...

task status is stalled, we're waiting for ops

Dzahn added a subscriber: Volans. May 13 2016, 9:38 PM

@Volans could you take a look maybe?

jcrespo moved this task from Triage to Blocked external/Not db team on the DBA board. Edited May 16 2016, 2:38 PM
jcrespo added a subscriber: jcrespo.

This is not a DBA ticket. I made sure that the replica was working and safe (filtered) literally hours after the wiki was set up:

$ mysql -h labsdb1001 adywiki -e "SELECT max(rc_timestamp) FROM recentchanges"
+-------------------+
| max(rc_timestamp) |
+-------------------+
| 20160516125342    |
+-------------------+

I suppose some cron/script maintenance on Cloud-Services may be failing to execute?

What worries me is that what I suppose is a core production service depends on a non-core service, and that it is not running on the analytics or vslow core production host(!). Of course this particular issue should be fixed, but could you clarify the usage needs for the portal?

Krenair added a comment. Edited May 16 2016, 2:41 PM

This is not a DBA ticket. I made sure that the replica was working and safe (filtered) literally hours after the wiki was set up:

$ mysql -h labsdb1001 adywiki -e "SELECT max(rc_timestamp) FROM recentchanges"
+-------------------+
| max(rc_timestamp) |
+-------------------+
| 20160516125342    |
+-------------------+

That's useless without the adywiki_p views.

I suppose some cron/script maintenance on Cloud-Services may be failing to execute?

I'm not aware of anything failing to execute, I think ops haven't attempted to execute the script.

What worries me is that what I suppose is a core production service depends on a non-core service, and that it is not running on the analytics or vslow core production host(!). Of course this particular issue should be fixed, but could you clarify the usage needs for the portal?

The portal thing? I think they generate the stats in labs and upload static pages to production.

Can someone update this task description? As-is I have no idea what's going on here. I know this is in large part because of my own lack of contextual knowledge on the labsdb use case in question but this task is cryptic at best.

Am guessing this needs ops to run maintain-replicas.pl

What is this intended to do?

debt added a comment. May 16 2016, 5:14 PM

Hi there - we need to update the stats on the wikipedia.org portal page on a regular basis. We've recently found that the script we run to do this isn't actually catching the updated stats for us to put into production.

chasemp updated the task description. May 16 2016, 6:15 PM
MaxSem updated the task description. May 16 2016, 6:17 PM

I've edited the description back. The issue in this ticket is unavailability of adywiki replicas. The updates are discussed in a blocked ticket. Portals is far from the only thing that's affected or can possibly be affected by this issue.

This is not a DBA ticket. I made sure that the replica was working and safe (filtered) literally hours after the wiki was set up:

$ mysql -h labsdb1001 adywiki -e "SELECT max(rc_timestamp) FROM recentchanges"
+-------------------+
| max(rc_timestamp) |
+-------------------+
| 20160516125342    |
+-------------------+

I suppose some cron/script maintenance on Cloud-Services may be failing to execute?
What worries me is that what I suppose is a core production service depends on a non-core service, and that it is not running on the analytics or vslow core production host(!). Of course this particular issue should be fixed, but could you clarify the usage needs for the portal?

@jcrespo this is a bit shrouded in mystery with no documentation. It seems post replication someone would run maintain-replicas.pl manually but no one around now may have ever done it.

@jcrespo this is a bit shrouded in mystery with no documentation. It seems post replication someone would run maintain-replicas.pl manually but no one around now may have ever done it.

@coren, any chance you can help a bit with that? I think just a few guiding pointers would be greatly appreciated.

chasemp added a subtask: Restricted Task. May 18 2016, 2:37 PM
jcrespo closed subtask Restricted Task as Resolved. May 18 2016, 3:04 PM
Gehel added a subscriber: Gehel. Jun 7 2016, 6:07 PM
Gehel added a comment. Jun 10 2016, 9:08 AM
This comment was removed by Gehel.
Gehel renamed this task from No replica for adywiki to adywiki is missing the associated adywiki_p database with appropriate views. Jun 21 2016, 6:33 PM
Gehel updated the task description.
MaxSem renamed this task from adywiki is missing the associated adywiki_p database with appropriate views to adywiki and jamwiki are missing the associated *_p databases with appropriate views. Jun 21 2016, 6:56 PM
Gehel added a subscriber: ori. Jun 21 2016, 9:58 PM

Summary of a discussion with @ori:

The maintain-replicas script creates a new schema that contains only the tables that need to be replicated, and only the columns from those tables that need to be replicated, ensuring in the process that sensitive data is not made available to labs users.
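
The filtering idea described above can be sketched as follows. This is an illustration only, not the logic of maintain-replicas.pl (which is Perl and is not shown in this task): the function `build_view_statements` and the `ALLOWED_COLUMNS` whitelist are hypothetical, and the table/column selection is a small made-up sample rather than the real list.

```python
# Hypothetical whitelist: table -> columns that may be exposed.
# The real script's table/column list is not reproduced in this task.
ALLOWED_COLUMNS = {
    "recentchanges": ["rc_id", "rc_timestamp", "rc_title"],
    "user": ["user_id", "user_name", "user_editcount"],
}


def build_view_statements(dbname, allowed=ALLOWED_COLUMNS):
    """Return SQL that exposes only whitelisted columns via <dbname>_p views."""
    public_db = f"{dbname}_p"
    statements = [f"CREATE DATABASE IF NOT EXISTS `{public_db}`;"]
    for table in sorted(allowed):
        columns = ", ".join(f"`{col}`" for col in allowed[table])
        statements.append(
            f"CREATE OR REPLACE VIEW `{public_db}`.`{table}` AS "
            f"SELECT {columns} FROM `{dbname}`.`{table}`;"
        )
    return statements


if __name__ == "__main__":
    for statement in build_view_statements("adywiki"):
        print(statement)
```

Because each view selects an explicit column list, a column that is not in the whitelist never reaches the `_p` schema, even if it exists in the underlying table.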

The current maintain-replicas script needs some refactoring / rewriting. The lack of separation between configuration and logic makes it difficult to update the list of tables / columns replicated. The list of databases is read directly from the script; it should be externalized so that we can run that script on a specific database.

At a minimum, we need to be able to run that script on a single database to limit risk.

I agree, although it needs to be a separate ticket and I don't think we can justify blocking this on a full rewrite, given that it's been run before. The script not only needs to deal with those views but it also needs to update the meta_p table. I guess we could replace the section where it reads all.dblist with some hack setting up just adywiki and jamwiki, and remove the sql("DELETE FROM meta_p.wiki;");.
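
The "run it for just these wikis" idea discussed above could look roughly like this. It is a hedged sketch with hypothetical names (`select_targets`, `should_reset_meta`), not the patch that was actually uploaded; it assumes the full database list has already been loaded from somewhere such as all.dblist.

```python
def select_targets(all_dbs, only=None):
    """Return the databases to process, preserving the order of all_dbs.

    With `only` unset, every database is returned. With `only` set, just
    those databases are returned, and unknown names raise an error instead
    of being silently skipped.
    """
    if only is None:
        return list(all_dbs)
    wanted = set(only)
    unknown = wanted - set(all_dbs)
    if unknown:
        raise ValueError(f"not in the database list: {sorted(unknown)}")
    return [db for db in all_dbs if db in wanted]


def should_reset_meta(only=None):
    """Wipe meta_p.wiki only on a full run, never on a partial one."""
    return only is None


if __name__ == "__main__":
    dbs = ["enwiki", "adywiki", "jamwiki"]
    print(select_targets(dbs, only=["jamwiki", "adywiki"]))
    print(should_reset_meta(only=["jamwiki", "adywiki"]))
```

Skipping the blanket delete on partial runs mirrors the suggestion to drop `DELETE FROM meta_p.wiki` when only adywiki and jamwiki are being set up, so rows for every other wiki stay in place.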

Change 295564 had a related patch set uploaded (by Ori.livneh):
[NOT FOR MERGING] hack maintain-replicas.pl for adywiki/jamwiki

https://gerrit.wikimedia.org/r/295564

ori added a comment. Jun 22 2016, 6:51 PM

Sample invocation:

$ DB_USER=root DB_PASS=password perl maintain-replicas.pl 2> T135029.sql

Mentioned in SAL [2016-06-22T22:25:04Z] <ori> Ran hacked maintain-replicas.pl on labsdb100[13] for T135029

ori added a comment. Jun 22 2016, 10:28 PM

I ran it against labsdb1001 and labsdb1003 and it seems to have done the trick. labsdb1002's MySQL instance isn't running, so that one was not updated.

Confirmed working. Let's keep this bug open until labsdb1002 also gets fixed.

Thank you @ori! I believe the issues with labsdb1002 are T126946: disk failure on labsdb1002

jcrespo added a comment. Edited Jun 23 2016, 7:18 AM

labsdb1002 will never get fixed. It is being unracked and replaced as we speak.

Jdforrester-WMF closed this task as Resolved. Jun 23 2016, 8:09 AM
Jdforrester-WMF claimed this task.
Jdforrester-WMF added a subscriber: Jdforrester-WMF.

In that case, I'm declaring this fixed.

Krenair reassigned this task from Jdforrester-WMF to ori. Jun 23 2016, 1:58 PM

Thanks to everyone who helped get this unstuck and fixed!

Change 295564 abandoned by Ori.livneh:
[DNM] hack maintain-replicas.pl for adywiki/jamwiki

https://gerrit.wikimedia.org/r/295564

Does this patch being abandoned mean that this issue is no longer fixed?

No. The proper version of the script was replaced in T138450; Ori's DNM version of the old script was useful for this but is no longer necessary.