
labtestweb2001 is sending updates to a read-only db host: db2037
Closed, Resolved | Public | PRODUCTION ERROR

Description

I'm seeing quite a few read-only errors in fatalmonitor. I didn't hold up the train because the failures don't appear to be related to the deployment of 1.32.0-wmf.15 (the error rate hasn't changed after deploying to group2).

These all appear to be coming from labtestweb2001:

channel:exception
[f812f324c280f1d54cec7be6] [no req]   JobQueueError from line 828 of /srv/mediawiki/php-1.32.0-wmf.15/includes/jobqueue/JobQueueDB.php: Wikimedia\Rdbms\DBQueryError:
A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? `
Query: UPDATE  `job` SET job_token = 'f06fded40378d34ed3a7621fd30d58d0',job_token_timestamp = '20180802191202',job_attempts = job_attempts+1 WHERE job_cmd = 'cirrusSearchCheckerJob' AND job_id = '3487' AND job_token = ''`

Function: JobQueueDB::claimRandom

Error: 1290 The MariaDB server is running with the --read-only option so it cannot execute this statement` (10.192.32.8)

See https://logstash.wikimedia.org/goto/d53c9f8625b96c9107ba83dcc72c97ac
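To check quickly whether a batch of these exceptions all point at the same database server, the trailing `Error:` line can be parsed for the error number and server address. The following is a minimal sketch that assumes the message layout shown in the excerpt above; `parse_db_error` and `is_read_only_error` are hypothetical helper names, not part of MediaWiki. Error 1290 is the MariaDB/MySQL code raised when an option such as `--read-only` prevents a statement from running.

```python
import re

# Matches the trailing "Error: <code> <message> (<server>)" line of a
# DBQueryError message as it appears in the excerpt above. The optional
# backtick tolerates the stray quoting seen in the logged text.
ERROR_LINE = re.compile(
    r"Error:\s*(?P<code>\d+)\s+(?P<message>.*?)`?\s*\((?P<server>[^()]+)\)\s*$",
    re.MULTILINE,
)

# MariaDB/MySQL error number for "running with the --read-only option".
READ_ONLY_ERRNO = 1290


def parse_db_error(text):
    """Return the error code, message and server from a logged error, or None."""
    match = ERROR_LINE.search(text)
    if match is None:
        return None
    return {
        "code": int(match.group("code")),
        "message": match.group("message").strip(),
        "server": match.group("server"),
    }


def is_read_only_error(text):
    """True if the logged error is a write rejected by a read-only server."""
    parsed = parse_db_error(text)
    return parsed is not None and parsed["code"] == READ_ONLY_ERRNO
```

Run against the excerpt above, this reports code 1290 and server 10.192.32.8, the address in parentheses at the end of the log line.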

Event Timeline

db2037 is read-only because it is the m5 codfw master, and since codfw is the passive DC, nothing should write to it.
Whatever is trying to write to it instead of to m5-master (db1073) probably needs to be reviewed.
Maybe @Andrew can help with this?
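The rule described in this comment (writes go only to the master in the active DC) can be sketched as a small check. This is an illustration only: the `SECTION_MASTERS` layout and both function names are hypothetical and do not reflect the real production configuration. Only the hostnames are taken from this task, and eqiad is assumed to be the active DC because codfw is described as passive.

```python
# Hypothetical layout: per-section masters keyed by datacenter, using the
# hostnames mentioned in this task. Not the real production configuration.
SECTION_MASTERS = {
    "m5": {"eqiad": "db1073", "codfw": "db2037"},
}

# Assumption: eqiad is the active DC, since codfw is described as passive.
ACTIVE_DC = "eqiad"


def write_host(section, active_dc=ACTIVE_DC):
    """Return the only host that should accept writes for a section."""
    return SECTION_MASTERS[section][active_dc]


def check_write_target(section, target, active_dc=ACTIVE_DC):
    """Raise if a client is configured to write to a non-active master."""
    expected = write_host(section, active_dc)
    if target != expected:
        raise ValueError(
            f"{target} is not the active {section} master; "
            f"writes should go to {expected}"
        )
    return target
```

With this layout, `check_write_target("m5", "db2037")` raises, which mirrors the situation reported here: a client writing to the passive DC's master.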

jcrespo subscribed.

I am going to guess that this is labtestwiki, which was added to db2037 but cannot live there, as there can only be one active (read-write) master at the moment, and that is m5-master.eqiad.wmnet. This is something that cloud people were already told, and we offered to set up a separate instance somewhere else (it can even be on the same host, but not the same instance).

mmodell renamed this task from "db2037 is read only?" to "labtestweb2001 is sending updates to a read-only db host: db2037". Aug 3 2018, 6:44 PM
Krinkle updated the task description.

The context here is that a while ago I moved the local labtestwikitech database off of labtestweb2002 because Jaime asked me to 'productionize' it and I misunderstood his request. I could certainly just move it back there (although that would leave me with the puzzle of what 'productionize' means) or we could move labtestwikitech to db1073 (which would require some kind of tunnel encryption) or... I'm open to suggestions.

It is not currently a problem for labtestwikitech to be read-only, other than that the error messages seem to alarm people. At some point, when I pick the task of moving wikitech to an SUL wiki back up, I'll probably want it read/write again.

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:08 PM
Andrew claimed this task.

Using a new database that is local to codfw1dev now.