
x1 cannot be set to read only on MW
Closed, Declined · Public

Description

x1 is the only MW section that cannot be set to read-only from the MW side.
Every time we need to do maintenance on the x1 master (e.g. master swaps) we need to go hard read-only and set it directly at the MySQL level.

There should be a way to set x1 to read-only like we do with any other section, using:

dbctl --scope eqiad section x1 ro "Maintenance"
dbctl config commit -m "Set x1 eqiad as read-only for maintenance"

And then remove it with:

dbctl --scope eqiad section x1 rw
dbctl config commit -m "Set section read-write"

Event Timeline

Krinkle triaged this task as Medium priority. Jan 10 2022, 8:04 PM

We discussed this in the team meeting and as I understand it, Tim and Aaron both believe this to be intentional.

The external dbs are shared by all wikis. If they are unwritable then that would effectively put all wikis in read-only mode. If that is needed, then that can be configured using wgReadOnly in db-production.php.
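For illustration, a minimal sketch of what that could look like, assuming wgReadOnly is set from db-production.php as described above (the message text is only an example):

  <?php
  // Sketch: friendly site-wide read-only mode for the duration of x1
  // maintenance. MW shows this reason on the edit form and refuses writes
  // gracefully instead of failing with an unexpected DB error.
  $wgReadOnly = 'Maintenance on the x1 database cluster; editing will be back shortly.';

Removing the assignment (the default is false) restores normal read-write operation.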

As I understand it, the issue with supporting readonly mode in LBFactoryConf for something other than core sections is that the information would not be used when you want it to be. The main benefit of our passive readonly mode is to allow user interfaces to recognise that the wiki is in read-only mode before we attempt to establish connections etc. This passive readonly mode is read from the config based only on which wiki we are on, not on the individual db host or external dbs. By the time we're past the passive readonly check, we have already started a write transaction on the core db and are then possibly writing to an external db; that's already quite late into the process. If by then we discover one to be in read-only, we might as well discover it from mysql telling us rather than from the passive flag, since we've already lost the benefits of the passive check.

Thanks @Krinkle - so to be fully sure, does MW handle MySQL being in read_only (without a MW flag) in a nice way, and does it do whatever it needs to do to show a proper exception to the user for x1?

Maybe. Depends on what we consider "nice". I would probably say, no, it's not nice.

When viewing the edit form on an article, we check if the wiki is in read-only mode. This is on a GET request when serving the edit form, not only during a write. When in read-only, we inform people of this with a notice so that they know the wiki is temporarily in maintenance. This site-wide check considers wgReadOnly, and readOnlyBySection for the current wiki, as well as a 5-second cached result of querying read_only from the current wiki's primary core db.
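As a rough sketch of the per-section part of that check, assuming the readOnlyBySection key under $wgLBFactoryConf that db-production.php maintains (the section name and message here are illustrative):

  <?php
  // Sketch: wikis whose core db lives in s8 get the friendly read-only
  // notice, while wikis on other core sections stay writable. x1 is not
  // a core section, which is why this mechanism does not cover it.
  $wgLBFactoryConf['readOnlyBySection'] = [
      's8' => 'Maintenance on the s8 primary database.',
  ];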

So, yes, if a core db were suddenly set to read_only without the MW flag, it would look just as nice as with the MW flag. The difference is that without the MW flag, we rely on establishing a connection to the primary db and waiting for the result of querying read_only. If the primary db is going to be restarted, or is temporarily unreachable or slow, then we would not want to rely on this, as doing so would be risky and cause timeouts for end-users, piling up web requests. It's better to let writes fail fast without trying first. But, if the db is expected to stay up and responsive during the read-only window, then it's probably fine not to set the MW flag for core dbs.

For a non-core db, we only know which db gets involved when we're already in the middle of saving an edit, and have already started a write transaction on the core db. We don't support a way to configure read-only there, and generally don't query read_only here either. We just try to write. Having said that, MW will understand the sql query failure and respond with a DB error page. However, this will render as an unexpected DB error, with an HTTP 5xx status code, similar to a timeout or sql syntax error. If this is planned for more than a few minutes, it would benefit the user experience to set the relevant wikis in read-only mode via config instead, so that they won't get surprised with system errors late in their process.

That's the whole goal of this task: being able to set x1 to read-only for maintenance (switchovers, schema changes, daemon restarts etc.) that needs to happen on the master.
Some schema changes are particularly hard to deploy with replication enabled; as x1 uses RBR, some schema changes are hard to deploy without this flag: T255174.

That's the whole goal of this task: being able to set x1 to read-only for maintenance.

I understand but x1 is used by all wikis. There are no requests where we only write to x1 without also needing to write to a core db during the same virtually shared transaction (and shared rollback if one fails). To set all wikis in friendly maintenance mode, we have wgReadOnly. To set wgReadOnly for all wikis at once, one can either do so via db-production.php with a MW config change, or via Etcd from the command-line with confctl by setting eqiad/ReadOnly (wikitech:MediaWiki_and_EtcdConfig).
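Conceptually, the etcd-driven path boils down to something like the following sketch. This is not the actual wmf-config plumbing; $etcdConfig stands in for an already-constructed EtcdConfig instance and the key layout is simplified for illustration:

  <?php
  // Sketch: the 'ReadOnly' key is the one flipped with confctl. When it
  // holds a non-empty string, every wiki enters friendly read-only mode
  // at once, with no code deploy and no Scap run.
  $etcdReadOnly = $etcdConfig->get( 'ReadOnly' );
  if ( is_string( $etcdReadOnly ) && $etcdReadOnly !== '' ) {
      $wgReadOnly = $etcdReadOnly;
  }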

If this is a proposal for Rdbms, I consider this resolved. We have wgReadOnly, which is for this kind of maintenance.

If this is a proposal for dbctl, I suggest declining it, but I'll let you decide. If we were to build a way to set x1 in friendly readonly, the only correct behaviour would be for this to effectively set all wikis in readonly mode. I suspect that if setting x1 to readonly via dbctl quietly did that in response, it could be confusing or surprising to an unsuspecting operator. Maybe dbctl could instead print a message to remind the operator of this constraint.

In this regard, x1 is similar to ES hosts, (soon) x2, and ParserCache hosts (though PC is secondary and fails gracefully). These were all modelled with the expectation that schema changes and other maintenance can happen over a master switch, which naturally avoids user-impacting readonly mode or other downtime.

I recognise that x1's replication setup is incompatible with performing the currently pending schema change over a master switch. Maybe that should inform a higher-level conversation about this replication setup and/or adjust schema change policy such that we require a more complex migration plan that makes the schema changes compatible with this replication setup. Or maybe we need to revisit whether we want to support cross-wiki functionality through a shared database. But as it works today, x1 is a shared database, and to set a friendly maintenance mode for it, all wikis need to be set in read-only. This isn't a limitation in the Rdbms code, but a logical constraint based on what x1 is. There is no scenario in which MW starts responding to a web request and is interested in x1's readonly mode specifically. This flag would either be unused, or we'd proactively consider it on all requests as an indirect way of setting wgReadOnly, in which case it seems preferable, for understanding the impact, to set that directly.

That's the whole goal of this task: being able to set x1 to read-only for maintenance.

I understand but x1 is used by all wikis. There are no requests where we only write to x1 without also needing to write to a core db during the same virtually shared transaction (and shared rollback if one fails). To set all wikis in friendly maintenance mode, we have wgReadOnly. To set wgReadOnly for all wikis at once, one can either do so via db-production.php with a MW config change, or via Etcd from the command-line with confctl by setting eqiad/ReadOnly (wikitech:MediaWiki_and_EtcdConfig).

If this is a proposal for Rdbms, I consider this resolved. We have wgReadOnly, which is for this kind of maintenance.

Yeah, having to set all production wikis to read-only is a no-go for me. I had hoped that x1 could be treated as "standalone" in that regard.

If this is a proposal for dbctl, I suggest declining it, but I'll let you decide. If we were to build a way to set x1 in friendly readonly, the only correct behaviour would be for this to effectively set all wikis in readonly mode. I suspect that if setting x1 to readonly via dbctl quietly did that in response, it could be confusing or surprising to an unsuspecting operator. Maybe dbctl could instead print a message to remind the operator of this constraint.

I don't really mind whether it needs to be done via dbctl or via MW with scap. But it looks like it is not going to happen anyway because of how MW is built.

In this regard, x1 is similar to ES hosts, (soon) x2, and ParserCache hosts (though PC is secondary and fails gracefully). These were all modelled with the expectation that schema changes and other maintenance can happen over a master switch, which naturally avoids user-impacting readonly mode or other downtime.

I recognise that x1's replication setup is incompatible with performing the currently pending schema change over a master switch. Maybe that should inform a higher-level conversation about this replication setup and/or adjust schema change policy such that we require a more complex migration plan that makes the schema changes compatible with this replication setup. Or maybe we need to revisit whether we want to support cross-wiki functionality through a shared database. But as it works today, x1 is a shared database, and to set a friendly maintenance mode for it, all wikis need to be set in read-only. This isn't a limitation in the Rdbms code, but a logical constraint based on what x1 is. There is no scenario in which MW starts responding to a web request and is interested in x1's readonly mode specifically. This flag would either be unused, or we'd proactively consider it on all requests as an indirect way of setting wgReadOnly, in which case it seems preferable, for understanding the impact, to set that directly.

We could also explore setting x1 to SBR instead of RBR, which would allow some schema changes to happen the way we normally do it (replicas and then the master switchover). There are historic reasons why RBR was set on x1, but I don't think we would have many problems moving it back to SBR (like the majority of our sections).

I guess this is a "won't fix" from MW point of view then.

I feel your frustration. I'm not tied to how any x1-hosted features work. I know that (apart from Echo) most are standalone and could have their own readonly mode.

A few more spurious notes:

  • Echo can't be read-only by itself as it reacts to edits. For planned maintenance it seems desirable not to lose Echo notifications. While Echo is essential to edits, its logic is decoupled and asynchronous. We could have it write only via the job queue. Then we can adopt a practice of pausing the job queue during x1 maintenance, and keep core dbs writable. This would work if the maintenance window is short enough to be an acceptable delay for notification delivery. Would that work?
  • Flow and CX (also on x1) are more traditional web apps and need synchronous writes. But, these are standalone and can have their own readonly mode. That would take the form of something like wgFlowReadOnly (which exists and is mostly done), to be used for similar RBR-incompatible schema changes to Flow tables; a config sketch follows after this list.
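For illustration, a minimal sketch of that per-extension flag idea. Only wgFlowReadOnly is mentioned above as existing; the other setting names are hypothetical, purely to show the shape of the approach:

  <?php
  // Sketch: put standalone x1-hosted features into their own read-only
  // mode for the maintenance window, while core dbs and the rest of the
  // site stay fully writable.
  $wgFlowReadOnly = true; // Flow rejects new writes with a friendly message

  // Hypothetical equivalents that other standalone x1 features could offer:
  // $wgContentTranslationReadOnly = true;
  // $wgReadingListsReadOnly = true;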

There are historic reasons why RBR was set on x1, but I don't think we would have many problems moving it back to SBR (like the majority of our sections).

From my perspective, the inflexibility with x1 seems to be a consequence of what we put there and how the dbhost was subsequently changed to RBR. It does not appear to be related to how the nine extensions using x1 work, or MW itself. Nothing in MW requires a shared database. There is no expectation for x1-hosted features to be on the same host, afaik.

I think people thought it was useful to colocate these to save on resources, and that this didn't compromise flexibility so long as the extensions have no coupling (we can move any of the x1 dbs elsewhere, anytime we want) and we can do maintenance without downtime using master switches. RBR kind of changes everything.

For example, we have some unrelated extension tables that are even under the same database name within x1 (wikitech:x1.wikishared). There is no reason for a super broad "wikishared" db to exist, except that nobody had a reason to avoid it I guess.

I'm curious what you'd recommend as database best practice. I understand that RBR is generally preferable over SBR when possible, for better consistency. However, with its limitations around schema changes, what does that mean for zero-downtime migrations? Does the industry not aim for zero downtime when combined with RBR? Or do we go in the direction of a more localised readonly mode? Is it feasible for schema changes to be prepared in a different way that would make them compatible with RBR? I wouldn't mind adopting a stricter or more complex schema change policy if that's what we need.

I also wouldn't mind more localised readonly mode, but that would be a major shift. Not a major shift for MW, but a major shift for where we host the individual dbs, and would likely want much less sharing to avoid x1 being "too big to fail". Perhaps standalone and readonly-capable features like Flow, CX, ReadingLists, could remain, and then features that integrate with core actions (like Echo) go elsewhere.

@Krinkle wrote:

We could have Echo write only via the job queue. Then we can adopt a practice of pausing the job queue during x1 maintenance, and keep core dbs writable. Would that work?

I suspect that is still insufficient given that read_only is set on the dbhost as a whole, right? Or can we skip setting read_only if we're confident the table being changed won't be written to, keeping the other x1 dbs writable and replicating? (I'm aware tables under the same dbname share a replication stream; I'm unsure whether dbs on the same host do as well.)

I'm a bit confused what you mean by "zero-downtime migrations". All primary switches involve downtime (in the form of a brief window of r/o mode). You also seem to be thinking that we only do primary switches for schema changes. The next one for x1 (T300472) is for upgrading the OS to Debian Bullseye. RBR vs SBR has no effect there. We also do them for kernel and firmware upgrades, for example.

So many questions here! :-)
We need to keep in mind that we'll always need switchovers, and the idea behind this task was to make them as painless as possible for the users, by handling the read-only period in a nicer way than we do now, since at the moment we simply have to set the database read-only the hard way.

Normally a switchover doesn't take longer than one minute, and sometimes even less than 45 seconds. That is not zero downtime, but I doubt we can do it much faster; maybe 30 seconds, but it is hard to get below that. 45 seconds is already good from my point of view.
When things go wrong, the window obviously gets larger, and we care more about the integrity of the data than about capturing and showing nice errors.

RBR is not always the best solution; sometimes it is, but it really depends on the workload of your environment. For data integrity it is, as it will let you know if the data between the master and the slave is inconsistent; it lets you know by breaking replication, which can of course create other problems on the MW side. But that is the way RBR has of telling you the data isn't the same, for better or worse :-)
The reason we don't run RBR everywhere is that we are not 100% sure our data is consistent across all the hosts. We have put a lot of effort into addressing those issues and most of our massive and important tables have been fixed, but with such a huge amount of data it is impossible to be 100% sure, especially after more than 20 years of storing data. Having RBR could be a nice way to detect that, but having replication broken will trigger all sorts of other cascading failures in MW.

For x1, we could temporarily switch from RBR to SBR (it is a live config change in MySQL, which in the past didn't work great, but I am sure it has improved by now; it is a matter of testing and regaining confidence in it). I am not too worried about this specific case (schema changes), but about the fact that we have infrastructure that is hard to put into "maintenance" mode without affecting the rest of the infra (all the other core sections).

You've explained in great detail why it cannot be done (easily), and I am thankful for that! There's probably not much left to discuss on this ticket (I am fine with discussing the future of x1 somewhere else though!).

I'm a bit confused what you mean by "zero-downtime migrations".

You're right, this wasn't the best phrase to use. Especially given that even the act of deploying code still regularly results in increased HTTP 5xx errors (something I consider feasible to avoid, and expect us to avoid, for the majority of deployments).

What I meant was a schema migration, however we apply it, where any write requests that could fail as a result of the operation are small enough in number that they fall within a hypothetical error budget.

As I understand it, most schema changes are written in a backward-compatible manner for the core databases, and applied to a gracefully depooled DB host without needing the primary DB or web service to enter a read-only mode.

There is then eventually a brief switch of the primary host to complete the migration, which you've improved to happen in under a minute. That's short enough that it might even be reasonable to consider doing this only at the DB level (e.g. pool a new read-only primary for future requests first and/or fail ongoing writes once). This would depend on how commonly the DB in question is actually written to. For example, while s7 may be written to by any wiki from any request (for centralauth and such), in practice most requests don't do this, so we might not need to set all the wikis in read-only mode when switching s7's primary DB.

You also seem to be thinking that we only do primary switches for schema changes.

I do not think that. We've gone through many such switches for other reasons. If my comment suggested otherwise, I was wrong :)

Yeah, having to set all production wikis to read-only is a no-go for me. […]

We could also explore setting x1 to SBR instead of RBR which would allow some schema changes to happen the way we normally do it (replicas and the master switchover).

[…] Normally a switchover doesn't take longer than one minute, […].

I discussed this with @tstarling again in today's perf meeting. I suspect your "no-go" assessment was based on the change taking more than a minute, is that right? From our perspective it seems fine to have a minute of read-only mode for all wikis with SBR mode. This can be done fairly rapidly from the command-line using MW's etcd config, without the delay and overhead of Scap. I believe this would not warrant site-wide announcements. We automatically induce similar short-lived maintenance windows through replication lag detection, for example.

To my knowledge we do not have strategies prepared or prior experience with schema changes while in SBR mode. That's something we can explore in the future if desired/needed.

No, it won't take a minute; it can take a lot longer. For T255174, the table is around 6GB, so I would expect that to take around 5-10 minutes (and then it will replicate to the slaves, so we'd also need to account for the lag it will create; I'm not sure how x1 handles lag, but on sX sections it makes the section go read-only if all hosts are lagging behind).

If we use SBR, we can execute the change on the replicas and then do a switchover.

The original point of this task wasn't to talk about schema changes specifically, but to be able to set read-only in a nice way on x1. As I expressed in T298876#7674469, I am fine declining this ticket.

I'll close this then. Thanks for explaining, I learned a thing or two.