
Merging s6 wikis into s5
Closed, DeclinedPublic

Description

@Ladsgroup and I have been talking about the possibility of merging s6 into s5.
s6 currently has 4 wikis (labswiki, frwiki, jawiki and ruwiki); it is not a very big section and its usage isn't massive.
In terms of disk space, it only holds around 700GB.

s5 is also small in terms of space (around 800GB) and not very busy (it just has lots of tiny wikis, as it has been the default section for new wikis for a few years).
By merging s6 into s5 we'd simplify things in terms of sections (fewer masters), and we could relocate quite a lot of hosts to other sections/usages, or even not refresh them and save some budget.

Let's explore the possibility of doing this; it is a fair amount of work, but doable if we believe it is worth the effort.

Event Timeline

Marostegui triaged this task as Medium priority. Oct 30 2025, 4:40 PM
Marostegui moved this task from Triage to Refine on the DBA board.

I believe we can instead either move some wikis to s6 (to relieve the number of open files in s3, which is a problem - a previous proposal to move more wikis to s5 was declined), or reserve it for a future wiki with potentially high load. E.g.

  • Move 50% of wikis from s3 to s6 (this may be disruptive if things do not work well)
  • Move all closed wikis in s3 to s6 - much less disruptive
  • Move all private (and potentially fishbowl) wikis to s6
  • Reserve s6 for the upcoming Abstract Wikipedia wiki (and potentially move wikifunctions to s6)

Thanks for the comment @Bugreporter - this is just early stages of the conversation.

While splitting wikis is relatively easy, merging wikis (especially active ones) is a complex and delicate operation - doable, but very risky, so we really need to think about it. The good thing about moving the s6 wikis is that there are only 4 to move; although they are large, we could do it wiki by wiki (it would be impossible to do all 4 at the same time without causing issues in production).
Even doing them 1 by 1 is likely to cause some degradation, and that's why we need to evaluate whether it is worth it or not.

s3 is a problem, but for a few years we've not been using s3 as the default, so while the amount of files is a problem, it is not a growing one; it is under control at the moment. Moving hundreds of wikis is something we are very unlikely to do, as it is again (for a split) a very complex setup, with lots of replication filters involved and lots of things that could go wrong.
By moving closed wikis we would not gain much in that sense; if anything, we'd want to move active ones. Private wikis could be a good candidate to move, but again, I am not sure we'd help s3 much by just moving those.
Last time we moved just two of the very smallest ones, with pretty much no activity (T259004), and that was hard and had to be coordinated with the DC switchover - so imagine if we had to move dozens or hundreds of them (we discarded moving more at the time precisely because of the complexity, T226950). As I mentioned earlier, even moving these 4 would be difficult.

Again, I am not sure we'll do any of this - these are very early conversations - but if we don't, and we need extra HW for Abstract Wikipedia and/or wikifunctions, we will certainly have it.

it is not a problem that is growing, it is under control at the moment.

It may be an issue in the long term, as new tables are added to MediaWiki and new extensions requiring tables are installed.


Yes, of course - what I meant is that we are not adding new wikis (with the whole set of tables, all at once). As I said, we are good at the moment, and moving dozens or hundreds of wikis is a complex operation which I don't think is worth the risk right now.

s3 is a different problem requiring different solutions. We have made a lot of progress in that regard already (T397367#11245844); let's not try to fix too many issues with one action, especially when it's basically pushing a square peg into a round hole.

I've spent quite some time thinking about how to do this in the safest way and whether it is really worth it, and I've come to the conclusion that it is not worth the effort.
First of all, there's no easy way to do this without RO (read-only) time, and a long one.
Of course, all wikis would need to be migrated at different times because of their size:

# du -sh *wiki
214G	frwiki
89G 	jawiki
2.9G	labswiki
169G	ruwiki
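
For scale, a quick back-of-envelope on the listing above. The 5 MB/s sustained dump-plus-restore throughput is an assumed figure for illustration, not a measured one; real rates depend on hardware and index rebuild costs:

```shell
# Total s6 payload (GB figures taken from the du output above) and a
# naive per-replica transfer-time estimate at an ASSUMED 5 MB/s rate.
total_gb=$(awk 'BEGIN { print 214 + 89 + 2.9 + 169 }')
days=$(awk -v gb="$total_gb" 'BEGIN { printf "%.1f", gb * 1024 / 5 / 86400 }')
echo "total: ${total_gb}G, ~${days} days per s5 replica at 5MB/s"
```

Even at that optimistic rate it is over a day per replica, and the load has to be repeated on every s5 host.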

The normal way to do this would be: take a logical backup from a replica (or the master) in s6, along with its binlog coordinates, and load it into ALL s5 replicas (this would take days). Once done, we'd need to make s5 replicate from s6, and pt-heartbeat would show days of lag, which MW would interpret as real lag (because it is real lag, even if no one would be reading from those new wikis).
This would have to be done for each wiki and for each s5 replica.
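
The steps above could be sketched roughly as follows. The hostnames (db-s6-replica, db-s5-master, db-s6-master) are hypothetical placeholders, and the script only prints the commands unless DRY_RUN=0 is set - this is an illustration of the procedure, not the actual runbook:

```shell
# Per-wiki copy sketch; hostnames are placeholders, not real WMF hosts.
# With DRY_RUN=1 (the default) commands are printed, not executed.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

WIKI=labswiki  # start with the smallest wiki

# 1) Logical dump from an s6 replica; --master-data=2 records the binlog
#    coordinates as a comment in the dump header.
run mysqldump --single-transaction --master-data=2 \
    -h db-s6-replica "$WIKI" -r "/srv/${WIKI}.sql"

# 2) Load the dump into EVERY s5 replica; for frwiki/ruwiki this is the
#    part that takes days per host.
run sh -c "mysql -h db-s5-replica < /srv/${WIKI}.sql"

# 3) Make s5 replicate from s6 starting at the recorded coordinates;
#    pt-heartbeat will report the gap as real lag until it catches up.
run mysql -h db-s5-master -e \
    "CHANGE MASTER TO MASTER_HOST='db-s6-master', MASTER_LOG_FILE='...', MASTER_LOG_POS=0; START SLAVE"
```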

Fast-forwarding to the moment all the data is there, we'd have to set RO and change MW (which would just take a few minutes).

Another possibility would be to disable a full DC before doing all this, replicate only in the secondary DC, coordinate with the DC switchover to do the MW change when we switch DCs, and then later reclone the whole new secondary DC (again, this means leaving a DC depooled for days/weeks).

The only benefit of doing this would be saving some money on HW, as we would not need to refresh all of the s6 hosts (although some would need to be moved to s5 to account for the additional reads).
Overall, I don't think this is worth the risk and time.

If anything, what we could do to maximize and balance things would be to make s6 the new default section in a year or so.

I am considering declining this task if no one has strong opinions that this must be done.

I don't think we should make sections larger. If anything, we should make them smaller so they can fit better on the same hardware, or on cheaper hardware. Ideally no section should be larger than 300GB, fitting on 256GB-memory servers for faster operations and recovery.

There are things that could be done in that direction, plus reducing the number of objects per db: splitting out centralauth, moving closed or tiny wikis to virtual machines, and so on.


A rather radical way to do it is to build a mechanism in MediaWiki to copy a whole wiki from one section to another and make sure writes are duplicated until we cut over from the old section. It might sound too crazy, but: 1) we already do a similar thing for most of our data migrations, only at the column level; 2) I will have to build a table duplicator for the commons db split anyway (T408137), and making it work with all tables of a wiki is actually not too hard. I want to start specifically by moving small closed wikis out of s3 into a new "archive" section (the writes on those wikis are so tiny that the move would be quite straightforward in terms of load). If it's stable enough, then I think we can move labswiki from s6 to s5 first, cut the cord, then do frwiki, cut the cord, and so on. Once done, we kill s6.

In the longer term, I really hope we can move wikis around easily: the same way we move replicas from one section to another, I want us to be able to move a db from one section to another, making the infra much more elastic.


A rather radical way to do it is to build a mechanism in MediaWiki to copy a whole wiki from one section to another and make sure writes are duplicated until we cut over from the old section. It might sound too crazy, but: 1) we already do a similar thing for most of our data migrations.

You'd still have the initial data import problem: your whole replication chain would show lag until you caught up with the master, and only then could you start writing to both masters at the same time (which has lots of problems itself, but we can discuss that in a different task).
On the other hand, if your idea is to start from scratch with the full table structure and fill it via a MW mechanism from one wiki to another, you'd have to be able to insert faster than production writes, as otherwise it'd never catch up.
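
The catch-up constraint can be made concrete with hypothetical numbers (the backlog size and both rates below are assumptions for illustration, not measurements of any real wiki):

```shell
# Catch-up time = backlog / (copy rate - production write rate).
# If the duplicator cannot insert faster than production writes,
# the gap never closes. All numbers here are hypothetical.
backlog_rows=1000000
copy_rows_per_s=500     # what the MW-level duplicator can insert
write_rows_per_s=100    # ongoing production write rate
hours=$(awk -v b="$backlog_rows" -v c="$copy_rows_per_s" -v w="$write_rows_per_s" \
  'BEGIN { if (c <= w) { print "never"; exit } printf "%.1f", b / (c - w) / 3600 }')
echo "catch-up: ${hours}h"
```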

Splitting things is way faster in production and doesn't carry so much risk on our side (not talking about the MW side of the duplicator).

I think your merge idea needs lots of further discussion (including whether it is worth the effort), so I am going to close this as declined for now (as the idea for this task was very specific: wikis from s6 to s5). If you want to discuss your other ideas in a different task (which is probably an epic), please subscribe me there!