
Setup separate logical External Store for Flow in production
Closed, Declined · Public

Description

Cluster name: flow_cluster1


This is in-progress work to see if we can set up a separate logical External Store cluster for Flow.

The first statement of https://gerrit.wikimedia.org/r/#/c/223724/1/puppet/modules/role/templates/externalstore/conf.php.erb indicates basically what is needed to set up a new logical cluster (except it would have some Flow-specific name on the first line).

To research before Tuesday:

Rows/size needed for Flow (the number of distinct rev_content values and the sum of rev_content_length should approximate these, respectively):

mysql:research@x1-analytics-slave [flowdb]> SELECT COUNT(DISTINCT rev_content), SUM(rev_content_length) from flow_revision;
+-----------------------------+-------------------------+
| COUNT(DISTINCT rev_content) | SUM(rev_content_length) |
+-----------------------------+-------------------------+
|                      186806 |                41118230 |
+-----------------------------+-------------------------+
1 row in set (2.14 sec)

Related Objects

Event Timeline


"@jcrespo indicated this should be fine, as long as it doesn't require new hardware, which it does not"

I didn't say that. I do not have a call on what hardware is purchased or not; that is up to the operations manager or someone else's manager.

I'm sorry if I didn't describe our conversation clearly. I certainly didn't mean to misrepresent it.

I thought by "I will figure out where is the best place to do it from a DBA/ops point of view (machines, replication, etc.)" you were saying that it's possible, but not necessarily on dedicated hardware.

(I was just trying to clarify in the description that the Collaboration team is not saying we require dedicated hardware).

I understand you were not approving a hardware purchase, and didn't mean to indicate you were.

Mattflaschen-WMF renamed this task from Setup separate External Store for Flow to Setup separate logical External Store for Flow. Jul 31 2015, 7:18 PM

I finally understood what the requirement is: you need to separate Flow data from regular page content, for many reasons, but especially to unblock the compression process.

My suggestion is to separate it logically, but not physically for 2 reasons:

  • Right now, the amount of data in Flow is very small, and it would be unreasonable to purchase a new hardware cluster for 40 MB of data (while a cluster can handle half a dozen TBs). That could change in the future, and the steps would be the same: logical separation first. Please note that this is also a guarantee of its maintenance, as the service will be as important (and as well or badly maintained) as the rest of the ES content, since it is physically in the same place.
  • With the new server purchase, you will want to take advantage of the improved performance of the new servers. Staying on the same cluster(s) will actually be preferable to migrating to any other place, as these servers will be the newest of all the databases. Also, the intended new storage will be 6 TB per server in RAID 10, with enough growth room for 3+ years.

It will also be easier, at least for now, to reuse as much of the existing ES code as possible and just store the data in a separate table.

The current setup of ES servers is:

  • Datacenters: 2 geographically separated datacenters with everything replicated to allow for business continuity in case of a large disaster
  • Shards: 1-3; shard 1 is a read-only shard for old revisions, while shards 2-3 are read/write and physically separated to improve high availability and performance.
  • Each shard has a master (where writes go) and several slaves, which allow read-write split for further redundancy and performance.
  • On each host, there is a database per wiki (enwiki, dewiki, commonswiki, etc.)
  • On each database, there are one or several tables of the form blobs_clusterXX, where XX is just a separation to avoid very large tables, which eases maintenance.

The idea is to separate Flow's rows out of blobs_clusterXX into "blobs_flowXX" (or whatever name you prefer). You can continue using the same configuration variable or have a separate one which happens to have the same IP values, as you prefer (reusing it avoids errors; creating a new one eases future hardware separation).

During the migration from one state to another you will have the support of the DBAs to make sure that the transition can be done both reliably and with the least amount of downtime possible, either by commenting on the proposed scripts to make sure nothing dangerous is performed, or by relying on us to do it for you.

You will probably have to coordinate with other mediawiki developers too in order to make sure your changes are compatible with the newest proposals in content/model types.

Mattflaschen-WMF lowered the priority of this task from Unbreak Now! to High. Aug 5 2015, 11:11 PM

My suggestion is to separate it logically, but not physically for 2 reasons:

I think this should be fine.

The idea is to separate Flow's rows out of blobs_clusterXX into "blobs_flowXX" (or whatever name you prefer). You can continue using the same configuration variable or have a separate one which happens to have the same IP values, as you prefer (reusing it avoids errors; creating a new one eases future hardware separation).

This sounds good. We will use a different variable ($wgFlowExternalStore) as we already do.
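
For illustration, a minimal sketch of what that Flow-specific setting could look like once the new logical cluster exists; the cluster name matches the flow_cluster1 proposal made later in this task, and the exact production value is not shown here:

// Hypothetical Flow ES configuration, assuming a logical cluster named flow_cluster1.
// New Flow content would then be stored under DB://flow_cluster1/<id> URLs.
$wgFlowExternalStore = [ 'DB://flow_cluster1' ];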

During the migration from one state to another you will have the support of the DBAs to make sure that the transition can be done both reliably and with the least amount of downtime possible, either by commenting on the proposed scripts to make sure nothing dangerous is performed, or by relying on us to do it for you.

Yep, we will definitely coordinate with you. I expect this task to get re-activated after the migration to the new hardware.

You will probably have to coordinate with other mediawiki developers too in order to make sure your changes are compatible with the newest proposals in content/model types.

If you're referring to https://www.mediawiki.org/wiki/Requests_for_comment/Content_model_storage , I don't think it impacts that (the author of that RFC is on our team and CCed).

DannyH lowered the priority of this task from High to Medium. Aug 21 2015, 11:14 PM
DannyH raised the priority of this task from Medium to High.
Mattflaschen-WMF added a project: DBA.

@jcrespo, I think this is the next concrete step (T119568: Run External Store migration for real on Beta will get QA-ed, but that's not a blocker).

I'm assigning it to you.

I suggest the name flow_cluster1.

Let me know if you want to resync. I'll be at Wikimania next week, and on vacation after that. However, I will still be on IRC and email intermittently while at Wikimania.

Mattflaschen-WMF renamed this task from Setup separate logical External Store for Flow to Setup separate logical External Store for Flow in production. Jun 16 2016, 11:16 PM
Mattflaschen-WMF updated the task description.

Do you know when Flow was enabled for the first time / what the oldest content is that we will find?

The very first Flow commit was Wed Jul 10 23:05:11 2013 +0100.
It was enabled on labs on Wed Sep 4 16:28:43 2013 -0700

The oldest content (by creation date) is LQT conversions (LQT first comment was in 2007).
But during the import, Flow created new ExternalStore entries for those.
We started working on the import script on Wed Jan 15 11:08:58 2014, so all import entries are more recent than that.

By timestamp (converted from the UUID with Flow\Model\UUID::create( strtolower( '<HEX>' ) )->getTimestamp();):

Earliest rev_id on x1: 2009-11-08 090609
Earliest rev_id on officewiki: 2010-04-07 215825
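
As a concrete, hedged example of that conversion, substituting the earliest x1 rev_id from the query below for the <HEX> placeholder:

// Illustrative only; assumes the Flow extension classes are loaded.
// The hex value is the earliest flow_revision rev_id on x1 (see the query below).
$uuid = Flow\Model\UUID::create( strtolower( '04934C2C4FE49E53037756' ) );
echo $uuid->getTimestamp(); // should correspond to the 2009-11-08 09:06:09 value quoted above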

mysql:research@x1-analytics-slave [flowdb]> SELECT HEX(rev_id) FROM flow_revision ORDER BY rev_id LIMIT 1;
+------------------------+
| HEX(rev_id)            |
+------------------------+
| 04934C2C4FE49E53037756 |
+------------------------+
1 row in set (0.00 sec)

mysql:research@x1-analytics-slave [flowdb]> Bye
mattflaschen@stat1003:~$ mysql -h s2-analytics-slave officewiki

mysql:research@s2-analytics-slave [officewiki]> SELECT HEX(rev_id) FROM flow_revision ORDER BY rev_id LIMIT 1;
+------------------------+
| HEX(rev_id)            |
+------------------------+
| 049F69201A1547E44879C2 |
+------------------------+
1 row in set (0.05 sec)

The question was because, unlike beta, the setup is a bit more complex: we have read-write and read-only servers, and they each have more than one "cluster". I will see where things are now.

Yeah. I checked what the first use of External Store was just now:

Seems it was used since the beginning on both (these are the same IDs):

mysql:research@x1-analytics-slave [flowdb]> SELECT HEX(rev_id), rev_content, rev_flags FROM flow_revision WHERE rev_flags LIKE '%external%' ORDER BY rev_id LIMIT 1;
+------------------------+-----------------------+------------------------------+
| HEX(rev_id)            | rev_content           | rev_flags                    |
+------------------------+-----------------------+------------------------------+
| 04934C2C4FE49E53037756 | DB://cluster25/613992 | utf-8,gzip,wikitext,external |
+------------------------+-----------------------+------------------------------+
mysql:research@s2-analytics-slave [officewiki]> SELECT HEX(rev_id), rev_content, rev_flags FROM flow_revision WHERE rev_flags LIKE '%external%' ORDER BY rev_id LIMIT 1;
+------------------------+----------------------+--------------------------+
| HEX(rev_id)            | rev_content          | rev_flags                |
+------------------------+----------------------+--------------------------+
| 049F69201A1547E44879C2 | DB://cluster25/21925 | utf-8,gzip,html,external |
+------------------------+----------------------+--------------------------+
1 row in set (0.09 sec)

That is good news; if I understand it correctly, it means there are no revisions in cluster23 or lower.

Indeed:

mysql:wikiadmin@10.64.16.18 [flowdb]> SELECT DISTINCT SUBSTR(rev_content, 1, 15) FROM flow_revision;
+----------------------------+
| SUBSTR(rev_content, 1, 15) |
+----------------------------+
| DB://cluster25/            |
| DB://cluster24/            |
|                            |
+----------------------------+
3 rows in set (0.66 sec)

(I don't know what that empty string is doing there; probably from when we lost some data a few years ago)

I have not forgotten about this; it is on 'Next', blocked on me having enough time (there is not much to do by itself, but since this is production, and given the importance of these databases, I need to have proper backups first), but I hope to be able to work on this soon.

Let's proceed with this; let's also create new clusters to avoid over-sized tables. CC @Anomie because he was working on some maintenance too, so he is in the know.

This is probably repeating what everyone already knows, but just in case I believe the process here would be:

  • Create a table named something like "blobs_flow1" in each database on one of the existing read/write ES sets. The existing sets appear to be:
    • cluster24: es1011 es1013 es1015 es2016 es2014 es2015
    • cluster25: es1014 es1017 es1019 es2017 es2018 (and es2019 is currently commented out with a note pointing to T153440)
  • In operations/mediawiki-config, files wmf-config/db-eqiad.php and wmf-config/db-codfw.php,
    • Add to $wgLBFactoryConf['externalLoads'] an entry for flow_cluster1 mapping to whichever set of servers was used
    • Add to $wgLBFactoryConf['templateOverridesByCluster'] an entry for flow_cluster1 with [ 'blobs table' => 'blobs_flow1' ].
  • Adjust Flow's configuration to start writing flow_cluster1.
  • Run some maintenance script to migrate all the old Flow ES entries to the new cluster.
  • (Maybe) delete all the old Flow ES entries from cluster24 and cluster25. Or maybe running core's recompression would automatically leave those behind, I haven't checked.
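
To make the configuration step above concrete, here is a rough sketch of what the wmf-config additions could look like if the cluster24 hosts were reused. The host names are taken from the list above, but the load weights and the choice of master are illustrative assumptions, not the real production values:

// Sketch only (wmf-config/db-eqiad.php); not the actual production change.
// Maps the new logical cluster onto existing ES hosts.
$wgLBFactoryConf['externalLoads']['flow_cluster1'] = [
	'es1011' => 0, // assumed master (writes); weight 0 keeps reads on the replicas
	'es1013' => 1,
	'es1015' => 1,
];
// Point the logical cluster at the new table name instead of blobs_clusterXX.
$wgLBFactoryConf['templateOverridesByCluster']['flow_cluster1'] = [
	'blobs table' => 'blobs_flow1',
];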

BTW, in looking at this I noticed something odd: Flow writes its flow_revision table on one central flowdb database for all wikis, but writes the actual content into per-wiki ExternalStore and doesn't seem to record which wiki was used or try to use the correct wiki on read-back. Is that what you intended to do, or did you not realize that ES is per-wiki?

That last suggestion looks like a blocker to me, at least to check it before doing anything.

Apparently @Mattflaschen-WMF is no longer in charge; who is in charge of Flow maintenance now, maybe Growth-Team? To clarify, we DBAs are no longer a blocker for this task, but this seemed important: someone needs to act on T107610#4112346.

Not doing this may soon block T106386 and maybe some MCR related work (the part with handling different content types).

@jcrespo yes, Growth-Team is handling StructuredDiscussions.

Not doing this may soon block T106386

Could you please give us an idea for what "soon" is, or a specific deadline for completing this? We have a lot of tasks in our backlog :)

@kostajh I don't have a say on that; I was just pointing out that we are waiting for someone to take the lead, and that other teams may be interested in this. We are already around for when you are ready :-)

This is probably repeating what everyone already knows, but just in case I believe the process here would be:

  • Create a table named something like "blobs_flow1" in each database on one of the existing read/write ES sets. The existing sets appear to be:
    • cluster24: es1011 es1013 es1015 es2016 es2014 es2015
    • cluster25: es1014 es1017 es1019 es2017 es2018 (and es2019 is currently commented out with a note pointing to T153440)
  • In operations/mediawiki-config, files wmf-config/db-eqiad.php and wmf-config/db-codfw.php,
    • Add to $wgLBFactoryConf['externalLoads'] an entry for flow_cluster1 mapping to whichever set of servers was used
    • Add to $wgLBFactoryConf['templateOverridesByCluster'] an entry for flow_cluster1 with [ 'blobs table' => 'blobs_flow1' ].
  • Adjust Flow's configuration to start writing flow_cluster1.
  • Run some maintenance script to migrate all the old Flow ES entries to the new cluster.
  • (Maybe) delete all the old Flow ES entries from cluster24 and cluster25. Or maybe running core's recompression would automatically leave those behind, I haven't checked.

This looks right to me, assuming this is the same as what was done on beta as part of T119568: Run External Store migration for real on Beta.

BTW, in looking at this I noticed something odd: Flow writes its flow_revision table on one central flowdb database for all wikis, but writes the actual content into per-wiki ExternalStore and doesn't seem to record which wiki was used or try to use the correct wiki on read-back. Is that what you intended to do, or did you not realize that ES is per-wiki?

It obviously does use the correct wiki on read-back, otherwise Flow would have completely fallen apart by now. So I investigated why it works.

flow_revision rows are not directly tied to a wiki in theory (although in practice, rev_user_wiki is always the correct wiki), but every revision is indirectly the child of a flow_workflow row (which represents a topic or board), and that table has a workflow_wiki field. In practice, we use the correct wiki for read-back because each wiki will only try to read revisions related to workflows that it owns, and Flow calls ExternalStore::batchFetchFromURLs() to resolve these URLs without passing a 'wiki' option, which means the local wiki is used.

It's very possible that the people who made this schema 5-6 years ago didn't realize ES was per-wiki (and I didn't either), but since cross-wiki features in Flow have never materialized, and every revision is tied to a workflow that specifies a wiki, everything works fine.
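
A minimal sketch of that read-back path, using one of the content URLs shown earlier in this task (this is not Flow's actual code, just the shape of the call described above):

// Illustrative only. The URL format matches the rev_content values shown
// in the queries earlier in this task.
$urls = [ 'DB://cluster25/613992' ];
// No 'wiki' option is passed, so the blob is resolved against the local wiki's
// External Store, which is why the per-wiki storage works in practice.
$blobs = ExternalStore::batchFetchFromURLs( $urls );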

I'm ready to act on this any time, if someone can assure me that a backup is in place. @jcrespo , would you have time to chat on IRC some time today / this week / next week (or the week after, if you're too busy with the data center switchover; whenever is a good time for you) to make sure we have all our ducks in a row? Then once we've done that, and the tables have been created (the first step in Brad's list), I can take care of the MediaWiki config stuff and run the script. I'd begin with dry runs, then real runs, on testwiki (3616 revisions) and test2wiki (92451 revisions).

would you have time to chat on IRC some time today / this week / next week (or the week after

Let's schedule it for the week of the 14th, although probably either @Marostegui or @Banyek should be involved. I will make sure to check we have fresh backups by then.

I will make sure to check we have fresh backups by then.

@jcrespo, I wanted to check whether there are fresh backups.

@Banyek @Catrope do you all want to work on this in the coming week or two?

Sorry for missing the "week of the 14th" message, we were at our team offsite at the time. I'm ready to work on this any time.

As I plan to get involved in this, I read back through the ticket and put the following tl;dr together.
Did I miss something? Please let me know. I also propose changing the ticket description (or creating a sub-ticket) with the tl;dr part.
My proposal is to discuss this here, then set up a meeting, prepare everything, and execute it in a coordinated way.
tl;dr:
What we need is to separate Flow data from the rest of the data on the ES cluster.
To achieve this:

@Banyek That sounds right to me. The migration script already exists and was used successfully in beta. That was in 2016, so hopefully it still works, but at least we won't have to write it first.

What do you think the right order of operations is? The order that seems most natural to me is:

  • Create new tables (DBA)
  • Update MW config so new content gets written to the new tables (Growth)
  • Ensure old tables are backed up (DBA)
  • Migrate old tables to new tables (Growth)

But maybe there's a reason I didn't think of for doing the backup step earlier?

So maybe you (DBA team) can asynchronously create the new tables, then tell us what the new tables are called and which ES cluster you created them on so we (Growth) can line up the config change, and then we can get together and execute the rest of the plan in a coordinated way?

Sorry for the late answer @Catrope,

The bullets you provided seem good to me; I'd say we could take them as a plan. But I also propose that we postpone this until January (I know :( ), because @jcrespo is not with us this month and he is the owner of this task from the DBA team.
I also think that, with the end of the year approaching and the code freeze almost here, it might not be the best time to do such a big change.
(We can do the preparations though.)

Reviewing the discussion so far, these are the concrete steps, and some questions about them:

  • Make a backup.
    • This was done in T153440, but that was over a year ago. I imagine we'd want a more recent one? Ping @jcrespo.
  • Create the tables. We'd probably want to set up flow_cluster1 to match cluster24 (es2). That is, create a blobs table on all the servers in that cluster (per db-eqiad.php and db-codfw.php, those are es1011, es1013, es1015, es2014, es2015 and es2016), in all the per-wiki databases. To match the existing naming conventions (blobs tables are called blobs_cluster24), the tables should be called blobs_flow1.
    • Do we want a table in all wikis, or only the Flow-enabled ones? I'd guess all as it's less hassle to manage, and the addWiki.php logic for creating the blobs table(s) does not care about the wiki being Flow-enabled either.
    • There are two clusters in the normal ES, with writes (if I understand correctly) randomly going into either. I assume that's for handling load (of which there's not much in Flow) so we are fine with just one?
  • Configure the new cluster.
  • Switch Flow config to the new cluster.
  • Dry-run the migration script, [[https://gerrit.wikimedia.org/g/mediawiki/extensions/Flow/+/73f8c5937def28946973c85711718ab5f0b71370/maintenance/FlowExternalStoreMoveCluster.php|FlowExternalStoreMoveCluster.php]]. This will actually write all the Flow content to the new cluster, but won't update the storage locations in the Flow revision table.
  • Run the migration script.
  • Decide if we need/want to delete the content from the old clusters.

We will have to do most of this anyway for T226704.

There are two clusters in the normal ES, with writes (if I understand correctly) randomly going into either. I assume that's for handling load (of which there's not much in Flow) so we are fine with just one?

No, that is not for performance (although it would help if we had a huge amount of inserts); it is done for availability reasons. If one cluster goes down, reads from it will be impossible, but "inserts" should continue unaffected on the other. This is not a theoretical example: two sets of servers are needed so we can easily set one to read-only for maintenance (e.g. to renew hardware), so Flow should be on both.

We will have to do most of this anyway for T226704.

Does that mean we should wait for that to happen and then use the es4 servers instead of the es2 ones?

No, that is not for performance (although it would help if we had a huge amount of inserts); it is done for availability reasons. If one cluster goes down, reads from it will be impossible, but "inserts" should continue unaffected on the other. This is not a theoretical example: two sets of servers are needed so we can easily set one to read-only for maintenance (e.g. to renew hardware), so Flow should be on both.

Thanks, so I guess we need a separate flow_cluster1 (using the same servers as es2 (or es4?)) and flow_cluster2 (using the same servers as es3 (or es5?)).
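
If it ends up being two logical clusters, the Flow-side config would presumably just list both, in the same way the main ES config lists its writable clusters. The cluster names follow the naming proposed above; this is a sketch, not an agreed configuration:

// Hypothetical: with two logical clusters, new writes would be distributed
// across both, so either one can be put into read-only mode for maintenance.
$wgFlowExternalStore = [ 'DB://flow_cluster1', 'DB://flow_cluster2' ];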

  • Do we want a table in all wikis, or only the Flow-enabled ones? I'd guess all as it's less hassle to manage, and the addWiki.php logic for creating the blobs table(s) does not care about the wiki being Flow-enabled either.

Or do we want one table that's shared by all wikis? That would keep Flow's existing cross-wiki architecture from breaking if something ever actually tries to use it.

Or do we want one table that's shared by all wikis? That would keep Flow's existing cross-wiki architecture from breaking if something ever actually tries to use it.

It doesn't look like ExternalStoreDB currently supports overriding the dbname, it always uses $wgDBname (or an LBFactory thing that corresponds to it).

The new hosts for es4 and es5 have been ordered and will most likely be set up (installed and racked) in January, so we might want to enable this along with them, as Jaime pointed out on T107610#5434011, which might be easier than doing so for the existing production ones.

It doesn't look like ExternalStoreDB currently supports overriding the dbname, it always uses $wgDBname (or an LBFactory thing that corresponds to it).

Can you not set $wgLBFactoryConf['templateOverridesByCluster']['flow_cluster1']['dbname'] to do it?

It seems to work in my local dev environment: I have my blob tables in a MySQL database named 'es', even though the wikis have dbname 'mywiki', 'mywiki2', and for that matter 'pgwiki' and so on, which aren't even in MySQL.
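
For reference, a sketch of the override being discussed, with a made-up shared database name (the 'dbname' key and its behaviour are as described in the comment above; 'flowdb_es' is purely hypothetical):

// Sketch: route every wiki's flow_cluster1 blobs into one shared database.
// 'flowdb_es' is a made-up name; 'blobs table' matches the proposal above.
$wgLBFactoryConf['templateOverridesByCluster']['flow_cluster1'] = [
	'dbname'      => 'flowdb_es',
	'blobs table' => 'blobs_flow1',
];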

The new hosts for es4 and es5 have been ordered and will most likely be set up (installed and racked) in January, so we might want to enable this along with them, as Jaime pointed out on T107610#5434011, which might be easier than doing so for the existing production ones.

@Tgr thoughts?

@Marostegui does this task block setting up the new servers? Or does it become easier if done during that setup?
It's still not clear to me who needs this and by when (cf T106363#5495185).
(Sorry for missing your ping in December.)

@Marostegui does this task block setting up the new servers? Or does it become easier if done during that setup?
It's still not clear to me who needs this and by when (cf T106363#5495185).
(Sorry for missing your ping in December.)

It does not block setting up the new hosts. It is simply a lot easier to enable this along with the new servers.
The new servers (es4/es5) will start fully empty, so it will be easier to create the needed flow tables, and enable the configuration at the same time as we enable es4/es5 as new writable clusters.
It is also not clear to me who needs it, I thought it was you who was sort of leading this :-)

We still don't have the eqiad servers physically ready yet; hopefully we'll have them in 2-3 weeks, so I think we have time to sort out the flowdb stuff in the meantime.

Thanks!

It is also not clear to me who needs it, I thought it was you who was sort of leading this :-)

Implementing it, yes. But if no one actually needs it, I'd rather find something more important to do.

It is also not clear to me who needs it, I thought it was you who was sort of leading this :-)

Implementing it, yes. But if no one actually needs it, I'd rather find something more important to do.

I cannot answer that question unfortunately; do you happen to know who could help us figure out whether it is worth implementing this so it can be deployed along with es4 and es5?

This blocks T106363: Migrate Flow content to new separate logical External Store in production which blocks T106386: Compress data at external storage. If we want to do T106386 then this needs to be done.

I don't know what plans might exist to pick up T106386 once it gets unblocked (that'd be a question for PMs), but if it does get picked up we (Platform Engineering) will probably be involved in working on it.

@jcrespo do you still want us to do the compression in T106386? Are the storage constraints still relevant?

@jcrespo do you still want us to do the compression in T106386? Are the storage constraints still relevant?

I don't think this is necessary anymore.
Right now the old ES hosts are read only, so those won't grow.
es2/es3 will go read only as soon as we deploy the new hardware and es4/es5 will start fully empty.

MMiller_WMF added a subscriber: MMiller_WMF.

Given that it doesn't sound like this is needed, I'm declining the task. Please let us know if this changes!