Migrate Flow content to new separate logical External Store in production
Closed, Declined, Public

Description

ExternalStore is nearly full, and ops will buy more storage and/or compress the data.
According to Tim, the script that compresses ES data omits entries missing from the text table.

~~I believe we are not currently storing references in text. Let's make sure we do that soon enough.~~ <- outdated, might go with another approach

We plan to solve this by setting up a new External Store (one that will only be used by Flow) then migrating Flow to use that (details at T107610: Setup separate logical External Store for Flow in production). That will then free up the non-Flow one to use the normal compression procedure.

Use:

foreachwikiindblist flow.dblist extensions/Flow/maintenance/FlowExternalStoreMoveCluster.php

Details

	Subject	Repo	Branch	Lines +/-
	Move Flow ExternalStore entries to separate cluster	mediawiki/extensions/Flow	master	+240 -0


Related Objects

Status	Assigned	Task
Stalled	None	T106386 Compress data at external storage
Stalled	None	T106388 Audit all existing code to ensure that any extension currently or previously adding blobs to ExternalStore has been registering a reference in the text table (and fix up if wrong)
Declined	None	T106363 Migrate Flow content to new separate logical External Store in production
Resolved	• Mattflaschen-WMF	T119568 Run External Store migration for real on Beta
Resolved	• Mattflaschen-WMF	T119567 Run Flow External Store migration in dry-run mode on Beta
Resolved	• Mattflaschen-WMF	T119566 Add dry-run mode to Flow External Store migration script
Resolved	• Mattflaschen-WMF	T128417 Set up Flow-specific External Store cluster on Beta (secondary to the main one)
Resolved	• Mattflaschen-WMF	T95871 Use External Store on Beta Cluster
Resolved	• Mattflaschen-WMF	T136887 External Store dry run wrongly detects failed insert if $wgCompressRevisions is true
Resolved	matthiasmullie	T133074 Script to allow migrating Flow content between External Store clusters
Declined	None	T138049 Dry run of Flow External Store migration in production
Declined	Tgr	T107610 Setup separate logical External Store for Flow in production
Resolved	jcrespo	T153440 Create a full backup of all external storage records that would be easy to restore/setup a temporary delayed slave
Resolved	Daimona	T34478 AbuseFilter not setting utf-8 flag
Resolved	Urbanecm	T246539 Dry-run, then actually run updateVarDumps
Declined	Daimona	T246938 How to update/delete ExternalStore entries?
Resolved	Daimona	T252696 Find a good way to run the updateVarDumps script on large wikis

Event Timeline


matthiasmullie moved this task from In Development to Needs Review on the Collaboration-Team-Archive-2015-2016 board.Jul 28 2015, 12:28 PM

• Mattflaschen-WMF added a parent task: T95870: E10. Have Flow use ExternalStore on MediaWiki-Vagrant.Jul 29 2015, 5:51 AM

• Mattflaschen-WMF added a parent task: T95871: Use External Store on Beta Cluster.

• Mattflaschen-WMF lowered the priority of this task from Unbreak Now! to High.Aug 5 2015, 11:12 PM

• Mattflaschen-WMF removed a project: Collaboration-Team-Archive-2015-2016.Aug 6 2015, 5:54 PM

• Mattflaschen-WMF renamed this task from Don't block trackBlobs.php and recompressTracked.php to Migrate Flow content to new separate logical External Store.Aug 6 2015, 11:42 PM

• Mattflaschen-WMF updated the task description.

• Mattflaschen-WMF added a subscriber: tstarling.

• DannyH added a project: Collaboration-Team-Archive-2015-2016.Aug 11 2015, 7:36 PM

matthiasmullie removed a project: Collaboration-Team-Archive-2015-2016.Aug 19 2015, 12:06 PM

• DannyH moved this task from Current workboard to Team discussion on the Collaboration-Team-Triage board.Aug 20 2015, 11:42 PM

• DannyH moved this task from Team discussion to Near-Term Interest on the Collaboration-Team-Triage board.Aug 21 2015, 9:31 PM

• DannyH added a project: OKR-Work.Aug 27 2015, 7:06 PM

• DannyH edited projects, added Essential-Work; removed OKR-Work.

• DannyH edited projects, added Collaboration-Team-Archive-2015-2016; removed Collaboration-Team-Triage.Sep 9 2015, 4:09 PM

• DannyH edited projects, added Collaboration-Team-Triage; removed Collaboration-Team-Archive-2015-2016.Sep 9 2015, 4:16 PM

T105843#1654271 indicates the new hardware phase is almost done, so adding this back to Current.

• Mattflaschen-WMF added a project: Blocked-on-Operations.Nov 4 2015, 6:38 PM

The script will only be run in production after careful testing elsewhere.

The initial steps will be T119566: Add dry-run mode to Flow External Store migration script and T119567: Run Flow External Store migration in dry-run mode on Beta.

• Mattflaschen-WMF removed a project: Blocked-on-Operations.Nov 24 2015, 10:18 PM

Change 226544 merged by jenkins-bot:
Move Flow ExternalStore entries to separate cluster

https://gerrit.wikimedia.org/r/226544

ReleaseTaggerBot added a project: MW-1.27-release (WMF-deploy-2015-12-08_(1.27.0-wmf.8)).Nov 24 2015, 11:00 PM

Catrope closed this task as Resolved.Dec 10 2015, 4:33 AM

Catrope subscribed.

It hasn't actually been migrated yet.

• Mattflaschen-WMF mentioned this in T94574: Switch Flow from ExternalStore to RESTBase.Feb 3 2016, 7:41 PM

Catrope moved this task from Needs Review to Blocked on the Collaboration-Team-Archive-2015-2016 board.Feb 19 2016, 10:53 PM

Thanks to @Volans, new codfw external storage servers were set up. Would the old servers be helpful in any way for this task? (They have similar specs and contents to the real servers, but they are not in production right now.) Otherwise, they may be erased, etc.: T129452

Yes, they could potentially be used for the dedicated Flow External Store.

@matthiasmullie No, those can be around for testing for a bit longer, but they are old and out of warranty (which means they cannot be used for production, only for testing). I was asking if they could be used as test hosts for the script (before being fully decommissioned).

You really want to use the new servers for the final service (10x faster) and we have 20 TB free on those (and 0 on these).

In T106363#2107702, @jcrespo wrote:

@matthiasmullie No, those can be around for testing for a bit longer, but they are old and out of warranty (which means they cannot be used for production, only for testing). I was asking if they could be used as test hosts for the script (before being fully decommissioned).

You really want to use the new servers for the final service (10x faster) and we have 20 TB free on those (and 0 on these).

Okay. I forget, are we doing a test run in production, or just Beta? If we're doing one on prod, we could use it.

(BTW, you @-ed Matthias, when I think you meant to reply to me; we both worked on it though).

Sorry about that, I just pressed tab.

are we doing a test run in production

Well, considering we now have a reasonable way to test it closer to production, I was just providing more options in case they were needed (if only to see how long it would take on non-trivial datasets like beta). But it was just a suggestion/offer (which we didn't have before).

• Mattflaschen-WMF removed a parent task: T95870: E10. Have Flow use ExternalStore on MediaWiki-Vagrant.Mar 15 2016, 9:57 PM

• Mattflaschen-WMF added a project: Collab-Team-2016-Apr-Jun-Q4.Apr 5 2016, 12:24 AM

• Mattflaschen-WMF removed a parent task: T95871: Use External Store on Beta Cluster.Apr 11 2016, 8:49 PM

• Mattflaschen-WMF added a subtask: T119568: Run External Store migration for real on Beta.

• jmatazzoni removed a project: Collaboration-Team-Archive-2015-2016.Apr 12 2016, 6:50 PM

• Mattflaschen-WMF renamed this task from Migrate Flow content to new separate logical External Store to Migrate Flow content to new separate logical External Store in production.Apr 12 2016, 9:12 PM

• Mattflaschen-WMF added a subtask: T133074: Script to allow migrating Flow content between External Store clusters.Apr 19 2016, 5:40 PM

Restricted Application added a subscriber: TerraCodes.Apr 19 2016, 5:40 PM

• Mattflaschen-WMF claimed this task.Apr 19 2016, 5:41 PM

• jmatazzoni moved this task from Untriaged to In Development on the Collab-Team-2016-Apr-Jun-Q4 board.Apr 29 2016, 12:45 AM

• Mattflaschen-WMF updated the task description.Jun 16 2016, 8:41 PM

• Mattflaschen-WMF created subtask T138049: Dry run of Flow External Store migration in production.Jun 16 2016, 10:58 PM

• jmatazzoni closed subtask T119568: Run External Store migration for real on Beta as Resolved.Jun 22 2016, 10:48 AM

• Mattflaschen-WMF edited projects, added Collab-Team-Q1-July-Sep-2016; removed Collab-Team-2016-Apr-Jun-Q4.Aug 2 2016, 12:51 AM

• Mattflaschen-WMF moved this task from Untriaged to In Development on the Collab-Team-Q1-July-Sep-2016 board.

• jmatazzoni removed a project: Collab-Team-Q1-July-Sep-2016.Oct 4 2016, 10:54 PM

Restricted Application added a project: Collaboration-Team-Triage.Oct 4 2016, 10:54 PM

• jmatazzoni edited projects, added Collaboration-Team-Triage (Collab-Team-Q2-Oct-Dec-2016); removed Collaboration-Team-Triage.Oct 4 2016, 10:54 PM

• jmatazzoni moved this task from Untriaged to In Development on the Collaboration-Team-Triage (Collab-Team-Q2-Oct-Dec-2016) board.

• jmatazzoni edited projects, added Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017); removed Collaboration-Team-Triage (Collab-Team-Q2-Oct-Dec-2016).Jan 5 2017, 10:16 PM

• jmatazzoni moved this task from Untriaged to QA Review on the Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017) board.Jan 5 2017, 10:31 PM

• jmatazzoni moved this task from QA Review to Code Review Started on the Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017) board.Jan 6 2017, 12:57 AM

This ticket still needs to happen. However, I am wondering whether we should refactor the external storage servers in some way other than regular compression (parent ticket): making external storage transparent to the application (a pure key-value store) and using a similar strategy, but implemented by mainstream open-source software such as RocksDB. I will ask around.

• jmatazzoni edited projects, added Collaboration-Team-Triage (Collab-Team-Q4-Apr-Jun-2017); removed Collaboration-Team-Triage (Collab-Team-Q3-Jan-Mar-2017).Apr 18 2017, 1:12 AM

• jmatazzoni moved this task from Untriaged to Code Review Started on the Collaboration-Team-Triage (Collab-Team-Q4-Apr-Jun-2017) board.

• jmatazzoni moved this task from Code Review Started to Untriaged on the Collaboration-Team-Triage (Collab-Team-Q4-Apr-Jun-2017) board.

• jmatazzoni moved this task from Untriaged to Code Review Started on the Collaboration-Team-Triage (Collab-Team-Q4-Apr-Jun-2017) board.

• jmatazzoni edited projects, added Collaboration-Team-Triage (Collab-Team-This-Quarter); removed Collaboration-Team-Triage (Collab-Team-Q4-Apr-Jun-2017).Jul 14 2017, 12:27 AM

• jmatazzoni moved this task from Untriaged to Code Review Started on the Collaboration-Team-Triage (Collab-Team-This-Quarter) board.

jcrespo mentioned this in T183419: Determine how to update old compressed ExternalStore entries for T181555.Dec 21 2017, 9:17 AM

Removing @Mattflaschen-WMF as task assignee to avoid cookie-licking.
(Matt, if you still like/plan to work on this, feel very welcome to re-claim via your personal Phab account - thanks!)

jcrespo mentioned this in T183490: MCR schema migration stage 4: Migrate External Store URLs (wmf production).Oct 2 2018, 9:56 AM

Pinging TechCom for a quick check-in on this.

Restricted Application added a project: Growth-Team.Aug 27 2019, 6:16 PM

@daniel & TechCom most of the setup work is happening on T107610, fyi.

I discussed this a little bit with @kostajh and @Tgr. Summarizing my comments here.

Back in 2015, SRE/CPT wanted to recompress the ExternalStore data to make more space. The script that performs the recompression assumes that all ES URLs pointing to blobs are in the text table. It's able to find "orphaned" blobs that aren't pointed to from the text table and preserve them, but the recompression process changes the URLs of each blob. The URLs are updated in the text table, but for orphaned blobs there is no text table row to update. MW core's revision storage uses the text table (although T183490 proposes to change that), as does AbuseFilter. The only thing that stores content in ES but doesn't use the text table is Flow, which instead puts ES URLs directly in the flow_revision table (kind of like what T183490 proposes, except without a separate content table). That means that running the recompression script as-is would cause us to lose all the Flow data in ES (i.e. the content of all Flow posts).
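
To make the failure mode above concrete, here is a minimal toy model in Python. It is only a sketch: the dictionaries stand in for the ES blob store, the text table and flow_revision, and the URL strings, row IDs and the `recompress` function are all made up for illustration. It does not reproduce what recompressTracked.php actually does, only the property described above (blob URLs change, and only text-table pointers are rewritten).

```python
# Toy model; every name, ID and URL below is hypothetical.

def recompress(blobs, text_table):
    """Move every blob to a new URL and update only text-table pointers."""
    new_blobs = {}
    url_map = {}
    for i, (old_url, data) in enumerate(sorted(blobs.items())):
        new_url = f"DB://cluster-new/{i}"
        new_blobs[new_url] = data  # the blob itself is preserved
        url_map[old_url] = new_url
    # Tracked pointers are rewritten; nothing else learns about the new URLs.
    new_text = {row_id: url_map[url] for row_id, url in text_table.items()}
    return new_blobs, new_text


blobs = {
    "DB://cluster1/1": "wikitext of a core revision",
    "DB://cluster1/2": "content of a Flow post",
}
text_table = {101: "DB://cluster1/1"}            # core revision: tracked
flow_revision = {"rev-abc": "DB://cluster1/2"}   # Flow: pointer lives elsewhere

blobs_after, text_after = recompress(blobs, text_table)

# Core's pointer still resolves; Flow's pointer now points at nothing.
tracked_ok = all(url in blobs_after for url in text_after.values())
flow_dangling = [rev for rev, url in flow_revision.items() if url not in blobs_after]
print(tracked_ok, flow_dangling)
```

In this model the Flow blob still exists after recompression, but its stored URL no longer resolves, which from Flow's point of view is the same as losing the content.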

The initial proposal was to have Flow add rows to the text table when it inserts things into ES, and backfill the existing ES pointers into the text table. This was rejected because it would mean moving the source of truth for these pointers, and using a per-wiki table while everything else in Flow uses global tables (see T106386#1487961).

It was then proposed to move all Flow entries to a separate ES cluster, so that the original ES cluster only contains text-table-tracked blobs and can be safely recompressed. This is what's currently planned. It's already been done in beta labs, but hasn't been done in production yet, mostly because this doesn't seem to be a priority for anyone. (The Growth team hasn't proactively worked on it for a while, and SRE/CPT haven't asked us to.)
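
As a rough illustration of the separate-cluster idea, the sketch below copies each Flow blob into a second store and then repoints the Flow pointer, with a dry-run mode that only reports what would happen. This is an assumption-laden toy, not the logic of FlowExternalStoreMoveCluster.php: the `move_flow_blobs` function, the cluster name `flow1` and the URL format are all hypothetical.

```python
# Toy model; function name, cluster names and URLs are hypothetical.

def move_flow_blobs(flow_revision, source, target, target_name, dry_run=True):
    """Copy each Flow blob to the target store, then repoint flow_revision.

    Returns the list of (rev_id, old_url, new_url) moves that were planned.
    """
    moves = []
    next_id = len(target) + 1
    for rev_id, old_url in sorted(flow_revision.items()):
        new_url = f"DB://{target_name}/{next_id}"
        next_id += 1
        moves.append((rev_id, old_url, new_url))
        if dry_run:
            continue  # report only; touch neither the stores nor the pointers
        target[new_url] = source[old_url]        # 1. write the copy
        if target[new_url] != source[old_url]:   # 2. check it reads back
            raise RuntimeError(f"copy of {old_url} failed verification")
        flow_revision[rev_id] = new_url          # 3. only then repoint
    return moves


main_cluster = {
    "DB://cluster1/1": "wikitext of a core revision",
    "DB://cluster1/2": "content of a Flow post",
}
flow_cluster = {}
flow_revision = {"rev-abc": "DB://cluster1/2"}

planned = move_flow_blobs(flow_revision, main_cluster, flow_cluster, "flow1")
pointers_after_dry_run = dict(flow_revision)

move_flow_blobs(flow_revision, main_cluster, flow_cluster, "flow1", dry_run=False)
print(planned)
print(flow_revision)
```

After the real run, nothing Flow cares about points into the main cluster any more, so the main cluster could be recompressed without affecting Flow in this model. Writing and checking the copy before repointing is a design choice of the sketch, so that a failed copy never leaves a pointer to a missing blob.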

An alternative approach would be to add a hook to the recompression script notifying Flow of changes in orphaned blob URLs and allowing it to update them itself, but that could be more work than performing the separate store migration that we already have code for.
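
For comparison, the hook-based alternative could look roughly like the following toy model. Again this is only a sketch under assumptions: the `on_orphan_moved` callback and both functions are hypothetical, and no such hook exists in the recompression script as far as this task describes.

```python
# Toy model; the callback name and all URLs are hypothetical.

def recompress_with_hook(blobs, text_table, on_orphan_moved):
    """Recompress, notifying a callback about blobs the text table does not track."""
    tracked = set(text_table.values())
    new_blobs, url_map = {}, {}
    for i, (old_url, data) in enumerate(sorted(blobs.items())):
        new_url = f"DB://cluster-new/{i}"
        new_blobs[new_url] = data
        if old_url in tracked:
            url_map[old_url] = new_url
        else:
            on_orphan_moved(old_url, new_url)  # let the owner fix its pointer
    new_text = {row_id: url_map[url] for row_id, url in text_table.items()}
    return new_blobs, new_text


flow_revision = {"rev-abc": "DB://cluster1/2"}

def flow_handler(old_url, new_url):
    """Repoint any Flow revision that referenced the moved blob."""
    for rev_id, url in list(flow_revision.items()):
        if url == old_url:
            flow_revision[rev_id] = new_url


blobs = {
    "DB://cluster1/1": "wikitext of a core revision",
    "DB://cluster1/2": "content of a Flow post",
}
text_table = {101: "DB://cluster1/1"}

blobs_after, text_after = recompress_with_hook(blobs, text_table, flow_handler)
print(flow_revision, blobs_after[flow_revision["rev-abc"]])
```

The sketch also shows where the approach can go wrong: if the handler is not registered or fails, the orphaned pointer is left dangling exactly as before.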

daniel moved this task from Inbox to Watching on the TechCom board.Aug 28 2019, 8:27 PM

In T106363#5445137, @Catrope wrote:

An alternative approach would be to add a hook to the recompression script notifying Flow of changes in orphaned blob URLs and allowing it to update them itself, but that could be more work than performing the separate store migration that we already have code for.

Also it would mean having a different setup on beta and production, unless we undo the beta migration somehow.
And hooks seem like a fragile mechanism for something that would cause content loss on failure.

In T106363#5445137, @Catrope wrote:

It's already been done in beta labs, but hasn't been done in production yet, mostly because this doesn't seem to be a priority for anyone. (The Growth team hasn't proactively worked on it for a while, and SRE/CPT haven't asked us to.)

@jcrespo / @daniel, do you have any feedback on the priority of this task?

@jcrespo Priority or availability to work on it (they are not the same)? CC @Marostegui

I mean, this task only exists because T106386: Compress data at external storage exists. Is that something intended to happen soon? Or is it something that's a good idea in theory but no one really cares about it ATM? How urgent is it to fix Flow being a blocker?

Untagging TechCom, since this has been decoupled from the text table and content table.

daniel mentioned this in T213478: purgeRedundantText: Potential data loss.Sep 23 2019, 6:03 PM

Aklapper removed a project: Collaboration-Team-Triage (Collab-Team-This-Quarter).Oct 12 2019, 4:01 AM

Tgr mentioned this in T107610: Setup separate logical External Store for Flow in production.Feb 12 2020, 4:41 AM

MMiller_WMF closed subtask T107610: Setup separate logical External Store for Flow in production as Declined.Feb 12 2020, 9:11 PM

Per T107610#5878347. Feel free to reopen if there's a clear need and timeframe for this.

Tgr closed subtask T138049: Dry run of Flow External Store migration in production as Declined.Feb 12 2020, 11:58 PM

TerraCodes unsubscribed.Feb 13 2020, 1:05 AM

	matthiasmullie
	Jul 21 2015, 3:30 AM
