Migrate Parsoid storage (including stashing) from the legacy storage (Cassandra 2.x) to the new storage strategy and cluster (Cassandra 3.x). The migration is planned in two steps: first all non-Wikipedia groups, then the Wikipedias. Between the two steps the clusters will be reshaped, with capacity decommissioned from the legacy cluster and bootstrapped into the new one.
Status | Assigned | Task
---|---|---
Resolved | mobrovac | T179416 Program 7 Outcome 2 Objective 1, Q2: Develop a scalable and cost-effective storage solution for backing the REST API
Resolved | mobrovac | T179417 Migrate Parsoid from legacy to new storage
Resolved | Eevans | T179422 Reshape RESTBase Cassandra clusters
Resolved | Eevans | T180568 Aberrant load on instances involved in recent bootstrap
Declined | fgiunchedi | T180562 Degraded RAID on restbase2004
Resolved | Pchelolo | T182770 meta property="dc:modified" may be absent
Event Timeline
Mentioned in SAL (#wikimedia-operations) [2017-11-02T11:21:05Z] <mobrovac@tin> Started deploy [restbase/deploy@f6c4e2d]: Parsoid module: use the Cassandra 2 tables as fallback when needed - T179417
Mentioned in SAL (#wikimedia-operations) [2017-11-02T11:29:21Z] <mobrovac@tin> Finished deploy [restbase/deploy@f6c4e2d]: Parsoid module: use the Cassandra 2 tables as fallback when needed - T179417 (duration: 08m 15s)
Change 388036 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/restbase/deploy@master] Config: Parsoid: Switch all but WPs to the next-gen storage
Change 388036 merged by Mobrovac:
[mediawiki/services/restbase/deploy@master] Config: Parsoid: Switch all but WPs to the next-gen storage
Mentioned in SAL (#wikimedia-operations) [2017-11-02T11:57:57Z] <mobrovac@tin> Started deploy [restbase/deploy@9314cf6]: Parsoid: Switch all but WPs to use the next-generation storage - T179417
Mentioned in SAL (#wikimedia-operations) [2017-11-02T12:06:39Z] <mobrovac@tin> Finished deploy [restbase/deploy@9314cf6]: Parsoid: Switch all but WPs to use the next-generation storage - T179417 (duration: 08m 43s)
All but WPs (and the global domain, technically) have been switched to use the new storage schema with Cassandra 3. We need to keep the old contents around for the next 24h, though, before we can get rid of the data in Cassandra 2. The data tables in the following keyspaces can be truncated (not dropped!) after that period elapses:
"local_group_default_T_parsoid_dataLpBGD5XFAMFsTr" "local_group_default_T_parsoid_htmliZ1mueNVmW9MlJ" "local_group_default_T_parsoid_section_okYE_jMlE6" "local_group_default_T_parsoid_stash_datU81yvllO3" "local_group_default_T_parsoid_stash_datWH8IDUS9S" "local_group_default_T_parsoid_stash_html" "local_group_default_T_parsoid_stash_htmmXxc_uDhg" "local_group_default_T_parsoid_stash_sec2ACMDK1DR" "local_group_default_T_parsoid_stash_wikf0PBY8UXq" "local_group_default_T_parsoid_stash_wikitext" "local_group_default_T_parsoid_wikitext" "local_group_phase0_T_parsoid_dataLpBGD5XFAMFsTr8" "local_group_phase0_T_parsoid_htmliZ1mueNVmW9MlJq" "local_group_phase0_T_parsoid_section_ofkYE_jMlE6" "local_group_phase0_T_parsoid_stash_dataU81yvllO3" "local_group_phase0_T_parsoid_stash_dataWH8IDUS9S" "local_group_phase0_T_parsoid_stash_html" "local_group_phase0_T_parsoid_stash_htmlmXxc_uDhg" "local_group_phase0_T_parsoid_stash_sect2ACMDK1DR" "local_group_phase0_T_parsoid_stash_wikif0PBY8UXq" "local_group_phase0_T_parsoid_stash_wikitext" "local_group_phase0_T_parsoid_wikitext" "local_group_wikibooks_T_parsoid_dataLpBGD5XFAMFs" "local_group_wikibooks_T_parsoid_htmliZ1mueNVmW9M" "local_group_wikibooks_T_parsoid_section_kYE_jMlE" "local_group_wikibooks_T_parsoid_stash_daU81yvllO" "local_group_wikibooks_T_parsoid_stash_daWH8IDUS9" "local_group_wikibooks_T_parsoid_stash_html" "local_group_wikibooks_T_parsoid_stash_htmXxc_uDh" "local_group_wikibooks_T_parsoid_stash_se2ACMDK1D" "local_group_wikibooks_T_parsoid_stash_wif0PBY8UX" "local_group_wikibooks_T_parsoid_stash_wikitext" "local_group_wikibooks_T_parsoid_wikitext" "local_group_wikimedia_T_parsoid_dataLpBGD5XFAMFs" "local_group_wikimedia_T_parsoid_htmliZ1mueNVmW9M" "local_group_wikimedia_T_parsoid_section_kYE_jMlE" "local_group_wikimedia_T_parsoid_stash_daU81yvllO" "local_group_wikimedia_T_parsoid_stash_daWH8IDUS9" "local_group_wikimedia_T_parsoid_stash_html" "local_group_wikimedia_T_parsoid_stash_htmXxc_uDh" 
"local_group_wikimedia_T_parsoid_stash_se2ACMDK1D" "local_group_wikimedia_T_parsoid_stash_wif0PBY8UX" "local_group_wikimedia_T_parsoid_stash_wikitext" "local_group_wikimedia_T_parsoid_wikitext" "local_group_wikinews_T_parsoid_dataLpBGD5XFAMFsT" "local_group_wikinews_T_parsoid_htmliZ1mueNVmW9Ml" "local_group_wikinews_T_parsoid_section_kYE_jMlE6" "local_group_wikinews_T_parsoid_stash_daU81yvllO3" "local_group_wikinews_T_parsoid_stash_daWH8IDUS9S" "local_group_wikinews_T_parsoid_stash_html" "local_group_wikinews_T_parsoid_stash_htmXxc_uDhg" "local_group_wikinews_T_parsoid_stash_se2ACMDK1DR" "local_group_wikinews_T_parsoid_stash_wif0PBY8UXq" "local_group_wikinews_T_parsoid_stash_wikitext" "local_group_wikinews_T_parsoid_wikitext" "local_group_wikiquote_T_parsoid_dataLpBGD5XFAMFs" "local_group_wikiquote_T_parsoid_htmliZ1mueNVmW9M" "local_group_wikiquote_T_parsoid_section_kYE_jMlE" "local_group_wikiquote_T_parsoid_stash_daU81yvllO" "local_group_wikiquote_T_parsoid_stash_daWH8IDUS9" "local_group_wikiquote_T_parsoid_stash_html" "local_group_wikiquote_T_parsoid_stash_htmXxc_uDh" "local_group_wikiquote_T_parsoid_stash_se2ACMDK1D" "local_group_wikiquote_T_parsoid_stash_wif0PBY8UX" "local_group_wikiquote_T_parsoid_stash_wikitext" "local_group_wikiquote_T_parsoid_wikitext" "local_group_wikisource_T_parsoid_dataLpBGD5XFAMF" "local_group_wikisource_T_parsoid_htmliZ1mueNVmW9" "local_group_wikisource_T_parsoid_sectionkYE_jMlE" "local_group_wikisource_T_parsoid_stash_dU81yvllO" "local_group_wikisource_T_parsoid_stash_dWH8IDUS9" "local_group_wikisource_T_parsoid_stash_hmXxc_uDh" "local_group_wikisource_T_parsoid_stash_html" "local_group_wikisource_T_parsoid_stash_s2ACMDK1D" "local_group_wikisource_T_parsoid_stash_wf0PBY8UX" "local_group_wikisource_T_parsoid_stash_wikitext" "local_group_wikisource_T_parsoid_wikitext" "local_group_wikiversity_T_parsoid_dataLpBGD5XFAM" "local_group_wikiversity_T_parsoid_htmliZ1mueNVmW" "local_group_wikiversity_T_parsoid_sectiokYE_jMlE" 
"local_group_wikiversity_T_parsoid_stash_2ACMDK1D" "local_group_wikiversity_T_parsoid_stash_f0PBY8UX" "local_group_wikiversity_T_parsoid_stash_html" "local_group_wikiversity_T_parsoid_stash_mXxc_uDh" "local_group_wikiversity_T_parsoid_stash_U81yvllO" "local_group_wikiversity_T_parsoid_stash_WH8IDUS9" "local_group_wikiversity_T_parsoid_stash_wikitext" "local_group_wikiversity_T_parsoid_wikitext" "local_group_wikivoyage_T_parsoid_dataLpBGD5XFAMF" "local_group_wikivoyage_T_parsoid_htmliZ1mueNVmW9" "local_group_wikivoyage_T_parsoid_sectionkYE_jMlE" "local_group_wikivoyage_T_parsoid_stash_dU81yvllO" "local_group_wikivoyage_T_parsoid_stash_dWH8IDUS9" "local_group_wikivoyage_T_parsoid_stash_hmXxc_uDh" "local_group_wikivoyage_T_parsoid_stash_html" "local_group_wikivoyage_T_parsoid_stash_s2ACMDK1D" "local_group_wikivoyage_T_parsoid_stash_wf0PBY8UX" "local_group_wikivoyage_T_parsoid_stash_wikitext" "local_group_wikivoyage_T_parsoid_wikitext" "local_group_wiktionary_T_parsoid_dataLpBGD5XFAMF" "local_group_wiktionary_T_parsoid_htmliZ1mueNVmW9" "local_group_wiktionary_T_parsoid_sectionkYE_jMlE" "local_group_wiktionary_T_parsoid_stash_dU81yvllO" "local_group_wiktionary_T_parsoid_stash_dWH8IDUS9" "local_group_wiktionary_T_parsoid_stash_hmXxc_uDh" "local_group_wiktionary_T_parsoid_stash_html" "local_group_wiktionary_T_parsoid_stash_s2ACMDK1D" "local_group_wiktionary_T_parsoid_stash_wf0PBY8UX" "local_group_wiktionary_T_parsoid_stash_wikitext" "local_group_wiktionary_T_parsoid_wikitext"
I've sampled a couple of hosts (one in eqiad and one in codfw), and see no write activity on the data tables of these keyspaces; I propose the following to truncate them:
Once complete, snapshots will need to be cleared to see the space actually freed.
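As a rough illustration of the truncation step, here is a minimal sketch that generates one CQL `TRUNCATE` statement per keyspace. It is not the command that was proposed in this task: the helper name `truncate_statements` is hypothetical, and the table name `data` is an assumption taken from the "data tables" wording above. The output would still have to be fed to a CQL client by hand.

```python
def truncate_statements(keyspaces, table="data"):
    """Build one CQL TRUNCATE statement per keyspace.

    Identifiers are double-quoted because the keyspace names are
    mixed-case, and unquoted CQL identifiers are case-insensitive.
    The table name "data" is an assumption, not confirmed by the task.
    """
    return [f'TRUNCATE "{ks}"."{table}";' for ks in keyspaces]


# Two keyspaces from the list above, used as sample input.
keyspaces = [
    "local_group_wikinews_T_parsoid_wikitext",
    "local_group_wikinews_T_parsoid_stash_html",
]

for statement in truncate_statements(keyspaces):
    print(statement)
```

When `auto_snapshot` is enabled, Cassandra snapshots a table before truncating it, which is why disk space is only reclaimed after the snapshots are cleared (for example with `nodetool clearsnapshot`).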
Mentioned in SAL (#wikimedia-operations) [2017-11-07T11:43:26Z] <mobrovac> restbase truncating cassandra 2 non-WP tables for T179417
Change 389699 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/restbase/deploy@master] Config: Add wikidata.org to the next-gen Parsoid storage
Change 389699 merged by Mobrovac:
[mediawiki/services/restbase/deploy@master] Config: Add wikidata.org to the next-gen Parsoid storage
Mentioned in SAL (#wikimedia-operations) [2017-11-07T11:52:02Z] <mobrovac@tin> Started deploy [restbase/deploy@eab2948]: Use the new storage for wikidata.org - T179417
Mentioned in SAL (#wikimedia-operations) [2017-11-07T12:00:16Z] <mobrovac@tin> Finished deploy [restbase/deploy@eab2948]: Use the new storage for wikidata.org - T179417 (duration: 08m 14s)
All but the default group tables have been truncated. The default group still had activity due to the wikidata.org domain, which had not been switched. We have now switched that one too, so we should be able to truncate that group tomorrow as well.
Mentioned in SAL (#wikimedia-operations) [2017-11-08T17:43:53Z] <mobrovac> restbase truncate the default parsoid storage group's tables for T179417
Mentioned in SAL (#wikimedia-operations) [2017-11-08T17:51:48Z] <urandom> Clearing snapshots in RESTBase legacy Cassandra cluster (T179417)
We are now ready to switch the WP Parsoid tables to Cassandra 3. I propose to do a coordinated deploy of Parsoid (bumping the HTML content version to 1.6.0) and RESTBase (moving to Cassandra 3) on Tuesday, 2017-12-12 during the regular services deploy window. Specifically, we need to:
- prepare the bump of the Parsoid and MCS versions in RESTBase
- prepare the storage switch to Cassandra 3 in RESTBase
- deploy Parsoid
- deploy MCS
- deploy RESTBase
- start HTML dumps for at least the biggest WPs to minimise client impact of the deployment
Furthermore, for RESTBase we need to ensure a 24h transition period during which content is read from both the new and the old storage, to avoid edit failures.
Just in case others are confused like I was: RB is assuming a very generous 24-hour visual editor session, i.e. a VE edit session that is started before the deploy and stays active up to 24 hours after it. They want to ensure that a save at any point in that 24+ hour period does the right thing by fetching data-parsoid from cassandra-2 for the HTML that was fetched from cassandra-2 before the deploy.
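The transition-period behaviour described above can be sketched as a read that tries the new storage first and falls back to the legacy one. This is only an illustration under stated assumptions: `DictStore`, `NotFound` and `read_with_fallback` are hypothetical names, the stores are in-memory stand-ins, and the actual RESTBase Parsoid module may be structured quite differently.

```python
class NotFound(Exception):
    """Raised when a key is missing from a store."""


class DictStore:
    """In-memory stand-in for a storage backend (hypothetical)."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})

    def get(self, key):
        try:
            return self.rows[key]
        except KeyError:
            raise NotFound(key)

    def put(self, key, value):
        self.rows[key] = value


def read_with_fallback(key, new_store, legacy_store, fallback_enabled=True):
    """Read from the new store; on a miss, fall back to the legacy store.

    With fallback_enabled=False (after the transition period) a miss in
    the new store is reported directly.
    """
    try:
        return new_store.get(key)
    except NotFound:
        if not fallback_enabled:
            raise
        return legacy_store.get(key)


# An edit session started before the deploy: its data-parsoid only
# exists in the legacy store, so the save still finds it.
key = ("en.wikipedia.org", "Foo", 12345)
legacy = DictStore({key: "data-parsoid stored before the deploy"})
new = DictStore()
print(read_with_fallback(key, new, legacy))
```

Once the 24h window has passed, the fallback can be switched off and the legacy tables no longer need to be consulted.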
Change 396473 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/restbase/deploy@master] [Config] Enable new storage for Parsoid for all domains
Change 397633 had a related patch set uploaded (by BearND; owner: BearND):
[mediawiki/services/mobileapps@master] Bump mobile-sections and definitions version
Change 397633 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Bump mobile-sections and definitions version
Change 396473 merged by Mobrovac:
[mediawiki/services/restbase/deploy@master] [Config] Enable new storage for Parsoid for all domains
Mentioned in SAL (#wikimedia-operations) [2017-12-12T21:23:51Z] <mobrovac@tin> Started deploy [restbase/deploy@dceab2e]: Switch to Parsoid content v1.6.0 and switch to Cassandra 3 storage - T179417
Mentioned in SAL (#wikimedia-operations) [2017-12-12T21:28:07Z] <mobrovac@tin> Finished deploy [restbase/deploy@dceab2e]: Switch to Parsoid content v1.6.0 and switch to Cassandra 3 storage - T179417 (duration: 04m 16s)
The storage back-end has been switched. I started dumps of the en, de, fr, it and es WPs in a screen session on praseodymium, targeting restbase.svc.codfw.wmnet. Once they are done, we need to merge and deploy PR #920, at which point this task can be considered done.
I had to stop them because they were hammering Cassandra 3 too much. We'll start them slowly and consecutively at a later point, probably tomorrow.
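Running the dumps "slowly and consecutively" could look roughly like the sketch below, which runs one job at a time and pauses between jobs. The helper `run_consecutively` and the placeholder job function are hypothetical; the real dump tooling and its options are not shown in this task.

```python
import time


def run_consecutively(jobs, run_job, pause_seconds=0.0, sleep=time.sleep):
    """Run jobs one at a time, pausing between consecutive jobs.

    `sleep` is injectable so the pause can be observed in tests.
    """
    results = []
    for index, job in enumerate(jobs):
        if index > 0:
            sleep(pause_seconds)  # give the backend time to recover
        results.append(run_job(job))
    return results


# Placeholder job: a real run would invoke the dump tool for each wiki.
wikis = ["en", "de", "fr", "it", "es"]
finished = run_consecutively(
    wikis,
    run_job=lambda wiki: f"dumped {wiki}",
    pause_seconds=0,
)
print(finished)
```

Only one dump is ever in flight, so the load on the new cluster is bounded by a single job rather than by the number of wikis being dumped.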
Change 398101 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/restbase/deploy@master] [Config] Remove new_storage_enabled_parsoid config stanza
Change 398101 merged by Mobrovac:
[mediawiki/services/restbase/deploy@master] [Config] Remove new_storage_enabled_parsoid config stanza
Mentioned in SAL (#wikimedia-operations) [2017-12-13T19:30:57Z] <ppchelko@tin> Started deploy [restbase/deploy@3f4bedc]: Remove references to Cassandra 2 from Parsoid storage T179417
Mentioned in SAL (#wikimedia-operations) [2017-12-13T19:35:40Z] <ppchelko@tin> Finished deploy [restbase/deploy@3f4bedc]: Remove references to Cassandra 2 from Parsoid storage T179417 (duration: 04m 43s)
The dumps for en, de, fr, es, it and he have been completed. Calling this one done \o/