
Experiment with and design options for multi-instance or multi-section wikireplicas frontend architecture
Open, High, Public

Description

In the interest of trying to keep or make the wikireplicas a performant and sustainable service, new backend architectures are needed. This task is to get to work designing and working on frontend and WMCS-supported scripting architectures to allow more backend flexibility with minimal loss of ease-of-use for end users of cloud services.

Since this task encompasses general access and orchestration, special access needs, such as PAWS and Quarry, will need their own tasks, and there is probably room for a lot of other subtasks as well.

Related Objects

Event Timeline

Bstorm triaged this task as High priority. Aug 13 2020, 7:55 PM
Bstorm created this task.
Bstorm moved this task from Backlog to Wiki replicas on the Data-Services board.
Bstorm moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

I started investigating ways of routing in haproxy based on hostname. Since most of the really juicy ACL matches in haproxy are built around HTTP features, and MySQL connections are in TCP mode, it may not be possible. To do host-based routing in TCP mode, you need to match actual portions of the payload. That doesn't seem possible for all communications, though I may dig a little deeper into the auth negotiation to see if there is some way.

If we are trying to do name-based routing of databases, it seems likely that it would require multiple IPs assigned to the haproxy frontends in order to correctly route requests without requiring clients to map ports.
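As a rough illustration of the multiple-IP idea, a pair of haproxy `listen` blocks could each bind a distinct service IP on the standard MySQL port and forward to the right backend instance. This is only a sketch; the IPs, server names, and backend ports here are made up for illustration:

```
# Hypothetical sketch: one listen block per service IP, all TCP mode.
# IPs, hostnames, and ports below are illustrative, not the real ones.
listen s1-web
    bind 10.64.37.30:3306
    mode tcp
    balance roundrobin
    option mysql-check user haproxy
    server clouddb-a 10.64.0.11:3311 check

listen s1-analytics
    bind 10.64.37.31:3306
    mode tcp
    balance roundrobin
    option mysql-check user haproxy
    server clouddb-b 10.64.0.12:3311 check
```

Clients would keep connecting to port 3306 on whatever hostname they already use; only the DNS records would need to point each sN.{web,analytics} name at its own frontend IP.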

bd808 added a subscriber: bd808. Aug 13 2020, 8:30 PM

If we are trying to do name-based routing of databases, it seems likely that it would require multiple IPs assigned to the haproxy frontends in order to correctly route requests without requiring clients to map ports.

This seems possible to me. Today there would be 16 service names to route (8 slices x 2 access patterns) if done this way. We already actually create distinct DNS records for s[1-8].{analytics,web}.db.svc.eqiad.wmflabs. Today all the "analytics" records point to one load balancer IP and the "web" records point to another. These records are managed using the wmcs-wikireplica-dns script. These are all internal IPs too (10.x.x.x), so there shouldn't be any major concerns about IP exhaustion.

$ host s1.analytics.db.svc.eqiad.wmflabs
s1.analytics.db.svc.eqiad.wmflabs has address 10.64.37.27
$ host 10.64.37.27
27.37.64.10.in-addr.arpa domain name pointer dbproxy1018.eqiad.wmnet.

$ host s1.web.db.svc.eqiad.wmflabs
s1.web.db.svc.eqiad.wmflabs has address 10.64.37.28
$ host 10.64.37.28
28.37.64.10.in-addr.arpa domain name pointer dbproxy1019.eqiad.wmnet.
bd808 added a comment. Aug 13 2020, 8:37 PM

I started investigating ways of routing in haproxy based on hostname. Since most of the really juicy ACL matches in haproxy are built around HTTP features, and MySQL connections are in TCP mode, it may not be possible. To do host-based routing in TCP mode, you need to match actual portions of the payload. That doesn't seem possible for all communications, though I may dig a little deeper into the auth negotiation to see if there is some way.

It seems like we would need a real layer 7 proxy for the MySQL protocol to do "name based vhosts". In theory ProxySQL could do that (https://proxysql.com/documentation/how-to-setup-proxysql-sharding/), but I'm not sure if anyone would be excited about introducing ProxySQL for this feature. I have never actually used it myself, but I keep ending up looking at docs for it because of the mysql-proxy questions we have for PAWS.
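For reference, ProxySQL's schema-based routing is configured through its SQL-like admin interface: rows in `mysql_query_rules` can match on the session's default schema and send traffic to a hostgroup. The hostgroup numbers, hostnames, and schema names below are assumptions for illustration only:

```sql
-- Hypothetical ProxySQL admin-interface sketch: route by default schema.
-- Hostgroups 10 and 20 are assumed to map to s1 and s2 backends.
INSERT INTO mysql_servers (hostgroup_id, hostname, port)
VALUES (10, 'clouddb1001.example', 3311),
       (20, 'clouddb1002.example', 3312);

INSERT INTO mysql_query_rules (rule_id, active, schemaname, destination_hostgroup, apply)
VALUES (1, 1, 'enwiki_p', 10, 1),
       (2, 1, 'wikidatawiki_p', 20, 1);

LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL QUERY RULES TO RUNTIME;
```

That would be a true layer 7 answer to the "name based vhosts" problem, at the cost of running and maintaining another stateful proxy tier.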

Bstorm added a comment. Edited Aug 13 2020, 9:12 PM

It seems like we would need a real layer 7 proxy for the MySQL protocol to do "name based vhosts". In theory ProxySQL could do that (https://proxysql.com/documentation/how-to-setup-proxysql-sharding/), but I'm not sure if anyone would be excited about introducing ProxySQL for this feature. I have never actually used it myself, but I keep ending up looking at docs for it because of the mysql-proxy questions we have for PAWS.

Yeah, it's possible to act on some pretty funky bits of matching in a TCP payload, but that seems somewhat daft with regard to mariadb. ProxySQL might be just the thing for PAWS, on the other hand, to handle both this complication and T253134: Find an alternative solution for the mysql-proxy in PAWS. Even so, that'd be a different scale and design than this kind of thing, which also makes me favor the IP-based routing notion above using listen blocks.

bd808 added a comment. Aug 13 2020, 9:25 PM

This seems possible to me. Today there would be 16 service names to route (8 slices x 2 access patterns) if done this way. We already actually create distinct DNS records for s[1-8].{analytics,web}.db.svc.eqiad.wmflabs. Today all the "analytics" records point to one load balancer IP and the "web" records point to another. These records are managed using the wmcs-wikireplica-dns script. These are all internal IPs too (10.x.x.x), so there shouldn't be any major concerns about IP exhaustion.

I just realized this might also give us a simple way to run both systems in parallel for a testing period. We could publish DNS records as s[1-8].{analytics,web}.db.svc.eqiad1.wikimedia.cloud pointing to the 16 new service ips as soon as the backing cluster existed. The current eqiad.wmflabs records could then stay pointed at the legacy service ips and continue to route to the legacy cluster. Then on the date of the hard switch to the new cluster we could update the eqiad.wmflabs DNS records to use the new service ips as well.

On that cutoff date, I think we should also consider retiring the legacy *.labsdb DNS records that we have been keeping around since the current (soon to be legacy) wiki replicas cluster was deployed.

I just realized this might also give us a simple way to run both systems in parallel for a testing period.

Oooh! I like that. That might be a cool idea. Adding @Marostegui for comment on that kind of thing as well.

Yeah, we should have both systems running in parallel for some time.
My idea was to have that and slowly turn off the old ones. As there are many pieces here, I am sure we'll keep finding stuff only once we've shut down the old system, so it would be good to have it handy in case we need to start it up again until we are fully confident that the new one is working fine.

Regarding the proxying, it might indeed require quite a lot of work. I think we can also experiment with just round-robin DNS first (although of course that doesn't give us persistent connections); it can be a good indication of whether the new and old systems are fully transparent, as in: the user doesn't notice which set of hosts they are connecting to. Once that happens, that is a good indication that the old system can start to be powered off.

Change 621067 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: create cumin aliases for wikireplica servers

https://gerrit.wikimedia.org/r/621067

Change 621088 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/cookbooks@master] wikireplicas: add wikireplica cookbook to add a wiki

https://gerrit.wikimedia.org/r/621088

Change 621067 merged by Bstorm:
[operations/puppet@production] wikireplicas: create cumin aliases for wikireplica servers

https://gerrit.wikimedia.org/r/621067

Change 621343 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] cumin: for new wmcs. prefix for cookbooks, grant access to wmcs-admins

https://gerrit.wikimedia.org/r/621343

Change 621088 merged by jenkins-bot:
[operations/cookbooks@master] wikireplicas: add wikireplica cookbook to add a wiki

https://gerrit.wikimedia.org/r/621088

Change 621574 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/cookbooks@master] wikireplicas: fix typo in the dns script for wikireplicas

https://gerrit.wikimedia.org/r/621574

Change 621574 merged by jenkins-bot:
[operations/cookbooks@master] wikireplicas: fix typo in the dns script for wikireplicas

https://gerrit.wikimedia.org/r/621574

Bstorm added a comment. Sep 2 2020, 5:43 PM

I spent a bit of time testing out the idea of switching PAWS to using ProxySQL instead of mysql-proxy. I can see big value in putting ProxySQL in front of the wikireplicas overall, but it honestly doesn't do the exact things we need for PAWS except query routing. I'm going to need to hack on mysql-proxy, I think, to add the appropriate routing.

Bstorm added a comment. Sep 2 2020, 7:31 PM

Ok, this doesn't look that hard except that it may require maintaining a git checkout of the mediawiki config that is up to date on the mysql-proxy pod. The main thing that needs to change is that the database needs to be matched to the server in DNS, which should be a CNAME for the shard DNS, which I hope will go to an IP that has a listen block that routes to the right hosts and ports on haproxy.

Just have to figure out where to squeeze that in there...

bd808 added a comment. Sep 2 2020, 7:48 PM

Ok, this doesn't look that hard except that it may require maintaining a git checkout of the mediawiki config that is up to date on the mysql-proxy pod.

Depending on what you need from the mw-config clone, you may be able to read files from https://noc.wikimedia.org/ instead. The wmcs-wikireplica-dns script loads dblists from there instead of a local clone of the config. One advantage of this is that noc is updated by scap, so there should never be configuration drift where the upstream HEAD has changed but the local clone has not pulled yet. (Ok, "never" is absolutist. How about "extremely rare" instead?) My versions tool uses this technique as well. You may of course need to introduce some caching of the config which would bring drift potential back, but at least you would not need to manage a git clone with a sidecar container.
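A minimal sketch of that noc-based approach, assuming the dblists are published under a URL layout like `https://noc.wikimedia.org/conf/dblists/<section>.dblist` (an assumption based on how wmcs-wikireplica-dns is described above; verify the exact path before relying on it):

```python
"""Read a section's wiki list from noc.wikimedia.org instead of a local
mediawiki-config clone. URL layout is an assumption, not a confirmed API."""
from urllib.request import urlopen


def parse_dblist(text):
    """Return wiki db names from dblist text, skipping comments and blanks."""
    wikis = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            wikis.append(line)
    return wikis


def fetch_dblist(section, base="https://noc.wikimedia.org/conf/dblists"):
    """Fetch and parse the dblist for a section name such as 's1'."""
    with urlopen("{}/{}.dblist".format(base, section)) as resp:
        return parse_dblist(resp.read().decode("utf-8"))
```

Any caching added on top of this would reintroduce some drift potential, as noted above, but the fetch itself stays trivially simple compared to managing a clone.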

The main thing that needs to change is that the database needs to be matched to the server in DNS, which should be a CNAME for the shard DNS, which I hope will go to an IP that has a listen block that routes to the right hosts and ports on haproxy.

Just have to figure out where to squeeze that in there...

I think @elukey and his team did something based on DNS for the dbstore1XXX hosts, which are running multi-instance as well. I am not fully sure of the details on how that was finally implemented, but maybe it can give you some ideas

bd808 added a comment. Sep 3 2020, 7:33 PM

The main thing that needs to change is that the database needs to be matched to the server in DNS, which should be a CNAME for the shard DNS, which I hope will go to an IP that has a listen block that routes to the right hosts and ports on haproxy.

Just have to figure out where to squeeze that in there...

I think @elukey and his team did something based on DNS for the dbstore1XXX hosts, which are running multi-instance as well. I am not fully sure of the details on how that was finally implemented, but maybe it can give you some ideas

The system used in the Analytics cluster is documented at https://wikitech.wikimedia.org/wiki/Analytics/Systems/MariaDB#Database_setup. It works pretty well in my experience but it would be a huge burden for Toolforge users to have to add port number lookups to all code connecting to the Wiki Replicas. Even if we make nice helper functions available in multiple programming languages I would expect a very large number of tools to break because they are not actively maintained.

The service name ideas above are an attempt to find a setup that moves the port mapping burden onto us in the infrastructure layer rather than pushing that work down to all of the users.
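To make the burden concrete: under an Analytics-style multi-instance layout, every client would need a helper along these lines just to find the right port. The `3310 + N` convention here is an assumption drawn from the wikitech page linked above, not something every deployment guarantees:

```python
"""Sketch of the per-client port-mapping burden under multi-instance
MariaDB. Assumes the Analytics-style convention of section sN on port
3310 + N (e.g. s1 -> 3311), which should be verified per deployment."""


def section_port(section):
    """Return the MariaDB port for a section name like 's1'..'s8'."""
    if not (section.startswith("s") and section[1:].isdigit()):
        raise ValueError("unexpected section name: %r" % section)
    return 3310 + int(section[1:])
```

Pushing this mapping into DNS and the load balancer instead means unmaintained tools that hardcode port 3306 keep working untouched.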

Bstorm added a comment. Sep 9 2020, 8:10 PM

So looking into Quarry a bit, I found that the setup is substantially the same as it is for PAWS. Connections to the replicas are simply done with a variable:

REPLICA_HOST: 'enwiki.analytics.db.svc.eqiad.wmflabs'

The nice thing here is that there's no deprecated proxy that needs some odd Lua inserted in the client connection for ALL clients. It's just Python. That said, the UI assumes you are already connected and then behaves sort of like a mysql shell (only not).

It'll need a "database to connect to" UI element that keys the app to connect to the right replica instance. The beautiful thing here is that we may not need to worry about nonsense like I did here https://phabricator.wikimedia.org/rPAWS26e5ac6bfd1df925314822cfdf0ce9362449321c because we can just prepend the database name to the URL instead.
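The "prepend the database name" idea amounts to deriving the replica host from whatever wiki the user picks. A small sketch, using the same domain suffix as Quarry's current `REPLICA_HOST` setting (the `_p` suffix handling is an assumption about how user-facing schema names map to service names):

```python
"""Derive a wiki-replica service hostname from a wiki database name.
Assumes names like 'enwiki' or 'enwiki_p' map to '<wiki>.<access>.db.svc.
eqiad.wmflabs', matching Quarry's current REPLICA_HOST pattern."""


def replica_host(database, access="analytics"):
    """Build the replica service hostname for a given wiki database."""
    wiki = database[:-2] if database.endswith("_p") else database
    return "{}.{}.db.svc.eqiad.wmflabs".format(wiki, access)
```

With per-wiki CNAMEs in place, the app then connects to that host on the standard port and the routing layer does the rest.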

Bstorm added a comment. Sep 9 2020, 8:20 PM

I can confirm that the sql command in the misctools package should work if we are proxying based on section names.

Change 627379 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: Proposal for a proxy setup on multi-instance replicas

https://gerrit.wikimedia.org/r/627379

Change 621343 merged by Jbond:
[operations/puppet@production] cumin: for new wmcs. prefix for cookbooks, grant access to wmcs-admins

https://gerrit.wikimedia.org/r/621343

Now it actually works and might even be safe enough to open a PR with, since the DNS aliases already exist.
https://github.com/toolforge/paws/commit/b3a348a337af0f507bbcb3f87882671faf3c75d4

Doing the PAWS work on this ticket feels like it is burying things a bit. I'm going to open a separate ticket for the notes on my Quarry work.