
Provide cross-dc redundancy (active-active or active-passive) to all important misc services
Open, Medium, Public

Description

There are some non-core, non-MediaWiki services that may or may not be wanted in codfw, and may or may not be ready to switch over to it. These are not part of the goal, but it would be nice to know, for each one, which of the following applies:

  1. This service is ready to switch over
  2. This service is not ready, but switching it over is desired
  3. This service is neither ready nor intended to be switched over

| Service | State | Comment |
| Cloud-related services | Not ready or intended to switchover | |
| Analytics | Not ready or intended to switchover | |
| Dumps | Not ready or intended to switchover | |
| Planet | Ready | https://gerrit.wikimedia.org/r/347892 |
| fluorine / mwlog1001 | Ready | T123728: replace fluorine with mwlog servers (was: Upgrade fluorine to trusty/jessie) |
| Gerrit | Ready | T148186: Build warm slave for Gerrit in Dallas |
| Phabricator | Not ready, but intended | In progress: T137928: Deploy phabricator to phab2001.codfw.wmnet / T164810: Switch phabricator production to codfw |
| NOC / dbtree | Not ready, but intended | T163141: dbtree: make wasat a working backend and become active-active |
| tendril | Ready* | dbmonitor2001 is ready but passive. T149557: Site: 2 VM request for tendril (switch tendril from einsteinium to dbmonitor*). * Because replication doesn't work well with events, it requires app changes. Service can be easily failed over but past monitoring data would be lost |
| releases | Ready | T171917: setup releases2001.codfw.wmnet https://gerrit.wikimedia.org/r/#/c/368527/ |

Related Objects

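| Status | Assigned | Task |
| Resolved | mmodell | |
| Resolved | Paladox | |
| Resolved | Dzahn | |
| Open | Dzahn | |
| Resolved | Dzahn | |
| Resolved | mmodell | |
| Resolved | Joe | |
| Open | None | |
| Open | None | |
| Resolved | Dzahn | |
| Resolved | fgiunchedi | |
| Resolved | RobH | |
| Resolved | fgiunchedi | |
| Resolved | RobH | |
| Resolved | ArielGlenn | |
| Resolved | RobH | |
| Resolved | demon | |
| Resolved | Papaul | |
| Resolved | faidon | |
| Declined | fgiunchedi | |
| Resolved | RobH | |
| Resolved | Dzahn | |
| Resolved | akosiaris | |
| Resolved | akosiaris | |
| Declined | akosiaris | |
| Resolved | fgiunchedi | |
| Resolved | hashar | |
| Resolved | RobH | |
| Resolved | fgiunchedi | |
| Resolved | Cmjohnson | |
| Resolved | Cmjohnson | |
| Resolved | hashar | |
| Resolved | RobH | |
| Resolved | RobH | |
| Resolved | Papaul | |
| Resolved | Dzahn | |
| Stalled | None | |
| Resolved | jcrespo | |
| Resolved | Dzahn | |
| Resolved | Dzahn | |
| Stalled | None | |
| Stalled | None | |
| Resolved | mmodell | |
| Resolved | RobH | |
| Resolved | MoritzMuehlenhoff | |
| Resolved | Dzahn | |
| Invalid | None | |
| Declined | Dzahn | |
| Resolved | mmodell | |
| Declined | None | |
| Resolved | mmodell | |
| Open | mmodell | |
| Open | mmodell | |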

Event Timeline

jcrespo created this task. Feb 1 2017, 5:57 PM
Restricted Application added a subscriber: Aklapper. Feb 1 2017, 5:57 PM

I created this ticket because, as a DBA, I have to support some of those services below the app layer, so I need to know their state in Dallas. It is not restricted to m[1-5]-hosted services, though. I included the release-engineering team because many non-core services are developer-supporting tools.

Paladox added a subscriber: Paladox. Feb 1 2017, 6:18 PM
Dzahn added a subscriber: Dzahn. Mar 22 2017, 11:15 PM

Change 347892 had a related patch set uploaded (by Dzahn):
[operations/puppet@production] planet/varnish-misc: switch planet to active-active

https://gerrit.wikimedia.org/r/347892

Change 347892 merged by Dzahn:
[operations/puppet@production] planet/varnish-misc: switch planet to active-active

https://gerrit.wikimedia.org/r/347892

jcrespo renamed this task from "Understand the preparedness of misc services for datacenter switchover" to "Provide cross-dc redundancy (active-active or active-passive) to all important misc services". May 4 2017, 1:46 PM
jcrespo lowered the priority of this task from High to Medium.

Removing tag due to change in scope of the ticket.

jcrespo updated the task description. Jul 25 2017, 6:02 PM
Krinkle updated the task description. Jul 25 2017, 6:09 PM
Krinkle updated the task description. Jul 25 2017, 6:12 PM
Krinkle updated the task description.
jcrespo updated the task description. Jul 25 2017, 6:15 PM
jcrespo updated the task description.

Change 368527 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] cache::misc: releases: add codfw backend, make active-active

https://gerrit.wikimedia.org/r/368527

Change 368527 merged by Dzahn:
[operations/puppet@production] cache::misc: releases: add codfw backend, make active-active

https://gerrit.wikimedia.org/r/368527

Dzahn updated the task description. Jul 28 2017, 11:44 PM
Dzahn updated the task description.
mmodell changed the status of subtask T137928: Deploy phabricator to phab2001.codfw.wmnet from Open to Stalled. Jul 31 2017, 6:46 PM
mmodell updated the task description. Aug 4 2017, 7:31 AM
hashar changed the status of subtask T150771: Secondary production Jenkins for CI from Open to Stalled. Oct 12 2017, 8:46 AM
mmodell changed the status of subtask T137928: Deploy phabricator to phab2001.codfw.wmnet from Stalled to Open. Mar 23 2018, 8:00 PM
jcrespo updated the task description. Jun 8 2018, 12:51 PM
jcrespo updated the task description.
Marostegui added a comment. Edited Mar 18 2019, 1:54 PM

This could probably move forward once T218570 gets resolved.

I think those are different things. T218570: DB planning: include a writeable (?) misc DB cluster in codfw for WMCS, from my understanding, is a _new_ database misc cluster (writable) just for OpenStack.

Dzahn changed the status of subtask T164810: Switch phabricator production to codfw from Open to Stalled. Sep 13 2019, 7:11 PM
Dzahn changed the status of subtask T137928: Deploy phabricator to phab2001.codfw.wmnet from Stalled to Open. Sep 13 2019, 7:22 PM