No mw canary servers in codfw
Closed, Resolved · Public

Description

We currently don't have any servers with the mediawiki::appserver::canary_api and mediawiki::appserver::canary roles in codfw. I'm pretty certain we had those in the past, but maybe they got dropped during a hardware refresh?

Given that there's a DC switchover coming, we should fix that.


canary appservers codfw:

mwdebug2001 (row A, ganeti VM)
mwdebug2002 (row B, ganeti VM)
mw2163 (C3, physical)
mw2164 (C3, physical)
mw2271 (D3, physical)
mw2272 (D3, physical)

canary API appservers codfw:

mw2215 (A3, physical)
mw2216 (A3, physical)
mw2244 (A4, physical)
mw2245 (A4, physical)

Event Timeline

Restricted Application added a subscriber: Aklapper. Jan 13 2020, 1:55 PM
Dzahn added a subscriber: Dzahn. Jan 13 2020, 7:34 PM

Change 564175 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] define 2 API appservers per row in codfw as canary API appservers

https://gerrit.wikimedia.org/r/564175

MoritzMuehlenhoff triaged this task as Medium priority. Jan 14 2020, 10:27 AM
jijiki added a subscriber: jijiki. Jan 16 2020, 12:47 PM

Yeah, we need at least a total of 4 API and 4 app canary servers in codfw. In eqiad, our canary app (5) and API (4) servers are actually in the same rack; we can spread them out a bit when we install the new servers.

Agreed, I think for our uses of the canaries, rack redundancy is not a must, but would still be nice to have when re-adding canaries to codfw.

Change 564175 merged by Dzahn:
[operations/puppet@production] define 2 API appservers per row in codfw as canary API appservers

https://gerrit.wikimedia.org/r/564175

Dzahn added a comment. Wed, Feb 5, 7:06 PM

The following are now declared canary API appservers in site.pp:

mw2215, mw2216 (rack A3)

mw2244, mw2245 (rack A4)

Change 570405 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: define 2 codfw appservers as canary_appservers

https://gerrit.wikimedia.org/r/570405

Dzahn updated the task description. Wed, Feb 5, 7:27 PM

Change 570405 merged by Dzahn:
[operations/puppet@production] site: define 2 codfw appservers as canary_appservers

https://gerrit.wikimedia.org/r/570405

Mentioned in SAL (#wikimedia-operations) [2020-02-06T22:13:40Z] <mutante> turning mw2271 and mw2163 into canary appservers for codfw, this adds mediawiki-testers shell users and removes scap sql scripts, rest stays as is (T242606)

Dzahn updated the task description. Thu, Feb 6, 10:20 PM
Dzahn added a comment. Thu, Feb 6, 10:22 PM

mw2163 and mw2271 have now been turned into canary appservers. Unlike for the canary API appservers, this involves actual Puppet changes, which are:

  • mediawiki-testers shell access group gets added
  • scap sql scripts get removed
  • nginx, keepalive-requests value changes from 100 to 1000

Together with existing mwdebug2001 and mwdebug2002 this makes it 4 as well.

Is this resolved, or would you really like them reimaged as mwdebug2003 and mwdebug2004?

Dzahn added a comment. Thu, Feb 6, 10:23 PM

@jijiki What do you think? Is this good now? 4 of each type and in different rows/racks.

Given we have 5 canary appservers in eqiad + 2 debug servers, I would recommend we add another 2 in codfw

@jijiki Don't we have mwdebug2001 and mwdebug2002 in codfw too?

jijiki added a comment. Edited Fri, Feb 7, 10:42 AM

@Urbanecm they do not get user traffic, so they are good enough for testing, but not good enough for canary deploys. When we switch to codfw, we will need them.

Is that different from what eqiad debug servers do? I'm trying to understand why you said "Given we have 5 canary appservers in eqiad + 2 debug servers" (emphasis mine).

@Urbanecm yes, so that is a total of 7 canary app servers in eqiad, of which 5 get real user traffic. Since we will be switching to codfw, it makes sense to have a similar setup in codfw.

Change 571366 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: define 2 more canary appservers in codfw

https://gerrit.wikimedia.org/r/571366

Change 571366 merged by Dzahn:
[operations/puppet@production] site: define 2 more canary appservers in codfw

https://gerrit.wikimedia.org/r/571366

Dzahn updated the task description. Mon, Feb 10, 10:37 PM
Dzahn updated the task description.

@jijiki @Urbanecm

I added 2 more canary appservers. Now we have:

mwdebug2001 (row A, ganeti VM)
mwdebug2002 (row B, ganeti VM)
mw2163 (C3, physical)
mw2164 (C3, physical)
mw2271 (D3, physical)
mw2272 (D3, physical)

canary API appservers codfw:

mw2215 (A3, physical)
mw2216 (A3, physical)
mw2244 (A4, physical)
mw2245 (A4, physical)
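
For anyone who wants to re-check these numbers against the thread's requirements (at least 4 canaries of each type, some spread across rows), here is a minimal Python sketch. It is not part of the puppet change: the `Canary` class, the `summarize` helper, and the "app"/"api" labels are hypothetical names made up for illustration. It assumes that a rack such as "C3" sits in row "C", and it treats only the physical hosts as traffic-serving, since the mwdebug hosts were described above as not getting user traffic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Canary:
    host: str
    role: str      # "app" or "api" (labels chosen for this sketch)
    location: str  # row or rack exactly as listed in the task
    kind: str      # "vm" (ganeti mwdebug host) or "physical"

    @property
    def row(self) -> str:
        # Assumption: rack "C3" is in row "C"; "row A" already names the row.
        return self.location.replace("row ", "")[0]


# Inventory copied from the list above.
CODFW_CANARIES = [
    Canary("mwdebug2001", "app", "row A", "vm"),
    Canary("mwdebug2002", "app", "row B", "vm"),
    Canary("mw2163", "app", "C3", "physical"),
    Canary("mw2164", "app", "C3", "physical"),
    Canary("mw2271", "app", "D3", "physical"),
    Canary("mw2272", "app", "D3", "physical"),
    Canary("mw2215", "api", "A3", "physical"),
    Canary("mw2216", "api", "A3", "physical"),
    Canary("mw2244", "api", "A4", "physical"),
    Canary("mw2245", "api", "A4", "physical"),
]


def summarize(canaries):
    """Return per-role totals, traffic-serving totals, and rows in use."""
    total = Counter(c.role for c in canaries)
    # Assumption: only physical hosts receive user traffic (mwdebug VMs do not).
    serving = Counter(c.role for c in canaries if c.kind == "physical")
    rows = {}
    for c in canaries:
        rows.setdefault(c.role, set()).add(c.row)
    return total, serving, rows


total, serving, rows = summarize(CODFW_CANARIES)
for role in ("app", "api"):
    print(role, total[role], serving[role], sorted(rows[role]))
```

Under those assumptions, the sketch reports 6 canary appservers (4 of them physical) spread over rows A to D, and 4 canary API appservers in two racks of row A.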

jijiki closed this task as Resolved. Thu, Feb 13, 7:16 PM
jijiki claimed this task.

thank you daniel!