
integrate (pybal|varnish)->varnish backend config/state with etcd or similar
Closed, Resolved (Public)

Description

We need something like an etcd cluster (or a similar alternative) controlling the active cache lists at the various layers (pybal/LVS -> varnish, varnish -> varnish). Ideally the puppet nodelists would populate the basic nodelists in etcd (currently in hieradata/common/cache/*.yaml), and the data structure would allow nodes to be depooled at runtime via etcd updates, independently of that. We'd then wrap some tooling around it to easily depool a given cache node globally, in both the frontend and backend senses. From there it would become much easier to script daemon/host restarts with automatic depooling, and nodes could also self-(de|re)pool around clean reboots on their own via initscripts -> etcd.
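For illustration, the runtime depool described above would reduce to a single key write in etcd. A minimal Python sketch of the key layout and value shape; the `/conftool/v1/pools` prefix and the hostname are hypothetical placeholders, not a final schema:

```python
import json

# Hypothetical etcd key layout (illustrative only, not the final schema):
#   /conftool/v1/pools/<dc>/<cluster>/<service>/<host> -> {"pooled": ..., "weight": N}
ETCD_BASE = "/conftool/v1/pools"

def pool_key(dc, cluster, service, host):
    """Build the etcd key under which one node's pool state would live."""
    return "/".join([ETCD_BASE, dc, cluster, service, host])

def depool_value(weight):
    """JSON value marking a node depooled while preserving its weight,
    so a later repool restores the same hardware-based weighting."""
    return json.dumps({"pooled": "no", "weight": weight})

# Depooling a (hypothetical) text-cluster varnish backend in eqiad is then one
# etcd write of depool_value() at pool_key(); a repool flips "pooled" back.
key = pool_key("eqiad", "cache_text", "varnish-be", "cp1052.eqiad.wmnet")
```

An initscript hook for clean reboots would call the same two helpers: depool on shutdown, repool once the services are confirmed up.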

In code terms, the pybal integration could be direct, while varnish would probably need data-update-triggered regeneration of a VCL fragment plus a reload-vcl run.

Related Objects

Event Timeline

BBlack raised the priority of this task from to Low.
BBlack updated the task description.
BBlack added projects: acl*sre-team, Traffic.
BBlack added subscribers: BBlack, MoritzMuehlenhoff, Joe.

So, given we chose to go ahead with etcd, we will use confd to write a single VCL fragment containing the backend info, and our traditional scripts to reload it.
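A minimal confd sketch of that setup. The file paths, etcd keys, and check/reload commands here are assumptions for illustration; the real values would come from our puppetization:

```toml
# /etc/confd/conf.d/varnish-backends.toml  (illustrative sketch)
[template]
src        = "varnish-backends.tmpl"             # Go template rendering one backend per pooled node
dest       = "/etc/varnish/directors.backend.vcl"
keys       = ["/conftool/v1/pools/eqiad/cache_text/varnish-be"]  # hypothetical key prefix
check_cmd  = "varnishd -C -f /etc/varnish/wikimedia.vcl"         # hypothetical syntax check
reload_cmd = "/usr/share/varnish/reload-vcl"                     # the traditional reload script
```

confd watches the `keys` prefix, re-renders `dest` on any change, and only runs `reload_cmd` if `check_cmd` passes, which gives us the data-update-triggered VCL regeneration described earlier.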

An alternative approach for pybal would be to do the same thing as for varnish: generate config files with confd and have pybal pick them up from the local FS via file://, possibly at a shorter polling interval like 10s, or immediately via inotify.
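Pybal's native server-list format is one Python-literal dict per line, so a confd-generated pool file for file:// consumption could look like the sketch below (hostnames and path are illustrative):

```python
# /etc/pybal/pools/text  -- sketch of a confd-rendered pybal pool file;
# one entry per line, with 'enabled' driven by the etcd "pooled" flag
{'host': 'cp1052.eqiad.wmnet', 'weight': 10, 'enabled': True}
{'host': 'cp1053.eqiad.wmnet', 'weight': 10, 'enabled': False}
```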

For the Varnish switch: I am verifying that all hosts are represented correctly in the generated lists.

So far I've verified the text cluster, and the lists are 1:1 with what we get from puppet.

I've applied all the custom hardware-based weighting that matters at all levels for nginx/varnish-* pools.

I'm auditing that data too, but at the confctl level rather than the output-file level. It all looked correct for nodelists, pooled=yes, and weights, with only a few exceptions:

Findings that actually need cleaning:

  • dc=esams,cluster=cache_text,service=varnish-be
    • bad entry (which I created when testing a tool): {"cp3011.esams.wmnex": {"pooled": "no", "weight": 128}}
  • dc=codfw,cluster=cache_text,service=varnish-be
    • bad entry (same, but accidental): {"co2001.codfw.wmnet": {"pooled": "no", "weight": 100}}

Totally invalid sub-trees (the source data has since been corrected, but the useless/pointless keys still exist in the data); the ones I know of are:

  • cluster=cache_bits,service=varnish-be (all dcs)
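Cleaning those up should be a couple of delete operations at the confctl level. An illustrative sketch only; the exact confctl flags are recalled from memory and may differ in the version we're running, so treat the syntax as an assumption:

```shell
# Remove the two bad entries (illustrative confctl invocations):
confctl --tags dc=esams,cluster=cache_text,service=varnish-be \
    --action delete cp3011.esams.wmnex
confctl --tags dc=codfw,cluster=cache_text,service=varnish-be \
    --action delete co2001.codfw.wmnet
# ...plus deleting the stale cluster=cache_bits,service=varnish-be keys in each dc.
```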

Change 223029 had a related patch set uploaded (by Giuseppe Lavagetto):
varnish: enable dynamic directors for a subset of ulsfo hosts

https://gerrit.wikimedia.org/r/223029

Change 223030 had a related patch set uploaded (by Giuseppe Lavagetto):
varnish: enable dynamic directors in ulsfo

https://gerrit.wikimedia.org/r/223030

Change 223029 merged by Giuseppe Lavagetto:
varnish: enable dynamic directors for a subset of ulsfo hosts

https://gerrit.wikimedia.org/r/223029

Change 223030 merged by Giuseppe Lavagetto:
varnish: enable dynamic directors in ulsfo

https://gerrit.wikimedia.org/r/223030

Change 223312 had a related patch set uploaded (by BBlack):
varnish: enable dynamic directors in esams

https://gerrit.wikimedia.org/r/223312

Change 223312 merged by BBlack:
varnish: enable dynamic directors in esams

https://gerrit.wikimedia.org/r/223312

Change 224649 had a related patch set uploaded (by BBlack):
varnish: default dynamic_directors true (changes eqiad)

https://gerrit.wikimedia.org/r/224649

Change 224649 merged by BBlack:
varnish: default dynamic_directors true (changes eqiad)

https://gerrit.wikimedia.org/r/224649