
[EPIC] The future of MediaWiki deployment: Tooling
Open, Normal, Public

Description

We need something that works for services, MediaWiki, Phabricator, etc.

Synthesize features from scap and Trebuchet to create One Deployment Tool to Rule Them All.

  1. Fan-out deployment to efficiently distribute the bits to hundreds of servers. scap deploys via proxy deployment servers which are physically close to the group of target nodes that they serve. We are considering a git-based deployment that would be somewhat analogous to the way scap does it:
    • We could set up a few dedicated deployment hosts which periodically git-fetch the relevant git repository, so that they are always primed with most of the changes.
    • The amount of data to fetch with each run should therefore be small.
    • When we deploy a tag, target nodes fetch from the nearest deployment host and check out the requisite tag.
    • We need to avoid syncing multiple full copies of the source tree this way. By fixing MediaWiki release branching we can drastically reduce the amount of data synced at each deployment: once branching is done sanely, git will only need to transfer the deltas instead of the entire tree.
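The fetch-and-checkout flow sketched above can be tried end to end with throwaway local repositories standing in for the deployment hosts and target nodes; all paths, tag names, and identities below are illustrative, not real infrastructure:

```shell
set -e
WORK=$(mktemp -d)

# "Deployment host": a primed copy of the repository with a release tag.
git -c init.defaultBranch=main init -q "$WORK/deploy-host"
cd "$WORK/deploy-host"
git -c user.email=deploy@example -c user.name=deploy \
    commit -q --allow-empty -m "release 1"
git tag v1.0

# "Target node": clones once; subsequent deploys only fetch new objects.
git clone -q "$WORK/deploy-host" "$WORK/target"

# A new release lands on the deployment host...
git -c user.email=deploy@example -c user.name=deploy \
    commit -q --allow-empty -m "release 2"
git tag v1.1

# ...and the target fetches just that tag (deltas only, since most of
# the tree is already present) and checks it out.
cd "$WORK/target"
git fetch -q origin "refs/tags/v1.1:refs/tags/v1.1"
git checkout -q v1.1
git describe --tags    # prints: v1.1
```

The key property is the last fetch: because the target already holds everything up to v1.0, only the objects new in v1.1 cross the wire.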

Related Objects

Status      Assigned
Open        None
Open        None
Stalled     None
Open        None
Resolved    demon
Declined    mmodell
Resolved    Legoktm
Resolved    GWicke
Open        None
Resolved    GWicke
Declined    GWicke
Resolved    thcipriani
Declined    None
Resolved    mobrovac
Resolved    akosiaris
Resolved    akosiaris
Declined    mmodell
Invalid     None
Resolved    mmodell
Resolved    Jdforrester-WMF
Declined    mmodell
Resolved    mmodell
Resolved    mmodell
Resolved    mmodell
Open        None
Resolved    Krinkle
Resolved    Krinkle
Resolved    mmodell
Duplicate   Krinkle
Resolved    Krinkle
Resolved    Krinkle
Resolved    MaxSem
Resolved    Krinkle
Resolved    Krinkle
Resolved    Krinkle
Resolved    Krinkle
Resolved    Krinkle
Resolved    mmodell
Open        None
Resolved    Joe
Resolved    None
Resolved    Joe

Event Timeline

mmodell created this task.Mar 31 2015, 8:43 PM
mmodell updated the task description. (Show Details)
mmodell raised the priority of this task from to Needs Triage.
mmodell added a project: Deployments.
mmodell added subscribers: mmodell, greg.
Restricted Application added a subscriber: Aklapper.Mar 31 2015, 8:43 PM

What is this task about? (Some conference presentation? Or just Epic with a corresponding epic title? :P )

greg triaged this task as Normal priority.Apr 2 2015, 5:08 PM
greg set Security to None.
greg moved this task from To Triage to Backlog (Tech) on the Deployments board.
mmodell renamed this task from The future of MediaWiki deployment: Tooling to EPIC: The future of MediaWiki deployment: Tooling.Apr 3 2015, 10:48 PM
mmodell updated the task description. (Show Details)
mmodell updated the task description. (Show Details)Apr 10 2015, 7:00 AM

@dduvall, @thcipriani, @demon: This is a fairly helpful high-level overview/comparison of salt and ansible, maybe worth a read: http://jensrantil.github.io/salt-vs-ansible.html

In order to depool a server (without triggering false alarms in the alerting system), we would like to have a deployment flag (a lock file or something) that causes the pybal check to fail, but in a way that lets pybal know this is a temporary and expected downtime. Some custom logic in pybal would then temporarily depool the server but continue checking for the status to return to normal (and bypass alerting).

So, in our current configuration, pybal checks the individual servers via ssh, which uses a "forced command" on the target server to run this:

uptime; touch /var/tmp/pybal-check.stamp

It also checks http, but only on the varnish proxy, not on the individual apache nodes.

Changing the forced command to something like this would probably do the trick:

[ -f "/tmp/deploy.lock" ] && stat --format=deployment:%Y /tmp/deploy.lock && exit 123 || uptime;

This will exit with status 123 when the lock file exists, and output the last-modified time of the lock file as "deployment:timestamp", so that pybal knows (a) that the host is down for deployment and (b) when the deployment lock was last updated. We could then add some intelligence to pybal to alert if a host is stuck in deployment; the deployment process could periodically touch the lock file to indicate that it is in fact progressing and not hung somewhere.
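The behavior of that forced command can be demonstrated locally; this sketch parameterizes the lock path (a temp file instead of /tmp/deploy.lock) and assumes GNU stat for `--format`. The pybal-side handling of exit 123 is the proposal above, not existing code:

```shell
# Throwaway lock path standing in for /tmp/deploy.lock.
LOCK="$(mktemp -u)"

check() {
    # The proposed forced command: while the lock exists, print
    # "deployment:<mtime>" and return 123; otherwise fall back to the
    # current uptime health check.
    [ -f "$LOCK" ] && stat --format=deployment:%Y "$LOCK" && return 123 || uptime
}

check >/dev/null; echo "no lock -> exit $?"   # exit 0: healthy
touch "$LOCK"
check; echo "locked  -> exit $?"              # prints deployment:<epoch>, exit 123
rm -f "$LOCK"
```

From pybal's side, exit 0 means pooled as usual, exit 123 means an expected deployment depool, and a stale `deployment:` timestamp would be the signal that the deploy is hung.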

Sound good?

So pybal already has a built-in (and configurable) threshold limiting how many servers can be depooled at any given time. This addresses @chasemp's concern about silently depooling most of the cluster if something goes wrong with a deploy.

It looks like all we need is the change above to implement the depool-during-deploy: create a simple lock file at the start and remove it at the end of the deployment process.

We should have some sort of self-checks to be sure that everything is kosher before removing the lock.
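Putting the pieces from the last few comments together, the deploy-side lock lifecycle might look roughly like this; the lock path, heartbeat interval, and self-check are all illustrative stand-ins, not the real deploy tooling:

```shell
# Throwaway lock path standing in for /tmp/deploy.lock.
LOCK="$(mktemp -u)"

touch "$LOCK"                  # pybal check now fails -> node depooled

# Heartbeat: refresh the lock's mtime while the deploy runs, so pybal
# can distinguish a progressing deploy from a hung one.
( while [ -f "$LOCK" ]; do touch "$LOCK"; sleep 1; done ) &
HEARTBEAT=$!

deploy_ok=true                 # stand-in for the actual deploy steps

# Stop the heartbeat before touching the lock again, to avoid racing it.
kill "$HEARTBEAT" 2>/dev/null
wait "$HEARTBEAT" 2>/dev/null

# Self-check before repooling; a real check might hit a local health URL.
if "$deploy_ok"; then
    rm -f "$LOCK"              # lock gone -> pybal repools the node
fi
[ -f "$LOCK" ] && echo "still depooled" || echo "repooled"   # prints: repooled
```

If the self-check fails, the lock stays in place and the node stays depooled, which is exactly the safe failure mode: pybal's stale-timestamp alert then brings a human in.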

greg added a comment.May 23 2015, 8:52 AM

I just put this as a session for tomorrow at 2pm Lyon time per https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2015/Program

mmodell closed subtask Restricted Task as Resolved.Jun 2 2015, 4:36 PM
GWicke added a comment.EditedJun 4 2015, 8:18 AM

@mmodell, re monitoring / depooling: You could consider directly checking etcd for this. See T100793.

greg reopened subtask Restricted Task as Open.Jun 4 2015, 3:50 PM
mmodell closed subtask Restricted Task as Resolved.Jun 23 2015, 3:46 PM
mmodell updated the task description. (Show Details)Jul 1 2015, 10:16 PM
greg moved this task from Backlog to Epics on the Release-Engineering-Team board.Mar 11 2016, 10:09 PM
Luke081515 renamed this task from EPIC: The future of MediaWiki deployment: Tooling to [EPIC] The future of MediaWiki deployment: Tooling.Mar 22 2016, 6:26 PM