Scap3: updates, upgrades, and challenges
Closed, Resolved (Public)
Assigned To

Authored By

	thcipriani
	Sep 28 2015, 10:18 PM

Description

Scap3 has come quite a long way over the past quarter: https://phabricator.wikimedia.org/tag/scap3/

The first deploy via Scap3 happened on beta cluster last Wednesday (outcomes outlined here: https://www.mediawiki.org/wiki/Deployment_tooling/Cabal/RESTBase_Beta_deploy). The cabal et al are moving forward with more deployments and by the time the dev summit rolls around there should be a lot to discuss. We'd love to spread the good word about the work that RelEng has been doing over several quarters with interested deployers and opsen that have concerns or want to help push the project forward.

The outline below tries to expand on some of the points for discussion in the task description. Basically, the session at the Dev Summit has a few audiences:

Opsen who can help move MediaWiki and other projects to a more automated deployment.
Repo deployers whose repositories haven't yet (by that point) moved to Scap3.

Dev summit discussion:

Explain how to move a repository from using Trebuchet to Scap3
Find some other "next-step" repos—there should be a first handful of repos on Scap by the time of the dev summit
How Scap3 would help prevent MediaWiki outages (see: T116593#1755029)
What has to happen to get Scap3 deploying MediaWiki?

Hopeful Dev-summit Outcomes

Clear path forward to deploying MediaWiki with Scap3
Clear path forward to reducing the number of deployment tools—how does Trebuchet go away?
Better understanding, for interested deployers and project maintainers, of how to port a repo from Trebuchet to Scap3

Related Objects

Mentioned In: T119593: Define the list of "must have" sessions for WikiDev '16
T119032: WikiDev 16 working area: Software engineering
Mentioned Here: T116593: Exception caught inside exception handler

Event Timeline

thcipriani created this task. Sep 28 2015, 10:18 PM

thcipriani raised the priority of this task from to Needs Triage.

thcipriani updated the task description. (Show Details)

thcipriani added projects: Wikimedia-Developer-Summit-2016, Release-Engineering-Team.

thcipriani added subscribers: thcipriani, dduvall, greg and 2 others.

Restricted Application added a subscriber: Aklapper. Sep 28 2015, 10:18 PM

greg moved this task from INBOX to Next (ARCHIVED) on the Release-Engineering-Team board. Sep 28 2015, 11:35 PM

What does the migration process to Scap3 look like?

That one can probably be presented soonish since we are apparently going to use scap3 to deploy RESTBase on beta cluster soonish.

Automated beta-cluster deployment via Jenkins

I am pretty sure we can do that in October/November. It depends on how well the scap3/RESTBase deployment goes.

JanZerebecki subscribed. Sep 30 2015, 4:22 PM

Congratulations! This is one of the 52 proposals that made it through the first deadline of the Wikimedia-Developer-Summit-2016 selection process. Please pay attention to the next one:

> By 6 Nov 2015, all Summit proposals must have active discussions and a Summit plan documented in the description. Proposals not reaching this critical mass can continue at their own path out of the Summit.

Qgil moved this task from Backlog to Missing expected fields on the Wikimedia-Developer-Summit-2016 board. Oct 12 2015, 9:14 PM

Hi @thcipriani, this proposal focuses on a Summit session, but it does not indicate which topics could be discussed here beforehand, so it is currently missing active discussion. Note that pre-scheduled Summit sessions are expected to be preceded by online discussions and a plan to reach conclusions and next steps. It would be good to sort out these problems before the next deadline on November 6.

hashar unsubscribed. Oct 28 2015, 11:57 AM

Krenair subscribed. Oct 28 2015, 12:30 PM

fgiunchedi subscribed. Nov 2 2015, 6:05 PM

Scap3 was developed chiefly to replace (and improve upon) Trebuchet as a deployment tool for Services but with a general enough architecture to serve MediaWiki deployments. We've had invaluable insight on the latter from @mmodell throughout planning and implementation but, nonetheless, we feel that a focused conversation with other experienced MW engineers/opsen (e.g. @bd808, @Krenair, @csteipp, @Catrope, @GWicke) around @thcipriani's aforementioned topics would benefit all stakeholders (bingo, anyone?) of the Scap toolchain.

If current (and new) subscribers could signal their willingness/interest (or disinterest) to engage in such a conversation at the Dev Summit, we would greatly appreciate it. And we look forward to the conversation, whenever/wherever it may occur!

I am interested in this.

• demon awarded a token. Nov 2 2015, 7:40 PM

bd808 awarded a token. Nov 2 2015, 8:23 PM

• Babygirl.md7565 subscribed. Nov 4 2015, 4:27 AM

Restricted Application added a subscriber: StudiesWorld. Nov 4 2015, 4:27 AM

• Babygirl.md7565 triaged this task as High priority. Nov 4 2015, 4:28 AM

• Babygirl.md7565 set Security to Access Request.

• Babygirl.md7565 added subtasks: T117651: Align mediawiki.Uri more with the native URL constructor, T117652: Get rid of mw.units.js in UploadWizard, T117621: Diffusion support for viewing the raw file in the browser, T117580: {{FULLPAGENAME}} on Sidebar shows completely random link, T117547: In the upload dialog, pre-populate date information from the file's EXIF data. Nov 4 2015, 4:31 AM

greg lowered the priority of this task from High to Medium. Nov 4 2015, 4:35 AM

greg changed Security from Access Request to None.

greg removed subtasks: T117547: In the upload dialog, pre-populate date information from the file's EXIF data, T117580: {{FULLPAGENAME}} on Sidebar shows completely random link, T117621: Diffusion support for viewing the raw file in the browser, T117652: Get rid of mw.units.js in UploadWizard, T117651: Align mediawiki.Uri more with the native URL constructor.

Qgil updated the task description. Nov 4 2015, 11:16 AM

Qgil moved this task from Missing expected fields to Missing active discussion on the Wikimedia-Developer-Summit-2016 board.

Qgil moved this task from Missing active discussion to On track on the Wikimedia-Developer-Summit-2016 board. Nov 6 2015, 11:20 AM

• RobLa-WMF mentioned this in T119032: WikiDev 16 working area: Software engineering. Nov 19 2015, 1:02 AM

• RobLa-WMF moved this task from On track to Software engineering on the Wikimedia-Developer-Summit-2016 board. Nov 24 2015, 6:37 AM

Krenair mentioned this in T119593: Define the list of "must have" sessions for WikiDev '16. Dec 1 2015, 5:41 PM

Etherpad: https://etherpad.wikimedia.org/p/MWDS2016-scap3

Notes from session:

Prompt: Talk about work with deploying MW w/ scap or migrating existing repo to use scap3?

Migrating

Similar concepts to Trebuchet

config lives in code repo
git based deployment
some assumptions about deployment target
Migration from Trebuchet presents issues currently
- Trebuchet is Salt-based, flaky, different arch
- Existing Puppet provider exists for Trebuchet
  - Need to write one for scap3

Deployments with scap3

Atomic

checkouts go to new directory
updates a single cached git directory and clones locally from it
should be pointed out that scap3 provides both serial and parallel deployment strategies
- configurable for each stage (e.g. fetch in parallel, promote serially)
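
A minimal sketch of the "checkout goes to a new directory, then gets promoted" idea, in Python. The notes do not say how scap3 switches between checkouts, so the `current` symlink, the `revs/` layout, and the `promote` helper below are assumptions made for illustration, not scap3's actual code.

```python
import os
import tempfile


def promote(deploy_root, rev, files):
    """Write `files` into a fresh per-revision directory, then repoint
    the `current` symlink at it (hypothetical layout)."""
    rev_dir = os.path.join(deploy_root, "revs", rev)
    os.makedirs(rev_dir, exist_ok=True)
    for name, content in files.items():
        with open(os.path.join(rev_dir, name), "w") as fh:
            fh.write(content)

    current = os.path.join(deploy_root, "current")
    tmp_link = current + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Build the new link under a temporary name, then rename it over the
    # old one. On POSIX the rename is atomic, so readers of `current`
    # see either the old revision or the new one, never a mix.
    os.symlink(rev_dir, tmp_link)
    os.replace(tmp_link, current)
    return rev_dir


root = tempfile.mkdtemp()
first = promote(root, "abc123", {"app.txt": "v1"})
second = promote(root, "def456", {"app.txt": "v2"})
```

The previous revision directory is left in place, which is what would make a later rollback a matter of repointing the link.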

Checks

health checks are executed after each stage of deployment
- can be commands or ops-provided Nagios checks
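
A rough sketch of command-based checks run after each stage, in Python. The stage names, the shape of the `checks` dictionary, and the `deploy`/`run_checks` helpers are hypothetical; scap3's real check configuration is not described in these notes.

```python
import subprocess
import sys


def run_checks(checks, stage):
    """Run every check registered for `stage`; return the names that failed."""
    failed = []
    for name, check in checks.items():
        if check["stage"] != stage:
            continue
        result = subprocess.run(
            check["command"],
            capture_output=True,
            timeout=check.get("timeout", 30),
        )
        # A non-zero exit status counts as a failed check.
        if result.returncode != 0:
            failed.append(name)
    return failed


def deploy(stages, checks):
    """Walk the stages in order and stop at the first stage whose checks fail."""
    completed = []
    for stage in stages:
        # The real work of the stage would happen here.
        failed = run_checks(checks, stage)
        if failed:
            return completed, stage, failed
        completed.append(stage)
    return completed, None, []


# Stand-in commands: one always exits 0, the other always exits 1.
ok_cmd = [sys.executable, "-c", "raise SystemExit(0)"]
bad_cmd = [sys.executable, "-c", "raise SystemExit(1)"]
checks = {
    "repo_fetched": {"stage": "fetch", "command": ok_cmd},
    "service_up": {"stage": "promote", "command": bad_cmd},
}
completed, failed_stage, failed_checks = deploy(["fetch", "promote"], checks)
```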

Config deployment

- w/ Trebuchet, code was deployed but not config
- keep templates in your repo
- can execute config deployment independently
- Jinja2 used for templating, and can reference (sensitive) variables that are supplied by ops/puppet
question: have you considered using Puppet for config deploy/templating?
- considered but seemed too big of a dependency
- comment: sounds like we're building a config management system
  - sort of, a small piece is essentially config mgt
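
To illustrate the "templates in your repo, variables from ops" split, here is a small Python sketch. It uses the standard library's `string.Template` as a stand-in so the example runs without dependencies; the session notes say scap3 uses Jinja2, whose syntax differs. The template text, variable names, and `render_config` helper are all made up for illustration.

```python
from string import Template

# Template kept in the code repo (hypothetical content).
CONFIG_TEMPLATE = Template(
    "listen_port: $port\n"
    "db_host: $db_host\n"
    "db_password: $db_password\n"
)


def render_config(template, repo_vars, ops_vars):
    """Merge repo defaults with ops-supplied values and render the template.

    Ops-supplied values win on conflict. A variable that nobody supplies
    raises KeyError instead of rendering a broken config file.
    """
    merged = {**repo_vars, **ops_vars}
    return template.substitute(merged)


repo_vars = {"port": 7231, "db_host": "db.example.org"}
ops_vars = {"db_password": "s3cret"}  # would come from ops/puppet, not the repo
rendered = render_config(CONFIG_TEMPLATE, repo_vars, ops_vars)
```

Because rendering is separate from code checkout, a config-only deployment can re-run `render_config` without touching the deployed code.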

question: does it support fanout?

that's part of MW implementation

question: support for (de)pooling?

in the works, support for mocking in Beta Cluster

comments regarding pooling/depooling vs proxy/queues for requests until services are restarted

Canary deploys

you can define canary/deployment groups
- have tiered rollout of groups
- bail on the first failure
- rollback
- implemented as general deployment "groups"
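
The tiered rollout described above could look roughly like this in Python. The group names, host names, and the `rollout` helper are hypothetical, and rolling back every touched host in reverse order is one possible policy, not necessarily what scap3 does.

```python
def rollout(groups, deploy_host, rollback_host):
    """Deploy group by group; on the first failure, roll back every host
    touched so far (most recent first) and stop."""
    touched = []
    for group_name, hosts in groups:
        for host in hosts:
            touched.append(host)
            if not deploy_host(host):
                for done in reversed(touched):
                    rollback_host(done)
                return {"ok": False, "failed_group": group_name, "failed_host": host}
    return {"ok": True, "failed_group": None, "failed_host": None}


# Canary group first, then everything else.
groups = [
    ("canary", ["canary1"]),
    ("default", ["host1", "host2", "host3"]),
]
log = []


def fake_deploy(host):
    log.append(("deploy", host))
    return host != "host2"  # simulate a failure on host2


def fake_rollback(host):
    log.append(("rollback", host))


result = rollout(groups, fake_deploy, fake_rollback)
```

In this run `host3` is never touched, because the rollout bails out as soon as `host2` fails.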

RelEng is around to help teams migrate. Ask Greg G or anyone from RelEng. :)

One of the things Trebuchet had was a store for the current state of a deployment

How will scap3 provide this for newly provisioned nodes?
- Being worked on but we need to work out the kinks in the provider

Trebuchet failed because it couldn't deploy MW

it would get into a bad state
pull and bad state is going to be a problem with any system
current idea for scap3 fanout is going to be pull based

We've been thinking about different ideas for transport of repo

bittorrent for example
we've been pretty agnostic about our transport implementation
- if we decide our current approach is wrong, we should be able to implement something different
Biggest problem historically has been localization
Have you looked at ?-db (rocksDB?) to replace cdb?
We've gone the direction of git-annex
- that exists in Trebuchet using git-fat
We're looking into ways to make cdb unnecessary for l10n cache using straight PHP/HHVM – https://phabricator.wikimedia.org/T99740
- Authoritative mode, not a prereq

BT transport

only one implementation is supported (bittornado)
you don't know when you're done seeding
could run on the system persistently
we've only played around with it; it looks promising but is not the only way to go
doesn't include dot files by default. might have to patch
you might have to tar/untar compress/decompress

We haven't gotten into MW deploys too much

We mainly targeted replacing Trebuchet

offline --

Timo: Strategy needed for failed fetches/checkouts, pull vs. fetch

greg added a project: Deployments. Jan 5 2016, 8:11 PM

@thcipriani: do you want to claim and close this one? I think we can call this a success.

thcipriani closed this task as Resolved. Jan 12 2016, 12:38 AM
