dduvall (Dan Duvall)
Automation Engineer

User Since
Oct 7 2014, 4:24 PM (123 w, 6 d)
Availability
Available
IRC Nick
marxarelli
LDAP User
Dduvall
MediaWiki User
DDuvall (WMF)

Recent Activity

Thu, Feb 16

dduvall abandoned D418: Use USER environment variable for default ssh user.
Thu, Feb 16, 10:21 PM · Release-Engineering-Team

Tue, Feb 14

dduvall closed D558: Fix `failure_rate` percentages by committing rMSCA292d89e6da9a: Fix `failure_rate` percentages.
Tue, Feb 14, 12:41 AM · Release-Engineering-Team
dduvall committed rMSCA292d89e6da9a: Fix `failure_rate` percentages (authored by dduvall).
Fix `failure_rate` percentages
Tue, Feb 14, 12:41 AM
dduvall committed rMSCAcc58ae9f2891: Exclude empty deploy groups (authored by dduvall).
Exclude empty deploy groups
Tue, Feb 14, 12:41 AM
dduvall closed T157136: Error after "Finished deploy": <ValueError> xrange() arg 3 must not be zero as "Resolved" by committing rMSCAcc58ae9f2891: Exclude empty deploy groups.
Tue, Feb 14, 12:41 AM · Revision-Scoring-As-A-Service, ORES, Deployment-Systems, Release-Engineering-Team, scap2, Scap
dduvall closed D566: Exclude empty deploy groups by committing rMSCAcc58ae9f2891: Exclude empty deploy groups.
Tue, Feb 14, 12:41 AM · Release-Engineering-Team

Mon, Feb 13

dduvall added a comment to D561: Adds check to prevent xrange related crash. Fixes T157136.

@Halfak: The Deployment Cabal discussed this a bit more in our meeting this morning and decided that disallowing/excluding empty groups may indeed be a better long-term fix since empty input could lead to further unexpected behavior if left unvalidated.

Mon, Feb 13, 9:30 PM · Release-Engineering-Team
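
The idea of excluding empty groups before any stage runs can be illustrated with a minimal sketch. The function name and the mapping of group names to host lists are assumptions made for illustration; this is not the code from D566.

```python
from collections import OrderedDict


def exclude_empty_groups(groups):
    """Return a copy of `groups` without entries that have no targets.

    `groups` maps a group name to a list of target hostnames. Both this
    function and the data shape are hypothetical, not scap's actual API.
    """
    return OrderedDict(
        (name, targets) for name, targets in groups.items() if targets
    )


groups = OrderedDict([
    ("canary", ["host1"]),
    ("empty", []),
    ("default", ["host2", "host3"]),
])
non_empty = exclude_empty_groups(groups)
print(list(non_empty))
```

Filtering once, up front, means later code never has to special-case a group of size zero.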
dduvall created D566: Exclude empty deploy groups.
Mon, Feb 13, 9:09 PM · Release-Engineering-Team
dduvall added a revision to T157136: Error after "Finished deploy": <ValueError> xrange() arg 3 must not be zero: D566: Exclude empty deploy groups.
Mon, Feb 13, 9:09 PM · Revision-Scoring-As-A-Service, ORES, Deployment-Systems, Release-Engineering-Team, scap2, Scap

Thu, Feb 9

dduvall resigned from D561: Adds check to prevent xrange related crash. Fixes T157136.
Thu, Feb 9, 8:45 PM · Release-Engineering-Team
dduvall added a comment to D561: Adds check to prevent xrange related crash. Fixes T157136.
In D561#11106, @dduvall wrote:
In D561#11102, @demon wrote:
In D561#11085, @Halfak wrote:

Not sure if we'd prefer this solution or to use max(self.size, 1) and skip the conditional.

I think this will be better, because it means we'll be doing an extra useless stage that we've already done. I'd like @thcipriani's input here though.

Perhaps put that in the constructor. A zero group size may cause other issues down the line and it would be best to sanitize the input early.

Thu, Feb 9, 8:23 PM · Release-Engineering-Team
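
A rough sketch of sanitizing the size in the constructor, as suggested above. The `Group` class, its attributes, and `subgroups()` are hypothetical stand-ins and do not reproduce scap's real classes; Python 3's `range` is used in place of Python 2's `xrange`.

```python
class Group:
    """Hypothetical stand-in for a deploy group; not scap's real class."""

    def __init__(self, name, targets, size=None):
        self.name = name
        self.targets = list(targets)
        # Sanitize early: a missing or zero size falls back to the whole
        # group, and the result is never allowed to drop below 1.
        self.size = max(size or len(self.targets), 1)

    def subgroups(self):
        """Split targets into chunks of at most `self.size` hosts."""
        return [
            self.targets[i:i + self.size]
            for i in range(0, len(self.targets), self.size)
        ]


chunks = Group("default", ["a", "b", "c"], size=2).subgroups()
empty_chunks = Group("empty", []).subgroups()
print(chunks, empty_chunks)
```

Because the step passed to `range` is clamped once in `__init__`, no caller needs its own zero check.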
dduvall added a comment to D561: Adds check to prevent xrange related crash. Fixes T157136.
In D561#11102, @demon wrote:
In D561#11085, @Halfak wrote:

Not sure if we'd prefer this solution or to use max(self.size, 1) and skip the conditional.

I think this will be better, because it means we'll be doing an extra useless stage that we've already done. I'd like @thcipriani's input here though.

Thu, Feb 9, 8:04 PM · Release-Engineering-Team
dduvall added a task to D561: Adds check to prevent xrange related crash. Fixes T157136: T123: Turn on "diffusion.allow-http-auth".

Oh and one little nit about the commit message: Please use the "Refs T123" or "Fixes T123" auto-close syntax to reference tasks. See https://secure.phabricator.com/book/phabricator/article/diffusion_autoclose/

Thu, Feb 9, 8:02 PM · Release-Engineering-Team
dduvall added a revision to T123: Turn on "diffusion.allow-http-auth": D561: Adds check to prevent xrange related crash. Fixes T157136.
Thu, Feb 9, 8:02 PM · Gerrit-Migration, Wikimedia Phabricator RfC
dduvall requested changes to D561: Adds check to prevent xrange related crash. Fixes T157136.

To clarify: I think your changes are beneficial and should stay as well. :) Can you please add a unit test to tests/scap/test_targets.py to verify the edge case? Thanks!

Thu, Feb 9, 7:54 PM · Release-Engineering-Team
dduvall added a comment to D561: Adds check to prevent xrange related crash. Fixes T157136.

I think the correct fix here would be to ensure a non-zero group size in the first place. Any idea how that may have happened? I suspect a percentage failure_rate that was converted to an integer and floored to 0 perhaps...

Thu, Feb 9, 7:52 PM · Release-Engineering-Team
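
The suspected failure mode can be reproduced in isolation. The sketch below assumes a size derived from a percentage by integer truncation, which is only the guess made in the comment above and not confirmed scap behavior; `naive_group_size` is a made-up helper.

```python
def naive_group_size(total_targets, percentage):
    """Convert a percentage of targets to an integer size by truncation."""
    return int(total_targets * percentage / 100)


# 25% of 3 targets is 0.75, which truncates to 0.
size = naive_group_size(3, 25)

try:
    # A step of zero is rejected; Python 2's xrange() fails the same way.
    list(range(0, 3, size))
except ValueError as error:
    message = str(error)

print(size, message)
```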

Wed, Feb 8

dduvall created D558: Fix `failure_rate` percentages.
Wed, Feb 8, 9:06 PM · Release-Engineering-Team
dduvall committed rMSCAbf6443a7ba1b: Fix regression of deploy group continue prompt (authored by dduvall).
Fix regression of deploy group continue prompt
Wed, Feb 8, 8:41 PM
dduvall closed T156839: Saying yes (y) continues to all groups as "Resolved" by committing rMSCAbf6443a7ba1b: Fix regression of deploy group continue prompt.
Wed, Feb 8, 8:41 PM · Scap, Parsoid
dduvall closed D555: Fix regression of deploy group continue prompt by committing rMSCAbf6443a7ba1b: Fix regression of deploy group continue prompt.
Wed, Feb 8, 8:40 PM · Release-Engineering-Team
dduvall added a comment to D555: Fix regression of deploy group continue prompt.
  • I can't remember what we decided about failure percentage for groups where group_size is defined. In D490#10152 you said:
In D490#10152, @dduvall wrote:
  • Stage execution methods now consider the failure_limit for the entire original group, not individual subgroups

but that doesn't seem to be the case when you use a string like '50%' because of the line: https://github.com/wikimedia/scap/blob/master/scap/targets.py#L257

I may file a task for that 2nd one...

Wed, Feb 8, 8:40 PM · Release-Engineering-Team
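
To make the difference concrete, here is a small sketch of resolving a limit such as '50%' against the whole group versus against each subgroup. The `resolve_limit` helper is hypothetical and is not the code at the linked line of scap/targets.py.

```python
def resolve_limit(limit, group_size):
    """Resolve a limit given as an int or a string like '50%' into a count."""
    if isinstance(limit, str) and limit.endswith("%"):
        return int(float(limit[:-1]) / 100 * group_size)
    return int(limit)


targets = ["host%d" % i for i in range(1, 7)]            # 6 targets
subgroups = [targets[i:i + 3] for i in range(0, 6, 3)]   # 2 subgroups of 3

whole_group_limit = resolve_limit("50%", len(targets))
per_subgroup_limits = [resolve_limit("50%", len(s)) for s in subgroups]
print(whole_group_limit, per_subgroup_limits)
```

Resolved against the whole group, three failures are tolerated; resolved per subgroup, each subgroup tolerates only one, so the two interpretations give different totals.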

Tue, Feb 7

dduvall updated the diff for D555: Fix regression of deploy group continue prompt.

An additional follow-up to make the prompt conditional clearer

Tue, Feb 7, 7:29 PM · Release-Engineering-Team
dduvall updated the diff for D555: Fix regression of deploy group continue prompt.

Implemented suggested re-prompting following bad prompt input

Tue, Feb 7, 7:19 PM · Release-Engineering-Team
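
Re-prompting after bad input might look roughly like the loop below. `confirm_with_retry`, its `read` parameter, and the attempt limit are assumptions for illustration; this is not scap's `utils.confirm`.

```python
def confirm_with_retry(question, read=input, max_attempts=3):
    """Ask a yes/no question, asking again after unrecognized input.

    Returns True for yes, False for no, and False when every attempt
    produced input that could not be interpreted.
    """
    for _ in range(max_attempts):
        answer = read("%s [y/n]: " % question).strip().lower()
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False
    return False


# Scripted answers stand in for a user typing at the prompt.
answers = iter(["maybe", "", "y"])
result = confirm_with_retry("Continue?", read=lambda prompt: next(answers))
print(result)
```

Defaulting to "no" after repeated bad input is a conservative choice made here, so that a deploy never continues by accident.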
dduvall created D555: Fix regression of deploy group continue prompt.
Tue, Feb 7, 2:23 AM · Release-Engineering-Team
dduvall added a revision to T156839: Saying yes (y) continues to all groups: D555: Fix regression of deploy group continue prompt.
Tue, Feb 7, 2:23 AM · Scap, Parsoid

Mon, Feb 6

dduvall added a comment to T156839: Saying yes (y) continues to all groups.

Looks like I caused the regression in rMSCA98247477db5d: Improve failure handling and rollback behavior, and called it out in the commit message as a "behavior change". :)

Mon, Feb 6, 7:42 PM · Scap, Parsoid
dduvall claimed T156839: Saying yes (y) continues to all groups.
Mon, Feb 6, 6:15 PM · Scap, Parsoid

Wed, Jan 25

dduvall edited P4809 Masterwork From Distant Lands.
Wed, Jan 25, 11:24 PM

Jan 19 2017

dduvall archived P4770 Masterwork From Distant Lands.
Jan 19 2017, 11:38 PM
dduvall edited P4770 Masterwork From Distant Lands.
Jan 19 2017, 11:38 PM

Jan 10 2017

dduvall requested changes to D517: Unconditionally provision MW for testing.

Your changes look good but provision fails on account of checkoutMediaWiki not existing any longer:

Jan 10 2017, 6:43 PM · Release-Engineering-Team

Jan 4 2017

dduvall added a comment to T154612: Add DEPLOY_DIR env var to scap command checks.

Should we just make it the cwd during check execution?

Jan 4 2017, 9:05 PM · Scap

Dec 23 2016

dduvall committed rMSCA98247477db5d: Improve failure handling and rollback behavior (authored by dduvall).
Improve failure handling and rollback behavior
Dec 23 2016, 1:17 AM
dduvall closed T145512: Allow failures for a percentage of targets as "Resolved" by committing rMSCA98247477db5d: Improve failure handling and rollback behavior.
Dec 23 2016, 1:17 AM · User-mobrovac, Parsoid, Services, Scap
dduvall closed T145460: Rollback failed when target is down as "Resolved" by committing rMSCA98247477db5d: Improve failure handling and rollback behavior.
Dec 23 2016, 1:17 AM · Scap, Parsoid
dduvall closed D490: Improve failure handling and rollback behavior by committing rMSCA98247477db5d: Improve failure handling and rollback behavior.
Dec 23 2016, 1:17 AM · Release-Engineering-Team
dduvall closed T149008: Canary doesn't rollback if you don't continue as "Resolved" by committing rMSCA98247477db5d: Improve failure handling and rollback behavior.
Dec 23 2016, 1:17 AM · Scap, Parsoid
dduvall updated the diff for D490: Improve failure handling and rollback behavior.

Updating Diff summary to be consistent with most recent commit message. I'm probably doing this wrong! So weird.

Dec 23 2016, 1:14 AM · Release-Engineering-Team
dduvall added inline comments to D490: Improve failure handling and rollback behavior.
Dec 23 2016, 1:11 AM · Release-Engineering-Team
dduvall updated the diff for D490: Improve failure handling and rollback behavior.

Replace use of utils.ask with utils.confirm

Dec 23 2016, 1:08 AM · Release-Engineering-Team
dduvall committed rMSCAec8699367233: Rename test directory so nosetests will read it (authored by dduvall).
Rename test directory so nosetests will read it
Dec 23 2016, 12:56 AM
dduvall closed D523: Rename test directory so nosetests will read it by committing rMSCAec8699367233: Rename test directory so nosetests will read it.
Dec 23 2016, 12:56 AM · Release-Engineering-Team
dduvall updated the diff for D523: Rename test directory so nosetests will read it.

Edited Diff summary since it's not done automatically when you amend a commit message. :/

Dec 23 2016, 12:53 AM · Release-Engineering-Team

Dec 22 2016

dduvall committed rMSCA524ce660d613: Use `pip wheel` to manage CI pip cache (authored by dduvall).
Use `pip wheel` to manage CI pip cache
Dec 22 2016, 6:24 PM
dduvall closed D508: Use `pip wheel` to manage CI pip cache by committing rMSCA524ce660d613: Use `pip wheel` to manage CI pip cache.
Dec 22 2016, 6:23 PM · Release-Engineering-Team
dduvall updated the diff for D523: Rename test directory so nosetests will read it.

Implemented @hashar's recommended solution which is to simply add __init__.py files.

Dec 22 2016, 1:07 AM · Release-Engineering-Team
dduvall added a comment to D523: Rename test directory so nosetests will read it.
In D523#10364, @hashar wrote:

Thanks for the split! So the reason the tests under tests/scap are not found is that the paths are not Python packages (they lack an __init__.py file).

TLDR:

touch tests/__init__.py
touch tests/scap/__init__.py
Dec 22 2016, 12:58 AM · Release-Engineering-Team

Dec 21 2016

dduvall retitled D523 to Rename test directory so nosetests will read it.
Dec 21 2016, 11:56 PM · Release-Engineering-Team
dduvall updated the diff for D508: Use `pip wheel` to manage CI pip cache.

Moving commit for rename of test directory to a separate Diff.

Dec 21 2016, 11:28 PM · Release-Engineering-Team
dduvall added a comment to D508: Use `pip wheel` to manage CI pip cache.
In D508#10330, @hashar wrote:

The rename of tests/scap to tests/scap_test is unrelated to the wheel/cache management. I would rather see that in a different differential. I am also wondering why nosetests would not find the tests. The default discovery filter is (?:^|[\b_\./-])[Tt]est, which would match the test_*.py files.

Dec 21 2016, 11:21 PM · Release-Engineering-Team
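
The quoted filter can be checked directly with Python's `re` module. The file names below are illustrative and are not taken from the scap repository.

```python
import re

# The discovery pattern quoted in the comment above.
TEST_MATCH = re.compile(r"(?:^|[\b_\./-])[Tt]est")

names = ["test_targets.py", "targets_test.py", "targets.py", "__init__.py"]
matches = {name: bool(TEST_MATCH.search(name)) for name in names}
print(matches)
```

Files named test_*.py do match, which suggests the file names alone do not explain why the tests were being skipped.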

Dec 20 2016

dduvall updated the diff for D490: Improve failure handling and rollback behavior.

Yet another refactoring: simplified rollback invocation using the existing _execute_for_group method, and moved the finalize stage to execute after the primary stages for all groups, which ensures rollback can actually function as expected.

Dec 20 2016, 10:53 PM · Release-Engineering-Team

Dec 15 2016

hashar awarded T153363: Spike: Evaluate containerized CI builds using Kubernetes a Barnstar token.
Dec 15 2016, 10:55 PM · Release Pipeline, Release-Engineering-Team, Continuous-Integration-Infrastructure
dduvall edited the description of T153363: Spike: Evaluate containerized CI builds using Kubernetes.
Dec 15 2016, 10:00 PM · Release Pipeline, Release-Engineering-Team, Continuous-Integration-Infrastructure
dduvall updated subscribers of T153363: Spike: Evaluate containerized CI builds using Kubernetes.
Dec 15 2016, 9:59 PM · Release Pipeline, Release-Engineering-Team, Continuous-Integration-Infrastructure
dduvall edited the description of T153363: Spike: Evaluate containerized CI builds using Kubernetes.
Dec 15 2016, 9:56 PM · Release Pipeline, Release-Engineering-Team, Continuous-Integration-Infrastructure
dduvall edited the description of T153363: Spike: Evaluate containerized CI builds using Kubernetes.
Dec 15 2016, 9:53 PM · Release Pipeline, Release-Engineering-Team, Continuous-Integration-Infrastructure
dduvall created T153363: Spike: Evaluate containerized CI builds using Kubernetes.
Dec 15 2016, 9:47 PM · Release Pipeline, Release-Engineering-Team, Continuous-Integration-Infrastructure
dduvall updated subscribers of T150501: Spike: Evaluate experimental Docker based CI w/ scap builds.

From our notes for the 12/8 meeting with Ops regarding the direction of CI:

Dec 15 2016, 8:47 PM · Continuous-Integration-Infrastructure
dduvall closed T150504: Define generic job that runs unit tests within a Docker container as "Resolved".
Dec 15 2016, 8:45 PM · Continuous-Integration-Infrastructure
dduvall closed T150504: Define generic job that runs unit tests within a Docker container, a subtask of T150501: Spike: Evaluate experimental Docker based CI w/ scap builds, as "Resolved".
Dec 15 2016, 8:45 PM · Continuous-Integration-Infrastructure
dduvall requested review of D490: Improve failure handling and rollback behavior.

Requesting re-review after some minor changes.

Dec 15 2016, 7:11 PM · Release-Engineering-Team
dduvall updated the diff for D490: Improve failure handling and rollback behavior.

Fixed a small bug in the rollback message for when failure_limit is exceeded

Dec 15 2016, 12:31 AM · Release-Engineering-Team

Dec 14 2016

mmodell awarded D490: Improve failure handling and rollback behavior a Cookie token.
Dec 14 2016, 11:26 PM · Release-Engineering-Team
dduvall accepted D513: Use python2/3 octals, not just python2.
Dec 14 2016, 11:16 PM · Release-Engineering-Team
dduvall updated the diff for D490: Improve failure handling and rollback behavior.
  • Refactored targets.TargetsList to no longer split groups but return first-class instances of a new targets.DeployGroup class
  • Stage execution methods now consider the failure_limit for the entire original group, not individual subgroups
  • Some additional behavior changed slightly as a result of this refactoring (see the commit message)
Dec 14 2016, 11:11 PM · Release-Engineering-Team
dduvall accepted D511: Fix sha1 regex.
Dec 14 2016, 8:28 PM · Release-Engineering-Team

Dec 13 2016

dduvall planned changes to D490: Improve failure handling and rollback behavior.
Dec 13 2016, 10:57 PM · Release-Engineering-Team
dduvall added inline comments to D490: Improve failure handling and rollback behavior.
Dec 13 2016, 10:57 PM · Release-Engineering-Team

Dec 12 2016

dduvall updated the diff for D508: Use `pip wheel` to manage CI pip cache.
  • Rename test directory so nosetests will read it
Dec 12 2016, 11:41 PM · Release-Engineering-Team
dduvall retitled D508 to Use `pip wheel` to manage CI pip cache.
Dec 12 2016, 11:34 PM · Release-Engineering-Team

Dec 7 2016

dduvall closed D501: Invoke tox without --sitepackages to avoid warnings by committing rMSCA84808d22f77c: Invoke tox without --sitepackages to avoid warnings.
Dec 7 2016, 8:11 PM · Release-Engineering-Team
dduvall committed rMSCA84808d22f77c: Invoke tox without --sitepackages to avoid warnings (authored by dduvall).
Invoke tox without --sitepackages to avoid warnings
Dec 7 2016, 8:11 PM
dduvall retitled D501 to Invoke tox without --sitepackages to avoid warnings.
Dec 7 2016, 7:52 PM · Release-Engineering-Team
dduvall closed D481: Added Dockerfile.ci to test Docker based CI by committing rMSCAc65beaaa9ebf: Added Dockerfile.ci to test Docker based CI.
Dec 7 2016, 7:38 PM · Release-Engineering-Team
dduvall committed rMSCAc65beaaa9ebf: Added Dockerfile.ci to test Docker based CI (authored by dduvall).
Added Dockerfile.ci to test Docker based CI
Dec 7 2016, 7:38 PM
dduvall updated the diff for D481: Added Dockerfile.ci to test Docker based CI.

Install newer pip using setuptools

Dec 7 2016, 7:22 PM · Release-Engineering-Team

Dec 5 2016

dduvall added inline comments to D490: Improve failure handling and rollback behavior.
Dec 5 2016, 6:27 PM · Release-Engineering-Team

Dec 2 2016

dduvall updated the diff for D490: Improve failure handling and rollback behavior.

Stage execution now correctly skips failed targets and rollback executes over all reachable target hosts

Dec 2 2016, 11:28 PM · Release-Engineering-Team
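
Skipping failed targets in later stages can be sketched as follows. `run_stages`, the `execute` callback, and the stage names are hypothetical and only illustrate the behavior described; rollback is left out.

```python
def run_stages(targets, stages, execute, failure_limit):
    """Run each stage only on targets that have not failed so far.

    `execute(stage, target)` returns True on success. Returns a tuple of
    (targets that passed every stage run, targets that failed). Stops
    early once the number of failures exceeds `failure_limit`.
    """
    remaining = list(targets)
    failed = []
    for stage in stages:
        still_ok = []
        for target in remaining:
            if execute(stage, target):
                still_ok.append(target)
            else:
                failed.append(target)
        remaining = still_ok
        if len(failed) > failure_limit:
            break
    return remaining, failed


calls = []


def fake_execute(stage, target):
    """Record each call; host2 fails during the 'fetch' stage."""
    calls.append((stage, target))
    return not (stage == "fetch" and target == "host2")


ok, failed = run_stages(
    ["host1", "host2", "host3"], ["fetch", "promote"], fake_execute,
    failure_limit=1,
)
print(ok, failed)
```

Here host2 fails during the first stage and is never handed to the second one, while the single failure stays within the limit so the remaining hosts proceed.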
dduvall planned changes to D490: Improve failure handling and rollback behavior.

In testing this locally, I noticed a new issue with the failure_limit functionality. If a target fails during one stage and the threshold for failure isn't reached, the next stage will execute on said target. This is wrong obviously, but how to proceed? There are two options I can think of.

Dec 2 2016, 7:53 PM · Release-Engineering-Team
dduvall edited reviewers for D481: Added Dockerfile.ci to test Docker based CI, added: thcipriani, demon; removed: JakeTheDeveloper.
Dec 2 2016, 5:20 PM · Release-Engineering-Team
dduvall added inline comments to D490: Improve failure handling and rollback behavior.
Dec 2 2016, 4:34 AM · Release-Engineering-Team
dduvall retitled D490 to Improve failure handling and rollback behavior.
Dec 2 2016, 4:26 AM · Release-Engineering-Team
dduvall added a revision to T145460: Rollback failed when target is down: D490: Improve failure handling and rollback behavior.
Dec 2 2016, 4:25 AM · Scap, Parsoid
dduvall added a revision to T145512: Allow failures for a percentage of targets: D490: Improve failure handling and rollback behavior.
Dec 2 2016, 4:25 AM · User-mobrovac, Parsoid, Services, Scap
dduvall added a revision to T149008: Canary doesn't rollback if you don't continue: D490: Improve failure handling and rollback behavior.
Dec 2 2016, 4:25 AM · Scap, Parsoid

Nov 30 2016

dduvall closed T149012: Scap rollback fails after promote completes as "Resolved".

Implemented in D439: Perform final deploy operations as formal stage

Nov 30 2016, 8:14 PM · Scap, Parsoid
dduvall added inline comments to D481: Added Dockerfile.ci to test Docker based CI.
Nov 30 2016, 12:45 AM · Release-Engineering-Team
dduvall updated the diff for D481: Added Dockerfile.ci to test Docker based CI.

Added reviewers

Nov 30 2016, 12:39 AM · Release-Engineering-Team
dduvall retitled D481 to Added Dockerfile.ci to test Docker based CI.
Nov 30 2016, 12:34 AM · Release-Engineering-Team

Nov 28 2016

dduvall updated the diff for D455: Testing Docker based CI.

Final working Dockerfile

Nov 28 2016, 7:53 PM · Release-Engineering-Team
dduvall closed T150505: Install and configure Jenkins Docker plugin, a subtask of T150501: Spike: Evaluate experimental Docker based CI w/ scap builds, as "Declined".
Nov 28 2016, 6:33 PM · Continuous-Integration-Infrastructure
dduvall closed T150505: Install and configure Jenkins Docker plugin as "Declined".

The plugin ended up providing very little value over a small shell script, and had some intractable issues related to shared volume and effective uid/gid permissions. See T150504#2812971.

Nov 28 2016, 6:33 PM · Continuous-Integration-Infrastructure

Nov 22 2016

dduvall added a comment to T150504: Define generic job that runs unit tests within a Docker container.

First successful runs:
https://integration.wikimedia.org/ci/job/differential-docker-test/27/console – run with complete rebuild including download of base image ~ 135 seconds of overhead
https://integration.wikimedia.org/ci/job/differential-docker-test/25/console – run with image build on cached base image ~ 83 seconds of overhead
https://integration.wikimedia.org/ci/job/differential-docker-test/26/console – run with fully cached image (includes npm dependencies) ~ 6 seconds of overhead

Nov 22 2016, 2:31 AM · Continuous-Integration-Infrastructure
dduvall updated the diff for D455: Testing Docker based CI.

test

Nov 22 2016, 1:43 AM · Release-Engineering-Team
dduvall updated the diff for D455: Testing Docker based CI.

test

Nov 22 2016, 1:40 AM · Release-Engineering-Team

Nov 21 2016

dduvall updated the diff for D455: Testing Docker based CI.

test

Nov 21 2016, 10:49 PM · Release-Engineering-Team
dduvall updated the diff for D455: Testing Docker based CI.

test

Nov 21 2016, 9:50 PM · Release-Engineering-Team

Nov 19 2016

mmodell awarded D455: Testing Docker based CI a Love token.
Nov 19 2016, 4:12 AM · Release-Engineering-Team

Nov 17 2016

dduvall updated the diff for D455: Testing Docker based CI.

test

Nov 17 2016, 11:20 PM · Release-Engineering-Team
dduvall updated the diff for D455: Testing Docker based CI.

test

Nov 17 2016, 2:00 AM · Release-Engineering-Team