Page MenuHomePhabricator

Beta puppetmaster cherry-pick process
Open, NormalPublic

Description

In order to increase the maintainability of the beta puppetmaster, and reduce the delta from production puppetmaster we need a process for handling puppet changes that are being tested on beta.

The current process in which deployment-prep admins simply cherry-pick a patch to the puppetmaster is unsustainable. The process leaves uncollected commits littered in beta, or, worse prevents beta from pulling in production updates.

Ideally, we could use this ticket to define a process that is:

  1. More visible than the current process
  2. More accountable than the current process
  3. Adds very little overhead to the current process
  4. Defines a clear set of criteria for when/how a cherry pick can be made on beta puppet master
  5. Defines a clear set of criteria for when/how a cherry pick can be removed without the involvement of patch authors necessarily
  6. Timeboxed so we don't have cherry picks to beta puppetmaster living on for many months.

Related Objects

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper. · View Herald TranscriptMay 16 2016, 6:47 PM

As of today we have 13 patches cherry picked to beta of various ages by various authors:

thcipriani@deployment-puppetmaster:/var/lib/git/operations/puppet$ git status
On branch production
Your branch is ahead of 'origin/production' by 13 commits.
  (use "git push" to publish your local commits)

nothing to commit, working directory clean

thcipriani@deployment-puppetmaster:/var/lib/git/operations/puppet$ git log --pretty=format:"%h %ad (%ar) %an" --graph --date=short -13
* 969cd78 2016-04-18 (4 weeks ago) Eric Evans
* c54907f 2015-12-14 (5 months ago) Giuseppe Lavagetto
* 542da29 2016-04-21 (4 weeks ago) Mukunda Modell
* 5d47dd9 2016-03-09 (10 weeks ago) Antoine Musso
* 928cf4d 2015-10-28 (7 months ago) Antoine Musso
* 9a6f4f6 2016-03-05 (2 months ago) root
* b0cd9dd 2016-02-11 (3 months ago) Timo Tijhof
* 4fb3fe9 2016-01-27 (4 months ago) Antoine Musso
* 5cde85b 2016-01-07 (4 months ago) Gergő Tisza
* 3eb89da 2016-01-06 (4 months ago) Gergő Tisza
* dd60fca 2015-06-12 (11 months ago) Moritz Mühlenhoff
* 4849f99 2015-10-20 (7 months ago) root
* 5ddacd7 2015-10-07 (7 months ago) Bryan Davis

Can we start by cleaning/reviewing some of these?

One simple solution which seems fairly straightforward would be to switch to using a branch for beta. Admittedly that may only achieve some of the goals set out here.

  1. More visible
    • The list of patches is in a branch in the git repo so everyone can compare it to production
  2. More accountable
    • Git committer would name the person who brought the patch into beta
  3. Little Overhead
    • I don't think maintaining a branch is more difficult than maintaining local cherry picks directly on the puppetmaster node

The remaining goals would need to be addressed in some other way.

bd808 added a comment.May 16 2016, 7:16 PM
$ git show --pretty 5ddacd7
commit 5ddacd7deaca60ce51290454c3344ffb36e70c40
Author: Bryan Davis <bd808@wikimedia.org>
Date:   Wed Oct 7 15:45:48 2015 +0000

    [LOCAL HACK] change l10nupdate cache location

    Point l10nupdate at /srv/ instead of our tiny /var/ partition

diff --git a/modules/scap/files/l10nupdate-1 b/modules/scap/files/l10nupdate-1
index 7b0f6c8..84e516c 100755
--- a/modules/scap/files/l10nupdate-1
+++ b/modules/scap/files/l10nupdate-1
@@ -83,7 +83,7 @@ for i in ${mwVerDbSets[@]}
 do
        mwVerNum=${i%=*}
        mwDbName=${i#*=}
-       tempCacheDir=/var/lib/l10nupdate/caches/cache-"$mwVerNum"
+       tempCacheDir=/srv/l10nupdate/caches/cache-"$mwVerNum"

        if [ ! -d "$tempCacheDir" ]; then
                sudo -u www-data mkdir "$tempCacheDir"

This was needed on deployment-bastion when @Krenair was trying to get l10nupdate setup there and the /var partition was running out of space as a result. It is probably no longer needed by deployment-bastion since /var is not a separate mount and the root volume has 10G of free space.

This was needed on deployment-bastion when @Krenair was trying to get l10nupdate setup there and the /var partition was running out of space as a result. It is probably no longer needed by deployment-bastion since /var is not a separate mount and the root volume has 10G of free space.

Removed. Down to 12 patches.

This is likely one I should have caught when rebuilding deployment-tin. This patch is certainly one that could have several solutions and the long-lived cherry pick seems like the simplest: Using an environment variable in the script seems like a bad solution. Rebuilding deployment-tin is a somewhat sizable blocking task, but is likely the right solution.

* 4849f99 2015-10-20 (7 months ago) root adds self-signed certificate, which was needed to keep https working (ish) in beta until T50501: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org is done
I don't know what * 9a6f4f6 2016-03-05 (2 months ago) root is and it has no name or task. It adds not-already-defined checks around labs_lvm::volume in modules/role/manifests/labs/lvm/mnt.pp and modules/role/manifests/labs/lvm/srv.pp

* b0cd9dd 2016-02-11 (3 months ago) Timo Tijhof - this got merged in https://gerrit.wikimedia.org/r/#/c/270009/ and is now just adding a single line. I got rid of it.

Krenair added a comment.EditedMay 16 2016, 7:57 PM

It now looks like this:

root@deployment-puppetmaster:/var/lib/git/operations/puppet# git log --pretty=format:"%h %ad (%ar) %an - %s" --graph --date=short origin/production..HEAD
* fe0411e 2016-04-18 (4 weeks ago) Eric Evans - [WIP]: Cassandra 2.2.5 config
* a221853 2015-12-14 (5 months ago) Giuseppe Lavagetto - mediawiki: add conftool-specifc credentials and scripts
* df5862d 2016-04-21 (4 weeks ago) Mukunda Modell - Fix race in puppet::self (puppet.conf compilation)
* 97c86f6 2016-03-09 (10 weeks ago) Antoine Musso - hiera_lookup: recognize labs project and site
* 92a931c 2015-10-28 (7 months ago) Antoine Musso - cache: vary statsd_server with hiera
* 32863c1 2016-03-05 (2 months ago) root - wrap labs_lvm::mnt in if (defined) conditional
* b08e6b6 2016-01-27 (4 months ago) Antoine Musso - sysfs: puppet always restarted the sysfsutils service
* 0270bc2 2016-01-07 (4 months ago) Gergő Tisza - logstash: add sentry output plugin
* 6150e71 2016-01-06 (4 months ago) Gergő Tisza - Add output plugin for Sentry
* 1d62344 2015-06-12 (11 months ago) Moritz Mühlenhoff - Prevent access to hidden directories
* 5a264cd 2015-10-20 (7 months ago) root - [LOCAL HACK] Change star.wmflabs.org to beta certificate

Related tasks for those commits:

Krenair added a comment.EditedMay 16 2016, 8:02 PM

We should allow patches to be put on the puppetmaster to test them temporarily (say... for each version up to a week?), and patches to make beta behave more like production indefinitely.

We should allow patches to be put on the puppetmaster to test them temporarily (say... for each version up to a week?), and patches to make beta behave more like production indefinitely.

There definitely seem to be at least 2 classes of patches here that seem to map fairly well to the Beta-Cluster-Infrastructure and Beta-Cluster-reproducible projects.

It's fine, probably, if some of the infrastructure patches are blocked, but that should be documented somewhere (commit message or wikipage).

I would still like non-blocked patches of both types to be timeboxed via some kind of a dead man's switch. That is to say, it'd be nice if all the patches that weren't documented as blocked went away after 2 weeks or a month and if the owner really wanted to keep the cherry-pick or commit around some action would have to be taken.

This task is about coming up with a new process, not just cleaning up the current list of patches.

It's not always so straightforward. Sometimes the cherry picked patches have interdependencies, don't rebase cleanly, etc.

Even worse, people can easily trample eachothers' patches accidentally when doing git rebase -i on the puppetmaster.

As of today we have 13 patches cherry picked to beta of various ages by various authors:

thcipriani@deployment-puppetmaster:/var/lib/git/operations/puppet$ git status
On branch production
Your branch is ahead of 'origin/production' by 13 commits.
  (use "git push" to publish your local commits)

nothing to commit, working directory clean

thcipriani@deployment-puppetmaster:/var/lib/git/operations/puppet$ git log --pretty=format:"%h %ad (%ar) %an" --graph --date=short -13
* 969cd78 2016-04-18 (4 weeks ago) Eric Evans

[ ... ]

Can we start by cleaning/reviewing some of these?

This one should be getting merged RSN (today? tomorrow?). Removing it would downgrade the Cassandra configuration, breaking the cluster (forcing us to wipe and start-over). We could deal with this if it were necessary, but it wouldn't be awesome.

This one should be getting merged RSN (today? tomorrow?). Removing it would downgrade the Cassandra configuration, breaking the cluster (forcing us to wipe and start-over). We could deal with this if it were necessary, but it wouldn't be awesome.

No problem keeping it around. I was just hoping to get a better idea of what's currently cherry-picked and why so we can come up with a process for cherry picks and the maintenance of cherry-picks.

This one should be getting merged RSN (today? tomorrow?). Removing it would downgrade the Cassandra configuration, breaking the cluster (forcing us to wipe and start-over). We could deal with this if it were necessary, but it wouldn't be awesome.

No problem keeping it around. I was just hoping to get a better idea of what's currently cherry-picked and why so we can come up with a process for cherry picks and the maintenance of cherry-picks.

In this case, we entered this sort point-of-no-return the moment we cherry-picked the changeset and proceeded with the Cassandra upgrade; Anything other than leaving it in place until the changeset was merged was always going to mean forcibly downgrading (a risk I accepted at the time).

Any policy/process moving forward that put a hard upper-bound on the amount of time a cherry-picked changeset could be applied would probably run afoul of cases like this one.

Any policy/process moving forward that put a hard upper-bound on the amount of time a cherry-picked changeset could be applied would probably run afoul of cases like this one.

This is one of the critical points we need to pay attention to. It can be slow & difficult to roll out major changes to production. We need to make the transition as painless as possible. At the same time we are trying to make beta-cluster closely match production and remain stable for testing other changes.

I don't think beta cluster can be all things for all people but there is a lot of room for improvement. I think we need to explore the use-cases that beta cluster is expected to support and formalize processes that balance these needs appropriately.

hmm...why does it feel like I just said a whole lot of nothing?

So deployment-prep is a fairly poor simulation of a production wmf cluster and we have a lot of different kinds of testing going on simultaneously:

  • Automated continuous integration tests
    • I believe that this is a primary and critical role that should not be allowed to break due to other types of testing. Correct me if I'm wrong here.
  • Staging mediawiki changes before they go to production
  • Testing puppet changes before they go to production
  • Prototyping large infrastructure changes and new service deployments

Some of these uses are in conflict and we don't currently have any viable alternatives to fill these needs.

@Eevans Playing a maintenance on beta before rolling it on production is a perfectly legitimate use case and the whole point of beta cluster.

Limiting the amount of time a cherry pick or present or arbitrary drop will surely cause havoc. It does already whenever the auto rebase breaks.


My experience

About a better process: as part of my daily grind, almost every morning I would check for the Shinken monitoring ( http://shinken.wmflabs.org/problems ) and ninja fix puppet.git to adjust it for beta or the CI infrastructure (which also have its own puppet master).

Since the problem got fixed, there is little incentive for me to hunt for a reviewer from ops to get it merged. Though when it also fix a production issue I would reach out to the appropriate ops having the area of expertise.

In some other case, the ninja fix needs more work before asking for review. It fix the immediate problem but is a very rough patch that needs polishing. I eventually forget about the task/commit :-/


Lets merge!

Something that has been very successful are the Puppet SWAT windows. They are one hour deployment windows lead by the operations team. The process is straightforward:

From there, ops will have high confident it can land and be pushed to prod. Else they will skip it since Puppet SWAT patches must be straight forward / low impact.

For the couple or so time I have used puppet SWAT, I got batches of changes merged in just a dozen of minutes.

Tgr added a comment.May 18 2016, 10:32 PM
* 5cde85b 2016-01-07 (4 months ago) Gergő Tisza
* 3eb89da 2016-01-06 (4 months ago) Gergő Tisza

Those can be dropped; I added them for testing but haven't had time to work on that project for quite a while now.
Sorry for the noise.

We could easily implement patch expiration in a way which would strike a balance between the need for sanity and the need for longer lived patches:

I propose that the expiration of patches should be controlled by a TTL: field in the commit message. This way the author of the patch can have some control over how long the patch will remain. It could default to a short life if the field isn't present, but honor arbitrarily long TTL times if the commit has a one specified. We could have a maximum TTL or we could support TTL: forever if that seems useful.

mmodell triaged this task as High priority.May 19 2016, 1:00 AM
Krinkle removed a subscriber: Krinkle.May 20 2016, 5:55 PM

We could easily implement patch expiration in a way which would strike a balance between the need for sanity and the need for longer lived patches:

I propose that the expiration of patches should be controlled by a TTL: field in the commit message. This way the author of the patch can have some control over how long the patch will remain. It could default to a short life if the field isn't present, but honor arbitrarily long TTL times if the commit has a one specified. We could have a maximum TTL or we could support TTL: forever if that seems useful.

Cherry pick I am doing are meant to stick around until they are merged on beta. I dont really make short lived / one time cherry picks.

I think it all boils down to the patch author working with Operations to have the patch polished and merged in puppet.git.

Here is the current state on the beta cluster puppetmaster:
git log --format='| %an | %s' HEAD@{upstream}..

Andrew OttoHieraize eventlogging_kafka_handler to allow selection of different kafka clients
Filippo Giunchedipuppetization for thumbor
Brandon Blacksslcert: regenerate dhparam.pem
Filippo Giunchedipuppetization for thumbor
Tyler CiprianiBeta: Add logstash host to scap.cfg
Tyler CiprianiBeta: Scap canary deploy dsh groups
Bryan Davislogstash: Parse nginx access logs for wdqs
Kartik MistryBeta: Fix cxserver restbase_url
amirores: Scap3 deployment configurations
Mukunda ModellFix race in puppet::self (puppet.conf compilation)
Antoine Mussocache: vary statsd_server with hiera
rootwrap labs_lvm::mnt in if (defined) conditional
Gergő Tiszalogstash: add sentry output plugin
Gergő TiszaAdd output plugin for Sentry
thcipriani lowered the priority of this task from High to Normal.Aug 23 2016, 5:01 PM

Lowing priority from high since nothing is happened here in a while.

Here's what's currently cherry picked:

Bryan Davislogstash: Tag Striker messages for indexing
Alex MonkRemove the hard-coded /a/mw-log references scattered around everywhere
Alex Monkdeployment-prep: Move udp2log to deployment-fluorine02
Marko ObrovacPDF Render Service: Role and module
Tyler Ciprianiscap: bump version to 3.2.3-1
Alex Monkbeta: Use Let's Encrypt cert
Tyler CiprianiBeta: Add logstash port
Bryan Davislogstash: Parse nginx access logs for wdqs
Brandon Blacksslcert: regenerate dhparam.pem
Mukunda ModellFix race in puppet::self (puppet.conf compilation)
Antoine Mussocache: vary statsd_server with hiera
Gergő Tiszalogstash: add sentry output plugin
Gergő TiszaAdd output plugin for Sentry
root[LOCAL HACK] wrap labs_lvm::mnt in if (defined) conditional

Most of this stuff seems like it will merge eventually.

Random proposal for moving this task forward, we could create a script that runs on $interval that does:

  1. Ensure every cherry-picked patch has a 'Bug: TXXXXXX'
  2. Check that task:
    • Closed? Remove cherry-pick
    • Doesn't have tag $cherryPickTag: Remove cherry-pick
  3. Poke the task by leaving a message as $systemUser user on $interval: "patch is still cherry picked on beta"

This would remove some of the current cherry picks, but it would make thing more accountable.

bd808 added a comment.Aug 23 2016, 5:58 PM

Here's what's currently cherry picked:

Tyler Ciprianiscap: bump version to 3.2.3-1

merged

Brandon Blacksslcert: regenerate dhparam.pem

already merged upstream, removed.

Gergő TiszaAdd output plugin for Sentry

Was for a completely different repo. removed.

root[LOCAL HACK] wrap labs_lvm::mnt in if (defined) conditional

Does anyone know what this one was needed for? I added the [LOCAL HACK] tag to it today when I found that it did not have a gerrit Change-Id. The commit message unhelpfully does not mention which roles were found to be in conflict for Labs_lvm::Volume['second-local-disk']. Judging by the patch contents it looks like something was applying both role::labs::lvm::srv and role::labs::lvm::mnt on the same host.

mmodell added a comment.EditedAug 25 2016, 12:52 AM

Now we are down to these:

@PcheloloChange-Prop: Enable file transclusion updates
@AlexMonk-WMFRemove the hard-coded /a/mw-log references scattered around everywhere
@mobrovacPDF Render Service: Role and module
@AlexMonk-WMFbeta: Use Let's Encrypt cert
@bd808logstash: Parse nginx access logs for wdqs
@hasharcache: vary statsd_server with hiera
@Tgrlogstash: add sentry output plugin
mmodell added a subscriber: Pchelolo.
@PcheloloChange-Prop: Enable file transclusion updates

Pre-prod testing, should be merged early next week (not going to do it right before the week-end, as it has disruption potential)

@mobrovacPDF Render Service: Role and module

This is a new service that should enter production soon(TM). For the time being, we are testing it in Beta. The associated ticket is T143129: New service request - PDF Render.

Thanks @mobrovac. I think we are in pretty good shape.

@hasharcache: vary statsd_server with hiera

That is for T116898: On beta cluster varnish stats process points to production statsd eg have the Varnnish stats reported to the labs statsd instead of the production one. Still have to change the keyprefix. The patch is https://gerrit.wikimedia.org/r/#/c/249490/

@hasharcache: vary statsd_server with hierahttps://gerrit.wikimedia.org/r/#/c/249490/

Got reviewed, merged and deployed to production.

root@deployment-puppetmaster:/var/lib/git/operations/puppet# git log --oneline origin/production..HEAD
d1d3613 wmflib: Return default dir when role::puppet::self isn't used
9fff4c3 contint: fix resource conflict with service::deploy::common
e8c0199 PDF Render Service: Role and module
5656c25 logstash: Parse nginx access logs for wdqs
0b5a2a4 logstash: add sentry output plugin

The first two are due to T120159 and T143065, the PDF render service one is addressed above

hashar awarded a token.Sep 7 2016, 2:15 PM
  1. Ensure every cherry-picked patch has a 'Bug: TXXXXXX'
  2. Check that task:
    • Closed? Remove cherry-pick
    • Doesn't have tag $cherryPickTag: Remove cherry-pick
  3. Poke the task by leaving a message as $systemUser user on $interval: "patch is still cherry picked on beta"

As I've been working through this and refining I've realized that this proposal would undoubtedly lead to some confusion and frustration. For a first iteration, I would like to instead propose:

  1. Check the change-id of each cherry pick against gerrit. If the status in gerrit is MERGED or ABANDONED, remove the cherry pick.
  2. Check if there is a Bug: TXXXXX in the commit, if there is and there hasn't been any activity on the task in 1 month add a message to the task that there is a cherry pick for this task on beta.

Basically, we'd let folks get used to beta being aware of patches and poking tasks and only remove patches when it's clear that the patch should no longer exist as a cherry-pick on beta.

This is a less aggressive approach than the initial proposal, but I think this is a better balance of the ease-of-use of beta-cluster cherry-picks and beta-cluster maintainers ability to monitor and track changes there.

Created https://phabricator.wikimedia.org/p/beta-puppetmaster/ bot user to comment on tasks. Refining puppet patch to implement a cron that will do what is described above.

Created https://phabricator.wikimedia.org/p/beta-puppetmaster/ bot user to comment on tasks. Refining puppet patch to implement a cron that will do what is described above.

Perhaps it would be also worth creating a similar phab tag that could be added automatically to tasks with the Bug: stanza on cherry-picks?

As I've been working through this and refining I've realized that this proposal would undoubtedly lead to some confusion and frustration. For a first iteration, I would like to instead propose:

  1. Check the change-id of each cherry pick against gerrit. If the status in gerrit is MERGED or ABANDONED, remove the cherry pick.
  2. Check if there is a Bug: TXXXXX in the commit, if there is and there hasn't been any activity on the task in 1 month add a message to the task that there is a cherry pick for this task on beta.

Sounds like a good plan to me! Especially #1 will be very important to keep beta self-maintaining as much as possible.

Re: 2. if a Bug: is missing a fallback might be to comment on the review itself.
Not for the first iteration but I'd argue that removing an unmerged patch that's been sitting in Beta for e.g. a quarter would be good too. Opting-in again by cherry-picking again in beta is easy enough anyways.

Change 310719 had a related patch set uploaded (by Thcipriani):
[WIP] Beta: Clean puppetmaster cherry-picks

https://gerrit.wikimedia.org/r/310719

Fixed up https://gerrit.wikimedia.org/r/310719 based on some review from @Volans.

Additional review is welcome! I think this could merge any time with minimal disruption.

root@deployment-puppetmaster02:/var/lib/git/operations/puppet# git log --oneline origin/production..HEAD
dd8fcf7762 Followup Ia5d07908: Fix sentry's base::service_unit to require correct class
b757ca8375 profile: fix udev reload dependency for swift::storage::labs
1522749ac1 logstash: Parse nginx access logs for wdqs
18f3c24c9f thcipriani: Fix for service dependency loops
9a856bda35 Change $deploy_user home directory to /var/lib/${deploy_user}
a00102fb5e varnish: Add --errorpage-noise trigger for easy testing
68195addd4 Scap: scap_source correct gid
cd7547c9ce swift: use implicit /dev/swift prefix for swift devices
27401356a3 swift: lower replication interval for beta
7df2bf76df Do the echo when running update.php
42a8e6578e Make /entity/ redirect internal
67a22aba41 Add 3d2png deploy repo to image scalers
51806cf9f4 interface: IPAddr.new() requires an address family
72da862e18 Add a git_repo to global scap.cfg
ebb1850387 service: use gzip for logging in uwsgi
78eff1f8f1 [LOCAL] Add cert for etcd in deployment-prep, hiera data for instance
90ae1ea1ff salt: fix grain-ensure comparison
3f30e5eced contint: move from /mnt to /srv
ad04b266e4 hacks to fix puppet --krenair
b1ba161f44 POC: Secure redirect service
06a97e32de deployment-prep: Make LVS config compatible with new requirements
88cb2ef622 [WIP] logstash: send errors to sentry
a7dceac3b8 logstash: parse runJobs messages
greg added a subscriber: greg.EditedOct 6 2017, 9:42 PM

was 23 in krenair's post above, now 18:

gjg@deployment-puppetmaster02:/var/lib/git/operations/puppet$ git log --pretty=format:"%h (%ar) %an - %s" --graph --date=short -13
* 347d03ea6a (11 months ago) Alex Monk - puppetmaster: hacks to fix puppet logstash
* c7acd3ea4c (12 months ago) Alex Monk - deployment-prep: Make LVS config compatible with new requirements
* f691522d23 (9 days ago) Antoine Musso - prometheus: force ferm dns resolution to Ipv4
* a84afe7c2f (10 days ago) Moritz Muehlenhoff - Add support for stretch to hhvm::debug
* 863c77bab3 (7 weeks ago) Alex Monk - Fix mwrepl to require expanddblist dependency, from scap::scripts
* 39fa12377c (7 weeks ago) Alex Monk - Followup Ia5d07908: Fix sentry's base::service_unit to require correct class
* 9bfa6eddcb (1 year, 3 months ago) Bryan Davis - logstash: Parse nginx access logs for wdqs
* 2382d4d826 (2 months ago) Root - thcipriani: Fix for service dependency loops
* 023dd6c75a (5 months ago) Timo Tijhof - varnish: Add --errorpage-noise trigger for easy testing
* 23e7f77f1a (3 months ago) Tyler Cipriani - Scap: scap_source correct gid
* 5cc8f829b8 (3 months ago) Filippo Giunchedi - swift: use implicit /dev/swift prefix for swift devices
* 6feca41055 (7 months ago) Antoine Musso - swift: lower replication interval for beta
* 6d2289298c (8 months ago) Antoine Musso - interface: IPAddr.new() requires an address family
gjg@deployment-puppetmaster02:/var/lib/git/operations/puppet$ git log --oneline origin/production..HEAD | wc -l
18
  • 023dd6c75a (5 months ago) Timo Tijhof - varnish: Add --errorpage-noise trigger for easy testing

I've removed this one from beta cluster's puppetmaster.

root@deployment-puppetmaster03:/var/lib/git/operations/puppet# git log --pretty=format:"%h (%ar) %an - %s" --graph --date=short origin/production..HEAD
* 04256bbf72 (3 days ago) Alex Monk - Attempt to secure Puppet DB better
* e658326428 (3 days ago) Alex Monk - Puppetise simple no-CA class for deployment-dumps-puppetmaster02
* 3b5d916f6d (2 weeks ago) Alex Monk - Central certificates / Secure redir WIP
* 71f564f167 (3 days ago) Alex Monk - cumin: Allow Puppet DB backend to be used within Labs projects that use it
* 91778aa0fe (5 months ago) Alex Monk - swift: Fix checks on drive/filesystem titles to allow for labs ones
* ac20f079e3 (11 months ago) Filippo Giunchedi - swift: use implicit /dev/swift prefix for swift devices
* 0b7d70f7b9 (5 days ago) Timo Tijhof - webperf: Add statsv, navtiming and coal to scap::sources
* 6c5de6eec5 (11 months ago) Tyler Cipriani - Scap: scap_source correct gid
* 1da2ac8835 (10 days ago) Alex Monk - Allow PuppetDB use on standalone puppetmasters
* d322869448 (4 weeks ago) Mukunda Modell - WIP: Add phabricator config for the new swift backend
* fc711b8c3d (4 weeks ago) Mukunda Modell - Add account for phabricator_files to swift::params::accounts
* 9c16df4168 (5 months ago) Root - Hack profile::base::firewall to prevent dupe definition
* ff6ddbe8d9 (7 months ago) Filippo Giunchedi - hieradata: add redis stretch deployment-prep instances
* 6a822dc76b (8 months ago) Antoine Musso - prometheus: force ferm dns resolution to Ipv4
* 2dd3f86432 (1 year, 7 months ago) Alex Monk - puppetmaster: hacks to fix puppet logstash
* 2684f4571f (1 year, 8 months ago) Alex Monk - deployment-prep: Make LVS config compatible with new requirements
* 43a6d4a39c (10 months ago) Alex Monk - Fix mwrepl to require expanddblist dependency, from scap::scripts
* 44cd03158f (11 months ago) Root - thcipriani: Fix for service dependency loops
* 822bc923c4 (1 year, 2 months ago) Antoine Musso - swift: lower replication interval for beta
* ac2e48f579 (1 year, 4 months ago) Antoine Musso - interface: IPAddr.new() requires an address family
* 901afdfa01 (2 years, 5 months ago) Gergő Tisza - [WIP] logstash: send errors to sentry

I wonder if we should start explicitly separating these lists of cherry-picked commits into those which are open in gerrit (i.e., already pending review) vs. those which are not

Krenair added a comment.EditedJun 5 2018, 7:29 PM

I made a very crappy script at deployment-puppetmaster03:/root/make_T135427_table.py containing this:

make_T135427_table.py
#!/usr/bin/python3
import json
import os
import urllib.request
gerrit_req_url = 'https://gerrit.wikimedia.org/r/changes/?q={}'
out = os.popen('git --git-dir=/var/lib/git/operations/puppet/.git log --pretty=format:"%h NOONEWILLUSETHISSEPARATOR %ar NOONEWILLUSETHISSEPARATOR %an NOONEWILLUSETHISSEPARATOR %s NOONEWILLUSETHISSEPARATOR %b NOONEWILLUSETHISLINEBREAK" --date=short origin/production..HEAD').read()
print('<table>')
print('<tr><th>Hash</th><th>Time</th><th>Author</th><th>Subject</th><th>Change URL</th><th>Task</th></tr>')
for commit in out.split('NOONEWILLUSETHISLINEBREAK\n'):
    if not commit:
        continue
    hash, time, author, subject, message = commit.split(' NOONEWILLUSETHISSEPARATOR ')
    change_url = None
    task = None
    for line in message.splitlines():
        if line.startswith('Change-Id:'):
            _, change_id = line.split(':')
            change_id = change_id.strip()
            resp = json.loads(urllib.request.urlopen(gerrit_req_url.format(change_id)).read()[5:-1].decode('utf-8'))
            if not len(resp):
                break
            gerrit_change, = resp
            change_url = 'https://gerrit.wikimedia.org/r/c/{}'.format(gerrit_change['_number'])
            if gerrit_change['status'] != 'NEW':
                change_url += ' - ' + gerrit_change['status']
        elif line.startswith('Bug:'):
            _, task_id = line.split(':')
            task = task_id.strip()
    print('<tr>')
    for item in (hash, time, author, subject, change_url, task):
        print('<td>{}</td>'.format(item))
    print('</tr>')

print('</table>')

It prints this:

HashTimeAuthorSubjectChange URLTask
f14563b86f5 days agoTimo Tijhofwebperf: Add statsv, navtiming and coal to scap::sourceshttps://gerrit.wikimedia.org/r/c/436601T195314
8d16ba10293 days agoAlex MonkAttempt to secure Puppet DB betterNoneNone
b07dd175193 days agoAlex MonkPuppetise simple no-CA class for deployment-dumps-puppetmaster02NoneNone
41f26c4e4d2 weeks agoAlex MonkCentral certificates / Secure redir WIPNoneT194962
e4283876423 days agoAlex Monkcumin: Allow Puppet DB backend to be used within Labs projects that use ithttps://gerrit.wikimedia.org/r/c/437052None
062a446d7b5 months agoAlex Monkswift: Fix checks on drive/filesystem titles to allow for labs oneshttps://gerrit.wikimedia.org/r/c/402758T184236
454dea246e11 months agoFilippo Giunchediswift: use implicit /dev/swift prefix for swift deviceshttps://gerrit.wikimedia.org/r/c/361648T163673
6c5de6eec511 months agoTyler CiprianiScap: scap_source correct gidhttps://gerrit.wikimedia.org/r/c/361796None
1da2ac883510 days agoAlex MonkAllow PuppetDB use on standalone puppetmastershttps://gerrit.wikimedia.org/r/c/435631T194962
d3228694484 weeks agoMukunda ModellWIP: Add phabricator config for the new swift backendhttps://gerrit.wikimedia.org/r/c/432533None
fc711b8c3d4 weeks agoMukunda ModellAdd account for phabricator_files to swift::params::accountshttps://gerrit.wikimedia.org/r/c/432528None
9c16df41685 months agoRootHack profile::base::firewall to prevent dupe definitionNoneNone
ff6ddbe8d97 months agoFilippo Giunchedihieradata: add redis stretch deployment-prep instanceshttps://gerrit.wikimedia.org/r/c/386869T148637
6a822dc76b8 months agoAntoine Mussoprometheus: force ferm dns resolution to Ipv4https://gerrit.wikimedia.org/r/c/381073 - ABANDONEDT176314
2dd3f864321 year, 7 months agoAlex Monkpuppetmaster: hacks to fix puppet logstashNoneNone
2684f4571f1 year, 8 months agoAlex Monkdeployment-prep: Make LVS config compatible with new requirementshttps://gerrit.wikimedia.org/r/c/316512None
43a6d4a39c10 months agoAlex MonkFix mwrepl to require expanddblist dependency, from scap::scriptshttps://gerrit.wikimedia.org/r/c/372764None
44cd03158f11 months agoRootthcipriani: Fix for service dependency loopsNoneT171173
822bc923c41 year, 2 months agoAntoine Mussoswift: lower replication interval for betahttps://gerrit.wikimedia.org/r/c/344387T160990
ac2e48f5791 year, 4 months agoAntoine Mussointerface: IPAddr.new() requires an address familyhttps://gerrit.wikimedia.org/r/c/336840 - ABANDONEDNone
901afdfa012 years, 5 months agoGergő Tisza[WIP] logstash: send errors to sentryhttps://gerrit.wikimedia.org/r/c/263024T85239

@Krenair: I could probably make a conduit endpoint to receive that table and update the task from inside a cron job.

@Krenair: I could probably make a conduit endpoint to receive that table and update the task from inside a cron job.

I'm not sure if we want to go to quite that much effort, I certainly didn't in the above script, but wouldn't the existing maniphest.edit endpoint suffice if you really wanted to?

@Krenair: Good point, that probably would do it.

Krenair added a comment.EditedJun 6 2018, 12:03 AM

Anyway just to be more unhelpful I'm going to take that automatically generated table and add helpful manual comments to it:

HashTimeAuthorSubjectChange URLTaskComments
f14563b86f5 days agoTimo Tijhofwebperf: Add statsv, navtiming and coal to scap::sourceshttps://gerrit.wikimedia.org/r/c/436601T195314puppet/deployment development
8d16ba10293 days agoAlex MonkAttempt to secure Puppet DB betterNoneNonepuppet development, could be removed safely
b07dd175193 days agoAlex MonkPuppetise simple no-CA class for deployment-dumps-puppetmaster02NoneNonebeta-specific puppet infra thing to try to make -snapshot01 less of an edge case. I basically shifted the problem a bit - might be able to get rid of this, not sure
41f26c4e4d2 weeks agoAlex MonkCentral certificates / Secure redir WIPNoneT194962puppet/infrastructure development for something primarily being developed in deployment-prep intending to go to prod later. not yet ready for review. could theoretically be worked on outside beta
e4283876423 days agoAlex Monkcumin: Allow Puppet DB backend to be used within Labs projects that use ithttps://gerrit.wikimedia.org/r/c/437052Nonepuppet/infrastructure development, could be removed safely (though this would be sad). get ops to review
062a446d7b5 months agoAlex Monkswift: Fix checks on drive/filesystem titles to allow for labs oneshttps://gerrit.wikimedia.org/r/c/402758T184236unbreaks puppet on ms-be boxes. might be easier to get through review if we could get rid of the dependency (see below)
454dea246e11 months agoFilippo Giunchediswift: use implicit /dev/swift prefix for swift deviceshttps://gerrit.wikimedia.org/r/c/361648T163673dependency of commit that unbreaks puppet on ms-be boxes. I forget why this one was involved, I think maybe it used to be a cherry-pick while in development and then our puppet fix above had to depend on it. Was authored by ops so it's not a usual candidate for becoming stuck in gerrit
6c5de6eec511 months agoTyler CiprianiScap: scap_source correct gidhttps://gerrit.wikimedia.org/r/c/361796Nonecan't remember what this was
1da2ac883510 days agoAlex MonkAllow PuppetDB use on standalone puppetmastershttps://gerrit.wikimedia.org/r/c/435631T194962dependency of central certificates service commit above, brings beta closer to prod - ask ops to review
d3228694484 weeks agoMukunda ModellWIP: Add phabricator config for the new swift backendhttps://gerrit.wikimedia.org/r/c/432533Nonedunno
fc711b8c3d4 weeks agoMukunda ModellAdd account for phabricator_files to swift::params::accountshttps://gerrit.wikimedia.org/r/c/432528Nonepresumably dependency of the above
9c16df41685 months agoRootHack profile::base::firewall to prevent dupe definitionNoneNonesounds like this is to fix puppet issues, though it seems more like a workaround than an actual fix - make a task to investigate killing this?
ff6ddbe8d97 months agoFilippo Giunchedihieradata: add redis stretch deployment-prep instanceshttps://gerrit.wikimedia.org/r/c/386869T148637ticket was closed, patch was made by ops so not a usual candidate to get stuck in gerrit - patch looks like a very simple deployment-prep-only change anyway, should be (relatively) easy to get reviewed
6a822dc76b8 months agoAntoine Mussoprometheus: force ferm dns resolution to Ipv4https://gerrit.wikimedia.org/r/c/381073 - ABANDONEDT176314ticket is (well, was) a tangent to the problem being worked around, complicated problem seemingly deep within perl's Net::DNS library, see T153468. this particular patch is -2'd in Gerrit, it's really more of a bad workaround
2dd3f864321 year, 7 months agoAlex Monkpuppetmaster: hacks to fix puppet logstashNoneNoneI should probably re-review this and figure out why prod wasn't also having problems with this
2684f4571f1 year, 8 months agoAlex Monkdeployment-prep: Make LVS config compatible with new requirementshttps://gerrit.wikimedia.org/r/c/316512NoneI think this is just to stop puppet problems in beta relating to services that would normally be behind LVS in prod (we can't use LVS due to nova-network's source+destination checking on IP packets IIRC)
43a6d4a39c10 months agoAlex MonkFix mwrepl to require expanddblist dependency, from scap::scriptshttps://gerrit.wikimedia.org/r/c/372764NoneMakes a dependency that already exists explicit in puppet, unfortunately Jenkins doesn't like this anymore as they added new rules. I found Ariel had also cherry-picked this to deployment-dumps-puppetmaster
44cd03158f11 months agoRootthcipriani: Fix for service dependency loopsNoneT171173fixed puppet error in beta, might be possible to make beta more like prod instead, asking on ticket
822bc923c41 year, 2 months agoAntoine Mussoswift: lower replication interval for betahttps://gerrit.wikimedia.org/r/c/344387T160990Ticket may have been incorrectly closed if it depended on this patch being in place as it was only a cherry-pick. If it didn't depend on this commit we should be able to remove the commit from our cherry-picks and abandon the patch in gerrit
ac2e48f5791 year, 4 months agoAntoine Mussointerface: IPAddr.new() requires an address familyhttps://gerrit.wikimedia.org/r/c/336840 - ABANDONEDNoneI think this was only in beta for testing that it wouldn't break existing installs as no beta machine appears to have the version of ruby that was apparently problematic here
901afdfa012 years, 5 months agoGergő Tisza[WIP] logstash: send errors to sentryhttps://gerrit.wikimedia.org/r/c/263024T85239puppet/infrastructure development for something probably intending to go to prod later
Krinkle added a comment.EditedJul 7 2018, 12:03 AM

I've gone through and hash-tagged the Gerrit patches with beta-picked and restored any that were marked as abandoned:

https://gerrit.wikimedia.org/r/#/q/hashtag:beta-picked+is:open

I'm hoping this'll be an easier way to maintain.

It should also provide better visibility to SRE when interacting with such patch. For example, when abandoned such patch, the hash-tag helps remember that they need to be unpicked in Beta. And that if such patch is amended/merged, to check Beta's puppetmaster afterwards for rebase conflicts and to resolve those as needed.

Krenair added a comment.EditedAug 31 2018, 11:51 PM

17:

HashTimeAuthorSubjectChange URLTaskComments
f3fe7d568b4 months agoAlex Monk[WIP] Central certificates servicehttps://gerrit.wikimedia.org/r/c/441991T194962puppet/infrastructure development for something being tested in deployment-prep intending to go to prod later. not yet ready for review. could theoretically be worked on outside beta
d8c78954912 weeks agoGiuseppe Lavagettomediawiki: move php to a profile, use the php classhttps://gerrit.wikimedia.org/r/c/453093T201140
b6ee8bc2693 months agoAlex Monkcumin: Allow Puppet DB backend to be used within Labs projects that use ithttps://gerrit.wikimedia.org/r/c/437052Nonepuppet/infrastructure development, could be removed safely (though this would be sad). get ops to review
ba09e14a2d7 weeks agoMoritz MuehlenhoffMove declaration of diamond package out of diamond classhttps://gerrit.wikimedia.org/r/c/446242T183454
44b7c0fd373 months agoGilles DubucServe WebP variants for the hottest thumbnailshttps://gerrit.wikimedia.org/r/c/434055T27611
66c14a774a8 weeks agoroot[LOCAL HACK] tls certs for deployment-elastic*NoneNone
c33430ddc23 months agoAlex MonkRe-combine labs and production exim minimal confighttps://gerrit.wikimedia.org/r/c/439774NoneThis makes our wiki mail run through deployment-mx02 instead of the production MXes
876e58014b3 months agoAlex MonkAttempt to secure Puppet DB betterNoneNonepuppet development, could be removed safely
c6628d36503 months agoAlex MonkPuppetise simple no-CA class for deployment-dumps-puppetmaster02NoneNonebeta-specific puppet infra thing to try to make -snapshot01 less of an edge case. I basically shifted the problem a bit - might be able to get rid of this, not sure
837baf92c88 months agoAlex Monkswift: Fix checks on drive/filesystem titles to allow for labs oneshttps://gerrit.wikimedia.org/r/c/402758T184236unbreaks puppet on ms-be boxes. might be easier to get through review if we could get rid of the dependency (see below)
51ce87290d1 year, 2 months agoFilippo Giunchediswift: use implicit /dev/swift prefix for swift deviceshttps://gerrit.wikimedia.org/r/c/361648T163673dependency of commit that unbreaks puppet on ms-be boxes. I forget why this one was involved, I think maybe it used to be a cherry-pick while in development and then our puppet fix above had to depend on it. Was authored by ops so it's not a usual candidate for becoming stuck in gerrit
14cb98815d1 year, 2 months agoTyler CiprianiScap: scap_source correct gidhttps://gerrit.wikimedia.org/r/c/361796None
23aa019c794 months agoMukunda ModellAdd account for phabricator_files to swift::params::accountshttps://gerrit.wikimedia.org/r/c/432528None
736484f7908 months agoRootHack profile::base::firewall to prevent dupe definitionNoneNonesounds like this is to fix puppet issues, though it seems more like a workaround than an actual fix - make a task to investigate killing this?
2782db370d11 months agoAntoine Mussoprometheus: force ferm dns resolution to Ipv4https://gerrit.wikimedia.org/r/c/381073T176314ticket is (well, was) a tangent to the problem being worked around, complicated problem seemingly deep within perl's Net::DNS library, see T153468. this particular patch is -2'd in Gerrit, it's really more of a bad workaround
6c883f71da1 year, 5 months agoAntoine Mussoswift: lower replication interval for betahttps://gerrit.wikimedia.org/r/c/344387T160990Ticket may have been incorrectly closed if it depended on this patch being in place as it was only a cherry-pick. If it didn't depend on this commit we should be able to remove the commit from our cherry-picks and abandon the patch in gerrit
ac40281c792 years, 8 months agoGergő Tisza[WIP] logstash: send errors to sentryhttps://gerrit.wikimedia.org/r/c/263024T85239puppet/infrastructure development for something possibly intending to go to prod later?