I applied a new cherry-pick for logstash testing, and when I tried to run puppet on deployment-logstash1 I found that a role that should have been available via a cherry-pick was missing. When I looked more closely at the git log on deployment-salt, I found that only my new cherry-pick and the two venerable [LOCAL HACK] patches were there on top of the operations/puppet.git production branch.
Looking at git reflog, this may have been caused by someone fixing a merge conflict between one of the [LOCAL HACK] patches and the upstream. I think these are the only patches that were dropped:
f51e807 HEAD@{118}: pull: Configure Logstash and Elasticsearch for ApiFeatureUsage
311e748 HEAD@{119}: pull: Change eventlogging log dir
c0e32ea HEAD@{120}: pull: Allow puppetmaster to send reports to logstash
52a183c HEAD@{121}: pull: eventlogging: couple less tightly to ganglia
3359884 HEAD@{122}: pull: Fix Parsoid in beta
ac13d48 HEAD@{123}: pull: Get betalabs localsettings.js file from deploy repo (just like prod)
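For anyone who needs to repeat this, here is a rough sketch of pulling candidate hashes out of reflog output in the format shown above. The function name reflog_pull_shas is made up for illustration, and the filter is deliberately crude: it assumes the interesting entries are the ones labelled "pull:" and only skips plain fast-forwards.

```shell
# Hypothetical helper: read `git reflog` output on stdin and print the
# abbreviated hashes of entries recorded by a pull, oldest entry first.
# Assumes lines shaped like "<sha> HEAD@{N}: pull: <subject>".
reflog_pull_shas() {
    awk '
        # Keep pull entries, but skip plain fast-forward records,
        # which do not correspond to a local patch.
        $3 == "pull:" && $4 != "Fast-forward" { shas[++n] = $1 }
        # Reflog lists newest first; reverse so the oldest comes first.
        END { for (i = n; i >= 1; i--) print shas[i] }
    '
}
```

Running git reflog | reflog_pull_shas only produces a list of candidates. Each hash still needs a manual check (for example with git show) before it is passed to git cherry-pick, since not every pull entry is a patch that was later dropped.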
Reapplied via local cherry-picks (e.g. git cherry-pick ac13d48). Here's the local patch stack now:
$ git log --color --pretty=oneline --abbrev-commit origin/HEAD..HEAD
59ab71d logstash: Rules for processing MW input via Redis
fccc426 Configure Logstash and Elasticsearch for ApiFeatureUsage
be26b69 Change eventlogging log dir
f6824b3 eventlogging: couple less tightly to ganglia
c8d6c26 Fix Parsoid in beta
4e58e5a Get betalabs localsettings.js file from deploy repo (just like prod)
158a853 Allow puppetmaster to send reports to logstash
9e05e32 [LOCAL HACK] Bug 65591: User['mwdeploy'] shell => /bin/bash
908b435 [LOCAL HACK] Change MySQL admin user in sql script
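A check along these lines might catch a silently dropped patch before the next puppet run. This is only a sketch: missing_patches is a hypothetical helper, and matching on commit subjects assumes the subjects stay stable across rebases (the hashes do not).

```shell
# Hypothetical helper: read a one-line git log on stdin and print every
# expected subject (given as arguments) that does not appear in it.
# Returns non-zero if anything is missing.
missing_patches() {
    log=$(cat)
    status=0
    for subject in "$@"; do
        # -F matches a fixed string, so "[LOCAL HACK]" is not parsed
        # as a bracket expression.
        if ! printf '%s\n' "$log" | grep -qF -- "$subject"; then
            printf '%s\n' "$subject"
            status=1
        fi
    done
    return $status
}
```

Usage would look something like git log --pretty=oneline --abbrev-commit origin/HEAD..HEAD | missing_patches 'Fix Parsoid in beta' '[LOCAL HACK] Change MySQL admin user in sql script', with the list of expected subjects maintained by hand.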
I think this is fixed for now, but I'm still not 100% sure how things got messed up in the first place. Maybe @yuvipanda has some ideas?
This also reinforces @hashar's long-standing concern that, although convenient, our use of cherry-picks sitting on top of the upstream branch is fragile.
The [LOCAL HACK] patches definitely need to be sent to Gerrit if they have not been already. There is a task filed for one (T67591); the "Change MySQL admin user in sql script" patch could use a task and a Gerrit change.
We should probably sprint with ops to get all those changes reviewed/merged in.
Nope, I found things in the same state earlier, and I re-applied the local hacks myself. My theory is that someone who does not know about the local cherry-pick hacks is testing things on betacluster and is just using normal git operations to apply their patches (checkout instead of cherry-pick, perhaps?). Outside of that I don't have anything going on, but it definitely feels like an operator mistake.
I agree that local hacks are terrible. Perhaps we can replace them with hiera :) Let me take a look.
I'll email the ops/qa lists about it.
> I agree that local hacks are terrible. perhaps we can replace that with hiera :) Let me take a look.
see: T76392
The local hacks / cherry picks have been restored so this task is complete.
We have another task to get rid of the local hacks: T76392: Reduce [LOCAL HACK] changes on Beta Cluster to zero