As described in T307349#7895775, it would be nice to clean up old repositories on deploy1002 to avoid inconsistencies when comparing deploy1002 with deploy2002.
The number of repositories on deploy1002 and deploy2002 under /srv/deployment is the same (as intended by the rsync, it uses --delete). The total size also looks the same.
This ticket sounds like that shouldn't be the case, though?
Oldest repos by modification time, oldest first:

Oct 9 2013 elasticsearch
Dec 9 2013 scholarships
Apr 18 2014 ocg
May 30 2014 rcstream
Sep 9 2014 servermon
Oct 17 2014 iegreview
Dec 9 2014 statsv
Feb 12 2015 cxserver
Mar 9 2015 zotero
(same on both servers)
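For reference, a sketch of how such a listing could be produced without parsing ls output; the base path and the limit are assumptions, not the command that was actually run:

```python
# Sketch: reproduce the "oldest repos by mtime" listing above without
# parsing ls output. The base path and limit are assumptions; this is
# not the command that was actually run on deploy1002.
import os
import time


def oldest_repos(base, limit=10):
    """Return up to `limit` (mtime, name) pairs for the subdirectories
    of `base`, oldest first."""
    entries = [
        (entry.stat().st_mtime, entry.name)
        for entry in os.scandir(base)
        if entry.is_dir(follow_symlinks=False)
    ]
    return sorted(entries)[:limit]


if __name__ == '__main__':
    for mtime, name in oldest_repos('/srv/deployment'):
        print(time.strftime('%b %d %Y', time.localtime(mtime)), name)
```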
list of repositories mentioned in hieradata/role/common/deployment_server/kubernetes.yaml
(some have a repository but also a "scap_repository" value?)
repository: analytics/statsv
repository: data-engineering/airflow-dags
repository: data-engineering/airflow-dags
repository: data-engineering/airflow-dags
repository: eventlogging
repository: iegreview
repository: integration/zuul/deploy
repository: labs/striker/deploy
repository: maps/kartotherian/deploy
repository: maps/tilerator/deploy
repository: mediawiki/services/eventstreams/deploy
repository: mediawiki/services/ores/deploy
repository: mediawiki/services/parsoid/deploy
repository: mediawiki/services/restbase/deploy
repository: openstack/horizon/deploy
repository: operations/docker-images/docker-pkg/deploy
repository: operations/dumps
repository: operations/software/cassandra-twcs
repository: operations/software/debmonitor/deploy
repository: operations/software/gerrit
repository: operations/software/gerrit/tools/gervert/deploy
repository: operations/software/homer/deploy
repository: operations/software/librenms
repository: operations/software/logstash-logback-encoder
repository: operations/software/logstash/plugins
repository: operations/software/netbox-deploy
repository: search/MjoLniR/deploy
repository: wikidata/query/deploy
repository: wikimedia/discovery/analytics
scap_repository: analytics/refinery/scap
scap_repository: data-engineering/airflow-dags-scap-analytics
scap_repository: data-engineering/airflow-dags-scap-analytics_test
scap_repository: data-engineering/airflow-dags-scap-research
scap_repository: eventlogging/scap/analytics
scap_repository: operations/dumps/scap
So does T307349#7895775 say all repos on (both) deployment servers should be deleted if they do not appear in the list above?
List of repos that exist on the deployment servers but do not appear in kubernetes.yaml. (I am just matching on the string that is the first level of the name; if that string does not appear at all in the yaml, the repo is a candidate.)
3d2png apache2modsec changeprop citoid cp-jobqueue cpjobqueue cxserver design dropwizard elasticsearch electron-render fluoride graphoid httpbb httpbb-tests imagecatalog jobrunner jobrunner.old mathoid mediawiki-staging mobileapps modsec ocg performance phabricator prometheus proton puppetboard rcstream recommendation-api releng relforge scholarships sentry servermon trending-edits wdqs zotero
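The first-level-name heuristic described above can be sketched like this; the function name is hypothetical, and the check is deliberately a loose substring match so it over-matches rather than flagging repos that are still referenced:

```python
# Sketch of the heuristic above: a repo directory is a deletion
# candidate if the first level of its name does not appear anywhere in
# kubernetes.yaml as a substring. Function name is hypothetical.

def deletion_candidates(repo_paths, yaml_text):
    """repo_paths: iterable like ['statsv/statsv', 'ocg/ocg', ...]
    yaml_text: raw contents of kubernetes.yaml."""
    candidates = set()
    for path in repo_paths:
        first_level = path.split('/', 1)[0]
        if first_level not in yaml_text:
            candidates.add(first_level)
    return sorted(candidates)
```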
@Dzahn That doesn't seem right: mediawiki-staging is the current main method of deploying MediaWiki, and httpbb-tests seems to be in active use (although I don't know how it is deployed): https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=httpbb
Both appear on puppet:
https://puppetboard.wikimedia.org/catalog/deploy1002.eqiad.wmnet
My suggestion is to start with those resources not appearing on puppet first.
Those that are old AND on puppet should be easier to clean up with a puppet ensure => absent.
This is a list of resources configured on puppet, but I am not sure if the list is exhaustive:
File[/srv/deployment/scap] from /etc/puppet/modules/git/manifests/clone.pp:184
File[/srv/deployment/httpbb-tests] from /etc/puppet/modules/httpbb/manifests/init.pp:13
File[/srv/deployment/imagecatalog] from /etc/puppet/modules/imagecatalog/manifests/init.pp:34
File[/srv/deployment/mediawiki-staging] from /etc/puppet/modules/profile/manifests/mediawiki/deployment/server.pp:122
@jcrespo You are correct. In that case I still don't understand what this ticket is really asking for: first I thought it was about the two deployment servers not being in sync (T309162#7958748), then I thought it was about kubernetes.yaml not being in sync with what is on the deployment servers. It doesn't seem to be either of those, though.
@hashar As the original requester (T307349#7895775), could you help clarify what's needed here?
The issue we had was comparing the state of the repositories between the two deployment servers. One of them had some repositories deleted. Each missing one got provisioned from scratch by Puppet using the tip of the branch, which showed up as different from the copy on the other deployment server, because those repositories had not been deployed in a long time (even though new commits kept entering). So it is something like:
Initially:
deploy1002 | deploy2002 |
---|---|
repo @ commit 1 | repo @ commit 1 |
obsoleterepo | obsoleterepo |
The repo got deleted and re-cloned on deploy1002:
deploy1002 | deploy2002 |
---|---|
repo @ commit 1 | repo @ commit 2 (newer) |
obsoleterepo | <missing repo> |
repo got re-cloned because it is still in Puppet, and the fresh clone is what showed up in the diff.
<missing repo> is not re-cloned because it was removed from Puppet but never manually garbage-collected from the deployment servers. That is what this task is about: remove repos from the deployment servers which are no longer defined in Puppet and no longer deployed via scap.
It is also possible that some repositories on the deployment servers should be kept even though they are not defined in Puppet.
Thanks for the explanation. So the harder part of this ticket is how to tell the difference: which repos have (ever) been in puppet (we can't just look at one specific date, I assume), and which repos were never in puppet but should be kept (and not added to puppet?), and why.
Currently I would not know how to identify those that should be kept.
Are these the mtimes of the parent directories, or of the newest file/commit inside those repos?
I don't know about all repos, but for statsv at least I don't think git-pull and scap commands change the mtime of the parent directory. If that's true for other repos as well, then these are effectively creation times, with "last modified" generally being equal to that; but that's not an indication that the repo is inactive.
/srv/deployment$ l
9 Dec 2014 statsv/

/srv/deployment/statsv/statsv$ l
30 Sep 2021 statsv.py

/srv/deployment/statsv/statsv$ git l -1
* (HEAD -> master, tag: scap/sync/2021-09-30) Add TLS support (14 Sep 2021)
[…]
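The claim about parent-directory mtimes can be checked directly: on POSIX filesystems a directory's mtime changes only when one of its direct entries is created, removed, or renamed, not when a file deeper in the tree is written. A minimal self-contained demonstration using a temp directory (not the actual repos):

```python
# Demonstrates that a directory's mtime only tracks its direct
# entries: writing to a file inside a subdirectory leaves the parent
# directory's mtime untouched. This is why /srv/deployment/<repo>
# mtimes behave like clone times when pulls only touch files deeper
# in the tree.
import os
import tempfile


def mtime_unchanged_by_deep_write(base):
    sub = os.path.join(base, 'repo')
    os.mkdir(sub)                       # this DOES bump base's mtime
    before = os.stat(base).st_mtime_ns
    with open(os.path.join(sub, 'file.txt'), 'w') as f:
        f.write('new content')          # deep write: base mtime untouched
    after = os.stat(base).st_mtime_ns
    return before == after


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as d:
        print(mtime_unchanged_by_deep_write(d))  # True on POSIX filesystems
```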
I didn't know Scap was used in Kubernetes, but most of these are in Puppet, with online servers receiving their code. As an example, performance/ contains performance/navtiming/, which was last deployed last week to webperf1003.
This one is definitely okay to remove. It hasn't been in production for a while and used to be owned between Perf and Analytics, since then replaced by EventStreams.
Not sure about the comparison but on deploy1002 we have roughly:
$ (cd /srv/deployment && find . -mindepth 3 -maxdepth 3 -type d -name .git -printf "%h\n"|sort)
find: ‘./.links2’: Permission denied
find: ‘./imagecatalog’: Permission denied
./3d2png/deploy
./airflow-dags/analytics
./airflow-dags/analytics_test
./airflow-dags/platform_eng
./airflow-dags/research
./analytics/refinery
./apache2modsec/apache2modsec
./cassandra/logstash-logback-encoder
./cassandra/metrics-collector
./cassandra/twcs
./changeprop/deploy
./citoid/deploy
./cp-jobqueue/cp-jobqueue
./cpjobqueue/deploy
./cxserver/deploy
./debmonitor/deploy
./design/style-guide
./docker-pkg/deploy
./dropwizard/metrics
./dumps/dumps
./elasticsearch/plugins
./electron-render/deploy
./eventlogging/analytics
./eventstreams/deploy
./fluoride/fluoride
./gerrit/gerrit
./gervert/deploy
./graphoid/deploy
./homer/deploy
./horizon/deploy
./iegreview/iegreview
./integration/docroot
./jobrunner/jobrunner
./jobrunner.old/jobrunner
./kartotherian/deploy
./librenms/librenms
./logstash/plugins
./mathoid/deploy
./mobileapps/deploy
./netbox/deploy
./ocg/ocg
./ores/deploy
./parsoid/config
./parsoid/deploy
./performance/arc-lamp
./performance/asoranking
./performance/coal
./performance/navtiming
./phabricator/deployment
./prometheus/jmx_exporter
./proton/deploy
./puppetboard/deploy
./rcstream/rcstream
./recommendation-api/deploy
./releng/phatality
./relforge/mjolnir
./restbase/deploy
./scholarships/scholarships
./search/airflow
./sentry/sentry
./servermon/servermon
./statsv/statsv
./striker/deploy
./tilerator/deploy
./trending-edits/deploy
./wdqs/wdqs
./zotero/translation-server
./zotero/translators
./zuul/deploy
From scap::sources in hieradata/role/common/deployment_server/kubernetes.yaml, using a basic script:
import yaml

with open('hieradata/role/common/deployment_server/kubernetes.yaml') as f:
    for source in yaml.safe_load(f).get('scap::sources'):
        print('./%s' % source)
If I diff that:

diff --color -U0 <(sort deployed.txt) <(python3 scaprepos.py|sort)
--- deployed.txt	2023-01-13 15:44:55.424107091 +0100
+++ inpuppet.txt	2023-01-13 15:44:55.424107091 +0100
@@ -5,0 +6 @@
+./analytics/hdfs-tools/deploy
@@ -7 +8,2 @@
-./apache2modsec/apache2modsec
+./analytics/superset/deploy
+./analytics/turnilo/deploy
@@ -9 +10,0 @@
-./cassandra/metrics-collector
@@ -11,5 +11,0 @@
-./changeprop/deploy
-./citoid/deploy
-./cp-jobqueue/cp-jobqueue
-./cpjobqueue/deploy
-./cxserver/deploy
@@ -19 +14,0 @@
-./dropwizard/metrics
@@ -21,2 +15,0 @@
-./elasticsearch/plugins
-./electron-render/deploy
@@ -25 +17,0 @@
-./fluoride/fluoride
@@ -28 +19,0 @@
-./graphoid/deploy
@@ -33,2 +23,0 @@
-./jobrunner/jobrunner
-./jobrunner.old/jobrunner
@@ -38,2 +26,0 @@
-./mathoid/deploy
-./mobileapps/deploy
@@ -41 +27,0 @@
-./ocg/ocg
@@ -43 +28,0 @@
-./parsoid/config
@@ -50,5 +34,0 @@
-./prometheus/jmx_exporter
-./proton/deploy
-./puppetboard/deploy
-./rcstream/rcstream
-./recommendation-api/deploy
@@ -56 +35,0 @@
-./relforge/mjolnir
@@ -58 +36,0 @@
-./scholarships/scholarships
@@ -60,2 +38 @@
-./sentry/sentry
-./servermon/servermon
+./search/mjolnir/deploy
@@ -63 +39,0 @@
-./striker/deploy
@@ -65 +40,0 @@
-./trending-edits/deploy
@@ -67,2 +42 @@
-./zotero/translation-server
-./zotero/translators
+./wikimedia/discovery/analytics
This was long forgotten. The problem is that when a Scap::Target is removed from Puppet, it is not necessarily cleaned up from the deployment server. We do reimage them from time to time, so eventually old repositories vanish. If I remember properly, the two deploy servers at the time had different ages, and thus one of them had more repositories than the other (because of the left-over repositories).
What we could potentially do is ask Puppet DB for the list of Scap::Target resources, extract the path of each repos and dump them in a file somewhere on the deployment server. Then a timer could crawl through /srv/deployment to find current git repositories on disk and an alert can be emitted when the lists differ asking for manual deletion.
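A sketch of the comparison step of that proposed timer; the PuppetDB query itself is left out, and the code assumes its result has already been dumped and reduced to a set of paths relative to /srv/deployment (all names here are hypothetical):

```python
# Sketch of the comparison step for the proposed timer: find git
# repositories on disk and diff them against the set of paths dumped
# from PuppetDB. The PuppetDB query itself is left out; `in_puppet`
# is assumed to be a set of paths relative to /srv/deployment.
import os


def find_git_repos(base):
    """Return the set of paths (relative to `base`) that contain a
    .git entry, mirroring the `find ... -name .git` approach above."""
    repos = set()
    for dirpath, dirnames, filenames in os.walk(base):
        if '.git' in dirnames or '.git' in filenames:
            repos.add(os.path.relpath(dirpath, base))
            dirnames[:] = []  # don't descend into the repo itself
    return repos


def compare(on_disk, in_puppet):
    """Return (stale, missing): repos on disk that Puppet no longer
    defines, and repos Puppet defines that are absent from disk."""
    return sorted(on_disk - in_puppet), sorted(in_puppet - on_disk)
```

The alert would then fire whenever either list is non-empty, prompting manual review rather than automatic deletion.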
Or we could make Puppet somehow fully manage /srv/deployment.