
Remove old scap repositories from deploy1002
Open, Low, Public

Description

As described in T307349#7895775, it would be nice to clean up old repositories on deploy1002 to avoid inconsistencies when comparing deploy1002 with deploy2002.

Event Timeline

The number of repositories under /srv/deployment is the same on deploy1002 and deploy2002 (as intended: the rsync uses --delete). The total size also looks the same.

This ticket suggests that this would not be the case, though?

Top ten oldest repos by modification time, oldest first:

Oct  9  2013 elasticsearch
Dec  9  2013 scholarships
Apr 18  2014 ocg
May 30  2014 rcstream
Sep  9  2014 servermon
Oct 17  2014 iegreview
Dec  9  2014 statsv
Feb 12  2015 cxserver
Mar  9  2015 zotero

(same on both servers)
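A minimal sketch of how such a per-server summary (repository count, total size, oldest top-level directories) could be produced, so that the results can be compared between deploy1002 and deploy2002. It assumes every top-level directory under the base path is one repository, and it uses each directory's own st_mtime, which may say more about when the directory was created than about when it was last deployed. summarize() is a hypothetical helper, not what was actually run here.

```python
import os
from pathlib import Path


def summarize(base: str, oldest_n: int = 10):
    """Return (repo_count, total_bytes, oldest) for the directories under base.

    Assumption: every top-level directory under base is one repository.
    `oldest` lists (mtime, name) of the top-level directories, oldest first.
    """
    root = Path(base)
    dirs = [p for p in root.iterdir() if p.is_dir()]
    total = 0
    for d in dirs:
        # os.walk silently skips directories it cannot read (its default
        # onerror=None), a bit like find reporting "Permission denied" and going on.
        for dirpath, _dirnames, filenames in os.walk(d):
            for name in filenames:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # file vanished or is unreadable
    oldest = sorted((d.stat().st_mtime, d.name) for d in dirs)[:oldest_n]
    return len(dirs), total, oldest
```

Calling summarize("/srv/deployment") on each server and comparing the returned tuples would show whether the rsync really keeps the two trees identical.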

List of repositories mentioned in hieradata/role/common/deployment_server/kubernetes.yaml:

(Some have a repository but also a "scap_repository" value?)

repository: analytics/statsv
repository: data-engineering/airflow-dags
repository: data-engineering/airflow-dags
repository: data-engineering/airflow-dags
repository: eventlogging
repository: iegreview
repository: integration/zuul/deploy
repository: labs/striker/deploy
repository: maps/kartotherian/deploy
repository: maps/tilerator/deploy
repository: mediawiki/services/eventstreams/deploy
repository: mediawiki/services/ores/deploy
repository: mediawiki/services/parsoid/deploy
repository: mediawiki/services/restbase/deploy
repository: openstack/horizon/deploy
repository: operations/docker-images/docker-pkg/deploy
repository: operations/dumps
repository: operations/software/cassandra-twcs
repository: operations/software/debmonitor/deploy
repository: operations/software/gerrit
repository: operations/software/gerrit/tools/gervert/deploy
repository: operations/software/homer/deploy
repository: operations/software/librenms
repository: operations/software/logstash-logback-encoder
repository: operations/software/logstash/plugins
repository: operations/software/netbox-deploy
repository: search/MjoLniR/deploy
repository: wikidata/query/deploy
repository: wikimedia/discovery/analytics
scap_repository: analytics/refinery/scap
scap_repository: data-engineering/airflow-dags-scap-analytics
scap_repository: data-engineering/airflow-dags-scap-analytics_test
scap_repository: data-engineering/airflow-dags-scap-research
scap_repository: eventlogging/scap/analytics
scap_repository: operations/dumps/scap

So does T307349#7895775 say all repos on (both) deployment servers should be deleted if they do not appear in the list above?

List of repos that exist on the deployment servers but do not appear in kubernetes.yaml. (This only uses the first-level component of each name: if that string does not appear anywhere in the YAML, the repo is a candidate.)

3d2png
apache2modsec
changeprop
citoid
cp-jobqueue
cpjobqueue
cxserver
design
dropwizard
elasticsearch
electron-render
fluoride
graphoid
httpbb
httpbb-tests
imagecatalog
jobrunner
jobrunner.old
mathoid
mediawiki-staging
mobileapps
modsec
ocg
performance
phabricator
prometheus
proton
puppetboard
rcstream
recommendation-api
releng
relforge
scholarships
sentry
servermon
trending-edits
wdqs
zotero
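The heuristic described above amounts to a substring test. A minimal sketch with a hypothetical candidates() helper; note that it cannot see repositories managed outside scap::sources, which is how entries such as mediawiki-staging and httpbb-tests end up in such a list, so its output is only a starting point for manual review.

```python
from typing import Iterable, List


def candidates(top_level_dirs: Iterable[str], yaml_text: str) -> List[str]:
    """Return on-disk top-level names that never occur anywhere in yaml_text.

    This is a plain substring test, so it errs on the side of keeping things:
    "zuul" is kept if "zuul" appears anywhere, even inside another value.
    It knows nothing about repositories managed outside scap::sources.
    """
    return sorted(name for name in set(top_level_dirs) if name not in yaml_text)
```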

@Dzahn That doesn't seem right. mediawiki-staging is the current main method of deploying MediaWiki, and httpbb-tests seems to be in active use (although I don't know how it is deployed): https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=httpbb

Both appear on puppet:

https://puppetboard.wikimedia.org/catalog/deploy1002.eqiad.wmnet

My suggestion is to start with the resources that do not appear in Puppet.

Those that are old AND in Puppet should be easier to clean up with ensure => absent.

This is a list of resources configured in Puppet, but I am not sure the list is exhaustive:

File[/srv/deployment/scap] from /etc/puppet/modules/git/manifests/clone.pp:184
File[/srv/deployment/httpbb-tests] from /etc/puppet/modules/httpbb/manifests/init.pp:13
File[/srv/deployment/imagecatalog] from /etc/puppet/modules/imagecatalog/manifests/init.pp:34
File[/srv/deployment/mediawiki-staging] from /etc/puppet/modules/profile/manifests/mediawiki/deployment/server.pp:122
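Following the suggestion above, the candidates can be split into those without a Puppet resource (clean up by hand first) and those with one (remove via ensure => absent). A minimal sketch that parses lines in the format shown above; the helper names are hypothetical, and since the resource list may not be exhaustive, anything in the first group still needs checking before deletion.

```python
import re
from typing import Iterable, List, Set, Tuple

# Matches lines such as:
# File[/srv/deployment/httpbb-tests] from /etc/puppet/modules/httpbb/manifests/init.pp:13
FILE_RE = re.compile(r"^File\[/srv/deployment/([^/\]]+)")


def puppet_managed(lines: Iterable[str]) -> Set[str]:
    """Top-level names under /srv/deployment that have a Puppet File resource."""
    names = set()
    for line in lines:
        m = FILE_RE.match(line.strip())
        if m:
            names.add(m.group(1))
    return names


def split_candidates(candidates: Iterable[str],
                     managed: Set[str]) -> Tuple[List[str], List[str]]:
    """Split into (not in Puppet: manual cleanup first, in Puppet: ensure => absent)."""
    unmanaged = sorted(c for c in candidates if c not in managed)
    in_puppet = sorted(c for c in candidates if c in managed)
    return unmanaged, in_puppet
```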

@jcrespo You are correct. In that case I still don't understand what this ticket is really asking for. First I thought it was about the two deployment servers not being in sync (T309162#7958748), then I thought it was about kubernetes.yaml not being in sync with what is on the deployment servers. It doesn't seem to be either of those, though.

LSobanski added subscribers: hashar, LSobanski.

@hashar As the original requester (T307349#7895775), could you help clarify what's needed here?

The issue we had was comparing the state of the repositories between the two deployment servers. One of them had some repositories deleted. The missing ones were provisioned from scratch by Puppet using the tip of the branch, which showed up as a difference with the copy on the other deployment server, because those repositories had not been deployed for a long time (even though new commits kept coming in). So it is something like:

Initially:

deploy1002          deploy2002
repo @ commit 1     repo @ commit 1
obsoleterepo        obsoleterepo

The repositories got deleted on deploy2002 and repo was cloned again:

deploy1002          deploy2002
repo @ commit 1     repo @ commit 2 (newer)
obsoleterepo        <missing repo>

repo got cloned because it is still in Puppet, and thus it showed up in the diff.

<missing repo> is not cloned because it got removed from Puppet, but it was never manually garbage-collected from the deployment servers. That is what this task is about: remove repos from the deployment servers which are no longer defined in Puppet and no longer deployed via scap.

It is also possible that some repositories on the deployment servers should be kept even though they are not defined in Puppet.
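The scenario above can be detected by comparing the checked-out commit of every repository on both servers. A minimal sketch, assuming the repo -> commit maps were collected beforehand on each host (for instance with git -C <path> rev-parse HEAD for each checkout); compare_heads() is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple


def compare_heads(a: Dict[str, str],
                  b: Dict[str, str]) -> List[Tuple[str, str, Optional[str], Optional[str]]]:
    """Compare repo -> HEAD commit maps from two deployment servers.

    Returns (repo, status, head_a, head_b) sorted by repo, where status is
    "same", "differs" (e.g. one side was re-cloned from the branch tip) or
    "missing" (present on one server only: a garbage-collection candidate).
    """
    rows = []
    for repo in sorted(set(a) | set(b)):
        ha, hb = a.get(repo), b.get(repo)
        if ha is None or hb is None:
            status = "missing"
        elif ha == hb:
            status = "same"
        else:
            status = "differs"
        rows.append((repo, status, ha, hb))
    return rows
```

With the data from the example above, obsoleterepo comes out as "missing" and repo as "differs".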

That is what this task is about: remove repos from the deployment servers which are no longer defined in Puppet and no longer deployed via scap
It is also possible that some repositories on the deployment servers should be kept even though they are not defined in Puppet.

Thanks for the explanation. So the harder part of this ticket is how to tell the difference: which repos have (ever) been in Puppet (we can't just look at one specific date, I assume), which repos are among those that "were never in Puppet but should be kept" (and not added to Puppet?), and why.

Currently I would not know how to identify those that should be kept.

Top ten oldest repos by modification time, oldest first:

May 30  2014 rcstream
Dec  9  2014 statsv
[…]

Are these mtime of the parent directory or the newest file/commit inside those repos?

I don't know about all repos, but for statsv at least, I don't think git-pull and scap commands change the mtime of the parent directory. If that's true for other repos as well, then these are effectively creation times. "Last modified" is generally equal to that, but it is no indication that the repo is inactive.

deploy1002
/srv/deployment$ l
9 Dec 2014 statsv/
/srv/deployment/statsv/statsv$ l
30 Sep 2021 statsv.py
/srv/deployment/statsv/statsv$ git l -1
* (HEAD -> master, tag: scap/sync/2021-09-30) Add TLS support (14 Sep 2021)
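If the top-level mtime is effectively a creation time, a better activity signal is the newest mtime inside the checkout (statsv.py above is from 30 Sep 2021), or the date of the last commit (git log -1 --format=%ct). A minimal sketch of the former with a hypothetical newest_mtime() helper; it skips .git by default, which is a judgment call, since a fetch alone already updates files in there.

```python
import os


def newest_mtime(repo: str, skip_git: bool = True) -> float:
    """Most recent st_mtime of any file or directory inside repo.

    Unlike the mtime of the top-level directory, this changes when a
    deployment rewrites files. With skip_git=True the .git directory is
    ignored, so a plain fetch without checkout is not counted.
    """
    newest = os.stat(repo).st_mtime
    for dirpath, dirnames, filenames in os.walk(repo):
        if skip_git and ".git" in dirnames:
            dirnames.remove(".git")  # prune: os.walk will not descend into it
        for name in dirnames + filenames:
            try:
                newest = max(newest, os.lstat(os.path.join(dirpath, name)).st_mtime)
            except OSError:
                pass  # entry vanished or is unreadable
    return newest
```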

list of repos that exist on deployment servers but do not appear in the kubernetes.yaml. […]

performance

[…]

I didn't know Scap was used in Kubernetes, but most of these are in Puppet, with online servers receiving their code. For example, performance/ contains performance/navtiming/, which was last deployed last week to webperf1003.

May 30  2014 rcstream
[…]

This one is definitely okay to remove. It hasn't been in production for a while and used to be co-owned by Perf and Analytics; it has since been replaced by EventStreams.

I don't remember exactly, but most likely find -mtime, yea. ACK!

Not sure about the comparison but on deploy1002 we have roughly:

$ (cd /srv/deployment && find . -mindepth 3 -maxdepth 3 -type d -name .git -printf "%h\n"|sort)
find: ‘./.links2’: Permission denied
find: ‘./imagecatalog’: Permission denied
./3d2png/deploy
./airflow-dags/analytics
./airflow-dags/analytics_test
./airflow-dags/platform_eng
./airflow-dags/research
./analytics/refinery
./apache2modsec/apache2modsec
./cassandra/logstash-logback-encoder
./cassandra/metrics-collector
./cassandra/twcs
./changeprop/deploy
./citoid/deploy
./cp-jobqueue/cp-jobqueue
./cpjobqueue/deploy
./cxserver/deploy
./debmonitor/deploy
./design/style-guide
./docker-pkg/deploy
./dropwizard/metrics
./dumps/dumps
./elasticsearch/plugins
./electron-render/deploy
./eventlogging/analytics
./eventstreams/deploy
./fluoride/fluoride
./gerrit/gerrit
./gervert/deploy
./graphoid/deploy
./homer/deploy
./horizon/deploy
./iegreview/iegreview
./integration/docroot
./jobrunner/jobrunner
./jobrunner.old/jobrunner
./kartotherian/deploy
./librenms/librenms
./logstash/plugins
./mathoid/deploy
./mobileapps/deploy
./netbox/deploy
./ocg/ocg
./ores/deploy
./parsoid/config
./parsoid/deploy
./performance/arc-lamp
./performance/asoranking
./performance/coal
./performance/navtiming
./phabricator/deployment
./prometheus/jmx_exporter
./proton/deploy
./puppetboard/deploy
./rcstream/rcstream
./recommendation-api/deploy
./releng/phatality
./relforge/mjolnir
./restbase/deploy
./scholarships/scholarships
./search/airflow
./sentry/sentry
./servermon/servermon
./statsv/statsv
./striker/deploy
./tilerator/deploy
./trending-edits/deploy
./wdqs/wdqs
./zotero/translation-server
./zotero/translators
./zuul/deploy
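For reference, a rough Python equivalent of the find command above, for use in a script rather than a one-off shell pipeline. A minimal sketch with a hypothetical find_repos() helper: like -type d it ignores .git files, and unreadable directories such as ./imagecatalog should simply be skipped by pathlib's glob rather than reported. Ordering may differ slightly from sort(1) under a non-C locale.

```python
from pathlib import Path
from typing import List


def find_repos(base: str) -> List[str]:
    """Rough equivalent of, run from inside base:
    find . -mindepth 3 -maxdepth 3 -type d -name .git -printf "%h\\n" | sort
    Returns "./<group>/<repo>" for every <group>/<repo>/.git directory.
    """
    root = Path(base)
    found = []
    for git_dir in root.glob("*/*/.git"):
        # -type d: skip .git *files* (as used by worktrees and submodules)
        if git_dir.is_dir():
            found.append("./" + git_dir.parent.relative_to(root).as_posix())
    return sorted(found)
```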

From hieradata/role/common/deployment_server/kubernetes.yaml scap::sources using a basic script:

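scaprepos.py
import yaml

# Print each scap::sources key (the target directory under /srv/deployment)
# in the same "./group/repo" form as the find output above.
with open('hieradata/role/common/deployment_server/kubernetes.yaml') as f:
    for source in yaml.safe_load(f).get('scap::sources', {}):
        print('./%s' % source)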

If I diff that with diff --color -U0 <(sort deployed.txt) <(python3 scaprepos.py|sort):

--- deployed.txt	2023-01-13 15:44:55.424107091 +0100
+++ inpuppet.txt	2023-01-13 15:44:55.424107091 +0100
@@ -5,0 +6 @@
+./analytics/hdfs-tools/deploy
@@ -7 +8,2 @@
-./apache2modsec/apache2modsec
+./analytics/superset/deploy
+./analytics/turnilo/deploy
@@ -9 +10,0 @@
-./cassandra/metrics-collector
@@ -11,5 +11,0 @@
-./changeprop/deploy
-./citoid/deploy
-./cp-jobqueue/cp-jobqueue
-./cpjobqueue/deploy
-./cxserver/deploy
@@ -19 +14,0 @@
-./dropwizard/metrics
@@ -21,2 +15,0 @@
-./elasticsearch/plugins
-./electron-render/deploy
@@ -25 +17,0 @@
-./fluoride/fluoride
@@ -28 +19,0 @@
-./graphoid/deploy
@@ -33,2 +23,0 @@
-./jobrunner/jobrunner
-./jobrunner.old/jobrunner
@@ -38,2 +26,0 @@
-./mathoid/deploy
-./mobileapps/deploy
@@ -41 +27,0 @@
-./ocg/ocg
@@ -43 +28,0 @@
-./parsoid/config
@@ -50,5 +34,0 @@
-./prometheus/jmx_exporter
-./proton/deploy
-./puppetboard/deploy
-./rcstream/rcstream
-./recommendation-api/deploy
@@ -56 +35,0 @@
-./relforge/mjolnir
@@ -58 +36,0 @@
-./scholarships/scholarships
@@ -60,2 +38 @@
-./sentry/sentry
-./servermon/servermon
+./search/mjolnir/deploy
@@ -63 +39,0 @@
-./striker/deploy
@@ -65 +40,0 @@
-./trending-edits/deploy
@@ -67,2 +42 @@
-./zotero/translation-server
-./zotero/translators
+./wikimedia/discovery/analytics
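Since the line numbers in a -U0 diff are mostly noise here, a plain set difference gives the same two lists more directly. A minimal sketch with a hypothetical compare() helper: entries only on disk are the cleanup candidates ("-" lines), entries only in Puppet have no checkout ("+" lines). A repository that moved to a new target directory (possibly ./relforge/mjolnir vs ./search/mjolnir/deploy, though that is a guess) will appear on both sides.

```python
from typing import Iterable, List, Tuple


def compare(on_disk: Iterable[str],
            in_puppet: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (only_on_disk, only_in_puppet) as sorted lists.

    Paths are normalised so "./a/b" and "a/b/" compare equal; blank lines are ignored.
    """
    def norm(p: str) -> str:
        p = p.strip()
        if p.startswith("./"):
            p = p[2:]
        return p.strip("/")

    disk = {norm(p) for p in on_disk if p.strip()}
    puppet = {norm(p) for p in in_puppet if p.strip()}
    return sorted(disk - puppet), sorted(puppet - disk)
```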
LSobanski lowered the priority of this task from Medium to Low. Jan 17 2023, 3:53 PM
LSobanski moved this task from Incoming to Backlog on the collaboration-services board.

This was long forgotten. The problem is that when a Scap::Target is removed from Puppet, it is not necessarily cleaned up from the deployment server. We do reimage them from time to time, so eventually old repositories vanish. If I remember properly, the two deploy servers at the time had different ages, and thus one of them had more repositories than the other (because of the leftover repositories).

What we could potentially do is ask PuppetDB for the list of Scap::Target resources, extract the path of each repo, and dump them in a file somewhere on the deployment server. Then a timer could crawl through /srv/deployment to find the git repositories currently on disk and emit an alert asking for manual deletion when the lists differ.
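A minimal sketch of the timer side of that idea, written as an Icinga/Nagios-style check (exit code 0 = OK, 1 = WARNING, 3 = UNKNOWN). It assumes the PuppetDB export has already written one relative path per line to some file; that format, the file location and the helper names are all hypothetical, and the PuppetDB query itself is left out because I have not checked which Scap::Target parameter carries the path. Only leftovers are reported, since Puppet re-clones missing repositories on its own.

```python
from pathlib import Path
from typing import Set, Tuple

# Icinga/Nagios plugin exit codes.
OK, WARNING, UNKNOWN = 0, 1, 3


def load_expected(path: str) -> Set[str]:
    """Read the expected list (hypothetical format: one relative path per line, # comments)."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines
            if line.strip() and not line.startswith("#")}


def on_disk(base: str) -> Set[str]:
    """Relative paths of <group>/<repo> git checkouts under base."""
    root = Path(base)
    return {g.parent.relative_to(root).as_posix()
            for g in root.glob("*/*/.git") if g.is_dir()}


def check(expected_file: str, base: str) -> Tuple[int, str]:
    """Return (exit_code, message) comparing the checkouts on disk with the expected list."""
    try:
        expected = load_expected(expected_file)
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN: cannot read {expected_file}: {exc}"
    extra = sorted(on_disk(base) - expected)
    if extra:
        return WARNING, ("WARNING: repositories not defined in Puppet, "
                         "delete manually: " + ", ".join(extra))
    return OK, f"OK: {base} matches the expected list"
```

A wrapper run by the timer would then print the message and exit with the code, e.g. code, msg = check(expected_file, "/srv/deployment"); print(msg); sys.exit(code).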

Or we could make Puppet somehow fully manage /srv/deployment.