Figure out if the WDCM repository on stats1007 is actually used
Closed, Resolved (Public)

Description

There is a Puppet file to clone a Wikidata Concept Monitor repository to a stat100* host: wdcm.pp

It is unclear if that cloned repo is actually in use. It seems to come from an overall abandoned initiative: T171258: WDCM: Puppetization

Also, all the related cron-jobs that we know about seem to run against a personal user directory, not something related to the analytics-wmde user?

Then again, I do not yet understand how/where wdcm.pp is invoked and with what arguments, particularly for $dir.

Acceptance criteria:

  • We know if the code cloned in wdcm.pp is actually used or whether it can be deleted without any further effects at all.

Event Timeline

The clone path seems to resolve to /srv/analytics-wmde/wdcm/src/:

lucaswerkmeister-wmde@stat1007:~$ sudo -u analytics-wmde git -C /srv/analytics-wmde/wdcm/src/ remote -v
origin	https://gerrit.wikimedia.org/r/analytics/wmde/WDCM (fetch)
origin	https://gerrit.wikimedia.org/r/analytics/wmde/WDCM (push)
lucaswerkmeister-wmde@stat1007:~$ sudo -u analytics-wmde git -C /srv/analytics-wmde/wdcm/src/ show -U0
commit aa40182c322410584e91b93b9327adc017a218e9 (HEAD -> master, origin/master, origin/HEAD)
Author: GoranMilovanovic <goran.s.milovanovic@gmail.com>
Date:   Thu Feb 8 20:08:09 2018 +0100

    Minor
    
    Change-Id: I5329a17b7b82b88958945f90b804195838e536b7

diff --git a/wdcmStructure_Update.R b/wdcmStructure_Update.R
index dcf4b79..1d2b102 100644
--- a/wdcmStructure_Update.R
+++ b/wdcmStructure_Update.R
@@ -291,0 +292,9 @@ write.csv(myWD$counts, "wdcmStructure_Counts.csv")
+### --- updateReport File
+updateReport <- as.character(Sys.time())
+upY <- substr(updateReport, 1, 4)
+upM <- as.numeric(substr(updateReport, 6, 7))
+upM <- month.name[upM]
+upD <- substr(updateReport, 9, 10)
+updateReport <- paste0(upY, " ", upM, " ", upD)
+write(updateReport, "updateReport.txt")
+

Note that this commit is 23 commits behind the current master, if I’m not mistaken. But given that the repository thinks it’s up to date with origin/master, it seems the last git fetch happened somewhere between February 2018 (the commit shown above) and June 2018 (the first commit after that). I assume that’s because the Puppet spec has ensure => 'present' for the git clone, rather than ensure => 'latest' as seen in e.g. graphite.pp (which clones the analytics/wmde/scripts repo); the TODO below would agree with that:

git::clone { 'analytics/wmde/WDCM':
    # TODO do we want a similar latest & production branch here? Or just manually pulling? scap?
    # Currently when we update the code in the repo we will have to pull the updates ourselves.
    ensure    => 'present',

Given that one of the missing commits is called “Security patch”, I sure hope this outdated code never actually runs…
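
For reference, the staleness described above can be confirmed without running git fetch, so the clone itself stays untouched. The following is only a sketch: the function name remote_ref_status is made up for illustration, and it assumes the remote is called origin and the branch master, as in the output above.

```shell
# Compare the local remote-tracking ref with what the remote advertises right now.
# Nothing is fetched, so the clone is not modified.
# remote_ref_status is a hypothetical helper name, not part of any existing tooling.
remote_ref_status() {
    repo=$1
    remote=${2:-origin}
    branch=${3:-master}
    # What the clone believes the remote branch points to (set by the last fetch).
    local_sha=$(git -C "$repo" rev-parse "refs/remotes/$remote/$branch") || return 2
    # What the remote actually points to at this moment.
    remote_sha=$(git -C "$repo" ls-remote "$remote" "refs/heads/$branch" | cut -f1)
    [ -n "$remote_sha" ] || return 2
    if [ "$local_sha" = "$remote_sha" ]; then
        echo "up-to-date"
    else
        echo "stale"
    fi
}

# Usage on the host (needs read access to the clone and network access to the remote):
#   sudo -u analytics-wmde sh -c '. ./remote_ref_status.sh; remote_ref_status /srv/analytics-wmde/wdcm/src/'
```

If this prints stale, the remote has moved since the last fetch, which would match the picture of a clone that was created once and never updated.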

None of the systemd timers seem to do anything with WDCM:

lucaswerkmeister-wmde@stat1007:~$ systemctl cat -- $(systemctl list-timers --no-legend --full | awk '{print $NF}') | grep -i wdcm | wc -l
0

(awk '{print $NF}' selects the last field of each line of the systemctl list-timers output, which is the name of the activated service, so this searches through all the timer-activated services.)
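
To illustrate what the awk step does, here is a small self-contained sketch. The two sample lines are invented and only mimic the column layout of systemctl list-timers --no-legend --full; the helper names sample_timers, last_field and count_matches are hypothetical.

```shell
# Invented sample resembling `systemctl list-timers --no-legend --full` output.
# The unit names are placeholders, not taken from stat1007.
sample_timers() {
cat <<'EOF'
Mon 2023-11-13 00:00:00 UTC 5h left Sun 2023-11-12 00:00:00 UTC 18h ago logrotate.timer logrotate.service
Mon 2023-11-13 06:10:00 UTC 11h left Sun 2023-11-12 06:10:00 UTC 12h ago example-report.timer example-report.service
EOF
}

# Print the last whitespace-separated field of every input line.
last_field() {
    awk '{print $NF}'
}

# Count input lines matching the pattern, case-insensitively.
# grep exits non-zero when nothing matches, hence the `|| true`.
count_matches() {
    grep -ic -- "$1" || true
}

sample_timers | last_field
sample_timers | last_field | count_matches wdcm
```

On the real host the service names are then passed to systemctl cat, so the search covers the full unit definitions rather than just the unit names.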

Unfortunately, it looks like I can’t check analytics-wmde’s crontab:

lucaswerkmeister-wmde@stat1007:~$ sudo -u analytics-wmde crontab -l
crontabs/analytics-wmde/: fopen: Permission denied

Also, all the related cron-jobs that we know about seem to run against a personal user directory, not something related to the analytics-wmde user?

If we trust that this spreadsheet is complete, I think it’s almost certain that this repository isn’t used (now that we know what its path is, and can see that the path doesn’t show up in the spreadsheet); otherwise I guess we could ask someone with more permissions to check some more crontabs?
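
The check against the spreadsheet can also be done mechanically. This is a rough sketch that assumes the spreadsheet has been exported to a text file such as CSV; the file name cronjobs.csv and the function name path_mentioned are hypothetical.

```shell
# Exit status 0 if the fixed string $1 occurs anywhere in file $2, non-zero otherwise.
# -F treats the path as a literal string rather than a regular expression.
path_mentioned() {
    grep -Fq -- "$1" "$2"
}

# Usage (cronjobs.csv is a hypothetical export of the spreadsheet):
#   if path_mentioned /srv/analytics-wmde/wdcm cronjobs.csv; then
#       echo "path is referenced"
#   else
#       echo "path is not referenced"
#   fi
```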

(The other analytics clients – stat1004, stat1005, stat1006, stat1008, stat1009 – don’t have a /srv/analytics-wmde directory at all, so I think this repository is limited to stat1007.)

Thank you! This looks good to me. I'm moving it forward to Product Verification. @Manuel, @AndrewTavis_WMDE does one of you happen to have access to the analytics-wmde user so that you can check sudo -u analytics-wmde crontab -l on stat1007?

Just checked in with @Michael on this and we did a check of my ability to run sudo -u analytics-wmde crontab -l on stat1007. I sadly don't have access :(

Hi @JAllemandou, we would need to run sudo -u analytics-wmde crontab -l on stat1007 to verify that we can deprecate something that looks like a forgotten clone of some of the code that we last looked at together (for the Spark 3 migration). Could you please help us out and run the command for us, or refer us to someone that can?

Thanks for reaching out @JAllemandou
Here's the response from the command:

sudo crontab -u analytics-wmde -l
# HEADER: This file was autogenerated at 2021-03-23 17:50:52 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.

Thank you, @Stevemunene and @JAllemandou!

@Michael: Did this help to come to a conclusion?

Yes, it helps! We now know that there are no additional cronjobs running from this user (or related to WMDE in general). Which means that we should be able to safely remove that clone of the WDCM repo.
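
As a side note, the conclusion that the crontab contains only Puppet header comments can be double-checked by counting the lines that are neither blank nor comments. This is a minimal sketch; the function name count_active_cron_lines is made up, and environment assignments such as MAILTO=... would also be counted as active lines.

```shell
# Count crontab lines read from stdin that are neither blank nor comments.
# grep exits non-zero when the count is 0, hence the `|| true`.
count_active_cron_lines() {
    grep -Evc '^[[:space:]]*(#|$)' || true
}

# Usage on the host (requires the same privileges as the command above):
#   sudo crontab -u analytics-wmde -l | count_active_cron_lines
```

A result of 0 means the crontab defines no jobs at all.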

Manuel claimed this task.

Great, thank you all! I just opened a task to remove the clone: T351072: Remove the WDCM clone (stats1007)