
Cronspam from terbium
Closed, Duplicate (Public)

Description

After @demon's patch to fix MW maintenance scripts (https://gerrit.wikimedia.org/r/#/c/309616/) we still receive emails from terbium, but this time for credential issues:

Cron <www-data@terbium> /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/echo.dblist extensions/Echo/maintenance/processEchoEmailBatch.php >/dev/null
DB connection error: Access denied for user 'wikiadmin'@'10.64.32.13' (using password: YES) (208.80.153.14)
Cron <www-data@terbium> /usr/local/bin/foreachwiki maintenance/cleanupUploadStash.php > /dev/null
DB connection error: Access denied for user 'wikiadmin'@'10.64.32.13' (using password: YES) (208.80.153.14)

labtestweb2001.wikimedia.org. == 208.80.153.14

Why does terbium need to access labtestweb2001.wikimedia.org as wikiadmin?
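As background on why these mails arrive at all despite the > /dev/null: > redirects only stdout (fd 1), while cron mails anything a job leaves on stderr (fd 2), which is where the DB errors above go. A minimal sketch, with fake_job as a made-up stand-in for the foreachwiki wrappers (not the real scripts):

```shell
# fake_job stands in for the real maintenance wrappers, which print
# per-wiki progress to stdout and errors (like the DB message) to stderr.
fake_job() {
  echo "per-wiki progress"                                   # stdout (fd 1)
  echo "DB connection error: Access denied (simulated)" >&2  # stderr (fd 2)
}

# Same redirection as the crontab entries: stdout is discarded, but stderr
# leaks through, and cron would mail it.
leftover=$( { fake_job >/dev/null; } 2>&1 )

# Redirecting stderr as well leaves nothing for cron to mail.
leftover2=$( { fake_job >/dev/null 2>/dev/null; } 2>&1 )
```

Volans' later comment on this task suggests exactly this kind of fix: redirecting stderr in the cron entries or in the wrapper script.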

Event Timeline

elukey created this task. Sep 12 2016, 6:05 AM
Restricted Application added a subscriber: Aklapper. Sep 12 2016, 6:05 AM

I am not sure wikitech should be reachable by terbium maintenance, much less by a production credential user like wikiadmin. Labswiki is not a production-core wiki, and it is not in the production network.

elukey updated the task description. Sep 16 2016, 7:39 AM
elukey added a subscriber: Krenair. Edited Sep 16 2016, 7:46 AM
elukey added a subscriber: thcipriani.

Added both Tyler and Alex to get their thoughts about this issue :)

labtestweb2001 should be treated like silver here. terbium needs access to those so it can run non-essential maintenance jobs for the wikis that usually run there.

@Andrew @dpatrick do you think it is wise that labtestweb2001 has production passwords?

Which production passwords are you referring to?

> Which production passwords are you referring to?

> I am not sure wikitech should be reachable by terbium maintenance, much less by a production credential user like wikiadmin. Labswiki is not a production-core wiki, and it is not in the production network.

Krenair added a comment. Edited Sep 20 2016, 12:47 PM

Oh, so you want to change the puppet manifests around so silver/labtestweb2001 can run those jobs for itself, and then change the mysql password and have it use a different one of those too?

> I am not sure wikitech should be reachable by terbium maintenance, much less by a production credential user like wikiadmin. Labswiki is not a production-core wiki, and it is not in the production network.

Yes, a shared wikiadmin credential. I do not mind accessing the server from core. It is sharing credentials that worries me (and why it fails now, as the credential is not shared).

You know those credentials are pushed to every (wmnet) MW server by scap, right?

For the 'why now' part: could it be that the script previously stopped before reaching this point? IIRC the emails started right after https://gerrit.wikimedia.org/r/#/c/309616/ was merged.

> Oh, so you want to change the puppet manifests around so silver can run those jobs for itself, and then change the mysql password and have it use a different one of those too?

I do not have a solid proposal, but that would work, yes. I would like Andrew (or you?) to have a say on the overall architecture so that it is as easy as possible but also as safe (isolated) as intended. Maybe even disable those jobs directly, if they are not essential.

The script used to break before reaching this point, https://gerrit.wikimedia.org/r/#/c/309616/ fixed that. Now it's just broken by something that we already got fixed for labswiki, just not (yet?) labtestwiki

> The script used to break before reaching this point, https://gerrit.wikimedia.org/r/#/c/309616/ fixed that. Now it's just broken by something that we already got fixed for labswiki, just not (yet?) labtestwiki

It is the password (account), but we do not share passwords just because there is an error; especially when they are equivalent to a root password for the databases, on a non-production host called "labtestwiki".

Krenair added a subscriber: bd808. Sep 20 2016, 1:15 PM

Off the top of my head, the difference between this server and silver, beyond that it lives in codfw and has 'test' in its name, is pretty much just that it also runs services that would live on californium, and has me and @bd808 as roots. Is that worth making a security distinction over?

> Oh, so you want to change the puppet manifests around so silver can run those jobs for itself, and then change the mysql password and have it use a different one of those too?

> I do not have a solid proposal, but that would work, yes. I would like Andrew (or you?) to have a say on the overall architecture so that it is as easy as possible but also as safe (isolated) as intended. Maybe even disable those jobs directly, if they are not essential.

We could probably figure out how to get crons running on silver/labtestweb2001 itself, that would make the terbium grants unnecessary.

We're not going to disable the jobs altogether, they do some things like updating links and pages around the site that users expect to happen.

However, there is a wider issue with isolation of silver/labtestweb2001. Although things are firewalled etc., they are still mediawiki hosts, so they still receive the full mediawiki private repository, containing not just the mysql credentials (which you want to special-case on these hosts). It contains various other things which you may also not want silver/labtestweb2001 to see, but this is not a private ticket, so maybe let's avoid that subject here.
We could change the private repository so it tells MediaWiki to use different MySQL passwords for those isolated wikitech DBs - but if they still get the core production secrets on the filesystem readable by the web server, I honestly question whether there's any point.
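The separate-passwords idea could look roughly like this on the database side. This is a sketch only: the account name wikitechadmin, the host pattern, and the grant scope are all assumptions for illustration, not existing grants.

```shell
# Illustrative only: give the wikitech/labtest DBs their own MySQL account so
# the shared production 'wikiadmin' password never needs to be valid there.
# Account name, host pattern, and database names are assumptions.
mysql -e "
  CREATE USER 'wikitechadmin'@'10.64.32.%' IDENTIFIED BY 'REDACTED';
  GRANT ALL PRIVILEGES ON labswiki.* TO 'wikitechadmin'@'10.64.32.%';
  GRANT ALL PRIVILEGES ON labtestwiki.* TO 'wikitechadmin'@'10.64.32.%';
"
```

As the comment above notes, this only helps if the hosts do not also keep the core production secrets readable on the filesystem.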

I honestly would accept no separation if the host was called something other than labtestweb*: labswiki2, silver2, wikitech2, mw2300; anything that does not imply "test" or "labs VM". With the current name we could grant access to someone by accident, because "it is just a test machine", "it is not important". I hate the name, and it causes 90% of the confusion for me.

> I honestly would accept no separation if the host was called something other than labtestweb*: labswiki2, silver2, wikitech2, mw2300; anything that does not imply "test" or "labs VM". With the current name we could grant access to someone by accident, because "it is just a test machine", "it is not important". I hate the name, and it causes 90% of the confusion for me.

If we're going to separate things (and I am willing to support that, if we can figure out how to implement in a useful way) it needs to be because of actual security reasons, not just the machine's host name. Both groups with access (deployment and labtest-roots) have sudo rules and so go through ops meeting review (the same cannot be said for some far more sensitive things like reading certain analytics databases). I'm confident the operations team is far more knowledgeable than that, and am willing to trust them (not that I have a choice, but regardless). The name doesn't imply it's a labs VM, this is labs-support-level stuff. It answers to production's puppet and salt masters and is very close to being in the production puppet realm.

Volans added a subscriber: Volans. Nov 22 2016, 10:07 AM

Additional cronspam from the same script with different message:

Set $wgShowExceptionDetails = true; in LocalSettings.php to show detailed debugging information.

The cron is:

### from email subject:
Cron <www-data@terbium> /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/echo.dblist extensions/Echo/maintenance/processEchoEmailBatch.php >/dev/null

### from terbium crontab:
# Puppet Name: echo_mail_batch
0 0 * * * /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/echo.dblist extensions/Echo/maintenance/processEchoEmailBatch.php >/dev/null

Also a different one was triggered:

### From email subject:
Cron <www-data@terbium> /usr/local/bin/foreachwiki maintenance/cleanupUploadStash.php > /dev/null

### From terbium crontab:
# Puppet Name: cleanup_upload_stash
0 1 * * * /usr/local/bin/foreachwiki maintenance/cleanupUploadStash.php > /dev/null

The output in this case is a rather large list of nested arrays with backend-fail-delete errors, containing Swift addresses like mwstore://local-swift-codfw/local-temp/thumb..... for both eqiad and codfw.

To avoid the cronspam, stderr should be redirected to a file or to /dev/null, depending on whether it is needed. This could be done directly in the cron entries or, better, in the wrapper script that is executed.
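Concretely, the suggestion above could take either of the following forms for the Echo entry. Only the redirections differ from the crontab lines quoted earlier; the log path is an assumption, not an existing file.

```shell
# Variant 1: discard stderr too (silences the cron mail entirely):
0 0 * * * /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/echo.dblist extensions/Echo/maintenance/processEchoEmailBatch.php >/dev/null 2>&1

# Variant 2: keep the errors, but append them to a log file instead of mail
# (the path below is an assumed location):
0 0 * * * /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/echo.dblist extensions/Echo/maintenance/processEchoEmailBatch.php >/dev/null 2>>/var/log/mediawiki/echo_mail_batch.log
```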

Peachey88 added a subscriber: Peachey88. Edited Nov 22 2016, 10:16 AM

> Additional cronspam from the same script with different message:

not related, other breakage T148957: $wgShowExceptionDetails = true apparently is broken

Well, "not related" as in the message; the true error that should be getting shown could be related.

Since Feb. 19th we're getting one email every day from terbium with an error for each wiki (a ~900-line email):

The following extensions are required to be installed for this script to run: PageAssessments. Please enable them and then try again.

From the www-data crontab, installed by the mediawiki::maintenance::pageassessments puppet class:

# Puppet Name: pageassessments_cleanup
42 20 * * * /usr/local/bin/foreachwiki extensions/PageAssessments/maintenance/purgeUnusedProjects.php > /dev/null
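One way to stop this particular spam, mirroring how the Echo job already scopes itself with foreachwikiindblist, would be to run the script only on wikis that actually have the extension. This is a sketch: the dblist name below is an assumption, and no such file is confirmed to exist.

```shell
# Hypothetical crontab entry: restrict the run to wikis with PageAssessments
# installed (pageassessments.dblist is an assumed dblist name).
42 20 * * * /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/pageassessments.dblist extensions/PageAssessments/maintenance/purgeUnusedProjects.php > /dev/null
```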

Adding MediaWiki-extensions-PageAssessments

Restricted Application added a project: Community-Tech. Apr 4 2018, 6:58 PM