
Error: Sysctl::Parameters[wikimedia base]: Could not evaluate: can't dup Symbol on deployment-pdf01
Closed, Resolved, Public

Description

Been around for ages now.

Error: Sysctl::Parameters[wikimedia base]: Could not evaluate: can't dup Symbol
Notice: instanceproject: deployment-prep
Notice: /Stage[main]/Role::Labs::Instance/Notify[instanceproject: deployment-prep]/message: defined 'message' as 'instanceproject: deployment-prep'
Notice: /Stage[main]/Base::Sysctl/Sysctl::Parameters[wikimedia base]/Sysctl::Conffile[wikimedia base]/File[/etc/sysctl.d/60-wikimedia-base.conf]: Dependency Sysctl::Parameters[wikimedia base] has failures: true
Warning: /Stage[main]/Base::Sysctl/Sysctl::Parameters[wikimedia base]/Sysctl::Conffile[wikimedia base]/File[/etc/sysctl.d/60-wikimedia-base.conf]: Skipping because of failed dependencies
Notice: /Stage[main]/Ocg/File[/var/log/ocg]: Not removing directory; use 'force' to override
Notice: /Stage[main]/Ocg/File[/var/log/ocg]: Not removing directory; use 'force' to override
Error: Could not remove existing file
Error: /Stage[main]/Ocg/File[/var/log/ocg]/ensure: change from directory to link failed: Could not remove existing file
Notice: /Stage[main]/Lvs::Realserver/Exec[/usr/sbin/dpkg-reconfigure -p critical -f noninteractive wikimedia-lvs-realserver]: Triggered 'refresh' from 1 events
Notice: /Stage[main]/Sysctl/Exec[update_sysctl]: Dependency Sysctl::Parameters[wikimedia base] has failures: true
Warning: /Stage[main]/Sysctl/Exec[update_sysctl]: Skipping because of failed dependencies
Notice: Finished catalog run in 17.65 seconds

Event Timeline

yuvipanda raised the priority of this task from to Needs Triage.
yuvipanda updated the task description.
yuvipanda subscribed.

The issue does not seem to be present on pdf02, which has the exact same set of classes. I even verified that /var/lib/puppet/state/classes.txt, once sorted, has the same MD5 on both. I am inclined to say "just replace the pdf01 instance with the pdf02 instance".
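The sorted-MD5 comparison of classes.txt can be sketched in Python as below. This is a minimal illustration, assuming you have copied each instance's /var/lib/puppet/state/classes.txt to local files; the helper names `sorted_md5` and `same_classes` and the local file names are hypothetical.

```python
import hashlib


def sorted_md5(text: str) -> str:
    """MD5 hex digest of the text after sorting its lines.

    Sorting removes ordering differences, so two hosts with the same set of
    classes produce the same digest. Note: this joins lines without a trailing
    newline, so the digest will not equal `sort file | md5sum`; only compare
    digests produced by this same function.
    """
    lines = sorted(text.splitlines())
    return hashlib.md5("\n".join(lines).encode("utf-8")).hexdigest()


def same_classes(path_a: str, path_b: str) -> bool:
    """True if two (hypothetical local copies of) classes.txt files match once sorted."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        return sorted_md5(fa.read()) == sorted_md5(fb.read())
```

For example, `same_classes("classes-pdf01.txt", "classes-pdf02.txt")` would return `True` in the situation described above.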

Why do we have two pdf servers anyway?

(am inclined to agree, btw. Let's get rid of the pdf01 instance once @cscott chimes in)

mmodell triaged this task as Medium priority. Jun 8 2015, 6:45 PM
mmodell moved this task from To Triage to In-progress on the Beta-Cluster-Infrastructure board.

I couldn't find any logical reason for this error to happen.

hashar subscribed.

+ OCG projects

If anyone actually watches those tasks, can you look at the beta cluster instances deployment-pdf01 and deployment-pdf02 and make sure they are working properly? It seems pdf01 can be destroyed, but we have no idea what either of them is for.

Additional note: we can't connect to deployment-pdf01.eqiad.wmflabs

Unrelated, but on deployment-pdf01 I deleted the contents of /var/log/ocg/. The last entry was from July 25th 2014, and puppet complained with [/var/log/ocg]: Not removing directory; use 'force' to override.

Puppet recreated /var/log/ocg as a symlink to /srv/deployment/ocg/log. There is a bunch of logs being written there now.
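The manual cleanup amounts to replacing a real directory with a symlink, which puppet refused to do on its own without 'force'. A minimal Python sketch of that step follows; the function name `replace_dir_with_symlink` is hypothetical, and it assumes the stale contents can be discarded (as they were here).

```python
import os
import shutil


def replace_dir_with_symlink(path: str, target: str) -> None:
    """Make `path` a symlink to `target`, removing whatever is there first.

    A real directory at `path` is deleted recursively, including any old log
    files in it. Running it again when the link is already correct does nothing.
    """
    if os.path.islink(path):
        if os.readlink(path) == target:
            return  # already the desired link
        os.unlink(path)  # a link pointing somewhere else
    elif os.path.isdir(path):
        shutil.rmtree(path)  # stale directory, e.g. old /var/log/ocg
    elif os.path.exists(path):
        os.remove(path)  # a plain file in the way
    os.symlink(target, path)
```

For example, `replace_dir_with_symlink("/var/log/ocg", "/srv/deployment/ocg/log")` reproduces the end state puppet reached once the directory was gone.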

hashar claimed this task.

I removed puppet and ruby from deployment-pdf01 including /var/lib/puppet.

The next puppet run copied all the .rb files back under /var/lib/puppet/lib/ and reported "Finished catalog run in 47.30 seconds".

So something ended up being corrupted somehow :-(
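Wiping /var/lib/puppet fixed the run without identifying which file was bad. A hypothetical way to narrow it down next time would be to hash every .rb file under /var/lib/puppet/lib/ on a broken and a working instance (e.g. pdf01 and pdf02) and list the paths that differ. The sketch below assumes both trees have been copied somewhere readable; `rb_digests` and `differing_files` are illustrative names, not part of puppet.

```python
import hashlib
import os


def rb_digests(root: str) -> dict[str, str]:
    """Map each .rb file's path (relative to root) to the MD5 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".rb"):
                full = os.path.join(dirpath, name)
                with open(full, "rb") as fh:
                    digests[os.path.relpath(full, root)] = hashlib.md5(fh.read()).hexdigest()
    return digests


def differing_files(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Sorted relative paths that are missing on one side or whose contents differ."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

Any path reported by `differing_files(rb_digests(broken_root), rb_digests(good_root))` would be a candidate for the corrupted file.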