Make the parsoid server on the beta cluster a mediawiki app server
Closed, Resolved (Public)

Description

Ref: T231569#5462187

This task is similar to how @Dzahn and @Joe made scandium (which previously was a parsoid testing server) into a mediawiki app server. This requires the appropriate puppet patches to be written to make deployment-parsoid09.deployment-prep.eqiad.wmflabs an app server.

By doing so and merging and deploying https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/534215, we effectively would have deployed Parsoid/PHP on the beta cluster. This task blocks T231569 and blocks RESTBase's integration testing with Parsoid/PHP.

Event Timeline

hi @ssastry just a clarification: how would we load the parsoid code, if it can't be merged in the wmf vendor repository? Same way we do on scandium?

Yes. gerrit 534215 linked in the description enables that.

Change 536598 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[operations/puppet@production] beta cluster: Make deployment-parsoid09 a Mediawiki appserver as well

https://gerrit.wikimedia.org/r/536598

CPT folks, I've written a very preliminary patch here. I don't yet know what else is involved, but if @jeena's code review indicates it is simple, I can push it over the hump since I've already licked the cookie, so to speak. But if any of you want to drive this to completion, I'd appreciate the help. Please take it over.

It's a bit more involved than adding it to the scap targets. Currently deployment-parsoid09 has role::parsoid applied. It will also need role::labs::lvm::srv, role::mediawiki::appserver and role::beta::mediawiki.

As @hashar mentioned on your patchset you get these for free by renaming the instance: deployment-mediawiki-parsoid01 (or some such).
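
The role gap described above can be sketched as a simple set difference. This is only an illustration: the role names come from this comment, while `REQUIRED_ROLES` and `missing_roles` are hypothetical names, and nothing here queries Puppet or Horizon.

```python
# Roles the instance should end up with, as listed in the comment above.
# Hypothetical helper for illustration; it does not talk to Puppet or Horizon.
REQUIRED_ROLES = {
    "role::parsoid",
    "role::labs::lvm::srv",
    "role::mediawiki::appserver",
    "role::beta::mediawiki",
}


def missing_roles(applied):
    """Return the required roles that are not yet applied, sorted for stable output."""
    return sorted(REQUIRED_ROLES - set(applied))


# deployment-parsoid09 currently has only role::parsoid applied.
print(missing_roles(["role::parsoid"]))
```

Running it with only `role::parsoid` applied lists the three roles that still have to come from somewhere, whether from a puppet prefix or from manual assignment.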

An update.

So, as per @thcipriani and @hashar's recommendations, I decided to create a new VM with a deployment-mediawiki-parsoid prefix. After chatting with @Pchelolo, the plan is to leave parsoid09 alone since it might be useful to have Parsoid/JS and Parsoid/PHP on different VMs.

But while starting up a new instance, we ran into puppet issues. @Krenair is looking into them and will spin up a new instance once he has figured them out.

Once that instance has booted successfully and we have confirmed that I can log in and that all the relevant puppet roles / classes are applied, I'll update the gerrit patch to make this new VM a scap target (for both the mediawiki and parsoid codebases).

Okay, a further update.

@Krenair had to remove some of the roles from the puppet-prefix rules. Specifically, he left behind only role::labs::lvm::srv and profile::rsyslog::kafka_shipper. So anyone creating new mediawiki appserver instances in the future should remember to apply the other roles manually. He had to remove these roles because creating a new VM with them left the instance in a bad state. The first two attempts to spin up the server had to be abandoned.

That puppet prefix is there so that nobody has to remember to apply those classes! Their removal defeats the purpose of the puppet prefix entirely.

If role::mediawiki::appserver and role::beta::mediawiki break the instance provisioning, I guess we have to fix them? Have you captured any logs?

@jeena do you remember facing issues with puppet when you created your deployment-mediawiki instance?

@Krenair can answer that qn better than me.

Well, their removal allows us to actually create instances that begin with the prefix, instead of being forced to abandon the prefix and name new instances something else. We can still set hiera on the prefix.
It's possible that only one of those roles needed to be removed rather than both, I guess.
I did not record all of the exceptions and problems encountered while we tried to make an instance under the prefix with those roles. I do recall that it got stuck with ferm problems for a while, and at some point the puppet-agent process just vanished (we think it timed out after running for too long).
It should be possible to make a new prefix with those roles, create an instance inside it, and see what happens in the console log in horizon.

Next update.

I applied all those roles and ran sudo puppet agent -tv on the instance and it failed soon enough:

ssastry@deployment-mediawiki-parsoid10:~$ sudo puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, Could not find template 'varnish/errorpage.body.html.erb' at /etc/puppet/modules/profile/manifests/mediawiki/hhvm.pp:97:20 on node deployment-mediawiki-parsoid10.deployment-prep.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

So, I am stalled here at this point and need to hand it off to someone who knows how to proceed.
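
For anyone triaging a failure like this, the useful parts of the catalog error are the missing template name and the manifest location. A minimal sketch of pulling them out, assuming the message keeps the `Could not find template '...' at <file>:<line>:<column>` shape seen in the output above; `parse_template_error` is a hypothetical helper, not part of Puppet:

```python
import re

# Matches the "Could not find template" portion of the catalog error above.
# The pattern is an assumption based on that single log line.
TEMPLATE_ERROR = re.compile(
    r"Could not find template '(?P<template>[^']+)' "
    r"at (?P<manifest>\S+?):(?P<line>\d+):(?P<column>\d+)"
)


def parse_template_error(message):
    """Return template, manifest, line and column, or None if the message does not match."""
    match = TEMPLATE_ERROR.search(message)
    if match is None:
        return None
    info = match.groupdict()
    info["line"] = int(info["line"])
    info["column"] = int(info["column"])
    return info


# Relevant tail of the error message pasted above.
sample = (
    "Error while evaluating a Function Call, Could not find template "
    "'varnish/errorpage.body.html.erb' at "
    "/etc/puppet/modules/profile/manifests/mediawiki/hhvm.pp:97:20 on node "
    "deployment-mediawiki-parsoid10.deployment-prep.eqiad.wmflabs"
)
print(parse_template_error(sample))
```

Here the result points at line 97 of the hhvm.pp profile manifest, which references a template from the varnish module that the puppet server could not find.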

Looks like @Krenair fixed it, and puppet is running and installing things. But on #wikimedia-cloud he says that puppet is stuck on @jeena's instance (see the discussion above about removing the puppet roles from the prefix rules). I think that is worth following up in a separate Phab ticket.

@jeena's instance has some sort of ferm/iptables problem that might end up being fixed by a restart or something; it's rather strange and I haven't had time to dig into it fully yet. I thought @ssastry's instance had got stuck running puppet, but actually it just takes a long time to apply those roles we removed from the prefix:
Notice: Applied catalog in 1215.82 seconds
That's over 20 minutes, plus:

<Krenair> andrewbogott, this is kind of weird, it's like puppet-agent just died mid-run?
<andrewbogott> might've been killed if it ran too long, I think it has a max time allowed

Edit: Actually I ran puppet a few more times on @ssastry's instance and it appears to have a ferm problem like @jeena's
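
The run time in the Notice line above can be converted to minutes with a few lines of Python. This is a small sketch; `applied_catalog_seconds` is a hypothetical helper, and the regular expression assumes the exact wording of that Notice line.

```python
import re


def applied_catalog_seconds(line):
    """Extract the run time in seconds from an 'Applied catalog' notice, or None."""
    match = re.search(r"Applied catalog in ([0-9.]+) seconds", line)
    return float(match.group(1)) if match else None


seconds = applied_catalog_seconds("Notice: Applied catalog in 1215.82 seconds")
minutes, remainder = divmod(seconds, 60)
print(f"{int(minutes)} min {remainder:.2f} s")  # prints "20 min 15.82 s"
```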

ssastry claimed this task.

OK, I verified that the instance has the latest versions of both the Parsoid and MediaWiki code. There are gerrit patches to make this instance a mediawiki and parsoid scap target, but those can proceed independently of this.

I am going to resolve this ticket now since the narrow requirement of this task has been addressed. Thanks all for your pointers and help in making this happen.

@Krenair I wasn't able to run puppet because of the ferm problem you mentioned. @thcipriani commented out ferm.conf and was able to run puppet as far as I know.

Change 536598 merged by Dzahn:
[operations/puppet@production] beta cluster: Make deployment-mediawiki-parsoid10 a MW scap target

https://gerrit.wikimedia.org/r/536598

Fixed ferm on deployment-mediawiki-parsoid10 and deployment-mediawiki-jhuneidi by restarting those machines (they were stuck on a ferm error about /sbin/iptables-restore and /sbin/ip6tables-restore not working).
This seems to have had a positive impact on puppet on deployment-mediawiki-parsoid10: it has now installed some missing things like /run/hhvm and /tmp/heaps.

DannyS712 subscribed.

[batch] remove patch for review tag from resolved tasks