Convert work machines (tin, terbium) to Trusty and hhvm usage
Closed, Resolved · Public

Description

We need the production work machines (tin, terbium) to have HHVM (and thus Trusty). The use case is debugging code that only fails in a particular scenario present in production and is difficult or impossible to reproduce elsewhere.

eg:

sudo -u www-data hhvm -m debug /var/www/w/MWScript.php extensions/Flow/maintenance/foo.php --wiki=somewiki
NOTE: This also needs to be mirrored in Beta. Currently deployment-bastion is used there the way tin is in production; maybe we should create a terbium-like instance in Beta as well?
  • create Trusty instance to replace deployment-bastion.
  • migrate to the new instance (Jenkins, scripts, whatever is hardcoded)
  • provide Trusty replacement for tin
  • provide Trusty replacement for terbium
  • migrate prod
  • update deployment / work doc

Event Timeline

EBernhardson raised the priority of this task from to Needs Triage.
EBernhardson updated the task description.
EBernhardson added a subscriber: EBernhardson.
Restricted Application added a subscriber: Aklapper. Jan 16 2015, 5:30 PM
greg added a subscriber: greg. Jan 16 2015, 5:33 PM

Note: terbium (and tin) are both on Precise, thus don't have hhvm either. Erik has been using mw1017 in prod for this use case. We should upgrade terbium as well.

greg triaged this task as Normal priority. Jan 16 2015, 8:18 PM
greg set Security to None.
greg moved this task from To Triage to Backlog on the Beta-Cluster-Infrastructure board.
hashar added a subscriber: hashar.

Seems this should go to Operations, HHVM and MediaWiki-Core-Team and be rephrased to: "convert work machine (tin, terbium) to Trusty and hhvm usage", plus a mention that the beta cluster needs a new Trusty instance to replace deployment-bastion.eqiad.wmflabs (the tin equivalent).

greg renamed this task from Create a terbium clone for the beta cluster to Convert work machines (tin, terbium) to Trusty and hhvm usage. Jan 16 2015, 9:24 PM
greg updated the task description.
hashar updated the task description. Jan 16 2015, 9:32 PM
hashar updated the task description.

Thanks Greg. I have added some steps to the task description.

I could not find a project/Task related to the Trusty migration :-/

Updated the description again to clarify that the scripts don't have any dependency on hhvm; it is being used for its gdb-like debug console, where you can set breakpoints and step through code as it runs from the CLI.

hashar removed a subscriber: hashar. Feb 3 2015, 12:24 PM
Dzahn added a subscriber: Dzahn. Feb 25 2015, 8:13 PM

Or we might switch them over to Debian jessie right away? What do other ops think in this case? Still trusty, or jessie already?

Dzahn added a comment. Feb 25 2015, 8:15 PM

Resolving this should also prevent reverts like https://gerrit.wikimedia.org/r/#/c/192866/

Joe added a subscriber: Joe. Apr 7 2015, 3:25 PM
Restricted Application added a subscriber: Matanya. Jun 30 2015, 5:44 AM
Dzahn added a comment. Jul 13 2015, 8:58 PM

Can't they go to jessie right away? I guess they can't, because I hear we don't build HHVM for jessie.

Restricted Application added a subscriber: Luke081515. Jul 13 2015, 8:58 PM

Trusty replacement for tin = mira?

hashar added a subscriber: hashar. Sep 15 2015, 7:58 AM

Following on @Dzahn's comment, we should probably use Jessie instead of Trusty. If so:

bd808 added a subscriber: bd808. Oct 23 2015, 3:36 AM

Following on @Dzahn's comment, we should probably use Jessie instead of Trusty.

I don't think that Jessie is a good idea unless we are going to reimage the MW servers to Jessie as well. Having the deployment staging server and the MW fleet running different operating systems sounds like a recipe for strange bugs caused by differing versions of PHP, HHVM, git, etc.

ori raised the priority of this task from Normal to High. Oct 23 2015, 6:33 AM
ori added a project: Blocked-on-Operations.
Joe added a comment. Oct 26 2015, 4:42 PM

I will start working on this in the next couple of weeks.

My current plan for tin is to ask people to use mira instead of tin, as mira is already using trusty. Once everyone (at least in releng) feels confident, we can move on.

For terbium, I still need to understand how much work - if any - will be needed.

bd808 added a comment. Oct 26 2015, 7:17 PM
In T87036#1753813, @Joe wrote:

My current plan for tin is to ask people to use mira instead of tin, as mira is already using trusty.

We will have to either implement cross-master syncing in scap or abandon tin for scap and sync-* for this to work. I started a patch to support cross master syncing (https://gerrit.wikimedia.org/r/#/c/224313/) but it had permissions issues with some files when tested on the beta cluster. At least on deployment-bastion, there are some files under /srv/mediawiki-staging that are not owned by the mwdeploy group.

@Krenair did an audit on tin and found similar issues that need to be resolved:

In production on tin I found that there are some files you can't write to without being root. In particular:
./docroot/noc/createTxtFileSymlinks.sh
./wmf-config/db-codfw.php
./wmf-config/db-eqiad.php
./private/WikitechPrivateLdapSettings.php
./tests/multiversion/MWMultiVersionTest.php

All of these look to me to be things that we could just fix the perms of. The wikitech settings file will need a Puppet change to fix.

Based on this audit I think all of the files that have ownership issues on deployment-bastion are accidents of the history of that server rather than by design, so we can probably just chmod as needed there.
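
A minimal sketch of how such an audit could be repeated on either host, assuming the criteria are "group is not the deploy group" or "group-write bit is unset". The helper name `find_unwritable` and these exact criteria are assumptions for illustration; this is not the check @Krenair actually ran and it is not part of scap.

```python
import os
import stat


def find_unwritable(root, gid):
    """Return sorted paths under root that members of group gid could not write.

    A path is reported when its group differs from gid or when the
    group-write bit is unset. Hypothetical helper for illustration only.
    """
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            if stat.S_ISLNK(info.st_mode):
                continue  # permissions on the symlink itself are not meaningful
            if info.st_gid != gid or not info.st_mode & stat.S_IWGRP:
                problems.append(path)
    return sorted(problems)
```

On a deploy host this might be called as `find_unwritable("/srv/mediawiki-staging", grp.getgrnam("mwdeploy").gr_gid)` after `import grp`, and the resulting list reviewed before changing any permissions.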

bd808 added a comment. Oct 26 2015, 7:21 PM
In T87036#1753813, @Joe wrote:

For terbium, I still need to understand how much work - if any - will be needed.

The biggest potential issue I know of for terbium is that setting /usr/bin/php to be HHVM may actually make some maintenance scripts run via cron slower due to HHVM's JIT overhead. This really should only affect very short-running scripts, however, and is probably something we can fix on a case-by-case basis after the conversion by changing some /usr/bin/php to /usr/bin/php5 when needed.
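
One way to decide case by case would be to time the same short script under each interpreter. The sketch below assumes both /usr/bin/php and /usr/bin/php5 exist on the host; the helper name `median_runtime` is hypothetical and the script path in the usage note is a placeholder.

```python
import statistics
import subprocess
import time


def median_runtime(command, runs=5):
    """Run command `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command exits non-zero, so failures are not timed silently
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Comparing `median_runtime(["/usr/bin/php", "script.php"])` with `median_runtime(["/usr/bin/php5", "script.php"])` would give a rough indication of whether a given cron job is better left on php5. The median is used so a single slow run does not skew the result.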

demon added a subscriber: demon. Oct 26 2015, 7:31 PM
In T87036#1753813, @Joe wrote:

My current plan for tin is to ask people to use mira instead of tin, as mira is already using trusty.

We will have to either implement cross-master syncing in scap or abandon tin for scap and sync-* for this to work. I started a patch to support cross master syncing (https://gerrit.wikimedia.org/r/#/c/224313/) but it had permissions issues with some files when tested on the beta cluster. At least on deployment-bastion, there are some files under /srv/mediawiki-staging that are not owned by the mwdeploy group.
@Krenair did an audit on tin and found similar issues that need to be resolved:

In production on tin I found that there are some files you can't write to without being root. In particular:
./docroot/noc/createTxtFileSymlinks.sh
./wmf-config/db-codfw.php
./wmf-config/db-eqiad.php
./private/WikitechPrivateLdapSettings.php
./tests/multiversion/MWMultiVersionTest.php

All of these look to me to be things that we could just fix the perms of. The wikitech settings file will need a Puppet change to fix.
Based on this audit I think all of the files that have ownership issues on deployment-bastion are accidents of the history of that server rather than by design, so we can probably just chmod as needed there.

Let's jfdi and fix them then.

(Note that WikitechPrivateLdapSettings actually comes from puppet so I don't think we need to worry about changes to that)

Joe added a comment. Oct 27 2015, 9:08 AM

@bd808 I will work on the wikitech settings right away, and I'll take a look at the other files as well.

Joe added a comment. Oct 27 2015, 9:17 AM

And btw yes - I think we could move to using mira instead of tin for the time being anyway.

Change 249076 had a related patch set uploaded (by Giuseppe Lavagetto):
wikitech: make private settings file writable by owner

https://gerrit.wikimedia.org/r/249076

Change 249076 merged by Giuseppe Lavagetto:
wikitech: make private settings file writable by owner

https://gerrit.wikimedia.org/r/249076

Joe added a comment. Oct 27 2015, 4:24 PM

The ldap settings file now has permissions that should keep scap from choking on it.

How do we want to proceed from here?

bd808 added a comment. Oct 27 2015, 4:28 PM
In T87036#1757899, @Joe wrote:

The ldap settings file now has permissions that should keep scap from choking on it.
How do we want to proceed from here?

We need to finish out T104826: [scap] Add support for syncing /srv/mediawiki-staging including fully working git data to warm spare deploy server. That task currently needs a Puppet patch to be reviewed and merged (https://gerrit.wikimedia.org/r/#/c/224829/).

mmodell added a subscriber: mmodell. Nov 9 2015, 5:50 PM

Is this really blocked on Blocked-on-RelEng?

Joe added a comment. Nov 16 2015, 4:07 PM

Terbium is now done; I'll look at reimaging tin next.

RobH added a subscriber: RobH. Dec 22 2015, 6:10 PM

I've removed the patch for review, as it seems that all pending patches on this task have been applied.

ori added a subscriber: ori. Jan 11 2016, 9:26 AM

This task will be one year old this Friday.

demon added a comment. Jan 11 2016, 4:44 PM
In T87036#1923927, @ori wrote:

This task will be one year old this Friday.

#newyearnewme

Joe claimed this task. Feb 2 2016, 9:28 AM
Joe added a comment. Feb 2 2016, 9:34 AM

And... it's done. tin has been reimaged to trusty as of now.

Joe closed this task as Resolved. Feb 2 2016, 9:35 AM
Paladox added a subscriber: Paladox. Jul 1 2016, 9:09 PM