Replace bare-metal promethium.wikitextexp.eqiad.wmflabs with a VM or VMs
Closed, Resolved · Public

Description

Currently the parsing team is running a bunch of big tests on Promethium. Promethium was a one-off bare-metal-in-the-cloud experiment; it's out of warranty and we don't want to move any of those bare-metal hacks into the new Neutron labs deployment, eqiad1.

Subbu tells me that ideally he'd like a one-for-one replacement (one big VM) but could possibly live with a cluster of smaller ones.

Promethium's specs are:

32 GB RAM
400 GB disk
12 cores

It seems silly to build that out in eqiad only to have to immediately copy it over to eqiad1, so let's wait and have this be an early use of eqiad1 once it's ready to go. Tagging @bd808 in case he thinks this has budget implications. As a single big VM it's a big ask but the actual resource implications aren't outrageous for an internal project request.
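The sizing question above (one big VM versus a cluster of smaller ones) can be sketched as a rough capacity check. This is only an illustration: the `Spec` class and `covers` function are hypothetical names, not part of any OpenStack tooling, and summing resources across several VMs ignores the fact that a single large test job may not split cleanly across machines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    ram_gb: int
    disk_gb: int
    cores: int


# Promethium's specs as listed in this task.
PROMETHIUM = Spec(ram_gb=32, disk_gb=400, cores=12)


def covers(candidates, required=PROMETHIUM):
    """Return True if the combined candidates meet or exceed `required`."""
    total = Spec(
        ram_gb=sum(c.ram_gb for c in candidates),
        disk_gb=sum(c.disk_gb for c in candidates),
        cores=sum(c.cores for c in candidates),
    )
    return (
        total.ram_gb >= required.ram_gb
        and total.disk_gb >= required.disk_gb
        and total.cores >= required.cores
    )
```

For example, `covers([Spec(32, 400, 12)])` models the one-for-one replacement, while `covers([Spec(16, 200, 6), Spec(16, 200, 6)])` models a hypothetical two-VM cluster with the same total capacity.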

Andrew created this task. Aug 3 2018, 7:49 PM

I created a special flavor, 'parsingtest', and a new big VM, 'parsing-qa-01'. That should work as a promethium replacement. It might be a bit hard to migrate things over from the old promethium until T202636 is resolved.

ssastry added a comment. Edited Tue, Oct 9, 5:15 PM

Update:

  • All the repos have been mirrored on parsing-qa-01
  • All the dbs have been mirrored on parsing-qa-01
  • I installed / built required packages
  • I set up the pngs directory for dumping the visual diff images
  • I set up all services and mirrored their service files from promethium
  • I set up nginx, and various setup and config files
  • I mirrored /etc/hosts (with the existing comment # SSS: Temporary hack while we wait for T132216 to be resolved)
  • I started a new run of the tidy vs remex visual diffing code there.

So far so good.

Still todo:

  • Once the test completes successfully, the DNS entries for mw-expts-vd.wmflabs.org should be updated to point to parsing-qa-01.
  • After that, I'll verify that all the other web UI and web services run properly.
  • After this, we are good to decom promethium.
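The DNS step in the list above could be checked with something like the following minimal sketch. It assumes the public name and the new VM resolve to the same IPv4 address; if mw-expts-vd.wmflabs.org goes through a proxy or a floating IP, the comparison target would need to be that address instead. The function names are hypothetical, and the fully qualified name of parsing-qa-01 is not given in this task, so it is left as a parameter.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses that a hostname resolves to."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return {info[4][0] for info in infos}


def points_to(public_name, target_name, resolve=resolve_ipv4):
    """True if every address of public_name is also an address of target_name.

    `resolve` is injectable so the check can be exercised without network access.
    """
    public = resolve(public_name)
    target = resolve(target_name)
    return bool(public) and public <= target
```

Calling `points_to("mw-expts-vd.wmflabs.org", new_vm_fqdn)`, where `new_vm_fqdn` is whatever name parsing-qa-01 has in DNS, would then report whether the switch has taken effect from the machine running the check.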

@Andrew Everything looks good so far. I had to tweak some configs (because some of the dir paths are slightly different) but otherwise, all looks good to me. I've updated the DNS in horizon to point to parsing-qa-01.

So, as far as I am concerned, you can decom promethium on your own timeline.

Andrew closed this task as Resolved. Mon, Oct 15, 4:48 PM

Thanks, Subbu! Will start the decom now.