
cloudvps: dumps project trusty deprecation
Closed, ResolvedPublic

Description

Ubuntu Trusty has not been available for new instances in Cloud VPS since Nov 2017. However, Trusty reaches end of life in 2019, so we need to move to Debian Stretch before that date.

All instances in the dumps project need to be upgraded as soon as possible.

The list of affected VMs is:

  • bugzilla.dumps.eqiad.wmflabs
  • dumps-stats.dumps.eqiad.wmflabs

Listed administrators are:

More info in the OpenStack browser: https://tools.wmflabs.org/openstack-browser/project/dumps

Event Timeline

Krenair triaged this task as Medium priority. Sep 17 2018, 12:21 PM
Krenair created this task.

Hey @Nemo_bis @Hydriz! Just a friendly reminder that you should get rid of your Trusty instances as described in https://wikitech.wikimedia.org/wiki/News/Trusty_deprecation#Cloud_VPS_projects. The deadline is 2018-12-18. Please get in contact if you need help. Also please assign this task to an individual.

Thanks for the reminder. On that page I don't see what the preferred method is for doing this. Are we really supposed to delete the instances and recreate them from scratch?

From that page:

Since an Ubuntu -> Debian migration is not as easy as using apt-get, we recommend that you just rebuild VM instances, drop & create them again using the Debian Stretch base image.

Hope this helps.

That's a fairly common procedure, and yeah, I think that's the best one for this. It's been suggested that it's possible to migrate in place from Ubuntu to Debian, but I don't know the details and wouldn't advise it. Instances should be made afresh from time to time anyway, to ensure that anything non-puppetised on them is well known and can be reproduced on new hosts should the old ones be lost.
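
For reference, a minimal sketch of the rebuild approach using the OpenStack CLI, wrapped in Python. The image and flavor names are assumptions rather than the project's actual values, and in practice this can equally be done through Horizon:

  import subprocess

  # Hypothetical values -- substitute the real instance names, image, and flavor.
  NEW_INSTANCE = "dumps-0"
  OLD_INSTANCE = "dumps-stats"
  IMAGE = "debian-9.0-stretch"   # assumed name of the Debian Stretch base image
  FLAVOR = "m1.medium"           # assumed flavor

  def openstack(*args):
      """Run an openstack CLI command and print its output."""
      result = subprocess.run(["openstack", *args], check=True,
                              capture_output=True, text=True)
      print(result.stdout)

  # 1. Create the replacement instance from the Debian Stretch base image.
  openstack("server", "create", "--image", IMAGE, "--flavor", FLAVOR, NEW_INSTANCE)

  # 2. Once data has been copied over and the new host verified, delete the Trusty VM.
  openstack("server", "delete", OLD_INSTANCE)

Copying the data and any non-puppetised configuration across before the delete step is the part that takes the actual effort, as the discussion below shows.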

Another ping. Deadline is approaching (2018-12-18).

Sadly, nothing in the dumps project is puppetized and it's held together with a lot of tape. I am okay with both instances being deleted and recreated, but those two servers are mainly used by @Nemo_bis, so he will have to comment.

Offhand I can remember that:

  • bugzilla is hosting, well, Bugzilla at https://bugs.wmflabs.org. @Nemo_bis, do we still need to maintain this archive, given that the migration was rather long ago and much of the information inside is probably stale? I'm hoping to free up these resources to provision another worker server for Balchivist (i.e. dumps-4).
  • dumps-stats is hosting the main MySQL server for Balchivist, but things stopped working in mid-2018 and I haven't had the time to fix them, so we can probably start from scratch again with new instances and possibly with puppet. Other than that, this server is only used for processing dumps, which shouldn't be an issue under Debian.

An update regarding dumps-stats: it uses an old enough base image that it's difficult to gracefully migrate it to the new region. So I'm moving it to a lower-priority virt host and leaving it in the old region for now.

That means there may be some connectivity issues between VMs in your project (since the other VMs are moving to new IPs). It also means that the VM will need to be recreated in the new region (with Debian) so the old one can be deleted.

I tried to create a replacement instance, dumps-0, in the new region, but I was not able to log in from the bastion host. I have since deleted it, but the project quota seems to be incorrectly showing 6 instances when it should be 4. Can this be fixed before we attempt to recreate this instance from scratch in the new region?

Also, in the meantime, how do we resolve the connectivity issues with dumps-stats? The other instances depend on it as the main MySQL server, so right now nothing can get done because the data is not accessible outside of that instance.
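
One way to compare the reported usage against the instances that actually exist is via the OpenStack CLI; a rough sketch, assuming project credentials are already loaded in the environment:

  import subprocess

  def openstack(*args):
      """Run an openstack CLI command and return its stdout."""
      return subprocess.run(["openstack", *args], check=True,
                            capture_output=True, text=True).stdout

  # Instances that actually exist in the current project...
  servers = openstack("server", "list", "-f", "value", "-c", "Name")
  names = [n for n in servers.splitlines() if n]
  print(f"{len(names)} instances found: {names}")

  # ...versus what the quota accounting thinks is in use
  # (compare totalInstancesUsed with maxTotalInstances).
  print(openstack("limits", "show", "--absolute"))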

Were you getting network errors or access denied? Are the security group rules set to allow SSH in from eqiad1-r IPs? What shows up in the instance's console log in Horizon?

Access denied; the public key is rejected. Security groups should be fine, since the SSH connection can be made up to the authentication step. The console log shows nothing out of the ordinary. It's worth noting that other instances work fine in both regions.
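
A verbose SSH attempt through the bastion usually shows why a key is being rejected; a sketch, with placeholder hostnames rather than the real FQDNs:

  import subprocess

  # Placeholder hostnames -- substitute the real bastion and instance FQDNs.
  BASTION = "primary.bastion.wmflabs.org"
  TARGET = "dumps-0.dumps.eqiad.wmflabs"

  # -vvv shows which keys the client offers and which the server refuses,
  # separating "key not installed on the instance" from network problems.
  subprocess.run(["ssh", "-vvv", "-J", BASTION, TARGET, "true"])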

Interesting, so the new instance is coming online broken. Do you apply any classes to the instance by default? Are you sure there are no puppet errors in the console log? Perhaps someone could try as root?

Project-wide, the role::labs::lvm::srv puppet class is enabled. I subsequently tried to enable role::mariadb::core and role::simplelamp on the dumps-0 instance, but the public key kept being denied all the while. Soft and hard reboots of the instance didn't help either, and puppet seems to run fine. I would try again by creating another instance in case this was an isolated incident, but the other issue with the incorrect resource usage cropped up too.
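
For completeness, the console output mentioned above can also be pulled without Horizon and searched for first-boot or puppet problems; a sketch, using the instance name from the comments above:

  import subprocess

  INSTANCE = "dumps-0"

  # Fetch the instance's serial console log (the same log Horizon displays).
  log = subprocess.run(["openstack", "console", "log", "show", INSTANCE],
                       check=True, capture_output=True, text=True).stdout

  # Surface lines that usually explain a broken first boot or missing SSH keys.
  for line in log.splitlines():
      if any(word in line.lower() for word in ("error", "fail", "puppet", "ssh")):
          print(line)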

I'm done with moving the stuff that I need over to dumps-0. Now it's left to @Nemo_bis to sign off once he verifies that there is nothing on the server that we still need.

Ping. Today is Friday and the deadline is Tuesday.

Hi! Since the deadline already passed, we agreed on shutting down remaining Trusty instances on 2019-01-18. More info at https://wikitech.wikimedia.org/wiki/News/Trusty_deprecation#Cloud_VPS_projects

It seems I need to update my forwarding to log in on those machines, but I can manage from the bastion. I'll copy my remaining files now; it shouldn't be much.

Ok, I've checked and all I need to preserve is in the home folder. I've also pruned /srv. The VM can be deleted. Sorry for the wait!

@Nemo_bis, to clarify: are you talking about dumps-stats or bugzilla?

Mentioned in SAL (#wikimedia-cloud) [2019-01-21T16:34:27Z] <gtirloni> T204503 shutdown ubuntu VM: bugzilla

Mentioned in SAL (#wikimedia-cloud) [2019-01-21T16:50:08Z] <arturo> T204503 shutdown ubuntu VM: dumps-stat

Mentioned in SAL (#wikimedia-cloud) [2019-01-21T16:50:16Z] <arturo> T204503 shutdown ubuntu VM: dumps-stats*

Mentioned in SAL (#wikimedia-cloud) [2019-03-14T23:18:43Z] <bd808> Deleted dumps-stats and bugzilla (T204503)
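
The SAL entries above amount to a stop-then-delete sequence; roughly equivalent to the following sketch, though the actual operations may well have been done through Horizon or other tooling:

  import subprocess

  def openstack(*args):
      subprocess.run(["openstack", *args], check=True)

  # 2019-01-21: the remaining Trusty VMs were shut down...
  for name in ("bugzilla", "dumps-stats"):
      openstack("server", "stop", name)

  # 2019-03-14: ...and deleted once nothing was found to be missing.
  for name in ("bugzilla", "dumps-stats"):
      openstack("server", "delete", name)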