Request increased quota for wmf-research-tools Cloud VPS project
Closed, Resolved, Public

Description

Project Name: wmf-research-tools (original creation task: T186519)
Type of quota increase requested:

  • CPUs: +16
  • RAM: +48GB
  • Instance count: +8
  • Floating IP: none

Reason: the Research team has been prototyping many more machine learning model APIs in the last year (in part because Cloud VPS has been great! thank you thank you! the template we created for this is here if you're curious). As a result, we've maxed out our CPUs/RAM on the wmf-research-tools project. An additional 3 instances, 6 VCPUs, and 28GB of RAM are taken up in the recommendation-api project and really should be shifted into wmf-research-tools as well, both for better housekeeping and to free up needed resources there. In Q2/Q3, we expect to have projects on country classification (T263646) and content reliability (T263860), and I'll be building out article importance APIs (T155541) that will go in the recommendation-api project.
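As a rough check on how much of the request is new capacity versus relocated capacity, here is a small sketch. It assumes the three VMs moving over from recommendation-api keep their current sizes (6 VCPUs, 28GB RAM in total) once they land in wmf-research-tools; the dictionary keys are illustrative names, not Cloud VPS quota field names.

```python
# Requested quota increase for wmf-research-tools (from the list above).
requested = {"instances": 8, "vcpus": 16, "ram_gb": 48}

# Footprint of the VMs that would be shifted over from recommendation-api,
# assuming they are recreated at their current sizes.
migrating = {"instances": 3, "vcpus": 6, "ram_gb": 28}

# Capacity left for new prototypes once the migrated VMs are accounted for.
headroom = {k: requested[k] - migrating[k] for k in requested}
print(headroom)  # {'instances': 5, 'vcpus': 10, 'ram_gb': 20}
```

Under that assumption, roughly 5 instances, 10 VCPUs, and 20GB of RAM of the increase would go to genuinely new work.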

Event Timeline

This request is approved, but can't be immediately granted.

We have quite a lot of hardware awaiting datacenter work right now and we need some of that back online before we can safely increase quotas; please check back in a week or two if there's no activity on this ticket before then.

Thanks for the update @Andrew ! 1-2 weeks is fine -- we like stable instances :) I'll check back then if I haven't heard.

bd808 changed the task status from Open to Stalled. Oct 29 2020, 5:30 PM
bd808 added a subscriber: bd808.

Stalled pending completion of 2-3 of the open tasks under T216195 (Move cloudvirt hosts to 10Gb ethernet) and the return of those hypervisors to active service.

update: we needed to order some new hardware to get those cloudvirts online so things are delayed a bit. Hopefully not more than another week or two :(

Bummer to hear but thanks for the update and continuing to work on this!

@Andrew just checking in to see if we have a new expected date for these changes? Thanks!

Sorry @Isaac, we still have no news on this.

We have a few more hypervisors online now so will be granting the quota change soon.

One related question: we try to exclude easily-recreated VMs ('cattle') from our backup jobs in order to save space. Can you predict whether any of your VMs (old or new) are safe to opt out of backups? And if so, can you tell me what regex to use to match their hostnames?

Yay, thanks!

I don't have a great answer for you, largely because we haven't taken any formal approach to naming instances. All of mine can be easily recreated if lost, but the fact that they all end with -topic reflects their use case rather than any backup need :) For this particular project, most if not all of the instances are probably easily recreatable because they are research prototypes, but I hesitate to speak for the others. Is there a regex that's applied globally, e.g. hostnames that start or end with test? If so, it's probably easier for us to just adapt to your norms.
@MGerlach @diego FYI, and chime in if you have any thoughts.

There isn't; the regex is applied per-project. So you can invent whatever norms you want!

(meanwhile... I've adjusted the quotas as requested so you can get on with your lives)

Oooh fun! In that case, instance names ending in -test, -build, or -prototype are the three patterns I'd feel comfortable excluding for this project (and for the recommendation-api project too, if you'd like). I can confirm that these patterns currently match two existing instances that don't require backup. I'll start using these suffixes for new instances that won't require backup and try to document this for our team.
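A minimal sketch of how a per-project exclusion regex built from those suffixes could behave. Everything here is hypothetical: the `EXCLUDE_PATTERNS` mapping, the `should_back_up` helper, and the example hostnames are for illustration only, and the ticket does not show the format actually used in the puppet change. It also assumes the pattern is matched against the short instance name rather than the fully-qualified hostname.

```python
import re

# Hypothetical per-project exclusion patterns: an instance is skipped by
# backups only if its whole short name ends in -test, -build, or -prototype.
EXCLUDE_PATTERNS = {
    "wmf-research-tools": r".+-(?:test|build|prototype)",
    "recommendation-api": r".+-(?:test|build|prototype)",
}


def should_back_up(project: str, instance_name: str) -> bool:
    """Return False if the instance name matches its project's exclusion regex."""
    pattern = EXCLUDE_PATTERNS.get(project)
    if pattern is None:
        # No exclusion configured for this project: back everything up.
        return True
    # fullmatch anchors both ends, so only suffixes count (not "test-server").
    return re.fullmatch(pattern, instance_name) is None


print(should_back_up("wmf-research-tools", "country-prototype"))  # False
print(should_back_up("wmf-research-tools", "example-topic"))      # True
```

Because the regex is per-project, a project with no entry in the mapping falls back to backing up every VM, which matches the "cattle" opt-out described above.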

Thank you! It seems we got a lot more RAM than we requested (which, to be honest, I am NOT complaining about), but I assume it was a mistake in the config, so FYI in case you want to drop it back :)

Hah, yes, clearly got an extra digit in the ram setting. Fixed now!

Change 647003 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloud-vps VM backups: exclude some more hostnames from backup

https://gerrit.wikimedia.org/r/647003

Change 647003 merged by Andrew Bogott:
[operations/puppet@production] cloud-vps VM backups: exclude some more hostnames from backup

https://gerrit.wikimedia.org/r/647003

bd808 assigned this task to Andrew.