
Update quotas for MWoffliner VPS
Closed, Resolved · Public

Description

Things continuously evolve on the Wikipedia offline scraping side.

Over the last 18 months we have massively improved the performance of the MWoffliner scraper by allowing it to create offline ZIM files "on the fly" without much filesystem IO, and by seriously optimising CPU/RAM management. There are still a few additional things to do, but most of what is easy to do has been done.

We have also fully automated ZIM generation using a scheduling platform called Zimfarm, available at https://farm.openzim.org. This fully dockerized application gives us an autonomous ZIM building solution which schedules the pending scrapes rationally. The workload is then distributed across decentralised workers. The 5 mwoffliner VPS boxes are in charge of all the Wikimedia projects (in different ZIM flavours: no pictures, no videos, etc.).

All of this works fine, but we still struggle a bit to regenerate all the Wikimedia ZIM files on time once a month, which is the goal we agreed on with the WMF. The last upgrade of our quota happened 4 years ago (https://phabricator.wikimedia.org/T117095#2807242), but the amount of content to deal with increases constantly. This is why I am coming back to ask for a quota update.

We have 5 VPS VMs. mwoffliner4 and mwoffliner5, the ones created in 2016, have the xlarge-xtradisk profile and we are really happy with them. But the 3 other ones (mwoffliner1, mwoffliner2, mwoffliner3) have pretty old/small(er) profiles which are no longer well suited to our usage. My request is to homogenise the MWoffliner setup by moving these 3 older ones to xlarge-xtradisk as well.

If my computation is right, this should allow us to achieve our goal of a regular monthly release for all the WMF ZIM files.

Here is the formal request:
Project Name: mwoffliner
Type of quota increase requested (we don't need extra floating ips):

  • mwoffliner1: m1.xlarge -> xlarge-xtradisk
  • mwoffliner2: m1.xlarge -> xlarge-xtradisk
  • mwoffliner3: m1.large -> xlarge-xtradisk
  • mwoffliner4: keep the same
  • mwoffliner5: keep the same
  • wp1: keep the same

Amount of quota increase: In a nutshell, more disk, plus one additional xlarge-xtradisk
Reason: We need to be quicker (xlarge-xtradisk vCPUs are twice as fast as the others) and be able to run the largest scrapes in parallel without running short of disk

Event Timeline

Kelson created this task. May 28 2020, 8:23 AM
Restricted Application added a subscriber: Aklapper. May 28 2020, 8:23 AM

Heja @Kelson, could you please check https://phabricator.wikimedia.org/project/view/2880/ and make sure that all the info required is in this task? Thanks :)

Kelson updated the task description. May 28 2020, 10:54 AM

@Aklapper Thx for pointing me to this, I have updated the task with the expected information.

bd808 removed Andrew as the assignee of this task. May 31 2020, 7:21 PM
bd808 triaged this task as Medium priority.
bd808 added a subscriber: Andrew.
Andrew assigned this task to aborrero. Jun 3 2020, 3:55 PM

This is approved. You don't actually need a quota change for the disk space, but we'll adjust the quota to move that large VM to an xlarge VM. If you need a temporary bump in order to do rebuilds, just let us know.

Kelson added a comment. Thu, Jun 4, 6:13 AM

@Andrew Thank you very much for this! I have been able to delete mwoffliner1 and recreate it successfully with an xlarge-xtradisk profile. The VM is up and running. I wanted to recreate mwoffliner3 the same way: I deleted it, but then failed to create a new xlarge-xtradisk instance. It seems the quota is too low. Am I wrong somewhere?

Kelson added a comment. Thu, Jun 4, 6:34 AM

@Andrew I have been able to recreate mwoffliner2 properly. I believe 4 vCPUs and 8 GB of RAM are missing in the quota.

BTW I have noticed that we have 3 floating IPs reserved for the mwoffliner project. We use only one and don't plan to use more for the moment. Feel free to take out 2 so other projects can benefit from them.

Mentioned in SAL (#wikimedia-cloud) [2020-06-04T09:10:47Z] <arturo> refresh project quotas: 56 vCPUs (from 44) 96GB RAM (from 88) and 1 floating IP (from 3) (T253836)

aborrero closed this task as Resolved. Thu, Jun 4, 9:12 AM

Hey @Kelson, you should be all set now. The quota was refreshed just now.

@Andrew was just commenting on the outcome of our team meeting and didn't do the actual change (I am the one on clinic duty this week!).

I'm closing this task now. Please feel free to reopen if required!

Mentioned in SAL (#wikimedia-cloud) [2020-06-04T09:13:51Z] <arturo> refresh project quotas: 52 vCPUs (typo/wrong math in previous quota) (T253836)

Kelson added a comment. Thu, Jun 4, 7:14 PM

@aborrero Thank you very much. Everything works like a charm now!
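
For reference, the quota figures quoted in this thread can be cross-checked with a few lines of arithmetic. This is only a sketch: the numbers are copied from the comments and SAL entries above (old quota of 44 vCPUs and 88 GB RAM, a reported shortfall of 4 vCPUs and 8 GB, and a corrected quota of 52 vCPUs and 96 GB), while the function and variable names are hypothetical and nothing here queries the Cloud VPS API.

```python
# Figures quoted in this task (T253836); the names below are illustrative only.
OLD_QUOTA = {"vcpus": 44, "ram_gb": 88}          # "from 44", "from 88" in the SAL entry
REPORTED_SHORTFALL = {"vcpus": 4, "ram_gb": 8}   # Kelson's estimate of what was missing
APPLIED_QUOTA = {"vcpus": 52, "ram_gb": 96}      # after the corrected SAL entry


def minimum_quota(old, shortfall):
    """Smallest quota that covers the old quota plus the reported shortfall."""
    return {key: old[key] + shortfall[key] for key in old}


def headroom(applied, required):
    """Resources left over once the required quota is satisfied."""
    return {key: applied[key] - required[key] for key in required}


required = minimum_quota(OLD_QUOTA, REPORTED_SHORTFALL)
spare = headroom(APPLIED_QUOTA, required)

print("required:", required)
print("spare:", spare)
```

Under these figures the applied quota covers the reported shortfall, with the RAM quota matching it exactly and a few vCPUs to spare.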