
Custom Flavour for Wikidumpparse Cloud VPS project
Closed, Resolved · Public

Description

Project Name: Wikidumpparse
Type of quota increase requested: Custom Flavour g2.cores8.ram16.disk1500
Reason: The main purpose of this project is to pre-compute Wikidata-dump-level statistics about all humans and store them in a database. Allow me to explain the calculation that leads to the 1.5TB request. The detailed calculation is in this Google doc.

  • Version 1 of this project, the proof of concept "Denelezh", ran for 2 years and generated 300GB of data. See T263703#6622726
  • Assume we want this to run for 3 years, or 150 weeks. (The project follows the weekly Wikidata dump cycle.)
  • Empirically, from the proof of concept, the "input" data can be modelled by y=(4.5+0.005x)GB where x:=weeks since launch
  • Empirically, from the proof of concept, the "output" data can be modelled by y=(4.5+0.013x)GB where x:=weeks since launch
  • Given the strategy of keeping only the last 1 year of input data after 3 years, we have the integral of y=5.5+0.005x from 0 to 50 weeks (wolfram alpha link)
  • Given that we keep all of the output data for 3 years, we have the integral of y=4.5+0.013x from 0 to 150 weeks

Totals:

| total (GB) | description | calculation |
| --- | --- | --- |
| 256 | last 1 year of input data | integral y=5.5+0.005x from 0 to 50 weeks |
| 821 | 3 years of output data | integral y=4.5+0.013x from 0 to 150 weeks |
| 300 | 2 years of 'backfill data' from denelezh.org (WMFR) | empirical |
| 75 | 5 years of 'backfill data' from whgi.wmflabs.org | empirical |
| 1452 | grand total for 3 years of running humaniki project | sum of above |
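
The integrals in the table can be checked in a few lines of Python. This is a minimal sketch, assuming each model gives the data added per week in GB, so the integral of a + b·x from 0 to T is a·T + b·T²/2. The function name `linear_integral` is illustrative only and not part of the project.

```python
def linear_integral(a, b, weeks):
    """Definite integral of y = a + b*x for x from 0 to `weeks`."""
    return a * weeks + b * weeks ** 2 / 2

# Output data: y = 4.5 + 0.013x over 150 weeks (3 years).
output_gb = linear_integral(4.5, 0.013, 150)

# Input data over 50 weeks, using both intercepts that appear in the text.
input_gb_45 = linear_integral(4.5, 0.005, 50)
input_gb_55 = linear_integral(5.5, 0.005, 50)

# Grand total, using the figures exactly as listed in the table.
table_total = 256 + 821 + 300 + 75

print(f"output, 150 weeks:        {output_gb:.2f} GB")
print(f"input, intercept 4.5:     {input_gb_45:.2f} GB")
print(f"input, intercept 5.5:     {input_gb_55:.2f} GB")
print(f"sum of the table entries: {table_total} GB")
```

With the coefficients as written, the output integral comes to roughly 821 GB and the table entries sum to 1452 GB, both matching the table. The input integral comes to roughly 231 GB with intercept 4.5 and roughly 281 GB with intercept 5.5, so the 256 GB entry presumably relies on the linked detailed calculation rather than on either formula exactly as written here.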

Thank you in advance. I am happy to answer any questions you have.
-Max

Event Timeline

How much data do you have today? There are 2 reasons that I ask:

  1. There is a very low probability of a single Cloud VPS instance surviving a 3-year life cycle. It is very likely that an instance you make today will need to be replaced in ~2 years max for operating system deprecation.
  2. I would hope that within the next 12 months we will be able to offer "attachable block storage" using OpenStack Cinder. That feature will functionally be like getting a second hard disk for an instance with the added bonus of that disk being separate from the operating system disk. This will allow the attached storage to be unmounted from instance A and mounted to instance B which will make OS upgrades much, much simpler for large storage needs like yours.

Hi @bd808 thanks for the response.

> It is very likely that an instance you make today will need to be replaced in ~2 years max for operating system deprecation.

Fair point. I updated my calculations for a 2-year runway.

> I would hope that within the next 12 months we will be able to offer "attachable block storage" using OpenStack Cinder.

That'd be excellent, and perfect for our use case.

∴ Given those two facts, let's plan on our instance lasting just two years. The updated calculations indicate that we'll need 1.12TB for 2 years. At that point we would be happy to migrate to attached storage. Does that sound like a reasonable path forwards?
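
The updated calculation itself is not shown in this thread. As a rough sanity check only, the sketch below assumes the output model from the original request (y=4.5+0.013x) integrated over 100 weeks instead of 150, the same 256 GB of input data, the same 300 GB and 75 GB of backfill data, and 1 TB counted as 1024 GB. All of these are assumptions, and `linear_integral` is an illustrative helper.

```python
def linear_integral(a, b, weeks):
    """Definite integral of y = a + b*x for x from 0 to `weeks`."""
    return a * weeks + b * weeks ** 2 / 2

# Assumption: 2 years is roughly 100 weekly dump cycles.
output_gb = linear_integral(4.5, 0.013, 100)

# Assumption: input and backfill figures are unchanged from the original request.
input_gb = 256
backfill_gb = 300 + 75

total_gb = input_gb + output_gb + backfill_gb
total_tb = total_gb / 1024  # assumption: 1 TB counted as 1024 GB

print(f"total: {total_gb:.0f} GB, about {total_tb:.2f} TB")
```

Under those assumptions the total lands close to the 1.12TB quoted above.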

Andrew subscribed.

This is approved. Someone will create the new flavor soon.

dcaro added subscribers: aborrero, dcaro.

New flavor created and assigned to the project, @notconfusing can you verify and close the task if ok?
Or bounce back to me if there's any issues ;)

# wmcs-openstack flavor show g2.cores8.ram16.disk1120
+----------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                      | Value                                                                                                                                                        |
+----------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                                                                                                        |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                                                            |
| access_project_ids         | wikidumpparse                                                                                                                                                |
| disk                       | 1120                                                                                                                                                         |
| id                         | 46ce2174-f133-4561-8404-83707ee11367                                                                                                                         |
| name                       | g2.cores8.ram16.disk1120                                                                                                                                     |
| os-flavor-access:is_public | False                                                                                                                                                        |
| properties                 | aggregate_instance_extra_specs:ceph=''true'', quota:disk_read_iops_sec=''5000'', quota:disk_total_bytes_sec=''200000000'', quota:disk_write_iops_sec=''500'' |
| ram                        | 16384                                                                                                                                                        |
| rxtx_factor                | 1.0                                                                                                                                                          |
| swap                       |                                                                                                                                                              |
| vcpus                      | 8                                                                                                                                                            |
+----------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+

Mentioned in SAL (#wikimedia-cloud) [2020-11-26T15:53:31Z] <dcaro> Created private flavor g2.cores8.ram16.disk1120 for wikidumpparse (T268190)

Mentioned in SAL (#wikimedia-cloud) [2020-11-26T15:53:48Z] <dcaro> Created private flavor g2.cores8.ram16.disk1120 (T268190)