Project Name: Wikidumpparse
Type of quota increase requested: Custom Flavour g2.cores8.ram16.disk1500
Reason: The main point of this project is to pre-compute Wikidata dump-level statistics about all humans and store them in a database. Allow me to explain the calculation that leads to the 1.5TB request; the detailed calculation is in this Google Doc.
- Version 1 of this project, the proof of concept "Denelezh", ran for 2 years and generated 300GB of data. See T263703#6622726.
- Assume we want this to run for 3 years, i.e. roughly 150 weeks (the project follows the weekly Wikidata dump cycle).
- Empirically, from the proof of concept, the size of each weekly "input" dump can be modelled as y = (4.5 + 0.005x) GB, where x is the number of weeks since launch.
- Empirically, from the proof of concept, the size of the "output" data generated from each weekly dump can be modelled as y = (4.5 + 0.013x) GB, where x is the number of weeks since launch.
- Given the strategy of keeping only the last 1 year of input data, at the end of year 3 we retain the dumps from weeks 100 to 150, i.e. the integral of y = 4.5 + 0.005x from 100 to 150 weeks, about 256 GB (wolfram alpha link).
- Given that we keep all of the output data for the full 3 years, we need the integral of y = 4.5 + 0.013x from 0 to 150 weeks, about 821 GB. (A short script reproducing these figures follows the table below.)
|GB||Data||Derivation|
|256||last 1 year of input data||integral of y = 4.5 + 0.005x from 100 to 150 weeks|
|821||3 years of output data||integral of y = 4.5 + 0.013x from 0 to 150 weeks|
|300||2 years of 'backfill data' from denelezh.org (WMFR)||empirical|
|75||5 years of 'backfill data' from whgi.wmflabs.org||empirical|
|1452||grand total for 3 years of running the humaniki project||sum of the above|
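For convenience, here is a minimal Python sketch that reproduces the table above from the empirical growth models; it is only an illustration of the arithmetic (the helper name linear_growth_total is mine, not project code):

```
# Sketch of the disk-budget arithmetic above (all figures in GB).
# The linear models y = a + b*x give GB per weekly dump, with x = weeks
# since launch; the integration bounds encode the retention policy.

def linear_growth_total(a, b, start_week, end_week):
    """Closed-form integral of y = a + b*x over [start_week, end_week]."""
    return a * (end_week - start_week) + b * (end_week ** 2 - start_week ** 2) / 2

# Input dumps: only the last year (weeks 100..150) is retained at the end of year 3.
input_gb = linear_growth_total(4.5, 0.005, 100, 150)   # 256.25

# Output data: everything from weeks 0..150 is retained.
output_gb = linear_growth_total(4.5, 0.013, 0, 150)    # 821.25

# Empirical backfill figures from the table above.
backfill_denelezh_gb = 300   # 2 years from denelezh.org (WMFR)
backfill_whgi_gb = 75        # 5 years from whgi.wmflabs.org

total_gb = input_gb + output_gb + backfill_denelezh_gb + backfill_whgi_gb
print(f"grand total: {total_gb:.0f} GB")   # grand total: 1452 GB -> 1.5TB flavour
```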
Thank you in advance. I am happy to answer any questions you have.