Project Name: Wikidumpparse
Type of quota increase requested: Custom Flavour g2.cores8.ram16.disk1500
Reason: The main point of this project is to pre-compute Wikidata dump-level statistics about all humans and store them in a database. Allow me to explain the calculation that leads to the 1.5TB request; the detailed calculation is in this Google Doc.
- Version 1 of this project, the proof of concept "Denelezh", ran for 2 years and generated 300GB of data. See T263703#6622726.
- Assume we want this to run for 3 years, i.e. roughly 150 weeks (the project follows the weekly Wikidata dump cycle).
- Empirically, from the proof of concept, the size of each weekly "input" dump can be modelled as y = (4.5 + 0.005x) GB, where x is the number of weeks since launch.
- Empirically, from the proof of concept, the size of the "output" data generated from each weekly dump can be modelled as y = (4.5 + 0.013x) GB, where x is the number of weeks since launch.
- Given the strategy of keeping only the last 1 year of input data, at the end of year 3 we retain the dumps from weeks 100 to 150, i.e. the integral of y = 4.5 + 0.005x from 100 to 150 weeks, about 256 GB (wolfram alpha link).
- Given that we keep all of the output data for the full 3 years, we need the integral of y = 4.5 + 0.013x from 0 to 150 weeks, about 821 GB. (A short script reproducing these figures follows the table below.)
|GB||Data||Derivation|
|256||last 1 year of input data||integral of y = 4.5 + 0.005x from 100 to 150 weeks|
|821||3 years of output data||integral of y = 4.5 + 0.013x from 0 to 150 weeks|
|300||2 years of 'backfill data' from denelezh.org (WMFR)||empirical|
|75||5 years of 'backfill data' from whgi.wmflabs.org||empirical|
|1452||grand total for 3 years of running the humaniki project||sum of the above|
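For convenience, here is a minimal Python sketch that reproduces the table above from the empirical growth models; it is only an illustration of the arithmetic (the helper name linear_growth_total is mine, not project code):

```
# Sketch of the disk-budget arithmetic above (all figures in GB).
# The linear models y = a + b*x give GB per weekly dump, with x = weeks
# since launch; the integration bounds encode the retention policy.

def linear_growth_total(a, b, start_week, end_week):
    """Closed-form integral of y = a + b*x over [start_week, end_week]."""
    return a * (end_week - start_week) + b * (end_week ** 2 - start_week ** 2) / 2

# Input dumps: only the last year (weeks 100..150) is retained at the end of year 3.
input_gb = linear_growth_total(4.5, 0.005, 100, 150)   # 256.25

# Output data: everything from weeks 0..150 is retained.
output_gb = linear_growth_total(4.5, 0.013, 0, 150)    # 821.25

# Empirical backfill figures from the table above.
backfill_denelezh_gb = 300   # 2 years from denelezh.org (WMFR)
backfill_whgi_gb = 75        # 5 years from whgi.wmflabs.org

total_gb = input_gb + output_gb + backfill_denelezh_gb + backfill_whgi_gb
print(f"grand total: {total_gb:.0f} GB")   # grand total: 1452 GB -> 1.5TB flavour
```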
Thank you in advance. I am happy to answer any questions you have.