Project Name: Wikidumpparse

Type of quota increase requested: Custom Flavour `g2.cores8.ram16.disk1500`

Reason: The main purpose of this project is to pre-compute Wikidata-dump-level statistics about all humans and store them in a database. Allow me to explain the calculation that leads to the 1.5 TB request; the detailed calculation is in this Google Doc.

- The version 1 proof of concept of this project, "Denelezh", ran for 2 years and generated 300 GB of data. See T263703#6622726.
- Assume we want this to run for 3 years, i.e. roughly 150 weeks (the project follows the weekly Wikidata dump cycle).
- Empirically, from the proof of concept, the "input" data can be modelled by `y = (4.5 + 0.005x) GB`, where `x := weeks since launch`.
- Empirically, from the proof of concept, the "output" data can be modelled by `y = (4.5 + 0.013x) GB`, where `x := weeks since launch`.
- Given the strategy of keeping just the last 1 year of input data after 3 years, we need `integral of y = 4.5 + 0.005x from week 100 to week 150` (Wolfram Alpha link).
- Given that we keep all of the output data for 3 years, we need `integral of y = 4.5 + 0.013x from week 0 to week 150`.

Totals:

| total (GB) | description | calculation |
|---|---|---|
| 256 | last 1 year of input data | integral of y = 4.5 + 0.005x from week 100 to week 150 |
| 821 | 3 years of output data | integral of y = 4.5 + 0.013x from week 0 to week 150 |
| 300 | 2 years of 'backfill data' from denelezh.org (WMFR) | empirical |
| 75 | 5 years of 'backfill data' from whgi.wmflabs.org | empirical |
| 1452 | grand total for 3 years of running the humaniki project | sum of above |
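As a sanity check, the arithmetic above can be reproduced with a short script. This is just a sketch of the estimate; the `storage_gb` helper is mine, not part of the project:

```python
def storage_gb(a, b, x0, x1):
    """Closed-form integral of the growth model y = a + b*x (GB/week)
    over the interval [x0, x1] weeks, giving total storage in GB."""
    return a * (x1 - x0) + b * (x1**2 - x0**2) / 2

# Last 1 year (weeks 100-150) of input data: y = 4.5 + 0.005x
input_gb = storage_gb(4.5, 0.005, 100, 150)   # ~256 GB

# All 3 years (weeks 0-150) of output data: y = 4.5 + 0.013x
output_gb = storage_gb(4.5, 0.013, 0, 150)    # ~821 GB

# Empirical backfill data from the earlier deployments
backfill_denelezh = 300  # denelezh.org (WMFR), 2 years
backfill_whgi = 75       # whgi.wmflabs.org, 5 years

total_gb = input_gb + output_gb + backfill_denelezh + backfill_whgi
print(f"{input_gb:.0f} {output_gb:.0f} {total_gb:.1f}")  # ~1452 GB total
```

The grand total of roughly 1452 GB is what motivates rounding up to the 1500 GB (`disk1500`) flavour.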

Thank you in advance. I am happy to answer any questions you have.

-Max