Description
To identify important modules that are candidates for centralizing in Abstract Wikipedia, some data analysis is needed. So far, data has been collected from the API and databases across all wikis and stored in a Toolforge user database. The next step is to determine priority modules based on usage, pageviews, links, etc.
After analysis, a relative scoring metric was devised to identify important modules.
Findings
A brief compilation of all the findings from my data analysis so far is attached here.
Scoring modules
An example-based documentation of the scoring metric is attached here.
In short:
- Get a limit value from the data distribution of each feature
- Modify the distribution to set the limit at ~87% or less
- Get the feature score as the percentile of the value in the modified distribution
- Get the total score of a module as the weighted sum of its feature scores.
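The steps above can be sketched in Python as follows. This is only an illustrative reading of the summary, not the code from the notebooks: the function names, the way the limit is supplied, and in particular the trimming rule (dropping the smallest values until at most ~87% of the remaining values are at or below the limit) are assumptions. The attached documentation remains the reference for the actual metric.

```python
from bisect import bisect_right


def trim_to_limit_share(values, limit, target=0.87):
    """Assumed reading of 'set limit as ~87% or less': drop the smallest
    values until at most `target` of the remaining values are <= limit."""
    ordered = sorted(values)
    below = bisect_right(ordered, limit)  # number of values <= limit
    start = 0
    # Values are removed from the low end, so each removed value is <= limit.
    while start < below and (below - start) / (len(ordered) - start) > target:
        start += 1
    return ordered[start:]


def percentile_score(value, distribution):
    """Percentage (0 to 100) of the distribution that is <= value."""
    if not distribution:
        return 0.0
    ordered = sorted(distribution)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def total_score(feature_scores, weights):
    """Weighted sum of per-feature scores; `weights` is a hypothetical mapping."""
    return sum(weights[name] * score for name, score in feature_scores.items())


# Made-up pageview counts for 20 modules, with a hypothetical limit of 1.
pageviews = [0] * 18 + [1, 100]
modified = trim_to_limit_share(pageviews, limit=1)
score = percentile_score(100, modified)
```

With a heavily skewed feature such as this one, trimming the low end keeps the many near-zero modules from pushing almost every non-zero value into the top few percentiles.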
Work
Notebook for data analysis to find priority modules: Aisha's Notebook IV
Notebook to detect priority modules: Aisha's Notebook V