Steps to replicate the issue (include links if applicable):
- Run https://public-paws.wmcloud.org/User:Miriam_(WMF)/queering-wp.ipynb using the data in this snapshot 'part-00000-4001ddac-9eed-47bb-b0ae-251784f815ca.c000.csv' (link)
What happens?:
When we calculate the aggregate using the latest (7/29) snapshot of the data, the totals are drastically lower than those from the April snapshot.
April:
gender article_created_value
0 female 2,357,289
1 male 9,812,451
2 non-binary 12,487
June:
gender article_created_value
0 female 199,937
1 male 562,693
2 non-binary 1,154
What should have happened instead?:
The numbers should have increased slightly or stayed the same.
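As a rough sanity check on that expectation, here is a minimal sketch (not part of the notebook) that compares the per-gender totals quoted above and flags any category that shrank between snapshots. The function name `find_regressions` is hypothetical; the totals are copied from the two tables in this report.

```python
# Totals copied from the two tables in this report.
april = {"female": 2_357_289, "male": 9_812_451, "non-binary": 12_487}
latest = {"female": 199_937, "male": 562_693, "non-binary": 1_154}


def find_regressions(previous, current):
    """Return {gender: fraction of the previous total retained} for every
    category whose total went down between the two snapshots.

    Hypothetical helper for illustration only.
    """
    regressions = {}
    for gender, old_total in previous.items():
        # A category missing from the newer snapshot counts as a total of 0.
        new_total = current.get(gender, 0)
        if new_total < old_total:
            regressions[gender] = new_total / old_total
    return regressions


for gender, retained in find_regressions(april, latest).items():
    print(f"{gender}: latest snapshot has {retained:.1%} of the April total")
```

With the figures above, all three categories are flagged, each retaining less than 10% of its April total.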
Other information (browser name/version, screenshots, etc.):
There seems to be an issue with the latest (7/29) snapshot of the Gender gap data. I'm not 100% sure whether this is due to the root causes mentioned in T343067, but it needs to be investigated and fixed.
*For now, I am adding this as a subtask of T343067 so that content gap data issues can be tracked together. We can separate them if the root cause turns out to be different.