User Details
- User Since
- Oct 28 2022, 3:06 PM (163 w, 1 d)
- Availability
- Available
- LDAP User
- Hghani
- MediaWiki User
- HGhani-WMF
Fri, Nov 14
Oct 31 2025
Updated my last notebook with some additional findings:
Oct 23 2025
Following up from the previous update:
@Snwachukwu Based on our discussion, it sounds like computing the quantiles over March 20 to October 15, 2025 and alerting whenever user share exceeds the 95th percentile works reasonably well as a starting point. Standard deviation might also work if we want a more specific signal, possibly as a future candidate.
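As a rough illustration of the percentile-based threshold described above, here is a minimal sketch. The function names and the sample baseline are hypothetical, and the linear-interpolation percentile is an assumption; the actual notebook may compute quantiles differently.

```python
from typing import Callable, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def build_alert_check(
    baseline_shares: Sequence[float], q: float = 95.0
) -> Tuple[float, Callable[[float], bool]]:
    """Compute the threshold from a baseline window and return an alert check.

    baseline_shares stands in for the daily user-share values from the
    baseline period (hypothetical input, not the real data).
    """
    threshold = percentile(baseline_shares, q)

    def is_alert(share: float) -> bool:
        # Alert only when the share is strictly above the threshold.
        return share > threshold

    return threshold, is_alert


# Illustrative baseline only: the integers 0..100 as stand-in values.
threshold, is_alert = build_alert_check(list(range(101)))
```

A standard-deviation variant would swap `percentile` for a mean-plus-k-sigma rule while keeping the same `is_alert` interface.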
Oct 21 2025
I think the mixed dynamic/static approach makes sense, especially as our heuristics change and improve. As they get closer to a true user/automated split, we would want the alert thresholds to become more accurate along with them.
Oct 17 2025
The Wikimedia Descriptive Statistics was updated.
Oct 15 2025
@Snwachukwu Thanks for testing these methods.
Thanks for reviewing. I have summarised the observations so far below and added a new notebook to the repo with streamlined tables, which should hopefully be easier to interpret.
Oct 3 2025
Dashboard has been created and can be viewed here.
Oct 2 2025
Oct 1 2025
Sep 18 2025
In the team meeting we decided to go with:
Sep 10 2025
@OSefu-WMF everything should be up to date now except the pageviews-related items, which I intend to update after the backfill completes. I've placed a placeholder note in those cells in the meantime in case anyone takes a look.
Aug 29 2025
@JAllemandou Thanks for generating the test data. The domain data looks okay to me. I agree with @Mayakp.wiki that we don't want to lose the Turnilo data for the reasons she mentioned. I don't think Superset can be treated as a substitute for the easy-to-use Turnilo charts yet.
Aug 21 2025
Hi @JAllemandou, we also think it would be a good idea to add the access_method field to the domain tables.
Aug 15 2025
Jul 24 2025
duplicate
Jul 22 2025
Jun 17 2025
A sysop list was generated here (data up to June 16 2025) using this notebook. As discussed with @Qgil, the notebook can be run on demand and will generate a CSV list of sysops current up to the day the notebook is run.
Jun 13 2025
Update of some new findings:
Jun 11 2025
Jun 3 2025
Hi @Qgil, no problem. I have one question: how should I interpret the starting date of the constraint, 2024-12-15? I can run the notebook up until 2025-03-31 (all sysops until that point), but I am not sure what the starting date represents.
May 30 2025
May 28 2025
Hi @Qgil, I should be able to generate new data by the end of the week, and I can share it on this ticket once it is ready. Would it be useful or practical for you or your team to be able to alter a single line of code to generate the data on demand for whatever date range you're interested in? If so, I can include that customization capability once I pull the new data.
May 22 2025
May 21 2025
Apr 26 2025
Apr 18 2025
Apr 15 2025
A first draft was created here.
Apr 10 2025
Mar 26 2025
Mar 19 2025
@Mayakp.wiki Yes I updated the pageviews data.
Mar 13 2025
Updated the notes above with the following next steps for clarity:
Mar 12 2025
Notes outlining the initial scope and objectives for this project were documented here:
Feb 28 2025
@nshahquinn-wmf Updated the GitHub repo and archived it.
Feb 26 2025
Feb 24 2025
Jan 2025 snapshot was posted.
@Samwalton9-WMF Since we will be updating the wiki-comparison tool very shortly, just wanted to provide an update regarding the admins. Our contributor's pipeline is using the same definition of active admins as we see in the wiki-comparison tool and so to maintain consistency and avoid confusion, this year's wiki-comparison tool will continue using that definition. However, the feedback regarding local sysops is very useful and will be taken as a point of improvement for the existing metric in the contributor's pipeline.
Feb 21 2025
Data for Jan 2025 and Dec 2024 was added.
Feb 17 2025
Feb 13 2025
We've also decided to add some additional data this year to group monthly active editors by the following edit buckets: 5 to 24, 25 to 99 and 100 plus. These will be in addition to the total monthly active editors that we already include in the tool.
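The bucketing described above could be sketched as follows. The function names and sample input are hypothetical, and treating editors with fewer than 5 edits as outside every bucket is an assumption based on the lowest bucket starting at 5.

```python
from collections import Counter
from typing import Dict, Optional


def edit_bucket(edit_count: int) -> Optional[str]:
    """Map a monthly edit count to one of the three buckets.

    Returns None for counts below 5 (assumed to fall outside the buckets).
    """
    if edit_count >= 100:
        return "100 plus"
    if edit_count >= 25:
        return "25 to 99"
    if edit_count >= 5:
        return "5 to 24"
    return None


def count_editors_by_bucket(monthly_edits: Dict[str, int]) -> Counter:
    """Count editors per bucket from a {editor: monthly edit count} mapping."""
    buckets = (edit_bucket(count) for count in monthly_edits.values())
    return Counter(bucket for bucket in buckets if bucket is not None)


# Made-up sample data, for illustration only.
sample = {"a": 3, "b": 5, "c": 24, "d": 25, "e": 99, "f": 100, "g": 512}
bucket_counts = count_editors_by_bucket(sample)
```

Summing the three bucket counts gives the number of editors with 5 or more edits in this sketch, which is how the buckets would sit alongside an overall total.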
Jan 31 2025
Jan 17 2025
Jan 14 2025
Jan 10 2025
Jan 9 2025
Jan 8 2025
@cmadeo Hi, I've pulled data for everything except the number of 'total new editors and total editors in 2024', which doesn't appear to be available yet (I will check again this week). Please review and let me know if there are any questions or concerns.
Dec 20 2024
Nov and Oct 2024 data was added.
Dec 18 2024
Thank you all for the quick turn around on the backfill!
Dec 4 2024
Yes, using research.article_topics. @Mayakp.wiki
Nov 28 2024
I've looked at the additional analysis questions mentioned in the last update and put the results in this Google Doc, along with the underlying queries for review and replication.
Nov 26 2024
Uploaded the results of this analysis to GitLab in a notebook.
Nov 25 2024
We completed an initial impact analysis on the proposal to apply automatic traffic at the actor_signature_per_project_family level, which we expect will significantly reduce the unique devices count for Singapore. We will be consulting with DPE on implementation and next steps this week.
Nov 22 2024
@cmadeo Hi, I've added the data in our sheet.
Nov 21 2024
The data has been updated with today's pull.
Data was provided in this Google Sheet and includes the 10 most edited Wikipedias with their edit counts, per the new requirements.
Nov 4 2024
@cmadeo I've updated the sheet with the editor data. A few caveats: new and existing editors don't count anonymous editors, since we can't tell them apart; the survey data on the gender identity of our editors applies only to active editors; and I interpreted bytes added as net bytes added in 2024.
@EdErhart-WMF I've now added the most edited articles and the largest increase in readership (I went with absolute numbers because % showed small articles increasing over a day). Let me know if you have any questions!
Nov 1 2024
I've added the data requests to this google sheet. The only request outstanding should be Largest percentage increase in readership in a single day [EDIT: to a single article or an entire wiki] which will be added on Monday.
I've updated this sheet with some of the data. I believe it contains all the requested data for this data pull except for editor data, which won't be available until early next week. A note on the top 5 pages for every country: I filtered out page titles that we have recognized in the past as largely saturated by automated viewership, or by accidental traffic from people who are trying to get to a site but end up on the Wikipedia page (such as by typing youtube in their browser). There are also other page titles whose traffic looks largely automated, which I have highlighted in red. The rule of thumb is that if the mobile pageviews share is too high (in this case near 100%), or if the share with referrer type null (i.e., missing a referrer) is over 90%, then we would see that as automated traffic. I didn't remove these pages from the list (I can if you'd like!) so that you can review them. I've also included more than the top 5 to account for any pages that will be removed.
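The rule of thumb above can be written as a small check. This is only a sketch: the function name and parameters are hypothetical, and the 0.99 cutoff is an assumed stand-in for "near 100%", which the note does not pin to an exact value.

```python
def looks_automated(
    mobile_share: float,
    null_referrer_share: float,
    mobile_cutoff: float = 0.99,         # assumed value for "near 100%"
    null_referrer_cutoff: float = 0.90,  # "over 90%" from the note above
) -> bool:
    """Flag a page whose traffic mix matches the rule of thumb.

    Both shares are fractions of the page's total pageviews (0.0 to 1.0).
    """
    return (
        mobile_share >= mobile_cutoff
        or null_referrer_share > null_referrer_cutoff
    )
```

A flag from this check marks a page for manual review rather than automatic removal, matching how the highlighted pages were left in the sheet.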
Oct 31 2024
@JAllemandou
The first makes sense.
Oct 29 2024
For the temporary solution, we will add a NOT LIKE '~2%' filter to our monthly metrics queries rather than modifying the underlying tables, which will be addressed later with a permanent solution. The following adjustments will be made: mobile_edits.sql will include the NOT LIKE '~2%' filter under the WHERE clause; active_editors.sql will have the same filter added in the first CTE; and new_editor_retention.sql will include the filter in the WHERE clause. We will defer updates to regional active editors until after the permanent solution is implemented, as these tables do not currently support filtering by user name.
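To show what the `NOT LIKE '~2%'` filter does in a WHERE clause, here is a minimal, self-contained sketch using an in-memory SQLite table. The table name, columns, and sample rows are hypothetical and do not reflect the real schemas behind mobile_edits.sql, active_editors.sql, or new_editor_retention.sql, and the production SQL engine may differ from SQLite in details.

```python
import sqlite3
from typing import Iterable, Tuple


def count_edits_excluding_prefix(rows: Iterable[Tuple[str, int]]) -> int:
    """Count rows whose user_name does not start with '~2'."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table, used only to demonstrate the filter.
        conn.execute("CREATE TABLE edits (user_name TEXT, is_mobile INTEGER)")
        conn.executemany("INSERT INTO edits VALUES (?, ?)", rows)
        (count,) = conn.execute(
            # '%' matches any suffix; '~' is a literal character in LIKE.
            "SELECT COUNT(*) FROM edits WHERE user_name NOT LIKE '~2%'"
        ).fetchone()
        return count
    finally:
        conn.close()


# Made-up sample rows: two names start with '~2', two do not.
sample_rows = [("Alice", 1), ("~2024-123", 1), ("~2025-9", 0), ("Bob~2", 0)]
remaining = count_edits_excluding_prefix(sample_rows)
```

One point to keep in mind when adding the filter: in standard SQL, a row with a NULL user_name is also dropped by `NOT LIKE`, because the comparison evaluates to NULL rather than true.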
Oct 25 2024
@EdErhart-WMF Sure, that shouldn't be a problem.
