
Fix overcounting of active administrators in the wiki segmentation dataset
Closed, Resolved · Public

Description

@SPoore was investigating the wiki segmentation dataset and noticed issues in the counting of active administrators:

On my spreadsheet I made a column that listed the total number of admins for the wikis at the top of the sheet. All of them are showing more active admins than the total. Some by double or more so it can't be attrition.

From looking at the definition of active admin that you used, I'm wondering if you double or triple counted admins that used the different admin tools?

Event Timeline

@revi suggested that this might be due to the fact that ordinary users can delete a redirect by moving the page it redirects to on top of it (example).

I'm not sure how we can exclude these moves—maybe we need to just check whether the performing user is an admin?

nshahquinn-wmf raised the priority of this task from Medium to High. Aug 10 2018, 4:45 PM

Yes, it makes sense that it could be including non-admins with page move rights. https://en.wikipedia.org/wiki/Wikipedia:Page_mover#Suppressredirect

I'm wondering if there are other types of non-admin rights that could be causing other similar situations.

Restricted Application changed the subtype of this task from "Deadline" to "Task". Aug 25 2018, 3:54 AM

I looked at a few of the other possible user actions that might show up as admin actions, but nothing obvious turned up.

nshahquinn-wmf renamed this task from Investigate overcounting of active administrators in the wiki segmentation dataset to Fix overcounting of active administrators in the wiki segmentation dataset. Apr 22 2019, 9:09 PM
nshahquinn-wmf lowered the priority of this task from High to Medium. Aug 20 2019, 6:25 PM
nshahquinn-wmf added a subscriber: PEarleyWMF.

I've figured out the problem and come up with a correct version of the query. More details in this notebook.

I've got an email out to @PEarleyWMF and @SPoore asking for review.

nshahquinn-wmf added a subscriber: Iflorez.

This is done! I've gotten helpful input and advice from the Trust and Safety team, documented this metric at meta:Research:Active administrators, and forwarded the correct query to @Iflorez to be included in T221566.

@Iflorez, here is the corrected query:

maa = hive.run("""
select 
    wiki as database_code,
    sum(monthly_active_administrators) / 12 as monthly_active_administrators
from (
    select
        wiki_db as wiki,
        substr(log_timestamp, 1, 6) as month,
        count(distinct log_actor) as monthly_active_administrators
    from wmf_raw.mediawiki_logging
    where
        log_type in ("block", "delete", "protect", "rights") and
        -- Omit the "delete_redir", "move_prot", and "autopromote" actions, which can be done by regular users
        log_action not in ("autopromote", "delete_redir", "move_prot") and
        log_timestamp >= "{start}" and
        log_timestamp < "{end}" and
        snapshot = "{snapshot}"
    group by wiki_db, substr(log_timestamp, 1, 6)
) mae
group by wiki
""".format(start="201808", end="201908", snapshot="2019-07"))
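For anyone reviewing the logic without access to Hive, here is a minimal in-memory sketch of what the query computes: distinct actors per wiki per month, restricted to admin log types, with the three non-admin actions excluded, then averaged over 12 months. The function name `monthly_active_admins`, the row tuple layout, and the sample rows are made up for illustration and are not part of the actual pipeline. It also assumes `log_action` is never null.

```python
from collections import defaultdict

# Log types and excluded actions copied from the query above.
ADMIN_LOG_TYPES = {"block", "delete", "protect", "rights"}
NON_ADMIN_ACTIONS = {"autopromote", "delete_redir", "move_prot"}


def monthly_active_admins(rows, start, end, months=12):
    """Average monthly distinct admin actors per wiki.

    rows: iterable of (wiki_db, log_timestamp, log_type, log_action, log_actor)
    (a hypothetical layout). Timestamps are MediaWiki-style strings
    (yyyymmddhhmmss), so plain string comparison orders them correctly.
    """
    actors = defaultdict(set)  # (wiki, yyyymm) -> set of distinct actors
    for wiki, ts, log_type, log_action, actor in rows:
        if log_type not in ADMIN_LOG_TYPES:
            continue
        if log_action in NON_ADMIN_ACTIONS:
            continue
        if not (start <= ts < end):
            continue
        actors[(wiki, ts[:6])].add(actor)

    totals = defaultdict(int)
    for (wiki, _month), actor_set in actors.items():
        totals[wiki] += len(actor_set)
    return {wiki: total / months for wiki, total in totals.items()}


# Made-up sample rows, for illustration only.
rows = [
    ("enwiki", "20180805120000", "delete", "delete", 1),
    ("enwiki", "20180806120000", "block", "block", 1),          # same admin, same month: counted once
    ("enwiki", "20180807120000", "delete", "delete_redir", 2),  # move over a redirect: excluded
    ("enwiki", "20180901120000", "protect", "protect", 3),
    ("enwiki", "20180902000000", "move", "move", 4),            # not an admin log type
]
result = monthly_active_admins(rows, "201808", "201908")
```

In this sample, user 2 only appears through a `delete_redir` entry, which is the kind of row that inflated the original counts, so they are not counted as an active administrator.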