
SDS 1.2.2 Support quantitative analysis and reporting
Open, In Progress, High, Public

Description

Q1 goals

Quantitative support related to metrics

  • Contribute to definition of metrics
  • Contribute to data pulls and preliminary analyses

Q2 goals

Quantitative support related to survey

  • Query lists of current admins
  • Query lists of potential admins

Quantitative support related to metrics

  • Data pulls, calculations, analysis to answer quantitative questions
  • Reporting on meta-wiki

Event Timeline

No major updates yet on the direct support of analysis and reporting, but I have been:

  • Familiarizing myself with relevant terminology
  • Familiarizing myself with relevant data
  • Helping with planning docs and beginning the annotated bibliography
CMyrick-WMF renamed this task from Support analysis and reporting of survey results to Support quantitative analysis and reporting. (Aug 16 2024, 7:12 PM)
CMyrick-WMF updated the task description.
CMyrick-WMF changed the task status from Open to In Progress. (Aug 16 2024, 8:55 PM)

Weekly update:

  • Helped Claudia and Yu-Ming with survey distribution criteria
  • Worked on annotated bibliography
  • Prepped for quantitative (metrics) support work
    • Eli and I met with Diego, who joined the group to serve as lead for the quantitative portion of work
    • Diego and I will work closely together in the coming weeks on metrics definitions, calculations, etc.

Weekly update:

(with @diego)

Next week:

  • Finish debugging/QA
  • Plot and publish basic trends
  • Review/update questions we can answer from initial list
  • Provide data needed for the list of wiki criteria

Weekly update:

  • Continued exploring data sources and variables
  • Continued debugging baseline queries
  • Plotted basic trends:
    • Active admins (monthly) per wiki, 2018-pres
    • Arrivals and departures (monthly) per wiki, 2018-pres
    • Total admins (yearly) per wiki, 2018-pres
    • Net changes and % changes (monthly) per wiki, 2018-pres

Next week:

  • Debugging/QC: some wikis have more instances of arrivals than they do total sysops; (maybe relatedly) some wikis have only instances of arrivals, and no instances of departures
  • Start building a dataset to help us examine
    • distribution of admins' monthly activity levels (low, med, and high)
    • admins' activity levels/changes prior to departure
    • admins' activity levels/changes after departure

(with @diego)

Weekly updates:

  • Quantitative exploration of “active admins” vs. “all admins”.
  • Working on a way to systematically define ‘increasing’, ‘decreasing’, and ‘stable’ (for admins over time).
  • Learning about user groups with administrative rights (e.g., reversores on ptwiki; eliminators on fawiki, jawiki, and ptwiki).
  • Quantitative exploration of the Special:ListGroupRights data, per wiki, via scraping/parsing and plotting

Learnings
New learnings about cross-wiki variation related to user groups with administrative rights, based on quantitative exploration of “active admins” vs. “all admins”:

  1. Our definition of "admins" needs to be expanded to include user groups beyond sysop that have at least some administrative rights (one or more actions falling under log_type block, protect, delete, or rights)
  2. Our definition of “active admins” will align with that of
  3. Our initial reporting of increases/decreases/stability of admins over time (see last week’s update) needs to be revised. We used “sysop” and “desysop” actions in the logging data table, which we have determined is an incomplete method, because it:
    • excludes non-sysop groups which have administrative rights on other wikis (e.g., eliminators), and
    • doesn’t account for blocks that cause the loss of admin rights
    • doesn’t account for admins who leave/are no longer active, but whose wiki doesn’t apply desysoping for inactivity

Weekly updates:

  • Finalized definitions for admins, admin activity, active admins, inactive admins, former admins, and potential admins
    • Determined user groups to include in the “admin” definition (See parse_user_rights.ipynb Plot 1)
    • Will post definitions on meta-wiki next week
  • Determined a method for classifying admin activity over time as increasing, decreasing, or stable
    • Each wiki's increase, decrease, or stability will be defined by the slope of a linear regression; these slopes are then divided into quartiles
    • Method: We ran linear regressions for each wiki's active admins over time lm(admin_actions~date) and allocated the regression line slopes into quartiles. (See active_admins_over_time.ipynb, esp Plot 3B for regression lines)

Learnings:

  • Used a parser script to scrape the user rights info for each of the 21 wikis, to identify user groups that have (at least some) admin rights.
    • On all 21: sysop
    • On eswiki: sysop, botadmin
    • On fawiki: sysop, botadmin, image-reviewer
    • On itwiki: sysop, botadmin
    • On jawiki: sysop, eliminator, interface-admin
    • On ptwiki: sysop, eliminator, rollbacker
    • On ruwiki: sysop, closer
    • (See parse_user_rights.ipynb, Plot 1)
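Once each wiki's group-to-rights mapping is available, picking out admin-like groups reduces to a set intersection. This is a sketch, not the parser script itself: the `ADMIN_RIGHTS` set is an assumption chosen to mirror the block/protect/delete/rights log types in our admin definition, and the example mapping is made up for illustration (the real mapping would come from the scraped Special:ListGroupRights pages, or possibly from the MediaWiki API's `meta=siteinfo&siprop=usergroups`).

```python
# Assumption: rights treated as "administrative", loosely mirroring the
# block, protect, delete, and rights log types in our admin definition.
ADMIN_RIGHTS = frozenset({"block", "protect", "delete", "userrights"})


def admin_groups(group_rights: dict[str, set[str]],
                 admin_rights: frozenset[str] = ADMIN_RIGHTS) -> list[str]:
    """Return, sorted, the user groups that hold at least one administrative right."""
    return sorted(group for group, rights in group_rights.items()
                  if rights & admin_rights)


# Hypothetical mapping for illustration only (not real wiki configuration).
example = {
    "sysop": {"block", "delete", "protect", "editinterface"},
    "eliminator": {"delete", "undelete"},
    "rollbacker": {"rollback"},
    "autoconfirmed": {"editsemiprotected"},
}
print(admin_groups(example))  # ['eliminator', 'sysop']
```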

Weekly updates:

  • Aligned and merged definitions documentation
  • Finished drafting in sandbox for meta-wiki definitions page
  • Synced with Movement Comms
  • Planned analysis prep work for Q2
  • Began working on queries needed for two surveys (query 1: list of current admins on the 5 wikis; query 2: list of potential admins on the 5 wikis, with "potential admin" being a user who currently meets the formal requirements for adminship but is not currently an admin)

Learnings:

  • A suggestion has been made to update Movement Insights' method of calculating monthly average admins.
    • We'll want to follow that ticket, since our definition of active admins is based on Movement Insights' definition.
    • In the meantime, we can explore ways to both include _and_ exclude global admins/sysops from our analysis, examine the difference, and share code with Movement Insights.

Weekly updates:

Queried lists of current admins and potential admins for the 6 wikis, to use for survey solicitation communication. Below are the criteria used to query these lists.

LISTS OF CURRENT ADMINS

  • enwiki: users in sysop user group, excluding bots*
  • eswiki: users in sysop or botadmin user group, excluding bots*
  • frwiki: users in sysop user group, excluding bots*
  • idwiki: users in sysop user group, excluding bots*
  • ruwiki: users in sysop or closer user group, excluding bots*
  • plwiki: users in sysop user group, excluding bots*

    *bots determined via ‘bot’ user group or user name ending in ‘bot’ or ‘bot+{language code}’ with various capitalizations
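The bot rule in the footnote can be approximated with a case-insensitive regular expression. A minimal sketch, assuming "bot+{language code}" means the code is appended directly after "bot" (e.g., a name ending in "BotEN"); the function name and the set of language codes passed in are hypothetical. Like any suffix rule, it will also flag some human names (for example, names ending in "Abbot"), so spot-checking the matches is worthwhile.

```python
import re


def looks_like_bot(user_name: str, user_groups: set[str], lang_codes: set[str]) -> bool:
    """Heuristic bot check: 'bot' group membership, or a name ending in 'bot' or 'bot'+code."""
    if "bot" in user_groups:
        return True
    codes = "|".join(re.escape(code) for code in sorted(lang_codes))
    # Matches e.g. 'ExampleBot', 'EXAMPLEBOT', 'ExampleBotEN' (case-insensitive).
    pattern = rf"bot(?:{codes})?$"
    return re.search(pattern, user_name, flags=re.IGNORECASE) is not None


codes = {"en", "es", "fr", "id", "ru", "pl"}
print(looks_like_bot("ExampleBot", set(), codes))    # True
print(looks_like_bot("ExampleBotEN", set(), codes))  # True
print(looks_like_bot("Botanist", set(), codes))      # False
```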

LISTS OF POTENTIAL ADMINS

For each of the following wikis, we queried users who met the formal or informal requirements listed below. We also filtered out current sysops, closers (ruwiki only), botadmins (eswiki only), stewards, global-sysops, and founders. :) Finally, we limited the lists to users with at least one edit in the past 6 months.

  • enwiki informal requirements
    • extended confirmed (500 edits, 6 months account age)
    • at least 10000 edits
    • email (authenticated)
  • eswiki formal requirements
    • 12 month account age
    • email (authenticated)
  • frwiki informal requirements
    • 12 months account age
    • at least 3000 edits
    • “Participated positively in the work of the Project: Maintenance, patrol, and/or welcoming of novices”
      • Projet:Maintenance member
      • Patrouille RC member
  • idwiki formal requirements
    • 3 months account age
    • email verified
    • 500 edits in last 3 months
    • edited at least 5 namespaces in last 3 months
    • no blocks in last 6 months
    • user page over 500 bytes
  • ruwiki formal requirements
    • 6 months account age
    • at least 1000 edits
    • patrolling flag or auto-patrolled flag (on ruwiki, patrolling = editor; and auto-patrolled flag = autoreview)
  • plwiki formal requirements
    • 1000 undeleted edits in the main space, the first of which took place at least 3 months ago.
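To illustrate how one wiki's criteria combine with the shared exclusions and the 6-month activity filter, here is a sketch of the ruwiki check in plain Python. It assumes a hypothetical per-user record (`Candidate`) already assembled from the queries, treats stewards, global-sysops, and founders as entries in the same `groups` set even though they are global groups, and reads "6 months" as calendar months (via the hypothetical helper `months_ago`).

```python
import calendar
from dataclasses import dataclass, field
from datetime import date


def months_ago(today: date, months: int) -> date:
    """Same day-of-month `months` calendar months earlier, clamped to month end."""
    year, month0 = divmod(today.year * 12 + today.month - 1 - months, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(today.day, last_day))


@dataclass
class Candidate:  # hypothetical per-user record built from the queries
    name: str
    registered: date
    edit_count: int
    last_edit: date
    groups: set[str] = field(default_factory=set)


# Shared exclusions, plus ruwiki's closer group.
EXCLUDED_GROUPS = {"sysop", "closer", "steward", "global-sysop", "founder"}


def is_ruwiki_potential_admin(user: Candidate, today: date) -> bool:
    """ruwiki formal requirements plus the shared exclusion and activity filters."""
    if user.groups & EXCLUDED_GROUPS:
        return False
    if user.last_edit < months_ago(today, 6):  # no edit in the past 6 months
        return False
    return (
        user.registered <= months_ago(today, 6)  # at least 6 months account age
        and user.edit_count >= 1000
        # patrolling (editor) or auto-patrolled (autoreview) flag
        and bool(user.groups & {"editor", "autoreview"})
    )


alice = Candidate("Alice", date(2020, 1, 1), 5000, date(2024, 9, 1), {"editor"})
print(is_ruwiki_potential_admin(alice, date(2024, 10, 1)))  # True
```

The other wikis would follow the same shape with their own thresholds and groups swapped in.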
CMyrick-WMF renamed this task from Support quantitative analysis and reporting to SDS 1.2.2 Support quantitative analysis and reporting. (Oct 7 2024, 6:48 PM)

Weekly updates:

  • Finished all survey list queries, random sampling (enwiki and eswiki), and formatting for MassMessage
  • Began creating admin-level database to show timeline of activity per admin
    • Structure:
      • one row per admin
      • act: 1st_become_sysop, 1st_become_eventcoordinator, 1st_admin_action, become_sysop (again), desysop
      • date: date on which the act took place
    • Blocker/Puzzle: Dealing with a large number of admins who have changed their user names since gaining admin rights. Currently exploring the use of 'event_user_text_historical' and 'event_user_text' (via wmf.mediawiki_history) as a solution
    • Blocker/Puzzle: Sleuthing the reason(s) for many users with 1st_admin_action dates preceding 1st_become_sysop dates OR users with 1st_admin_action dates and no 1st_become_sysop date. Need to determine when the logging of sysoping began.
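Both puzzles can be surfaced from a long table of (admin, act, date) events by keeping each admin's earliest date per act and comparing the two. A minimal sketch; the act labels "become_sysop" and "admin_action" are hypothetical stand-ins for the 1st_become_sysop and 1st_admin_action fields above, and the events are made up.

```python
from collections import defaultdict
from datetime import date


def first_dates(events):
    """events: iterable of (admin, act, date). Returns {admin: {act: earliest date}}."""
    firsts = defaultdict(dict)
    for admin, act, when in events:
        current = firsts[admin].get(act)
        if current is None or when < current:
            firsts[admin][act] = when
    return dict(firsts)


def flag_puzzles(firsts):
    """Flag admins whose first admin action has no sysop date, or precedes it."""
    flagged = {}
    for admin, acts in firsts.items():
        action, sysop = acts.get("admin_action"), acts.get("become_sysop")
        if action is None:
            continue
        if sysop is None:
            flagged[admin] = "no_become_sysop"
        elif action < sysop:
            flagged[admin] = "action_before_sysop"
    return flagged


events = [
    ("A", "become_sysop", date(2010, 1, 1)),
    ("A", "admin_action", date(2010, 2, 1)),
    ("B", "admin_action", date(2005, 1, 1)),
    ("C", "become_sysop", date(2008, 1, 1)),
    ("C", "admin_action", date(2007, 6, 1)),
]
print(flag_puzzles(first_dates(events)))
# {'B': 'no_become_sysop', 'C': 'action_before_sysop'}
```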

Learnings:

  • When querying for administrative actions via the logging table, we need to exclude actions logged August-November 2016
    • Why? Before August 2016, no deletion log event was generated when moving a page on top of a redirect. When the event was added, it used the same log_action as regular deletions. In November 2016, the delete_redir action was created to allow the two types of events to be distinguished. As a result, this definition produces incorrect results between August and November 2016. (source)
    • A solution has been logged on Phab: T154373
  • The only way to query when a user became a sysop is to query the logging table for log_action="rights" WHERE log_title={that user's username} AND log_params shows an addition of the sysop usergroup.
    • So? Because user names are not necessarily permanent, this requires a list of all former usernames of each user.
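The exclusion can be sketched as a filter on the logging table's `log_timestamp`, which MediaWiki stores as a 14-character UTC string (YYYYMMDDHHMMSS). Reading "August-November 2016" as 1 Aug 2016 up to, but not including, 1 Dec 2016 is an assumption; since the underlying problem concerns deletion events, a narrower variant could apply the window only when log_type is 'delete'.

```python
from datetime import datetime

# Assumption: "August-November 2016" = [2016-08-01 00:00, 2016-12-01 00:00) UTC.
EXCLUDE_START = datetime(2016, 8, 1)
EXCLUDE_END = datetime(2016, 12, 1)


def parse_log_timestamp(ts: str) -> datetime:
    """Parse a MediaWiki log_timestamp (YYYYMMDDHHMMSS, UTC) as a naive datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def keep_admin_action(log_timestamp: str) -> bool:
    """False for actions logged inside the excluded Aug-Nov 2016 window."""
    t = parse_log_timestamp(log_timestamp)
    return not (EXCLUDE_START <= t < EXCLUDE_END)


print(keep_admin_action("20160731235959"))  # True
print(keep_admin_action("20161015120000"))  # False
```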

Weekly updates:

  • Finished notebook compiling "admin milestones" for all 21 shortlisted wikis
    • Here's enwiki example notebook (public)
    • Admin milestones:
      • date of first becoming sysop (or other admin group);
      • date of first admin action;
      • date(s) of de-sysop (or being removed from admin group);
      • date(s) of being blocked (since this sometimes causes auto-removal from admin group);
      • date(s) of being re-sysop'ed (or re-added to other admin group)
  • Finished notebook compiling "admin inflow and outflow" for all 21 shortlisted wikis

Learnings

Answers to previous week's blockers/puzzles:

  • Dealing with a large number of admins who have changed their user names since gaining admin rights.
    • Solution: use 'event_user_text_historical' and 'event_user_text' (via wmf.mediawiki_history)
  • Sleuthing the reason(s) for many users with 1st_admin_action dates preceding 1st_become_sysop dates OR users with 1st_admin_action dates and no 1st_become_sysop date.
    • Solution: determined the date that the logging of sysop'ing began: 4 Aug 2006. See learnings below for details.

Learnings about the logging table:

  • Changes to the log_params field of the logging table:
    • In Nov 2011, the format of the field changed (from, e.g., "sysop\nsysop, checkuser" to "a:2:{s:12:"4::oldgroups";a:2:{i:0;s:5:"sysop";}s:12:"5::newgroups";a:1:{i:2;s:9:"checkuser";}}")
    • For most wikis, this change is reflected 19 Nov 2011; for ruwiki and fawiki this change is reflected 20 Nov 2011
  • Logging of rights being given to a user (i.e., adding them to a user group) in the logging table began 4 Aug 2006
    • For analysis, when looking at data that far back, we'll take the number of folks we know were in the sysop group (or other admin group) by EOY 2006, divide by 6, and spread among 2001, 2002, 2003, 2004, and 2005
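Because rights-log entries before and after the Nov 2011 change use different log_params formats, a small parser that accepts both keeps the "was sysop added?" check uniform. A sketch, assuming the post-2011 value is PHP-serialized with '4::oldgroups' and '5::newgroups' arrays of plain strings (as in the example above) and the legacy value is "old groups\nnew groups", each comma-separated. It uses regular expressions rather than a full PHP unserializer, so it ignores any other keys that newer entries may carry; the function names are hypothetical.

```python
import re


def _split_legacy(part: str) -> set[str]:
    """Split a legacy comma-separated group list, dropping blanks."""
    return {g.strip() for g in part.split(",") if g.strip()}


def parse_rights_params(log_params: str) -> tuple[set[str], set[str]]:
    """Return (old_groups, new_groups) for a rights-log entry in either format."""
    if log_params.startswith("a:"):
        # Post-Nov-2011 PHP-serialized format: find each named array, pull its strings.
        found = []
        for key in ("4::oldgroups", "5::newgroups"):
            m = re.search(r's:\d+:"' + re.escape(key) + r'";a:\d+:\{([^}]*)\}', log_params)
            found.append(set(re.findall(r's:\d+:"([^"]*)"', m.group(1))) if m else set())
        return found[0], found[1]
    # Pre-Nov-2011 legacy format: "old groups\nnew groups".
    old, _, new = log_params.partition("\n")
    return _split_legacy(old), _split_legacy(new)


def sysop_added(log_params: str) -> bool:
    old, new = parse_rights_params(log_params)
    return "sysop" in new and "sysop" not in old


serialized = 'a:2:{s:12:"4::oldgroups";a:0:{}s:12:"5::newgroups";a:1:{i:0;s:5:"sysop";}}'
print(sysop_added(serialized))                 # True
print(sysop_added("sysop\nsysop, checkuser"))  # False
```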

Weekly updates:

Created notebook for analysis of tenure

Created notebook for analysis of bot vs human admin activity

  • Obstacle: query kept timing out/breaking
    • Solution: wrote the query as a function and applied it to each wiki; also broke the query down into multiple month-based queries to reduce the query size
    • Status: enwiki and dewiki queries are still breaking
  • Obstacle: Multiple important edge-cases were discovered
    • See Learnings below
  • Coming soon: plots showing bot administrative activity vs human administrative activity
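The month-by-month workaround can be sketched as a generator of half-open date windows plus a loop that runs one small query per window. `run_query` is a hypothetical callable standing in for whatever executes the SQL (e.g., a wrapper that takes a wiki and a date range); it is not a real library function.

```python
from datetime import date


def month_windows(start: date, end: date):
    """Yield half-open (window_start, window_end) pairs covering [start, end) month by month."""
    current = date(start.year, start.month, 1)
    while current < end:
        # First day of the following month (December rolls over to January).
        following = date(current.year + (current.month == 12), current.month % 12 + 1, 1)
        yield max(current, start), min(following, end)
        current = following


def query_by_month(wiki: str, start: date, end: date, run_query):
    """Run one small query per month window and concatenate the returned rows."""
    rows = []
    for window_start, window_end in month_windows(start, end):
        rows.extend(run_query(wiki, window_start, window_end))
    return rows


windows = list(month_windows(date(2024, 1, 15), date(2024, 3, 10)))
print(len(windows))  # 3
```

Smaller windows trade more round trips for queries that are less likely to time out; if a single month is still too large (as for enwiki and dewiki), the same pattern works with weekly windows.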

Learnings
While investigating bot vs human actions and edits with @Samwalton9-WMF and @cwylo, we learned the following:

  1. log_action='flow-delete-post' in the logging data refers to deleting Structured Discussion posts (Thanks, Claudia!)
  2. log_action='blockautopromote' in the logging data refers to this: AbuseFilter has an option that, when triggered, will revoke the user's autoconfirmed user group. It seems that blockautopromote is then a time-bound period ($wgAbuseFilterBlockAutopromoteDuration) where the user will not be automatically re-added to that group (since they already meet the criteria). (Thanks, Sam!)
  3. Edits by AbuseFilter system accounts and Automoderator are not currently flagged as bot edits in mediawiki_history.
    • Long-term solution: T378104
    • Short-term solution: manually (via a CASE WHEN clause) classify the user as a bot when:
      • `event_user_text == 'Automoderator'`
      • `event_user_text == 'مرشح الإساءة'` (for arwiki)
      • `event_user_text == "Filtre d'edicions"` (for cawiki)
      • `event_user_text == 'Editační filtr'` (for cswiki)
      • `event_user_text == 'Bearbeitungsfilter'` (for dewiki)
      • `event_user_text == 'Edit filter'` or `'Abuse filter'` (for enwiki)
      • `event_user_text == 'Filtro de ediciones'` or `'Filtro antiabusos'` (for eswiki)
      • `event_user_text == 'پالایه ویرایش'` or `'پالایهٔ ویرایش'` (for fawiki)
      • `event_user_text == 'Väärinkäyttösuodatin'` (for fiwiki)
      • `event_user_text == 'AbuseFilter'` (for frwiki)
      • `event_user_text == 'מסנן השחתות'` (for hewiki)
      • `event_user_text == 'Filter penyuntingan'` or `'Filter penyalahgunaan'` (for idwiki)
      • `event_user_text == 'Filtro anti abusi'` (for itwiki)
      • `event_user_text == '編集フィルター'` (for jawiki)
      • `event_user_text == 'Misbruikfilter'` or `'Filter'` (for nlwiki)
      • `event_user_text == 'Redigeringsfilter'` (for nowiki)
      • `event_user_text == 'Filtr nadużyć'` (for plwiki)
      • `event_user_text == 'Filtro de edições'` (for ptwiki)
      • `event_user_text == 'Фильтр правок'` (for ruwiki)
      • `event_user_text == 'Redigeringsfilter'` (for svwiki)
      • `event_user_text == 'Фільтр редагувань'` or `'Фільтр зловживань'` (for ukwiki)
      • `event_user_text == '防滥用过滤器'` or `'滥用过滤器'` (for zhwiki)

Source for wiki-specific Abuse Filter translations: https://www.wikidata.org/wiki/Q4582485
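The same short-term rule, expressed in Python for row-by-row use outside SQL. A sketch only: `FILTER_ACCOUNTS` copies a few entries from the list above (the rest would be added the same way), and `effective_is_bot` is a hypothetical helper name.

```python
# A few entries copied from the list above, keyed by wiki database name.
FILTER_ACCOUNTS = {
    "dewiki": {"Bearbeitungsfilter"},
    "enwiki": {"Edit filter", "Abuse filter"},
    "eswiki": {"Filtro de ediciones", "Filtro antiabusos"},
    "frwiki": {"AbuseFilter"},
}


def effective_is_bot(wiki: str, event_user_text: str, event_user_is_bot: bool) -> bool:
    """Treat Automoderator (any wiki) and the wiki's AbuseFilter account as bots."""
    if event_user_is_bot or event_user_text == "Automoderator":
        return True
    return event_user_text in FILTER_ACCOUNTS.get(wiki, set())


print(effective_is_bot("enwiki", "Abuse filter", False))  # True
print(effective_is_bot("frwiki", "Abuse filter", False))  # False
```

Keying by wiki matters because a name that is the filter account on one wiki could be an ordinary username on another.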

Claudia was just showing me the admin milestones and inflow/outflow analysis, and I think it is missing de-adminning in some contexts, but I haven't been able to figure out why yet. On eswiki the graphs show no de-admins, but their admin page lists more than 100 admins who had their admin rights removed.

When I looked into some of those users, something odd shows up in their user group logs: this user (Obelix83 on eswiki), for example, clearly gains the librarian (bibliotecario) right in 2010, but in 2016, when that right is removed, the removal is displayed as (none) instead of bibliotecario.

Today I learned that bureaucrats (local advanced users) can only add/remove administrators on some projects. On most projects, Stewards add/remove rights on Meta: https://meta.wikimedia.org/wiki/Bureaucrat#Removing_access

You can see the user right log for the user I linked above at https://meta.wikimedia.org/wiki/Special:Log?type=rights&user=&page=User%3AObelix83%40eswiki&wpdate=&tagfilter=&subtype=&wpFormIdentifier=logeventslist

Weekly updates:

  • Exploration of bot vs human administrative actions
  • Updating queries of admin milestones to include desysop'ings via stewards (metawiki)
  • Lots of debugging

Learnings:

  • Biggest learning related to the desysop'ings via stewards (metawiki) detailed above by @Samwalton9-WMF (Thanks, Sam!)
    • Solution: query metawiki for desysop'ings of usernames of the form username@XXwiki
  • We categorized admins as bot or human based on event_user_is_bot_by_historical in mediawiki_history
    • Puzzle: Some users were classified as human in the past but as bots at present. How should we categorize them, especially when we don't know whether the switch was an actual change to bot or simply a post hoc correction of the classification?
    • Solution: If the user was ever classified as a bot, they are classified as a bot in our analysis.
  • Regarding the list of translated names for Abuse filter across language wikis: in the Wikidata item for Abuse filter (Q4582485), many language labels (including enwiki's) give the name of the Edit filter rather than the Abuse filter. For those wikis, the Abuse filter name must be found manually. Of our 21 shortlisted wikis, these are affected:
    • zhwiki: '防滥用过滤器' --> '滥用过滤器'
    • ukwiki: 'Фільтр редагувань' --> 'Фільтр зловживань'
    • nlwiki: 'Misbruikfilter' --> 'Filter'
    • idwiki: 'Filter penyuntingan' --> 'Filter penyalahgunaan'
    • fawiki: "پالایه ویرایش" <-- "پالایهٔ ویرایش"
    • eswiki: 'Filtro de ediciones' --> 'Filtro antiabusos'
    • enwiki: 'Edit filter' --> 'Abuse filter'
    • I have updated these where they appear in a previous post
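Desysop'ings done by stewards are logged on metawiki against a target such as Obelix83@eswiki (see the Special:Log link above), so the local username and wiki have to be split back out before they can be joined to the local data. A minimal sketch, assuming the target arrives as a log_title string with underscores for spaces and that only targets ending in "wiki" (Wikipedias) are wanted; the function name is hypothetical.

```python
from typing import Optional, Tuple


def parse_meta_rights_title(log_title: str) -> Optional[Tuple[str, str]]:
    """Split 'User_name@xxwiki' into ('User name', 'xxwiki'); None if not of that form."""
    user, sep, wiki = log_title.rpartition("@")
    if not sep or not user or not wiki.endswith("wiki"):
        return None
    return user.replace("_", " "), wiki


print(parse_meta_rights_title("Obelix83@eswiki"))  # ('Obelix83', 'eswiki')
print(parse_meta_rights_title("Obelix83"))         # None
```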

Weekly updates

  • Reran queries of Potential Admins to exclude former admins, using two methods
  • Generated new lists for pending surveys, and reminder lists for active surveys, from the above queries
  • Assisted with the deployment of pending surveys
  • Reran queries of bot vs human admin activity, designating abusefilter and automoderator as separate categories from 'human' and 'bot', to distinguish them

Learnings

  • Distinguish, within Potential Admins, those who have been admins in the past from those who have not, via the two query methods mentioned in the first update above.
  • Distinguish, among "non-humans" performing administrative actions, bots vs. Automoderator vs. AbuseFilter, as discussed above.
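The four categories, combined with the earlier rule that anyone ever flagged as a bot counts as a bot, can be sketched as below. Assumptions: rows are (event_user_text, event_user_is_bot_by_historical) pairs already restricted to one wiki, `filter_names` holds that wiki's AbuseFilter account names, and Automoderator and AbuseFilter are checked before the bot flag so they stay in their own categories even if flagged; `classify_actors` is a hypothetical name.

```python
from collections import defaultdict


def classify_actors(rows, filter_names):
    """rows: iterable of (event_user_text, event_user_is_bot_by_historical) pairs.

    Returns {user: category}, category in automoderator / abusefilter / bot / human.
    A user flagged as a bot in any row is treated as a bot ("ever a bot" rule).
    """
    ever_bot = defaultdict(bool)
    for user, is_bot in rows:
        ever_bot[user] = ever_bot[user] or bool(is_bot)

    categories = {}
    for user, bot in ever_bot.items():
        if user == "Automoderator":
            categories[user] = "automoderator"
        elif user in filter_names:
            categories[user] = "abusefilter"
        elif bot:
            categories[user] = "bot"
        else:
            categories[user] = "human"
    return categories


rows = [("Alice", False), ("FooBot", False), ("FooBot", True),
        ("Automoderator", False), ("Edit filter", False)]
print(classify_actors(rows, {"Edit filter", "Abuse filter"}))
# {'Alice': 'human', 'FooBot': 'bot', 'Automoderator': 'automoderator', 'Edit filter': 'abusefilter'}
```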

Weekly updates

  • Began implementing feedback on bots-vs-human notebook
  • Troubleshooting
  • Team meeting to discuss status of surveys, interviews, and quant portion, as well as planning for forthcoming reporting
  • Began focus on these questions (see guiding questions):
    • What is the best metric, or set of metrics, to use for determining departure, dormancy, or disengagement?
    • Is there any way to identify metrics that could predict (either in terms of correlation or causation) likelihood of departure?
    • What else relevant can we measure about the activity of these individuals, particularly in terms of impact on projects?

      ^ These questions are what next week's work will focus on.

Weekly updates:

  • Continued revisions of bots-vs-human analysis
    • See admin_inflow_outflow.ipynb
    • Updated to separate abusefilter out from the bots
    • Updated to include distribution of medians (for monthly administrative-actions-per-admin)
  • Continued revising milestones and inflow/outflow to incorporate desysopings that are logged via metawiki
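The "distribution of medians" can be sketched as: count each admin's administrative actions per month, take the median of those monthly counts per admin, then look at the spread of medians across admins. An assumption to check against admin_inflow_outflow.ipynb: this sketch only counts months in which the admin took at least one action, and including zero-action months would pull the medians down. The input shape and function name are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median


def monthly_medians(actions):
    """actions: iterable of (admin, 'YYYY-MM') pairs, one per administrative action.

    Returns {admin: median of that admin's monthly action counts}, using only
    months in which the admin logged at least one action.
    """
    per_month = Counter(actions)  # (admin, month) -> number of actions
    counts_by_admin = defaultdict(list)
    for (admin, _month), n in per_month.items():
        counts_by_admin[admin].append(n)
    return {admin: median(counts) for admin, counts in counts_by_admin.items()}


actions = [("A", "2024-01")] * 3 + [("A", "2024-02")] + [("A", "2024-03")] * 5 + [("B", "2024-01")] * 2
medians = monthly_medians(actions)
print(medians)                   # {'A': 3, 'B': 2}
print(sorted(medians.values()))  # [2, 3]
```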