
[SDS 1.2.3] Quantitative lead to support the definition of moderators
Closed, Resolved (Public)

Description

Quantitative lead to aid T376684. Details to come, but @Isaac will lead with support from @KCVelaga_WMF and @Pablo, following the outcomes of @cwylo's qualitative work in T376945.

Event Timeline

Weekly report:

  • Log actions:
  • Edit actions:
    • KC was out part of this week so I dumped my thoughts about the various edit actions that we might be able to detect into our notes doc as a start. We will discuss and iterate on this next week.
    • One major takeaway from the above, plus a conversation with @cwylo about which actions initially feel most important to moderation: two core moderation actions (besides reverts, which can be detected from edit metadata alone and don't require diffs) are actions related to maintenance templates and categories -- for instance, adding an AfD template or flagging an article as having potential NPOV violations. In both cases, this is done via a template. Detecting this in the wikitext requires having a complete list of the relevant templates, which is very hard to curate, keep up to date, and scale to more language editions (anecdotally, some templates have sitelinks, but many wikis implement their own). On the other hand, moderation templates are usually expressed as messageboxes in the HTML, which can be accurately detected in a language-agnostic manner (relying on appropriate, shared CSS classes). Computing diffs from HTML should therefore be a much more scalable, sustainable, and accurate way of detecting moderation actions in edits. The problem is that the mwedittypes library was written for wikitext, and adapting it to HTML will take some work. We have the basis for it -- mwparserfromhtml mirrors much of the functionality of mwparserfromhell that mwedittypes depends on -- but presumably some challenges will arise in the adaptation. Additionally, HTML data is not currently available on the cluster, so we would need HTML history, or at least an HTML edit stream, to begin computing these diffs in bulk. Given that most edits are not going to be moderation-related, it would be very hard to use the APIs: we would need to collect a very large sample of edits to find enough moderation-related ones to draw conclusions.
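To make the messagebox idea concrete, here is a minimal, hedged sketch of detecting messagebox-style elements in rendered HTML by CSS class rather than by template name. The class names below are illustrative (enwiki-style box classes) and would need to be confirmed per wiki; a production version would presumably build on mwparserfromhtml rather than hand-rolled parsing.

```python
from html.parser import HTMLParser

# Illustrative messagebox classes (enwiki-style); the actual shared
# set of CSS classes would need to be confirmed per wiki/deployment.
MESSAGEBOX_CLASSES = {"ambox", "tmbox", "imbox", "cmbox", "ombox", "fmbox"}

class MessageboxCounter(HTMLParser):
    """Count elements whose class attribute contains a messagebox class."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if classes & MESSAGEBOX_CLASSES:
            self.count += 1

def count_messageboxes(html: str) -> int:
    """Return how many messagebox-style elements appear in the HTML."""
    parser = MessageboxCounter()
    parser.feed(html)
    return parser.count

html = ('<table class="ambox ambox-content"><tr><td>'
        'The neutrality of this article is disputed.</td></tr></table>'
        '<p>Article body.</p>')
print(count_messageboxes(html))  # 1
```

The appeal of this approach is exactly what the bullet above describes: the class set is small and shared, whereas the list of template names is per-wiki and unbounded.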

I'm calling out the above as a likely blocker to discuss -- summary: initial indications are that doing any useful analysis of edit actions will depend on overcoming two major technical limitations:

  • At a minimum: adapting mwedittypes to work with HTML. I documented the likely steps/challenges for this here: https://github.com/geohci/edit-types/issues/81
  • To get a good sample of edits for analysis: having current+parent HTML of revisions available on the cluster for processing. The closest we have to this is the stream that Muniza drafted here (T360794) but this would still need further enrichment for retrieving the parent HTML (either in the stream or via HTML snapshots/history also being available on the cluster) and is a non-trivial amount of work.

The above is work that I would love to see happen, but it is a substantial increase in scope and I doubt it is possible within this quarter.
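As a rough sketch of what the HTML-diff classification could look like once parent HTML is available (the class names and revision payloads here are illustrative assumptions, not the actual mwedittypes design): flag an edit as moderation-related when the number of messagebox elements changes between the parent and current revision.

```python
from html.parser import HTMLParser

MESSAGEBOX_CLASSES = {"ambox", "tmbox", "imbox", "cmbox", "ombox"}  # assumed classes

class _BoxCounter(HTMLParser):
    """Count elements carrying a messagebox CSS class."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if set((dict(attrs).get("class") or "").split()) & MESSAGEBOX_CLASSES:
            self.count += 1

def _count_boxes(html: str) -> int:
    parser = _BoxCounter()
    parser.feed(html)
    return parser.count

def is_moderation_edit(parent_html: str, current_html: str) -> bool:
    """True if the edit adds or removes at least one messagebox.

    Comparing counts rather than box text sidesteps false positives
    from dynamic content inside a box (e.g., a page counter), though
    it misses edits that only reword an existing box.
    """
    return _count_boxes(parent_html) != _count_boxes(current_html)

parent = "<p>Article body.</p>"
current = ('<table class="ambox"><tr><td>This article needs cleanup.'
           '</td></tr></table><p>Article body.</p>')
print(is_moderation_edit(parent, current))  # True
```

The count-based comparison is one deliberate simplification; a fuller adaptation of mwedittypes would diff the boxes themselves.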

Weekly report:

  • Put together a unified State of Research (data) deliverable (sheet), which has tabs broken out for log actions, edit actions (namespace 0), edit actions (other namespaces), and edit metadata. We tried to be pretty comprehensive here, so not everything will be relevant to moderation (those are judgments we'll make with Claudia later in November). The log actions have some initial data with them to give a sense of coverage (thanks Pablo!). The edit actions don't have data yet but do include our notes on how effectively each one could be detected.
  • Documented one of the main barriers to producing edit action statistics (T378617). The other major barrier (HTML access) is being documented by Research Engineering.

Weekly report:

  • @Pablo continued his review of extensions to determine whether there were any major sources of log data that we were missing. He identified four major ones and confirmed with SW that they are included.

Does SW refer to Sam Walton?

Yes, sorry -- that's a habit I picked up a while ago of using just initials when I don't think it requires a notification, but the full name is probably better.

Miriam triaged this task as High priority. Nov 20 2024, 1:52 PM

Weekly update:

  • Shared a large number of findings this week. I'll just copy a few of the types of findings here but leave the full reporting to the final report:
This is the percentage of namespace-0 edits (ignoring revert-related edits) that we flag as moderation-related (adding/removing messageboxes + in-line cleanup tags). We should look into some of the higher numbers to make sure they're not artifacts -- for example, we've found a few messagebox templates in English Wikipedia that include dynamic information about the last editor or the count of pages in a category, which would look like a "change" between edits even though the editor isn't adjusting anything.

Wiki	% Moderation (ignoring revert-related)
arzwiki	0.53%
dewiki	0.06%
enwiki	3.74%
eswiki	1.59%
frwiki	1.46%
itwiki	1.30%
jawiki	4.08%
nlwiki	0.98%
plwiki	0.59%
ruwiki	9.59%
svwiki	3.14%
zhwiki	3.68%
Again ignoring revert-related edits, this is the proportion of moderation actions broken down by wiki and edit-count/user-type. As expected, more senior editors take on more of the moderator work (both in absolute counts and relative to their activity). The more interesting comparison here is probably exploring differences across wikis.

English Wikipedia:
Editor type	% Mod	% Non-Mod
Anon		6.94%	13.30%
Bot		1.98%	1.35%
1-4		0.69%	0.80%
5-99		4.66%	4.35%
100-999		8.57%	8.05%
1000-9999	17.38%	15.49%
10000+		59.78%	56.67%

I also went through a spreadsheet that Claudia had compiled of core moderator processes on-wiki and made a summary of what we could and could not measure. General takeaways from that:

  • We often have good insight into explicit outcomes of moderation (page is deleted, edit is reverted, user is blocked, etc.) but much less insight into 1) the processes that lead to those outcomes and 2) all of the content/users that are reviewed but for whom no follow-up actions are taken.
  • Where there is centralized tooling for a process, we generally have reasonably good data about usage. When a process has largely been constructed over time on the wikis, it is often much harder to measure because it's part of standard use of templates, wikitext, etc., as opposed to a tool with dedicated logging. There are some exceptions where, e.g., standardized templates for sockpuppet investigations make that process a bit more legible.
  • It's hard to know how much of "moderation" we can measure at this point. We can certainly measure outcomes for a number of important processes but it's harder to know how to interpret differences in these numbers. For example, a drop in reverts or blocks could either mean that there's less moderation and issues are not being addressed (bad) or it could mean that issues are being addressed before they require corrective actions (good).
  • While it will be hard to measure the volume of moderation on a wiki in a useful way (given how much we can't see), hopefully we can still get useful trends around what types of users are taking corrective actions in a given wiki and notable gaps here (e.g., minimal automated moderation, newer editors are not getting involved, etc.).

Hi @Isaac ,

Thanks for the report. I'm not sure how to interpret this:

Editor type % Mod % Non-Mod
Anon 6.94% 13.30%

Why doesn't %Mod + %Non-Mod equal 100%?

Good question -- the percentages are shares of the full column, so each column should sum to 100%. As for interpretation, it's saying that while IP editors were responsible for 6.94% of all moderation-related edits, they were responsible for 13.30% of non-moderation edits. To me that suggests that this group of editors participates in moderation-related activity relatively less than you'd expect based on their level of activity on wiki.
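A toy illustration of that column-wise normalization, with made-up counts (not the real data), shows why each column sums to 100% while a single row need not:

```python
# Hypothetical raw edit counts per editor group (illustrative only).
counts = {
    "Anon":   {"mod": 694,  "non_mod": 1330},
    "Bot":    {"mod": 198,  "non_mod": 135},
    "10000+": {"mod": 5978, "non_mod": 5667},
}

# Normalize each column (mod / non_mod) independently: each group's
# share of that column's total, so every column sums to 100%.
for kind in ("mod", "non_mod"):
    total = sum(row[kind] for row in counts.values())
    for group, row in counts.items():
        print(f"{group}\t{kind}\t{100 * row[kind] / total:.2f}%")
```

Because the denominators differ per column, a group like anonymous editors can hold a small share of moderation edits while holding a larger share of non-moderation edits.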

Weekly update:

  • Responding to small questions as they arise about interpretation of the results etc.
  • Pablo added a summary of his log analyses to the Superset dashboard
  • Otherwise mainly waiting for feedback and discussion about next steps

Resolving this task as the report is written. I'm sure there will be follow-up work, but that can be a new task.