
[SPIKE] Identify potential metrics which could be computed for tools
Open, Needs Triage · Public · Spike

Description

Context

Tech Community Building Key Result
Users can assess the reliability of tools for adoption, contribution, or research based on a system of quality signals (co-maintainers, docs, recent editing usage, published source code, endorsements, etc.) within Toolhub.

The outcome of this spike is to be able to identify which metrics would be feasible to expose on a per tool basis. The acceptance criteria define which metrics we want to investigate, not necessarily to implement.

For each AC, please state "Yes" or "No" to indicate whether the metric is available for us to expose. If "Yes", show an example of the data or give a short explanation of how to provide it.

Acceptance Criteria

On-wiki

  • User Scripts
  • Gadgets
  • Lua Modules
  • Templates

Off-wiki

  • Web Services
  • “Bots” (software that makes edits)
    • Usage
      • Supported Wikis
      • Number of edits applied
  • Desktop Apps
  • Native Mobile Apps
    • Number of unique users per tool
  • CLI
    • Number of unique users per tool
  • Coding Framework
    • Number of unique users per tool

Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Spike". Nov 24 2021, 8:35 PM
Restricted Application added a subscriber: Aklapper.

Notes from our conversation in Toolhub Sync:
All on-wiki tool types exist within the MediaWiki deployment.

Attributes

  • Who (users, types of user roles, etc.)
  • What (Category, purpose)
  • Where (Projects, geographic)
  • How

Scope

  • Programmatically available to aggregate

bd808 renamed this task from "[SPIKE] Identify & Expose Tool Metrics" to "[SPIKE] Identify potential metrics which could be computed for tools". Dec 1 2021, 11:05 PM

From past discussion of this same topic, we can compute measurements for some types of tools:

  • User scripts
    • A rough estimate of the number of users of a script can be obtained by searching for imports in userspace common.js or skin js pages. See https://en.wikipedia.org/wiki/Wikipedia:User_scripts/Most_imported_scripts for more information, including an example search and bot-maintained information on enwiki user scripts.
    • Watchers of a page can be retrieved via the action=query&prop=info API. Spot checking a few enwiki user scripts, this looks to be a low-value signal for most scripts, but a more thorough investigation could disprove that.
    • Number of edits and last edit date can be computed using action=query&prop=revisions or via a wiki replicas query
  • Gadgets
  • Lua modules
    • Watchers of a page can be retrieved via the action=query&prop=info API
    • Number of edits and last edit date can be computed using action=query&prop=revisions or via a wiki replicas query
  • Templates
    • Number of transclusions can be computed using action=query&prop=transcludedin or via a wiki replicas query
    • Watchers of a page can be retrieved via the action=query&prop=info API
    • Number of edits and last edit date can be computed using action=query&prop=revisions or via a wiki replicas query
  • Web Services
    • For the specific subset of web services hosted on Toolforge, we can get "hit counts" per tool per day from the https://toolviews.toolforge.org/ tool, which exposes an API for data collected at the Toolforge front proxy by parsing the nginx access logs. These counts are the number of 2xx status code responses served by the tool. They are not "page views", which would ignore ancillary requests for images, scripts, etc. Because a tool that loads many assets per page accumulates more hits than one that loads few, there is no reliable way to compare one tool's numbers with another's to show popularity or usage head-to-head.
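The page-level measurements in the list above (watchers, last edit, number of edits, number of transclusions) can all be read from the MediaWiki Action API. The following is a minimal sketch, assuming enwiki as the target wiki and `formatversion=2` JSON responses; the helper names (`page_metrics_params`, `count_prop`, etc.) and the User-Agent string are hypothetical. The `fetch` argument is injectable so the parsing can be checked without network access. The wiki replicas route is not shown.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict

# Assumed target wiki; other Wikimedia wikis expose the same Action API.
API_URL = "https://en.wikipedia.org/w/api.php"

Fetch = Callable[[Dict[str, str]], dict]


def http_fetch(params: Dict[str, str]) -> dict:
    """Perform one GET against API_URL (no retries or rate limiting)."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Wikimedia asks clients for a descriptive User-Agent; this one is a placeholder.
    req = urllib.request.Request(url, headers={"User-Agent": "toolhub-metrics-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def page_metrics_params(title: str) -> Dict[str, str]:
    """Watchers (prop=info) plus the newest revision (prop=revisions)."""
    return {
        "action": "query", "format": "json", "formatversion": "2",
        "titles": title,
        "prop": "info|revisions",
        "inprop": "watchers",
        "rvprop": "timestamp|user",
        "rvlimit": "1",  # revisions are listed newest first by default
    }


def parse_page_metrics(response: dict) -> dict:
    page = response["query"]["pages"][0]
    revisions = page.get("revisions", [])
    newest = revisions[0] if revisions else {}
    return {
        "title": page.get("title"),
        # MediaWiki omits the count when it is below the wiki's unwatched-page
        # threshold, so None means "not disclosed" rather than zero.
        "watchers": page.get("watchers"),
        "last_edit": newest.get("timestamp"),
        "last_editor": newest.get("user"),
    }


def count_prop(title: str, prop: str, prefix: str, fetch: Fetch = http_fetch) -> int:
    """Count the items of a list-like prop, following API continuation.

    prop="transcludedin", prefix="ti" -> number of pages transcluding the title
    prop="revisions",     prefix="rv" -> number of edits made to the page
    """
    params = {
        "action": "query", "format": "json", "formatversion": "2",
        "titles": title, "prop": prop, prefix + "limit": "max",
    }
    total = 0
    while True:
        data = fetch(params)
        for page in data["query"]["pages"]:
            total += len(page.get(prop, []))
        if "continue" not in data:
            return total
        # Merge the continuation tokens into the next request.
        params = {**params, **data["continue"]}
```

For example, `parse_page_metrics(http_fetch(page_metrics_params("Module:Example")))` and `count_prop("Template:Example", "transcludedin", "ti")` would cover a Lua module and a template respectively (both titles are placeholders).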
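The user-script import search mentioned above could be issued through the Action API's `list=search` module, reading only the total hit count. This is a sketch under assumptions: the search string (CirrusSearch `insource:` restricted to JavaScript pages in the User namespace) is illustrative and may differ from the example search on the linked enwiki page, and it matches any mention of the title, not just `importScript()` or `mw.loader.load()` calls, so it stays a rough estimate. The function names are hypothetical.

```python
from typing import Dict


def import_search_params(script_title: str) -> Dict[str, str]:
    """Build a list=search request that only asks for the total hit count."""
    return {
        "action": "query", "format": "json", "formatversion": "2",
        "list": "search",
        # Assumed query: userspace JavaScript pages whose source mentions the script.
        "srsearch": f'insource:"{script_title}" contentmodel:javascript',
        "srnamespace": "2",     # User namespace
        "srinfo": "totalhits",  # we only need the count, not the results
        "srlimit": "1",
    }


def parse_total_hits(response: dict) -> int:
    """Number of matching userspace pages, i.e. a rough count of installs."""
    return response["query"]["searchinfo"]["totalhits"]
```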
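To illustrate why the Toolforge hit counts are not page views, here is a small sketch that counts 2xx responses per tool per day from access-log lines. It is not the toolviews implementation: the log line shape (virtual host first, then a combined-log-style record) and the function name are assumptions. The point it shows is that a page load which also fetches a stylesheet and a script produces several hits.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable

# Hypothetical log format: <host> <client> <ident> <user> [<time>] "<request>" <status> <bytes>
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" (?P<status>\d{3}) '
)


def daily_2xx_hits(lines: Iterable[str]) -> Counter:
    """Count successful responses per (tool, day)."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        if not 200 <= int(match["status"]) < 300:
            continue  # only 2xx responses count as hits
        tool = match["host"].split(".", 1)[0]  # "mytool.toolforge.org" -> "mytool"
        day = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z").date().isoformat()
        # Every 2xx request counts, including CSS, JS and images, not just pages.
        counts[(tool, day)] += 1
    return counts
```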

OAuth grants are used by some tools. Toolhub does not currently have a way for a toolinfo.json record to provide an OAuth consumer key value, but most OAuth grants include a callback URL which could be used for matching a web service to a collection of grants. Extension:OAuth does not provide Action API endpoints for querying data, and the wiki replicas do not currently expose the oauth_registered_consumer and oauth_accepted_consumer tables. This means we would need to work towards one or both to meaningfully use OAuth data. With access to oauth_accepted_consumer data we could compute the number of users who have authorized a tool to act on their behalf via OAuth. Edits made via an OAuth grant are marked with a tag containing the grant's numeric id (for example, "OAuth CID: 1352" for the SWViewer [1.3] grant). These tags could be used to compute edit counts for a given OAuth grant and, by extension, for a tool.
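Counting usage of a grant through its tag could look like the following minimal sketch. It assumes the Action API's `list=tags` module with `tgprop=hitcount`; that count covers both revisions and log entries carrying the tag, so it is an upper bound on tagged edits rather than an exact edit count. The consumer id 1352 comes from the example above; `oauth_tag_name` and `find_tag_hitcount` are hypothetical helpers, and `fetch` is any callable that sends the params to the wiki's api.php and returns the decoded JSON.

```python
from typing import Callable, Dict, Optional

Fetch = Callable[[Dict[str, str]], dict]


def oauth_tag_name(consumer_id: int) -> str:
    """Tag name applied to edits made through an OAuth grant."""
    return f"OAuth CID: {consumer_id}"


def find_tag_hitcount(tag_name: str, fetch: Fetch) -> Optional[int]:
    """Page through list=tags until the tag is found; None if it never appears."""
    params = {
        "action": "query", "format": "json", "formatversion": "2",
        "list": "tags", "tgprop": "hitcount", "tglimit": "max",
    }
    while True:
        data = fetch(params)
        for tag in data["query"]["tags"]:
            if tag["name"] == tag_name:
                return tag.get("hitcount", 0)
        if "continue" not in data:
            return None
        # Continue listing tags from where the previous batch stopped.
        params = {**params, **data["continue"]}
```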

Thank you for the in-depth write-up, @bd808! I'll read through it before our sync next week.

  • For user scripts and gadgets, it seems we have the ability to compute and expose the number of users (by # of users who have enabled) and active users (used in the past 30 days) for a given tool.
  • Templates could provide a view of number of uses/instances of the template? If I understood transclusions correctly.
  • For Lua modules, I'm less clear on whether any data is collected that could be leveraged to compute a user-centric metric.

Questions

  • Number of edits and last edit date can be computed using action=query&prop=revisions or via a wiki replicas query
  • I'm assuming this is calculating the number of edits done to the user script/gadget/template page itself? Is there any way for us to query edits that have been done using the tool? (e.g. HotCat -- 1.7k Edits Done using this tool)
  • Are we able to compute which projects and languages a tool supports? Example: a tool user on Belarusian Wikipedia would like to filter for tools that support their wiki/language
  • For user scripts and gadgets, it seems we have the ability to compute and expose the number of users (by # of users who have enabled) and active users (used in the past 30 days) for a given tool.

The "active" part here is not about activity engaging with the tool, but about the general activity of the users themselves in the Wikimedia projects. It is really a filter to determine whether something was widely installed at some point in the past but has since been abandoned by currently active users.
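The "active" filter described above can be sketched as a split of the installing users by their own recent activity. This assumes a mapping from each installing user to the timestamp of their latest edit anywhere on the wiki, gathered separately (for example with `list=usercontribs` and `uclimit=1` per user); the 30-day window comes from the comment above, and the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def installer_activity(
    last_edit_by_user: Dict[str, Optional[datetime]],
    now: datetime,
    window_days: int = 30,
) -> Dict[str, int]:
    """Split installs into all installers vs. installers active within the window.

    "Active" refers to the user's general on-wiki activity, not to use of the tool.
    """
    cutoff = now - timedelta(days=window_days)
    active = sum(
        1 for ts in last_edit_by_user.values() if ts is not None and ts >= cutoff
    )
    return {"installed": len(last_edit_by_user), "active": active}
```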

  • Templates could provide a view of number of uses/instances of the template? If I understood transclusions correctly.
  • For Lua modules, I'm less clear on whether any data is collected that could be leveraged to compute a user-centric metric.

Questions

  • Number of edits and last edit date can be computed using action=query&prop=revisions or via a wiki replicas query
  • I'm assuming this is calculating the number of edits done to the user script/gadget/template page itself?

Yes, the action=query&prop=revisions information would be the count of edits made to a specific wiki page. This is roughly like looking at the git history for an off-wiki piece of software. It's information about who is creating and maintaining the item (user script, gadget, template, module) and how frequently they do so.
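Like reading `git log` for who maintains a project, a page's revision list can be summarized into maintainer signals. This minimal sketch assumes the `revisions` array of a `prop=revisions` response requested with `rvprop=user|timestamp` (after following continuation); `summarize_history` is a hypothetical helper, and revisions whose username is hidden are grouped under a placeholder.

```python
from collections import Counter
from typing import Dict, List


def summarize_history(revisions: List[Dict[str, str]]) -> dict:
    """Edit count, distinct editors, and first/last edit for one wiki page."""
    # Revisions with a suppressed username have no "user" key.
    per_user = Counter(rev.get("user", "(hidden)") for rev in revisions)
    # MediaWiki timestamps are ISO 8601 in UTC, so string order is time order.
    timestamps = sorted(rev["timestamp"] for rev in revisions)
    return {
        "edits": len(revisions),
        "editors": len(per_user),
        "top_editors": per_user.most_common(3),
        "first_edit": timestamps[0] if timestamps else None,
        "last_edit": timestamps[-1] if timestamps else None,
    }
```

A page with many edits but a single editor suggests a sole maintainer, while several regular editors point to co-maintainers, which is one of the quality signals named in the key result.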

Is there any way for us to query edits that have been done using the tool? (e.g. HotCat -- 1.7k Edits Done using this tool)

Maybe, but that depends on the tool leaving some kind of trace due to its implementation. For your example of HotCat, this is possible (at least for some forks of the gadget) because HotCat itself adds edit tags to track usage: https://commons.wikimedia.org/wiki/Help:Gadget-HotCat#Tracking.
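For a tool that tags its edits, recent usage (edits and unique users, the latter matching the "number of unique users per tool" criterion) can be counted from the recent changes feed filtered by tag. This is a sketch under assumptions: the tag name is a hypothetical placeholder (check the wiki's Special:Tags for the real one), recent changes only cover a limited retention window so this measures recent rather than lifetime usage, and `fetch` is any callable that sends the params to api.php and returns the decoded JSON.

```python
from typing import Callable, Dict

Fetch = Callable[[Dict[str, str]], dict]

HOTCAT_TAG = "HotCat"  # hypothetical tag name; verify on Special:Tags


def tagged_edit_usage(tag: str, fetch: Fetch) -> Dict[str, int]:
    """Count tagged edits and distinct users in the recent changes window."""
    params = {
        "action": "query", "format": "json", "formatversion": "2",
        "list": "recentchanges",
        "rctag": tag,
        "rctype": "edit",
        "rcprop": "user",
        "rclimit": "max",
    }
    edits = 0
    users = set()
    while True:
        data = fetch(params)
        for change in data["query"]["recentchanges"]:
            edits += 1
            if "user" in change:  # absent when the username is hidden
                users.add(change["user"])
        if "continue" not in data:
            return {"edits": edits, "unique_users": len(users)}
        params = {**params, **data["continue"]}
```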

  • Are we able to compute which projects and languages a tool supports? Example: a tool user on Belarusian Wikipedia would like to filter for tools that support their wiki/language

This is human-curatable data, and it is the intent of the "for_wikis" and "available_ui_languages" attributes of a toolinfo.json record, but generally, no, it is not something we can programmatically compute by looking at the source code of a user script, gadget, template, Lua module, web service, bot, etc.
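Once those curated attributes are filled in, the Belarusian Wikipedia example reduces to filtering records. This minimal sketch assumes each attribute may be a single string or a list of strings, and that `for_wikis` uses hostnames with `*` wildcards (for example `*.wikipedia.org`, or `*` for all wikis); the function names are hypothetical, and records that do not declare languages are excluded when a language is requested.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional, Union


def as_list(value: Union[str, List[str], None]) -> List[str]:
    """Normalize a string-or-list attribute to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def supports_wiki(record: dict, wiki_host: str) -> bool:
    # Wildcard patterns such as "*.wikipedia.org" are matched shell-style.
    return any(fnmatchcase(wiki_host, p) for p in as_list(record.get("for_wikis")))


def supports_language(record: dict, lang: str) -> bool:
    return lang in as_list(record.get("available_ui_languages"))


def filter_tools(
    records: Iterable[dict], wiki_host: str, lang: Optional[str] = None
) -> List[dict]:
    """Keep tools declared for the wiki and, optionally, the UI language."""
    return [
        r for r in records
        if supports_wiki(r, wiki_host) and (lang is None or supports_language(r, lang))
    ]
```

For example, `filter_tools(records, "be.wikipedia.org", "be")` would return only the records that declare both the wiki and Belarusian UI support.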


Maybe, but that depends on the tool leaving some kind of trace due to its implementation. For your example of HotCat, this is possible (at least for some forks of the gadget) because HotCat itself adds edit tags to track usage: https://commons.wikimedia.org/wiki/Help:Gadget-HotCat#Tracking.

Edit tags... interesting! I assume this is nowhere near inclusive of all the tools in Toolhub, and there is probably no way to retroactively apply edit tags for tools that have not been tracking usage, right? This could be a consideration.

This is human-curatable data, and it is the intent of the "for_wikis" and "available_ui_languages" attributes of a toolinfo.json record, but generally, no, it is not something we can programmatically compute by looking at the source code of a user script, gadget, template, Lua module, web service, bot, etc.

I'll create another task to think about how we might get this data populated.

Removing inactive task assignee.