
[SPIKE] Investigate user edit impact APIs
Open, High, Public

Description

Background

Investigate what's possible to implement to show the impact of a user's edit history

Results

A list of recent contributions can be obtained from:
/w/api.php?action=query&format=json&list=usercontribs&ucuser=usernamehere

But that call would potentially have duplicate articles, would not include the earliest date the article was edited, and would not allow for obtaining the top 5 edited pages. This was worked around for the "Impact" home page modules section by directly querying the revisions table:
https://phabricator.wikimedia.org/diffusion/EGRE/browse/master/includes/HomepageModules/Impact.php$412

To get the pageviews per article, there's an existing API on wikimedia.org at:
/api/rest_v1/metrics/pageviews/per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}
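As a rough sketch of how a client might fill in that path template: the helper name `pageviews_url` and its default `access`, `agent`, and `granularity` values are assumptions chosen for illustration, and the timestamps are assumed to use the YYYYMMDDHH form.

```python
from urllib.parse import quote

# Base of the per-article pageviews endpoint on wikimedia.org.
REST_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(project, article, start, end,
                  access="all-access", agent="all-agents",
                  granularity="daily"):
    """Build a per-article pageviews URL (hypothetical helper).

    start and end are assumed to be YYYYMMDDHH strings.
    """
    # Titles use underscores for spaces and must be percent-encoded
    # as a single path segment (so "/" in a title is escaped too).
    encoded = quote(article.replace(" ", "_"), safe="")
    return "/".join(
        [REST_BASE, project, access, agent, encoded, granularity, start, end]
    )
```

For example, `pageviews_url("en.wikipedia", "Banana", "2019061000", "2019081000")` builds a daily, all-access, all-agents request for the Banana article.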

Byte difference for each edit can be obtained by including sizediff in the list passed to ucprop for action=query&list=usercontribs
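A minimal sketch of building that request, assuming the English Wikipedia host and a hypothetical helper name `usercontribs_url`; the choice of `title|timestamp|sizediff` for ucprop is illustrative, not a requirement.

```python
from urllib.parse import urlencode


def usercontribs_url(username, limit=50, host="en.wikipedia.org"):
    """Build a list=usercontribs query URL (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "list": "usercontribs",
        "ucuser": username,
        # sizediff adds the byte difference of each edit to the result.
        "ucprop": "title|timestamp|sizediff",
        "uclimit": limit,
    }
    # urlencode percent-encodes the "|" separators as %7C.
    return f"https://{host}/w/api.php?" + urlencode(params)
```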

Event Timeline

JoeWalsh created this task. Aug 6 2019, 4:30 PM
Restricted Application changed the subtype of this task from "Deadline" to "Task". Aug 6 2019, 4:30 PM
Restricted Application added a subscriber: Aklapper.

@schoenbaechler if you're ok with showing a paginated list of recent edits and the pageviews for those articles since those edits, pre-existing APIs can be used to collect that client-side. The caveat is that if a user edits the same article multiple times, you would have duplicate entries in the list. You also would not know the date the user started editing the article. Duplicate entries could be coalesced app-side, but it appears there'd be no easy way to get the earliest edit date.

The Growth team built special functionality to get a list of the unique articles a user has edited recently, along with the earliest date the user started editing each one, so that they could show the pageviews since the user first edited the article. However, there is no existing API for this; they used a direct connection to the revisions table in the database. This could potentially be exposed via an API that the Android app could use if you feel strongly that the additional information would be worth it. CC @Jhernandez to follow up on the API side and @Dbrant to confirm these options sound reasonable to build out client-side.
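To illustrate the app-side coalescing mentioned above, here is a minimal sketch. It assumes each entry is a dict with `title` and `timestamp` keys holding ISO 8601 strings in one consistent format, which compare correctly as plain strings. The `first_seen` value is only the earliest edit within the fetched list, not the true date the user first edited the article.

```python
def coalesce_contribs(contribs):
    """Collapse usercontribs entries into one record per title.

    Hypothetical helper: first_seen/last_seen cover only the edits
    that were fetched, not the user's full history on the article.
    """
    by_title = {}
    for contrib in contribs:
        title = contrib["title"]
        ts = contrib["timestamp"]
        entry = by_title.setdefault(
            title,
            {"title": title, "first_seen": ts, "last_seen": ts, "edits": 0},
        )
        # Same-format ISO 8601 timestamps sort lexicographically.
        entry["first_seen"] = min(entry["first_seen"], ts)
        entry["last_seen"] = max(entry["last_seen"], ts)
        entry["edits"] += 1
    # Dicts preserve insertion order, so titles keep first-seen order.
    return list(by_title.values())
```

Sorting the result by `edits` in descending order would approximate a "top edited pages" list, again limited to the fetched window.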

schoenbaechler added a comment. Edited Aug 9 2019, 1:37 PM

Thanks for investigating and for summarizing it so clearly, @JoeWalsh. May I ask for clarification about:

  • Would a simple edit history list without displaying pageviews also result in duplicate entries?
  • How about outputting cumulative pageviews for all user edits, e.g. for all articles edited in a certain period? (e.g. pageviews in the past 7 days/past month or pageviews for the last 100 articles that the user has edited?) What would be a reasonable period (with regard to feasibility/performance)?
JoeWalsh added a comment. Edited Aug 9 2019, 3:02 PM

@schoenbaechler

Would a simple edit history list without displaying pageviews also result in duplicate entries?

Yes, the edit history list we can get from the API is a list of recent edits, so if the user edited the same article multiple times, there would be one entry per edit. This is independent of whether or not pageviews are included.

How about outputting cumulative pageviews for all user edits, e.g. for all articles edited in a certain period? (e.g. pageviews in the past 7 days/past month or pageviews for the last 100 articles that the user has edited?) What would be a reasonable period (with regard to feasibility/performance)?

The existing API will give us up to 500 edits with a single request. It can return the earliest 500, the most recent 500, or up to 500 from a given time range. Those edits are not guaranteed to span many articles: if a user made 500 edits to the same article in the range we requested, we'd only see that article. In practice, my guess is that 500 edits would give us plenty of different articles, and the limiting factor will be how many articles to request pageviews for. Requesting pageviews for the past week or month for 5-10 articles at a time should be reasonable.
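Once pageviews have been fetched for a handful of articles, the cumulative figure asked about above is a simple sum. The sketch below assumes each parsed response is a dict with an `items` list whose entries carry `article` and `views` fields; treat that shape as an assumption to verify against real responses. Both function names are hypothetical.

```python
def total_views(responses):
    """Sum the views across all items in all parsed responses."""
    return sum(
        item["views"]
        for response in responses
        for item in response.get("items", [])
    )


def views_by_article(responses):
    """Return a dict mapping each article to its summed views."""
    totals = {}
    for response in responses:
        for item in response.get("items", []):
            article = item["article"]
            totals[article] = totals.get(article, 0) + item["views"]
    return totals
```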

JoeWalsh updated the task description. Aug 9 2019, 3:36 PM

There is also the MediaWiki API pageviews prop, which lets you batch a number of titles in the same request and fetch other data from the MediaWiki API while you are at it: https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=pageviews&titles=Banana%7CPhone%7CApple&formatversion=latest

It seems to be restricted to the last 60 days, though.

Also, the numbers seem slightly different from what the REST API returns? https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Banana/daily/2019061000/2019081000

In terms of exposing the Growth team's API, we would have to discuss with them whether they plan to keep using it long-term, in which case we could work with them, or whether we should expose it somewhere else (Mobile apps extension?) in case they decide to stop supporting that piece of code.

Based on what Rita mentioned in the doc, it seems to me that they are using the MediaWiki API that I posted above (they count the last 60 days) instead of the REST API.

I’ve checked and there isn’t a generator for usercontribs, which means we need to query the last X edits, and then make a new request with all the titles deduplicated.

Two queries seem better than 1 + n if we would otherwise have to go to the REST API many times. So if we can live with the pageviews of the last 60 days, like Growth, it makes the whole thing easier and more performant.
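The two-step approach described above could be sketched as follows: deduplicate the titles from the usercontribs result, then build batched prop=pageviews requests. The helper names, the English Wikipedia host, and the default `batch_size` of 50 are assumptions; the batch size reflects the usual per-request title cap and should be checked against the target wiki. Responses may also need continuation handling, which is not shown.

```python
from urllib.parse import urlencode


def unique_titles(contribs):
    """Return the edited titles with duplicates dropped."""
    # dict.fromkeys keeps first-seen order while removing repeats.
    return list(dict.fromkeys(c["title"] for c in contribs))


def pageviews_batch_urls(titles, batch_size=50, host="en.wikipedia.org"):
    """Build one prop=pageviews query URL per batch of titles."""
    urls = []
    for i in range(0, len(titles), batch_size):
        params = {
            "action": "query",
            "format": "json",
            "prop": "pageviews",
            # Multiple titles are joined with "|" in one request.
            "titles": "|".join(titles[i:i + batch_size]),
            "formatversion": "latest",
        }
        urls.append(f"https://{host}/w/api.php?" + urlencode(params))
    return urls
```

With up to 500 edits from the first request, this usually yields one request for the contributions plus a small number of batched pageview requests, rather than one REST call per article.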

Hey @Jhernandez, thanks for the additional info, it's very helpful. We’re running a user questionnaire at Wikimania this week and are trying to find out which metrics could be motivational for users. Will keep you posted about it!

JoeWalsh removed JoeWalsh as the assignee of this task. Tue, Sep 17, 2:53 PM