Reader gets list of contributors to a page
Open, Medium, Public

Description

"As a Reader, I want to know who contributed to the content of the page, so I can correctly attribute them according to the content license."

We took out the contributors element from T229663: Contributor gets page source because it could be long and would be difficult to retrieve. This user story is for a segmented list of contributors to the page.

These contributors should only be users who've made edits on the shortest path in the edit history.

Also note that the list of contributors is not in reverse-chronological order, so the paging parameters and properties don't need to be explicitly about time, as in T231343: Curator gets page history.


GET /page/{title}/editors

Get a unique list of editors of a page.

Parameters:

  • before: a user name, used for segmentation. Return only users strictly before this user in segment order (non-inclusive).
  • after: a user name, used for segmentation. Return only users strictly after this user in segment order (non-inclusive).

If there are no parameters, return the first segment of contributors.
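As a rough illustration of the segmenting parameters above, the request URL could be assembled as follows. The base URL `https://example.org/api` and the helper name `editors_url` are hypothetical; only the path and the `before` and `after` parameter names come from this description.

```python
from urllib.parse import quote, urlencode


def editors_url(base, title, before=None, after=None):
    """Build the URL for GET /page/{title}/editors (hypothetical helper)."""
    # Percent-encode the title so spaces and slashes stay inside one path segment.
    url = f"{base}/page/{quote(title, safe='')}/editors"
    # Include only the segmenting parameters that were actually supplied.
    params = {k: v for k, v in (("before", before), ("after", after)) if v is not None}
    if params:
        url += "?" + urlencode(params, quote_via=quote)
    return url
```

Calling it with no `before` or `after` gives the URL of the first segment.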

Payload: empty

Notable headers: none

Status codes:

  • 200: OK, body is a list segment
  • 404: No such page
  • 400: The user named in before or after does not exist, or is not a contributor to this page
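A client might map the status codes above to outcomes roughly like this. It is a sketch only: the exception names and `check_status` are hypothetical, and the handling of any code other than 200, 404, and 400 is an assumption, since this description does not cover it.

```python
class NoSuchPage(Exception):
    """404: the requested page does not exist."""


class BadSegmentUser(Exception):
    """400: the before/after user is unknown or not a contributor to the page."""


def check_status(status, body):
    """Return the list segment on 200; raise for the documented error codes."""
    if status == 200:
        return body
    if status == 404:
        raise NoSuchPage()
    if status == 400:
        raise BadSegmentUser()
    # Assumption: anything else is outside what the description specifies.
    raise RuntimeError(f"unexpected status {status}")
```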

Response body: a JSON object with the following properties:

  • editors: an array of 0 to 100 user references, in no specified order, each with the following properties:
    • id: user ID, or null if unregistered or ID is unavailable
    • name: registered user name or other unique identifier such as IP address
  • next: if there are more contributors than fit in this or previous segments, a link to the API endpoint to get the next segment of contributors (typically, the link for this page plus an after parameter with the name of the last contributor)
  • prev: if there are previous segments, a link to the API endpoint to get the previous segment of contributors (typically, the link for this page plus a before parameter with the name of the first contributor)
  • first: a link to the API endpoint without any parameters
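To show how a client could use these properties, here is a minimal sketch that collects every editor by following `next` links. The `fetch` callable (which takes a URL and returns the decoded JSON body) and the function name `iter_editors` are hypothetical; the loop relies only on the `editors` and `next` properties listed above, and assumes `next` is absent or null on the last segment.

```python
def iter_editors(fetch, first_url):
    """Yield user references from every segment, following `next` links."""
    url = first_url
    seen = set()  # guard against a server that links back to an earlier segment
    while url and url not in seen:
        seen.add(url)
        segment = fetch(url)
        # Each entry carries `id` (possibly None) and `name`.
        yield from segment["editors"]
        url = segment.get("next")
```

Because `fetch` is passed in, the same loop works with any HTTP library, or with a canned dictionary of responses in a test.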

Event Timeline

These contributors should only be users who've made edits on the shortest path in the edit history.

I don't think we can calculate that without very significant changes to our infrastructure. I will take a closer look at it, but please don't set your expectations too high.

GET /page/{title}/contributors

In history counts we call this editors. How are contributors different from editors?

400: No such user or user is not a contributor to this page

We don't provide a user to this endpoint, which user are you talking about here?

before: a user ID. In segment order, only users strictly before this user (non-inclusive).

In general, are we talking about registered users only, or registered users plus anons? Anons don't have a user ID. We could use actor_id and sort by it, but it's a pretty arbitrary number from the client's perspective.

I don't think we can calculate that without very significant changes to our infrastructure. I will take a closer look at it, but please don't set your expectations too high.

How am I supposed to get lower expectations when you all keep doing such impressive work?!?

GET /page/{title}/contributors

In history counts we call this editors. How are contributors different from editors?

Good point! "editors" is fine; I'll change it.

400: No such user or user is not a contributor to this page

We don't provide a user to this endpoint, which user are you talking about here?

The segmenting parameters, before and after.

before: a user ID. In segment order, only users strictly before this user (non-inclusive).

In general, are we talking about registered users only, or registered users plus anons? Anons don't have a user ID. We could use actor_id and sort by it, but it's a pretty arbitrary number from the client's perspective.

Ummmmm... duh. Should we change this to user name instead? It's mostly for segmenting the list, rather than clients caring that much. The order of editors is arbitrary and up to the implementation.

eprodromou added a subscriber: WDoranWMF.

@WDoranWMF I wonder if we could get data from analytics about this?

@eprodromou Yeah I was thinking so as well, I'll include it.

@WDoranWMF I wonder if we could get data from analytics about this?

Currently this is not exposed via Druid, but I don't see why it couldn't be. See https://wikimedia.org/api/rest_v1/#/ for all currently available statistics endpoints from analytics.

Ummmmm... duh. Should we change this to user name instead? It's mostly for segmenting the list, rather than clients caring that much. The order of editors is arbitrary and up to the implementation.

I can't answer this right now without looking more closely at how we construct the query. And if I look at it in enough detail, I'll probably just implement it :) Given that we mostly need something for pagination, and it seems you don't much care what it is, I propose using whatever makes sense from a technical perspective, be it actor ID, user name, or some random hash. Also, given that it's a DISTINCT query, pagination might provide very limited performance benefit. We can probably drop pagination altogether and return the full list if DB performance was your only reason for introducing pagination.