
Add to article schema the number of revisions the article has, at the current revision
Open, Needs Triage | Public | 5 Estimated Story Points | Feature

Description

Feature summary:

In the articles schema, I propose a version.count field holding the number of revisions the article has as of the revision included in the dump. So a completely new article with only one revision would have count == 1.

Use case(s):

When you are updating data based on a new dump, you currently have only the version identifier available, so you can tell whether an article has changed since the last dump. But it would also be useful to know how many versions lie in between, as a proxy for how much the article has changed (imperfect, but still).
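A minimal sketch of how a consumer might use the proposed field when comparing two dumps. The `version.count` field does not exist yet, and the record layout used here (a `version` object with `identifier` and `count` keys) is an assumption made for illustration only.

```python
from typing import Optional


def revisions_since(previous: dict, current: dict) -> Optional[int]:
    """Estimate how many revisions an article gained between two dumps.

    Both arguments are records for the same article, taken from an older
    and a newer dump. The "version.count" field is the one proposed in
    this task and is hypothetical.
    """
    prev_version = previous.get("version", {})
    curr_version = current.get("version", {})

    # Same version identifier in both dumps: the article did not change.
    prev_id = prev_version.get("identifier")
    if prev_id is not None and prev_id == curr_version.get("identifier"):
        return 0

    prev_count = prev_version.get("count")
    curr_count = curr_version.get("count")
    if prev_count is None or curr_count is None:
        # Without the proposed field we only know that something changed.
        return None

    return curr_count - prev_count


old = {"version": {"identifier": 100, "count": 3}}
new = {"version": {"identifier": 250, "count": 7}}
print(revisions_since(old, new))
```

With only the identifier available, the function can report "unchanged" or "unknown"; the count is what makes the size of the gap visible.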

Another use case is identifying which articles are more "active" than others. More edits also suggest, in a way, a more important article.

Benefits:

Reading historical data of articles requires many API calls, but having a bit of high-level statistics (like number of revisions available) can already be valuable.

Event Timeline

@Mitar Question: any reason why you couldn't do this with revision.id? It gives you the specific revision number for the article, so you can compare it with as many previous revision.id values as you like and see the number of changes. Very low chance we'll be calculating it on our side. :)

@FNavas-foundation Not sure what you mean. Maybe I am missing something, but revision.id by itself does not give you the count I am asking for, i.e., the id is not local to the article and does not increase by 1 every time the article has changed. Or am I mistaken?

So I see that editors have "edit count". Also articles have "version" and "previous version" now. Also "date created" and "date modified" and "date previously created". All those are useful signals. I think that having the total edit count for an article would also be a great signal.

The REST API already has that: https://www.mediawiki.org/wiki/API:REST_API/Reference#Get_page_history_counts In fact, having the counts for the different types available there would be great.

So I could query the REST API for every article, but I think it would really be nicer if this would just be included for all articles in the dump.
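For reference, the per-article workaround might look roughly like the sketch below. It assumes the page history counts endpoint from the linked reference (`/page/{title}/history/counts/{type}`, returning a JSON object with a `count` key), uses English Wikipedia as an example base URL, and the function names and User-Agent string are made up for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Count types as listed in the linked REST API reference (assumed current).
COUNT_TYPES = ("anonymous", "bot", "editors", "edits", "minor", "reverted")

# Example wiki only; any MediaWiki REST endpoint could be substituted.
BASE_URL = "https://en.wikipedia.org/w/rest.php/v1"


def history_count_url(title: str, count_type: str = "edits") -> str:
    """Build the history-counts URL for one page title."""
    if count_type not in COUNT_TYPES:
        raise ValueError(f"unknown count type: {count_type}")
    page = quote(title.replace(" ", "_"), safe="")
    return f"{BASE_URL}/page/{page}/history/counts/{count_type}"


def parse_history_count(body: str) -> int:
    """Extract the count from a JSON response body."""
    return json.loads(body)["count"]


def fetch_history_count(title: str, count_type: str = "edits") -> int:
    """Perform one HTTP request per article (the cost this task wants to avoid)."""
    request = Request(
        history_count_url(title, count_type),
        headers={"User-Agent": "revision-count-sketch/0.1 (example)"},
    )
    with urlopen(request, timeout=30) as response:
        return parse_history_count(response.read().decode("utf-8"))
```

Calling `fetch_history_count("Jupiter")` issues one request for one article, so covering a whole dump this way means one request per article per count type.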

yeah the ID is just that, non-incremental. count is something we'd need to add to the API. I agree it's a nice addition.