prop=info API query should surface number of edits and created date
Open, Medium, Public

Description

Using https://en.m.wikipedia.org/wiki/San%20Francisco?action=info I can access the number of edits a page has.
When I try the same query in the API this information is not available.
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=info&titles=San+Francisco&formatversion=2

Why not?

It seems I'm not alone in wanting this...
http://stackoverflow.com/questions/22158779/how-to-get-total-number-of-edits-for-a-given-wikipedia-page-from-the-api

This would be useful for a MobileFrontend browser test that needs to ensure a page has a certain number of edits:
features/special_history.feature

In addition, other fields shown by action=info seem to be missing from the API, including the created date...

Event Timeline

The tricky part might be efficiently querying the number of edits for up to 5000 pages. Is SELECT rev_page, COUNT(*) FROM revision WHERE rev_page IN (/* 5000 integers */) GROUP BY rev_page efficient enough?
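
To make the shape of that query concrete, here is a minimal sketch run against an in-memory SQLite database. The two-column `revision` table is a toy stand-in rather than MediaWiki's real schema, `count_edits` is a hypothetical helper name, and nothing here says how a production database would cope with 5000 page IDs.

```python
import sqlite3


def count_edits(conn, page_ids):
    """Return {page_id: revision_count} for the given page IDs using one query."""
    if not page_ids:
        return {}
    # One "?" placeholder per page ID, so the values are bound rather than interpolated.
    placeholders = ",".join("?" for _ in page_ids)
    rows = conn.execute(
        "SELECT rev_page, COUNT(*) FROM revision "
        f"WHERE rev_page IN ({placeholders}) GROUP BY rev_page",
        list(page_ids),
    ).fetchall()
    counts = dict(rows)
    # Pages without any revision rows are absent from the result set; report them as 0.
    return {page_id: counts.get(page_id, 0) for page_id in page_ids}


# Toy data (assumption): page 1 has three revisions, page 2 has one, page 3 has none.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER)")
conn.executemany("INSERT INTO revision (rev_page) VALUES (?)", [(1,), (1,), (1,), (2,)])
print(count_edits(conn, [1, 2, 3]))
```

The grouped query touches every revision row of every requested page, which is where the cost concern comes from.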

Typically an API client that needs this information would just hit prop=revisions&rvprop=&rvlimit={$X+1} (where $X is the most it cares about) and see how many revisions get returned.
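
A rough sketch of that client-side trick, assuming the standard `api.php` endpoint on en.wikipedia.org and a `formatversion=2` response layout. The helper names and the sample response are made up for illustration, and no network request is made here.

```python
from urllib.parse import urlencode

# Assumption: the usual Action API endpoint for English Wikipedia.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_count_query(title, cap):
    """Build a URL asking for at most cap + 1 revisions with no revision properties."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "revisions",
        "rvprop": "",              # empty: only the number of revisions matters
        "rvlimit": str(cap + 1),   # one extra to tell "exactly cap" from "more than cap"
        "titles": title,
    }
    return API_URL + "?" + urlencode(params)


def count_up_to(response, cap):
    """Return (count capped at cap, whether the page has more than cap revisions)."""
    pages = response.get("query", {}).get("pages", [])
    revisions = pages[0].get("revisions", []) if pages else []
    return min(len(revisions), cap), len(revisions) > cap


# Hand-written sample response (assumption), shaped like a formatversion=2 reply.
sample = {"query": {"pages": [{"pageid": 1, "title": "San Francisco",
                               "revisions": [{}, {}, {}, {}]}]}}
print(build_count_query("San Francisco", 3))
print(count_up_to(sample, 3))
```

This only answers "does the page have at least X edits", which is enough for a test that needs a minimum number of edits but does not give the total.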

Wait, we query the database for this? Can we not store this as a page property and update it on save/move/edit?

Such a thing could be done, if someone wanted to do it.

The implementation in InfoAction makes the DB query (plus several more queries) for the one page and caches the result. The API implementation could potentially reuse this cache when it exists, if hitting 5000 cache keys isn't a problem. Populating the cache, however, would require making all the queries that InfoAction does, which again would potentially be a lot of work.

Even if it is a lot of work, it seems like it would be a valuable thing to do.

ovasileva triaged this task as Medium priority. Oct 12 2016, 3:29 PM
ovasileva moved this task from Incoming to Triaged but Future on the Web-Team-Backlog board.
Legoktm renamed this task from "action=info API query should surface number of edits" to "prop=info API query should surface number of edits". Jan 30 2017, 4:11 AM

It was my understanding that Developer-Wishlist was a project developers could add items to over the course of the year, for nomination the following year (in this case 2018), rather than during a specific week when they may not remember what's causing them pain, and that Developer-Wishlist (2017) was the project capturing the results for 2017.

Jdforrester-WMF subscribed.

I've created a sub-project for proposals for the next run of the developer wishlist and moved this there.

Just going through some old bugs, and I can confirm that the edit count is still not available in the action=info API.

Jdlrobson renamed this task from "prop=info API query should surface number of edits" to "prop=info API query should surface number of edits and created date". Jan 11 2022, 2:48 AM
Jdlrobson updated the task description.

I don't think prop=info in the Action API and the action=info parameter to index.php should be thought of as different expressions of the same thing. The code paths are completely different, the features are quite different, the engineering constraints are quite different due to the API being batched...

For the creation date you can use prop=revisions already.
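
One way this could look, as a sketch: ask prop=revisions for the single oldest revision (`rvdir=newer`, `rvlimit=1`) and read its timestamp. The helper names and the sample response below are illustrative assumptions, and the timestamp of the oldest surviving revision is only an approximation of "created" if the history has been altered.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def first_revision_params(title):
    """Query parameters for fetching only the oldest revision's timestamp."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "revisions",
        "rvprop": "timestamp",
        "rvdir": "newer",   # enumerate oldest first
        "rvlimit": "1",
        "titles": title,
    }


def parse_created(response):
    """Return the oldest revision's timestamp as an aware datetime, or None if absent."""
    pages = response.get("query", {}).get("pages", [])
    if not pages:
        return None
    revisions = pages[0].get("revisions", [])
    if not revisions:
        return None
    return datetime.strptime(
        revisions[0]["timestamp"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)


# Hand-written sample response (assumption); the timestamp is made up.
sample = {"query": {"pages": [{"title": "Example",
                               "revisions": [{"timestamp": "2001-01-02T03:04:05Z"}]}]}}
print(urlencode(first_revision_params("San Francisco")))
print(parse_created(sample))
```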

Showing the date of the first edit would be the same query as in T147676#2705959, just with MIN(rev_timestamp) instead of COUNT(*), with the same performance issue - you'd have to filesort [sum of edit count of up to 5000 pages] rows. That's easily millions. I guess the limit could be reduced to something like 50 for expensive properties, and then it's a more manageable amount.

Yeah, I understand it would be expensive. If we were to do this, we'd presumably need to save this metadata once as a page property rather than running a query for every item. For the created date, that only needs to happen on the first edit; the edit count would need updating on every edit (which presumably would also allow optimizing the action=info page).
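
A toy sketch of that idea, assuming an in-memory store in place of real page properties. `PageStats`, `StatsStore`, and `on_edit` are hypothetical names, not MediaWiki APIs, and only the plain-edit path is covered.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PageStats:
    created: Optional[str] = None  # timestamp of the first recorded edit
    edit_count: int = 0


class StatsStore:
    """Keeps per-page stats up to date at write time so reads need no aggregation."""

    def __init__(self) -> None:
        self._stats: Dict[int, PageStats] = {}

    def on_edit(self, page_id: int, timestamp: str) -> None:
        stats = self._stats.setdefault(page_id, PageStats())
        if stats.created is None:
            stats.created = timestamp  # set once, on the first edit
        stats.edit_count += 1          # bumped on every edit

    def get(self, page_id: int) -> PageStats:
        # Unknown pages report zero edits and no creation date.
        return self._stats.get(page_id, PageStats())


store = StatsStore()
store.on_edit(42, "2016-10-07T00:00:00Z")
store.on_edit(42, "2016-10-08T00:00:00Z")
print(store.get(42))
```

The trade-off is a small write on every edit in exchange for a lookup per page at read time, instead of counting revision rows for each request.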

Plus undeletion, partial deletion, history merge, import. But most of those require reparsing, so I think you'd get them for free. Not quite sure about history merge.