
Provide page id and revision timestamp in revision data end point
Closed, Resolved, Public

Description

Our current revision metadata end point is missing

  • the numeric page id, and
  • the revision timestamp

Example output:

http://rest.wikimedia.org:80/en.wikipedia.org/v1/page/revision/655413910

Adding this information would be very helpful for HTML dumps, as we need this metadata for XML dump parity.

Another small bit of information needed there is a boolean indicating whether the page is a redirect. We might be able to extract that information from the HTML, but if it is available from the API and can be stored in the revision too, that would be even easier.

Event Timeline

GWicke raised the priority of this task to Medium.
GWicke updated the task description.
GWicke added a project: RESTBase.

Hello,

I would quite like to have a go at working on this as a new contributor if that is ok?

Do I still need to follow the same contribution guidelines for the RESTBase project, such as using Gerrit etc.?
http://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker

Or is it as simple as cloning the GitHub project and creating a pull request? Thanks.

Jon

Hello @Jcook

> I would quite like to have a go at working on this as a new contributor if that is ok?

Of course! All contributions are welcome!

> Do I still need to follow the same contribution guidelines for the RESTBase project, such as using Gerrit etc.?
> http://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker
>
> Or is it as simple as cloning the GitHub project and creating a pull request? Thanks.

You can simply fork the repo on GitHub and start cracking at it :) Note that to have a working local set-up you will need to have Cassandra as well.

Here are some pointers. The page ID and the revision timestamp are already returned in the response given by the MW API, e.g. https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=ids|timestamp|user|userid|size|sha1|contentmodel|comment|tags&revids=655413910&continue= ; we are just not storing them (yet). The call to the MW API happens in mods/page_revisions.js. To store them, you will need to alter the page revision table definition and then store these extra fields.
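To make the pointer above concrete, here is a minimal Python sketch (not the RESTBase code, which lives in mods/page_revisions.js) of building that query and pulling the page ID and timestamp out of the response. The function names, the added format=json parameter, and the sample response values are assumptions for illustration; the sample only mimics the shape of the API's default JSON output and does not describe any real revision.

```python
from urllib.parse import urlencode

# rvprop values taken from the URL quoted in the thread.
RVPROP = ["ids", "timestamp", "user", "userid", "size", "sha1",
          "contentmodel", "comment", "tags"]


def build_revision_query(domain, rev_id):
    """Build the action=query URL for one revision (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "|".join(RVPROP),
        "revids": rev_id,
        "continue": "",
        "format": "json",  # assumption: not part of the URL in the thread
    }
    return "https://%s/w/api.php?%s" % (domain, urlencode(params))


def extract_page_id_and_timestamp(response, rev_id):
    """Return (page_id, timestamp) for rev_id from a parsed API response.

    Assumes the default response layout, where query.pages is an object
    keyed by page ID and each page carries a list of revisions.
    """
    for page in response["query"]["pages"].values():
        for rev in page.get("revisions", []):
            if rev["revid"] == rev_id:
                return page["pageid"], rev["timestamp"]
    raise KeyError("revision %d not found in response" % rev_id)


# Made-up sample response; IDs and timestamp are placeholders.
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "1234": {
                "pageid": 1234,
                "ns": 0,
                "title": "Example",
                "revisions": [
                    {"revid": 5678, "parentid": 5600,
                     "timestamp": "2015-04-08T00:00:00Z"},
                ],
            }
        }
    }
}

if __name__ == "__main__":
    print(build_revision_query("en.wikipedia.org", 655413910))
    print(extract_page_id_and_timestamp(SAMPLE_RESPONSE, 5678))
```

The two extracted values are the extra fields that would then need to be stored alongside the existing revision metadata.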

Hope this helps! If you need more help/info, let us know here or on IRC@freenode #wikimedia-services

Hello,

Thanks for the pointers, I have done the first part. Should I make a PR for this part?

About the second part (the page redirect flag), I probably need a few more pointers. I was having a look at the API. For example, I can see a redirect property here (it's empty):
http://en.wikipedia.org/w/api.php?action=query&titles=Main%20page&prop=info|revisions

But it doesn't appear here, for example:
https://en.wikipedia.org/w/api.php?action=query&prop=info|revisions&rvprop=ids|timestamp|user|userid|size|sha1|contentmodel|comment|tags&revids=655413910&continue=

I also saw another property in the page source:
"wgIsRedirect":false

So perhaps it is not available from the API?
Thanks.

Jcook set Security to None.

@Jcook, thanks for working on this!

> Should I make a PR for this part?

Yes, please do!

> But it doesn't appear here, for example:
> https://en.wikipedia.org/w/api.php?action=query&prop=info|revisions&rvprop=ids|timestamp|user|userid|size|sha1|contentmodel|comment|tags&revids=655413910&continue=

The PHP API JSON output is a bit XML-flavored, which in this case leads to the 'redirect' property only appearing if a page is a redirect. The revision you picked belongs to a page that is not currently a redirect. If you instead use Main_page's current revision, you will however see the redirect property even with the longer request:

https://en.wikipedia.org/w/api.php?action=query&prop=info|revisions&rvprop=ids|timestamp|user|userid|size|sha1|contentmodel|comment|tags&revids=591082967&continue=

OK, thanks. Do you have an example where the property is true?

Or does its mere inclusion, e.g.
"redirect": "",
mean it is a redirect?

Jon

> Or does its mere inclusion, e.g.
> "redirect": "",
> mean it is a redirect?

Yes, that's the XML-ism. If redirect is not undefined, the page is a redirect.
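A small Python sketch of that rule, assuming the page object has already been parsed from the API's JSON response. The helper name and the two sample page objects are hypothetical. The point is that the check has to test for the key's presence: the value is an empty string, so a plain truthiness test would report every redirect as a non-redirect.

```python
def is_redirect(page):
    """Return True if the page object carries a 'redirect' key at all.

    The value is an empty string when present, so test for the key's
    presence rather than the value's truthiness.
    """
    return "redirect" in page


# Hypothetical page objects shaped like entries of query.pages.
redirect_page = {"pageid": 1, "ns": 0, "title": "Main page", "redirect": ""}
normal_page = {"pageid": 2, "ns": 0, "title": "Main Page"}

if __name__ == "__main__":
    print(is_redirect(redirect_page), is_redirect(normal_page))  # True False
    # The tempting but wrong check: "" is falsy, so this prints False.
    print(bool(redirect_page.get("redirect")))
```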

OK, thanks. I made a pull request. It is my first one, so I hope it is OK. The tests I changed/added pass, and it seems to work if I've understood correctly.

Resolved by @Jcook's patch, and now deployed. Thank you, @Jcook!