Undocumented usage of generator=revisions
Open, Needs Triage, Public

Description

  1. It is supposed to be possible to use the revisions module as a generator since 1.25, but this use case is not documented anywhere. Example queries would be useful.
  2. On a related note, all the queries I tried so far give an empty result:

What is the expected behaviour of this generator?

Lahwaacz updated the task description.
Lahwaacz raised the priority of this task to Needs Triage.
Lahwaacz added a subscriber: Lahwaacz.
Restricted Application added a project: Documentation. Sep 27 2015, 10:52 AM
Restricted Application added a subscriber: Aklapper.
Anomie added a subscriber: Anomie. Sep 27 2015, 1:19 PM
  1. It is supposed to be possible to use the revisions module as a generator since 1.25, but this use case is not documented anywhere.

It's documented in the auto-generated documentation, see https://en.wikipedia.org/w/api.php?modules=query+revisions (in the box at the right) and https://en.wikipedia.org/w/api.php?modules=query (in the list of allowed values for the generator parameter).

On a related note, all the queries I tried so far give an empty result

The last, https://wiki.archlinux.org/api.php?action=query&generator=revisions&titles=Main%20page, does not give an empty result.

What is the expected behaviour of this generator?

It generates the revids that would be output by a non-generator use of prop=revisions. Like any other prop module, you still need to give it titles, pageids, or revids as input when used as a generator.
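As a rough illustration of the point above, the following Python sketch builds a query URL that uses the revisions module as a generator and refuses to build one without an input. The function name and its arguments are hypothetical and exist only for this example. It assumes that the input is given as exactly one of titles, pageids, or revids, with multiple values joined by a pipe character.

```python
from urllib.parse import urlencode

# Hypothetical helper for illustration; not part of MediaWiki or any client library.
ALLOWED_SOURCES = ("titles", "pageids", "revids")


def revisions_generator_url(api_url, source, values):
    """Build an action=query URL that uses the revisions module as a generator.

    `source` names the input parameter (titles, pageids, or revids) and
    `values` is a non-empty list of page titles or numeric IDs.
    """
    if source not in ALLOWED_SOURCES:
        raise ValueError("source must be one of: " + ", ".join(ALLOWED_SOURCES))
    if not values:
        raise ValueError("generator=revisions still needs input pages or revisions")
    params = {
        "action": "query",
        "generator": "revisions",
        # Multiple values are separated with a pipe character.
        source: "|".join(str(v) for v in values),
    }
    return api_url + "?" + urlencode(params)


print(revisions_generator_url("https://wiki.archlinux.org/api.php", "titles", ["Main page"]))
```

The printed URL is equivalent to the Arch Linux wiki query quoted above, except that the space in the title is encoded as `+` rather than `%20`.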

I don't know of a use for prop=revisions as a generator, really, since the only things that use generated revids directly are prop=revisions itself and prop=deletedrevisions (but generator=revisions will never generate deleted revisions). Everything else uses the corresponding pages generated, so you'd likely do better to just use titles, pageids, or revids without the generator. I suppose there could be something else in the future that might have use for generated revids, in which case it could become useful. The only reason it exists at the moment is because Gerrit change 168646 made prop=revisions, prop=deletedrevisions, and list=alldeletedrevisions all share much of their backend code, and list=alldeletedrevisions needed to be a generator so they all had to be.

Anomie moved this task from Unsorted to Non-Code on the MediaWiki-API board.Sep 27 2015, 1:19 PM

I was expecting it could feed other queries like information about users that edit a given page, but that doesn't seem to be the case:

https://www.mediawiki.org/w/api.php?action=query&list=users&usprop=groups|editcount|gender&generator=revisions&titles=Main%20Page&grvprop=user

User generators do not exist. See T16027: Implement user generators into the API for the feature request.

Lahwaacz added a comment.EditedSep 27 2015, 2:00 PM

The last, https://wiki.archlinux.org/api.php?action=query&generator=revisions&titles=Main%20page, does not give an empty result.

To be more precise, it gives exactly the same result as if generator=revisions were not specified (i.e. plain https://wiki.archlinux.org/api.php?action=query&titles=Main%20page itself). Adding other parameters like rvprop or rvstartid does not help either.

It generates the revids that would be output by a non-generator use of prop=revisions. Like any other prop module, you still need to give it titles, pageids, or revids as input when used as a generator.

That's what I expected, but it seems that it only generates pageids for its input (like any other generator for that matter), which I think is the reason that makes it useless. I was hoping to use generator=revisions to operate on a range of revision IDs instead of passing them with the revids parameter one by one. Using a generator would also allow using higher limits (5000 revisions instead of 500), which would be very handy for a grabber. I'm sure you will argue that this is not very useful for huge wikis like Wikipedia, but think as well of the small wikis without available dumps.
Edit: Also since incremental dumps are not standard yet, it would be a way to fetch the necessary information incrementally on top of an older dump.
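For context, the workaround described in this comment (passing a range of revision IDs through the revids parameter) might be sketched in Python as below. The function name is hypothetical, and the default batch size of 500 is simply the figure mentioned in the comment. The actual limit depends on the wiki and the rights of the requesting account, so treat it as an assumption.

```python
def revid_batches(first, last, batch_size=500):
    """Yield pipe-separated revids values covering first..last inclusive.

    batch_size is an assumed per-request limit, not a value read from the wiki.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(first, last + 1, batch_size):
        stop = min(start + batch_size, last + 1)
        yield "|".join(str(revid) for revid in range(start, stop))


# Each yielded string can be sent as the revids parameter of one request.
for revids in revid_batches(1, 5, batch_size=2):
    print(revids)
```

IDs in the range that do not correspond to an existing revision would have to be handled by whatever code issues the requests.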

Lahwaacz added a comment.EditedSep 27 2015, 2:55 PM

All in all, there may just be a missing module like list=allrevisions to complete the functionality of prop=revisions, prop=deletedrevisions and list=alldeletedrevisions. Taking the same parameters as list=alldeletedrevisions would be sufficient for my use case outlined above. This would also likely solve T21314.

The last, https://wiki.archlinux.org/api.php?action=query&generator=revisions&titles=Main%20page, does not give an empty result.

To be more precise, it gives exactly the same result as if generator=revisions were not specified (i.e. plain https://wiki.archlinux.org/api.php?action=query&titles=Main%20page itself).

That's true. OTOH, https://wiki.archlinux.org/api.php?action=query&generator=allpages&gapfrom=Main%20page&gapto=Main%20page also generates exactly the same result.

Adding other parameters like rvprop or rvstartid does not help either.

prop-style parameters never work with generators.

rvstartid can make a difference, for example https://wiki.archlinux.org/api.php?action=query&generator=revisions&titles=Main page&grvstartid=600&grvendid=100 produces an empty result because Main page on that wiki doesn't have any revisions with revid between 100 and 600.
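To make the naming difference concrete: when a module runs as a generator, its parameters take an extra "g" prefix, so rvstartid becomes grvstartid. A minimal Python sketch of that renaming follows. The helper name is hypothetical, and it assumes the caller passes only parameters that belong to the generator module.

```python
def to_generator_params(module_params, prefix="g"):
    """Return a copy of module_params with generator-style (prefixed) names."""
    return {prefix + name: value for name, value in module_params.items()}


# Parameters as they would be written for a plain prop=revisions query ...
prop_style = {"rvstartid": 600, "rvendid": 100}
# ... and the same parameters renamed for use with generator=revisions.
print(to_generator_params(prop_style))
```

Parameters that address the query itself, such as titles or action, are not module parameters and should be left unprefixed.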

That's what I expected, but it seems that it only generates pageids for its input (like any other generator for that matter), which I think is the reason that makes it useless.

No, it generates revids. But, as I already noted, most prop modules just use the corresponding pages instead of the generated revids.

I was hoping to use generator=revisions to operate on a range of revision IDs instead of passing them with the revids parameter one by one. Using a generator would also allow using higher limits (5000 revisions instead of 500), which would be very handy for a grabber. I'm sure you will argue that this is not very useful for huge wikis like Wikipedia, but think as well of the small wikis without available dumps.

You'd probably want a generator=allrevisions (corresponding to a list=allrevisions module, similar to list=alldeletedrevisions) for that. Or maybe if generator=recentchanges were able to generate revids rather than titles.

Lahwaacz added a comment.EditedSep 27 2015, 7:43 PM

Thank you for your input. The problem with generator=recentchanges is that it is affected by $wgRCMaxAge, but for the purpose of incremental grabbing the time span might be larger. Seems that I'll need to wait for list=allrevisions to make things more efficient than with revids...