Wikimedia Commons JSON API does not sort by timestamp correctly when requesting media in category
Closed, Invalid · Public · Bug

Event Timeline

Restricted Application added a subscriber: Aklapper. · Mar 28 2019, 11:11 PM

Not sure if this is MediaWiki-API but task definitely needs a code project. Feel free to correct.

Anomie closed this task as Invalid. · Apr 15 2019, 8:02 PM
Anomie added a subscriber: Anomie.

I see how that could be confusing, as I don't think it's really documented anywhere. The output of a generator isn't intended to be sorted, so there's no bug here from a MediaWiki perspective. For that reason I'm going to mark this as "Invalid". Thanks for reporting it though.

Note that the gcmsort and gcmdir parameters do still control the batching: e.g. the first batch will contain the first 10 pages by descending timestamp. It's only within the batch that the JSON object can't represent ordering and doesn't try to maintain it.

I also note that JSON objects are defined as not preserving ordering at RFC 7159 § 4, so even if we did preserve an ordering in the output there's no guarantee that the client language would preserve it when reading it in. The fact that the XML output format does maintain that underlying ordering could be considered a bug, actually, but it's not one that's really worth fixing.
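
The distinction between ordering across batches and ordering within a batch can be sketched with a toy model in Python. This is not MediaWiki's code: the page IDs and timestamps are made up, `generator_batches` is a hypothetical helper, and sorting each batch by page ID merely stands in for "whatever order the output happens to have".

```python
from typing import Dict, List, Tuple

# Made-up category members as (pageid, timestamp) pairs.
MEMBERS: List[Tuple[int, str]] = [
    (501, "2019-03-01T00:00:00Z"),
    (102, "2019-03-05T00:00:00Z"),
    (340, "2019-03-03T00:00:00Z"),
    (77, "2019-03-04T00:00:00Z"),
    (910, "2019-03-02T00:00:00Z"),
]


def generator_batches(
    members: List[Tuple[int, str]], limit: int
) -> List[Dict[int, str]]:
    """Toy model: choose each batch by descending timestamp, then emit it
    as a mapping keyed by page ID with no promised internal order."""
    # ISO 8601 timestamps in the same timezone sort correctly as strings.
    ordered = sorted(members, key=lambda m: m[1], reverse=True)
    batches = []
    for start in range(0, len(ordered), limit):
        chunk = ordered[start:start + limit]
        # Assumption of this sketch: the emitted order is by page ID,
        # i.e. unrelated to the sort that selected the batch.
        batches.append({pageid: ts for pageid, ts in sorted(chunk)})
    return batches


batches = generator_batches(MEMBERS, limit=2)
for number, batch in enumerate(batches, start=1):
    print(number, list(batch.items()))
```

In this model every page in an earlier batch is newer than every page in a later batch, but the pages inside one batch are not in timestamp order.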

domdomegg updated the task description. · Apr 16 2019, 5:35 PM

@Anomie Thanks for investigating this.

I'm not sure whether you mean that the output isn't sorted just because of the JSON formatting, or that it's unsorted by design. Either way, I still think there is an issue somewhere:

The JSON objects not preserving ordering makes sense. However, JSON formatversion 2 uses arrays (which do definitely preserve ordering), and has the issue. I've updated the link in my original post to use it.

If it's by design, then I think the design isn't great and should probably be changed. It seems odd that someone would want (and, given the parameter names, expect) the category members to be sorted and selected, but then returned out of order. It also seems to go against the documentation and sandbox, where 'gcmsort' is described as "Property to sort by". It's confusing to then get results back that aren't sorted by that property.

JSON formatting does mean that the output isn't intended to be conceptually sorted, although you have a point that with formatversion=2 we output an array instead.

A larger issue, which I didn't think to mention earlier, is that with the way generators work it may not even be feasible to preserve the order. The current implementation of the categorymembers module does happen to do that in this case, but if, say, you added &redirects=1 or used a different generator, the ordering the module would produce as a non-generator might not be preserved without significant work in mapping the generator's output to the produced titles. Rather than have some generators defined as preserving the ordering and others not, we've gone with consistency and defined that all generators do not preserve the ordering of the generated results beyond the batching that they do internally.

With respect to the fact of gcmsort existing, we don't try to mark which parameters to a module only apply to generator use or to non-generator use. Similarly, you may note that there's a "gcmprop" field listed but it too has no effect on the generator as generators don't return properties, they only generate a set of titles. Also, as I mentioned earlier, gcmsort does still control the ordering of the batches.
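
Given that generator output carries no ordering, a client that needs the sorted order has to restore it itself. One possible approach, sketched below in Python, is to take the ordered page IDs from a separate non-generator request (for example list=categorymembers with cmsort and cmdir) and rearrange the generator's pages to match. The helper name `reorder_pages` is hypothetical, and the sketch assumes that each entry under `query.pages` carries a `pageid` field and that the list request reflects the desired order.

```python
from typing import Any, Dict, Iterable, List, Union

Page = Dict[str, Any]


def reorder_pages(
    pages: Union[Dict[str, Page], List[Page]],
    ordered_ids: Iterable[int],
) -> List[Page]:
    """Return the pages arranged to follow ordered_ids.

    Accepts both shapes of query.pages: an object keyed by page ID
    (formatversion=1) or an array (formatversion=2). IDs with no
    matching page are skipped.
    """
    page_list = list(pages.values()) if isinstance(pages, dict) else list(pages)
    # Entries without a pageid (e.g. missing pages) are left out.
    by_id = {int(p["pageid"]): p for p in page_list if "pageid" in p}
    return [by_id[i] for i in ordered_ids if i in by_id]


# Made-up sample data in the formatversion=1 shape.
sample_pages = {
    "30": {"pageid": 30, "title": "File:C.jpg"},
    "10": {"pageid": 10, "title": "File:A.jpg"},
    "20": {"pageid": 20, "title": "File:B.jpg"},
}
# Order as it might come back from the non-generator list request.
sample_order = [20, 30, 10]

for page in reorder_pages(sample_pages, sample_order):
    print(page["title"])
```

The cost is one extra request per batch; the benefit is that the client no longer depends on how any particular output format happens to arrange the pages.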