MWAPI requests for external links return fewer than expected
Closed, Duplicate · Public · BUG REPORT

Description

Steps to Reproduce:
Request external links from a page on Wikimedia Commons, using WDQS:
https://w.wiki/47t

Alternatively, request external links from an English Wikipedia article:
https://w.wiki/46e

Actual Results:
Each of the above queries returns only one external link.

Expected Results:
Applying the Commons query via the REST API returns 22 links: https://commons.wikimedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=extlinks&titles=File%3AWilliam_Holman_Hunt_-_A_Converted_British_Family.jpg&ellimit=500

The enwiki query via its own REST API also returns many links:
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=extlinks&titles=Ashmolean_Museum&ellimit=500
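
For anyone wanting to reproduce the comparison outside the sandbox, the two ApiSandbox links above carry the same parameters as a plain request to each wiki's `api.php`. The following is only a sketch of how such a URL could be built; the helper name `extlinks_url` is made up for illustration, and it only constructs the URL without sending anything.

```python
from urllib.parse import urlencode


def extlinks_url(host: str, title: str, ellimit: str = "500") -> str:
    """Build an api.php URL with the same parameters as the sandbox links.

    `extlinks_url` is a hypothetical helper, not part of any library.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "extlinks",
        "titles": title,
        "ellimit": ellimit,
    }
    # urlencode percent-encodes reserved characters such as ":" in "File:".
    return f"https://{host}/w/api.php?{urlencode(params)}"


enwiki_url = extlinks_url("en.wikipedia.org", "Ashmolean_Museum")
commons_url = extlinks_url(
    "commons.wikimedia.org",
    "File:William_Holman_Hunt_-_A_Converted_British_Family.jpg",
)
print(enwiki_url)
print(commons_url)
```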

Event Timeline

Restricted Application added a project: Wikidata. May 19 2019, 8:24 AM
Restricted Application added a subscriber: Aklapper.

Second query returns 10 results for me.

The second query doesn't return all the external links that are present in the article (and visible in the REST query). It may return a different number of results at different times; the bug is that it returns fewer than the full number. Adding an mwapi:ellimit property seems to produce fewer results.

As for the second one, it's unfortunately a limitation of the current API - only one result per page (<page> tag in generator) can be returned. In general, only one result per XPath match - i.e. if XPath matches more than one thing per page it won't work.
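
To make the "one result per XPath match" limitation concrete, here is a rough illustration using a hand-written XML fragment that only approximates the shape of an API response (one `<page>` holding several `<el>` children). It is not how WDQS is implemented; the sample URLs and both function names are assumptions for the sake of the example. Taking a single match per page yields one link, while collecting every match yields all of them.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, loosely modelled on an extlinks response in XML form.
SAMPLE = """<api><query><pages>
  <page pageid="1" title="Ashmolean_Museum">
    <extlinks>
      <el>https://example.org/a</el>
      <el>https://example.org/b</el>
      <el>https://example.org/c</el>
    </extlinks>
  </page>
</pages></query></api>"""

root = ET.fromstring(SAMPLE)


def first_match_per_page(tree):
    """Keep only the first <el> under each <page> (one result per page)."""
    rows = []
    for page in tree.iter("page"):
        el = page.find("extlinks/el")  # find() returns the first match only
        if el is not None:
            rows.append((page.get("title"), el.text))
    return rows


def all_matches_per_page(tree):
    """Keep every <el> under each <page>."""
    return [
        (page.get("title"), el.text)
        for page in tree.iter("page")
        for el in page.findall("extlinks/el")
    ]


print(len(first_match_per_page(root)), "link with one match per page")
print(len(all_matches_per_page(root)), "links when every match is kept")
```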

Further finding: with no mwapi:ellimit property it returns 10 results. With mwapi:ellimit "max" it returns one result. With a numerical mwapi:ellimit property it returns varying (small) numbers of results. With a nonsense mwapi:ellimit property (e.g. "xam" or "qwerty") it returns a set of 92 results, which seems complete. So I only get the desired result when entering a nonsense input.

As for the second one, it's unfortunately a limitation of the current API - only one result per page (<page> tag in generator) can be returned. In general, only one result per XPath match - i.e. if XPath matches more than one thing per page it won't work.

I don't understand: it's possible to get multiple results through this interface, as many as 92 in this example.

Yes – with ellimit=1, there will only be one <page> and one <el> per API result. The 92 overall results you’re seeing are the results of 92 separate API requests, thanks to continuation. Of course, this is massively slower than a single API request returning all 92 results :/
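
The cost of continuation can be sketched with a small simulation. Nothing here talks to a real wiki: `fake_extlinks_api` is a stand-in invented for this example, the 92 links are placeholders, and the integer offset is a simplification of whatever continuation token the real API hands back. The point is only that a batch size of 1 needs 92 round trips to collect what a large batch size returns in one.

```python
# Placeholder data standing in for the 92 external links of the article.
LINKS = [f"https://example.org/{i}" for i in range(92)]


def fake_extlinks_api(limit, offset=0):
    """Simulate one API request: return a batch plus a continuation offset.

    The second value is None when there is nothing left to fetch.
    """
    batch = LINKS[offset:offset + limit]
    next_offset = offset + limit
    return batch, (next_offset if next_offset < len(LINKS) else None)


def fetch_all(limit):
    """Follow continuation until exhausted; count how many requests it took."""
    results, requests, offset = [], 0, 0
    while offset is not None:
        batch, offset = fake_extlinks_api(limit, offset)
        results.extend(batch)
        requests += 1
    return results, requests


one_by_one, slow_requests = fetch_all(1)
in_bulk, fast_requests = fetch_all(500)
print(f"limit=1:   {len(one_by_one)} links in {slow_requests} requests")
print(f"limit=500: {len(in_bulk)} links in {fast_requests} request(s)")
```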

Esc3300 added a subscriber: Esc3300. Edited · May 19 2019, 4:08 PM

Interesting. "mad" outputs more than "max" ;)

What is needed to run the generator just once, at least once for a given variable?

Smalyshev triaged this task as Normal priority. Jun 24 2019, 11:30 PM

If I understand you correctly, the MWAPI user manual (https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI) gives:

bd:serviceParam wikibase:limit "once" .

Smalyshev added a comment. Edited · Tue, Aug 27, 9:45 PM

I've created T231390: MWAPI can only match one result per page to handle the issue of multiple values in one result, so that we have a clearly focused task.