
API:Exturlusage
Closed, Invalid (Public)

Description

I am unable to get API:Exturlusage working, in a number of ways:

  • The docs say to use euoffset, but the API rejects it as an invalid parameter, so I used eucontinue instead.
  • With eulimit=500 the API does not return a eucontinue code even though there are more than 500 results. Changing the limit to 10 works, but only a limited number of results come back before the continue codes run out.

Using washingtonpost.com as an example, there should be hundreds to thousands of links available on enwiki.

https://en.wikipedia.org/w/api.php?action=query&list=exturlusage&euquery=washingtonpost%2Ecom&euprop=title&eulimit=50&eunamespace=0&format=json&formatversion=2&maxlag=5

{"batchcomplete":true,"continue":{"eucontinue":"http://com.washingtonpost./wp-dyn/content/article/2007/08/26|18897214","continue":"-||"},"query":{"exturlusage":[{"ns":0,"title":"Golden Triangle (Southeast Asia)"},{"ns":0,"title":"Vic Sussman"},{"ns":0,"title":"Presidential Service Badge"},{"ns":0,"title":"Proclamation No. 1081"},{"ns":0,"title":"History of the Philippines (1965–86)"},{"ns":0,"title":"Plaza Miranda bombing"},{"ns":0,"title":"Conjugal dictatorship"},{"ns":0,"title":"Ferdinand Marcos"},{"ns":0,"title":"Martial law in the Philippines"},{"ns":0,"title":"Bobby Sullivan"},{"ns":0,"title":"James Fogle"},{"ns":0,"title":"Charles Barsotti"},{"ns":0,"title":"Justin Timberlake discography"},{"ns":0,"title":"The National (band)"},{"ns":0,"title":"Mistaken for Strangers (film)"},{"ns":0,"title":"Endorsements in the 2012 Republican Party presidential primaries"},{"ns":0,"title":"Farm Animal Rights Movement"},{"ns":0,"title":"Rick Santorum"},{"ns":0,"title":"Nomad Shadow"},{"ns":0,"title":"National Security Agency in popular culture"},{"ns":0,"title":"Big Dig"},{"ns":0,"title":"Red Lake shootings"},{"ns":0,"title":"Hurricane Rita"},{"ns":0,"title":"Victoria's Secret Fashion Show"},{"ns":0,"title":"2005 in American television"},{"ns":0,"title":"Fat Camp: An MTV Docs Movie Presentation"},{"ns":0,"title":"Weight loss camp"},{"ns":0,"title":"Theresa LePore"},{"ns":0,"title":"Jadarite"},{"ns":0,"title":"Kryptonite"},{"ns":0,"title":"Pageant Place"},{"ns":0,"title":"Charm School (TV series)"}]}}
__________________

https://en.wikipedia.org/w/api.php?action=query&list=exturlusage&euquery=washingtonpost%2Ecom&euprop=title&eulimit=50&eunamespace=0&format=json&formatversion=2&maxlag=5&continue=%2D%7C%7C&eucontinue=http%3A%2F%2Fcom%2Ewashingtonpost%2E%2Fwp%2Ddyn%2Fcontent%2Farticle%2F2007%2F08%2F26%7C18897214

{"batchcomplete":true,"query":{"exturlusage":[{"ns":0,"title":"Flavor of Love Girls: Charm School"},{"ns":0,"title":"Dmitry Medvedev"},{"ns":0,"title":"Big Dig"},{"ns":0,"title":"Rock of Love with Bret Michaels"},{"ns":0,"title":"Muhammad Rahim al Afghani"},{"ns":0,"title":"Mohammed Al Afghani (CIA detainee)"},{"ns":0,"title":"Asymmetric warfare"},{"ns":0,"title":"Colin Sargent"},{"ns":0,"title":"Zhao Ziyang"},{"ns":0,"title":"Mark Weston"},{"ns":0,"title":"Belsat TV"},{"ns":0,"title":"District of Columbia home rule"}]}}

In the above example it aborts after the second request because no continue code is returned.

https://en.wikipedia.org/w/api.php?action=query&list=exturlusage&euquery=washingtonpost%2Ecom&euprop=title&eulimit=500&eunamespace=0&format=json&formatversion=2&maxlag=5

{"batchcomplete":true,"query":{"exturlusage":[{"ns":0,"title":"Golden Triangle (Southeast Asia)"},{"ns":0,"title":"Vic Sussman"},{"ns":0,"title":"Presidential Service Badge"},{"ns":0,"title":"Proclamation No. 1081"},{"ns":0,"title":"History of the Philippines (1965–86)"},{"ns":0,"title":"Plaza Miranda bombing"},{"ns":0,"title":"Conjugal dictatorship"},{"ns":0,"title":"Ferdinand Marcos"},{"ns":0,"title":"Martial law in the Philippines"},{"ns":0,"title":"Bobby Sullivan"},{"ns":0,"title":"James Fogle"},{"ns":0,"title":"Charles Barsotti"},{"ns":0,"title":"Justin Timberlake discography"},{"ns":0,"title":"The National (band)"},{"ns":0,"title":"Mistaken for Strangers (film)"},{"ns":0,"title":"Endorsements in the 2012 Republican Party presidential primaries"},{"ns":0,"title":"Farm Animal Rights Movement"},{"ns":0,"title":"Rick Santorum"},{"ns":0,"title":"Nomad Shadow"},{"ns":0,"title":"National Security Agency in popular culture"},{"ns":0,"title":"Big Dig"},{"ns":0,"title":"Red Lake shootings"},{"ns":0,"title":"Hurricane Rita"},{"ns":0,"title":"Victoria's Secret Fashion Show"},{"ns":0,"title":"2005 in American television"},{"ns":0,"title":"Fat Camp: An MTV Docs Movie Presentation"},{"ns":0,"title":"Weight loss camp"},{"ns":0,"title":"Theresa LePore"},{"ns":0,"title":"Jadarite"},{"ns":0,"title":"Kryptonite"},{"ns":0,"title":"Pageant Place"},{"ns":0,"title":"Charm School (TV series)"},{"ns":0,"title":"Flavor of Love Girls: Charm School"},{"ns":0,"title":"Dmitry Medvedev"},{"ns":0,"title":"Big Dig"},{"ns":0,"title":"Rock of Love with Bret Michaels"},{"ns":0,"title":"Muhammad Rahim al Afghani"},{"ns":0,"title":"Mohammed Al Afghani (CIA detainee)"},{"ns":0,"title":"Asymmetric warfare"},{"ns":0,"title":"Colin Sargent"},{"ns":0,"title":"Zhao Ziyang"},{"ns":0,"title":"Mark Weston"},{"ns":0,"title":"Belsat TV"},{"ns":0,"title":"District of Columbia home rule"}]}}

This example is the same except with eulimit=500; no continue code is returned.

Event Timeline

I assume the "docs" are https://www.mediawiki.org/wiki/API:Exturlusage ? Best to provide links to make sure we're looking at the same things. :)

I agree it's baffling. https://en.wikipedia.org/w/api.php?action=query&list=exturlusage&euquery=washingtonpost.com&euprop=title&eulimit=3&eunamespace=0&format=json&formatversion=2 asks for a max of 3 pages to list but it lists only 1 result, while there are more than 3 results when using a higher eulimit.

Anomie subscribed.
  • Docs say to use euoffset but the API returns an invalid parameter so I used eucontinue

The documentation page at https://www.mediawiki.org/wiki/API:Exturlusage probably needs to be updated after rMWd65e96b76382: Use new externallinks.el_index_60 field. That said, clients shouldn't have been using euoffset directly in the first place; it was only ever meant for continuation.

Anyway, it's a wiki, feel free to edit it.
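For reference, a client does not need to know whether the module continues with euoffset or eucontinue if it copies the whole "continue" object from each response into the next request. A minimal Python sketch of that loop, assuming the response shape shown in the task description; fetch_json, iter_exturlusage and the User-Agent string are illustrative names, not part of MediaWiki:

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Perform one GET request against the API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent (an assumption); replace it with your own.
    req = urllib.request.Request(
        url, headers={"User-Agent": "exturlusage-example/0.1"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_exturlusage(base_params, fetch=fetch_json):
    """Yield every exturlusage entry, following continuation until it ends."""
    params = dict(base_params)
    while True:
        data = fetch(params)
        # A batch may be short or even empty; that alone does not mean "done".
        yield from data.get("query", {}).get("exturlusage", [])
        cont = data.get("continue")
        if not cont:
            # No "continue" object: the result set is exhausted.
            break
        # Copy every continuation value back verbatim instead of
        # constructing euoffset/eucontinue by hand.
        params = {**base_params, **cont}
```

The fetch argument is injectable so the loop can be exercised without network access. The loop stops only when the response has no "continue" object, never because a batch came back smaller than eulimit.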

  • when using eulimit=500 it does not return a eucontinue code even though there are more than 500 - changing to 10 works but returns a limited number before the continue code runs out.

There really are only 69 links from enwiki to URLs at http://washingtonpost.com/, and that's in all namespaces. Most of the links are to other subdomains and/or protocols:

Domain                                       Count (all namespaces)
https://www.washingtonpost.com               103227
http://www.washingtonpost.com                21459
http://voices.washingtonpost.com             3454
http://blog.washingtonpost.com               1064
https://articles.washingtonpost.com          962
http://projects.washingtonpost.com           862
https://apps.washingtonpost.com              472
http://newsweek.washingtonpost.com           323
http://articles.washingtonpost.com           265
http://live.washingtonpost.com               195
http://stats.washingtonpost.com              120
https://media.washingtonpost.com             96
http://views.washingtonpost.com              74
http://discuss.washingtonpost.com            73
http://washingtonpost.com                    69
http://apps.washingtonpost.com               58
http://m.washingtonpost.com                  44
http://onfaith.washingtonpost.com            43
https://live.washingtonpost.com              42
//www.washingtonpost.com                     30
http://media3.washingtonpost.com             26
http://blogs.washingtonpost.com              26
http://loudounextra.washingtonpost.com       23
http://media.washingtonpost.com              21
https://img.washingtonpost.com               18
http://doonesbury.washingtonpost.com         15
https://washingtonpost.com                   14
http://chinawatch.washingtonpost.com         13
http://mobile.washingtonpost.com             11
http://knowmore.washingtonpost.com           11
https://voices.washingtonpost.com            7
https://s2.washingtonpost.com                7
http://mp3.washingtonpost.com                6
http://cdn.washingtonpost.com                6
http://syndication.washingtonpost.com        5
http://search.washingtonpost.com             5
https://subscribe.washingtonpost.com         4
http://reviews.washingtonpost.com            4
http://polls.washingtonpost.com              4
http://feeds.washingtonpost.com              4
http://yellowpages.washingtonpost.com        3
http://washingtonpost.com:80                 3
http://russianow.washingtonpost.com          3
http://failover.washingtonpost.com           3
http://commerce.washingtonpost.com           3
http://yp.washingtonpost.com                 2
http://www.washingtonPost.com                2
http://www.stats.washingtonpost.com          2
http://wpcomics.washingtonpost.com           2
https://projects.washingtonpost.com          2
http://sitesearch.washingtonpost.com         2
http://md2.washingtonpost.com                2
http://img.washingtonpost.com                2
http://yp.washingtonpost.com:80              1
http://youthvote.washingtonpost.com          1
http://wwww.washingtonpost.com               1
http://www.washingtonpost.com:80             1
Http://www.washingtonpost.com                1
http://www.WashingtonPost.com                1
http://www.washingtonpost.com.               1
http://wirelessfinder.washingtonpost.com     1
http://tablet.washingtonpost.com             1
https://www.washingtonpost.com:80            1
Https://www.washingtonpost.com               1
https://www.washingtonpost.com.              1
https://wwGesch%C3%A4ftw.washingtonpost.com  1
https://.washingtonpost.com                  1
https://syndication.washingtonpost.com       1
https://nie.washingtonpost.com               1
https://mobile.washingtonpost.com            1
https://js.washingtonpost.com                1
https://images.washingtonpost.com            1
http://primary.washingtonpost.com            1
http://media2.washingtonpost.com             1
http://media10.washingtonpost.com            1
http://liveblog.washingtonpost.com           1
http://link.washingtonpost.com               1
http://help.washingtonpost.com               1
http://games.washingtonpost.com              1
http://financial.washingtonpost.com          1

You might try euquery=*.washingtonpost.com instead to query all those subdomains, although you'd still have to query euprotocol=http and euprotocol=https separately.
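As a rough illustration of that suggestion, the two requests differ only in euprotocol. The sketch below builds one request URL per protocol with the wildcard query; build_urls is a hypothetical helper name, and the remaining parameters are copied from the URLs in the task description:

```python
import urllib.parse

API_URL = "https://en.wikipedia.org/w/api.php"


def build_urls(domain, protocols=("http", "https"), limit=500):
    """Return one exturlusage request URL per protocol for *.domain."""
    urls = []
    for protocol in protocols:
        params = {
            "action": "query",
            "list": "exturlusage",
            # The leading "*." asks for all subdomains of the domain.
            "euquery": "*." + domain,
            # Each protocol has to be queried on its own.
            "euprotocol": protocol,
            "euprop": "title",
            "eulimit": limit,
            "eunamespace": 0,
            "format": "json",
            "formatversion": 2,
        }
        urls.append(API_URL + "?" + urllib.parse.urlencode(params))
    return urls


for url in build_urls("washingtonpost.com"):
    print(url)
```

urlencode percent-encodes the asterisk as %2A, which the API decodes back to the wildcard.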

I agree it's baffling. https://en.wikipedia.org/w/api.php?action=query&list=exturlusage&euquery=washingtonpost.com&euprop=title&eulimit=3&eunamespace=0&format=json&formatversion=2 asks for a max of 3 pages to list but it lists only 1 result, while there are more than 3 results when using a higher eulimit.

That's expected. If you check the auto-generated documentation at https://en.wikipedia.org/w/api.php?modules=query+exturlusage, you'll see the following note on the eunamespace parameter:

Note: Due to miser mode, using this may result in fewer than eulimit results returned before continuing; in extreme cases, zero results may be returned.

Got it! Thanks again for the info, including euprotocol, which I didn't know about. My results match the above. I'll update the wiki.

You can't search for protocol-relative links specifically. They'll show up with both 'http' and 'https'.
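One practical consequence: when the http and https result lists are combined, pages with protocol-relative links are counted twice unless the client deduplicates. The sample output in the task description also lists "Big Dig" twice within a single query, so deduplication helps there too. A minimal sketch, assuming entries shaped like the euprop=title results above; unique_titles is a hypothetical helper name:

```python
def unique_titles(*batches):
    """Merge result lists and return page titles once each, in first-seen order."""
    seen = set()
    ordered = []
    for batch in batches:
        for entry in batch:
            title = entry["title"]
            if title not in seen:
                seen.add(title)
                ordered.append(title)
    return ordered


# Made-up example data using titles from the task description.
http_results = [
    {"ns": 0, "title": "Big Dig"},
    {"ns": 0, "title": "Kryptonite"},
    {"ns": 0, "title": "Big Dig"},
]
https_results = [
    {"ns": 0, "title": "Kryptonite"},
    {"ns": 0, "title": "Jadarite"},
]
print(unique_titles(http_results, https_results))
```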