
Titles returned by geosearch are non canonical
Closed, Resolved | Public | Bug Report

Description

From T289738:

The article URLs are generated incorrectly: they are derived from page titles, so spaces appear as %20, instead of using the canonical URLs (where spaces become underscores). This causes:

  • Delay from appserver due to likely CDN cache miss.
  • Possibly a 404 Not Found on non-English wikis due to incorrect encoding.
  • Possibly stale content due to being a non-canonical URL and thus missing purges.
  • Full extra network roundtrip due to redirect.
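
To make the difference concrete, here is a minimal Python sketch contrasting a naively percent-encoded article URL with the underscore form that MediaWiki treats as canonical. The helper names, the base path, and the set of characters left unescaped are assumptions for illustration; this only approximates what mw.util.getUrl does and ignores other title normalization rules.

```python
from urllib.parse import quote

# Assumed article path for English Wikipedia.
WIKI_BASE = "https://en.wikipedia.org/wiki/"


def naive_url(title):
    """Percent-encode the title as-is, so spaces become %20 (non-canonical)."""
    return WIKI_BASE + quote(title, safe="")


def canonical_style_url(title):
    """Approximate the canonical form: spaces become underscores before encoding.

    The `safe` set is an assumption, not the exact list MediaWiki uses.
    """
    return WIKI_BASE + quote(title.replace(" ", "_"), safe="/:(),!*@$;~")


print(naive_url("Presidio of San Francisco"))
print(canonical_style_url("Presidio of San Francisco"))
```

The first URL only reaches the article after a redirect, which is the extra roundtrip described above; the second matches the fullurl value the API reports for this page.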

These originate from the query to the geosearch API:

https://en.wikipedia.org/w/api.php?action=query&format=json&origin=*&formatversion=2&prop=coordinates|pageprops|pageimages|description&colimit=max&generator=geosearch&ggsradius=10000&ggsnamespace=0&ggslimit=50&redirects=no&uselang=en&ppprop=displaytitle&piprop=thumbnail&pithumbsize=150&pilimit=50&codistancefrompoint=37.78347840548435|-122.468480578051&ggscoord=37.78347840548435|-122.468480578051

The existing Nearby implementation calls mw.util.getUrl( title ) on the title to get the correct URL. We can get the same value by adding inprop=url to the request.

TODO

  • Add inprop=url to the request
  • Pass the URL to the Typeahead Suggestion.
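
As a rough sketch of the first TODO item, the request could be assembled as below. The function name and its arguments are hypothetical; the parameters mirror the query quoted in the description, with inprop=url added. Because inprop belongs to the info module, the sketch also appends info to prop, on the assumption that the URL fields are only returned when that module is requested.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_nearby_query(lat, lon, radius_m=10000, limit=50):
    """Build a geosearch-generated query that also asks for page URLs."""
    coord = f"{lat}|{lon}"
    params = {
        "action": "query",
        "format": "json",
        "origin": "*",
        "formatversion": "2",
        # "info" is added so that inprop=url has a module to apply to.
        "prop": "coordinates|pageprops|pageimages|description|info",
        "inprop": "url",
        "colimit": "max",
        "generator": "geosearch",
        "ggsradius": str(radius_m),
        "ggsnamespace": "0",
        "ggslimit": str(limit),
        "redirects": "no",
        "uselang": "en",
        "ppprop": "displaytitle",
        "piprop": "thumbnail",
        "pithumbsize": "150",
        "pilimit": str(limit),
        "codistancefrompoint": coord,
        "ggscoord": coord,
    }
    return API_ENDPOINT + "?" + urlencode(params)
```

Each page in the response should then carry a fullurl field that can be handed to the Typeahead Suggestion in place of a URL built from the title.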

Developer notes

Don't worry about the fact we're using WVUI. We'll switch to Codex at a later point.

Event Timeline

Jdlrobson triaged this task as Medium priority. (Feb 14 2022, 11:30 PM)

The given example API query does not return results from the geosearch API. It returns results from the pageimages, pageprops, coordinates, and description APIs. These are internally fed the result of the geosearch API query.

In other words, the geosearch module generates a private list of pages (not exposed here), and feeds them as a generated list of titles to each of the actually targeted APIs.
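
One way to see this split is through the parameter prefixes in the example query: each module has its own prefix, and a module used as a generator gets an extra leading "g". The sketch below is a simplified illustration only; the function name and the lookup table are made up for this comment, cover just the modules in the example, and do not handle query-level parameters that happen to share a prefix.

```python
# Prefixes of the prop modules that appear in the example query.
PROP_PREFIXES = {
    "co": "coordinates",
    "pp": "pageprops",
    "pi": "pageimages",
    "in": "info",
}


def owning_module(param, generator="geosearch", generator_prefix="gs"):
    """Guess which module a query parameter belongs to, based on its prefix."""
    # Generator parameters are the module's own prefix with a leading "g".
    if param.startswith("g" + generator_prefix):
        return "generator:" + generator
    for prefix, module in PROP_PREFIXES.items():
        if param.startswith(prefix):
            return "prop:" + module
    # Everything else is treated as a general query-level parameter.
    return "query"


for name in ("ggsradius", "codistancefrompoint", "pithumbsize", "ppprop"):
    print(name, "->", owning_module(name))
```

Only the "prop:" modules contribute fields to the response; the generator merely decides which pages they are run against.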

To receive page information such as URLs, use the same kind of query as any other MediaWiki API query that returns page information. We use the info module for this: https://www.mediawiki.org/wiki/API:Info

For example:
https://en.wikipedia.org/w/api.php?action=query&formatversion=2&prop=info&inprop=url&titles=Presidio%20of%20San%20Francisco

"pages": [
    {
        "pageid": 59480,
        "ns": 0,
        "title": "Presidio of San Francisco",
        ..
        "fullurl": "https://en.wikipedia.org/wiki/Presidio_of_San_Francisco",
        "editurl": "https://en.wikipedia.org/w/index.php?title=Presidio_of_San_Francisco&action=edit",
        ..
    }
]

In other words, include at least prop=info and inprop=url in your batch query, and use fullurl from the result.
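
A minimal sketch of reading fullurl out of such a response follows. The function name is hypothetical, and it assumes a formatversion=2 response in which the pages list sits under a top-level "query" key; pages without a fullurl field are skipped rather than given a guessed URL.

```python
import json


def urls_by_title(response_text):
    """Map each page title to the fullurl reported by the API."""
    data = json.loads(response_text)
    # With formatversion=2, "pages" is a list rather than a dict keyed by page ID.
    pages = data.get("query", {}).get("pages", [])
    return {page["title"]: page["fullurl"] for page in pages if "fullurl" in page}


sample = json.dumps({
    "query": {
        "pages": [
            {
                "pageid": 59480,
                "ns": 0,
                "title": "Presidio of San Francisco",
                "fullurl": "https://en.wikipedia.org/wiki/Presidio_of_San_Francisco",
            }
        ]
    }
})
print(urls_by_title(sample))
```

Using the API-provided value avoids rebuilding the URL from the title on the client.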

With the above in mind, I hope the following now makes sense:

> Titles returned by geosearch are non canonical.
> These originate from the query to the geosearch API.

The API query in question doesn't return results from the geosearch API.
These titles are canonical. (Afaik, none of our APIs return non-canonical titles.)
To get URLs in the API response, include inprop=url in the query; that parameter belongs to the info API.

Change 824136 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/extensions/NearbyPages@master] Fixes: Titles returned by geosearch are non canonical

https://gerrit.wikimedia.org/r/824136

Change 824136 merged by jenkins-bot:

[mediawiki/extensions/NearbyPages@master] Fixes: Titles returned by geosearch are non canonical

https://gerrit.wikimedia.org/r/824136