Investigate slow loading of complex SPARQL queries
Closed, Resolved | Public | 2 Estimated Story Points

Description

Issue: Loading geopoints takes a long time for complex SPARQL queries.

Example:
https://en.wikipedia.beta.wmflabs.org/wiki/Maptests/Geopoints_by_SPARQL

Investigate

  • Why does it take so long to load?
    • One issue is that the beta cluster causes other delays when saving or viewing a page, so it's hard to see how the new feature affects timing.
    • Parser debugging reports that 10 s of real time is spent parsing. This is much lower than the expected timeouts...
  • How will slow queries interact with geoshape expansion enabled by T323113?
  • Evaluate possible solutions and identify the pros and cons of each direction, including what other problems could be solved in addition

Alternatives considered

  1. Cache the results of SPARQL requests in a local database for a day or so. SPARQL should take care of its own result caching and does already, but it doesn’t
  2. Cache the final GeoJSON results in a local database for a day or so.
  3. Do the query only on page parsing
    • Implemented as T322353: Investigation: Move geoshape expansion to Kartographer parse-time.
    • Has the advantage that purging is a solved concept
    • Potential issue: page rendering (also preview) will take a long time, during which the page is blank and even text is unavailable.
      • Page rendering in steps (not available yet, but discussed, see T282585)
      • Parser already has a fallback that serves a stale version when loading takes too long
      • Most pages won't have very complex, slow queries.
    • If expansion fails, fall back to the previous behavior and leave it to the client to expand external data.
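The caching and fallback ideas above can be sketched together. This is a minimal illustration, not Kartographer's implementation: `ResultCache`, `expand_external_data`, and the `run_query` callback are hypothetical names, an in-memory dict stands in for the "local database", and the one-day TTL is an assumption taken from "a day or so".

```python
import hashlib
import time
from typing import Any, Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 24 * 60 * 60  # assumption: "a day or so"


class ResultCache:
    """In-memory stand-in for the local database (hypothetical)."""

    def __init__(self, ttl: float = CACHE_TTL_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def key_for(query: str) -> str:
        # Hash the query text so long SPARQL strings make compact keys.
        return hashlib.sha256(query.encode("utf-8")).hexdigest()

    def get(self, query: str) -> Optional[Any]:
        entry = self._entries.get(self.key_for(query))
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            return None  # expired: treat as a miss
        return value

    def put(self, query: str, value: Any) -> None:
        self._entries[self.key_for(query)] = (self._clock(), value)


def expand_external_data(node: Dict[str, Any],
                         run_query: Callable[[str], Any],
                         cache: ResultCache) -> Any:
    """Return expanded GeoJSON, or the untouched node if expansion fails.

    Returning the original ExternalData node mirrors the fallback in which
    the client is left to expand external data itself.
    """
    query = node["query"]
    cached = cache.get(query)
    if cached is not None:
        return cached
    try:
        geojson = run_query(query)  # may be slow or raise on timeout
    except Exception:
        return node
    cache.put(query, geojson)
    return geojson
```

Failures are deliberately not cached in this sketch, so a query that timed out once is retried on the next parse; whether that is desirable for known-slow queries is an open design choice.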

Event Timeline

lilients_WMDE set the point value for this task to 8.

We're hoping this will be closed by T323113, but to verify let's copy the mapframe from an affected page to the Beta Cluster, where the feature has been enabled.

  • Identify an affected page or otherwise find a slow geopoints query.
  • How long does it take to save the page and see it rendered?
  • How long does it take to hard-refresh the maps.wmo image? Hint: Tweak the image URL with a fake query parameter to defeat the web cache.
  • Compare with timings on the beta cluster.
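The hard-refresh hint in the list above could be scripted roughly as follows. This is a sketch under assumptions: the parameter name `_nocache` is made up, and it presumes the maps server ignores unknown query parameters while the web cache keys on the full URL. `with_cache_buster` and `time_fetch` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Tuple
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_cache_buster(url: str, token: str, param: str = "_nocache") -> str:
    """Append a throwaway query parameter, leaving the existing query as-is."""
    parts = urlsplit(url)
    extra = urlencode({param: token})
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))


def time_fetch(url: str, timeout: float = 90.0) -> Tuple[int, float]:
    """Fetch the URL once and return (HTTP status, elapsed seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
            status = response.status
    except urllib.error.HTTPError as err:
        # Error responses such as 400 are still a useful timing sample.
        status = err.code
    return status, time.monotonic() - start
```

For example, `time_fetch(with_cache_buster(image_url, str(time.time())))` would time one uncached request; the 90 s default timeout is chosen only so that a 60 s server-side timeout can be observed.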
awight renamed this task from Investigate slow loading of large number of GeoPoints to Investigate slow loading of complex SPARQL queries. Jan 4 2023, 12:33 PM
awight updated the task description.
awight changed the point value for this task from 8 to 2.
awight updated the task description.

On the beta cluster, saving a page with a query known to time out causes several problems.

<mapframe text="Churches in Italy" latitude="43.74" longitude="7.43" zoom="13" width="400" height="400" align="center">
{
  "type": "ExternalData",
  "service": "geopoint",
  "query": "SELECT distinct ?id ?geo WHERE {?id wdt:P31/wdt:P279* wd:Q16970; wdt:P625 ?geo. ?id p:P131 ?statement1. ?statement1 (ps:P131/(wdt:P131*)) wd:Q38.}"
}
</mapframe>

Saving takes 10-15 s, after which the page renders a blank map and the spinner for the static thumbnail continues for 60 s. The request eventually fails with "400 - bad request" for the geoshape https://maps-beta.wmflabs.org/img/osm-intl,13,43.74,7.43,400x400.png?lang=en&domain=https://en.wikipedia.beta.wmflabs.org&title=Maptests/Geopoints+by+SPARQL&groups=_4b06c94786eacd5bbd007e0fe5a9023a0fd522cc . The response contains "Bad GeoJSON - unknown type ExternalData".
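One reading of the "unknown type ExternalData" error is that the static renderer received GeoJSON that still contained an unexpanded `ExternalData` node. A small check like the one below could confirm that on a saved payload. `find_unexpanded` is a hypothetical helper, not part of Kartographer or the maps service.

```python
from typing import Any, List


def find_unexpanded(geojson: Any, path: str = "$") -> List[str]:
    """Return the paths of every object whose "type" is still "ExternalData"."""
    found: List[str] = []
    if isinstance(geojson, dict):
        if geojson.get("type") == "ExternalData":
            found.append(path)
        for key, value in geojson.items():
            found.extend(find_unexpanded(value, f"{path}.{key}"))
    elif isinstance(geojson, list):
        for index, item in enumerate(geojson):
            found.extend(find_unexpanded(item, f"{path}[{index}]"))
    return found
```

Run against the mapframe body shown above, this would return `["$"]`, meaning the top-level node was never expanded.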

The delay when saving doesn't exist in previous code, as expected. The 60s static thumbnail timeout matches existing behavior.

There is a longer delay when making the first pageview of an uncached page, as expected.

There is no delay when viewing a cached page, as expected.

In all cases, the map is blank and takes 60s to finish failing and logging to the console.

It's an open question

If I'm reading this correctly, the parser cache dashboard shows that only about 50% of pages requested from the app servers are served from cache. There could be additional layers of caching, such as ATS, that make the overall rate higher, but I don't know yet. If the other 50% of requests must be parsed before responding to the pageview, then our expansion will have a large negative impact on the few pages with complex queries.
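A back-of-the-envelope estimate of that impact, assuming (not measured) that a parser-cache miss on an affected page costs about the same 10-15 s seen when saving on the beta cluster, and that no other cache layer absorbs the request:

```python
def expected_added_delay(miss_rate: float, parse_seconds: float) -> float:
    """Average extra wait per pageview if only cache misses pay the parse cost."""
    return miss_rate * parse_seconds


# 50% miss rate from the dashboard reading; 10-15 s taken from the save timings.
low = expected_added_delay(0.5, 10)   # 5.0
high = expected_added_delay(0.5, 15)  # 7.5
```

The average hides the distribution: under these assumptions half of the pageviews see no delay and the other half wait the full parse time.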

awight moved this task from Doing to Done on the WMDE-TechWish-Sprint-2023-01-04 board.