
Investigation: Move geoshape expansion to Kartographer parse-time
Open, Needs Triage, Public

Description

Currently, the Kartographer extension converts ExternalData in mapframe tags into a maps.wikimedia.org URL with the appropriate geoshape request to fetch the shape. Each dynamic map or snapshot tile request then queries the geoshape service to expand the shape. It might be more efficient to query the geoshape service immediately at parse time and store the result in the extension's ParserCache data.
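As a rough illustration of the current data flow, the snippet below builds a geoshape request URL for a Wikidata ID. The parameter names follow the public geoshape endpoint but are assumptions here, not taken from the extension code:

```python
from urllib.parse import urlencode

def geoshape_url(wikidata_id: str) -> str:
    """Build a maps.wikimedia.org geoshape request for one Wikidata item."""
    query = urlencode({"getgeojson": 1, "ids": wikidata_id})
    return f"https://maps.wikimedia.org/geoshape?{query}"

print(geoshape_url("Q64"))  # Q64 is Berlin on Wikidata
```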

  • Investigate parser cache storage requirements for a typical shape.
    • Initial results from T322351 found an average of 3kB per shape for a small sample.
    • Take a larger sample, e.g. all geoshapes for Germany or for Europe.
    • Show the size difference between "native" binary geometry data and serialized GeoJSON.
    • How many articles have a geoshape on all wikis? (maybe using search_insource)
  • Estimate the impact this might have on the infrastructure, internal and external traffic, storage, and processing.
    • Parse time: how long do these geoshape requests take?
    • Internal traffic increases by roughly the average geoshape payload multiplied by the number of pages with geoshapes.
    • External traffic changes because clients would receive the geoshape through ResourceLoader in wgKartographerLiveData instead of fetching it from the geoshape endpoint.
    • How many fewer calls will the geoshape endpoint receive?
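The binary-vs-GeoJSON size comparison above can be sketched with a toy example. The 100-vertex ring and the raw packed-doubles encoding are illustrative assumptions, not the actual formats used by the geoshape service:

```python
import json
import struct

def geojson_size(coords):
    """Byte size of the polygon serialized as a GeoJSON Feature."""
    feature = {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Polygon", "coordinates": [coords]},
    }
    return len(json.dumps(feature).encode("utf-8"))

def binary_size(coords):
    """Byte size of the same ring packed as raw little-endian doubles
    (a rough stand-in for a 'native' binary geometry such as WKB)."""
    return sum(len(struct.pack("<2d", lon, lat)) for lon, lat in coords)

# A 100-vertex ring with typical 6-decimal coordinates, closed at the end.
ring = [(13.0 + i * 0.000001, 52.0 + i * 0.000001) for i in range(100)]
ring.append(ring[0])

print(geojson_size(ring), binary_size(ring))
```

The packed-binary form is a fixed 16 bytes per vertex, while the GeoJSON text form grows with the number of decimal digits per coordinate, which is why the serialized payload is the interesting number for parser cache sizing.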

Event Timeline

awight renamed this task from Move geoshape expansion to Kartographer parse-time to Investigation: Move geoshape expansion to Kartographer parse-time. Nov 8 2022, 8:08 AM

If I can get some idea of how many articles are going to get bigger, and by how much, I can judge whether it's going to break PC (parser cache) or not. Generally I support the idea, though.


In T322351: Investigation: Keep simplified geoshapes in maps database we found an average of 2.7kB for a sample of 1,500 geoshapes. Not many articles have maps (0.3% of enwiki pages and 10% of enwikivoyage), and many fewer of those include a geoshape. There's probably a long tail; a few articles might have 1 MB shapes, for example. We can take a more careful survey now that the proposal looks feasible.

Is it possible to pick a general threshold below which we don't need to worry, such as staying below 1% bloat of the parser cache?

Change 855984 had a related patch set uploaded (by Awight; author: Awight):

[mediawiki/extensions/Kartographer@master] [POC] Crude geoshape expansion

https://gerrit.wikimedia.org/r/855984


With those numbers you can't get to 1% even if you try really hard. I think the change is fine and can go ahead from a DBA point of view. Just to be safe, add a limit (maybe 1 MB?) and, if a shape is larger than that, trigger a statsd metric and avoid storing it for now. Then we can decide what to do next depending on the impact.

just to be safe add a limit (maybe 1MB?) and if it's larger than that, trigger a statsd metric and avoid storing it for now

That's a great suggestion, and I should have added that this approach is quite easy: we can decide per request whether the expansion is possible or desirable, and fall back to the old data flow for that page simply by not expanding.
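The suggested safety limit with fallback could look roughly like the sketch below. The function and metric names are hypothetical stand-ins, not the actual Kartographer implementation (the real proof of concept is in the Gerrit change above):

```python
from typing import Optional

MAX_EXPANSION_BYTES = 1_000_000  # the ~1 MB limit suggested above

def maybe_expand(geoshape_json: bytes, stats: list) -> Optional[bytes]:
    """Return the expanded payload to store in the parser cache,
    or None to fall back to the old flow (client fetches the shape)."""
    if len(geoshape_json) > MAX_EXPANSION_BYTES:
        stats.append("kartographer.geoshape_too_large")  # stand-in for statsd
        return None
    return geoshape_json

metrics: list = []
small = maybe_expand(b"{}" * 10, metrics)        # stored in parser cache
big = maybe_expand(b"x" * 2_000_000, metrics)    # over limit: metric + fallback
print(small is not None, big is None, metrics)
```

Returning None rather than raising keeps the decision per request, so an oversized shape degrades gracefully to the existing client-side fetch.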

lilients_WMDE updated the task description.