
Investigation: impact of expanding "page" map data at parse-time
Closed, Resolved (Public)

Description

As part of T323113, we've decided that map data from Commons has sufficiently different characteristics that we should not include it in the first phase of expanding map data, and will discuss whether it makes sense to expand at all.

Informational points:

  • One use case for Commons map data is to reuse shapes and points across pages, and even across wikis. This is similar to how templates are used.
  • The other use case is to provide translations for labels and popup text, via the JsonConfig mechanism. This use is similar to i18n messages.
  • Our investigation into average geoshape payload did not include Commons map data.

What we want to know:

  • How prevalent is Commons map data in maps?
  • What is the average additional payload which would be added if maps are expanded?
  • What is the average latency of the call to expand map data?

With this, we can estimate the impact of expanding Commons map data at parse-time.

Event Timeline

I ran this and some similar queries via https://quarry.wmcloud.org:

select count(page_len), min(page_len), max(page_len), avg(page_len), std(page_len), variance(page_len)
from page
where page_namespace = 486
and page_is_redirect = 0
and page_title like '%.map'

Findings:

  • There are about 38,000 .map files on Commons.
  • Average size is 23 KB and the standard deviation is 45 KB, so the spread of sizes is quite wide.
  • 90% are <55 KB. 99% are <220 KB.
  • Only 27 individual files are >500 KB. The biggest is 1.3 MB, but since it is a sandbox test page (https://commons.wikimedia.org/wiki/Data:Sandbox/PinkPanda272/Test.map), it doesn't really count.
  • Another quick test shows that gzip can cut these files down to about 15% of their original size. That works out to roughly 30 KB or less of extra (gzipped) traffic in 99% of the cases, and roughly 8 KB or less in 90%.
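
The gzip estimate in the findings can be reproduced with a short Python sketch. The percentile bounds and the 15% ratio are the rounded figures quoted above, so the results are approximate. The `sample` payload is a synthetic, hypothetical stand-in for a .map page body, so its compression ratio says nothing about real Commons files; it only shows how such a quick test might be run.

```python
import gzip
import json

# Figures quoted in the findings above (sizes in KB).
RAW_KB_UPPER_BOUND = {90: 55, 99: 220}  # percentile -> raw size bound
QUOTED_GZIP_RATIO = 0.15                # compressed / raw, rounded


def gzip_ratio(data: bytes) -> float:
    """Return compressed size divided by raw size for one payload."""
    return len(gzip.compress(data)) / len(data)


def estimated_gzipped_kb(raw_kb: float, ratio: float = QUOTED_GZIP_RATIO) -> float:
    """Scale a raw size by the compression ratio."""
    return raw_kb * ratio


# Hypothetical stand-in for a .map page body: a small GeoJSON-like structure.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"title": f"Point {i}"},
            "geometry": {"type": "Point", "coordinates": [13.0 + i / 1000, 52.0]},
        }
        for i in range(500)
    ],
}).encode("utf-8")

sample_ratio = gzip_ratio(sample)
estimates = {p: estimated_gzipped_kb(kb) for p, kb in RAW_KB_UPPER_BOUND.items()}

print(f"sample compresses to {sample_ratio:.0%} of its raw size")
for percentile, kb in sorted(estimates.items()):
    print(f"{percentile}% of files: at most about {kb:.1f} KB gzipped")
```

To measure real files instead, the page bodies would have to be fetched separately and passed to `gzip_ratio` as bytes.
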
awight claimed this task.
awight moved this task from Doing to Done on the WMDE-TechWish-Sprint-2022-11-29 board.

Using the following query, I was able to put a lower bound on the number of pages using .map files:

./search_insource 'insource:mapframe insource:/\"service\":[^:]*\"page\"/'

There are a total of 485 pages directly including a .map, across all wikis. English Wikipedia has the highest count at 107 pages, and Commons has the next highest at 58. The full list of pages on enwiki shows that only two are templates, and Commons includes no templates using .map data.
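
To illustrate what the search pattern catches and why the count is only a lower bound, here is a minimal Python sketch. It is an approximation: CirrusSearch uses its own regex dialect, and the wikitext samples, page titles and template name below are hypothetical. A page that pulls in a .map through a template carries no matching text in its own source, so it is not counted.

```python
import re

# Python approximation of the insource regex used above; illustrative only.
PAGE_SERVICE = re.compile(r'"service":[^:]*"page"')


def uses_page_service(wikitext: str) -> bool:
    """True if the source mentions mapframe and a "service": "page" pair."""
    return "mapframe" in wikitext and PAGE_SERVICE.search(wikitext) is not None


# Hypothetical wikitext snippets.
samples = {
    "direct_page_data": (
        '<mapframe width="300" height="200">'
        '{"type": "ExternalData", "service": "page", "title": "Example.map"}'
        "</mapframe>"
    ),
    "geoshape_only": (
        '<mapframe width="300" height="200">'
        '{"type": "ExternalData", "service": "geoshape", "ids": "Q64"}'
        "</mapframe>"
    ),
    "via_template": "{{Hypothetical map template|data=Example.map}}",
}

results = {name: uses_page_service(text) for name, text in samples.items()}

for name, matched in results.items():
    print(f"{name}: {'match' if matched else 'no match'}")
```

The `[^:]*` part allows whitespace between the colon and `"page"` but stops the match from running on into the next key/value pair, which is why the geoshape sample does not match.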

The impact of expanding page data will be very small, so we can close this investigation.

Change 869754 had a related patch set uploaded (by Awight; author: Awight):

[mediawiki/extensions/Kartographer@master] Also expand "page" data

https://gerrit.wikimedia.org/r/869754

Change 869754 merged by jenkins-bot:

[mediawiki/extensions/Kartographer@master] Also expand "page" data

https://gerrit.wikimedia.org/r/869754