Page MenuHomePhabricator

Extra 100KB of JS loaded by mobilemaps gadget on pageview
Closed, ResolvedPublic

Description

The deployment of this gadget caused the size of JS loaded on first view for loggedin users on mobile enwiki to jump by more than 500%:

Before:

After:

Where the extra JS requests are:

https://en.m.wikipedia.org/w/load.php?debug=false&lang=en&modules=ext.kartographer.linkbox%2Cstyle&skin=minerva&version=1ulxz7o
https://en.m.wikipedia.org/w/load.php?debug=false&lang=en&modules=oojs-ui-core%2Coojs-ui-widgets%7Coojs-ui-core.styles%7Coojs-ui.styles.icons-alerts%2Cicons-content%2Cicons-editing-advanced%2Cicons-interactions%2Cicons-location%2Cicons-moderation%2Cicons-movement%2Cindicators%2Ctextures&skin=minerva&version=16bxt5a

This might only affect articles that have a location set. The two articles we happen to track for JS size in WPT (Facebook and Sweden) happen to both have one.

The feature this is for is behind an icon that needs to be clicked on:

It might make more sense to lazy-load that JS when the feature is actually clicked on, instead of downloading it defensively like it seems to do at the moment. Thankfully after the pageload, but still, seems like wasted bandwidth when most visits won't see a click on that feature.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 14 2017, 12:59 AM
Gilles renamed this task from Extra 100KB of JS loaded on pageview for authenticated mobile users to Extra 100KB of JS loaded by Kartographer on pageview for authenticated mobile users.Aug 14 2017, 1:00 AM
Gilles triaged this task as High priority.
Gilles updated the task description. (Show Details)Aug 14 2017, 1:04 AM
Gilles updated the task description. (Show Details)

Definitely looks like it! That would explain why there was no deployment, I assume it's just been enabled for all loggedin users on enwiki yesterday.

TheDJ added a comment.EditedAug 14 2017, 1:26 PM

Yeah that was me, i'll make it a bit more efficient in an hour or so, can strip some stuff from it I think.

Wait... extra requests, or changed requests ??? Because i have no idea why icons-moderation would start loading, based on the modules i'm referencing...

TheDJ added a subscriber: debt.Aug 14 2017, 1:33 PM

Tagging deb, to keep her informed, this being me bold and stuff..

Wait... extra requests, or changed requests ??? Because i have no idea why icons-moderation would start loading, based on the modules i'm referencing...

They're all extra. I'm guessing that the ones you didn't expect are coming as dependencies somehow.

Ideally, even if it requires refactoring where the icon lives, this feature should just load the icon it needs and everything else would be downloaded on demand on click.

Gilles renamed this task from Extra 100KB of JS loaded by Kartographer on pageview for authenticated mobile users to Extra 100KB of JS loaded by mobilemaps gadget on pageview for authenticated mobile users.Aug 14 2017, 2:55 PM
Gilles updated the task description. (Show Details)
debt moved this task from Backlog to In progress on the Maps-Sprint board.
Gilles assigned this task to TheDJ.Aug 15 2017, 2:21 PM

If this is all logged in users shouldn't this be unbreak now? That's quite a big jump...

It's mostly a bandwidth waste, since the download happens after the pageload. I'll let others be the judge of how urgent that makes it.

Gilles moved this task from Inbox to Radar on the Performance-Team board.Aug 16 2017, 7:00 PM
Gilles edited projects, added Performance-Team (Radar); removed Performance-Team.
dr0ptp4kt added subscribers: JKatzWMF, Nirzar, dr0ptp4kt.EditedAug 16 2017, 9:56 PM

Hey all, are we sure this is only (EDIT: logged in) and is the service down? I noticed the button on the Sierra Leone page as an anon and on tap the map load seems to stall out. Here's what I'm seeing.

Indeed, it seems like this applies to anonymous users as well. It works fine for Sweden, but same as you it seems to be broken for the Sierra Leone article, where the tiles all seem to 404. @TheDJ could this gadget be undeployed until these issues are resolved?

Gilles raised the priority of this task from High to Unbreak Now!.Aug 16 2017, 10:25 PM
Restricted Application added subscribers: Liuxinyu970226, Jay8g, TerraCodes. · View Herald TranscriptAug 16 2017, 10:25 PM
Gilles renamed this task from Extra 100KB of JS loaded by mobilemaps gadget on pageview for authenticated mobile users to Extra 100KB of JS loaded by mobilemaps gadget on pageview.Aug 16 2017, 10:27 PM

The graph for anons:

https://en.wikipedia.org/w/index.php?title=MediaWiki%3AMobile.js&action=historysubmit&type=revision&diff=795856847&oldid=795473593 I temporarily disabled the gadget by default - people can still opt-in to it via their preferences.

Thank you, lowering priority back to normal

Gilles lowered the priority of this task from Unbreak Now! to Normal.Aug 16 2017, 10:40 PM

Related comment.... https://phabricator.wikimedia.org/T173309#3525851

Why are we launching gadgets to mobile anons? This sets a dangerous precedent as illustrated by this bug...

why have we broken anon gadgets for mobile ?

Seriously i'm mildly fucking pissed off. If the foundation isn't working on maps and if the foundation decides to break mobile and gadgets, what am I still doing here ?

I think there are a few things being discussed on this ticket. Are people around next week to do a hangout? We should do a review to get everyone up to speed, and then figure out how we can all support each other better in the future around things of this nature.

I can see how TheDJ could feel a little overwhelmed with 5 WMF engineers bringing concerns over the performance of his work. TheDJ demoed this at the recent Wikimania Hackathon and it was well-received. Improvements like this that the Maps team have not, or can not, accomplish should be encouraged. Volunteer developers have far more opportunities to experiment and can do some interesting things that staff sometimes can't. :)

I don't think anyone was trying to be a jerk - the concerns are valid - and I hope TheDJ will consider continuing to work on this. To echo @dr0ptp4kt, @TheDJ would you be interested in a discussion about this work and how we can move forward together?

@debt I think you're currently traveling, but I think as the PM for maps you should be part of the discussion.

I don't think anyone criticized the value of the feature, it's a great feature! But launching it in this state to all users in production was too rushed. Opt-in would have been more sensible and this issue discovered/worked on before deploying it to all enwiki users. Bandwidth is very expensive for some people, a 60% increase of anon first pageview JS weight is considerable. Things written by staff and volunteers alike get rolled back for similar reasons all the time, it's never personal.

I'm happy to do a performance review of the improved gadget before it's released to all users again. I think it's a much nicer experience to go through a constructive review before launch and avoid the disappointment of rollback. If you want a performance review of anything, create a task tagged with Performance-Team explaining what you need and we'll let you know how soon we can do it.

TheDJ added a comment.Aug 19 2017, 4:03 AM

I don't care about the rollback.

1: We have non-extendable mobile ui that does it's own thing whenever it can
2: A UI widget set that is apparently unusable on mobile (OOJS UI)
3: A broken gadget system for mobile
4: Foundation who seem intent on taking away the options and freedoms that have repeatedly proven to be extremely valuable, for short term benefits to readers.
5: The notion that Maps are not valuable enough to continue to work on them in the first place.

^^^^Those are our problems and I can't fix them, that's what pisses me off.

TheDJ added a comment.EditedAug 19 2017, 4:28 AM

This should probably make a lot of difference for payload. Tomorrow i'm in a plane, so talk to you on sunday evening :)
https://en.wikipedia.org/w/index.php?title=MediaWiki%3AGadget-mobilemaps.js&type=revision&diff=796193949&oldid=795264572

TheDJ added a comment.EditedAug 21 2017, 11:31 AM

So to reiterate:

  1. Payload concerns
    1. Alleviated by avoiding oojs ui widgets
    2. Only affected location pages, which should be 4% of our pageviews (according to JKatz)
  2. Concerns regarding Sierra Leone tiles
    1. Could be a Kartotherian issue
    2. Could be a connection problem
    3. Could be an overload due to increased usage ?
      1. I do not see a major increase in resource usage in grafana, though it does seem there was a half to 1 million served tiles 'bump' when the gadget was active.
    4. Regardless, this seems likely to be serverside, if anything, so not directly related to the gadget
  3. Amire reported issues with Firefox
    1. These issues cannot be reproduced
    2. No logging is available from the problems.
  4. Gadgets don't work for anymous mobile users
    1. We can avoid this by loading from MediaWiki:Mobile.js instead
  5. The default gadget throws a non-problematic console error
    1. We can avoid this by loading from MediaWiki:Mobile.js instead

With regards to performance a few questions:

  • Is ops aware of this? This would send significant traffic to the tile servers and I have no idea what is going on with Maps these days, but I'm guessing there were reasons the team was disbanded and work suspended. Ignore if they've been notified.
    • Are we adding the gadget on desktop too? Same applies.
  • The addition of the map logo causes a repaint and reflow - a significant problem on mobile devices.
    • The reflow should be removed
    • The repaint can only be solved if we bake the HTML into the response, which as far as I'm aware is not possible with gadgets..(?)
  • Have we tested this on lower end/older devices with low memory? Historically we've had problems with JS crashing the browser. If not I'd advise we organise some QA before increasing the audience.
Yurik added a subscriber: Yurik.Aug 21 2017, 10:22 PM

@Jdlrobson, the tile servers were ready for this (and much higher) level of traffic a year ago per multiple chats with ops. The disbanding of the team was not due to technology. This feature is actually very minor from the tile server perspective.

TheDJ added a comment.EditedAug 22 2017, 1:00 AM

With regards to performance a few questions:

  • Is ops aware of this? This would send significant traffic to the tile servers and I have no idea what is going on with Maps these days, but I'm guessing there were reasons the team was disbanded and work suspended. Ignore if they've been notified.

I was told during Wikimania, that the servers are mostly idling and currently are used more by 3rd parties than by ourselves. Also previous enablement didn't show significant effects on the tiling server in grafana. But if there are, it can easily be disabled again.

  • Are we adding the gadget on desktop too? Same applies.

No, desktop already has it's own toys.

  • The addition of the map logo causes a repaint and reflow - a significant problem on mobile devices.
    • The reflow should be removed

Should be fixed now. I think it was caused by the float change of this week, to the CSS for the page actions of tablet vs phone mode.

  • The repaint can only be solved if we bake the HTML into the response, which as far as I'm aware is not possible with gadgets..(?)

Indeed. We could eventually consider moving this into the extension perhaps, but I don't think that's hugely beneficial right now.

  • Have we tested this on lower end/older devices with low memory? Historically we've had problems with JS crashing the browser. If not I'd advise we organise some QA before increasing the audience.

I tested with Chrome profiler on a more complicated page like: https://en.m.wikipedia.org/wiki/List_of_monuments_of_the_Gettysburg_Battlefield
And while there is an increase in memory and heap size, it's not more than 5% (at the very most). Regardless, if 5% of devices has problems on 4% of the pageviews, and the rest of us get maps, then it's time to say goodbye to a few oldies.

Gehel added a subscriber: Gehel.Aug 22 2017, 7:07 PM

@Jdlrobson, the tile servers were ready for this (and much higher) level of traffic a year ago per multiple chats with ops. The disbanding of the team was not due to technology. This feature is actually very minor from the tile server perspective.

Yep, I agree, there is no reason to be worried in term of load on the maps servers at this point.

debt added a comment.Aug 22 2017, 7:08 PM

I will set up a meeting this week to chat about these concerns.

debt added a comment.Aug 24 2017, 12:11 AM

I've scheduled a meeting on Aug 29, let me know if you'd like to join and didn't get an invite (I didn't have everyone's email addresses).

mpopov added a subscriber: mpopov.Aug 24 2017, 12:34 AM

Per request from @JKatzWMF:

Data:

Hive query that generated it:

SELECT
  DATE(CONCAT(year, '-', month, '-', day)) AS `date`,
  INSTR(PARSE_URL(referer, 'HOST'), '.m.') > 0 AS is_mobile,
  COUNT(1) AS tiles
FROM wmf.webrequest
WHERE
  webrequest_source = 'upload'
  AND year = 2017 AND month = 8
  AND uri_host = 'maps.wikimedia.org'
  AND http_status IN('200', '304')
  AND uri_path RLIKE '^/([^/]+)/([0-9]{1,2})/(-?[0-9]+)/(-?[0-9]+)(@([0-9]\.?[0-9]?)x)?\.([a-z]+)$'
  AND uri_query <> '?loadtesting'
  AND REGEXP_EXTRACT(uri_path, '^/([^/]+)/([0-9]{1,2})/(-?[0-9]+)/(-?[0-9]+)(@([0-9]\.?[0-9]?)x)?\.([a-z]+)$', 1) != '' -- style
  AND REGEXP_EXTRACT(uri_path, '^/([^/]+)/([0-9]{1,2})/(-?[0-9]+)/(-?[0-9]+)(@([0-9]\.?[0-9]?)x)?\.([a-z]+)$', 2) != '' -- zoom
  AND referer_class = 'internal'
GROUP BY
  DATE(CONCAT(year, '-', month, '-', day)),
  INSTR(PARSE_URL(referer, 'HOST'), '.m.') > 0;

Visualization code:

# install.packages("devtools")
# devtools::install_git("https://gerrit.wikimedia.org/r/wikimedia/discovery/polloi")
tiles <- read.csv("~/Downloads/Internal Maps usage over the past couple weeks.csv")
tiles$date <- as.Date(tiles$date)
tiles$platform <- factor(
  ifelse(tiles$is_mobile == "True", "Mobile", "Not mobile"),
  c("Not mobile", "Mobile")
)
tiles <- tiles[tiles$date < max(tiles$date), ]

library(ggplot2) # install.packages("ggplot2")
ggplot(tiles, aes(x = date, y = tiles, fill = platform)) +
  geom_area(position = "stack", color = "black") +
  geom_vline(
    xintercept = as.numeric(as.Date(c("2017-08-13", "2017-08-17"))),
    linetype = "dashed"
  ) +
  scale_x_date(
    date_breaks = "4 days",
    date_minor_breaks = "2 days",
    date_labels = "%d %b"
  ) +
  scale_fill_brewer(palette = "Set1") +
  scale_y_continuous("total tiles", labels = polloi::compress) +
  labs(
    title = "Kartotherian tiles served internally, by platform",
    subtitle = "Dashed lines indicate the period during which the module was enabled by default for anonymous visitors"
  ) +
  theme_minimal()

Thanks! @mpopov Super interesting. I was kind of expecting a bigger jump.

TheDJ added a comment.Aug 24 2017, 2:50 PM

So like 0.5 mil. bump on a daily basis

1: We of course need to consider that it would actually take a bit of time for people to start recognising what a new button is for.
2: If I interpret this correctly, that's on internal traffic, so after varnish ? Historic tileserver stats seem to show a 1 to 10 ratio (5M to 55M) for tileserver vs. varnish tiles. We cannot really compare those numbers of course, but I guess it's safe to say that raw tiles served could have seen a 5M bump
3: I'm also very curious what this does/means for the '3rd party' vs. wmf usage ratio's, since that ratio was also highly skewed.

I also note some other interesting things besides the tiles served during that period:
1: After the gadget was disabled, the tileserver didn't drop back completely to the previous level. Seems like there is a residual increase for some reason on mobile...
2: Geohack usage seems to have a significant increased (20% ish?), both during and AFTER the gadget: https://discovery.wmflabs.org/maps/#geohack_usage I'm not exactly sure what specific interaction we are measuring here ? Is this is a sideeffect of the gadget having been enabled (were gadget clicks counted as geohacklink clicks maybe ?), or a side effect of Wikimania map enthusiasm, geohack usage having expanded to more language variants, something else... ?

Yurik added a comment.Aug 24 2017, 7:05 PM

@TheDJ I would call this a low signal to noise ratio - tile servers are mostly used by non-wikipedia sites, so that little blimp in internal usage is not consequential. Besides, we already saw that blimp before, when geohack enabled tiles for enwiki. I'm still not sure why it is still being discussed per @Gehel above, there is no problem with performance there. The fact that these servers benefit unrelated 3rd party sites, but does not benefit most of Wikipedia itself is a major misuse of donated funds, and should be investigated further.

The bigger problem is that if WMF, without any discussion with the community, made a new "super protect" - disabling community's ability to customize Wikipedia, we should raise it as a major violation of trust. Wikipedia is a community, so it should not be managed as a corporation. Please update what is the current status of this issue, so that more attention can be brought to it if needed.

The major increase in per-page-view traffic was clearly a problem, and should have been quickly reverted when noticed -- allowing the community to fix it. Some changes may be too hacky to to in javascript, and may require more substantial change to the platform, but this would usually affect the UI of all wikis at once, and may require community consensus. If submitted to gerrit, I will review, and if acceptable, merge any patches that will improve Wikipedia content and bring it into 21st century.

debt moved this task from In progress to Stalled/Waiting on the Maps-Sprint board.Aug 29 2017, 7:11 PM
TheDJ closed this task as Resolved.Aug 30 2017, 10:03 AM

I'm closing this ticket, as the primary source of this issue has been dealt with. Follow up and future steps are going to be tracked from T174538: Relaunch Geohack mobile maps gadget experiment on English Wikipedia.

Also, let's please not drag Superprotect into this, as it is totally unrelated to this, as this was not a community consensus based issue (people acting on their own repute cannot be compared to a community consensus).

debt moved this task from Stalled/Waiting to Done on the Maps-Sprint board.Aug 30 2017, 7:27 PM