Page MenuHomePhabricator

Popups: [Spike, 16hrs] Replicate Popups memory leak without other extensions
Closed, ResolvedPublic0 Story Points

Description

A memory leak has been detected in the Popups extension.
The purpose of this spike is to isolate it further by reducing noise from other extensions.

Outcomes of the spike

  • Someone on Reading web team is able to replicate the memory leak on a production page via the detailed notes in https://phabricator.wikimedia.org/T205127#4675717
  • The same memory leak has been replicated on a localhost/staging environment with minimal setup (to isolate leaks from other extensions)

[ :( ] The origin of the memory leak has been located. Ideally we should know exactly what is causing this, but at minimum let's aim to understand which module/part of the feature is at fault.

  • Is the memory leak OS specific or for all browsers?

Output

  • This ticket has been closed with an actionable next step to remove the memory leak. Note: Fixing the memory leak is not necessarily in scope for this task.

Event Timeline

Jdlrobson triaged this task as Normal priority.Nov 2 2018, 10:11 PM
Jdlrobson raised the priority of this task from Normal to High.
Jdlrobson created this task.
Jdlrobson moved this task from To Triage to Upcoming on the Readers-Web-Backlog board.
Jdlrobson lowered the priority of this task from High to Normal.Nov 6 2018, 5:37 PM

Reminder for @Jdlrobson: This doesn't seem to be assigned to anyone yet.

Jdlrobson added a comment.EditedNov 12 2018, 10:33 PM

@Whatamidoing-WMF That's because nobody is actively working on it right now. A time-sensitive A/B test (T209306) is draining the entire team's time (sorry about this!).

Jdlrobson renamed this task from [Spike, 16hrs] Replicate Popups memory leak without other extensions to Popups: [Spike, 16hrs] Replicate Popups memory leak without other extensions.Nov 15 2018, 6:48 PM
Jdlrobson updated the task description. (Show Details)Nov 21 2018, 5:30 PM
Jdlrobson set the point value for this task to 0.Nov 21 2018, 5:33 PM
JFG added a comment.Nov 26 2018, 8:58 PM

Any news? Can I help in any way?

ovasileva raised the priority of this task from Normal to High.Nov 27 2018, 5:05 PM

I'm still digging into the code. It looks like there is a memory leak (I don't want to confirm it yet. First I need to understand what's going on.). Most probably the leak is caused by hover events when Popups and EventLogging are enabled.
I also noticed that after removing PagePreviews MediaWiki is still leaking (not that much but still).-

^ and to clarify this is on a VANILLA instance (e.g. little to no extensions) \o/ some progress....

We think we've isolated this problem to https://github.com/wikimedia/mediawiki-extensions-Popups/blob/master/src/reducers/nextState.js
We notice that using Object.assign here makes 90% of the memory leaks go away. I believe this relates to using Redux with mutable states rather than the immutable ones it was intended to. We also think we've found a small memory leak in VisualEditor @pmiazga will follow up with some actionables early next week once he has time to write up his findings.

I can replicate memory leak, and honestly, it leaks a lot. Good part is that we do not leak HTML DOM nodes, most of the allocated (and not freed up) memory is code/closures/variables/arrays/objects. I was testing most of the stuff using local vagrant environment with only Popups and EventLogging extensions. Popups were configured to use restbaseHTML gateway - pointing to the production wiki.

I was able to narrow searching to couple elements:

  • nextState() is copying over lots of elements (objects/arrays/DOM nodes), changing nextState to object.assign() reduces the leak (although still some memory is allocated)
  • wait() is called very, very often (it returns a promise that resolves after some time). I'm not sure if this is causing a memory leak, but definitely, something that we can verify as the code is primarily calling that function many times on each link interaction
  • when I remove all related tracking (statsvTracker, eventLoggingTracker, pageviewTracker) the memory footprint is lower, but that's most probably because the mw.track keeps a history of all events.

To solve the nextState() issue, I made T211168. Whenever I have time, I'll keep playing with the Memory tab and I'll try to find more memory leaks. Every day I learn something new about how to use Chromium profiling tools.

environment with only Popups and EventLogging extensions. Popups were configured to use restbaseHTML gateway

Nice detail!

Chromium profiling tools.

Is the nextState() leak identified reproducible in Firefox?

I'm wanting to dig into this a little more to see if I can isolate it to a particular part of nextState (see T211168#4818446)

Jdlrobson updated the task description. (Show Details)Dec 13 2018, 9:16 PM
Jdlrobson updated the task description. (Show Details)

I'm actually struggling to replicate this now. I've been taking memory snapshots on the home page

Without Popups installed (and subsequent hovers): 4.7mb
Enable Popups (but no hovers): 5.2mb
After one hover: 5.5mb
After several hovers: 5.8mb

Next state using "Object assign":
Enable Popups: 5.4mb
After one hover: 5.6mb
After several hovers: 5.8mb

I'm not 100% convinces nextState is the culprit here but I do see increases in memory as I hover (but nothing that's getting out of control).
Installed skins and extensions:

  • MinervaNeue
  • Timeless
  • Vector
  • Collection
  • Echo
  • MassMessage
  • SiteMatrix
  • Babel
  • GeoCrumbs
  • ParserFunctions
  • PageImages
  • EventLogging
  • MobileFrontend
  • Page previews
  • QuickSurveys
  • RelatedArticles
  • TextExtracts
  • Thanks
  • WikidataPageBanner
  • WikimediaEvents

@pmiazga can you double check if you can still reproduce this and if so print me a list of what you see on Special:Version (above list generated by Array.from($('.mw-version-ext td:first-child')).map((row)=>'* ' + $(row).text()).join('\n')

@Jdlrobson this also might be related to the browser, did you call GC after doing tests? what is your chrome/os version? Sure, I'll try to reproduce that locally once again.

I'm testing that only with Popups and EventLogging extension enabled, using Chrome 71 on Debian 9.6. Mediawiki sha 76396ff888e134fdcb86be0560c9f51acce577bb, Popups sha 53015b7237e99b51fdf575bb8bdab6af61887b18

The memory leak occurs, nextState() is not the main cause, it never was. But it looked like we can patch the small leak I thought I found in the nextState() function. From my testing, I found that if we use Object.assign() and compare two different js heap snapshots - using Object.assign() helps with a bit - It causes the system to store fewer objects allocated (delta column from heap snapshot comparison).

I re-did tests:
Testing scenario: - show 25 page previews + 25 dwellButAbandoned in ~ 3minutes, eventLogging, statsvLogging and pageviewTracker disabled, debugging flag off
I did lots of test and now I have to agree with Jon, now it looks like the footprint both for original nextState and the Object.assign is very similar.

I need to dig deeper.

We've exhausted our timebox here and sadly no new information.
Three of us have looked into this and have not found any evidence of a memory leak in Popups while the web page is idle. In fact snapshots taken hours apart show the memory heap size stays constant. While there is likely a small memory leak, (<0.1mb on every link hover) combing through the codebase and snapshots as not shown any culprits.

However, we do notice large memory allocations in Chrome, but it's not clear what that is for. Could it be Chrome internals? Similar memory spikes are happening on tabs with slack.com and google inbox so this could well be a browser problem.

As a next step, it might be worth pairing with the performance team (@Krinkle) in January to investigate other ways we can get to the bottom of this, but right now it doesn't look like Popups is the culprit.

Jdlrobson closed this task as Resolved.Dec 21 2018, 6:53 PM
Jdlrobson updated the task description. (Show Details)

I'm closing this task. Next step will be to talk to performance (per T205127#4841038) and work out other ways we can identify what's happening here. Sadly this spike didn't get us any answers.