
Investigate memory issues surrounding trial drafttopic deployment
Closed, Resolved (Public)

Description

Around the time that I did a trial deployment to ores1001, that machine became sad and memory use climbed to the ceiling repeatedly. This is a dedicated box, so we're almost certainly to blame.

Check the exact timing of this issue relative to my deployment and rollback, and see if any diagnostics are still available.

ores1001 windows of doom:
https://grafana.wikimedia.org/dashboard/db/ores?orgId=1&from=1525812264000&to=1526082204142&panelId=24&fullscreen

  • 2018-05-09 22:02 free memory nosedives
  • 2018-05-09 22:45 free memory bottoms out and begins to sawtooth between 17 GB and 0 GB
  • 2018-05-10 01:49-ish, memory pressure seems to ease, although the sawtooth still dips deeper than usual, down to 6 GB
  • 2018-05-10 20:46 hits bottom again
  • 2018-05-11 00:45 hits bottom
  • 2018-05-11 01:33 seems to rebound to normal levels, although it bottoms out again afterwards

Other servers show similar deep spikes, but not the severe memory pressure of ores1001 on 2018-05-09.

ores1001 deployment timeline:

  • 2018-05-01 21:48 git-lfs test (rORESDEPLOY4601497c4f43)
  • 2018-05-01 21:54 git-lfs test (52347e0)
  • 2018-05-01 22:01 rolled back (bf182e2)
  • 2018-05-09 22:05 git-lfs test 1 (c0db102)
  • 2018-05-09 22:09 git-lfs test 2 (c0db102)
  • 2018-05-09 22:20 git-lfs test 3 (2a09939)
  • 2018-05-09 23:06 drafttopic trial deployed (bf1e2b1)
  • 2018-05-09 23:24 rolled back (bf182e2)
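
One way to check the timing is to line up each memory event against the most recent deployment action before it. A minimal sketch in Python, assuming both lists use the same timezone (the task does not say) and reading "01:49-ish" as 01:49. The helper name `nearest_preceding` and the shortened event labels are made up for illustration:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Timestamps copied from the two lists above (assumed to share one timezone).
memory_events = [
    ("2018-05-09 22:02", "memory nosedives"),
    ("2018-05-09 22:45", "bottoms out, sawtooth begins"),
    ("2018-05-10 01:49", "pressure eases"),  # "01:49-ish" in the task
    ("2018-05-10 20:46", "hits bottom again"),
    ("2018-05-11 00:45", "hits bottom"),
    ("2018-05-11 01:33", "rebounds to normal levels"),
]
deploy_events = [
    ("2018-05-01 21:48", "git-lfs test (rORESDEPLOY4601497c4f43)"),
    ("2018-05-01 21:54", "git-lfs test (52347e0)"),
    ("2018-05-01 22:01", "rolled back (bf182e2)"),
    ("2018-05-09 22:05", "git-lfs test 1 (c0db102)"),
    ("2018-05-09 22:09", "git-lfs test 2 (c0db102)"),
    ("2018-05-09 22:20", "git-lfs test 3 (2a09939)"),
    ("2018-05-09 23:06", "drafttopic trial deployed (bf1e2b1)"),
    ("2018-05-09 23:24", "rolled back (bf182e2)"),
]


def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM' string into a naive datetime."""
    return datetime.strptime(ts, FMT)


def nearest_preceding(ts, events):
    """Return (label, minutes_elapsed) for the latest event at or before ts.

    Returns None when no event precedes ts.
    """
    t = parse(ts)
    earlier = [(parse(e_ts), label) for e_ts, label in events if parse(e_ts) <= t]
    if not earlier:
        return None
    when, label = max(earlier)
    return label, int((t - when).total_seconds() // 60)


for ts, what in memory_events:
    print(ts, "|", what, "|", nearest_preceding(ts, deploy_events))
```

On these timestamps, the 22:02 nosedive falls 3 minutes before the first git-lfs test at 22:05, so the last action preceding it is the 2018-05-01 rollback. That only holds if the Grafana panel and the deployment log really do share a timezone.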

Event Timeline

awight triaged this task as Medium priority. May 25 2018, 2:55 PM
awight created this task.
awight moved this task from Parked to Completed on the Machine-Learning-Team (Active Tasks) board.

This glitch doesn't correspond with our deployments in any way I can make sense of. My instinct is to go ahead with production deployment of the drafttopic model.

Vvjjkkii renamed this task from Investigate memory issues surrounding trial drafttopic deployment to 69baaaaaaa. Jul 1 2018, 1:08 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed awight as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed subscribers: Aklapper, gerritbot.
CommunityTechBot renamed this task from 69baaaaaaa to Investigate memory issues surrounding trial drafttopic deployment. Jul 2 2018, 4:22 AM
CommunityTechBot closed this task as Resolved.
CommunityTechBot assigned this task to awight.
CommunityTechBot lowered the priority of this task from High to Medium.
CommunityTechBot updated the task description.
CommunityTechBot added subscribers: Aklapper, gerritbot.