
Check drafttopic model memory usage
Closed, ResolvedPublic


Simple check:

  1. Start python interpreter.
  2. Check memory usage. Record RES.
  3. Load drafttopic model. Use it to make a prediction.
  4. Check memory usage. Record RES.
  5. Post results here.
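The simple check above can be sketched in Python. `vm_rss_kib` is a hypothetical helper (not part of ORES) that reads the interpreter's own resident set size, the RES column shown by `ps`/`top`, from `/proc/self/status` on Linux:

```python
def vm_rss_kib():
    """Return this process's VmRSS in KiB, or None if /proc is unavailable."""
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    # The line looks like: "VmRSS:     69776 kB"
                    return int(line.split()[1])
    except OSError:
        pass
    return None

before = vm_rss_kib()
# ... step 3 would go here: load the drafttopic model and score a revision ...
after = vm_rss_kib()
print(before, after)
```

Sampling before and after the model load isolates the model's contribution to RES from the interpreter's baseline.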

ORES check:

  1. Load ORES web and celery workers as configured from ores-wmflabs-deploy
  2. Check memory usage (both celery and web workers)
  3. Add drafttopic model to configuration and restart ORES service
  4. Use drafttopic model to make a prediction.
  5. Check memory usage (both celery and web workers)
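For steps 2 and 5, the per-worker-type totals can be gathered by summing the RSS of matching processes. This is a sketch; the `"celery"` and `"uwsgi"` patterns are assumptions and would need to be adjusted to match the actual ORES web/celery worker command lines:

```python
import subprocess

def worker_rss_totals(patterns=("celery", "uwsgi")):
    """Sum RSS (KiB) of processes whose command line contains each pattern.
    Pattern names are assumptions; match them to the real worker commands."""
    totals = {p: 0 for p in patterns}
    try:
        out = subprocess.run(["ps", "-eo", "rss,args"],
                             capture_output=True, text=True).stdout
    except FileNotFoundError:
        return totals  # ps not installed; nothing to report
    for line in out.splitlines()[1:]:  # skip the header row
        rss, _, args = line.strip().partition(" ")
        for p in patterns:
            if p in args:
                totals[p] += int(rss)
    return totals
```

Run once per step and diff the totals to see how configuring and then exercising the drafttopic model moves each worker type.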

Event Timeline

Halfak triaged this task as Medium priority. Apr 16 2018, 2:30 PM
Halfak created this task.

Simple check:

Not loaded

halfak    5029  7.0  0.8 480916 69776 pts/1    Sl+  17:13   0:01 python


Loaded (model loaded and one prediction made):

halfak    5029 32.4  6.6 1036444 521708 pts/1  S+   17:13   0:10 python

Note that making a prediction did not change memory usage at all.
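The two `ps` samples above put a number on the load cost in a bare interpreter:

```python
before_kib = 69776    # RES before loading the drafttopic model
after_kib = 521708    # RES after loading the model (prediction added nothing)
delta_mib = (after_kib - before_kib) / 1024
print(f"{delta_mib:.0f} MiB")  # the model costs roughly 441 MiB of RES
```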

Slightly different results when running under ORES: it seems that performing drafttopic scoring does have a small one-time effect on RES for each thread.

Not configured:

awight   28404  0.0  3.4 745336 272956 pts/6   S+   15:48   0:00 python

Configured but not yet scored:

awight   31781  0.0  7.6 1079512 598680 pts/6  S+   15:53   0:00 python

After scoring:

awight   31820  0.0  7.6 1081632 600860 pts/6  S+   15:53   0:00 python
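Breaking the three ORES samples above into deltas:

```python
baseline_kib = 272956   # RES, drafttopic not configured
loaded_kib = 598680     # RES, configured but not yet scored
scored_kib = 600860     # RES, after the first score

print(f"model load: {(loaded_kib - baseline_kib) / 1024:.0f} MiB")  # ~318 MiB
print(f"first score: {(scored_kib - loaded_kib) / 1024:.1f} MiB")   # ~2.1 MiB
```

So almost all of the cost is paid at configuration/load time; the first score adds only a couple of MiB per worker.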

Reading your notes, should we expect a ~2x increase in per-worker memory usage, based on a naive interpretation of RSS?

I'm pretty certain this won't cause 2x overall memory usage, although it is the upper bound. Some of RSS ends up being copy-on-write shared across processes, but we can't predict what proportion that will be. I think we should check memory usage on the canary box before doing the full deployment.
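One way to account for that copy-on-write sharing is to measure PSS (proportional set size) instead of RSS: each shared page is divided evenly among the processes mapping it, so summing PSS across workers does not double-count shared pages the way summing RSS does. A Linux-only sketch, using `/proc/<pid>/smaps_rollup` (available since kernel 4.14):

```python
def pss_kib(pid):
    """Return the proportional set size (KiB) of a process.
    Shared pages are counted fractionally, so per-worker PSS values
    can be summed without double-counting copy-on-write memory."""
    with open(f"/proc/{pid}/smaps_rollup") as f:
        for line in f:
            if line.startswith("Pss:"):
                # The line looks like: "Pss:     123456 kB"
                return int(line.split()[1])
    return 0
```

Comparing summed PSS against summed RSS on the canary box would show what proportion of the model actually ends up shared.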

We're running really low on memory for the celery workers, so I'm reducing concurrency from 16 to 14.
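Back-of-the-envelope headroom from that change, taking the post-scoring per-worker RSS above as a worst case (actual savings will be lower once copy-on-write sharing is accounted for):

```python
per_worker_mib = 600860 / 1024        # worst-case RSS per celery worker, from above
freed_mib = (16 - 14) * per_worker_mib
print(f"{freed_mib:.0f} MiB")         # up to ~1174 MiB regained by dropping 2 workers
```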

OK, everything looks good. We're running nicely. :)