Today celery was repeatedly killed by the kernel OOM killer on several ORES codfw hosts:
ores2001.codfw.wmnet: Jan 14 06:25:15 ores2001 kernel: [20960005.387705] Out of memory: Kill process 10948 (celery) score 36 or sacrifice child
ores2001.codfw.wmnet: Jan 14 06:25:20 ores2001 kernel: [20960009.490598] Out of memory: Kill process 11429 (celery) score 36 or sacrifice child
ores2001.codfw.wmnet: Jan 14 06:25:23 ores2001 kernel: [20960013.535409] Out of memory: Kill process 14602 (celery) score 36 or sacrifice child
ores2001.codfw.wmnet: Jan 14 06:25:29 ores2001 kernel: [20960014.446386] Out of memory: Kill process 11256 (celery) score 35 or sacrifice child
ores2001.codfw.wmnet: Jan 14 06:25:29 ores2001 kernel: [20960015.442553] Out of memory: Kill process 20074 (celery) score 35 or sacrifice child
ores2001.codfw.wmnet: Jan 14 06:25:29 ores2001 kernel: [20960017.391891] Out of memory: Kill process 13538 (celery) score 35 or sacrifice child
ores2001.codfw.wmnet: Jan 14 06:25:29 ores2001 kernel: [20960019.497964] Out of memory: Kill process 14019 (celery) score 34 or sacrifice child
ores2001.codfw.wmnet: Jan 14 06:25:29 ores2001 kernel: [20960019.557037] Out of memory: Kill process 15973 (celery) score 34 or sacrifice child
ores2001.codfw.wmnet: Jan 14 06:25:31 ores2001 kernel: [20960020.429315] Out of memory: Kill process 13777 (celery) score 34 or sacrifice child
ores2001.codfw.wmnet: Jan 14 06:25:31 ores2001 kernel: [20960021.413731] Out of memory: Kill process 13693 (celery) score 34 or sacrifice child
ores2001.codfw.wmnet: Jan 14 06:25:39 ores2001 kernel: [20960024.471984] Out of memory: Kill process 10363 (celery) score 34 or sacrifice child
ores2001.codfw.wmnet: Jan 14 06:25:39 ores2001 kernel: [20960026.468521] Out of memory: Kill process 10533 (celery) score 34 or sacrifice child
ores2001.codfw.wmnet: Jan 14 06:25:39 ores2001 kernel: [20960028.431911] Out of memory: Kill process 10630 (celery) score 34 or sacrifice child
ores2001.codfw.wmnet: Jan 14 06:25:43 ores2001 kernel: [20960031.406687] Out of memory: Kill process 11099 (celery) score 34 or sacrifice child
ores2001.codfw.wmnet: Jan 14 06:25:43 ores2001 kernel: [20960033.151720] Out of memory: Kill process 11508 (celery) score 34 or sacrifice child
ores2001.codfw.wmnet: Jan 14 06:25:44 ores2001 kernel: [20960034.367117] Out of memory: Kill process 10868 (celery) score 34 or sacrifice child
ores2002.codfw.wmnet: Jan 14 06:25:29 ores2002 kernel: [20960564.242765] Out of memory: Kill process 24990 (celery) score 35 or sacrifice child
ores2002.codfw.wmnet: Jan 14 06:25:30 ores2002 kernel: [20960565.541888] Out of memory: Kill process 23085 (celery) score 35 or sacrifice child
ores2002.codfw.wmnet: Jan 14 06:25:32 ores2002 kernel: [20960567.298123] Out of memory: Kill process 21942 (celery) score 34 or sacrifice child
ores2002.codfw.wmnet: Jan 14 06:25:35 ores2002 kernel: [20960570.294081] Out of memory: Kill process 22547 (celery) score 34 or sacrifice child
ores2002.codfw.wmnet: Jan 14 06:25:36 ores2002 kernel: [20960571.285116] Out of memory: Kill process 22791 (celery) score 34 or sacrifice child
ores2002.codfw.wmnet: Jan 14 06:25:37 ores2002 kernel: [20960572.043351] Out of memory: Kill process 25893 (celery) score 34 or sacrifice child
ores2005.codfw.wmnet: Jan 14 06:25:08 ores2005 kernel: [20957911.232125] Out of memory: Kill process 25376 (celery) score 41 or sacrifice child
ores2005.codfw.wmnet: Jan 14 06:25:15 ores2005 kernel: [20957916.541963] Out of memory: Kill process 30622 (celery) score 37 or sacrifice child
ores2005.codfw.wmnet: Jan 14 06:25:21 ores2005 kernel: [20957921.581982] Out of memory: Kill process 29757 (celery) score 36 or sacrifice child
ores2005.codfw.wmnet: Jan 14 06:25:21 ores2005 kernel: [20957923.510427] Out of memory: Kill process 30451 (celery) score 36 or sacrifice child
ores2005.codfw.wmnet: Jan 14 06:25:23 ores2005 kernel: [20957925.286007] Out of memory: Kill process 25932 (celery) score 35 or sacrifice child
ores2005.codfw.wmnet: Jan 14 06:25:23 ores2005 kernel: [20957926.367625] Out of memory: Kill process 19148 (celery) score 35 or sacrifice child
ores2005.codfw.wmnet: Jan 14 06:25:24 ores2005 kernel: [20957927.283672] Out of memory: Kill process 32869 (celery) score 35 or sacrifice child
ores2005.codfw.wmnet: Jan 14 06:25:27 ores2005 kernel: [20957928.412759] Out of memory: Kill process 28230 (celery) score 35 or sacrifice child
ores2005.codfw.wmnet: Jan 14 06:25:27 ores2005 kernel: [20957929.421421] Out of memory: Kill process 30671 (celery) score 35 or sacrifice child
ores2005.codfw.wmnet: Jan 14 06:25:29 ores2005 kernel: [20957931.354109] Out of memory: Kill process 28560 (celery) score 35 or sacrifice child
ores2005.codfw.wmnet: Jan 14 06:25:31 ores2005 kernel: [20957933.294028] Out of memory: Kill process 28387 (celery) score 35 or sacrifice child
ores2008.codfw.wmnet: Jan 14 06:25:19 ores2008 kernel: [20956860.721316] Out of memory: Kill process 30348 (celery) score 36 or sacrifice child
ores2008.codfw.wmnet: Jan 14 06:25:20 ores2008 kernel: [20956862.774618] Out of memory: Kill process 32102 (celery) score 36 or sacrifice child
ores2008.codfw.wmnet: Jan 14 06:25:24 ores2008 kernel: [20956865.918839] Out of memory: Kill process 38756 (celery) score 36 or sacrifice child
ores2008.codfw.wmnet: Jan 14 06:25:28 ores2008 kernel: [20956869.855087] Out of memory: Kill process 6424 (celery) score 35 or sacrifice child
ores2008.codfw.wmnet: Jan 14 06:25:28 ores2008 kernel: [20956869.906351] Out of memory: Kill process 4290 (celery) score 35 or sacrifice child
ores2008.codfw.wmnet: Jan 14 06:25:30 ores2008 kernel: [20956872.729098] Out of memory: Kill process 38836 (celery) score 35 or sacrifice child
ores2008.codfw.wmnet: Jan 14 06:25:31 ores2008 kernel: [20956873.701863] Out of memory: Kill process 30511 (celery) score 35 or sacrifice child
ores2008.codfw.wmnet: Jan 14 06:25:32 ores2008 kernel: [20956874.699207] Out of memory: Kill process 33477 (celery) score 35 or sacrifice child
ores2008.codfw.wmnet: Jan 14 06:25:35 ores2008 kernel: [20956877.735192] Out of memory: Kill process 33768 (celery) score 35 or sacrifice child
ores2009.codfw.wmnet: Jan 14 06:25:10 ores2009 kernel: [20955449.877789] Out of memory: Kill process 738 (celery) score 39 or sacrifice child
ores2009.codfw.wmnet: Jan 14 06:25:10 ores2009 kernel: [20955451.110142] Out of memory: Kill process 7289 (celery) score 36 or sacrifice child
ores2009.codfw.wmnet: Jan 14 06:25:14 ores2009 kernel: [20955455.424752] Out of memory: Kill process 7209 (celery) score 35 or sacrifice child
ores2009.codfw.wmnet: Jan 14 06:25:18 ores2009 kernel: [20955459.592900] Out of memory: Kill process 11906 (celery) score 35 or sacrifice child
ores2009.codfw.wmnet: Jan 14 06:25:23 ores2009 kernel: [20955463.567749] Out of memory: Kill process 1323 (celery) score 35 or sacrifice child
ores2009.codfw.wmnet: Jan 14 06:25:27 ores2009 kernel: [20955468.150691] Out of memory: Kill process 509 (celery) score 35 or sacrifice child
ores2009.codfw.wmnet: Jan 14 06:25:35 ores2009 kernel: [20955472.300157] Out of memory: Kill process 9055 (celery) score 35 or sacrifice child
ores2009.codfw.wmnet: Jan 14 06:25:38 ores2009 kernel: [20955476.284701] Out of memory: Kill process 3433 (celery) score 35 or sacrifice child
ores2009.codfw.wmnet: Jan 14 06:25:38 ores2009 kernel: [20955477.440998] Out of memory: Kill process 4931 (celery) score 34 or sacrifice child
ores2009.codfw.wmnet: Jan 14 06:25:38 ores2009 kernel: [20955478.353819] Out of memory: Kill process 372 (celery) score 34 or sacrifice child
ores2009.codfw.wmnet: Jan 14 06:25:38 ores2009 kernel: [20955479.289623] Out of memory: Kill process 1184 (celery) score 34 or sacrifice child
ores2009.codfw.wmnet: Jan 14 06:25:39 ores2009 kernel: [20955480.202927] Out of memory: Kill process 10438 (celery) score 34 or sacrifice child
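To compare how hard each host was hit, the kernel messages above can be tallied per host. The following is only a rough sketch: the regex is written against the exact message format pasted in this task, and the function name `summarize_oom_kills` is made up for illustration. It scans free text rather than individual lines, so it also copes with output where line breaks were lost.

```python
import re
from collections import Counter

# Pattern for the "Out of memory: Kill process ..." kernel messages as pasted
# above, each prefixed with "<fqdn>: ". Whitespace is matched loosely (\s+)
# so that wrapped or concatenated output still parses.
OOM_RE = re.compile(
    r"(?P<host>[\w.-]+):\s+"
    r"(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"\[\s*[\d.]+\]\s+Out\s+of\s+memory:\s+Kill\s+process\s+"
    r"(?P<pid>\d+)\s+\((?P<comm>[^)]+)\)\s+score\s+(?P<score>\d+)"
)


def summarize_oom_kills(text):
    """Return a Counter mapping (host, process name) to number of OOM kills."""
    kills = Counter()
    for match in OOM_RE.finditer(text):
        kills[(match.group("host"), match.group("comm"))] += 1
    return kills
```

Feeding it the pasted log text (for example `summarize_oom_kills(open("oom.txt").read())`, where `oom.txt` is a hypothetical file holding the output above) gives one count per host, which makes it easier to see whether the kills were evenly spread across the cluster.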
Metrics around that time: https://grafana.wikimedia.org/d/HIRrxQ6mk/ores?orgId=1&from=1578980790712&to=1578985954606
In /srv/log/ores/main.log there seem to be a lot of ChangeProp-related entries like:
[2020-01-13T06:25:57] [pid: 8261] 10.64.16.21 (-) {30 vars in 418 bytes} [Mon Jan 13 06:25:54 2020] POST /v3/precache => generated 2239 bytes in 474 msecs (HTTP/1.1 200) 8 headers in 288 bytes (1 switches on core 0) user agent "ChangePropagation/WMF"
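To check whether ChangeProp really dominates the traffic, requests in main.log could be grouped by user agent and endpoint. This is a sketch that assumes every request line follows the uWSGI format of the entry above; `count_requests` is a hypothetical helper name, not something that exists in ORES.

```python
import re
from collections import Counter

# Extracts method, path and user agent from request lines shaped like the
# entry above: '... ] POST /v3/precache => ... user agent "..."'.
REQ_RE = re.compile(
    r'\]\s+(?P<method>[A-Z]+)\s+(?P<path>\S+)\s+=>.*?user agent "(?P<ua>[^"]*)"'
)


def count_requests(lines):
    """Return a Counter mapping (user agent, method, path) to request count."""
    counts = Counter()
    for line in lines:
        match = REQ_RE.search(line)
        if match:
            key = (match.group("ua"), match.group("method"), match.group("path"))
            counts[key] += 1
    return counts
```

For example, `count_requests(open("/srv/log/ores/main.log")).most_common(10)` would list the busiest user agent and endpoint combinations; restricting the input to lines around 06:25 would narrow it to the OOM window.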
On ores2001 I moved the whole /srv/log/ores dir to /home/elukey/14012020_celery_oom to prevent logrotate from deleting data that may be needed for debugging.