Page MenuHomePhabricator

ORES workers using dramatically higher CPU, increasing linearly with time
Closed, ResolvedPublic

Event Timeline

awight triaged this task as High priority.Oct 10 2018, 4:45 PM

Looks like the trouble started on Sept 22nd, 16:00 UTC.

Restarting celery workers on each codfw box reset the usage back to zero, which buys us some time. I'm assuming that it'll behave the same way but on eqiad now. We'll see!

Some detailed graph of the jump: https://grafana.wikimedia.org/dashboard/db/ores?refresh=1m&panelId=5&fullscreen&orgId=1&from=1537477850460&to=1537720679875

image.png (982×1 px, 646 KB)

It's Saturday and nothing happened around the time.
Poolcounter probably is not at fault here, it was deployed three days before this jump.
One weird thing is that ores1009 has high cpu usage right now but eqiad should not get much requests.

Might be related: https://github.com/celery/celery/issues/1558
Our current kombu version is 3.0.37, people reported such issues with that version and said upgrading to kombu 4.0.1 would solve it

And also https://github.com/celery/celery/issues/2142 Both of them basically mean we need to upgrade kombu, celery 3.1.26 (latest pre 4 version) doesn't use new version of kombu: https://github.com/celery/celery/blob/11e98c48357e589a045c5397bf3d8f5e7f11603e/requirements/default.txt i.e. This will be resolved if we upgrade to celery 4

Both issues mention a poll syscall returning POLLNVAL that we should be able to see in the strace if it's the same problem. Is it the worker manager or the worker threads which go 100% CPU? Do the straces match this profile?

Here's an interesting comment on a related celery bug: https://github.com/celery/celery/issues/2906#issuecomment-153082379

Are we still killing the workers after serving every N tasks?

The bugs you found suggest that it's the connection to Redis which is causing the grief, so I tried to correlate Redis glitches with the CPU spikes with no luck. However, look at these strange cut-outs, which might indicate network flakiness:
https://grafana.wikimedia.org/dashboard/db/ores?refresh=1m&panelId=31&fullscreen&orgId=1&from=1537647142541&to=1537699593504
https://grafana.wikimedia.org/dashboard/db/ores?refresh=1m&panelId=10&fullscreen&orgId=1&from=1537647142541&to=1537699593504

mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
munmap(0x7fbfd88e7000, 262144)          = 0
munmap(0x7fbfd89a7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd89a7000
munmap(0x7fbfd89a7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd89a7000
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
munmap(0x7fbfd88e7000, 262144)          = 0
munmap(0x7fbfd89a7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd89a7000
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
munmap(0x7fbfd88e7000, 262144)          = 0
munmap(0x7fbfd89a7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd89a7000
munmap(0x7fbfd89a7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd89a7000
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000

Here's the strace profile

And gdb bt output

#0  0x00005628905a4140 in ?? ()
#1  0x00007fbf68c51b8b in Tokenizer_push_textbuffer (self=self@entry=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_support.c:61
#2  0x00007fbf68c51d3e in Tokenizer_pop_keeping_context (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_support.c:123
#3  0x00007fbf68c518fd in Tokenizer_handle_table_cell (self=self@entry=0x7fbfd9867718, markup=markup@entry=0x7fbf68c53105 "|", 
    tag=tag@entry=0x7fbf68c52f63 "td", line_context=line_context@entry=34359738368) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2349
#4  0x00007fbf68c4f0aa in Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2782
#5  0x00007fbf68c4f193 in Tokenizer_handle_table_row (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2311
#6  Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2776
#7  0x00007fbf68c4e33e in Tokenizer_parse_table (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2251
#8  Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2728
#9  0x00007fbf68c4f193 in Tokenizer_handle_table_row (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2311
#10 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2776
#11 0x00007fbf68c4e33e in Tokenizer_parse_table (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2251
#12 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2728
#13 0x00007fbf68c4f193 in Tokenizer_handle_table_row (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2311
#14 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2776
#15 0x00007fbf68c4e33e in Tokenizer_parse_table (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2251
#16 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2728
#17 0x00007fbf68c4f193 in Tokenizer_handle_table_row (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2311
#18 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2776
#19 0x00007fbf68c4e33e in Tokenizer_parse_table (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2251
#20 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2728
#21 0x00007fbf68c4f193 in Tokenizer_handle_table_row (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2311
#22 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2776
#23 0x00007fbf68c4e33e in Tokenizer_parse_table (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2251
#24 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2728
#25 0x00007fbf68c4f193 in Tokenizer_handle_table_row (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2311
#26 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2776
#27 0x00007fbf68c4e33e in Tokenizer_parse_table (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2251
#28 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2728
#29 0x00007fbf68c4f193 in Tokenizer_handle_table_row (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2311
#30 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2776
#31 0x00007fbf68c4e33e in Tokenizer_parse_table (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2251
#32 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2728
#33 0x00007fbf68c4f193 in Tokenizer_handle_table_row (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2311
#34 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2776
#35 0x00007fbf68c4e33e in Tokenizer_parse_table (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2251
#36 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2728
#37 0x00007fbf68c4f193 in Tokenizer_handle_table_row (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2311
#38 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2776
#39 0x00007fbf68c4e33e in Tokenizer_parse_table (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2251
#40 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2728
#41 0x00007fbf68c4f193 in Tokenizer_handle_table_row (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2311
#42 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2776
#43 0x00007fbf68c4e33e in Tokenizer_parse_table (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2251
#44 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2728
#45 0x00007fbf68c4f193 in Tokenizer_handle_table_row (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2311
#46 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2776
#47 0x00007fbf68c4e33e in Tokenizer_parse_table (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2251
#48 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2728
---Type <return> to continue, or q <return> to quit--- 
#49 0x00007fbf68c4ee2e in Tokenizer_parse_italics_and_bold (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:1886
#50 Tokenizer_parse_style (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:1967
#51 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2703
#52 0x00007fbf68c50a1d in Tokenizer_parse_wikilink (self=self@entry=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:357
#53 0x00007fbf68c4d86d in Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2639
#54 0x00007fbf68c4ee2e in Tokenizer_parse_italics_and_bold (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:1886
#55 Tokenizer_parse_style (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:1967
#56 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2703
#57 0x00007fbf68c50a1d in Tokenizer_parse_wikilink (self=self@entry=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:357
#58 0x00007fbf68c4d86d in Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2639
#59 0x00007fbf68c518fd in Tokenizer_handle_table_cell (self=self@entry=0x7fbfd9867718, markup=markup@entry=0x7fbf68c53105 "|", 
    tag=tag@entry=0x7fbf68c52f63 "td", line_context=line_context@entry=34359738368) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2349
#60 0x00007fbf68c4f0aa in Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2782
#61 0x00007fbf68c4f193 in Tokenizer_handle_table_row (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2311
#62 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2776
#63 0x00007fbf68c4e33e in Tokenizer_parse_table (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2251
#64 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2728
#65 0x00007fbf68c4ee2e in Tokenizer_parse_italics_and_bold (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:1886
#66 Tokenizer_parse_style (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:1967
#67 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2703
#68 0x00007fbf68c50a1d in Tokenizer_parse_wikilink (self=self@entry=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:357
#69 0x00007fbf68c4d86d in Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2639
#70 0x00007fbf68c4ee2e in Tokenizer_parse_italics_and_bold (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:1886
#71 Tokenizer_parse_style (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:1967
#72 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2703
#73 0x00007fbf68c50a1d in Tokenizer_parse_wikilink (self=self@entry=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:357
#74 0x00007fbf68c4d86d in Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2639
#75 0x00007fbf68c518fd in Tokenizer_handle_table_cell (self=self@entry=0x7fbfd9867718, markup=markup@entry=0x7fbf68c53105 "|", 
    tag=tag@entry=0x7fbf68c52f63 "td", line_context=line_context@entry=34359738368) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2349
#76 0x00007fbf68c4f0aa in Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2782
#77 0x00007fbf68c4f193 in Tokenizer_handle_table_row (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2311
#78 Tokenizer_parse (self=0x7fbfd9867718, context=<optimized out>, push=<optimized out>)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2776
#79 0x00007fbf68c4e33e in Tokenizer_parse_table (self=0x7fbfd9867718) at mwparserfromhell/parser/ctokenizer/tok_parse.c:2251
#80 Tokenizer_parse (self=self@entry=0x7fbfd9867718, context=<optimized out>, push=push@entry=1)
    at mwparserfromhell/parser/ctokenizer/tok_parse.c:2728
#81 0x00007fbf68c52640 in Tokenizer_tokenize (self=0x7fbfd9867718, args=0x7fbf540a95e8)
    at mwparserfromhell/parser/ctokenizer/tokenizer.c:167
#82 0x000056289059d6df in PyCFunction_Call ()
#83 0x000056289055d6e9 in PyEval_EvalFrameEx ()
#84 0x0000562890562286 in ?? ()
#85 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#86 0x0000562890562286 in ?? ()
#87 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#88 0x00005628905630df in PyEval_EvalCodeEx ()
#89 0x000056289059f7bf in ?? ()
#90 0x00005628905e6647 in PyObject_Call ()
#91 0x000056289055b180 in PyEval_EvalFrameEx ()
#92 0x00005628905630df in PyEval_EvalCodeEx ()
#93 0x000056289059f5d3 in ?? ()
#94 0x00005628905e6647 in PyObject_Call ()
#95 0x000056289050694e in ?? ()
#96 0x00005628905e6647 in PyObject_Call ()
#97 0x00005628905a6c46 in ?? ()
#98 0x00005628905e6647 in PyObject_Call ()
#99 0x000056289055b180 in PyEval_EvalFrameEx ()
#100 0x0000562890562286 in ?? ()
#101 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#102 0x0000562890562286 in ?? ()
#103 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#104 0x0000562890562286 in ?? ()
---Type <return> to continue, or q <return> to quit---
#105 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#106 0x0000562890562286 in ?? ()
#107 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#108 0x0000562890562286 in ?? ()
#109 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#110 0x0000562890562286 in ?? ()
#111 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#112 0x00005628905a0ccc in ?? ()
#113 0x00005628905d8de4 in ?? ()
#114 0x00005628905ae5c4 in ?? ()
#115 0x00005628905e6647 in PyObject_Call ()
#116 0x000056289055dee1 in PyEval_EvalFrameEx ()
#117 0x0000562890562286 in ?? ()
#118 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#119 0x0000562890562286 in ?? ()
#120 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#121 0x00005628905630df in PyEval_EvalCodeEx ()
#122 0x000056289059f7bf in ?? ()
#123 0x00005628905e6647 in PyObject_Call ()
#124 0x000056289055b180 in PyEval_EvalFrameEx ()
#125 0x00005628905627bf in ?? ()
#126 0x000056289055e4c9 in PyEval_EvalFrameEx ()
#127 0x0000562890562286 in ?? ()
#128 0x000056289055e4c9 in PyEval_EvalFrameEx ()
#129 0x0000562890563817 in PyEval_EvalCodeEx ()
#130 0x000056289059f7bf in ?? ()
#131 0x00005628905e6647 in PyObject_Call ()
#132 0x000056289055b180 in PyEval_EvalFrameEx ()
#133 0x0000562890563817 in PyEval_EvalCodeEx ()
#134 0x000056289059f7bf in ?? ()
#135 0x00005628905e6647 in PyObject_Call ()
#136 0x000056289050694e in ?? ()
#137 0x00005628905e6647 in PyObject_Call ()
#138 0x00005628905a6c46 in ?? ()
#139 0x00005628905e6647 in PyObject_Call ()
#140 0x000056289055b180 in PyEval_EvalFrameEx ()
#141 0x00005628905627bf in ?? ()
#142 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#143 0x00005628905630df in PyEval_EvalCodeEx ()
#144 0x000056289059f7bf in ?? ()
#145 0x00005628905e6647 in PyObject_Call ()
#146 0x000056289055b180 in PyEval_EvalFrameEx ()
#147 0x00005628905627bf in ?? ()
#148 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#149 0x00005628905627bf in ?? ()
#150 0x000056289055e4c9 in PyEval_EvalFrameEx ()
#151 0x000056289055d93f in PyEval_EvalFrameEx ()
#152 0x00005628905630df in PyEval_EvalCodeEx ()
#153 0x000056289059f5d3 in ?? ()
#154 0x00005628905e6647 in PyObject_Call ()
#155 0x000056289050694e in ?? ()
#156 0x00005628905e6647 in PyObject_Call ()
#157 0x00005628905a6aa9 in ?? ()
#158 0x00005628905ae5c4 in ?? ()
#159 0x00005628905e6647 in PyObject_Call ()
#160 0x000056289055dee1 in PyEval_EvalFrameEx ()
#161 0x000056289055d93f in PyEval_EvalFrameEx ()
#162 0x000056289055d93f in PyEval_EvalFrameEx ()
#163 0x00005628905627bf in ?? ()
#164 0x000056289055e4c9 in PyEval_EvalFrameEx ()
#165 0x000056289055d93f in PyEval_EvalFrameEx ()
#166 0x000056289055d93f in PyEval_EvalFrameEx ()
#167 0x000056289055d93f in PyEval_EvalFrameEx ()
#168 0x00005628905630df in PyEval_EvalCodeEx ()
#169 0x000056289059f5d3 in ?? ()
#170 0x00005628905e6647 in PyObject_Call ()
#171 0x000056289055b180 in PyEval_EvalFrameEx ()
#172 0x00005628905a0ccc in ?? ()
#173 0x00005628905649d1 in ?? ()
#174 0x000056289059d6df in PyCFunction_Call ()
#175 0x000056289055d6e9 in PyEval_EvalFrameEx ()
#176 0x00005628905630df in PyEval_EvalCodeEx ()
#177 0x000056289059f648 in ?? ()
---Type <return> to continue, or q <return> to quit---
#178 0x00005628905e6647 in PyObject_Call ()
#179 0x000056289055b180 in PyEval_EvalFrameEx ()
#180 0x000056289055d93f in PyEval_EvalFrameEx ()
#181 0x000056289055d93f in PyEval_EvalFrameEx ()
#182 0x000056289055d93f in PyEval_EvalFrameEx ()
#183 0x000056289055d93f in PyEval_EvalFrameEx ()
#184 0x000056289055d93f in PyEval_EvalFrameEx ()
#185 0x000056289055d93f in PyEval_EvalFrameEx ()
#186 0x00005628905630df in PyEval_EvalCodeEx ()
#187 0x000056289059f7bf in ?? ()
#188 0x00005628905e6647 in PyObject_Call ()
#189 0x000056289055b180 in PyEval_EvalFrameEx ()
#190 0x00005628905630df in PyEval_EvalCodeEx ()
#191 0x000056289059f7bf in ?? ()
#192 0x00005628905e6647 in PyObject_Call ()
#193 0x000056289050694e in ?? ()
#194 0x00005628905e6647 in PyObject_Call ()
#195 0x00005628905a6c46 in ?? ()
#196 0x00005628905e6647 in PyObject_Call ()
#197 0x000056289055b180 in PyEval_EvalFrameEx ()
#198 0x0000562890562286 in ?? ()
#199 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#200 0x0000562890562286 in ?? ()
#201 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#202 0x000056289055d93f in PyEval_EvalFrameEx ()
#203 0x0000562890562286 in ?? ()
#204 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#205 0x00005628905627bf in ?? ()
#206 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#207 0x0000562890562286 in ?? ()
#208 0x000056289055e0a9 in PyEval_EvalFrameEx ()
#209 0x000056289055d93f in PyEval_EvalFrameEx ()
#210 0x0000562890562286 in ?? ()
#211 0x0000562890562f9f in PyEval_EvalCode ()
#212 0x00005628906308f2 in ?? ()
#213 0x0000562890632e1d in PyRun_FileExFlags ()
#214 0x00005628906335be in PyRun_SimpleFileExFlags ()
#215 0x00005628906614d7 in Py_Main ()
#216 0x00005628904f1c01 in main ()

Unfortunately due to the venv I could not get (yet) the niceties of py-bt and py-list working but there's already an indication that this is busy looping parsing text. The ltrace output reinforces that idea

memset(0x7fbfd88ca7c8, '\0', 56)                                                      = 0x7fbfd88ca7c8
malloc(224)                                                                           = 0x5628e48e4e10
free(0x5628e48e4d20)                                                                  = <void>
realloc(0, 32)                                                                        = 0x5628e2db1390
memset(0x7fbfd88ca818, '\0', 56)                                                      = 0x7fbfd88ca818
malloc(224)                                                                           = 0x5628e48e4d20
free(0x5628e48e4c30)                                                                  = <void>
realloc(0, 40)                                                                        = 0x5628e36fda40
memmove(0x5628e36fda50, "", 0)                                                        = 0x5628e36fda50
free(0x5628e2db1390)                                                                  = <void>
malloc(224)                                                                           = 0x5628e48e4c30
strlen("text")                                                                        = 4
memcpy(0x7fbfd88c9258, "text", 4)                                                     = 0x7fbfd88c9258
memcmp(0x7fbfe0dcaed0, 0x7fbfd88c9258, 4, 64)                                         = 0
memset(0x7fbfd88ca868, '\0', 56)                                                      = 0x7fbfd88ca868
malloc(224)                                                                           = 0x5628e48e4f00
free(0x5628e48e4c30)                                                                  = <void>
realloc(0, 32)                                                                        = 0x5628e2db1390
malloc(224)                                                                           = 0x5628e48e4c30
strlen("|")                                                                           = 1
strlen("wiki_markup")                                                                 = 11
memcpy(0x7fbfd9b1ede0, "wiki_markup", 11)                                             = 0x7fbfd9b1ede0
memcmp(0x7fbf7410df20, 0x7fbfd9b1ede0, 11, 64)                                        = 0
memset(0x7fbfd88ca8b8, '\0', 56)                                                      = 0x7fbfd88ca8b8
malloc(224)                                                                           = 0x5628e48e4ff0
free(0x5628e48e4c30)                                                                  = <void>
malloc(224)                                                                           = 0x5628e48e4c30
strlen("text")                                                                        = 4
memcpy(0x7fbfd88c9290, "text", 4)                                                     = 0x7fbfd88c9290
memcmp(0x7fbfe0dcaed0, 0x7fbfd88c9290, 4, 64)                                         = 0
memset(0x7fbfd88ca908, '\0', 56)                                                      = 0x7fbfd88ca908
malloc(224)                                                                           = 0x5628e48e50e0
free(0x5628e48e4c30)                                                                  = <void>
memmove(0x5628e48e4a20, "", 0)                                                        = 0x5628e48e4a20
free(0x5628e36fda40)                                                                  = <void>
malloc(224)                                                                           = 0x5628e48e4c30
strlen("|")                                                                           = 1
strlen("wiki_markup")                                                                 = 11
memcpy(0x7fbfd9b1ede0, "wiki_markup", 11)                                             = 0x7fbfd9b1ede0
memcmp(0x7fbf7410df20, 0x7fbfd9b1ede0, 11, 64)                                        = 0
strlen("padding")                                                                     = 7
memcpy(0x7fbfd88c9290, "padding", 7)                                                  = 0x7fbfd88c9290
memcmp(0x7fbfdd6c5760, 0x7fbfd88c9290, 7, 64)                                         = 0
memset(0x7fbfd88ca958, '\0', 56)                                                      = 0x7fbfd88ca958
malloc(224)                                                                           = 0x5628e48e51d0
free(0x5628e48e4c30)                                                                  = <void>
memmove(0x5628e48e4a30, "", 0)                                                        = 0x5628e48e4a30
free(0x5628e2db1390)                                                                  = <void>
malloc(224)                                                                           = 0x5628e48e4c30
strlen("")                                                                            = 0
strlen("wiki_markup")                                                                 = 11
memcpy(0x7fbfd9b1ede0, "wiki_markup", 11)                                             = 0x7fbfd9b1ede0
memcmp(0x7fbf7410df20, 0x7fbfd9b1ede0, 11, 64)                                        = 0
memset(0x7fbfd88ca9a8, '\0', 56)                                                      = 0x7fbfd88ca9a8
malloc(224)                                                                           = 0x5628e48e52c0
free(0x5628e48e4c30)                                                                  = <void>
malloc(224)                                                                           = 0x5628e48e4c30
strlen("text")

I found this config on production, so in theory our workers should be restarting themselves:

CELERYD_MAX_TASKS_PER_CHILD: 100

Following up from IRC, I set loggers.celery.level to DEBUG in logging_config.yaml, and was able to demonstrate locally that celery debug logs would appear in /srv/log/ores/app.log.

I found this config on production, so in theory our workers should be restarting themselves:

CELERYD_MAX_TASKS_PER_CHILD: 100

I might be wrong but IIRC this works by the master counting the number of tasks a child completes (successfully or unsuccessfully, it's irrelevant) and killing them when that count reaches 100, starting a new process after that. It actually works, we 've witnessed that and can be easily witnessed by the age of celery workers on a machine (it's usually a few mins up to a few tens of mins).

But if a child goes into a forever loop and never completes ? This setting ain't gonna help.

Conceptually speaking, one solution to that would be a heartbeat between the celery master and the child and the master killing the child if it hasn't received a heartbeat in a preconfigured and sensible threshold. I do see celery supporting heartbeats per http://docs.celeryproject.org/en/2.1-archived/internals/reference/celery.worker.heartbeat.html since version 2.1 at least. Maybe it could help ?

Celery has timeouts and it should catch them. I guess revoking tasks might be at fault here.

One other observation, CPU usage doesn't go up linearly, it actually goes up with jumps. Since we restarted. Some nodes stayed stable but ores2004, ores2003 and ores2009 had the jump on midnight and stayed like this since then. If you look at other cases, it's all sudden jumps.

From @akosiaris in chat

[06:45:33] <akosiaris> mwparserfromhell honors it's name
[06:45:43] <akosiaris> (gdb) py-bt 
[06:45:43] <akosiaris> Traceback (most recent call first):
[06:45:44] <akosiaris>   <built-in method tokenize of _tokenizer.CTokenizer object at remote 0x7fbfd9867718>
[06:45:44] <akosiaris> Python Exception <class 'RuntimeError'> Type does not have a target.: 
[06:45:46] <akosiaris> sigh
[06:46:12] <akosiaris> good luck figuring out now the python stacktrace

I think that we ought to (1) monitor CPU usage. Should never go above 50% for any sustained period of time and (2) update mwparserfromhell in hopes that the issue has already been addressed.

@Ladsgroup just noted in IRC that we are running the most recent version of mwparserfromhell already (0.5.1). So I'm looking into why these processes are not timing out.

We are *supposedly* using Unix signals to time out a processing job that takes too long. See https://github.com/wikimedia/ores/blob/master/ores/util.py#L25

See this for a discussion of threading/signal timeouts: https://pypi.org/project/stopit/#comparing-thread-based-and-signal-based-timeout-control

I've demonstrated that ORES timeout mechanism works for an old mwparserfromhell recursion bug in version 0.4.4 based on this report: https://github.com/earwig/mwparserfromhell/issues/190

See my work here: https://gist.github.com/halfak/281a692873d538e77c6096f5042e8be2

I also checked if ores extension has changed the way it does requests and jobs. Nothing of interest came out.

I've demonstrated that ORES timeout mechanism works for an old mwparserfromhell recursion bug in version 0.4.4 based on this report: https://github.com/earwig/mwparserfromhell/issues/190

See my work here: https://gist.github.com/halfak/281a692873d538e77c6096f5042e8be2

Is this however being used when calling out to mwparserfromhell ? The only usage of it I tracked down is https://github.com/wikimedia/ores/blob/master/ores/scoring_systems/scoring_system.py#L163. I don't know the code enough to understand if it's sufficient though.

Yes. All calls to mwparserfromhell happen within that timeout() call.

I 'm adding some extra gdb info in P7669

I think I 've managed to reproduce the problem. After fetching and trying the last 500 revisions from https://es.wikipedia.org/wiki/Usuario:Danielalfredo/Taller and failing to reproduce it, I returned to the gdb process. After a very long fight understanding Tokenizer and with how the data is stored internally in Tokenizer values I 've managed to obtain the following

(gdb) p *(Tokenizer *) self
$37 = {
  ob_base = {
    ob_refcnt = 2, 
    ob_type = 0x7fa7df1004a0 <TokenizerType>
  }, 
  text = {
    object = 0x56178bc3ab40, 
    length = 633838, 
    kind = 2, 
    data = 0x56178bc3ab88
  }, 
  topstack = 0x7fa84fc9d720, 
  head = 633087, 
  global = 0, 
  depth = 39, 
  route_state = 0, 
  route_context = 0, 
  bad_routes = 0x7fa849fd91f8, 
  skip_style_tags = 0
}

which points out we have a text unicode object, with kind=2, that is UCS2 (since pep393 unicode strings in python are size variable, and that's the way it's implemeted AFAICT), of length 633838 starting at address 0x56178bc3ab88. . Calculating 0x56178bc3ab88 + 633838*2 bytes for the end address we have 0x56178BD70364. I 've dumped then that memory.

dump memory esbugmem 0x56178bc3ab88 0x56178BD70364

And crafted the following simple python script to parse it

import mwparserfromhell
import time
import os
import glob

# The file is binary memory dump, so read it as binary and then decode it as utf16 which for the purposes of this reproduction is close enough to UCS2.
with open('esbugmem', 'rb') as f:
    data = f.read()
    text = data.decode('utf16')
    mwparserfromhell.parse(text)

using the virtualenv environment we have on ores hosts in order to reproduce fully the environment

Sure enough the thing has gone into a forever loop. It also is fully unresponsive to Ctrl-C making me think there is some catching of the SIGINT signal somewhere in mwparserfromhell.

For purposes of reproduction the memory dump is at

Now to figure out what is going on in that file.

The dumped memory when converted back to utf8 has the exact same MD5 as https://es.wikipedia.org/w/index.php?title=Usuario:Danielalfredo/Taller&oldid=111186880

which DOES NOT trigger the bug in its own.

This is weird.

OK trying to replicate... I have found that you absolutely *must* use the file provided by @akosiaris directly. Even re-saving it UTF-16 will not allow one to replicate the issue.

Here's some notes from IRC:

[09:34:05] <halfak> Aha!  I wasn't using the right file.  I have also managed to replicate!
[09:34:38] <halfak> Indeed the signal does not work to stop it. 
[09:34:55] <halfak> Confirmed that ^C does nothing. 
[09:35:40] <halfak> OK.  So now, I want to see if I can process this revision in labs without anything going crazy. 
[09:37:53] <halfak> Confirmed.  I can reproduce on ores-staging with http://ores-staging.wmflabs.org/v3/scores/eswiki/111186880/damaging
[09:38:54] <halfak> OK!  We're getting somewhere. 
[09:39:06] <halfak> Now can I reproduce on my localhost in ORES?
[09:49:01] <halfak> Aha!  So, the problem persists on my local machine too (ubuntu 16.04)
[09:49:45] <halfak> I need to kill -9 the process to get it to stop

So, now the trick is to figure out how the text can work and not work at the same time. It seems like celery (maybe?) is doing some memory gymnastics that are causing this to be totally un-parseable.

I tried simply pickling and un-pickling the text, but that doesn't seem to cause an issue.

I've shown that the timeout still works with the same revision in mwparserfromhell==0.5.0 So the problem seems to exist with mwparserfromhell==0.5.1

I can reproduce it with the version downloaded from wikipedia:

$ wget "https://es.wikipedia.org/w/index.php?title=Usuario:Danielalfredo/Taller&oldid=111186880&action=raw" -O 111186880.txt

Plus the code of P7674 which is only a tiny variation from the original P7672.

Maybe when moving the wikitext from the wikipedia revision to a local file something 'fixed' it?

$ md5sum 111186880.txt 
e306383a70dae20a0d6451a619ee5af2  111186880.txt

OK we have a good working demo script and an issue filed against mwparserfromhell. See https://github.com/earwig/mwparserfromhell/issues/206

Turns out version 0.5.0 and 0.5.1 don't matter. I was able to get the timeout to work in 0.5.0 because the compilation was failing and I was using a pure-python version of mwparserfromhell. When solving the compiling issue, it still refused to be interrupted.

OK so my proposal is that we deploy the python-only build of mwparserfromhell so that we can timeout effectively. I've run a basic performance test using the following code on Barack Obama -- a very large and difficult to parse article.

import time

import mwapi
from mwparserfromhell import parse

result = mwapi.Session("https://en.wikipedia.org", user_agent="mwp test").get(
    action="query", prop="revisions", revids=863662486, rvprop=["content"],
    rvslots=["main"], formatversion=2)
wikitext = result['query']['pages'][0]['revisions'][0]['slots']['main']['content']


start = time.time()
for i in range(50):
    parse(wikitext)
print("Parsing took {0} seconds on average.".format((time.time() - start) / 50))

With the binary C parser:

$ python demo_parser_performance.py 
Parsing took 0.16366254806518554 seconds on average.

With the pure python parser:

$ python demo_parser_performance.py 
Parsing took 0.3874843120574951 seconds on average.

So we're about 2x slower. I think that is acceptable when trading for robustness until this bug is fixed.

I can reproduce it with the version downloaded from wikipedia:

$ wget "https://es.wikipedia.org/w/index.php?title=Usuario:Danielalfredo/Taller&oldid=111186880&action=raw" -O 111186880.txt

Plus the code of P7674 which is only a tiny variation from the original P7672.

Maybe when moving the wikitext from the wikipedia revision to a local file something 'fixed' it?

$ md5sum 111186880.txt 
e306383a70dae20a0d6451a619ee5af2  111186880.txt

Thanks for that. I did figure out why I could not reproduce it before. My locally downloaded version of the revision had a single newline at the end of file.

--- 111186880.txt	2018-10-12 21:19:24.246644446 +0000
+++ rev111186880	2018-10-12 09:19:27.321432415 +0000
@@ -20597,4 +20597,4 @@
 | '''{{bandera2|Montenegro}}'''
 | '''{{bandera2|San Marino}}'''
 | '''{{bandera2|Australia}}'''
-|-
\ No newline at end of file
+|-

I have no idea why that newline makes it work, but here it is.

OK so my proposal is that we deploy the python-only build of mwparserfromhell so that we can timeout effectively. I've run a basic performance test using the following code on Barack Obama -- a very large and difficult to parse article.

import time

import mwapi
from mwparserfromhell import parse

result = mwapi.Session("https://en.wikipedia.org", user_agent="mwp test").get(
    action="query", prop="revisions", revids=863662486, rvprop=["content"],
    rvslots=["main"], formatversion=2)
wikitext = result['query']['pages'][0]['revisions'][0]['slots']['main']['content']


start = time.time()
for i in range(50):
    parse(wikitext)
print("Parsing took {0} seconds on average.".format((time.time() - start) / 50))

With the binary C parser:

$ python demo_parser_performance.py 
Parsing took 0.16366254806518554 seconds on average.

With the pure python parser:

$ python demo_parser_performance.py 
Parsing took 0.3874843120574951 seconds on average.

So we're about 2x slower. I think that is acceptable when trading for robustness until this bug is fixed.

Overall probably yes, but keep in mind this specific article has revisions that go into the 2+ secs of parsing time with mwparserfromhell. Sure, it may very well be an edge case, just keep it in mind.

python3 lala.py 'rev1*'
revision: rev110083490, seconds 2.119253635406494
revision: rev110266010, seconds 2.717501401901245
revision: rev109308712, seconds 1.0890047550201416
revision: rev110241454, seconds 2.605189323425293
revision: rev109663898, seconds 1.7907164096832275
revision: rev109181061, seconds 1.0200517177581787
revision: rev109070247, seconds 1.0225756168365479
revision: rev110530543, seconds 2.9767608642578125
revision: rev109273505, seconds 1.1006214618682861
revision: rev110335113, seconds 2.7629189491271973
revision: rev109005759, seconds 0.916459321975708
revision: rev109204713, seconds 1.0664558410644531
revision: rev109110956, seconds 1.0236058235168457
revision: rev109096194, seconds 1.0882329940795898
revision: rev110550913, seconds 2.8583431243896484
revision: rev109504701, seconds 1.1853852272033691
revision: rev110083820, seconds 2.0748698711395264
revision: rev109665173, seconds 1.7461316585540771
revision: rev109235455, seconds 1.0167853832244873
revision: rev110105440, seconds 2.2856404781341553
revision: rev110265710, seconds 2.5583033561706543
revision: rev110310626, seconds 2.757080316543579
revision: rev109111816, seconds 1.0812571048736572
revision: rev109265752, seconds 1.1079812049865723
revision: rev110084923, seconds 2.384281635284424
revision: rev109099274, seconds 1.000997543334961
revision: rev109104295, seconds 0.9784274101257324
revision: rev109006184, seconds 0.8562686443328857
revision: rev109988094, seconds 2.334792137145996
revision: rev109071346, seconds 0.879063606262207
revision: rev109082529, seconds 0.9722130298614502

Also note that the wikitext of this page ranges in the 600k bytes whereas the wikitext for the Barack Obama page is ~300k

awight claimed this task.
awight moved this task from Backlog to Completed on the Machine-Learning-Team (Active Tasks) board.

/me gives a standing ovation

This is fixed!

Halfak moved this task from Completed to Backlog on the Machine-Learning-Team (Active Tasks) board.

We're still waiting on a release of mwparsefromhell to allow us to interrupt long jobs like this. If this revision gets scored again, it's going to lock up a worker.

Change 477310 had a related patch set uploaded (by Halfak; owner: Halfak):
[research/ores/wheels@master] Bumps mwparserfromhell to 0.5.2

https://gerrit.wikimedia.org/r/477310