Page MenuHomePhabricator

Heartbeat timeout on User:Pdebee
Closed, ResolvedPublic

Description

Pasting from email to get it out of my inbox,

node bin/roundtrip-test.js --prefix enwiki "User:Pdebee"
Ok, that's a separate issue.  I didn't set a worker_heartbeat_timeout
in my little service wrapper, so it's using the default.

Since we're supplying a parsoidURL in rt, it didn't come up.

https://github.com/wikimedia/parsoid/blob/master/tests/serviceWrapper.js#L83-L89


What's really going on here though is that there's a huge <include>
on that page (starting around {{user citing sources}})
and we recursively parse the contents, but this time it's synchronous,

https://github.com/wikimedia/parsoid/blob/master/lib/wt2html/pegTokenizer.pegjs.txt#L1084

and that does lock up the process enough to prevent the heartbeat,
at least locally.  This still seems somewhat unsatisfying in that
heartbeats are supposed to be 3 mins, so I'll probably look a little
further tomorrow.

I want to say that the solution to this is to treat <noinlcudeonly>
as extension tags (like https://gerrit.wikimedia.org/r/#/c/281076/)
and tokenize them async in the extension handler.  But that's just
an initial thought ...
A stopgap measure could be to put limit on the wikitext length
we're willing to parse synchronously.

Event Timeline

Arlolra triaged this task as Medium priority.Aug 19 2016, 2:39 AM

This page parses just fine now. But, the output is not all great in terms of RT diffs. But, looks like most of the time spend parsing this page is in the PEG.

[subbu@earth parsoid] parse.js --trace time --dump wt2html:limits --page "User:Pdebee" --lint < /dev/null > /tmp/pdebee.html
trace/time                    | Starting parse at  1509996507163
  trace/time/dompp            |   TOP; start=1509996545062
  trace/time/dompp            |   TOP; dpLoader                       start
  trace/time/dompp            |   TOP; dpLoader                       end; time = 901
  trace/time/dompp            |   TOP; markFosteredContent            start
  trace/time/dompp            |   TOP; markFosteredContent            end; time = 63
  trace/time/dompp            |   TOP; handleUnbalancedTables         start
  trace/time/dompp            |   TOP; handleUnbalancedTables         end; time = 51
  trace/time/dompp            |   TOP; markTreeBuilderFixups          start
  trace/time/dompp            |   TOP; markTreeBuilderFixups          end; time = 174
  trace/time/dompp            |   TOP; normalize                      start
  trace/time/dompp            |   TOP; normalize                      end; time = 41
  trace/time/dompp            |   TOP; cleanupFormattingTagFixup      start
  trace/time/dompp            |   TOP; cleanupFormattingTagFixup      end; time = 17
  trace/time/dompp            |   TOP; migrateTemplateMarkerMetas     start
  trace/time/dompp            |   TOP; migrateTemplateMarkerMetas     end; time = 14
  trace/time/dompp            |   TOP; handlePres                     start
  trace/time/dompp            |   TOP; handlePres                     end; time = 20
  trace/time/dompp            |   TOP; migrateTrailingNLs             start
  trace/time/dompp            |   TOP; migrateTrailingNLs             end; time = 177
  trace/time/dompp            |   TOP; computeDSR                     start
  trace/time/dompp            |   TOP; computeDSR                     end; time = 145
  trace/time/dompp            |   TOP; wrapTemplates                  start
  trace/time/dompp            |   TOP; wrapTemplates                  end; time = 59
  trace/time/dompp            |   TOP; linkNbrs+unpackDOMFragments    start
  trace/time/dompp            |   TOP; linkNbrs+unpackDOMFragments    end; time = 48
  trace/time/dompp            |   TOP; cite                           start
  trace/time/dompp            |   TOP; cite                           end; time = 31
  trace/time/dompp            |   TOP; handleEmptyElts                start
  trace/time/dompp            |   TOP; handleEmptyElts                end; time = 27
  trace/time/dompp            |   TOP; linter                         start
  trace/time/dompp            |   TOP; linter                         end; time = 98
  trace/time/dompp            |   TOP; stripMarkers+(li+table)Fixups  start
  trace/time/dompp            |   TOP; stripMarkers+(li+table)Fixups  end; time = 16
  trace/time/dompp            |   TOP; cleanupAndSaveDP               start
  trace/time/dompp            |   TOP; cleanupAndSaveDP               end; time = 152
  trace/time/dompp            |   TOP; end=1509996547107; time = 2045
trace/time                    | Finished parse at  1509996547119
-------------------------------------------------------
Recorded times (in ms) for sync token transformations
-------------------------------------------------------
              TOTAL PARSE TIME: 39956
           TOTAL PROFILED TIME: 32908
                           PEG: 26369
                 DOMPasses:TOP: 2034
             HTML5 TreeBuilder: 888
             Setup Environment: 810
                       SyncTTM: 644
      Pre-parse (source fetch): 502
            AsyncTTM (Partial): 290
    WikiLinkHandler:onWikiLink: 289
              buildDOMFragment: 153
     AttributeExpander:onToken: 149
        SanitizerHandler:onAny: 144
        ParagraphWrapper:onAny: 126
              DOMPasses:NESTED: 70
                          Init: 69
      TokenStreamPatcher:onAny: 45
    TemplateHandler:onTemplate: 36
              PreHandler:onAny: 36
        ListHandler:onListItem: 33
 TemplateHandler:onTemplateArg: 33
    ParagraphWrapper:onNewLine: 30
    QuoteTransformer:onNewLine: 28
 ExternalLinkHandler:onExtLink: 27
      OnlyInclude:onAnyInclude: 26
             ListHandler:onAny: 23
          PreHandler:onNewline: 20
      QuoteTransformer:onQuote: 14
             ListHandler:onEnd: 3
QuoteTransformer:onNewLine:end: 3
        QuoteTransformer:onAny: 3
  ExtensionHandler:onExtension: 2
  TokenStreamPatcher:onNewline: 2
     ExternalLinkHandler:onEnd: 2
    TokenCollector:_onAnyToken: 2
              PreHandler:onEnd: 1
TokenCollector:_onDelimiterToken: 1
TokenCollector:_onDelimiterToken:end: 1
      TokenStreamPatcher:onEnd: 0
        ParagraphWrapper:onEnd: 0
BehaviorSwitchHandler:onBehaviorSwitch: 0
-------------------------------------------------------
-------------------- Used resources -------------------
                      # tokens: 35985 / 1000000
                   # listItems: 3248 / 30000
                  # tableCells: 2147 / 30000
               # transclusions: 660 / 10000
                      # images: 0 / 1000
                  wikitextSize: 176943 / 1000000
                     HTML Size: 1003884
-------------------------------------------------------

Looks like parse of the serialize wt does lock up the parse.

[subbu@earth parsoid] node bin/roundtrip-test.js --prefix enwiki "User:Pdebee"
{"name":"parsoid","hostname":"earth","pid":32620,"level":30,"levelPath":"info/service-runner","msg":"master(32620) initializing 1 workers","time":"2017-11-06T19:29:53.600Z","v":0}
{"name":"/home/subbu/work/wmf/parsoid/lib/index.js","hostname":"earth","pid":32630,"level":30,"levelPath":"info","msg":"loading ...","time":"2017-11-06T19:29:54.544Z","v":0}
{"name":"/home/subbu/work/wmf/parsoid/lib/index.js","hostname":"earth","pid":32630,"level":30,"levelPath":"info","msg":"ready on localhost:39725","time":"2017-11-06T19:29:54.574Z","v":0}
{"name":"parsoid","hostname":"earth","pid":32620,"level":40,"levelPath":"warn/service-runner","msg":"Startup finished","time":"2017-11-06T19:29:54.575Z","v":0}
{"name":"/home/subbu/work/wmf/parsoid/lib/index.js","hostname":"earth","pid":32630,"level":30,"logType":"info","wiki":"enwiki","title":"User:Pdebee","oldId":808980326,"reqId":null,"userAgent":null,"msg":"redirecting to revision 808980326 for wikitext","longMsg":"redirecting to revision\n808980326\nfor\nwikitext","levelPath":"info","time":"2017-11-06T19:29:56.133Z","v":0}
{"name":"/home/subbu/work/wmf/parsoid/lib/index.js","hostname":"earth","pid":32630,"level":30,"logType":"info","wiki":"enwiki","title":"User:Pdebee","oldId":null,"reqId":null,"userAgent":null,"msg":"started parsing","longMsg":"started parsing","levelPath":"info","time":"2017-11-06T19:29:56.661Z","v":0}
{"name":"/home/subbu/work/wmf/parsoid/lib/index.js","hostname":"earth","pid":32630,"level":30,"logType":"info","wiki":"enwiki","title":"User:Pdebee","oldId":null,"reqId":null,"userAgent":null,"msg":"completed parsing in 37381 ms","longMsg":"completed parsing in 37381 ms","levelPath":"info","time":"2017-11-06T19:30:34.035Z","v":0}
{"name":"/home/subbu/work/wmf/parsoid/lib/index.js","hostname":"earth","pid":32630,"level":30,"logType":"info","wiki":"enwiki","title":"User:Pdebee","oldId":null,"reqId":null,"userAgent":null,"msg":"started serializing","longMsg":"started serializing","levelPath":"info","time":"2017-11-06T19:30:34.142Z","v":0}
{"name":"/home/subbu/work/wmf/parsoid/lib/index.js","hostname":"earth","pid":32630,"level":30,"logType":"info","wiki":"enwiki","title":"User:Pdebee","oldId":null,"reqId":null,"userAgent":null,"msg":"completed serializing in 1249 ms","longMsg":"completed serializing in 1249 ms","levelPath":"info","time":"2017-11-06T19:30:35.375Z","v":0}
{"name":"/home/subbu/work/wmf/parsoid/lib/index.js","hostname":"earth","pid":32630,"level":30,"logType":"info","wiki":"enwiki","title":"User:Pdebee","oldId":null,"reqId":null,"userAgent":null,"msg":"started parsing","longMsg":"started parsing","levelPath":"info","time":"2017-11-06T19:30:35.883Z","v":0}
{"name":"parsoid","hostname":"earth","pid":32620,"level":50,"message":"worker 32630 stopped sending heartbeats, killing.","status":["enwiki/User:Pdebee"],"levelPath":"error/service-runner/master","msg":"worker 32630 stopped sending heartbeats, killing.","time":"2017-11-06T19:32:54.584Z","v":0}

The quote in the description says that we're locked up because <noinlcudeonly> content gets tokenized synchronously and this page has a huge block of it.

Change 398189 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Template syntax w/ empty link target shouldn't be considered broken

https://gerrit.wikimedia.org/r/398189

Change 398171 abandoned by Arlolra:
[WIP] Tokenize noincludeonly content asynchronously

Reason:
For now

https://gerrit.wikimedia.org/r/398171

Change 398189 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Template syntax w/ empty link target shouldn't be considered broken

https://gerrit.wikimedia.org/r/398189

With that merged patch, the page parses in ~22s and the render seems reasonable now. The roundtrip test also works for me.

The abandoned patch for async tokenization of noincludeonly content also works, but there're many sync calls to tokenization in the codebase that may depend on having tokens for that content, so a lot more work is required for what now seems unnecessary. Will pick that back up as required.