Investigate onExtLink performance
Closed, Resolved, Public

Description

Here is a time trace for enwiki:Barack_Obama, run with the --replay option to eliminate I/O effects. Note the onExtLink handler (302 ms over 903 calls, about 0.33 ms per call), which looks like an outlier for what should be a trivial and fast transformation. This is not specific to the Barack_Obama page and can be seen across a number of pages.

-------------------------------------------------------
Recorded times (in ms) for sync token transformations
-------------------------------------------------------
                   TOTAL PARSE TIME:   11477
                TOTAL PROFILED TIME:    8226
                      DOMPasses:TOP:    2043
       ExtensionHandler:onExtension:    1210; count:    584; per-instance: 2.07192   
                  HTML5 TreeBuilder:    1054
                   DOMPasses:NESTED:    1014
                            SyncTTM:     527
                                PEG:     344
                 AsyncTTM (Partial):     343
      ExternalLinkHandler:onExtLink:     302; count:    903; per-instance: 0.33444
             SanitizerHandler:onAny:     254; count:  41515; per-instance: 0.00612   
                   buildDOMFragment:     238; count:   2296; per-instance: 0.10366   
         WikiLinkHandler:onWikiLink:     197; count:   2828; per-instance: 0.06966   
          AttributeExpander:onToken:     168; count:  70875; per-instance: 0.00237   
         TemplateHandler:onTemplate:     114; count:    696; per-instance: 0.16379   
                  Setup Environment:     114
           TokenStreamPatcher:onAny:      55; count:  37261; per-instance: 0.00148   
             ParagraphWrapper:onAny:      54; count:  19932; per-instance: 0.00271   
             QuoteTransformer:onAny:      23; count:  10121; per-instance: 0.00227   
           OnlyInclude:onAnyInclude:      23; count:  25360; per-instance: 0.00091   
                   PreHandler:onAny:      22; count:   2035; per-instance: 0.01081   
                  ListHandler:onAny:      21; count:  13864; per-instance: 0.00151   
           QuoteTransformer:onQuote:      20; count:   1274; per-instance: 0.0157    
             ListHandler:onListItem:      18; count:   1460; per-instance: 0.01233   
     QuoteTransformer:processQuotes:      14; count:    341; per-instance: 0.04106   
               PreHandler:onNewline:      14; count:   2062; per-instance: 0.00679   
           Pre-parse (source fetch):       9
 QuoteTransformer:processQuotes:end:       9; count:    290; per-instance: 0.03103   
                  ListHandler:onEnd:       7; count:    667; per-instance: 0.01049   
      ExternalLinkHandler:onUrlLink:       4; count:      1; per-instance: 4         
             ParagraphWrapper:onEnd:       4; count:      1; per-instance: 4         
           TokenStreamPatcher:onEnd:       2; count:    667; per-instance: 0.003     
BehaviorSwitchHandler:onBehaviorSwitch:       2; count:      1; per-instance: 2         
         ParagraphWrapper:onNewLine:       2; count:   2062; per-instance: 0.00097   
       TokenStreamPatcher:onNewline:       1; count:   2308; per-instance: 0.00043   
          ExternalLinkHandler:onEnd:       0; count:   1355; per-instance: 0         
-------------------------------------------------------
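To put the "outlier" observation in numbers, here is a small sketch that recomputes the per-instance cost (total ms divided by call count) for three handlers, using values copied from the trace above. The choice of onWikiLink and onTemplate as comparison points is an assumption made for illustration, not part of the original report.

```python
# (handler, total_ms, count) copied from the trace above.
rows = [
    ("ExternalLinkHandler:onExtLink", 302, 903),
    ("WikiLinkHandler:onWikiLink", 197, 2828),
    ("TemplateHandler:onTemplate", 114, 696),
]

# Per-instance cost is simply total time divided by number of calls.
per_instance = {name: total / count for name, total, count in rows}

# How much more expensive is one onExtLink call than one onWikiLink call?
ratio = (per_instance["ExternalLinkHandler:onExtLink"]
         / per_instance["WikiLinkHandler:onWikiLink"])

for name, cost in per_instance.items():
    print(f"{name}: {cost:.5f} ms per call")
print(f"onExtLink vs onWikiLink: {ratio:.1f}x")
```

On these figures a single onExtLink call costs roughly 4.8 times as much as an onWikiLink call, and about twice as much as expanding a template.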

Event Timeline

ssastry created this task. Jul 26 2018, 6:52 AM
Restricted Application added a subscriber: Aklapper. Jul 26 2018, 6:52 AM
ssastry triaged this task as Normal priority. Jul 26 2018, 6:53 AM
Arlolra claimed this task. Aug 2 2018, 12:00 AM

Change 449911 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Add a fast path to avoid unnecessarily reparsing extlink href

https://gerrit.wikimedia.org/r/449911

Change 449911 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Add a fast path to avoid unnecessarily retokenizing the extlink href

https://gerrit.wikimedia.org/r/449911

Change 450056 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Test for a valid protocol before attempting to tokenize extlink content

https://gerrit.wikimedia.org/r/450056

The combined effect of these patches is to reduce the total onExtLink time by roughly an order of magnitude.

From

ExternalLinkHandler:onExtLink:     437; count:    907; per-instance: 0.48181

to

ExternalLinkHandler:onExtLink:      41; count:    907; per-instance: 0.0452

Can you report on both those separately?

The first patch (change 449911) alone reduces it to

ExternalLinkHandler:onExtLink:     172; count:    907; per-instance: 0.18964
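The three measurements reported in this task (437 ms before, 172 ms with the first patch, 41 ms with both) can be broken down into per-patch speedups. This is only arithmetic on the reported figures; it assumes the 41 ms run had the second patch applied on top of the first, as the "combined effect" wording suggests.

```python
# Total onExtLink times in ms, as reported in this task (907 calls each).
before_ms = 437
after_first_ms = 172
after_both_ms = 41
calls = 907

first_patch_speedup = before_ms / after_first_ms
second_patch_speedup = after_first_ms / after_both_ms
combined_speedup = before_ms / after_both_ms

print(f"first patch:  {first_patch_speedup:.2f}x")
print(f"second patch: {second_patch_speedup:.2f}x")
print(f"combined:     {combined_speedup:.2f}x")
print(f"per call: {before_ms / calls:.5f} -> "
      f"{after_first_ms / calls:.5f} -> {after_both_ms / calls:.5f} ms")
```

On these figures the first patch gives about 2.5x, the second about 4.2x on top of that, and the two together about 10.7x.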

Change 450056 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Test for a valid protocol before attempting to tokenize extlink content

https://gerrit.wikimedia.org/r/450056

Arlolra closed this task as Resolved. Aug 2 2018, 5:19 PM