Evaluate and document performance of one or two token transformers in node.js vs PHP
Closed, Resolved, Public

Description

In order to get a handle on the performance implications of porting Parsoid to PHP, we need to evaluate the different components of Parsoid.
One of these components, in the wt -> html direction, is the set of token transformers.
As part of T186912: Make token transform handlers unit testable, @Sbailey has implemented a Mock Token Transformer that lets us run token transformers individually by feeding them input tokens. This lets us evaluate both the correctness and the performance of individual ported token transformers.

As part of this task, we should port the mock ttm code as well as one or two token transformers, and evaluate and document the performance of the PHP version compared to the node.js version.

QuoteTransformer is the simplest transformer, and so the obvious one to port first.
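To make the idea of "feeding input tokens to a single transformer and timing it" concrete, here is a minimal sketch in node.js. It is not the Mock Token Transformer from T186912: `upperCaseHandler` and `runMockTransform` are hypothetical names, and the handler is a trivial stand-in rather than a real Parsoid transformer.

```javascript
'use strict';

// Hypothetical stand-in handler (NOT Parsoid's QuoteTransformer):
// upper-cases string tokens and passes every other token through.
function upperCaseHandler(token) {
  return typeof token === 'string' ? token.toUpperCase() : token;
}

// Feed every input token to the handler, `iterations` times, and return
// the output of the last run together with the elapsed wall-clock time.
function runMockTransform(handler, tokens, iterations) {
  let output = [];
  const start = process.hrtime();
  for (let i = 0; i < iterations; i++) {
    output = tokens.map((token) => handler(token));
  }
  const [sec, nsec] = process.hrtime(start);
  return { output, elapsedMs: sec * 1e3 + nsec / 1e6 };
}

const result = runMockTransform(upperCaseHandler, ['a', { type: 'nl' }, 'b'], 1000);
console.log(result.output, result.elapsedMs.toFixed(3) + ' ms');
```

Running the same token stream through the JS and PHP versions of a handler and comparing both the output and the elapsed time is the kind of comparison this task asks for.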

Event Timeline

ssastry triaged this task as High priority. Sep 17 2018, 8:27 PM

See the numbers below. The current PHP port of the transformer code takes roughly 1.6-1.8x as long as the Node.js version. T205337: Extract and use a token transformation interface (API) in place of custom token handlers is potentially related.

--------------------------------------------------------------------
      Page            Transformer     # iters        PHP        JS  
--------------------------------------------------------------------
enwiki:Skating        Quote             5000         792       471  
enwiki:Skating        Paragraph         5000        5779      3706  
                                                 -------------------
Time for both transforms (1 iteration):             1.31      0.84  
                                                 -------------------
enwiki:Hampi          Quote             1000        6936      3692  
enwiki:Hampi          Paragraph         1000       36334     20576  
                                                 -------------------
Time for both transforms (1 iteration):            43.27     24.27  
                                                 -------------------
enwiki:Barack_Obama   Quote              100        4609      2519  
enwiki:Barack_Obama   Paragraph          100       10226      5864  
                                                 -------------------
Time for both transforms (1 iteration):           148.35     83.83  
--------------------------------------------------------------------
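For anyone who wants to re-derive the "Time for both transforms (1 iteration)" rows and the PHP/JS ratio, the arithmetic can be sketched as follows. The numbers are copied from the table; the `rows` structure and the `perIteration` helper are only illustrative names.

```javascript
'use strict';

// Times in ms, copied from the table above.
const rows = {
  'enwiki:Skating': {
    iters: 5000,
    quote: { php: 792, js: 471 },
    paragraph: { php: 5779, js: 3706 },
  },
  'enwiki:Hampi': {
    iters: 1000,
    quote: { php: 6936, js: 3692 },
    paragraph: { php: 36334, js: 20576 },
  },
  'enwiki:Barack_Obama': {
    iters: 100,
    quote: { php: 4609, js: 2519 },
    paragraph: { php: 10226, js: 5864 },
  },
};

// Time for both transforms in a single iteration, in ms.
function perIteration(page, lang) {
  return (page.quote[lang] + page.paragraph[lang]) / page.iters;
}

for (const name of Object.keys(rows)) {
  const php = perIteration(rows[name], 'php');
  const js = perIteration(rows[name], 'js');
  console.log(
    `${name}: PHP ${php.toFixed(2)} ms, JS ${js.toFixed(2)} ms, ` +
    `PHP/JS = ${(php / js).toFixed(2)}`
  );
}
```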

NOTES:
0. Node is v6.12.0; PHP is v7.2.10

1. Time reported in ms

2. Transformer time includes
   (a) time spent in handlers
   (b) time spent adding/removing/getting transformers
   (c) time spent in the loop dispatching tokens

   So, looking at the raw numbers below (with "Total time processing tokens" as the total time),
   * in JS,  (b) = 0.50 - 0.65 of total time (get = ~0.5)
   * in PHP, (b) = 0.25 - 0.40 of total time (get = ~0.25)

3. Token transformers + token transformer manager account for roughly 30% of total parse time of a page.
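The (b) fractions in note 2 can be recomputed from the raw numbers. The sketch below assumes that "total time" in note 2 means the "Total time processing tokens" figure, which is the reading that reproduces the quoted ranges; `overheadShare` is an illustrative helper name. The two sample runs are the QuoteTransformer runs on quote-Skating.txt (5000 iterations).

```javascript
'use strict';

// Raw timings in ms for QuoteTransformer on quote-Skating.txt,
// copied from the JS and PHP raw numbers.
const runs = {
  js: { processing: 471.966, adding: 43.700, removing: 25.993, getting: 234.086 },
  php: { processing: 792.462, adding: 110.241, removing: 39.137, getting: 167.519 },
};

// Fraction of the token-processing time spent managing transformers.
// ASSUMPTION: the denominator is "Total time processing tokens".
function overheadShare(run) {
  const overhead = run.adding + run.removing + run.getting;
  return {
    all: overhead / run.processing, // add + remove + get, i.e. (b)
    get: run.getting / run.processing, // "get" alone
  };
}

for (const lang of Object.keys(runs)) {
  const share = overheadShare(runs[lang]);
  console.log(`${lang}: (b) = ${share.all.toFixed(2)}, get = ${share.get.toFixed(2)}`);
}
```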

JS: Raw numbers
---------------
[subbu@earth:~/work/wmf/parsoid] bin/transformTests.js --iterationCount 5000 --timingMode --transformer ParagraphWrapper --inputFile tests/transform/paragraph-Skating.txt
Total transformer execution time = 10980.067 milliseconds
Total time processing tokens     = 3705.853 milliseconds
Total time adding transformers   = 0.457 milliseconds
Total time removing transformers = 0.000 milliseconds
Total time getting transformers  = 1928.844 milliseconds

[subbu@earth:~/work/wmf/parsoid] bin/transformTests.js --iterationCount 100 --timingMode --transformer ParagraphWrapper --inputFile tests/transform/paragraph-Barack_Obama.txt
Total transformer execution time = 19011.445 milliseconds
Total time processing tokens     = 5864.195 milliseconds
Total time adding transformers   = 0.446 milliseconds
Total time removing transformers = 0.000 milliseconds
Total time getting transformers  = 2910.246 milliseconds

[subbu@earth:~/work/wmf/parsoid] bin/transformTests.js --iterationCount 5000 --timingMode --transformer QuoteTransformer --inputFile tests/transform/quote-Skating.txt
Total transformer execution time = 1052.140 milliseconds
Total time processing tokens     = 471.966 milliseconds
Total time adding transformers   = 43.700 milliseconds
Total time removing transformers = 25.993 milliseconds
Total time getting transformers  = 234.086 milliseconds

[subbu@earth:~/work/wmf/parsoid] bin/transformTests.js --iterationCount 100 --timingMode --transformer QuoteTransformer --inputFile tests/transform/quote-Barack_Obama.txt
Total transformer execution time = 8277.082 milliseconds
Total time processing tokens     = 2519.165 milliseconds
Total time adding transformers   = 204.784 milliseconds
Total time removing transformers = 115.361 milliseconds
Total time getting transformers  = 1383.633 milliseconds

[subbu@earth:~/work/wmf/parsoid] bin/transformTests.js --iterationCount 1000 --timingMode --transformer QuoteTransformer --inputFile /tmp/quote-Hampi.txt
Total transformer execution time = 12385.282 milliseconds
Total time processing tokens     = 3692.281 milliseconds
Total time adding transformers   = 323.272 milliseconds
Total time removing transformers = 202.490 milliseconds
Total time getting transformers  = 1935.719 milliseconds

[subbu@earth:~/work/wmf/parsoid] bin/transformTests.js --iterationCount 1000 --timingMode --transformer ParagraphWrapper --inputFile /tmp/paragraph-Hampi.txt
Total transformer execution time = 68874.998 milliseconds
Total time processing tokens     = 20576.563 milliseconds
Total time adding transformers   = 0.779 milliseconds
Total time removing transformers = 0.000 milliseconds
Total time getting transformers  = 10181.213 milliseconds


PHP: Raw numbers
----------------
[subbu@earth:~/work/wmf/parsoid/php] php bin/transformTests.php --iterationCount 5000 --timingMode --ParagraphWrapper --inputFile ../tests/transform/paragraph-Skating.txt
Total transformer execution time = 9946.4101791382 milliseconds
Total time processing tokens     = 5778.853 milliseconds
Total time adding transformers   = 0.032 milliseconds
Total time removing transformers = 0 milliseconds
Total time getting transformers  = 1416.088 milliseconds

[subbu@earth:~/work/wmf/parsoid/php] php bin/transformTests.php --iterationCount 100 --timingMode --ParagraphWrapper --inputFile ../tests/transform/paragraph-Barack_Obama.txt
Total transformer execution time = 18400.49290657 milliseconds
Total time processing tokens     = 10225.929 milliseconds
Total time adding transformers   = 0.013 milliseconds
Total time removing transformers = 0 milliseconds
Total time getting transformers  = 2494.77 milliseconds

[subbu@earth:~/work/wmf/parsoid/php] php bin/transformTests.php --iterationCount 5000 --timingMode --QuoteTransformer --inputFile ../tests/transform/quote-Skating.txt
Total transformer execution time = 1162.8670692444 milliseconds
Total time processing tokens     = 792.462 milliseconds
Total time adding transformers   = 110.241 milliseconds
Total time removing transformers = 39.137 milliseconds
Total time getting transformers  = 167.519 milliseconds

[subbu@earth:~/work/wmf/parsoid/php] php bin/transformTests.php --iterationCount 100 --timingMode --QuoteTransformer --inputFile ../tests/transform/quote-Barack_Obama.txt
Total transformer execution time = 7981.4829826355 milliseconds
Total time processing tokens     = 4609.114 milliseconds
Total time adding transformers   = 481.269 milliseconds
Total time removing transformers = 166.257 milliseconds
Total time getting transformers  = 1090.096 milliseconds

[subbu@earth:~/work/wmf/parsoid/php] php bin/transformTests.php --iterationCount 1000 --timingMode --QuoteTransformer --inputFile /tmp/quote-Hampi.txt
Total transformer execution time = 11912.426948547 milliseconds
Total time processing tokens     = 6936.152 milliseconds
Total time adding transformers   = 772.509 milliseconds
Total time removing transformers = 273.678 milliseconds
Total time getting transformers  = 1493.668 milliseconds

[subbu@earth:~/work/wmf/parsoid/php] php bin/transformTests.php --iterationCount 1000 --timingMode --ParagraphWrapper --inputFile /tmp/paragraph-Hampi.txt
Total transformer execution time = 65585.317134857 milliseconds
Total time processing tokens     = 36333.778 milliseconds
Total time adding transformers   = 0.013 milliseconds
Total time removing transformers = 0 milliseconds
Total time getting transformers  = 9003.901 milliseconds
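If these raw dumps need to be compared again later, the "Total ... = N milliseconds" lines are regular enough to parse mechanically. A rough sketch, assuming the output format stays exactly as shown above; `parseTimings` is a hypothetical helper, not part of transformTests.js or transformTests.php.

```javascript
'use strict';

// Parse lines such as
//   "Total time getting transformers  = 1416.088 milliseconds"
// into an object mapping the label (without "Total ") to milliseconds.
function parseTimings(text) {
  const timings = {};
  const re = /^Total (.+?)\s*=\s*([\d.]+) milliseconds$/gm;
  let match;
  while ((match = re.exec(text)) !== null) {
    timings[match[1]] = parseFloat(match[2]);
  }
  return timings;
}

// Sample input: the PHP ParagraphWrapper run on paragraph-Skating.txt.
const sample = [
  'Total transformer execution time = 9946.4101791382 milliseconds',
  'Total time processing tokens     = 5778.853 milliseconds',
  'Total time adding transformers   = 0.032 milliseconds',
  'Total time removing transformers = 0 milliseconds',
  'Total time getting transformers  = 1416.088 milliseconds',
].join('\n');

const parsed = parseTimings(sample);
console.log(parsed);
```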

ssastry added subscribers: tstarling, Anomie, Legoktm.

@tstarling, @Anomie, @Legoktm: FYI. Closing this task for now, but we can discuss these numbers and review the code next quarter.