As discussed with @GWicke we want to measure how long saving a page with MathML takes compared to the png rendering baseline.
We should make sure that MathML performs at least as well as the older PNG render mode.
As discussed with @GWicke we want to measure how long saving a page with MathML takes compared to the png rendering baseline.
We should make sure that MathML performs at least as well as the older PNG render mode.
| Status | Subtype | Assigned | Task | ||
|---|---|---|---|---|---|
| Open | None | T96152 Problems with Math in section titles | |||
| Open | None | T28213 Strip marker (UNIQ / QINU) issues (tracking) | |||
| Open | None | T94344 Section anchor depends on the rendering mode | |||
| Declined | Physikerwelt | T109195 Install texlive-extra-utils on mw appservers | |||
| Duplicate | None | T129107 Change png format for mathematical formulae to be more accessible | |||
| Duplicate | None | T131190 Set MathML / SVG as the default rendering mode on de.wikibooks.org | |||
| Resolved | • mobrovac | T131177 Enable MathML by default | |||
| Resolved | Physikerwelt | T78046 Implement a single math rendering mode that works across all devices (MathML / SVG using mathoid) | |||
| Resolved | • mobrovac | T132096 Improve save / preview performance of MathML math rendering mode |
On my local server
https://en.formulasearchengine.com/w/index.php?title=Fourier_transform
it takes 4min 32 seconds for the first preview of the page in png rendering mode.
The second preview took 31.44 seconds and the third one 19.1 seconds.
In mathml mode it takes 2.95 seconds to visit the page (probably it was cached).
Preview takes 24.6 s, 26.0s for the second and 25.3 for the third rendering.
In production it was around 4seconds for png and 15 for MathML.
That's what the profiler says for preview with MathML mode
100.00% 27389.668 1 - main()
99.94% 27372.931 1 - MediaWiki::run
99.94% 27372.186 1 - MediaWiki::main
99.17% 27163.596 1 - MediaWiki::performRequest
99.11% 27146.380 1 - MediaWiki::performAction
99.11% 27146.118 1 - SubmitAction::show
99.11% 27146.103 1 - EditAction::show
99.11% 27145.270 1 - EditPage::edit
99.01% 27119.345 1 - EditPage::showEditForm
97.60% 26731.913 1 - EditPage::getPreviewText
96.90% 26540.836 35 - Parser::parse
96.50% 26431.841 1 - AbstractContent::getParserOutput
96.50% 26431.103 1 - WikitextContent::fillParserOutput
87.55% 23980.399 1585 - call_user_func_array
79.42% 21753.053 35 - Parser::internalParseHalfParsed
76.21% 20874.431 5774 - Hooks::run
75.84% 20771.678 35 - MathHooks::onParserBeforeTidy
58.54% 16035.220 393 - VirtualRESTServiceClient::runMulti
58.37% 15988.661 393 - MultiHttpClient::runMulti
55.73% 15264.949 1472 - call_user_func_array@1
50.40% 13805.195 392 - MathHooks::mathPostTagHook
47.85% 13104.690 392 - MathMathML::render
47.68% 13058.360 392 - MathRestbaseInterface::getMathML
47.65% 13052.382 392 - MathRestbaseInterface::getContent
46.40% 12709.705 392 - VirtualRESTServiceClient::run
32.07% 8784.482 1615 - curl_multi_select
24.47% 6702.551 2015 - curl_multi_exec
17.22% 4716.549 35 - Parser::internalParse
15.62% 4279.448 1303 - PPFrame_DOM::expand
14.97% 4101.265 427 - Parser::replaceVariables
12.33% 3376.707 35 - MathRestbaseInterface::batchEvaluate
11.22% 3072.671 247 - Parser::braceSubstitution
11.04% 3022.538 5075 - PPFrame_DOM::expand@1
10.77% 2950.537 5655 - preg_replace
9.93% 2720.452 127 - Parser::braceSubstitution@1
8.79% 2408.510 313 - Parser::callParserFunction
7.55% 2069.265 81 - ScribuntoHooks::invokeHook
7.08% 1939.451 81 - Scribunto_LuaModule::invoke
6.89% 1887.962 582 - Scribunto_LuaStandaloneInterpreter::dispatch
6.08% 1664.969 183 - Scribunto_LuaStandaloneInterpreter::callFunction
5.39% 1475.254 545 - PPFrame_DOM::expand@2
5.12% 1402.735 81 - Scribunto_LuaEngine::executeFunctionChunk
4.65% 1274.073 216 - Parser::braceSubstitution@2
4.50% 1233.777 2431 - Scribunto_LuaStandaloneInterpreter::sendMessage
3.54% 968.559 70 - Parser::callParserFunction@1
3.26% 893.665 245 - ExtParserFunctions::ifObj
2.86% 783.217 1040 - Scribunto_LuaStandaloneInterpreter::handleCall
2.79% 764.055 2431 - fwrite
2.75% 754.235 1040 - Scribunto_LuaStandaloneInterpreter::callback
2.54% 694.957 914 - Scribunto_LuaEngine::getInterpreter
2.53% 692.437 915 - Scribunto_LuaStandaloneEngine::load
2.53% 691.828 1146 - PPFrame_DOM::expand@3
2.43% 666.316 1512 - PPTemplateFrame_DOM::getArgument
2.34% 639.972 796 - Parser::extensionSubstitution
2.15% 587.791 587 - Scribunto_LuaEngine::getExpandedArgument
2.11% 577.064 35 - Parser::replaceTransparentTags
2.08% 569.092 938 - Scribunto_LuaStandaloneInterpreter::getStatus
1.93% 529.502 35 - Parser::extractTagsAndParams
1.91% 524.483 673 - Scribunto_LuaStandaloneInterpreter::dispatch@1
1.87% 510.835 532 - PPFrame_DOM::expand@4
1.86% 509.635 26912 - strtr
1.78% 488.283 637 - Parser::argSubstitution
1.75% 478.896 17248 - MultiHttpClient::{closure}
1.73% 472.625 36 - Parser::braceSubstitution@3
1.65% 451.703 4580 - preg_split
1.60% 437.832 392 - MathMathML::getHtmlOutput
1.59% 436.203 1599 - PPTemplateFrame_DOM::getNumberedArgument
1.57% 428.773 2431 - Scribunto_LuaStandaloneInterpreter::encodeMessage
1.54% 420.729 392 - MathHooks::mathTagHook
1.47% 402.373 2431 - Scribunto_LuaStandaloneInterpreter::encodeLuaVar
1.47% 402.226 24 - Parser::braceSubstitution@4
1.41% 385.216 312 - PPFrame_DOM::expand@5
1.36% 371.650 787 - call_user_func_array@2
1.35% 369.996 83 - Scribunto_LuaEngine::executeModule
1.32% 361.851 39 - Parser::formatHeadings
1.26% 345.168 11004 - Scribunto_LuaStandaloneInterpreter::encodeLuaVar@1
1.10% 301.581 3925 - MathHooks::mathConstantToString
1.03% 281.192 393 - MultiHttpClient::__destruct
1.03% 281.048 12 - ExtParserFunctions::switchObj
1.02% 280.144 2431 - Scribunto_LuaStandaloneInterpreter::receiveMessage
1.01% 275.731 393 - curl_multi_close
0.96% 262.037 215 - DatabaseBase::select
0.92% 251.089 23775 - preg_replace_callback
0.89% 244.668 2 - StubObject::_call
0.89% 244.357 519 - Linker::link
0.89% 244.340 1 - WikitextContent::preSaveTransformWe need to figure out why MathML rendering is about 3x slower than the PNG render mode on https://en.wikipedia.org/wiki/Fourier_transform.
One possible area of improvement might be increasing the parallelism of math tag rendering, by bumping up maxConnsPerHost. Each of those fetches should not take more than 100ms, so if they are all performed in parallel, they should not add up to 10+s.
I haven't looked into how math tags are found before sending off batch requests; there might be optimization potential by storing / caching the list / rendering of math tags in a page as a blob, or selecting them from the DB.
@GWicke as you can see from the profiler output it still makes individual request for each formulae.
Also there was a discussion in decomber with @mobrovac https://etherpad.wikimedia.org/p/Mathoid and two unmerged requests
https://gerrit.wikimedia.org/r/#/c/267611/
and
https://gerrit.wikimedia.org/r/#/c/267544/
I'll get back to that later... but I wanted to manifest it here since it came to my mind right now.
@Physikerwelt: Ahh, so I guess the third column in the profile is the number of calls? That would be 393 fomulas, requested sequentially. Assuming about 25ms per request, that works out to ~10s of waiting. Looks like we have the explanation.
To fix, this, we should move to a model where we request all math tags in a single VirtualRESTServiceClient::runMulti call. This might entail collecting all those math tags, and substituting the results in a second pass over the content. My memory of the MediaWiki facilities in that space is a bit foggy, but IIRC there is support for doing something like this with strip markers / strip state.
Edit: A combination of [Parser::insertStripItem()](https://github.com/wikimedia/mediawiki/blob/9b510882d79549bdc6be3b89d10f67d3f6c5da2d/includes/parser/Parser.php#L1005) (to collect the tag) and [the InternalParseBeforeSanitize hook](https://github.com/wikimedia/mediawiki/blob/9ee6461287f5d2d6fae719ec62ae6465e0a3809d/docs/hooks.txt#L1679) (to replace the tags' content based on the curl_multi result) might work.
@GWicke I think it would be optimal if restbase could return the svg-dimensions and the mathml code on the first post request, if the input is valid tex.
Even if you are trying to get the mathml code manually on https://en.wikipedia.org/api/rest_v1/?doc#!/Math/get_media_math_render_format_hash you wonder why you have to make two request.
I remember that @mobrovac was not too much in favor of putting that logic into restbase, but I guess that would be most efficient.
@Physikerwelt: Compared to doing all those requests sequentially, that's a minor optimization. Lets first get from 10s to < 100ms, before worrying about the last dozen ms.
Change 282413 had a related patch set uploaded (by Physikerwelt):
Reduce number of Restbase requests
@GWicke ok. I implemented that. On that server the profiler reported runtime is reduced from 22 to 14 seconds. I guess that the change in restbase would lead to a similar improvement.
See the new profiler output
100.00% 14464.696 1 - main()
99.93% 14454.067 1 - MediaWiki::run
99.92% 14453.243 1 - MediaWiki::main
98.25% 14210.940 1 - MediaWiki::performRequest
98.15% 14197.321 1 - MediaWiki::performAction
98.15% 14197.143 1 - ViewAction::show
98.15% 14197.132 1 - Article::view
97.44% 14093.905 1 - PoolCounterWork::execute
97.43% 14093.451 1 - PoolWorkArticleView::doWork
97.42% 14091.996 3 - Parser::parse
97.38% 14085.483 1 - AbstractContent::getParserOutput
97.37% 14084.051 1 - WikitextContent::fillParserOutput
81.05% 11723.254 1398 - call_user_func_array
59.66% 8629.930 3 - Parser::internalParseHalfParsed
58.11% 8404.905 4597 - Hooks::run
57.56% 8326.251 3 - MathHooks::onParserAfterTidy
40.32% 5832.064 3 - MathRestbaseInterface::batchEvaluate
40.06% 5794.550 2 - VirtualRESTServiceClient::runMulti
40.01% 5787.768 2 - MultiHttpClient::runMulti
37.42% 5412.410 3 - Parser::internalParse
34.30% 4961.303 559 - curl_multi_exec
32.41% 4688.119 1266 - PPFrame_DOM::expand
31.05% 4491.969 387 - Parser::replaceVariables
25.98% 3757.572 124 - Parser::braceSubstitution
24.51% 3545.939 3328 - PPFrame_DOM::expand@1
22.84% 3303.253 127 - Parser::braceSubstitution@1
19.84% 2869.263 304 - Parser::callParserFunction
16.89% 2442.666 81 - ScribuntoHooks::invokeHook
15.89% 2298.879 81 - Scribunto_LuaModule::invoke
15.53% 2246.859 582 - Scribunto_LuaStandaloneInterpreter::dispatch
14.55% 2105.299 1463 - call_user_func_array@1
13.65% 1974.410 183 - Scribunto_LuaStandaloneInterpreter::callFunction
12.22% 1767.588 545 - PPFrame_DOM::expand@2
11.56% 1671.535 81 - Scribunto_LuaEngine::executeFunctionChunk
11.41% 1651.114 4230 - preg_replace
10.55% 1526.472 2431 - Scribunto_LuaStandaloneInterpreter::sendMessage
10.44% 1509.867 216 - Parser::braceSubstitution@2
7.82% 1131.282 70 - Parser::callParserFunction@1
7.22% 1044.462 245 - ExtParserFunctions::ifObj
7.04% 1018.726 2431 - fwrite
6.24% 902.653 1040 - Scribunto_LuaStandaloneInterpreter::handleCall
6.19% 896.022 1146 - PPFrame_DOM::expand@3
6.03% 872.532 1512 - PPTemplateFrame_DOM::getArgument
6.02% 871.054 1040 - Scribunto_LuaStandaloneInterpreter::callback
5.51% 797.432 914 - Scribunto_LuaEngine::getInterpreter
5.49% 794.649 915 - Scribunto_LuaStandaloneEngine::load
4.82% 697.356 587 - Scribunto_LuaEngine::getExpandedArgument
4.70% 679.560 938 - Scribunto_LuaStandaloneInterpreter::getStatus
4.52% 654.111 532 - PPFrame_DOM::expand@4
4.50% 650.679 637 - Parser::argSubstitution
4.33% 626.853 673 - Scribunto_LuaStandaloneInterpreter::dispatch@1
4.15% 599.897 36 - Parser::braceSubstitution@3
4.09% 590.917 1599 - PPTemplateFrame_DOM::getNumberedArgument
3.81% 551.477 543 - curl_multi_select
3.49% 504.537 24 - Parser::braceSubstitution@4
3.28% 474.969 312 - PPFrame_DOM::expand@5
3.16% 456.937 403 - Parser::extensionSubstitution
3.15% 455.715 2431 - Scribunto_LuaStandaloneInterpreter::encodeMessage
3.13% 452.398 83 - Scribunto_LuaEngine::executeModule
3.09% 447.480 25642 - strtr
3.05% 440.456 395 - call_user_func_array@2
2.94% 425.235 392 - MathHooks::mathPostTagHook
2.94% 424.949 2431 - Scribunto_LuaStandaloneInterpreter::encodeLuaVar
2.57% 371.842 7 - Parser::formatHeadings
2.49% 359.590 11004 - Scribunto_LuaStandaloneInterpreter::encodeLuaVar@1
2.24% 324.124 392 - MathHooks::mathTagHook
2.22% 321.299 12 - ExtParserFunctions::switchObj
2.19% 316.093 2431 - Scribunto_LuaStandaloneInterpreter::receiveMessage
2.09% 301.726 150 - Parser::getTemplateDom
2.07% 299.652 20548 - preg_replace_callback
1.99% 287.967 17640 - MultiHttpClient::{closure}
1.93% 279.463 576 - call_user_func
1.89% 273.535 1564 - PPTemplateFrame_DOM::getNamedArgument
1.88% 271.713 37 - Parser::fetchTemplateAndTitle
1.84% 266.516 37 - Parser::statelessFetchTemplate
1.82% 262.746 392 - MathMathML::getHtmlOutput
1.81% 261.413 9 - Sanitizer::removeHTMLtags
1.68% 243.140 133 - DatabaseBase::select
1.66% 240.790 11 - Parser::braceSubstitution@5
1.55% 224.551 1 - OutputPage::output@Physikerwelt, do you see a difference when you allow a greater concurrency (say, 500) for the math requests?
@GWicke do you remember how I can adjust taht in LocalSettings.php
Note, that this test is run on desktop machines at TU Berlin not on real wmf servers.
@Physikerwelt, I'm afraid that there is currently no support for changing the default from LocalSettings. There is support in MultiHTTPClient (both in the constructor and [runMulti](https://github.com/wikimedia/mediawiki/blob/c42f06642091e1cb1142066b0a52a69a0c7e783d/includes/libs/MultiHttpClient.php#L55)), but this is not used in VRS right now. It should not be too hard to add support for forwarding options (or setting them at mount time), though.
As mentioned earlier, the default concurrency is 50, which means that we'd end up with 8 rounds for ~380 math tags. In theory, given sufficient server resources, bumping the concurrency to 200 or 500 should cut that down significantly. Could you try passing in array('maxConnsPerHost' => 500) as the second parameter into runMulti()?
I made some experiements with ab
Document Path: /en.wikipedia.org/v1/media/math/
Document Length: 28 bytes
Concurrency Level: 10
Time taken for tests: 4.483 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 7180000 bytes
HTML transferred: 280000 bytes
Requests per second: 2230.41 [#/sec] (mean)
Time per request: 4.483 [ms] (mean)
Time per request: 0.448 [ms] (mean, across all concurrent requests)
Transfer rate: 1563.90 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 0.5 1 4
Processing: 1 3 1.0 3 20
Waiting: 1 2 0.9 2 19
Total: 2 4 1.1 4 22Document Path: /en.wikipedia.org/v1/media/math/
Document Length: 28 bytes
Concurrency Level: 50
Time taken for tests: 2.845 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 7180000 bytes
HTML transferred: 280000 bytes
Requests per second: 3514.80 [#/sec] (mean)
Time per request: 14.226 [ms] (mean)
Time per request: 0.285 [ms] (mean, across all concurrent requests)
Transfer rate: 2464.48 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 1 5 1.7 5 13
Processing: 2 9 3.4 8 26
Waiting: 1 7 2.8 7 23
Total: 4 14 3.8 14 32
Percentage of the requests served within a certain time (ms)
50% 14
66% 15
75% 17
80% 17
90% 19
95% 21
98% 23
99% 25
100% 32 (longest request)Document Path: /en.wikipedia.org/v1/media/math/
Document Length: 28 bytes
Concurrency Level: 100
Time taken for tests: 2.672 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 7180000 bytes
HTML transferred: 280000 bytes
Requests per second: 3741.89 [#/sec] (mean)
Time per request: 26.724 [ms] (mean)
Time per request: 0.267 [ms] (mean, across all concurrent requests)
Transfer rate: 2623.71 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 2 10 3.4 10 21
Processing: 3 16 5.8 16 41
Waiting: 2 13 4.9 13 39
Total: 6 27 6.4 26 53
Percentage of the requests served within a certain time (ms)
50% 26
66% 29
75% 30
80% 32
90% 35
95% 38
98% 40
99% 43
100% 53 (longest request)Document Path: /en.wikipedia.org/v1/media/math/
Document Length: 28 bytes
Concurrency Level: 200
Time taken for tests: 22.779 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 7180000 bytes
HTML transferred: 280000 bytes
Requests per second: 439.01 [#/sec] (mean)
Time per request: 455.575 [ms] (mean)
Time per request: 2.278 [ms] (mean, across all concurrent requests)
Transfer rate: 307.82 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 5 225 357.5 25 1794
Processing: 6 228 343.7 39 1186
Waiting: 5 214 337.4 30 1181
Total: 18 454 630.2 63 2193
Percentage of the requests served within a certain time (ms)
50% 63
66% 357
75% 735
80% 1014
90% 1812
95% 1917
98% 2005
99% 2015
100% 2193 (longest request)@Physikerwelt, in production, we have a lot more servers with many cores / workers each handling this in parallel. This means that we can use rather high parallelism without slowing down individual requests.
@GWicke I could not find a significant improvement by playing around with that variable.
Note that the value already has improved quite a bit: Before it was
58.54% 16035.220 393 - VirtualRESTServiceClient::runMulti
after the proposed change it will be
40.06% 5794.550 2 - VirtualRESTServiceClient::runMulti
Let me check how long the page would take on that server without any math...
@Physikerwelt: Yeah, this is already a big step forward. Based on the data we have, it is likely that this patch already brings math processing time to < 1s, which means that the difference vs. PNG mode will be insignificant.
It would of course be nice to be even faster, and I think in production bumping up concurrency and eliminating the need for the second request should get us there.
@Physikerwelt, the best place to specify such per-backend options should be in the VRS setup. I think we can bump this up for RESTBase in general, not just math.
@GWicke even better. Without math https://en.formulasearchengine.com/wiki/Fourier_transform_nomath
the profiler reports 4.2 seconds rather than 13.1 seconds with math. Of that 13.1 seconds 6.0 seconds were spend in MultiHttpClient::runMulti. Which is the majority of the math overhead of 8.9 s.
Change 283262 had a related patch set uploaded (by Mobrovac):
Allow the number of concurrent connections to be configurable
Change 283269 had a related patch set uploaded (by Mobrovac):
Math: increase the number of concurrent connections to 150
Gerrit 283262 allows us to configure the number of concurrent connections made by the Math extension to RESTBase, while Gerrit 283269 bumps that number in prod from 50 to 150.
I think that with these two changes we can consider this issue effectively resolved, as they allow us to further fine-tune the number of connections at a later time in an easy matter.
Change 283262 merged by jenkins-bot:
Allow the number of concurrent connections to be configurable
To predict the result of the config change I did an experiment here
http://en.wikipedia.beta.wmflabs.org/w/index.php?title=Math25&action=submit
With png it takes about 1.4 s with mathml about 2.5 s
However, in total I would say this is expected. We are now sending two consequitve http requests. In the png rendering mode, we get the outputhash (16 byte) based on a 16 byte inputhash. Moreover, the inputhash is the primary key.
I think that problem should not be solved as a part of the Math exetension. It's more about adding some in memory based cached to the VirtualRESTServiceClient. Given the zipf distribution of the formulae even a small amout of memory could do a lot and for a lot of pages no requests to restbase would have to be sent at all.
Note that privately hosted wikis that used remote restbase instances for math rendering would benefit even more, since they have higher network latency.
Do we need to add the config change to this list
https://wikitech.wikimedia.org/wiki/Deployments
?
Change 283269 merged by jenkins-bot:
Math: increase the number of concurrent connections to 150
The changes are now fully deployed everywhere. The Fourier transform enwiki page now loads in under 8 seconds, down from 30 seconds before Gerrit 283269 got in the mix! This is a big improvement, so resolving the task. We can subsequently change the value of $wgMathConcurrentReqs to tune the load times.
I'm sorry, but to me it looks like performance is still quite poor:
Is this change actually deployed in production?
Edit: These timings are from the Chrome network console, for the HTML request (so excluding rendering delays).
I had a look at the fourier transform page which had 609 formulae.
The most recent version has by far more formulae than the one I had on my test server.
https://en.wikipedia.org/w/index.php?title=User:Physikerwelt/sandbox/609&action=submit
I can preview that page in mathml mode in >7.1s
For the actual formulae on the fourier transform page it takes much longer
https://en.wikipedia.org/w/index.php?title=User:Physikerwelt/sandbox/FT&action=submit
preview with mathml >13.5s
Change 286260 had a related patch set uploaded (by Physikerwelt):
Use the cached version of checkTex to improve performance
Change 286260 merged by jenkins-bot:
Use the cached version of checkTex to improve performance
With the last change deployed on beta the time (now self reported parsing times) has changed as follows:
Fourier_transform on enwiki
CPU time usage 4.028 seconds Real time usage 22.192 seconds
Fourier_transform on beta
CPU time usage 5.642 seconds Real time usage 11.458 seconds
609 test on enwiki
CPU time usage 2.171 seconds Real time usage 10.711 seconds
609 test on beta
CPU time usage 2.693 seconds Real time usage 6.003 seconds
FT test on enwiki
CPU time usage 2.488 seconds Real time usage 13.313 seconds
FT test on beta
CPU time usage 3.259 seconds Real time usage 7.362 seconds
Note that the times vary quite a bit for repeated tests.
Change 286412 had a related patch set uploaded (by Mobrovac):
Use the cached version of checkTex to improve performance
Change 286412 merged by jenkins-bot:
Use the cached version of checkTex to improve performance