Math extension randomly fails in gate-and-submit for Wikibase
Closed, Resolved · Public · 10 Estimated Story Points

Description

This has already happened a couple of times. Re-running the job fixes it, but it's annoying:

11:38:26 -'<img src="05207d8083caa922b443937de1b6b14a4d3a7335" class="mwe-math-fallback-image-inline" aria-hidden="true" style="vertical-align: -0.338ex; width:4.572ex; height:2.176ex;" alt="{\displaystyle \sin{x}}"/>'
11:38:26 +'<strong class="error texerror">Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/": {\displaystyle \sin{x}}</strong>\n
11:38:26 +'
11:38:26 
11:38:26 /workspace/src/extensions/Math/tests/phpunit/MathCoverageTest.php:40
11:38:26 /workspace/src/tests/phpunit/MediaWikiIntegrationTestCase.php:418
11:38:26 /workspace/src/maintenance/doMaintenance.php:99

or

13:16:45 1) MathCoverageTest::testCoverage with data set #30 ('dt, \operatorname{d}\!t, \par...\psi\!', array(), '<img src="68958802ac6c403e4a1...\!}"/>')
13:16:45 === Logs generated by test case
13:16:45 [objectcache] [debug] MainWANObjectCache using store {class} {"class":"EmptyBagOStuff"}
13:16:45 [Math] [debug] Start rendering ${\displaystyle dt, \operatorname{d}\!t, \partial t, \nabla\psi\!}$ in mode png []
13:16:45 [Math] [info] Tex check failed {"post":{"type":"tex","q":"{\\displaystyle dt, \\operatorname{d}\\!t, \\partial t, \\nabla\\psi\\!}"},"error":"(curl error: 28) Timeout was reached","urlparams":"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/check\/tex"}

or

13:13:03 1) MathCoverageTest::testCoverage with data set #4 ('\sqrt{\pi}', array('square root of pi'), '<img src="7ae18ec124928c74818...i}}"/>')
13:13:03 === Logs generated by test case
13:13:03 [objectcache] [debug] MainWANObjectCache using store {class} {"class":"EmptyBagOStuff"}
13:13:03 [Math] [debug] Start rendering ${\displaystyle \sqrt{\pi}}$ in mode png []
13:13:03 [Math] [error] Restbase math server problem {"urlparams":"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/mml\/7ae18ec124928c74818b516e6350ca9610966c6e","response":{"code":0,"body":""},"math_type":"mml","tex":"{\\displaystyle \\sqrt{\\pi}}"}

or

13:16:45 3) MathCoverageTest::testCoverage with data set #86 ('\and \land \wedge, \curlywedg...dge \!', array(), '<img src="bdb145fadf20142e514...\!}"/>')
13:16:45 === Logs generated by test case
13:16:45 [objectcache] [debug] MainWANObjectCache using store {class} {"class":"EmptyBagOStuff"}
13:16:45 [Math] [debug] Start rendering ${\displaystyle \and \land \wedge, \curlywedge, \bigwedge \!}$ in mode png []
13:16:45 [Math] [error] Restbase math server problem {"urlparams":"https:\/\/wikimedia.org\/api\/rest_v1\/media\/math\/render\/mml\/bdb145fadf20142e514b40731ed8c431dc53d855","response":{"code":0,"body":""},"math_type":"mml","tex":"{\\displaystyle \\and \\land \\wedge, \\curlywedge, \\bigwedge \\!}"}
13:16:45 [MessageCache] [debug] MessageCache using store {class} {"class":"HashBagOStuff"}
13:16:45 [Math] [error] Cannot get mml. Server problem. [{},{}]
13:16:45 ===
13:16:45 Failed to render \and \land \wedge, \curlywedge, \bigwedge \!
13:16:45 Failed asserting that two strings are equal.
13:16:45 --- Expected
13:16:45 +++ Actual
13:16:45 @@ @@
13:16:45 -'<img src="bdb145fadf20142e514b40731ed8c431dc53d855" class="mwe-math-fallback-image-inline" aria-hidden="true" style="vertical-align: -1.338ex; margin-right: -0.257ex; width:11.969ex; height:3.843ex;" alt="{\displaystyle \and \land \wedge, \curlywedge, \bigwedge \!}"/>'
13:16:45 +'<strong class="error texerror">Failed to parse (Conversion error. Server (&quot;https://wikimedia.org/api/rest_&quot;) reported: &quot;Cannot get mml. Server problem.&quot;): {\displaystyle \and \land \wedge, \curlywedge, \bigwedge \!}</strong>\n
13:16:45 +'

In summary, we see different random errors while interacting with RESTBase from CI: either the check request or the mml request fails. On top of that, it seems that different HTTP implementations are in use (curl and Guzzle). One might suspect that the reported HTTP status code 0 points to a follow-up error in the MW Guzzle implementation (T232866).
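The failure modes in the logs above can be told apart mechanically. The following is a minimal Python sketch, not part of the Math extension or the CI tooling: the function name `classify` and the category labels are made up for illustration, and it assumes that the relevant `[Math]` log lines end in a JSON object, as in the excerpts above.

```python
import json
import re

# Matches lines such as:
#   13:13:03 [Math] [error] Restbase math server problem {...}
# The trailing JSON object is the log context; lines without one do not match.
LOG_LINE = re.compile(
    r"\[Math\] \[(?P<level>\w+)\] (?P<message>[^{]*?)\s*(?P<context>\{.*\})\s*$"
)


def classify(line):
    """Return 'timeout', 'no-response', 'other', or None for non-matching lines.

    The labels are hypothetical names chosen for this sketch.
    """
    match = LOG_LINE.search(line)
    if match is None:
        return None
    try:
        context = json.loads(match.group("context"))
    except json.JSONDecodeError:
        return None
    if not isinstance(context, dict):
        return None
    # curl error 28 is curl's "operation timed out" code.
    if "curl error: 28" in str(context.get("error", "")):
        return "timeout"
    # Status code 0 with an empty body: no HTTP response was recorded at all.
    response = context.get("response")
    if isinstance(response, dict) and response.get("code") == 0:
        return "no-response"
    return "other"
```

Applied to the excerpts above, the `Tex check failed` line falls into `timeout` and the two `Restbase math server problem` lines fall into `no-response`; lines without a JSON context, such as `Cannot get mml. Server problem. [{},{}]`, are skipped.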

Event Timeline

Restricted Application added subscribers: Liuxinyu970226, Aklapper.

The problem seems to be in the network

"Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/"

We are planning to remove the restbase caching layer from the Math rendering pipeline and use object-cache and the new MW rest API instead. So the problem will automatically vanish in 6 months or so.

@Jdforrester-WMF do you by chance know if other gate-and-submit operations suffer from network problems?

The approach of adding fault tolerance (T136947) for these kinds of problems (T136812) was abandoned.

Please let me know if you think there is something that can be done from within the Math extension.

I think it would already help to know if these failures happen in Math itself as well or if it’s somehow due to the interaction with Wikibase.

Edit: what I mean is, do you also get occasional failures on Gerrit changes for Math itself, and just accept them? Or is this a new issue?

> I think it would already help to know if these failures happen in Math itself as well or if it’s somehow due to the interaction with Wikibase.
>
> Edit: what I mean is, do you also get occasional failures on Gerrit changes for Math itself, and just accept them? Or is this a new issue?

There seems to be very little development going on for the Math extension: https://gerrit.wikimedia.org/r/q/project:mediawiki%252Fextensions%252FMath
This hasn't had a chance yet to become an issue for the Math extension itself.
So as to your question: probably unknown.

Test failed. If I interpret the logs correctly,

22:32:24 2020-03-30 20:22:04 af4c4757bd9f wikidb-unittest_: [a083b4e6f81e0473866dbf7b] [no req]   Serializers\Exceptions\UnsupportedObjectException from line 109 of /workspace/src/extensions/WikibaseLexeme/src/Serialization/StorageLexemeSerializer.php: Can not serialize incomplete Lexeme

it also seems to be somehow related to Wikidata.

> @Jdforrester-WMF do you by chance know if other gate-and-submit operations suffer from network problems?

Yes, but rarely. Unfortunately, Wikibase-related jobs suffer from them more than anyone else because the code is so spread out into different repos, each of which has an equal low-but-not-zero chance of hitting such a failure.

> Test failed. If I interpret the logs correctly,
>
> 22:32:24 2020-03-30 20:22:04 af4c4757bd9f wikidb-unittest_: [a083b4e6f81e0473866dbf7b] [no req]   Serializers\Exceptions\UnsupportedObjectException from line 109 of /workspace/src/extensions/WikibaseLexeme/src/Serialization/StorageLexemeSerializer.php: Can not serialize incomplete Lexeme
>
> it also seems to be somehow related to Wikidata.

I think that was unrelated fallout from T246358, which seems to have been temporarily reverted.

The test build of that Math change succeeded now.

> The test build of that Math change succeeded now.

Thank you very much for fixing this. However, only the test jobs ran this time, not gate-and-submit. Or is this the same?

I don’t think anything was fixed… it’s an error that only happens randomly, I guess it didn’t happen this time? (And gate-and-submit will only run once that change is +2ed.)

The gate-and-submit build for that change (quibble-vendor-mysql-php74-docker #2788) just failed similarly:

…
19:02:05 --- Expected
19:02:05 +++ Actual
19:02:05 @@ @@
19:02:05 -'<img src="29a40fce03a1e91ec71f38d84c3e16e034c5c902" class="mwe-math-fallback-image-inline" aria-hidden="true" style="vertical-align: -3.005ex; width:25.811ex; height:6.843ex;" alt="{\displaystyle \sum_{m=1}^\infty\sum_{n=1}^\infty\frac{m^2\,n}{3^m\left(m\,3^n+n\,3^m\right)}}"/>'
19:02:05 +'<strong class="error texerror">Failed to parse (Conversion error. Server (&quot;https://wikimedia.org/api/rest_&quot;) reported: &quot;Cannot get mml. Server problem.&quot;): {\displaystyle \sum_{m=1}^\infty\sum_{n=1}^\infty\frac{m^2\,n}{3^m\left(m\,3^n+n\,3^m\right)}}</strong>\n
19:02:05 +'

These are caused by the Math extension not being able to connect to RESTBase from CI. I think this is a separate issue from the Wikibase one.

I somehow don't believe the problem is in RESTBase itself: if math rendering were failing in RESTBase with such frequency, we'd have much bigger problems than the Math extension's gate-and-submit.

So the problem is likely in how the Math extension contacts RESTBase in CI. It goes via wgMathFullRestbaseURL and thus calls https://wikimedia.org/api/rest_v1. The logs suggest it times out.
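One way to test the timeout hypothesis would be to probe the same URL from a CI node and record how each request ends. The following is a rough Python sketch under stated assumptions: the function name `probe`, the outcome labels, and the 5-second default are hypothetical choices for illustration, and nothing like this exists in the Math extension or the CI setup as far as this task shows. It uses only the standard library, and the `fetch` parameter exists so the function can be exercised without network access.

```python
import socket
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0, fetch=urllib.request.urlopen):
    """Issue one GET request and return (outcome, elapsed_seconds).

    outcome is 'reachable', 'timeout' or 'unreachable' (labels are
    hypothetical, chosen for this sketch).
    """
    start = time.monotonic()
    try:
        with fetch(url, timeout=timeout) as response:
            response.read()
        outcome = "reachable"
    except urllib.error.HTTPError:
        # The server answered with a non-2xx status, so the route itself works.
        outcome = "reachable"
    except urllib.error.URLError as exc:
        # Timeouts while connecting are wrapped in URLError.reason.
        if isinstance(exc.reason, socket.timeout):
            outcome = "timeout"
        else:
            outcome = "unreachable"
    except socket.timeout:
        # Timeouts while reading the response are raised directly.
        outcome = "timeout"
    return outcome, time.monotonic() - start
```

Calling `probe("https://wikimedia.org/api/rest_v1/")` repeatedly during a CI run and counting the `timeout` outcomes would show whether the failures are specific to Math requests or affect the route in general.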

@Pchelolo does CI have a more direct route to RESTBase?

> These are caused by the Math extension not being able to connect to RESTBase from CI. I think this is a separate issue from the Wikibase one.

The errors in the Wikibase builds look the same to me.

I had quite an intensive discussion with @WDoranWMF on how to use tickets in Phabricator. I now have a good picture of how issue handling works in the core platform team. I also have an idea of the campsite model. Now, this ticket is in the campsite column waiting/blocked for Wikidata-Campsite and Math. However, it is in progress in the Wikidata project. Moreover, nobody is assigned to this ticket.
For me it is not transparent which mechanism will move the ticket forward.

@Physikerwelt I think the first step is to ask CPT to triage it by adding Platform Engineering. We can review and see if we can help, but based on the comments thus far it looks like we're not sure where the issue lies, so maybe we'll have to trace it together?

Physikerwelt set the point value for this task to 10.

@WDoranWMF I updated the description to support the triage process. Thank you for taking over the responsibility... even though I suspect that it is a CI/network issue.

While this is all very nice for the ticket, I am still confused about the meaning of "in progress" in the Wikidata project. @Ladsgroup even after looking at https://github.com/Ladsgroup/Phabricator-maintenance-bot I could not figure out why the bot moved the task.

@Physikerwelt We can triage it but we'll have to direct it depending on that.

We’ve now removed the Math extension from Wikibase’s CI again (T249235), so if this keeps happening for Math changes, it’ll no longer affect us over in Wikibase. I’ll stay subscribed to this task – if the issue is resolved, either with the “magically vanish in 6 months or so” solution or in some other way, then we can maybe add Math to Wikibase CI again.

daniel subscribed.

Wikibase problem is apparently fixed, untagging CPT.

Physikerwelt claimed this task.

This has not happened for quite a while. I guess the Wikibase problems have been fixed, and I hope Wikibase won't generate any new problems in the future. Wikibase might want to re-enable the Math downstream tests once the dependency on RESTBase is gone.