Sporadic Math<->RB connection problems
Closed, Resolved, Public

Description

From T131177#2345969

I'm still getting sporadic

"Math extension cannot connect to Restbase."

and

Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): R_{2}

errors. I get a different set of errors each time I press "Show preview" without changing the text.

Event Timeline

Could you provide the pages where these happen, @SalixAlba ? How often do they happen? Do they always happen for the same set of pages?

They happened on
https://en.wikipedia.org/w/index.php?title=User:Salix_alba/sandbox&direction=next&oldid=723079790
It occurred for a period of about 30 minutes at roughly 22:00 UTC last night. The same page rendered fine later.

There was a LaTeX error in the page, so I'm not sure if that had anything to do with the problem.

A search on Google
https://www.google.co.uk/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=%22Failed+to+parse%22+site:wikipedia.org
gives about 1800 results.

A Wikipedia search gives another bunch.
https://en.wikipedia.org/w/index.php?search=Failed+to+parse&title=Special:Search&go=Go&searchToken=ad4h1jod2x27mnxmn91xp28p6

Not sure if these are all due to problems during setup or to a later problem. GWicke and I cleared a lot of these yesterday, but a whole lot more have appeared in searches today.
https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#MathML_glitch

mobrovac added a project: User-mobrovac.

Did some more manual purging, so we should be better off now.

There was a LaTeX error in the page, so I'm not sure if that had anything to do with the problem.

Yes, if there is a syntax error in the formula, you will still see an error saying the formula can't be parsed.

As for

I'm still getting sporadic

"Math extension cannot connect to Restbase."

we will need to instruct the Math extension to retry the render requests a couple of times before giving up, which should minimise the number of such errors. In the meantime, @SalixAlba, please keep an eye on this error and, if it occurs again for you, try reloading the page (and purging it, if necessary) and let us know whether that can at least be considered a workaround for this issue.

I'm using this query: https://www.google.com/search?num=100&hl=en&ie=UTF-8&oe=UTF-8&q=%22Math+extension+cannot+connect+to+Restbase.%22&gws_rd=ssl

The list is getting quite short now, but I think we should prioritize adding proper retries to MultiHTTPClient ASAP, so that short hiccups in RESTBase or Mathoid do not lead to big red warnings in pages.

Change 292919 had a related patch set uploaded (by Mobrovac):
Retry RB requests a maximum of 3 times

https://gerrit.wikimedia.org/r/292919
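
The retry behaviour discussed above could look roughly like the following Python sketch. It is an illustration only: the Math extension is written in PHP, and `fetch_with_retries`, its parameters, and the `TransientError` class are hypothetical names, not the contents of change 292919. The default of three attempts mirrors the patch title.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx)."""


def fetch_with_retries(fetch, max_attempts=3, delay=0.1, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached.

    fetch is any zero-argument callable that returns a result or raises
    TransientError. The last error is re-raised once attempts run out, so
    the caller can still show a message to the user.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except TransientError:
            if attempt == max_attempts:
                raise
            # Back off a little longer after each failed attempt.
            sleep(delay * attempt)
```

With a wrapper like this, a single dropped request would only surface as an error if all attempts fail in a row.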

[...] so that short hiccups in RESTBase or Mathoid do not lead to big red warnings in pages.

How often do such hiccups occur, and why? Is the rate of error responses from RESTBase measured and graphed anywhere?

@ori, as with most services, significant error bursts are predominantly caused by operational incidents like the 2.2.6 roll-out last week. Other, less common, sources of connection errors are networking issues, and possibly deploys. While restarts during deploys are generally handled gracefully, I wouldn't guarantee that no single request is ever dropped in the process if the stars align just right.

Error rates are graphed in graphite. A good starting point is https://grafana.wikimedia.org/dashboard/db/restbase. You can drill down into each individual entry point's response codes and latencies.

The overall issue for math is not that errors are common, but that rare errors are currently not handled in a way that leads to a good user experience.

There is a task, T49037, for a tracking category for errors in math formulas. With something like that it would be easier to find and fix errors. I'm not sure if it's technically possible in this situation, where errors occur without there being an edit.

The new tracking category https://en.wikipedia.org/wiki/Category:Pages_with_math_errors
makes it easy to watch for changes.

Yesterday I fixed all the main-space pages in the category. Looking just now, the total has risen from 104 pages to 131 pages. A bunch of new main-space articles have been added, strangely, most beginning with P.

Not sure if this will turn out to be a continuing phenomenon, with sporadic errors breaking pages.

This appeared in cswiki too, reported on https://cs.wikipedia.org/wiki/Wikipedie:Pod_l%C3%ADpou_(technika)#Nestandardn.C3.AD_chov.C3.A1n.C3.AD_.3Cmath.3E . It appeared in the article Johnsonův algoritmus, section Příklad. Permanent link: https://cs.wikipedia.org/w/index.php?title=Johnson%C5%AFv_algoritmus&oldid=13749401 . Neither a null edit nor purging helped.

Error message: pochopit (Chyba konverze. Server („https://wikimedia.org/api/rest_“) hlásí: „Cannot get mml. Server problem.“): {\displaystyle {\emph {w}}(h)}. This is the same message as the English one, in Czech.

Pressing preview without changing the text gives me the same message every time.

The problem seems to be the \emph{w} syntax. This syntax is not listed among the supported LaTeX at https://en.wikipedia.org/wiki/Help:Displaying_a_formula and is not supported by MathJax.

In LaTeX text mode, \emph renders things in italics. This is the default in math mode, so you could just use <math>w(h)</math>.

There is a question as to why the error message is confusing: "Failed to parse (Conversion error. Server ("https://wikimedia.org/api/rest_") reported: "Cannot get mml. Server problem."): "
rather than "Failed to parse (syntax error):"

A page with a similar error has been reported to me. That would be:

https://cs.wikibooks.org/wiki/Integrov%C3%A1n%C3%AD/V%C3%BDpo%C4%8Det_re%C3%A1ln%C3%BDch_integr%C3%A1l%C5%AF_pomoc%C3%AD_reziduov%C3%A9_v%C4%9Bty

The error is the same nondescriptive, not really helpful text: Failed to parse (Conversion error. Server ("https://wikimedia.org/api/rest_") reported: "Cannot get mml. Server problem."):

The answer from the API's side is of course way better. The full one for this specific error is

{
   "type": "https://mediawiki.org/wiki/HyperSwitch/errors/bad_request",
   "title": "Bad Request",
   "method": "POST",
   "detail": [
     "TeX parse error: \\nolimits is allowed only on operators"
   ],
   "uri": "/complete"
}

Perhaps we should return the TeX parse error somehow to the editor instead of the ugly red generic fail.
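
As a rough illustration of surfacing the parse error to the editor, the Python sketch below pulls the `detail` entries out of an error body shaped like the one above. The function name `describe_math_error` and the fallback wording are hypothetical; only the JSON fields shown in the response are assumed to exist, and this is not how the Math extension (which is PHP) handles the response.

```python
import json


def describe_math_error(body, fallback="Failed to parse (unknown error)"):
    """Build a short editor-facing message from a JSON error body.

    Assumes the shape shown above: an optional "detail" field that is
    either a string or a list of strings. Anything else yields the fallback.
    """
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return fallback
    if not isinstance(payload, dict):
        return fallback
    detail = payload.get("detail")
    if isinstance(detail, str):
        detail = [detail]
    if not isinstance(detail, list):
        return fallback
    # Keep only non-empty string entries; ignore anything unexpected.
    messages = [d for d in detail if isinstance(d, str) and d]
    if not messages:
        return fallback
    return "Failed to parse (" + "; ".join(messages) + ")"
```

Falling back to a generic message when `detail` is missing keeps the behaviour no worse than today's for responses that carry no usable explanation.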

@akosiaris this might be another problem. As far as I can tell, the problem on the page you are referring to was caused by an operatorname that ends with a huge space, i.e. \operatorname {Res\ }. I'm tempted to create a separate issue for that. What do you think?

@akosiaris this might be another problem. As far as I can tell, the problem on the page you are referring to was caused by an operatorname that ends with a huge space, i.e. \operatorname {Res\ }. I'm tempted to create a separate issue for that. What do you think?

To be honest, I see 2 distinct issues for the error I posted.

  • One is the unhelpful, generic error message seen on the pages, which is what led me to this task (it is the only one sporting that error message AFAIK).
  • The actual error which is quite possibly another problem.

I am fine with creating another task for handling the actual TeX error, but we should probably create one for addressing the unhelpful and generic error message as well.

I observe a bunch of "Math extension cannot connect to Restbase" hiccups right now on the page https://de.wikipedia.org/wiki/Skalarprodukt
The first one showed up about an hour ago. The following is a copy from the rendered page (3 line wraps by me):

Geometrisch lässt es sich wie folgt definieren: Bezeichnen {\displaystyle a=|{\vec {a}}|} und
Fehler beim Parsen (MathML mit SVG- oder PNG-Rückgriff (empfohlen für moderne Browser und Barrierefreiheitswerkzeuge):
Ungültige Antwort („Math extension cannot connect to Restbase.“) von Server „/mathoid/local/v1/“:): b = |\vec b|
die Längen der Vektoren

The error survived a few purges over more than 30 minutes. It shows up in Firefox on both the PC and a Samsung Galaxy, but in neither IE on the PC nor the pre-installed Internet app on the phone. The page is also OK in the previous version of the Wikipedia page and in the editor's preview.
After about 60 minutes, that error disappeared, only to spread over the page, for example:

Andere übliche Notationen sind 
Fehler beim Parsen (MathML mit SVG- oder PNG-Rückgriff (empfohlen für moderne Browser und Barrierefreiheitswerkzeuge): 
Ungültige Antwort („Math extension cannot connect to Restbase.“) von Server „/mathoid/local/v1/“:): \vec a \circ \vec b,\ \vec a \bullet \vec b 
und {\displaystyle \langle {\vec {a}},{\vec {b}}\rangle .}

and two other places.

We also got a report about elevated error rates on IRC around that time. The timing coincided with a rolling restart of all Cassandra instances, which was needed for an operational task. Normally, such restarts should not cause any errors, but in this case a small percentage of requests clearly returned errors:

[Attached image: pasted_file]

To make such errors less likely, we will wait longer between restarting each instance. This should give clients more time to reconnect to individual replica instances, and ought to help reduce the probability of any remaining user-visible errors.
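
The effect of waiting longer between restarts can be sketched as follows. This is a generic Python illustration, not the procedure used for the Cassandra cluster: `rolling_restart`, the `restart` callable, and the pause length are all hypothetical.

```python
import time


def rolling_restart(instances, restart, pause_seconds, sleep=time.sleep):
    """Restart instances one at a time, pausing between consecutive restarts.

    restart is a caller-supplied callable taking one instance name. The
    pause gives clients time to reconnect to the instance that just came
    back before the next one goes down. No pause follows the last restart.
    """
    restarted = []
    for index, name in enumerate(instances):
        if index > 0:
            sleep(pause_seconds)
        restart(name)
        restarted.append(name)
    return restarted
```

A longer `pause_seconds` trades a slower overall restart for a smaller chance that a client finds several replicas unavailable at once.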

@mobrovac, are there still concrete actionables for this task?

Physikerwelt claimed this task.

@GWicke the activity on this task indicates that there is nothing left to do. --> closing.

I occasionally check the tracking categories. For the last few weeks they have been clear, but you do still get occasional batches of articles failing. I've recently seen 20+ articles in https://en.wikipedia.org/wiki/Category:Articles_with_math_errors . Most seem to actually be OK when I look at the article, with maybe five requiring a null edit to fix. I did fix a batch at the end of January.

Just an update on occurrences of problems relating to this bug. I've just looked at

https://en.wikipedia.org/wiki/Category:Articles_with_math_errors

and it has 32 articles in it. It's been pretty good of late, and this is the first time in the last two weeks that there have been connection problems.

I've done null edits to fix all the articles.

Change 353056 had a related patch set uploaded (by GWicke; owner: GWicke):
[mediawiki/extensions/Math@master] Better error handling for math render errors

https://gerrit.wikimedia.org/r/353056

Change 353056 merged by jenkins-bot:
[mediawiki/extensions/Math@master] Better error handling for math render errors

https://gerrit.wikimedia.org/r/353056

Since mid May, the math extension has been adding a separate category (localized by key math-tracking-category-render-error) in case of render issues, and uses math-tracking-category-error only for syntax errors. As a consequence, https://en.wikipedia.org/wiki/Category:Articles_with_math_errors now only contains four articles, all with legitimate syntax issues.

Pages with temporary render failures are re-rendered after a couple of minutes, which should clear up any transient issues automatically in a timely manner. A quick Google search suggests that the phrase "Math extension cannot connect to Restbase." has basically disappeared from Wikipedia. I am thus going to declare this task done. Please reopen if you see this error message return to any article without clearing itself up within ~30 minutes.

I've created https://en.wikipedia.org/wiki/Category:Articles_with_math_render_errors and a couple of talk pages have appeared in there with what appear to be syntax errors, but no RESTBase connection errors.

https://en.wikipedia.org/wiki/Talk:Arg_max
Has two syntax errors and appears in both [[Category:Articles with math render errors]] and the non-article syntax error category [[Category:Pages with math errors]].

https://en.wikipedia.org/wiki/Talk:Tracy%E2%80%93Widom_distribution
Has a syntax error; it is in [[Category:Articles with math render errors]] but not in [[Category:Pages with math errors]].

Very minor problems but a bit odd. Syntax errors on talk pages generally go uncorrected as they are often about how to get the correct syntax.

@SalixAlba, I didn't expect permanent render issues to turn up in the render issue category, but looking at the code it actually makes sense. Currently, the math extension does not distinguish between bad requests / permanent MathJax errors and temporary connection issues.

However, since temporary issues now clear themselves up quickly, the distinction is actually still useful to identify MathJax issues. Anything that gets past the check, but is not rendering properly in MathJax would be a bug in either the checker (letting something pass that shouldn't), or in MathJax (not rendering valid math).
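
For illustration, the distinction the extension does not currently make could be approximated from the HTTP status alone: a 4xx response (like the Bad Request shown earlier in this task) suggests a permanent problem with the input, while a 5xx response or a missing response suggests a transient one. The Python sketch below is hypothetical and does not reflect the extension's code; the function name and the status-based rule are assumptions.

```python
def is_transient_failure(status):
    """Guess whether a failed render request is worth retrying later.

    status is the HTTP status code, or None when no response arrived at
    all (for example a refused or timed-out connection).
    """
    if status is None:
        return True
    # 5xx: the service itself had trouble. 4xx: it rejected the input.
    return 500 <= status < 600
```

A rule this coarse would misjudge some cases (a rate-limit response is a 4xx but temporary), so it is a starting point rather than a complete classification.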

Change 292919 abandoned by Physikerwelt:
Retry RB requests a maximum of 3 times

Reason:
Gerrit cleanup. Please resubmit, if this patch is still relevant.

https://gerrit.wikimedia.org/r/292919