<math>\land</math> – Unclear why the page appears in an error-category
Closed, Resolved, Public

Description

That very short and simple page is in the error category for outdated math syntax, but it uses only a single <math>\land</math>.

https://de.wikipedia.org/wiki/Und

https://de.wikipedia.org/wiki/Kategorie:Wikipedia:Seiten,_die_ein_veraltetes_Format_des_math-Tags_verwenden

Even if I reduce the content to a simple "<math>\land</math>" that category shows up (just checked the preview), same behaviour in enWP.

\land is recommended as a replacement for \and in https://www.mediawiki.org/wiki/Extension:Math/de#Tracking_categories so something is odd here: either the description is wrong, or the magic that adds pages to that category is.

Event Timeline

  • check texvc
  • check mathoid

Change 778521 had a related patch set uploaded (by Physikerwelt; author: Physikerwelt):

[mediawiki/services/texvcjs@master] Add test for texvc deprecation warnings

https://gerrit.wikimedia.org/r/778521

Change 779436 had a related patch set uploaded (by Physikerwelt; author: Physikerwelt):

[mediawiki/services/mathoid@master] Add test that \land is not deprecated

https://gerrit.wikimedia.org/r/779436

Physikerwelt added projects: RESTBase, serviceops.
Physikerwelt added a subscriber: akosiaris.

@Wurgl I reached the end of my options. I checked that no warning is emitted in the source code by adding tests. Maybe this is a caching problem. Here you need support from someone with access to the restbase server that can delete the cache for the land command from Cassandra. As you can see from the edits on https://www.mediawiki.org/w/index.php?title=Extension:Math/T305613&action=history it seems to be a caching issue. I don't know what to do next. Maybe @akosiaris can help?

To be honest, I am a bit at a loss too. For what it's worth, and per my understanding, RESTBase caches URLs like https://de.wikipedia.org/api/rest_v1/media/math/check/<insert hash> or https://en.wikipedia.org/api/rest_v1/media/math/render/svg/<insert hash>. I don't see a direct link to MediaWiki categories right now.

I won't say much about the German wiki page, because my admittedly limited knowledge of German isn't helping. That being said, even when the page is blanked in preview, the parser seems to keep adding the category, which is pretty baffling.

Focusing on the history of https://www.mediawiki.org/w/index.php?title=Extension:Math/T305613, I am witnessing this (edge caching does not apply, I am logged in). This is in chronological order

I think we need someone who knows this way better than me.

It is gone! The category is gone! Both on the German page and on https://www.mediawiki.org/wiki/Extension:Math/T305613 (see at top from Physikerwelt)

@akosiaris thank you. Thank you. The check endpoint does not work with a hash but the actual TeX string. For example

curl -X 'POST' \
  'https://de.wikipedia.org/api/rest_v1/media/math/check/tex' -d "q=\land" 
{"success":true,"checked":"\\land ","requiredPackages":[],"identifiers":[],"endsWithDot":false,"warnings":[{"type":"texvc-deprecation","details":{"error":{"message":"Deprecation: Alias no longer supported.","expected":[],"found":"\\and\n","location":{"start":{"offset":1,"line":2,"column":1},"end":{"offset":6,"line":3,"column":1}},"name":"SyntaxError"},"success":false,"warnings":[],"status":"S","details":"SyntaxError: Deprecation: Alias no longer supported.","offset":1,"line":2,"column":1}}]}

it shows the deprecation warning, even though it should not show it. But as shown by the test cases submitted to texvcjs and mathoid, I don't understand how this is generated.

Ah indeed. Thank you for correcting me.

Now, this is interesting. I don't see that. From my computer, 100 calls to the service don't return that.

$ for i in `seq 1 100` ; do curl -s -X 'POST'       'https://de.wikipedia.org/api/rest_v1/media/math/check/tex' -d "q=\land"  ; done|grep Deprecation
$ 

Nor do 100 calls to the internal service return it:

deploy1002$ for i in `seq 1 100`; do curl -X POST https://mathoid.discovery.wmnet:4001/texvcinfo -d "q=\land" | grep Deprecation ; done 
$

So there might be indeed some caching in play.

Could you please re-run the curl command you pasted above with -v added, and paste the output into a new Phab paste? You can use https://phabricator.wikimedia.org/paste/edit/form/36/ to create one with the proper permissions, so that it is not public and your IP address and general location don't get leaked.

@akosiaris unfortunately, I don't get the old (incorrect) output any longer.

I recall that some requests were cached forever. The person who cleaned the cache last time was @GWicke, so some time has passed since then. Do you know what the current lifetime of the cache is? See https://github.com/wikimedia/restbase/blob/ecef17bda6f4efc0d6e187fb05b1eeb389bf7120/sys/mathoid.js#L176

Assuming that function is at fault here, the configuration for that gets generated from https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/restbase/deploy/+/refs/heads/master/scap/templates/config.yaml.j2#83 so up to 10 days.

But we haven't yet concluded that it was indeed the edge cache layer (upstream of restbase - composed of ATS+Varnish) that was serving the wrong response (which is why I wanted to see the output of curl -v).

OK, I am sending the output of the curl -v in private paste as suggested. Even if it is a caching problem, I wonder how the incorrect values were produced. I don't remember having ever changed the deprecation warning mechanism and can't find anything in the commit history. The code is still in its initial state (https://github.com/wikimedia/mediawiki-services-texvcjs/commit/7e5df40aff12f2ffd39bf7f4762bf3910f0a8979).

Found it, thanks. Unfortunately, as you pointed out, the issue no longer occurs, so this doesn't help us pinpoint it.

Even if it is a caching problem, I wonder how the incorrect values were produced. I don't remember having ever changed the deprecation warning mechanism and can't find anything in the commit history. The code is still in its initial state (https://github.com/wikimedia/mediawiki-services-texvcjs/commit/7e5df40aff12f2ffd39bf7f4762bf3910f0a8979).

That makes 2 of us. And the fact we no longer observe it doesn't make it easier to find it.

Interestingly, all of the cases I outlined in T305613#7847532 now behave correctly.

I am starting to look more towards the jobqueue and some delayed job for refreshLinks, which has at times delays of up to more than 3 days. https://grafana-rw.wikimedia.org/d/CbmStnlGk/jobqueue-job?orgId=1&var-dc=eqiad%20prometheus%2Fk8s&var-job=refreshLinks&from=now-90d&to=now&viewPanel=5

That being said, in this specific case, the numbers and the fix don't add up. The backlog was close to 1 day for most of this period, while we observed the fix after ~4 days.

@akosiaris today I was thinking about a possible source of the problem. My hypothesis is that the checked TeX, rather than the TeX the user entered, is used to calculate a cache key. This would explain how the warning ends up in the check result for \land: both inputs, \and and \land, are converted to \land. Another problem I see is that if you now query for \and via curl, you don't get a warning even though you should.
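A minimal sketch of why that hypothesis would produce a collision. The "checked" values are the ones the check endpoint reports in this thread for \land xcc and \and xcc; the cache_key function and the use of SHA-1 are assumptions for illustration only, not taken from the RESTBase source.

```python
import hashlib

# "checked" values as reported by the check endpoint in this thread:
# both inputs are normalised to the same string.
CHECKED = {
    r"\land xcc": r"\land xcc",
    r"\and xcc": r"\land xcc",
}


def cache_key(text: str) -> str:
    """Hypothetical key function; SHA-1 is an assumption, any hash behaves alike."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


keys_from_checked = {tex: cache_key(checked) for tex, checked in CHECKED.items()}
keys_from_input = {tex: cache_key(tex) for tex in CHECKED}

# Keyed on the checked TeX, both requests land in the same cache slot.
print(len(set(keys_from_checked.values())))  # prints 1
# Keyed on the raw input, each request gets its own slot.
print(len(set(keys_from_input.values())))  # prints 2
```

If the key really is derived from the checked string, whichever response is written last for that slot would be served for both inputs.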

Strange!

This one says "Deprecation: Alias no longer supported."
curl -X POST 'https://de.wikipedia.org/api/rest_v1/media/math/check/tex' -d 'q=\and x'; echo

This one is silent:
curl -X POST 'https://de.wikipedia.org/api/rest_v1/media/math/check/tex' -d 'q=\and'; echo

There is definitely some cache involved!

$ curl -X POST 'https://de.wikipedia.org/api/rest_v1/media/math/check/tex' -d 'q=\land xcc';echo
{"success":true,"checked":"\\land xcc","requiredPackages":[],"identifiers":["x","c","c"],"endsWithDot":false}

$ curl -X POST 'https://de.wikipedia.org/api/rest_v1/media/math/check/tex' -d 'q=\and xcc';echo
{"success":true,"checked":"\\land xcc","requiredPackages":[],"identifiers":["x","c","c"],"endsWithDot":false,"warnings":[{"type":"texvc-deprecation","details":{"error":{"message":"Deprecation: Alias no longer supported.","expected":[],"found":"\\and ","location":{"start":{"offset":0,"line":1,"column":1},"end":{"offset":5,"line":1,"column":6}},"name":"SyntaxError"},"success":false,"warnings":[],"status":"S","details":"SyntaxError: Deprecation: Alias no longer supported.","offset":0,"line":1,"column":1}}]}

$ curl -X POST 'https://de.wikipedia.org/api/rest_v1/media/math/check/tex' -d 'q=\land xcc';echo
{"success":true,"checked":"\\land xcc","requiredPackages":[],"identifiers":["x","c","c"],"endsWithDot":false,"warnings":[{"type":"texvc-deprecation","details":{"error":{"message":"Deprecation: Alias no longer supported.","expected":[],"found":"\\and ","location":{"start":{"offset":0,"line":1,"column":1},"end":{"offset":5,"line":1,"column":6}},"name":"SyntaxError"},"success":false,"warnings":[],"status":"S","details":"SyntaxError: Deprecation: Alias no longer supported.","offset":0,"line":1,"column":1}}]}

Commands 1 and 3 are identical (both use \land). The first run shows no warning, the second run does. The only thing that happened in between is command 2, which uses \and instead of \land.

And I agree with @Physikerwelt: it strongly smells like your hypothesis is true.

@Wurgl, nailed it!

Having access to bypass the edge caches pinpoints the problem to RESTBase.

Step 1. Check that an arbitrary query is ok

akosiaris@deploy1002:~$ curl -s -X POST https://restbase.discovery.wmnet:7443/en.wikipedia.org/v1/media/math/check/tex -d "q=\land xcf"| jq .
{
  "success": true,
  "checked": "\\land xcf",
  "requiredPackages": [],
  "identifiers": [
    "x",
    "c",
    "f"
  ],
  "endsWithDot": false
}

Step 2. Pollute RESTBase with the same query but using \and instead of \land

akosiaris@deploy1002:~$ curl -s -X POST https://restbase.discovery.wmnet:7443/en.wikipedia.org/v1/media/math/check/tex -d "q=\and xcf"| jq .
{
  "success": true,
  "checked": "\\land xcf",
  "requiredPackages": [],
  "identifiers": [
    "x",
    "c",
    "f"
  ],
  "endsWithDot": false,
  "warnings": [
    {
      "type": "texvc-deprecation",
      "details": {
        "error": {
          "message": "Deprecation: Alias no longer supported.",
          "expected": [],
          "found": "\\and ",
          "location": {
            "start": {
              "offset": 0,
              "line": 1,
              "column": 1
            },
            "end": {
              "offset": 5,
              "line": 1,
              "column": 6
            }
          },
          "name": "SyntaxError"
        },
        "success": false,
        "warnings": [],
        "status": "S",
        "details": "SyntaxError: Deprecation: Alias no longer supported.",
        "offset": 0,
        "line": 1,
        "column": 1
      }
    }
  ]
}

Step 3. Check that RESTBase is actually polluted now by re-running the query from Step 1.

akosiaris@deploy1002:~$ curl -s -X POST https://restbase.discovery.wmnet:7443/en.wikipedia.org/v1/media/math/check/tex -d "q=\land xcf"| jq .
{
  "success": true,
  "checked": "\\land xcf",
  "requiredPackages": [],
  "identifiers": [
    "x",
    "c",
    "f"
  ],
  "endsWithDot": false,
  "warnings": [
    {
      "type": "texvc-deprecation",
      "details": {
        "error": {
          "message": "Deprecation: Alias no longer supported.",
          "expected": [],
          "found": "\\and ",
          "location": {
            "start": {
              "offset": 0,
              "line": 1,
              "column": 1
            },
            "end": {
              "offset": 5,
              "line": 1,
              "column": 6
            }
          },
          "name": "SyntaxError"
        },
        "success": false,
        "warnings": [],
        "status": "S",
        "details": "SyntaxError: Deprecation: Alias no longer supported.",
        "offset": 0,
        "line": 1,
        "column": 1
      }
    }
  ]
}

And there we go. RESTBase is doing something here.
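The three steps above can be sketched as a small probe. This is only an illustration: probe_pollution and http_check are hypothetical helper names, the User-Agent value is a placeholder, and the URL is the public check endpoint from the earlier curl commands rather than the internal RESTBase address. The check function is passed in as a parameter so the logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict

# Public check endpoint used in the curl commands of this thread.
CHECK_URL = "https://de.wikipedia.org/api/rest_v1/media/math/check/tex"


def http_check(tex: str, url: str = CHECK_URL) -> Dict:
    """POST q=<tex> as form data and decode the JSON reply."""
    data = urllib.parse.urlencode({"q": tex}).encode("utf-8")
    # The User-Agent value is a placeholder (assumption); replace it with your own.
    request = urllib.request.Request(
        url, data=data, headers={"User-Agent": "pollution-probe-sketch/0.1"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def probe_pollution(check: Callable[[str], Dict], suffix: str) -> Dict[str, bool]:
    """Replay the three steps for one suffix and report what was observed."""
    clean_query = "\\land " + suffix
    alias_query = "\\and " + suffix
    before = check(clean_query)  # step 1: arbitrary query, expected to be clean
    check(alias_query)           # step 2: same query with the deprecated alias
    after = check(clean_query)   # step 3: re-run the query from step 1
    return {
        "clean_before": not before.get("warnings"),
        "warned_after": bool(after.get("warnings")),
    }
```

Calling probe_pollution(http_check, suffix) with a suffix that has not been queried before would report clean_before and warned_after both True if the behaviour described above is present. A suffix that was already used may start out polluted, which is presumably why the commands in this thread vary it (xcc, xcf).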

akosiaris added subscribers: hnowlan, Eevans.

Adding @Eevans and @hnowlan. They are the only 2 people in the WMF that I know that might be able to help debugging this.

Some more information regarding this. With the exception of the warnings stanza, the responses by mathoid for the queries '\land xcc' and '\and xcc' are identical:

curl -s -X POST https://mathoid.discovery.wmnet:4001/texvcinfo -d "q=\land xcc" |jq .
{
  "success": true,
  "checked": "\\land xcc",
  "requiredPackages": [],
  "identifiers": [
    "x",
    "c",
    "c"
  ],
  "endsWithDot": false
}

vs

curl -s -X POST https://mathoid.discovery.wmnet:4001/texvcinfo -d "q=\and xcc" |jq .
{
  "success": true,
  "checked": "\\land xcc",
  "requiredPackages": [],
  "identifiers": [
    "x",
    "c",
    "c"
  ],
  "endsWithDot": false,
  "warnings": [
    {
      "type": "texvc-deprecation",
      "details": {
        "error": {
          "message": "Deprecation: Alias no longer supported.",
          "expected": [],
          "found": "\\and ",
          "location": {
            "start": {
              "offset": 0,
              "line": 1,
              "column": 1
            },
            "end": {
              "offset": 5,
              "line": 1,
              "column": 6
            }
          },
          "name": "SyntaxError"
        },
        "success": false,
        "warnings": [],
        "status": "S",
        "details": "SyntaxError: Deprecation: Alias no longer supported.",
        "offset": 0,
        "line": 1,
        "column": 1
      }
    }
  ]
}

My cursory reading of the restbase mathoid code [1] suggests that the checked attribute of the responses above is what is used as the normalized version of the formula, and that attribute is identical for both responses.

[1] https://github.com/wikimedia/restbase/blob/master/sys/mathoid.js#L73

@Eevans, @hnowlan let me know if you have any ideas on how to fix this.

Still looking into this - I noticed this also happens when using other deprecated symbols:

$ curl -s -X POST https://restbase.discovery.wmnet:7443/en.wikipedia.org/v1/media/math/check/tex -d "q=\lor abcdeftest"
{"success":true,"checked":"\\lor abcdeftest","requiredPackages":[],"identifiers":["a","b","c","d","e","f","t","e","s","t"],"endsWithDot":false}

$ curl -s -X POST https://restbase.discovery.wmnet:7443/en.wikipedia.org/v1/media/math/check/tex -d "q=\or abcdeftest"
{"success":true,"checked":"\\lor abcdeftest","requiredPackages":[],"identifiers":["a","b","c","d","e","f","t","e","s","t"],"endsWithDot":false,"warnings":[{"type":"texvc-deprecation","details":{"error":{"message":"Deprecation: Alias no longer supported.","expected":[],"found":"\\or ","location":{"start":{"offset":0,"line":1,"column":1},"end":{"offset":4,"line":1,"column":5}},"name":"SyntaxError"},"success":false,"warnings":[],"status":"S","details":"SyntaxError: Deprecation: Alias no longer supported.","offset":0,"line":1,"column":1}}]}

$ curl -s -X POST https://restbase.discovery.wmnet:7443/en.wikipedia.org/v1/media/math/check/tex -d "q=\lor abcdeftest"
{"success":true,"checked":"\\lor abcdeftest","requiredPackages":[],"identifiers":["a","b","c","d","e","f","t","e","s","t"],"endsWithDot":false,"warnings":[{"type":"texvc-deprecation","details":{"error":{"message":"Deprecation: Alias no longer supported.","expected":[],"found":"\\or ","location":{"start":{"offset":0,"line":1,"column":1},"end":{"offset":4,"line":1,"column":5}},"name":"SyntaxError"},"success":false,"warnings":[],"status":"S","details":"SyntaxError: Deprecation: Alias no longer supported.","offset":0,"line":1,"column":1}}]}

I have a strong suspicion that this has something to do with how Restbase is hashing the request and how it uses this as a key in Cassandra.

I guess the problem is somewhere near the following passage

https://github.com/wikimedia/restbase/blob/ecef17bda6f4efc0d6e187fb05b1eeb389bf7120/sys/mathoid.js#L81

The idea was to use an indirection table to avoid duplicates. However, since the indirection table allows only one entry per checked TeX string, it cannot store different data in the warnings field for inputs that normalize to the same string. The only quick fix is to remove the indirection table.
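A toy model of that design choice, consistent with the three-step reproduction in this task. It is not RESTBase's actual code: check, CheckCache, the alias table and the last-writer-wins slot are all assumptions made for illustration. The key_on_input flag stands in for "remove the indirection table".

```python
from typing import Dict

# Hypothetical subset of deprecated aliases; the real texvc list is longer.
ALIASES = {"\\and": "\\land", "\\or": "\\lor"}


def check(tex: str) -> Dict:
    """Toy stand-in for the checker: rewrite aliases, warn when one was found."""
    tokens = tex.split(" ")
    found = [tok for tok in tokens if tok in ALIASES]
    response = {"success": True, "checked": " ".join(ALIASES.get(t, t) for t in tokens)}
    if found:
        response["warnings"] = [{"type": "texvc-deprecation", "found": t} for t in found]
    return response


class CheckCache:
    """Cache with an input -> slot indirection (assumed layout)."""

    def __init__(self, key_on_input: bool) -> None:
        self.key_on_input = key_on_input
        self.slot_for_input: Dict[str, str] = {}  # indirection: input -> slot key
        self.slots: Dict[str, Dict] = {}          # one stored response per slot key

    def get(self, tex: str) -> Dict:
        if tex in self.slot_for_input:            # input seen before: follow indirection
            return self.slots[self.slot_for_input[tex]]
        response = check(tex)                     # cache miss: ask the checker
        key = tex if self.key_on_input else response["checked"]
        self.slot_for_input[tex] = key
        self.slots[key] = response                # last writer wins for this slot
        return response


def replay(cache: CheckCache) -> bool:
    """Run the three-step reproduction; True means the \\land reply got polluted."""
    cache.get("\\land xcf")
    cache.get("\\and xcf")
    return "warnings" in cache.get("\\land xcf")


print(replay(CheckCache(key_on_input=False)))  # True: slot shared via checked TeX
print(replay(CheckCache(key_on_input=True)))   # False: one slot per raw input
```

Keying on the raw input gives up the deduplication the indirection table was meant to provide, which is the trade-off implied by the quick fix.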

Change 778521 merged by jenkins-bot:

[mediawiki/services/texvcjs@master] Add test for texvc deprecation warnings

https://gerrit.wikimedia.org/r/778521

As a side note, I am currently looking into T302628 to simplify the entire setup.

Not really severe. I was just confused about why a tiny article is in a category of erroneous articles when there is no error. Was the description of the replacements wrong?

So it is just confusing the authors, it is nasty.

@Wurgl Thanks, that's super helpful to know! To be transparent, RESTBase is in a deprecated state and it is going to be difficult for us to pick up bugs that are not at the "Unbreak Now!" level. I will do my best to keep this in the stack and slot it in.

Change 779436 merged by jenkins-bot:

[mediawiki/services/mathoid@master] Add test that \land is not deprecated

https://gerrit.wikimedia.org/r/779436

I think the bug in the description is solved, isn't it?

The original problem on deWP is solved, yea! But @akosiaris wrote on Apr 15 2022, 10:53 AM some statements which I sadly cannot reproduce. So maybe @akosiaris may decide if this one shall be closed.

Physikerwelt claimed this task.