
Marking cross-wiki notifications as read doesn't work
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue:

  • Have an unread notification at a wiki.
  • Go to another SUL wiki
  • Try to mark the cross-wiki notification as read.

What happens?:

  • The notification is not marked as read (unless we do so at the originating wiki)

Other information:
[Screencast]

Event Timeline

I can reproduce this. This is what I see in the network tab:

{
    "query": {
        "echomarkread": {
            "result": "success",
            "errors": {
                "enwiki": {
                    "code": "badtoken",
                    "info": "Invalid CSRF token.",
                    "*": "See https://en.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes."
                }
            },
            "alert": {
                "rawcount": 17,
                "count": "17"
            },
            "message": {
                "rawcount": 3,
                "count": "3"
            },
            "rawcount": 20,
            "count": "20"
        }
    }
}
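
Note that the top-level "result" is "success" even though the enwiki request failed, so a caller has to look at the per-wiki "errors" object too. A minimal sketch of such a check, assuming the response shape shown above; the helper name collectWikiErrors is hypothetical and not part of Echo:

```javascript
// Hypothetical helper: list per-wiki failures from an echomarkread response.
// Assumes the response shape captured in the network tab above.
function collectWikiErrors(response) {
    const result = (response.query && response.query.echomarkread) || {};
    const errors = result.errors || {};
    return Object.entries(errors).map(([wiki, err]) => ({
        wiki,
        code: err.code,
        info: err.info
    }));
}

// Trimmed copy of the response shown above.
const sample = {
    query: {
        echomarkread: {
            result: 'success',
            errors: {
                enwiki: { code: 'badtoken', info: 'Invalid CSRF token.' }
            },
            rawcount: 20,
            count: '20'
        }
    }
};

const failures = collectWikiErrors(sample);
console.log(failures);
```

An empty list would mean every target wiki accepted the request; here it contains one entry for enwiki.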

Echo handles cross-wiki actions by issuing API requests. Authentication is done through centralauthtoken: a JWT that can be requested via action=centralauthtoken and passed as an API parameter to any wiki in the farm, where the request is then treated as authenticated. This is what I did to replicate what Echo does internally:

  1. While logged in to my staff account, I visited https://test.wikipedia.org/wiki/Special:ApiSandbox and submitted action=centralauthtoken as an API call
  2. In an incognito window (logged OUT), I visited https://en.wikipedia.org/wiki/Special:ApiSandbox and requested a CSRF token (action=query&meta=tokens&type=csrf&centralauthtoken=(REDACTED), using the actual JWT from step 1)
  3. I copied the resulting csrftoken and closed the incognito window
  4. I repeated step 1 to generate a fresh JWT (it is valid for a single request only)
  5. In a fresh incognito window, I visited https://en.wikipedia.org/wiki/Special:ApiSandbox and marked my notifications as read (action=echomarkread&list=344509906&wikis=enwiki&token=(REDACTED))
  6. API call succeeded
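
The steps above boil down to three sets of API parameters. A rough sketch that only builds the parameter strings and sends nothing; the helper names and the placeholder token values are hypothetical. It also assumes the fresh JWT from step 4 is passed as centralauthtoken in step 5, which the step as written does not spell out:

```javascript
// Hypothetical helper: serialize API parameters as a form-encoded string.
function apiParams(params) {
    return new URLSearchParams(params).toString();
}

// Steps 1 and 4: ask the local wiki for a single-use centralauthtoken.
const centralAuthTokenRequest = apiParams({ action: 'centralauthtoken' });

// Step 2: ask the foreign wiki for a CSRF token, authenticated by the first JWT.
function csrfRequest(centralAuthToken) {
    return apiParams({
        action: 'query',
        meta: 'tokens',
        type: 'csrf',
        centralauthtoken: centralAuthToken
    });
}

// Step 5: mark the notification as read on the foreign wiki.
// Assumption: the second JWT is sent along as `centralauthtoken`.
function markReadRequest(notificationId, wiki, csrfToken, centralAuthToken) {
    return apiParams({
        action: 'echomarkread',
        list: String(notificationId),
        wikis: wiki,
        token: csrfToken,
        centralauthtoken: centralAuthToken
    });
}

// Placeholder values only; real tokens are redacted in the steps above.
const step2 = csrfRequest('JWT-1');
const step5 = markReadRequest(344509906, 'enwiki', 'abc+\\', 'JWT-2');
console.log(centralAuthTokenRequest);
console.log(step2);
console.log(step5);
```

MediaWiki CSRF tokens end in "+\", so the token has to be URL-encoded when passed on; URLSearchParams does that here.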

I tried separate variations of the above steps. The only way I was able to get a CSRF failure was when the enwiki request to echomarkread was logged in (rather than authenticated via centralauthtoken).

This makes me think the underlying process works, and Echo is not passing something somewhere correctly (either the centralauthtoken, or the CSRF token).

From Echo's perspective, everything seems to be done correctly. I've inspected the individual API calls, and the flow makes sense:

  • all of the requests are authenticated (using the centralauth token),
  • tokens are passed on correctly (what Echo gets is what it passes on to the foreign wikis),
  • but for some reason, the token is refused

I'm not sure what requirements tokens have. It might be useful to have MediaWiki-Engineering take a look as well.

@Quiddity Do you know when this started (approximately)?

> @Quiddity Do you know when this started (approximately)?

It definitely started this week. I'd guess it started on Thursday, as I interact with cross-wiki notifications a lot, and I only noticed it on Thursday.

This is probably not related to the problems with notifications caused by the security patch on T420154, although I won't say that with certainty until we fix those problems and this one remains unfixed.

The recent changes to centralauthtoken (T420280) are probably also not related. It seems that these tokens are working correctly; it's the CSRF tokens that don't work.

csrf tokens are tied to sessions. But until this bug, they have worked when passing them between two centralauthtoken-authenticated sessions, and they still work when I make basically the same chain of requests in my browser:

// Ensure you don't hit an optimization that skips 'centralauthtoken' if third-party cookies work
mw.ForeignApi.prototype.checkForeignLogin = () => $.Deferred().reject();
// Fetch a CSRF token from the foreign wiki and validate it there
await new mw.ForeignApi('https://de.wikipedia.org/w/api.php').postWithToken('csrf', {action: 'checktoken', type: 'csrf'}) // => result: "valid", ...

This could be a "stale read" issue. We're now back to a multi-DC configuration after the datacenter switchover, and sessions are synced between datacenters (and also between multiple instances in each datacenter, I think). Maybe this syncing now takes a tiny bit longer, and the code trying to read the session is accessing a different instance than the code writing it, and getting an old result that doesn't have the token?

I'm not sure about this hypothesis, but you could check it quickly by adding a sleep( 1 ); in Echo's getCentralAuthToken() method after $api->execute();. If that makes it work, we should ask a session store expert from SRE for advice on how to resolve the problem in a better way.
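
To make the hypothesis concrete: a stale read happens when a write lands on one store instance and the read is served by another instance before replication catches up. A toy simulation of that timing, assuming a two-instance store with a fixed replication delay; every name and the delay model are made up for illustration and do not describe how the Wikimedia session store actually works:

```javascript
// Toy model: a primary store that replicates to a replica after `lagTicks` ticks.
class LaggyStore {
    constructor(lagTicks) {
        this.lagTicks = lagTicks;
        this.now = 0;
        this.primary = new Map();
        this.replica = new Map();
        this.pending = []; // writes not yet applied to the replica
    }

    write(key, value) {
        this.primary.set(key, value);
        this.pending.push({ key, value, visibleAt: this.now + this.lagTicks });
    }

    // Advance logical time and apply writes that have finished replicating.
    tick(ticks = 1) {
        this.now += ticks;
        this.pending = this.pending.filter((w) => {
            if (w.visibleAt <= this.now) {
                this.replica.set(w.key, w.value);
                return false;
            }
            return true;
        });
    }

    readFromReplica(key) {
        return this.replica.get(key);
    }
}

const store = new LaggyStore(2);
store.write('session:token', 'abc');
const immediate = store.readFromReplica('session:token'); // read before replication
store.tick(2); // plays the role of the suggested sleep()
const delayed = store.readFromReplica('session:token');
console.log({ immediate, delayed });
```

In this model only the delayed read sees the token, which is the effect the sleep() experiment is meant to detect.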

I encountered the problem a couple times a couple days ago. Almost filed a ticket, but then I wasn't able to reproduce. Seems intermittent.

This is a similar symptom to my 2024 ticket T369451: Intermittent empty and un-dismissible cross-wiki notifications, but surely a different root cause since that one's been fixed for a while.

> This is probably not related to the problems with notifications caused by the security patch on T420154, although I won't say that with certainty until we fix those problems and this one remains unfixed.

I've finished backporting and I can still reproduce this issue, unfortunately.

> The recent changes to centralauthtoken (T420280) are probably also not related. It seems that these tokens are working correctly; it's the CSRF tokens that don't work.

Agreed.


> This could be a "stale read" issue. We're now back to a multi-DC configuration after the datacenter switchover, and sessions are synced between datacenters (and also between multiple instances in each datacenter, I think). Maybe this syncing now takes a tiny bit longer, and the code trying to read the session is accessing a different instance than the code writing it, and getting an old result that doesn't have the token?
>
> I'm not sure about this hypothesis, but you could check it quickly by adding a sleep( 1 ); in Echo's getCentralAuthToken() method after $api->execute();. If that makes it work, we should ask a session store expert from SRE for advice on how to resolve the problem in a better way.

I added sleep( 2 ); at line 102 of ForeignWikiRequest in Echo at mw-experimental. That didn't fix the issue :-/.

While testing using mw-experimental, I decided to try this with wmf.21 applied. I was unable to reproduce this with wmf.21. This means that:

As far as I understand things, this also means the centralauthtoken changes are not causing this (as they were backported to wmf.21 as well).

I collected two verbose logs, one with wmf.22 (issue reproduced) and the other with wmf.21 (issue not reproduced):

  • wmf.21 (reqId:"97dc89a6-1727-4e96-8edd-082c1615f681"), Logstash
  • wmf.22 (reqId:"8683fe43-09f8-48a6-8065-6dfe1d277f64"), Logstash

This is caused by change 1261439, "Add ApiCentralAuthTokenTest". I was able to reproduce locally after setting up cross-wiki notifications. We changed SessionManager::getGlobalSession() to $this->getRequest()->getSession() in order to make the code easier to test, expecting this to be a no-op change, but it turns out that this reveals a bug in the Echo code, which does not pass the correct session when constructing the FauxRequest for the internal API call.
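
The failure mode can be pictured with a small model. This is a simplified sketch, not MediaWiki or Echo code: the classes, the issueToken function, and the idea that the token records which session it came from are all stand-ins chosen for illustration. It shows why reading the session from the request only works if the caller attaches the right session to the request it constructs:

```javascript
// Toy stand-ins for a session and a request; these are not MediaWiki classes.
let nextSessionId = 1;

class Session {
    constructor(user) {
        this.id = nextSessionId++;
        this.user = user;
    }
}

class FakeRequest {
    // Assumption for this model: a request built without a session
    // gets a fresh anonymous one.
    constructor(session = null) {
        this.session = session || new Session(null);
    }

    getSession() {
        return this.session;
    }
}

// Handler that reads the session from the request it was given,
// analogous to the behaviour after the change described above.
function issueToken(request) {
    const session = request.getSession();
    return { sessionId: session.id, user: session.user };
}

const userSession = new Session('Alice');

// Buggy caller: builds the internal request without passing the session.
const buggy = issueToken(new FakeRequest());

// Fixed caller: passes the current session to the internal request.
const fixed = issueToken(new FakeRequest(userSession));

console.log({ buggy, fixed });
```

In the model, the buggy call issues a token bound to an unrelated anonymous session, while the fixed call binds it to the user's session.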

Thanks for the testing @Urbanecm_WMF, that really helped narrow it down. I guess it's our problem now :)

Change #1268613 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/Echo@master] ForeignWikiRequest: Pass session to internal 'centralauthtoken' request

https://gerrit.wikimedia.org/r/1268613

Change #1268633 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/Echo@wmf/1.46.0-wmf.22] ForeignWikiRequest: Pass session to internal 'centralauthtoken' request

https://gerrit.wikimedia.org/r/1268633

Change #1268634 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/Echo@wmf/1.46.0-wmf.23] ForeignWikiRequest: Pass session to internal 'centralauthtoken' request

https://gerrit.wikimedia.org/r/1268634

Change #1268633 merged by jenkins-bot:

[mediawiki/extensions/Echo@wmf/1.46.0-wmf.22] ForeignWikiRequest: Pass session to internal 'centralauthtoken' request

https://gerrit.wikimedia.org/r/1268633

Mentioned in SAL (#wikimedia-operations) [2026-04-07T21:09:01Z] <cscott@deploy1003> Started scap sync-world: Backport for [[gerrit:1268600|Ensure RevisionOutputCache uses post-processing options where appropriate (T421629)]], [[gerrit:1268625|Remove Navigation Menu Link Instrumentation on Personal Dashboard (T422512)]], [[gerrit:1268633|ForeignWikiRequest: Pass session to internal 'centralauthtoken' request (T422218)]], [[gerrit:1268651|PHP SDK: Handle experiment config missing or

Mentioned in SAL (#wikimedia-operations) [2026-04-07T21:10:51Z] <cscott@deploy1003> cscott, kgraessle, sfaci, matmarex: Backport for [[gerrit:1268600|Ensure RevisionOutputCache uses post-processing options where appropriate (T421629)]], [[gerrit:1268625|Remove Navigation Menu Link Instrumentation on Personal Dashboard (T422512)]], [[gerrit:1268633|ForeignWikiRequest: Pass session to internal 'centralauthtoken' request (T422218)]], [[gerrit:1268651|PHP SDK: Handle experiment config

Mentioned in SAL (#wikimedia-operations) [2026-04-07T21:17:54Z] <cscott@deploy1003> Finished scap sync-world: Backport for [[gerrit:1268600|Ensure RevisionOutputCache uses post-processing options where appropriate (T421629)]], [[gerrit:1268625|Remove Navigation Menu Link Instrumentation on Personal Dashboard (T422512)]], [[gerrit:1268633|ForeignWikiRequest: Pass session to internal 'centralauthtoken' request (T422218)]], [[gerrit:1268651|PHP SDK: Handle experiment config missing or

Change #1268634 merged by jenkins-bot:

[mediawiki/extensions/Echo@wmf/1.46.0-wmf.23] ForeignWikiRequest: Pass session to internal 'centralauthtoken' request

https://gerrit.wikimedia.org/r/1268634

Mentioned in SAL (#wikimedia-operations) [2026-04-07T21:31:06Z] <cscott@deploy1003> Started scap sync-world: Backport for [[gerrit:1268654|PHP SDK: Handle experiment config missing or malformed (T422112)]], [[gerrit:1268634|ForeignWikiRequest: Pass session to internal 'centralauthtoken' request (T422218)]], [[gerrit:1268653|Remove Navigation Menu Link Instrumentation on Personal Dashboard (T422512)]], [[gerrit:1268648|Bump wikimedia/parsoid to 0.23.0-a26 (T422394)]], [[gerrit:12686

Mentioned in SAL (#wikimedia-operations) [2026-04-07T21:32:55Z] <cscott@deploy1003> matmarex, sfaci, cscott, kgraessle: Backport for [[gerrit:1268654|PHP SDK: Handle experiment config missing or malformed (T422112)]], [[gerrit:1268634|ForeignWikiRequest: Pass session to internal 'centralauthtoken' request (T422218)]], [[gerrit:1268653|Remove Navigation Menu Link Instrumentation on Personal Dashboard (T422512)]], [[gerrit:1268648|Bump wikimedia/parsoid to 0.23.0-a26 (T422394)]], [[g

Mentioned in SAL (#wikimedia-operations) [2026-04-07T21:39:25Z] <cscott@deploy1003> Finished scap sync-world: Backport for [[gerrit:1268654|PHP SDK: Handle experiment config missing or malformed (T422112)]], [[gerrit:1268634|ForeignWikiRequest: Pass session to internal 'centralauthtoken' request (T422218)]], [[gerrit:1268653|Remove Navigation Menu Link Instrumentation on Personal Dashboard (T422512)]], [[gerrit:1268648|Bump wikimedia/parsoid to 0.23.0-a26 (T422394)]], [[gerrit:1268

This is fixed on Wikimedia wikis now.

Thanks for fixing this issue @matmarex. I tested it locally and it seems to work

[Screenshots: before, after]

Change #1268613 merged by jenkins-bot:

[mediawiki/extensions/Echo@master] ForeignWikiRequest: Pass session to internal 'centralauthtoken' request

https://gerrit.wikimedia.org/r/1268613

Change #1269556 had a related patch set uploaded (by Reedy; author: Bartosz Dziewoński):

[mediawiki/extensions/Echo@REL1_45] ForeignWikiRequest: Pass session to internal 'centralauthtoken' request

https://gerrit.wikimedia.org/r/1269556

Change #1269557 had a related patch set uploaded (by Reedy; author: Bartosz Dziewoński):

[mediawiki/extensions/Echo@REL1_44] ForeignWikiRequest: Pass session to internal 'centralauthtoken' request

https://gerrit.wikimedia.org/r/1269557

Change #1269558 had a related patch set uploaded (by Reedy; author: Bartosz Dziewoński):

[mediawiki/extensions/Echo@REL1_43] ForeignWikiRequest: Pass session to internal 'centralauthtoken' request

https://gerrit.wikimedia.org/r/1269558

Change #1269557 merged by jenkins-bot:

[mediawiki/extensions/Echo@REL1_44] ForeignWikiRequest: Pass session to internal 'centralauthtoken' request

https://gerrit.wikimedia.org/r/1269557

Change #1269556 merged by jenkins-bot:

[mediawiki/extensions/Echo@REL1_45] ForeignWikiRequest: Pass session to internal 'centralauthtoken' request

https://gerrit.wikimedia.org/r/1269556

Change #1269558 merged by jenkins-bot:

[mediawiki/extensions/Echo@REL1_43] ForeignWikiRequest: Pass session to internal 'centralauthtoken' request

https://gerrit.wikimedia.org/r/1269558