
Requests originating from zhwiki wikifeeds caused parsoid outage
Closed, Resolved, Public

Description

From ops mailing list:

There was a restbase/parsoid outage due to a flood of requests for transforming pages on zhwiki.
This was eventually traced back to requests to the /api/rest_v1/feed/featured/yyyy/mm/dd URLs. Each of these appears to cascade (via restbase, mobileapps, and restbase again) into 10+ requests to parsoid for URLs like:

http://zh.wikipedia.org/w/rest.php/zh.wikipedia.org/v3/transform/pagebundle/to/pagebundle/<title>

These requests are expensive for parsoid to process, and their volume caused the outage.
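To make the URL pattern above concrete, here is a minimal Python sketch that builds a pagebundle-to-pagebundle (pb2pb) transform URL for a title. Only the base path comes from the URL quoted above. The function name pb2pb_url, the underscore-for-space convention and the choice to escape "/" as %2F are assumptions for illustration, not taken from restbase or parsoid code.

```python
from urllib.parse import quote

# Base path copied from the parsoid URL quoted in the task description.
PARSOID_BASE = "http://zh.wikipedia.org/w/rest.php/zh.wikipedia.org/v3"


def pb2pb_url(title: str) -> str:
    """Build a hypothetical pb2pb transform URL for a page title.

    Assumptions: spaces become underscores (the usual MediaWiki URL form),
    and the title is percent-encoded as UTF-8 with nothing left unescaped
    (safe=""), so a "/" inside a title becomes %2F.
    """
    encoded = quote(title.replace(" ", "_"), safe="")
    return f"{PARSOID_BASE}/transform/pagebundle/to/pagebundle/{encoded}"


if __name__ == "__main__":
    # 南北朝 is the title used in the local curl reproduction in the timeline.
    print(pb2pb_url("南北朝"))
```

The UTF-8 percent-encoding of 南北朝 produced here, %E5%8D%97%E5%8C%97%E6%9C%9D, is the same string used in the curl examples in the timeline.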

NOTE: This wasn't the first time we saw such an outage, but it was the first time we could identify its origin. That was possible because wikifeeds had been moved out of restbase and behind the api gateway, which has a superior observability stack.

Here is the Grafana graph of parsoid performance, where the spike in language conversion is visible:
https://grafana.wikimedia.org/goto/X77YIsmIz?orgId=1

Event Timeline

In my local restbase setup, I only end up querying pagebundle/to/pagebundle when I pass a specific locale to /page/html.
For example:

This request does not hit the computationally intensive transformation:

curl -v 127.0.0.1:7233/zh.wikipedia.beta.wmflabs.org/v1/page/html/%E5%8D%97%E5%8C%97%E6%9C%9D

but this does:

curl -v 127.0.0.1:7233/zh.wikipedia.beta.wmflabs.org/v1/page/html/%E5%8D%97%E5%8C%97%E6%9C%9D -H 'accept-language: zh-cn'

In wikifeeds' case, the backend sends GET requests to the page/html endpoint and forwards the locale passed by the client.
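The curl examples above suggest a simple rule: only requests that carry a zh variant locale reach the expensive transform. The Python sketch below is a simplified model of that rule and is not restbase's actual routing logic. It assumes that the locale arrives in the Accept-Language header, as in the second curl command. The ZH_VARIANTS set is an assumed list of zhwiki variant codes and is not taken from this task. Plain "zh" is treated as needing no conversion.

```python
from typing import Optional

# Assumed zhwiki variant codes; "zh" on its own is treated as no conversion.
ZH_VARIANTS = {
    "zh-hans", "zh-hant", "zh-cn", "zh-tw",
    "zh-hk", "zh-mo", "zh-my", "zh-sg",
}


def requests_zh_variant(accept_language: Optional[str]) -> bool:
    """Return True if an Accept-Language value names a zh variant.

    Simplified model of the observation above: /page/html without a
    variant is cheap, while e.g. "zh-cn" goes through the pb2pb transform.
    """
    if not accept_language:
        return False
    for item in accept_language.split(","):
        # Drop any ";q=..." weight and normalise case: "zh-CN;q=0.8" -> "zh-cn".
        tag = item.split(";", 1)[0].strip().lower()
        if tag in ZH_VARIANTS:
            return True
    return False
```

Under this model, the first curl command corresponds to requests_zh_variant(None), which returns False, and the second corresponds to requests_zh_variant("zh-cn"), which returns True.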

Change 958573 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/services/parsoid@master] Disable Zh Language converter

https://gerrit.wikimedia.org/r/958573

Change 958573 abandoned by Subramanya Sastry:

[mediawiki/services/parsoid@master] Disable Zh Language converter

Reason:

I will go with Scott's suggestion here.

https://gerrit.wikimedia.org/r/958573

Change 958593 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/services/parsoid@master] Disable Zh Language converter

https://gerrit.wikimedia.org/r/958593

Change 958995 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[mediawiki/services/wikifeeds@master] Strip zh variants when calling parsoid

https://gerrit.wikimedia.org/r/958995

Change 958995 merged by jenkins-bot:

[mediawiki/services/wikifeeds@master] Strip zh variants when calling parsoid

https://gerrit.wikimedia.org/r/958995
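The idea behind "Strip zh variants when calling parsoid" can be sketched as a pure function over the header value. This is a hypothetical Python illustration, not the code of change 958995 (wikifeeds itself is a Node.js service). It assumes that the variant travels in Accept-Language, and it uses an assumed list of zh variant codes.

```python
from typing import Optional

# Assumed zh variant codes; the real wikifeeds patch may use a different list.
ZH_VARIANTS = {
    "zh-hans", "zh-hant", "zh-cn", "zh-tw",
    "zh-hk", "zh-mo", "zh-my", "zh-sg",
}


def strip_zh_variants(accept_language: Optional[str]) -> Optional[str]:
    """Remove zh variant entries from an Accept-Language header value.

    Returns the remaining entries joined with ", ", or None when nothing
    is left, in which case a caller would simply omit the header.
    """
    if not accept_language:
        return None
    kept = []
    for item in accept_language.split(","):
        item = item.strip()
        # Compare only the language tag, ignoring any ";q=..." weight.
        tag = item.split(";", 1)[0].strip().lower()
        if item and tag not in ZH_VARIANTS:
            kept.append(item)
    return ", ".join(kept) or None
```

Dropping the header entirely when only variants were present makes the downstream request look like the cheap first curl example, which avoids the variant transform.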

There is a workaround for wikifeeds that should fix the parsoid outage issue in the short term. Can we re-enable traffic for zhwiki on wikifeeds?

I've just disabled the rule. It's still present, but inactive. For other SREs who need to re-enable it in an emergency:

puppetmaster1001:$ sudo requestctl enable cache-text/wikifeeds_featured
puppetmaster1001:$ sudo requestctl commit

I've just re-enabled the filter, so this traffic is being rejected again. We are seeing high latencies and decreased availability in the parsoid cluster.

Change 959822 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[mediawiki/services/wikifeeds@master] Strip zh variants when calling summary

https://gerrit.wikimedia.org/r/959822

Change 959822 merged by jenkins-bot:

[mediawiki/services/wikifeeds@master] Strip zh variants when calling summary

https://gerrit.wikimedia.org/r/959822

Change 958593 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Disable Zh Language converter

https://gerrit.wikimedia.org/r/958593

Change 962706 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.18.0-a26

https://gerrit.wikimedia.org/r/962706

Change 962706 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.18.0-a26

https://gerrit.wikimedia.org/r/962706

Change 966593 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/core@master] Use 'variant-metadata-hack' pb2pb for Parsoid-unsupported variants

https://gerrit.wikimedia.org/r/966593

Change 967275 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/core@master] Don't call Parsoid's pb2pb to add variant metadata

https://gerrit.wikimedia.org/r/967275

Change 967275 merged by jenkins-bot:

[mediawiki/core@master] Don't call Parsoid's pb2pb to add variant metadata

https://gerrit.wikimedia.org/r/967275

Mentioned in SAL (#wikimedia-operations) [2023-11-09T11:38:43Z] <_joe_> disabled requestctl cache-text/wikifeeds_featured T350645 T346657

Given the patch is now live with the latest train, I've disabled the rule for now.

And good news: most requests to the endpoint now get a response in 50-100ms instead of 5-10 seconds. This is much more sustainable.
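As a rough check of the size of the improvement, compare the endpoints of the two quoted ranges. This pairing is an assumption, since the comment only gives typical ranges:

```latex
\frac{5\ \text{s}}{50\ \text{ms}} = \frac{5000\ \text{ms}}{50\ \text{ms}} = 100,
\qquad
\frac{10\ \text{s}}{100\ \text{ms}} = \frac{10000\ \text{ms}}{100\ \text{ms}} = 100
```

This works out to a reduction in typical response time of roughly two orders of magnitude.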

Change 966593 abandoned by Subramanya Sastry:

[mediawiki/core@master] Use 'variant-metadata-hack' pb2pb for Parsoid-unsupported variants

Reason:

https://gerrit.wikimedia.org/r/966593

Jgiannelos claimed this task.