
Failed fetching https://wikimedia.org/api/rest_v1/metrics/unique-devices/{parameters}: Connection timed out
Closed, Resolved · Public · PRODUCTION ERROR

Description

Error
normalized_message: Failed fetching {requesturl}: {error}
exception.trace: (not included)

Impact

Likely slower response times on MediaWiki's end, since the request waits until the connection times out.

Notes

This only happens from within MW-on-K8s containers, probably due to different firewall rules.

Caused by PageViewService::getSiteData( 32, PageViewService::METRIC_UNIQUE ).

Event Timeline

Hi, do you have any idea what resources this function tries to get?

It would be useful to know which URL gets fetched, and with which libraries. In theory, any request in k8s for MediaWiki resources should be redirected to the API.

Oh, I now see the problem.

The url it's trying to fetch is

https://wikimedia.org/api/rest_v1/metrics/unique-devices/test.wikipedia.org/all-sites/daily/20230507/20230705

which is not rewritten to use an internal uri for AQS, but rather reaches out to the public url at the edge.
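For reference, the failing URL can be read as a base endpoint followed by an AQS path. The PHP sketch below assumes the base is the value of wgPageViewInfoWikimediaEndpoint (https://wikimedia.org/api/rest_v1 in production, per a later comment) and that the path segments follow the layout visible in the URL. The helper name uniqueDevicesUrl is hypothetical and is not part of PageViewInfo.

```php
<?php
// Hypothetical helper, for illustration only: PageViewInfo's real code
// may assemble this URL differently.
function uniqueDevicesUrl(
    string $endpoint,    // assumed: value of $wgPageViewInfoWikimediaEndpoint
    string $project,     // e.g. "test.wikipedia.org"
    string $accessSite,  // e.g. "all-sites"
    string $granularity, // e.g. "daily"
    string $start,       // YYYYMMDD
    string $end          // YYYYMMDD
): string {
    // Encode each path segment so unusual project names cannot break the path.
    $segments = array_map('rawurlencode', [$project, $accessSite, $granularity, $start, $end]);
    // Trim a trailing slash on the base to avoid a double "//".
    return rtrim($endpoint, '/') . '/metrics/unique-devices/' . implode('/', $segments);
}

echo uniqueDevicesUrl('https://wikimedia.org/api/rest_v1', 'test.wikipedia.org', 'all-sites', 'daily', '20230507', '20230705'), "\n";
// prints https://wikimedia.org/api/rest_v1/metrics/unique-devices/test.wikipedia.org/all-sites/daily/20230507/20230705
```

Because only the first argument comes from configuration, changing the endpoint changes which host receives the request without touching the rest of the path.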

This URL is in an unusual position: it is ours, so it does not go through url-downloader, but it is not a wiki, so it is not covered by wgLocalHTTPProxy either.

So I think the solution would be something similar to what was done in T340483.

> Hi, do you have any idea what resources this function tries to get?

It is making a request to AQS. The goal is to get the number of visitors in the last 30 days (at either the project or the article level).

> It would be useful to know which URL gets fetched, and with which libraries. In theory, any request in k8s for MediaWiki resources should be redirected to the API.

Here are a few URLs that are loaded by the function in question:

They are loaded by the PageViewInfo MediaWiki extension, via HttpRequestFactory.

Change 935998 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/PageViewInfo@master] Route AQS requests through wgLocalHTTPProxy

https://gerrit.wikimedia.org/r/935998

Thanks for the advice, @Joe! I uploaded a patch, and hopefully it'll work in production :). I used wgCopyUploadProxy as in T340483, but I wonder whether we should set wgHTTPProxy on MediaWiki's end instead. AFAICS, only wgCopyUploadProxy is currently set (to url-downloader), and wgHTTPProxy appears to be unset. Is that intentional?

Aren't these URLs RESTBase? Couldn't we just send them via the service proxy instead?

> Aren't these URLs RESTBase? Couldn't we just send them via the service proxy instead?

Yes, using the service proxy might work; setting wgPageViewInfoWikimediaEndpoint in the MW config repo to the service proxy could be worth trying. It is currently set to https://wikimedia.org/api/rest_v1, so, assuming the rest of the URL structure remains the same, it should work. This is somewhat hard to verify from my end, as AFAIK there is no way to get a shell.php session in a MW-on-K8s container where I could test it before deploying a patch.
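The idea of pointing the endpoint at the service proxy relies on only the base of the URL changing while the path after it stays intact. A small PHP sketch of that assumption follows. The helper rebaseUrl and the base http://service-proxy.invalid/rest_v1 are hypothetical placeholders; the real service-proxy address is not given in this task.

```php
<?php
// Hypothetical helper: replace $oldBase at the start of $url with $newBase,
// keeping the remaining path unchanged. Uses strncmp() rather than
// str_starts_with() so it also runs on PHP 7.4.
function rebaseUrl(string $url, string $oldBase, string $newBase): string
{
    $oldBase = rtrim($oldBase, '/');
    $prefix = $oldBase . '/';
    if (strncmp($url, $prefix, strlen($prefix)) !== 0) {
        throw new InvalidArgumentException("URL does not start with $prefix");
    }
    // substr() keeps the leading "/" of the remaining path.
    return rtrim($newBase, '/') . substr($url, strlen($oldBase));
}

$public = 'https://wikimedia.org/api/rest_v1/metrics/unique-devices/test.wikipedia.org/all-sites/daily/20230507/20230705';
// Placeholder base only; not the actual service-proxy address.
echo rebaseUrl($public, 'https://wikimedia.org/api/rest_v1', 'http://service-proxy.invalid/rest_v1'), "\n";
// prints http://service-proxy.invalid/rest_v1/metrics/unique-devices/test.wikipedia.org/all-sites/daily/20230507/20230705
```

A wrong base still yields a well-formed URL, just one the proxy may not route, so the base itself is the part worth double-checking; this task did need a follow-up patch to fix the base URL.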

Change 936065 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] PageView: Route requests through restbase service proxy

https://gerrit.wikimedia.org/r/936065

> Aren't these URLs RESTBase? Couldn't we just send them via the service proxy instead?

> Yes, using the service proxy might work; setting wgPageViewInfoWikimediaEndpoint in the MW config repo to the service proxy could be worth trying. It is currently set to https://wikimedia.org/api/rest_v1, so, assuming the rest of the URL structure remains the same, it should work. This is somewhat hard to verify from my end, as AFAIK there is no way to get a shell.php session in a MW-on-K8s container where I could test it before deploying a patch.

See T341197; I should have a patch out tomorrow.

BTW, sorry for the confusion, but wgLocalHTTPProxy points to the MediaWiki API; it's used for self-calls to MediaWiki, so it's not the right solution here.

Anyway, I want to point out that if this change (using RESTBase directly) works on-prem, it will work on MW-on-K8s as well.

Change 936065 merged by jenkins-bot:

[operations/mediawiki-config@master] PageView: Route requests through restbase service proxy

https://gerrit.wikimedia.org/r/936065

Mentioned in SAL (#wikimedia-operations) [2023-07-06T18:56:03Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:936065|PageView: Route requests through restbase service proxy (T341191)]]

Mentioned in SAL (#wikimedia-operations) [2023-07-06T18:57:32Z] <urbanecm@deploy1002> urbanecm: Backport for [[gerrit:936065|PageView: Route requests through restbase service proxy (T341191)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet

Change 935998 abandoned by Urbanecm:

[mediawiki/extensions/PageViewInfo@master] Route AQS requests through wgCopyUploadProxy

Reason:

in favor of 936065: routing requests through the service proxy instead

https://gerrit.wikimedia.org/r/935998

Mentioned in SAL (#wikimedia-operations) [2023-07-06T19:03:30Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:936065|PageView: Route requests through restbase service proxy (T341191)]] (duration: 07m 27s)

Change 936083 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] PageView: Fix base URL when using service proxy

https://gerrit.wikimedia.org/r/936083

Change 936083 merged by jenkins-bot:

[operations/mediawiki-config@master] PageView: Fix base URL when using service proxy

https://gerrit.wikimedia.org/r/936083

Mentioned in SAL (#wikimedia-operations) [2023-07-06T19:17:28Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:936083|PageView: Fix base URL when using service proxy (T341191)]]

Mentioned in SAL (#wikimedia-operations) [2023-07-06T19:24:44Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:936083|PageView: Fix base URL when using service proxy (T341191)]] (duration: 07m 16s)

After fixing the base URL to work via the service proxy, it seems to work. Although the issue is user-facing, the pageview data is cached, so it is hard to trigger via the web interface. Verified via shell.php on mwmaint1002:

```
[urbanecm@mwmaint1002 ~]$ mwscript shell.php enwiki
Psy Shell v0.11.10 (PHP 7.4.33 — cli) by Justin Hileman
> $cachedService = \MediaWiki\MediaWikiServices::getInstance()->get('PageViewService')
= MediaWiki\Extension\PageViewInfo\CachedPageViewService {#986}

> sudo $service = $cachedService->service
= MediaWiki\Extension\PageViewInfo\WikimediaPageViewService {#985}

> $service->getSiteData(32, 'unique')->isOK()
= true
```

> [...] It is somewhat hard to verify from my end, as AFAIK there is no way to get a shell.php session in a MW-on-K8s container, where I could test this before trying to deploy a patch.

> See T341197; I should have a patch out tomorrow.

Amazing!

> BTW, sorry for the confusion, but wgLocalHTTPProxy points to the MediaWiki API; it's used for self-calls to MediaWiki, so it's not the right solution here.

No worries, I realized that and used wgCopyUploadProxy in the final version of the patch.

> Anyway, I want to point out that if this change (using RESTBase directly) works on-prem, it will work on MW-on-K8s as well.

Thank you. In that case, we should be set, and the failing log entries should disappear.

Urbanecm_WMF claimed this task.
Urbanecm_WMF triaged this task as Medium priority.