graphoid should not use the http proxy to connect to the mediawiki api and other internal services
Closed, Resolved (Public)

Description

As per the title, this morning we had graphoid failing to contact the mediawiki api correctly because url-downloader was down.

For any API request, graphoid should contact a specific host that we specify in the config (I don't see it anywhere in the current graphoid config, btw).

Event Timeline

A downtime happened again today on codfw (no users affected) due to this defect. While there is T122134, I think the root issue is this one, as it is in my opinion a SPOF.

Also, according to the alerts, this is happening not only for graphoid, but also for citoid and restbase in general.

I will work on it later today. Graphoid needs to be able to contact any
wiki project via API, plus restbase and wdqs services. What end point
should it access?

@Yurik, given the relatively low volume from graphoid I think just using the public domains (including Varnish) is fine.

Why is graphoid using the url-downloader proxy at all? Afaik, all the main domains are accessible without internal proxying.

@GWicke I do not know why it's using a proxy - I don't think I ever set it up that way. Graphoid requires access to:

  • API of all WMF servers (only public at this point, but in theory we might want private wiki support at a later time)
  • RESTbase API access (e.g. pageview API)
  • Wikidata Query Service API

All of the above are WMF-only domains. I suspect the proxy setting is being generated by some centralized service puppet code somewhere (might be wrong).

The only reason I could imagine for using the proxy would be limiting / denying access to internal IPs. There are more reliable ways to do that without the proxy, though. For example, iptables rules can be set up to match on process user / group. T121240 is proposing to more generally segment the network & enforce such limits for each service.

Opsen like @Joe or @mobrovac should be able to help you with this.

The correct endpoints to use for mediawiki, restbase, etc. are (in puppet terms):

MW Api: http://api.svc.${::mw_primary}.wmnet
RB: http://restbase.svc.${::site}.wmnet

On the other hand, the wikidata query service doesn't have an internal load balancer and can be reached via the public endpoint we're already using.

I'm currently working on T97530: SCB services should not use a proxy for our domains which will allow you to easily switch to using internal LVS IPs, so stay tuned!

Joe, should I manually specify the Host value when accessing these endpoints?

When I fix T97530, that will be done automagically for you. All you will have to do is make a call like mwApiGet(app, domain, query) from Graphoid and it will format the request appropriately.
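
To illustrate what "formatting the request appropriately" could mean here, below is a minimal Python sketch (not the actual mwApiGet implementation, which is not shown in this thread). It assumes the request goes to the internal API endpoint over plain HTTP, with the public wiki domain carried in the Host header. The function name build_mw_api_request, the /w/api.php path, the format=json default, and the "eqiad" site name in the usage note are assumptions made for the example.

```python
from urllib.parse import urlencode


def build_mw_api_request(internal_host, domain, query):
    """Describe a GET request to the internal MW API endpoint.

    internal_host: internal endpoint, e.g. the api.svc.<site>.wmnet host
    domain: public wiki domain, sent as the Host header (assumption)
    query: dict of api.php parameters
    """
    params = dict(query)
    # Assumption for this sketch: default to JSON output if not specified.
    params.setdefault("format", "json")
    return {
        "method": "GET",
        "url": f"http://{internal_host}/w/api.php?{urlencode(params)}",
        "headers": {"Host": domain},
    }
```

For example, build_mw_api_request("api.svc.eqiad.wmnet", "en.wikipedia.org", {"action": "query", "meta": "siteinfo"}) would describe a request to the internal endpoint while still naming en.wikipedia.org as the target wiki.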

@mobrovac thx, but mwApiGet implies mw api.php. What about thumbnails from commons, pageviews api, wdqs api?

The update will also feature automagic conversion for RESTBase, so you will be able to use that for the Pageviews API as well (basically for anything under /api/rest_v1/). For WDQS, @Joe already indicated you can use the public endpoint. I don't know the exact URI or format for thumbnails, perhaps @Joe or @fgiunchedi can help you with that.

Shouldn't we have some "automagic mapper" that changes any https://publichost/ into a http://privatehost/? I presume that it will only convert https -> http and publichost -> privatehost, but will leave both the path and query portions of the URL intact?
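
A rough Python sketch of such a mapper, for illustration only: it swaps the scheme and host, keeps the path and query untouched, and returns the original public host so it can be sent as the Host header. The function name to_internal, the host_map argument, and the example mapping in the usage note are hypothetical; nothing in this thread specifies how the real mapping would be configured.

```python
from urllib.parse import urlsplit, urlunsplit


def to_internal(url, host_map):
    """Rewrite a public URL to its internal equivalent.

    Returns (url, extra_headers). URLs whose host is not in host_map
    are returned unchanged with no extra headers.
    """
    parts = urlsplit(url)
    internal_host = host_map.get(parts.hostname)
    if internal_host is None:
        return url, {}
    # Only the scheme and host change; path, query and fragment are kept.
    # Any explicit port on the public URL is dropped in this sketch.
    rewritten = urlunsplit(
        ("http", internal_host, parts.path, parts.query, parts.fragment)
    )
    return rewritten, {"Host": parts.hostname}
```

With a hypothetical map such as {"en.wikipedia.org": "api.svc.eqiad.wmnet"}, the URL https://en.wikipedia.org/w/api.php?action=query becomes http://api.svc.eqiad.wmnet/w/api.php?action=query with Host: en.wikipedia.org, while a URL for any unmapped host passes through untouched.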

Change 288890 had a related patch set uploaded (by Mobrovac):
Graphoid: Do not use the proxy for the allowed domains

https://gerrit.wikimedia.org/r/288890

Gerrit 288890 tells Graphoid not to use the proxy for the list of its allowed domains, all of which are present in WMF production, so this is basically a work-around to unblock this issue.
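
The idea behind this work-around can be sketched in Python as below. This is not the code from the patch; the function name should_use_proxy and the suffix-matching rule are assumptions chosen to show the decision being made: requests to an allowed domain (or any subdomain of it) go direct, everything else still goes through the proxy.

```python
from urllib.parse import urlsplit


def should_use_proxy(url, no_proxy_domains):
    """Return False when the URL's host is one of the allowed domains
    (or a subdomain of one), True otherwise."""
    host = (urlsplit(url).hostname or "").lower()
    for domain in no_proxy_domains:
        domain = domain.lower().lstrip(".")
        # Match the domain itself or a dot-separated subdomain, so that
        # "notwikipedia.org" does not match "wikipedia.org".
        if host == domain or host.endswith("." + domain):
            return False
    return True
```

Matching on a dot boundary rather than a plain string suffix is a deliberate choice in this sketch, so that look-alike hosts outside the allowed list are still proxied.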

Change 288890 merged by Alexandros Kosiaris:
Graphoid: Do not use the proxy for the allowed domains

https://gerrit.wikimedia.org/r/288890

@Yurik you can use ms-fe.svc.<site>.wmnet internally instead of upload.wikimedia.org

mobrovac claimed this task.

Removed T97530 as a blocker and resolving, since this is now fixed for the time being. A proper solution still needs to be concocted, but let's track that in a follow-up.

This is interesting info! I'm currently trying to standardise access to our internal stuff for forks of the service-template-node as part of T97530 (still in WIP mode, only on my local branch), but I will definitely get back to you, @fgiunchedi, about the best way to approach the problem for uploads (even though that's probably a corner case confined to Graphoid).