Determine the source of internal requests going through the API gateway.
Open, Needs Triage, Public

Description

Ideally, the API gateway would only handle requests coming from the outside. However, it looks like it is receiving a large number of internal requests (>500 req/sec) from internal IPs, primarily from the 172.16 and 10.192 ranges. We should find the source of these requests and change them to use the service mesh instead, if possible.

Event Timeline

Could you supply some of these IP addresses for investigation? My gut feeling is that these are going to be health checks. Is this the api-gateway or the rest-gateway?

I haven't checked the API gateway. Here are the top IPs for the REST gateway (naive sort to find dupes, not by req/sec):

10.192.12.30
10.192.14.10
10.192.29.14
10.192.36.6
10.192.4.20
10.192.40.11
10.192.41.6
10.192.5.30
10.192.8.26
172.16.1.41
172.16.19.172
172.16.4.75

From 172.16.19.172 alone we see 40 req/sec. That seems a lot for health checks...

Also I'd assume that most health checks shouldn't go through the gateway, right?

All of the 10.192 addresses are wikikube workers, which would correspond to either health checks or something else in wikikube making direct calls (which we actively discourage). The 172.16.* hosts are from WMCS. Most likely we need to investigate further what kind of traffic we're seeing from these hosts.

These are tools and bots running on Toolforge and Cloud VPS VMs. You'll also see edits from this IP range.

According to the rest-gateway logs, both MediaWiki itself and Wikifeeds are making direct calls to rest-gateway for the page-analytics_pageviews route.

wikifeeds is doing direct rest-gateway calls for a couple of functions in pageviews.js:

MediaWiki probably has a similar call somewhere.

This should be changed to direct mw-api-int calls, shouldn't it?

MediaWiki probably has a similar call somewhere.

In GrowthExperiments (hardcoded) and PageViewInfo (uses $wgPageViewInfoWikimediaEndpoint).

Tagging:

If these teams/projects could please weigh in on feasibility, I can create subtasks for each of them. It would be really great to get this squared away.

@Clement_Goubert shared the following snippet with me:

service_cluster_cidr:
  v4: "10.192.72.0/24"
  v6: "2620:0:860:306::/116"
cluster_cidr:
  v4: "10.194.128.0/17"
  v6: "2620:0:860:cabe::/64"

If I understand correctly, "service_cluster_cidr" means "pods", and "cluster_cidr" means "nodes". So the requests are coming from pods... is that consistent with the idea that these are health probes?

service_cluster_cidr is the IP space allocated to Kubernetes Services. It's an IP range internal to Kubernetes, so it's improbable that you are actually seeing these as source addresses.

cluster_cidr is indeed the pod IP range; those are the IPs that workloads hosted in WikiKube use. Those could show up as source addresses, but since you are talking about 10.192.x.x above and that range is 10.194.128.0/17, it's not this.

From T410198#11378640 I see:

10.192.12.30
10.192.14.10
10.192.29.14
10.192.36.6
10.192.4.20
10.192.40.11
10.192.41.6
10.192.5.30
10.192.8.26

None of these IPs belong in either 10.192.72.0/24 or 10.194.128.0/17, so they are unrelated.

However, the IP addresses listed previously are indeed wikikube-worker IPs, i.e. node IPs. That is consistent with health checks (which deliberately originate from the IP address of the host running the pods).
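The range check above is easy to reproduce with Python's ipaddress module; this sketch uses the CIDRs from the config snippet and the observed source IPs quoted in this thread:

```python
import ipaddress

# CIDRs from the wikikube config snippet shared above
service_cluster_cidr = ipaddress.ip_network("10.192.72.0/24")  # Kubernetes Services
cluster_cidr = ipaddress.ip_network("10.194.128.0/17")         # pods

# Source IPs observed at the rest-gateway
observed = [
    "10.192.12.30", "10.192.14.10", "10.192.29.14", "10.192.36.6",
    "10.192.4.20", "10.192.40.11", "10.192.41.6", "10.192.5.30", "10.192.8.26",
]

for ip_str in observed:
    ip = ipaddress.ip_address(ip_str)
    # None of these fall in either range, so they are neither Service
    # nor pod IPs; they must be node (worker) addresses.
    assert ip not in service_cluster_cidr
    assert ip not in cluster_cidr
```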

Thank you for clarifying!

Is there a way to distinguish the following (by address or headers):

  • requests from workers
  • requests directly from other pods in the cluster
  • requests from the mesh
  • requests coming in through the CDN
  • other (whatever that may be?)

Is there a way to distinguish the following (by address or headers):

  • requests from workers

It depends on your end goal. Do you want to remove them so that they don't mess with your data in a statistical manner? Or to be absolutely certain?

For example, regarding health checks:

  • Requests from nodes/workers will be for a well defined endpoint in a readinessProbe. The definitions will be in the chart of the workload, possibly overridden in helmfile.d/services. For the API gateway this is /healthz. Requests from the outside can make it to this endpoint, but it's improbable they will, so statistically speaking you are probably OK.
  • Or, amend the httpGet definition of the readinessProbe, which allows adding headers. You can add a unique header to this section and rely solely on that one.

Or a combination of the above, depending on how sure you want to be. Statistically speaking, unless something weird happens, any of the above, even in isolation, should be sufficient.
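For the header option, the Kubernetes readinessProbe httpGet stanza accepts custom headers via httpHeaders. A minimal sketch; the header name, value, and port here are invented for illustration:

```yaml
readinessProbe:
  httpGet:
    path: /healthz
    port: 8087                    # hypothetical container port
    httpHeaders:
      - name: X-Probe-Source      # hypothetical marker header
        value: k8s-readiness-probe
```

The gateway could then match on that header instead of (or in addition to) source addresses.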

  • requests directly from other pods in the cluster

Yes. The cluster_cidr IP range from above fits perfectly your purpose.

  • requests from the mesh

I see no listeners stanza defined in any chart with api-gateway as a value, nor the equivalent stanza in the puppet repo. I could be wrong, but currently you shouldn't be able to see any such type of traffic. This is on purpose.

I'll also point out that part of the task is about having internal workloads talk to the API Gateway (via the mesh or directly, it doesn't matter). I've always been against such a change. The API Gateway should be an entry point from the outside, not something used internally. That's what the mesh is for (and what Clement is pointing out above for some workloads).

  • requests coming in through the CDN

Headers. They will have x-forwarded-for set and more importantly X-Client-IP.

  • other (whatever that may be?)

Health checks from the LVS load balancers. Those are defined here: https://github.com/wikimedia/operations-puppet/blob/production/hieradata/common/service.yaml#L3469, so they will look very much like the healthchecks from the nodes themselves. The differentiating factor will be the source IP address, which will be slightly different. You will need a reverse DNS lookup to be sure here.

Note that there are both IPv6 and IPv4 addresses in many of the data structures I linked above. Include both, to be future-proof.
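Taken together, the rules above suggest a rough classifier along these lines. This is a sketch only: the pod CIDR is the one quoted in this thread (IPv4 only; production would also need the IPv6 ranges), while the function name, header handling, and category labels are assumptions:

```python
import ipaddress

# cluster_cidr from the config snippet earlier in this thread: pod IPs
POD_CIDR = ipaddress.ip_network("10.194.128.0/17")

def classify(source_ip: str, headers: dict) -> str:
    """Best-effort guess at where a gateway request came from."""
    # CDN traffic is identified by headers (X-Client-IP), not source address.
    if "x-client-ip" in {k.lower() for k in headers}:
        return "cdn"
    ip = ipaddress.ip_address(source_ip)
    if ip in POD_CIDR:
        return "pod"  # direct pod-to-gateway call (discouraged)
    # Node/worker health checks and LVS health checks both arrive from
    # host IPs and can only be told apart by the exact source address.
    return "node-or-lvs"

assert classify("10.194.130.5", {}) == "pod"
assert classify("10.192.12.30", {}) == "node-or-lvs"
assert classify("198.51.100.7", {"X-Client-IP": "203.0.113.9"}) == "cdn"
```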

I'll also point out that part of the task is about having internal workloads talk to the API Gateway (via the mesh or directly, it doesn't matter). I've always been against such a change. The API Gateway should be an entry point from the outside, not something used internally. That's what the mesh is for (and what Clement is pointing out above for some workloads).

I agree. The plan is to identify that kind of caller and make them use the mesh to talk to the respective backend service.

If it turns out we need a gateway between internal clients and API gateway, it should be a separate instance with different configuration.

It depends on your end goal. Do you want to remove them so that they don't mess with your data in a statistical manner? Or to be absolutely certain?

Ideally I should be certain enough to exempt certain requests from rate limiting. It doesn't have to be 100%, but it shouldn't be exploitable from the outside.

  • Requests from nodes/workers will be for a well defined endpoint in a readinessProbe. The definitions will be in the chart of the workload, possibly overridden in helmfile.d/services. For the API gateway this is /healthz

Right... I can filter out health checks for the gateway itself easily enough I suppose. I was thinking of k8s health checks in general, but they wouldn't go through the gateway - why would they? Silly me!

  • requests directly from other pods in the cluster

Yes. The cluster_cidr IP range from above fits perfectly your purpose.

Ideally, there shouldn't be any such requests, right? All requests to the gateway (with the exception of health checks and Prometheus scrapes) should come from the CDN? But we do see a lot of them...

I should probably check whether they come from nodes directly, or from nodes but via the CDN - that is, am I looking at the actual client address, or at x-client-ip. Currently these two sources of information are conflated.

  • requests from the mesh

I see no listeners stanza defined in any chart with api-gateway as a value, nor the equivalent stanza in the puppet repo. I could be wrong, but currently you shouldn't be able to see any such type of traffic. This is on purpose.

Thank you for checking. I was hoping this to be the case, but I wasn't sure.

  • requests coming in through the CDN

Headers. They will have x-forwarded-for set and more importantly X-Client-IP.

Right. I'll see how I can make better use of that information.

  • other (whatever that may be?)

Health checks from the LVS load balancers. Those are defined here: https://github.com/wikimedia/operations-puppet/blob/production/hieradata/common/service.yaml#L3469, so they will look very much like the healthchecks from the nodes themselves. The differentiating factor will be the source IP address, which will be slightly different. You will need a reverse DNS lookup to be sure here.

Hm... this is a problem... I don't see an easy or efficient way to do that in Envoy with Lua.

Would it be possible to include a magic header in these requests? One that cannot come from the outside?

Note that there are both IPv6 and IPv4 addresses in many of the data structures I linked above. Include both, to be future-proof.

I'll need to figure out how to do proper CIDR matching in Lua, I guess. Another bit of trouble...
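For what it's worth, IPv4 CIDR matching is just a mask-and-compare over a 32-bit integer. This Python sketch shows the arithmetic that would need porting to Lua (IPv6 needs the same idea over 128 bits); the helper names are made up:

```python
def ip_to_int(ip: str) -> int:
    """Pack a dotted-quad IPv4 address into a 32-bit integer."""
    a, b, c, d = (int(x) for x in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def in_cidr(ip: str, cidr: str) -> bool:
    """True if ip falls inside the IPv4 CIDR block."""
    net, prefix = cidr.split("/")
    # Build the netmask: the top `prefix` bits set, the rest zero.
    mask = (0xFFFFFFFF << (32 - int(prefix))) & 0xFFFFFFFF
    return ip_to_int(ip) & mask == ip_to_int(net) & mask

assert in_cidr("10.194.130.5", "10.194.128.0/17")
assert not in_cidr("10.192.12.30", "10.194.128.0/17")
```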

It depends on your end goal. Do you want to remove them so that they don't mess with your data in a statistical manner? Or to be absolutely certain?

Ideally I should be certain enough to exempt certain requests from rate limiting. It doesn't have to be 100%, but it shouldn't be exploitable from the outside.

/healthz is so fast (it's designed to be) that I would argue excluding it entirely from rate limiting, regardless of originating point, would be OK. Should someone decide to abuse that endpoint, we'll block them as an abuser, not rate limit them.

  • Requests from nodes/workers will be for a well defined endpoint in a readinessProbe. The definitions will be in the chart of the workload, possibly overridden in helmfile.d/services. For the API gateway this is /healthz

Right... I can filter out health checks for the gateway itself easily enough I suppose. I was thinking of k8s health checks in general, but they wouldn't go through the gateway - why would they? Silly me!

Exactly, health checks go directly to the workloads, never through any kind of proxy/gateway.

  • requests directly from other pods in the cluster

Yes. The cluster_cidr IP range from above fits perfectly your purpose.

Ideally, there shouldn't be any such requests, right? All requests to the gateway (with the exception of health checks and Prometheus scrapes) should come from the CDN? But we do see a lot of them...

To the API gateway? The one powering exclusively api.wikimedia.org? Yes, no requests should show from the pods. If you do see some, there's some misconfiguration and we need to fix it. For the REST Gateway, I see Clément has already added 3 tags for the items mentioned in T410198#11387937, so this is already identified and being chased down.

Down the line, we can enforce this at the network layer if you want, disallowing other pods from being able to reach the api-gateway and rest-gateway. It might sound overkill, but it will save us from similar misconfigurations in the future.
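Such network-layer enforcement could look roughly like a Kubernetes NetworkPolicy whose ingress rule excludes the pod range. This is only a sketch: the name, namespace, and labels are invented, and the actual policy tooling in production may differ:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-pod-ingress        # hypothetical name
  namespace: api-gateway        # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: api-gateway          # hypothetical label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.194.128.0/17   # cluster_cidr: block direct pod calls
```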

  • other (whatever that may be?)

Health checks from the LVS load balancers. Those are defined here: https://github.com/wikimedia/operations-puppet/blob/production/hieradata/common/service.yaml#L3469, so they will look very much like the healthchecks from the nodes themselves. The differentiating factor will be the source IP address, which will be slightly different. You will need a reverse DNS lookup to be sure here.

Hm... this is a problem... I don't see an easy or efficient way to do that in Envoy with Lua.

Oh, I hadn't realized you want to do it in the path of the request. Yeah, certainly don't do a reverse DNS lookup during serving a request.

Would it be possible to include a magic header in these requests? One that cannot come from the outside?

Revisiting this with a fresh mind, I was wrong. LVS hosts don't do HTTP fetches anymore for Kubernetes workloads. They only keep an idle TCP connection open, without doing an HTTP request. This is configured identically for api-gateway and rest-gateway.

What WILL do /healthz HTTP checks, however, and is defined for both api-gateway and rest-gateway, is Prometheus.

It looks like we can amend the configuration a bit and send some mutually agreed upon HTTP header with a shared (between prometheus and gateway) secret value so that you can identify those.
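If such a shared secret were agreed upon, the gateway side would only need a constant-time comparison of the header value. A sketch; the header name and secret value are invented:

```python
import hmac

PROBE_SECRET = "s3cret-from-private-config"  # hypothetical shared value

def is_internal_probe(headers: dict) -> bool:
    """Check a hypothetical X-Probe-Token header against the shared secret.

    hmac.compare_digest avoids leaking the secret through response timing.
    """
    token = headers.get("x-probe-token", "")
    return hmac.compare_digest(token, PROBE_SECRET)

assert is_internal_probe({"x-probe-token": "s3cret-from-private-config"})
assert not is_internal_probe({"x-probe-token": "wrong"})
assert not is_internal_probe({})
```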

However, this is /healthz we are talking about. It's probably cheaper and faster to just exclude the entire endpoint from rate limiting as I pointed out above.

Note that there are both IPv6 and IPv4 addresses in many of the data structures I linked above. Include both, to be future-proof.

I'll need to figure out how to do proper CIDR matching in Lua, I guess. Another bit of trouble...

https://github.com/Humbedooh/lua-cidr/blob/main/cidr.lua looks pretty simple code-wise, with a single require for a module that is bundled with Lua.

I tested it quickly in a container and it appears to work. One caveat: the documentation in the comment at the top of the file disagrees with the signature of the cidr.network function, which takes a self argument that the comment doesn't pass. The tests (which do pass a 1) appear to agree; I'm not sure why they decided to implement that self parameter, though.

As far as I can tell from logstash, calls identified as internal (that get no ratelimit_key) are definitely from Wikifeeds and MediaWiki itself, and all to either page-analytics or device-analytics. device-analytics calls seem to also originate from PageViewInfo.