
RfC: Use of gRPC as Lambda interface for linked artifact caching
Open, Low, Public

Assigned To
Authored By
Eevans
Mar 11 2026, 4:41 PM

Description

The proposal document for Linked Artifact Caching specifies using gRPC as the lambda interface. The rationale for this is manifold:

  • The proposal assumes that the cache system provides the canonical/only HTTP API for serving artifacts. This would mean that the Lambda interface is implemented entirely for the purpose of integrating with the cache
  • It provides an interface that is clear, concise, and strongly typed
  • Support for deadlines and cancellations
  • Service schema migrations that are straightforward (forward- and backward-compatible)
  • Ease of use: you can imagine having (per language) scaffolding projects where you only need to edit some metadata & implement a stub. The scaffold project could be set up to build a container suitable for a standard helm chart
  • Given enough adoption, it might make sense (in the future) to implement the Lambdas using FaaS

In short, we get for free what would require considerable work to implement ourselves (and for each new lambda), with a better outcome. This of course rests heavily on the first assumption above, that (for a majority of use-cases) artifacts are served exclusively by the cache service. It would not make sense to require (a majority of) teams to develop a gRPC service in addition to an HTTP one, when the latter could serve both purposes. It also assumes that use of gRPC doesn't otherwise create hardships in our environment. We should put these assumptions to the test.

NOTE: The proposal document has since been updated with a long-form rationale as well.

Event Timeline

Eevans added subscribers: isarantopoulos, achou, klausman and 2 others.

Hi all! Wanted to bring up a discussion that came up while working on the Article Topics integration with Hoarde. It's about the lambda interface protocol and whether gRPC should be the only supported method or whether we should consider HTTP. I've been discussing this with both Eric and Luca, and I think there are valid points on both sides so I wanted to open it up here.

There is a very compelling case for gRPC - it gives us structured contracts, backward-compatible schema evolution, and built-in handling of timeouts and failure propagation. This is also based on previous experience with RESTBase, where using plain HTTP led to a lot of issues around enforcing contracts and harmonizing error handling. I can see how gRPC makes Hoarde's side cleaner, especially if the lambda exists solely to serve cache misses.

That said, there's a practical concern on the ML Platform side and other non-ML services willing to integrate. The vast majority of our internal services communicate over HTTP, and our infra (mesh, ingress, routing) is built around that. To integrate with gRPC-only Hoarde, each service team would need to either deploy an adapter/proxy alongside their service, or deploy a gRPC-capable replica. I've prototyped the adapter approach and discussed the replica route with Luca - both are technically feasible. But as Hoarde would onboard more use-cases, I can imagine this becoming a pattern where every integrating service carries extra infrastructure just to bridge the protocol gap. So we’re shifting complexity and maintenance from Hoarde to its clients.

The current design relies on a big assumption that lambdas only serve Hoarde, and Hoarde is the endpoint for consumers. I think it's worth investigating whether this assumption is valid and makes sense within our existing infrastructure, especially as we scale to more use-cases. If some services will need to remain reachable over HTTP for other consumers anyway, we should think about how we want to handle the protocol boundary.

Very curious to hear opinions from others! Pinging a few people I think might be interested: @isarantopoulos @elukey @Ottomata @klausman @achou

cc also @GGoncalves-WMF . I think your points are really important. We should do some good platform product management to see how capabilities like those Hoarde provides can and will actually be used.

From my own experience: when building platform capabilities (like streaming enrichment) without understanding how teams work and do things, there is a risk that the systems we platform engineers choose to build will not be adopted, even if we choose the best technical option. The user story is important too!

One thing of note about using gRPC vs. REST/HTTP(S)-based communication is that a lot of the infrastructure we have built around LiftWing assumes HTTP services (e.g. using Envoy in HTTP mode). If we set up a gRPC service that services on LiftWing call, or a gRPC service on LiftWing itself, we need to make sure that "the path is clear" in this sense, also regarding pod security policies and the like.

That said, there's a practical concern on the ML Platform side and other non-ML services willing to integrate. The vast majority of our internal services communicate over HTTP, and our infra (mesh, ingress, routing) is built around that. To integrate with gRPC-only Hoarde, each service team would need to either deploy an adapter/proxy alongside their service, or deploy a gRPC-capable replica. I've prototyped the adapter approach and discussed the replica route with Luca - both are technically feasible. But as Hoarde would onboard more use-cases, I can imagine this becoming a pattern where every integrating service carries extra infrastructure just to bridge the protocol gap. So we’re shifting complexity and maintenance from Hoarde to its clients.

To me the above is the main concern, since almost every service at the foundation runs HTTP.

Is this because you a) envision the service being used to do caching for already implemented systems, b) as-yet-implemented systems that will invariably need an HTTP service anyway (and if so, why), or c) because using something other than HTTP generally imposes a burden that exceeds the benefits of using grpc (and if so, how)?

I think it's a mix of them.
For a) - our practical next aim would be integrating with other existing and established revision-scoring models already on LiftWing, e.g. the revert_risk_model or article_country services.
For b) - I think in the end we would also expect potential users outside of the caching service. Some of our APIs, which we would like to integrate with hoarde, also offer optional parameters beyond wiki/page/revision to offer custom behavior, e.g. a model threshold for article topics. So most users would be happy to go to the cache for default behaviour, but I think we need to be open to custom parameters and other users.
For c) - All our current users and our networking stack are based on HTTP, so this would practically mean adding and maintaining a new protocol layer for us. I definitely see the benefits of supporting gRPC and would like LiftWing infra and services to support it, but our current biggest benefit would be integrating with the cache. I assume other users who would like to integrate with the caching service could be in a similar position.

Thanks @BWojtowicz-WMF! I want to reply to these, but first I'd like to propose a bit of process:

The proposal doc contains a lot of bold, authoritative-sounding proclamations that amount to my opinion on the matter. That's not by accident; I'm a big proponent of Cunningham's Law. :) That said, I want the outcome to be one of consensus, so as these discussions play out, I'll update the proposal to reflect as much (and reference the relevant discussions when it makes sense). I wanted to get that out of the way first, though, because the proposal has been a work in progress that was missing some rationale for those bold proclamations. I took the time to back-fill some just now, and didn't want you to think a reference to them was meant to shut down discussion!


For a) - our practical next aim would be integrating with other existing and established revision-scoring models already on LiftWing, e.g. the revert_risk_model or article_country services.

Ok.

For b) - I think in the end we would also expect potential users outside of the caching service. Some of our APIs, which we would like to integrate with hoarde, also offer optional parameters beyond wiki/page/revision to offer custom behavior, e.g. a model threshold for article topics. So most users would be happy to go to the cache for default behaviour, but I think we need to be open to custom parameters and other users.

If I understand you correctly (and if I don't, please don't hesitate to correct me), you're arguing that we might have uses that can't be satisfied, which would force a product team to build an HTTP API to serve them, one that would otherwise have also worked as the lambda (while providing an example of a hypothetical use-case). Or put another way, that (a, above) we might have past use cases with extant HTTP APIs, and (b) we might have (unavoidable) future ones too.

I have two responses to this, I think: One is that we should be careful as we walk the line between painting ourselves into a corner (i.e. making a decision now that makes something painful or intractable later), and risking success with a scope that is too wide. My personal hope is that we can employ JIT engineering here, and take small measured steps as we see a need manifest. Doing so carries some risk of course, but then so does the alternative.

And two, to address that hypothetical (with a hypothetical solution ☺️): What is currently proposed is really minimalist. It's as minimal as I could imagine working for the single concrete use-case we have (article topics). It's almost certainly too minimal, but that's a feature not a bug (we couldn't have built a feature wrong, if we didn't build it in the first place). If we had such a hypothetical use-case in front of us, I think we could accommodate it rather easily. It sounds like a case that could be solved by simply passing such parameters on to the lambda.

I added a section to the proposal doc with regard to objectives that I think is relevant here.

For c) - All our current users and our networking stack are based on HTTP, so this would practically mean adding and maintaining a new protocol layer for us. I definitely see the benefits of supporting gRPC and would like LiftWing infra and services to support it, but our current biggest benefit would be integrating with the cache. I assume other users who would like to integrate with the caching service could be in a similar position.

If you mean "it's different and unfamiliar", that's totally fair. If there are other, infrastructural considerations I would like to understand them (though it uses HTTP/2, so that seems unlikely?). If I do understand you correctly, I think this is an issue that will come down to cost v benefit. There are costs (and savings) associated with gRPC, and costs (and savings) associated with REST APIs, and we need to choose what makes the most sense.

I added a section to the proposal to clarify linked artifact cache objectives (which ofc can be up for debate), and one with (my) rationale for gRPC over HTTP.

If I understand you correctly (and if I don't, please don't hesitate to correct me), you're arguing that we might have uses that can't be satisfied, which would force a product team to build an HTTP API to serve them, one that would otherwise have also worked as the lambda (while providing an example of a hypothetical use-case). Or put another way, that (a, above) we might have past use cases with extant HTTP APIs, and (b) we might have (unavoidable) future ones too.

It's not a fully hypothetical situation, as our current production API for article topics supports optional parameters, so we would introduce breaking changes by restricting the schema. In the article topics case we should be safe, because ~nobody is using those optional parameters. I agree that we should take small steps and not solve not-yet-existing problems on Hoarde's end, and that the current minimal codebase of Hoarde is a great benefit; I'm very much in favor of keeping it this way.
Thank you for extending the proposal doc!

If you mean "it's different and unfamiliar", that's totally fair. If there are other, infrastructural considerations I would like to understand them (though it uses HTTP/2, so that seems unlikely?). If I do understand you correctly, I think this is an issue that will come down to cost v benefit. There are costs (and savings) associated with gRPC, and costs (and savings) associated with REST APIs, and we need to choose what makes the most sense.

I think the different-and-unfamiliar part is part of it, but it's more the development/maintenance cost that would come with running gRPC servers/adapters and supporting both HTTP and gRPC on LiftWing. From an ML development perspective, we built our usual development/deployment/testing/observability lifecycle and tooling around HTTP APIs, so a huge part of our codebase is written and configured with HTTP endpoints in mind. So in our view, there is an up-front cost of adding gRPC support to our codebase, as well as an additional maintenance cost of running and supporting it, since we are not switching technologies but adding one to our stack.

I haven't played around with https://connectrpc.com/ at all, but it looks like it was designed with these exact concerns in mind, and could allow us to write and ship a fully-compliant gRPC server that also, with zero additional work required, would accept an easy-to-speak dialect of plain HTTP.

I said this elsewhere, but I think it bears having it here as well:

The implications here are larger than a this-for-that replacement (s/gRPC/HTTP/g) on the proposal. It moves us away from the platform-y/FaaS paradigm, and more toward the realm of a general HTTP caching proxy. It's a significant expansion of scope with implications on software complexity (client & server), on-boarding, and production support. If this is the way, I think we have to (re)evaluate not just the architecture, but resourcing, timelines, and ownership.

Just my 2 cents about various objections I've seen raised in this task:

  • If anything, we should've started moving to gRPC for internal service to service communications a long time ago; we haven't done it mostly because we didn't have immediate needs.
  • Our mesh is not built "around HTTP" more than it's built around gRPC. It can work perfectly well with grpc. In fact, quite a few internal functions of our service mesh are built using grpc (for the same reason it would make sense to use grpc here)
  • gRPC uses HTTP/2 for transport, and it works perfectly fine with our mesh, our ingress, and our routing logic.
  • I'm not sure I understand the comment about needing an http replica, but frankly, what stops you from running your "service" (which is just a lambda) with a sidecar of the linked artifact cache to provide the http interface?

So I'd like to understand what exactly are the problems you all see with the idea of exposing a grpc interface from your services. I mean direct, practical issues, so that we can start working on resolving those. Maybe a meeting is a good idea?

I'm not sure I understand the comment about needing an http replica, but frankly, what stops you from running your "service" (which is just a lambda) with a sidecar of the linked artifact cache to provide the http interface?

@Joe I am the one who started the question, so I'll jump in :) I am a bit skeptical that everything works transparently via grpc or HTTP, because I have never heard of any service within our mesh running only via grpc. Could you add some examples of services that are already using it, so I can understand what's already there?

The ML use case is a bit different, since they don't use the same mesh module but Istio sidecars. The KServe framework allows exposing a service via grpc, but for various reasons the deployment needs to be duplicated (because only one port can be set for each InferenceService resource, either HTTP or grpc). So technically it should be just a matter of either:

  • Creating a new "proxy" to translate grpc to http within every ML k8s namespace (something that the ML team already tried to work on, and I warned them about the potential technical debt).
  • Accepting the duplication of resources and creating multiple InferenceService resources when needed. Say that you want the outlink model server as a lambda for artifact caching: you'll have two separate deployments, one for HTTP and one for grpc. There is some non-negligible work on the ML side to test all of this, but it will be more future-proof imho (so when KServe allows multiple ports/protocols in the same deployment, it will be just a matter of a configuration update for them).

My main point and doubt though is why the work is pushed to whoever exposes a lambda, versus allowing the artifact caching service to just use either grpc or http. I get the former is probably better for various reasons, but I haven't seen big issues with HTTP so far either. I am genuinely asking to understand what is the main benefit, I don't have a ton of familiarity with grpc so I am probably missing a big one.

[ ... ]
My main point and doubt though is why the work is pushed to whoever exposes a lambda, versus allowing the artifact caching service to just use either grpc or http. I get the former is probably better for various reasons, but I haven't seen big issues with HTTP so far either. I am genuinely asking to understand what is the main benefit, I don't have a ton of familiarity with grpc so I am probably missing a big one.

Have you seen: Why gRPC when so much of what we already do is REST?

Edit: Fixed the missing anchor on the link.

I want to share a small update from our side on where we are.

Our immediate need is enabling the integration of article topic data for the Mobile Apps team, where we agreed to start this integration at the end of Q3 / beginning of Q4. The estimated load would be ~1M requests/day (~12 RPS) + backfilling to the beginning of 2026. This is something our deployment with multiple replicas could handle, but having some sort of cache could help us, especially for latency.
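For reference, the ~12 RPS figure above follows from spreading ~1M requests evenly over a day. A quick sketch of that arithmetic (it gives a daily average only; peak traffic and the backfill would sit above it, and the function name is just for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day: float) -> float:
    """Average requests per second, assuming traffic is spread evenly over a day."""
    return requests_per_day / SECONDS_PER_DAY


rate = average_rps(1_000_000)
print(f"{rate:.2f} requests/second on average")
```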

There are a few issues that came up during integration work with hoarde:

First one is the difficulty of enabling gRPC networking to our cluster. This is what Luca described above and something Tobias is looking into regarding scope of this work.

Second and potentially the main one is not meeting hoarde assumptions about our service being entirely for the purpose of integrating with the cache, which opens a couple of practical integration issues:

  1. We want to integrate with our existing article topic service. This service currently runs a request-response path, but also consumes page change events via changeprop and produces prediction events to Eventgate. We would need to make sure this event-driven path is still preserved.
  2. We would still want to expose an HTTP API for other users. We expose parameters beyond wiki_id/page_id/revision_id, which are being used by other users, so hoarde would mainly cover the access path for the Mobile Apps team, where we need the cache the most.
  3. If we exposed a native gRPC endpoint on the current service, there would be a gRPC contract mismatch. Hoarde expects LambdaService.GetArtifact with its own request/response types. Our inference stack is based on KServe, which has native gRPC support; however, it exposes GRPCInferenceService.ModelInfer with different types (https://kserve.github.io/website/docs/concepts/architecture/data-plane/v2-protocol#grpc-api). It means we would need an adapter between hoarde and article topics anyway, even if both use gRPC. I have prototyped and tested a gRPC/HTTP adapter that bridges the gap between our services, but maintaining those adapters per ML service would be an additional infrastructure cost for us.
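To make the contract mismatch in point 3 a bit more concrete, here is a rough sketch of the shape such an adapter takes. It is not the prototype mentioned above: the two dataclasses are hypothetical stand-ins for Hoarde's GetArtifact request/response types (the real ones would be generated from the protobuf schema), the field names are assumptions, and `infer` stands in for whatever call reaches the model server (HTTP or ModelInfer).

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GetArtifactRequest:
    """Hypothetical stand-in for the request type Hoarde sends to a lambda."""
    wiki_id: str
    page_id: int
    revision_id: int


@dataclass(frozen=True)
class GetArtifactResponse:
    """Hypothetical stand-in for the response type Hoarde expects back."""
    payload: bytes


def make_adapter(
    infer: Callable[[dict], dict],
) -> Callable[[GetArtifactRequest], GetArtifactResponse]:
    """Wrap an inference call so that it can be invoked with Hoarde-style types."""

    def get_artifact(request: GetArtifactRequest) -> GetArtifactResponse:
        # Translate the cache's request into the model server's input shape.
        model_input = {
            "wiki_id": request.wiki_id,
            "page_id": request.page_id,
            "revision_id": request.revision_id,
        }
        prediction = infer(model_input)
        # Serialize the prediction into an opaque artifact for the cache.
        body = json.dumps(prediction, sort_keys=True).encode("utf-8")
        return GetArtifactResponse(payload=body)

    return get_artifact


def fake_infer(model_input: dict) -> dict:
    """Fake inference backend used only for this illustration."""
    return {
        "revision_id": model_input["revision_id"],
        "topics": [{"topic": "Culture", "score": 0.9}],
    }


get_artifact = make_adapter(fake_infer)
response = get_artifact(GetArtifactRequest("enwiki", 1, 42))
print(response.payload.decode("utf-8"))
```

The translation itself is small; the cost described above comes from deploying, monitoring, and maintaining one of these per ML service.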

With the above in mind, the cost/benefit ratio of integrating with hoarde is starting to lean slightly more towards costs for us, especially if we could meet the load requirements purely by scaling our replicas. This is also because our other services we would potentially like to integrate with caching service would be in a similar position.

@BWojtowicz-WMF is this something you could diagram? I'm trying to visualize all of this, and I'm not confident I fully understand.

If you want to expose the HTTP API for users, that would happen via the component the linked artifact service would expose. I don't understand what the issue is here.

@Joe Hoarde's HTTP API only exposes wiki_id/page_id/revision_id parameters, which would cover the use-case for the Mobile Apps team. However, our service also exposes additional parameters (e.g. page_title, threshold) that some users rely on. On top of that, exposing an HTTP API is extremely useful for us for development/debugging.
I think those would not be as problematic if we were building a new service with hoarde in mind from the beginning, however we're trying to integrate caching into existing services.

@Eevans I'll try to work on some nice and simple diagram.

Okay, I've done a few not too technical sketches trying to visualize the issue we're facing.

  1. This is the current state of our service. We support both request/response path for users/developers/apps and event path powered by changeprop.

image.png (682×1 px, 146 KB)

  2. This is what I assume is the ideal design from Hoarde's point of view. Users go only to Hoarde, whereas we power the cache misses.

image.png (758×988 px, 167 KB)

  3. However, for the reasons described here, our current design starts looking like this. Note it's still unclear where the whole event-based path from Picture 1 goes. I understand Hoarde can get triggered by a page change event, but what about publishing events with prediction data - would our service still need to publish those?

image.png (791×984 px, 224 KB)

Okay, I've done a few not too technical sketches trying to visualize the issue we're facing.

These are great, thank you @BWojtowicz-WMF!

[ ... ]

  3. However, for the reasons described here, our current design starts looking like this. Note it's still unclear where the whole event-based path from Picture 1 goes. I understand Hoarde can get triggered by a page change event, but what about publishing events with prediction data - would our service still need to publish those?

image.png (791×984 px, 224 KB)

What are you doing with the threshold argument? Are you late filtering the response from the inference service, or invoking the service with a threshold as the constraint? If the latter, is there any reason you couldn't late filter a cached response (i.e. is the cached response somehow constrained to a limited set of thresholds)?

The filtering currently happens inside the service using threshold as the constraint, before it returns the response to the user. In principle, it could be possible to do late-filtering by having the inference service return all predictions (essentially the behaviour with threshold=0) and doing late-filtering outside of the inference service - this would shift responsibility of filtering to the user.
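As a rough illustration of what late-filtering could look like on the consumer side: the cached artifact holds every prediction (the threshold=0 behaviour), and the threshold is applied after the fact. The field names `topic` and `score`, the 0..1 score range, and the inclusive comparison are assumptions for this sketch, not the actual article topics schema.

```python
from typing import List, TypedDict


class Prediction(TypedDict):
    topic: str
    score: float


def filter_by_threshold(
    predictions: List[Prediction], threshold: float
) -> List[Prediction]:
    """Keep only predictions whose score meets the threshold (inclusive)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [p for p in predictions if p["score"] >= threshold]


# What a cached, unfiltered artifact might contain.
cached: List[Prediction] = [
    {"topic": "Culture.Media", "score": 0.91},
    {"topic": "Geography.Europe", "score": 0.42},
    {"topic": "STEM.Physics", "score": 0.07},
]

print(filter_by_threshold(cached, 0.5))
print(len(filter_by_threshold(cached, 0.0)))
```

Because one unfiltered artifact can answer any threshold, the cache key would not need to include the threshold at all.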

Apologies; My diagrams aren't nearly as pretty as yours:

... Note it's still unclear where the whole event-based path from Picture 1 goes. I understand Hoarde can get triggered by a page change event, but what about publishing events with prediction data - would our service still need to publish those?

Events would be plumbed just as they are now, only the request would go to Hoarde instead of Liftwing. It's a new revision, so we're triggering a cache miss and returning what Liftwing currently does (via the Lambda). In a perfect world, the response from Hoarde (the linked artifact) is all that your clients need, and they can consume that directly. I'm not 100% sure what that would mean for an externally-facing API; I think @Joe has gone on record as saying that these responses should not be served to external clients. In that case (in addition to those where clients need something other than the cached artifact), see the second diagram below.

image.png (773×1 px, 40 KB)
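The cache-miss flow described above is essentially a read-through cache. A minimal in-memory sketch, assuming a (wiki_id, page_id, revision_id) key and a plain callable standing in for the lambda; it is not Hoarde's implementation and leaves out storage, deadlines, and concurrency:

```python
from typing import Callable, Dict, Tuple

# Assumed cache key: (wiki_id, page_id, revision_id).
Key = Tuple[str, int, int]


class ReadThroughCache:
    """Serve artifacts from the store, invoking the lambda only on a miss."""

    def __init__(self, lambda_fn: Callable[[Key], bytes]) -> None:
        self._lambda_fn = lambda_fn
        self._store: Dict[Key, bytes] = {}
        self.misses = 0

    def get(self, key: Key) -> bytes:
        if key in self._store:
            return self._store[key]
        # Cache miss: compute the artifact via the lambda and remember it.
        self.misses += 1
        artifact = self._lambda_fn(key)
        self._store[key] = artifact
        return artifact


def example_lambda(key: Key) -> bytes:
    """Hypothetical lambda; a real one would call the inference service."""
    wiki_id, page_id, revision_id = key
    return f"{wiki_id}:{page_id}:{revision_id}".encode("utf-8")


cache = ReadThroughCache(example_lambda)
first = cache.get(("enwiki", 1, 42))   # new revision: miss, lambda is invoked
second = cache.get(("enwiki", 1, 42))  # same revision: served from the store
print(first == second, cache.misses)
```

A page change event for a new revision maps to the first call here: a key that has never been seen, so the lambda runs once and later readers get the stored artifact.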

For your thresholds example (and/or because you have a public-facing api), you might have a small service that transforms or enriches the artifact in some way. In this case, your lambda would indeed have to return the full set, all thresholds (it seems like that should be the case anyway), and the service could filter results to match the threshold constraint.

image.png (791×1 px, 41 KB)

I get that this might seem like a lot, or look unnecessarily elaborate considering that you already have a service that creates the artifact output and can serve it; we're starting from a position of prior art. However, the linked-artifacts service aims to be a platform, and separating data (the lambda) from presentation (REST et al) makes sense architecturally (and I imagine you probably did something similar within your service codebase). Think of it like you would AWS: those services were built the way they were so that Amazon could provide various capabilities and support them at scale. If you were moving something to AWS, you would expect to have to conform to interfaces that weren't built specifically for what you have. People do that, though, when they expect the long-term benefits to outweigh the near-term effort.