Migrate PageViewInfo calls away from rest-gateway
Open, Stalled, MediumPublic

Description

rest-gateway is meant for centralized routing of external API calls and should not be called by services running inside production.

PageViewInfo calls it through $wgPageViewInfoWikimediaEndpoint in WikimediaPageViewService.php
and maybe in other places.

Would it be possible to change that to calling the page-analytics and device-analytics services via the service mesh?

Event Timeline

Change #1240888 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] service mesh: Add page-analytics listener

https://gerrit.wikimedia.org/r/1240888

Clement_Goubert changed the task status from Open to Stalled.Feb 20 2026, 11:09 AM
Clement_Goubert triaged this task as Medium priority.
Clement_Goubert moved this task from Inbox to In Progress on the ServiceOps new board.

While trying to sort out what I thought would be a simple change in mediawiki-config/wmf-config/CommonSettings.php to switch from using the rest-gateway listener to the page-analytics listener, I found that this extension uses the same defined endpoint to call page-analytics and device-analytics. This would need to be split.

Change #1240888 merged by Clément Goubert:

[operations/puppet@production] service mesh: Add page-analytics listener

https://gerrit.wikimedia.org/r/1240888

@dr0ptp4kt Would you or someone on your team have some time to take this change on? It doesn't have maintainers, but this change is necessary for the enforcement of separation of concerns on API routing and as it's an AQS wrapper I figure you may have the right people and context to help.

Thanks for the ping @Clement_Goubert . Just wanted to acknowledge seeing it, and I'll go check on who is the appropriate maintainer on this one. The extension is marked as Unassigned. I know there's work happening in the vicinity of this, so now might be a good time to get this done.

Any timeline needs for this, by the way? That is, if it's blocking completion of other rate limiting or architecture reconfiguration initiatives, for example.

I'd also like to clarify the priority of this, please. It looks (from the ticket structure) that this is in support of WE5.1.3b, a dry-run of rate limiting for 3 routes in REST Gateway. That hypothesis looks completed though.

What is the impact of the current situation? Is it that PageViewInfo is "polluting" our observability of external API calls and making analysis and enforcement more difficult? Or is this more of a cleanup with no direct consequences to WE5.1?

When rate limits on "internal" traffic (WME/WMCS/etc) start being enforced on the rest-gateway, PageViewInfo will get rate limited as well unless we add an exception just for this extension.

I would much rather it use the proper pattern for MediaWiki API calls within WMF production, which is to call the service through the proper mesh listener, so that:

  • We have an accurate view of internal vs. external requests
  • The separation of concerns between the rest-gateway, responsible for routing and rate limiting external API calls, and service-to-service communication within WMF production remains clear
  • We don't have to maintain a rate limit exception for one extension

Current impact is indeed mental model and observability pollution.

As far as priority and timing go, @JTweed-WMF can you weigh in please?

Looping @HCoplin-WMF , @Mooeypoo , @pmiazga for visibility for any considerations for the content attribution piece as well (connected with @JTweed-WMF , just want to make sure of awareness if not already surfaced - I see @MShilova_WMF subscribed).

Interested to hear any timeline matters from @JTweed-WMF / @HCoplin-WMF , will reserve any determination on who treats / when with product management here (probably not too hard to do, although sometimes these things are a little trickier than at first glance!).

My current understanding is that there will be no impact for the initial March rollout, but this will potentially become an issue once we start enforcing limits in April.

The long story short is that we do not currently differentiate true internal traffic (i.e., service-to-service traffic, which it sounds like this is) from WMCS traffic. That differentiation is blocked by https://phabricator.wikimedia.org/T411503. So here, it would basically mean that once we start enforcing WMCS limits, they would apply to this as well.

While I'm not sure if that would actually be an issue based on call volume off hand (maybe someone here could help me out with that), I do agree that it would pollute our observability at minimum. As @dr0ptp4kt mentioned, we may start driving more traffic through this with the new Attribution work too, which would change the math for the number of requests overall.

So yeah, I would prefer that we fix this sooner rather than later. I've also pinged @BPirkle to see if this is something that MW-Interfaces-Team could help with, but it seems like DPE is probably the closest thing to a 'real' owner of this service?

I don't recall looking at this extension before. To be sure, I think we're talking about:

Also, we're using the word "service" in two different ways on this task:

  • in "PageViewService" it means "service class", as in a PHP class defined within the extension
  • in "service mesh" it means an external piece of software executing independent of MediaWiki (in this case the AQS 2.0 Golang services)

Am I correct about all that?

As @Clement_Goubert says, the codebase calls two different AQS 2.0 services within the getRequestUrl function. Specifically, calls that include metrics/pageviews in the path map to the Page Analytics service, while calls that include metrics/unique-devices in the path map to the Device Analytics service.
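To make the split concrete, here is a minimal Python sketch of that path-based mapping. It is an illustration only, not the extension's PHP code: the function name `backend_for_path` and the returned labels are hypothetical, and only the two path fragments come from the description above.

```python
def backend_for_path(path: str) -> str:
    """Return which AQS 2.0 service a rest_v1-style path maps to.

    Hypothetical helper: the labels returned here are illustrative names,
    not identifiers taken from the extension or from production config.
    """
    if "metrics/pageviews" in path:
        return "page-analytics"
    if "metrics/unique-devices" in path:
        return "device-analytics"
    # Any other path is outside the two cases described in this task.
    raise ValueError(f"no known backend for path: {path}")
```

Because a single base URL currently serves both branches, each branch would need its own base URL before the calls can go to separate mesh listeners.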

For context, both these metrics endpoints were once implemented in AQS 1.0, which was a fork of RESTbase and a single codebase. As part of T263489: AQS 2.0 they were separated into different Golang codebases. For compatibility, they remained callable via the rest_v1 style urls, which has allowed everything to continue working until now.

It is unclear to me how tightly tied to WMF production the PageViewInfo extension is. The extension page on mediawiki.org mentions WMF specifically, but also mentions the possibility of other backends. This matters because if this is intended for use outside WMF, we probably wouldn't want to default to internal service mesh urls. Meaning that the rest_v1 style call probably remains for now as the default.

My guess is that we want to add two new config variables, one for the metrics/pageviews calls and another for the metrics/unique-devices calls, both of which default to the current value of PageViewInfoWikimediaEndpoint (https://wikimedia.org/api/rest_v1). And of course make the necessary code changes to ripple those through to the PageViewService and use them in the appropriate places. Then in WMF config, we'll override those to whatever the correct base URL is to access these via the service mesh.
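A rough sketch of that fallback behaviour, written in Python purely for illustration (the real change would be in the extension's PHP). The two new setting names, `PageViewInfoPageviewsEndpoint` and `PageViewInfoUniqueDevicesEndpoint`, and the helper functions are hypothetical; only `PageViewInfoWikimediaEndpoint` and its default value come from this task.

```python
# Default taken from the task text; everything else below is an assumption.
DEFAULT_ENDPOINT = "https://wikimedia.org/api/rest_v1"


def resolve_endpoints(config: dict) -> dict:
    """Pick a base URL per metric, falling back to the existing single setting."""
    legacy = config.get("PageViewInfoWikimediaEndpoint", DEFAULT_ENDPOINT)
    return {
        # Hypothetical new settings; each defaults to the legacy endpoint.
        "pageviews": config.get("PageViewInfoPageviewsEndpoint", legacy),
        "unique-devices": config.get("PageViewInfoUniqueDevicesEndpoint", legacy),
    }


def build_url(endpoints: dict, metric: str, rest_of_path: str) -> str:
    """Join the metric's base URL with a rest_v1-style metrics path."""
    base = endpoints[metric].rstrip("/")
    return f"{base}/metrics/{metric}/{rest_of_path.lstrip('/')}"
```

With no overrides, both metrics keep using the public rest_v1 URL, so installs outside WMF would be unaffected. WMF config would then set each of the two new values to the matching mesh listener address.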

Does that sound about right, or do I misunderstand?

I'll leave the MediaWiki internals for you to decide, but as far as what we want to achieve, yes, that sounds right. Given pageviews and unique-devices are now two services, the PageViewInfo extension needs to be able to address both individually through two different service mesh listeners.

With the additional context that @BPirkle provided, does this seem like something DPE could bang out, @GGoncalves-WMF & @dr0ptp4kt ?

Like I said, I don't expect this to be an issue within the next week as we ramp up rate limiting, but it is likely to bite us within a month or so. If y'all don't have capacity, we can potentially pull it into MWI as well; this service is just fairly firmly outside of our normal wheelhouse.

Thanks for the context, it really helps! T411771#11698086 sounds straightforward enough. I think this is outside both of our wheelhouses really, but I'm thinking we can have DPE help out and MWI review the changes. I'm discussing this with @Ahoelzl and other DPE leads first though, and hope to update this ticket next week.

DE can take this on. @HCoplin-WMF do you have a specific timeline?

Excellent! Thanks, @Ahoelzl . In terms of timeline, it would be great if we could get this resolved before the next round of rate limiting enforcement, which is currently looking like it will roll out in roughly mid-April. The reason for that is that we neither want to erroneously 429 ourselves (especially with Attribution API adoption of this capability) nor do we want to overly inflate our API traffic numbers.