Add page_id and namespace to X-Analytics header in App / api requests
Open, NormalPublic

Description

Currently, Mediawiki has the ability to add items to the X-Analytics header via a nice little extension[1] written by Ori. This is deployed in production and is working just fine.

However, it is not activated for requests that come from mobile apps. In order to reliably count page views, we need to count based on page_id, and not a page title extracted from the request uri path. To include mobile apps page views in these counts, mobile apps requests will need to have these fields added to their X-Analytics headers.

I don't know much about how mobile apps make requests. AFAIK, this could be done in the apps themselves, or at the Mediawiki API level. Since Mediawiki already supports this, perhaps it is better to do so there.

Thoughts? Anyone know how we can actually do this?

Thanks!

[1] https://github.com/wikimedia/mediawiki-extensions-XAnalytics

Ottomata created this task.Mar 16 2015, 6:27 PM
Ottomata updated the task description. (Show Details)
Ottomata raised the priority of this task from to Normal.
Ottomata added subscribers: Ottomata, dr0ptp4kt, yuvipanda and 2 others.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMar 16 2015, 6:27 PM
Ottomata updated the task description. (Show Details)Mar 16 2015, 6:28 PM
Ottomata set Security to None.

I prefer header enrichment be done at the MediaWiki API tier if possible. The parameter name-value pairs in the query string of course dictate *whether* something qualifies as a potential pageview, and then the MediaWiki API tier could denote which, for example, actual page & namespace is being accessed (or whatever else is performant to add). Avoidance of additional cache fragmentation or cache pollution is necessarily in important facet of any MediaWiki API + VCL solution.

+1 for doing it at the mediawiki API level. I think that would necessitate a patch to MobileFrontend, which houses the action=mobileview that the apps use.

Anomie added a subscriber: Anomie.Mar 16 2015, 7:35 PM

Hmm, no XAnalytics ? As far as I understand this, that's where this really belongs.

The XAnalytics extension should probably be using the APIAfterExecute hook to do the same thing it does using the BeforePageDisplay hook for the web UI.

Seems like it needs to be done in the extension, rather than in the app.

Hm? The extension is meant to be used by MW code. We wouldn't code usages of the extension in the extension itself.

I suppose the APIAfterExecute hook should do this:

https://phabricator.wikimedia.org/rEWMV017f9d845c45697c1e84b50d8a16e65f60b68fee

?

Anomie added a comment.Apr 6 2015, 6:21 PM

No. It would more likely do like rEXAN64e43ad61363ff69d2bad173a6693ff311876ec0#C1546702NL26, and be in the XAnalytics extension rather than WikimediaEvents.

It could almost expose the same XAnalyticsSetHeader hook to stuff like WikimediaEvents, except for the part where that hook is being passed an OutputPage which makes little sense in an API context.

BTW, MediaWiki core has nothing to do with this "X-Analytics" header, it's all done by extensions MobileFrontend and XAnalytics. Possibly that should be merged into core somehow, but it isn't yet.

Ottomata added a comment.EditedApr 6 2015, 7:20 PM

I'm not sure I understand. Why would we make a generic extension actually do anything? Isn't the point of having a generic extension so that it can be used other callers? The commit you link to is the extension creation. The commit I link to uses it.

Anomie added a comment.Apr 6 2015, 7:25 PM

Your problem is that the generic extension is not doing its thing for API queries, because it's only doing its thing in response to the BeforePageDisplay hook which isn't fired during (most) API queries.

Change 202800 had a related patch set uploaded (by Legoktm):
Add X-Analytics header for API requests too

https://gerrit.wikimedia.org/r/202800

Change 202802 had a related patch set uploaded (by Legoktm):
Add page_id to X-Analytics header for action=mobileview requests

https://gerrit.wikimedia.org/r/202802

Change 202801 had a related patch set uploaded (by Ottomata):
Allow extensions to add items at runtime with XAnalytics::addItem()

https://gerrit.wikimedia.org/r/202801

Change 202799 had a related patch set uploaded (by Ottomata):
Move BeforePageDisplay hook to separate class

https://gerrit.wikimedia.org/r/202799

Change 202802 merged by jenkins-bot:
Add page_id and ns to X-Analytics header for action=mobileview requests

https://gerrit.wikimedia.org/r/202802

Change 202799 merged by jenkins-bot:
Move BeforePageDisplay hook to separate class

https://gerrit.wikimedia.org/r/202799

Change 202800 merged by jenkins-bot:
Add X-Analytics header for API requests too

https://gerrit.wikimedia.org/r/202800

I prefer header enrichment be done at the MediaWiki API tier if possible. The parameter name-value pairs in the query string of course dictate *whether* something qualifies as a potential pageview, and then the MediaWiki API tier could denote which, for example, actual page & namespace is being accessed (or whatever else is performant to add).

It doesn't look like this was done. Instead it passes an OutputPage (from $module->getOutput(); as @Anomie said, is this even meaningful for an API module?) to the same hook.

The WikimediaEvents listener is now called by that hook for API requests, despite saying "but only if the request is for an actual page".

On a typical title-specific API request with api.php, e.g.http://127.0.0.1:8080/w/api.php?action=query&prop=revisions&titles=Main%20Page , the title is "Badtitle/dummy title for API calls set in api.php", and this is skipped because that page doesn't exist.

The MobileFrontend extension does not override this title either AFAICT (no relevant setTitle call).

I'll add a NULL check, but I'm not sure if this is useful for API requests at all in its current form.

Change 206352 had a related patch set uploaded (by Mattflaschen):
The title can be null for internal API requests.

https://gerrit.wikimedia.org/r/206352

Change 206352 merged by jenkins-bot:
The title can be null for internal API requests.

https://gerrit.wikimedia.org/r/206352

Change 206959 had a related patch set uploaded (by Mattflaschen):
The title can be null for internal API requests.

https://gerrit.wikimedia.org/r/206959

Change 206959 merged by jenkins-bot:
The title can be null for internal API requests.

https://gerrit.wikimedia.org/r/206959

Status update? Its hard for me to follow all of the patches.

No one's responded to https://phabricator.wikimedia.org/T92875#1233108 .

In summary, I don't think it should be using the output title for API requests. Instead, it should log a meaningful title (or titles) related to the API request.

Change 202801 merged by jenkins-bot:
Allow extensions to add items at runtime with XAnalytics::addItem()

https://gerrit.wikimedia.org/r/202801

Ok, thanks Matt.

So, does that mean that the API code needs to figure out which title or titles are being requested and add them to X-Analytics via a different path? There must be a way to get the requested titles, if not in a nice way, at least from the query parameters.

Yeah, take a look at https://www.mediawiki.org/wiki/API:Query#Specifying_pages . Not all API modules relevant to pages subclass API:Query, but a lot do.

In the code, see ApiPageSet.php, ApiQuery.php, and ApiQueryBase.php.

kevinator moved this task from Incoming to Radar on the Analytics-Backlog board.
Milimetric moved this task from Incoming to Radar on the Analytics board.Jan 12 2016, 7:32 PM
Krinkle added a subscriber: Krinkle.Jan 4 2017, 4:45 AM

Change 202801 merged by jenkins-bot:
Allow extensions to add items at runtime with XAnalytics::addItem()

https://gerrit.wikimedia.org/r/202801

It seems addItem() doesn't work. self::$items is not used.

Change 338931 had a related patch set uploaded (by Krinkle):
Remove unused $items member

https://gerrit.wikimedia.org/r/338931

Change 338931 merged by jenkins-bot:
Remove unused $items member

https://gerrit.wikimedia.org/r/338931