Add namespace ID to pageview_hourly
Closed, Resolved · Public · 1 Estimate Story Points

Description

Both tables (webrequest and pageview_hourly) already record the page ID for pageviews, but not the namespace ID of the page. While it is possible in principle to determine the namespace from the page ID, doing so is tedious and often impractical. Having the namespace directly available as a field would make it much easier to answer numerous questions about Wikipedia usage.

From T92875 (Add page_id and namespace to X-Analytics header in App / api requests), it appears that this has been under consideration for a while, but only the page ID was actually added.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Feb 2 2017, 5:47 AM
Nuria added a subscriber: Nuria. · Feb 2 2017, 5:08 PM

Should already be there at least for some pages? https://wikitech.wikimedia.org/wiki/X-Analytics

Thought during grooming: this might already be done by Ori but is not working.

This field is already populated in webrequest: x_analytics_map['ns'] (see https://wikitech.wikimedia.org/wiki/X-Analytics).

This task is then about adding the field to pageview_hourly.
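To illustrate what reading x_analytics_map['ns'] involves, here is a minimal Python sketch. It assumes the X-Analytics header is a semicolon-separated list of key=value pairs, as described on the wikitech page linked above. The helper name parse_x_analytics is hypothetical and is not part of the actual webrequest pipeline.

```python
from typing import Dict


def parse_x_analytics(header: str) -> Dict[str, str]:
    """Split a 'key=value;key=value' string into a dict (hypothetical helper).

    Items without an '=' or with an empty key are skipped.
    """
    pairs: Dict[str, str] = {}
    for item in header.split(";"):
        key, sep, value = item.partition("=")
        if sep and key.strip():
            pairs[key.strip()] = value.strip()
    return pairs


# Illustrative header value only, not taken from real traffic.
x_analytics_map = parse_x_analytics("page_id=42;ns=0")
namespace = x_analytics_map.get("ns")  # None when the key is absent
```

Using .get("ns") rather than indexing keeps the lookup safe for requests whose header carries no namespace.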

Finally I ran a query checking for presence over an hour of webrequest:

No page, no namespace: 1.46%
No page, namespace: 5.82%
Page and namespace: 92.72%

Note that none of these data are provided by webApps, and the proportions might differ from hour to hour.
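The presence check can be sketched in Python as follows. This is not the query that was actually run against webrequest. It assumes each request has already been reduced to its X-Analytics key/value map and that the keys are named page_id and ns. The function name presence_breakdown and the sample rows are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Human-readable labels for each (has_page, has_namespace) combination.
LABELS = {
    (False, False): "No page, no namespace",
    (False, True): "No page, namespace",
    (True, False): "Page, no namespace",
    (True, True): "Page and namespace",
}


def presence_breakdown(rows: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Return the percentage of rows in each presence category."""
    counts: Counter = Counter()
    for x_map in rows:
        counts[("page_id" in x_map, "ns" in x_map)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {LABELS[key]: round(100.0 * n / total, 2) for key, n in counts.items()}


# Toy input, not real traffic.
sample = [{}, {"ns": "0"}, {"page_id": "1", "ns": "0"}, {"page_id": "2", "ns": "4"}]
breakdown = presence_breakdown(sample)
```

Running the same tally over several different hours would show how much the proportions vary.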

This task should then be about adding the namespace field to the pageview_hourly table.

JAllemandou renamed this task from Add namespace ID to webrequest and pageview_hourly to Add namespace ID to pageview_hourly. · Feb 2 2017, 5:41 PM

Change 335679 had a related patch set uploaded (by Joal):
Add explicit namespace to webrequest and pageview

https://gerrit.wikimedia.org/r/335679

Tbayer added a comment. · Edited · Feb 2 2017, 5:58 PM

Right, this task was written with pageview_hourly in mind; I added webrequest only as an afterthought without checking thoroughly - I kind of assumed from the fact that T92875 is still open that we wouldn't yet have fully valid namespace data in that table either. Nevertheless it would be nice to have it in webrequest too as a separate, officially supported field ;) (as @JAllemandou's patch appears to have just done)

Nuria edited projects, added Analytics-Kanban; removed Analytics. · Feb 3 2017, 12:34 AM

Correction to prior post: this is working fine for all requests except those that come from the apps, so our code needs to take into account that the data might not always be present.
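One way to account for missing data is to treat the namespace as nullable when aggregating. The Python sketch below illustrates the idea only and is not the code from the merged patch. The names to_int_or_none and hourly_counts, and the page_id and ns keys, are assumptions.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional, Tuple

# (page_id, namespace_id); either part may be None when the request lacks it.
Key = Tuple[Optional[int], Optional[int]]


def to_int_or_none(value: Optional[str]) -> Optional[int]:
    """Convert a raw header value to int, mapping absent or malformed input to None."""
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def hourly_counts(x_maps: Iterable[Mapping[str, str]]) -> "Counter[Key]":
    """Count views per (page_id, namespace_id), keeping rows with missing values."""
    counts: "Counter[Key]" = Counter()
    for x_map in x_maps:
        key = (to_int_or_none(x_map.get("page_id")), to_int_or_none(x_map.get("ns")))
        counts[key] += 1
    return counts
```

Keeping None as a legitimate grouping value means app requests still contribute to the view counts instead of being dropped.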

Change 335679 merged by Nuria:
Add explicit namespace to webrequest and pageview

https://gerrit.wikimedia.org/r/335679

Nuria set the point value for this task to 1. · Feb 8 2017, 5:00 PM
Nuria closed this task as Resolved. · Feb 9 2017, 8:48 PM