
Count pageviews for all wikis behind varnish
Closed, Resolved · Public · 5 Story Points

Description

Currently the pageview API only tracks what we call "knowledge" pageviews. We have an outstanding request
to track requests to other systems and wikis (like "outreach").

Event Timeline

Nuria created this task.Mar 17 2016, 7:26 PM
Restricted Application added a subscriber: Aklapper.Mar 17 2016, 7:26 PM

So you're only tracking details about some wikis, not others? You're not going to be able to get information about views on things like wikitech

Nuria renamed this task from Track Pageviews for all wikis/systems to Count requests for all wikis/systems.Mar 17 2016, 7:29 PM

So you're only tracking details about some wikis, not others? You're not going to be able to get information about views on things like wikitech

Clarification: pageview API only serves counts about "knowledge" wikis, defined in pageview definition: https://meta.wikimedia.org/wiki/Research:Page_view

This task is about being able to count requests on other systems. All requests that hit varnish are countable, if that makes sense.

Krenair added a comment.EditedMar 17 2016, 7:43 PM

Okay, so everything going via varnish, not all systems or all wikis.

Krenair renamed this task from Count requests for all wikis/systems to Count requests for all wikis/systems behind varnish.Mar 17 2016, 7:43 PM
Milimetric triaged this task as Normal priority.Mar 21 2016, 4:11 PM
Milimetric moved this task from Incoming to Backlog (Later) on the Analytics board.
Milimetric set the point value for this task to 0.

Would this then also include all of the chapters wikis, outreach etc?

Nuria added a comment.Mar 24 2016, 8:43 PM

Would this then also include all of the chapters wikis, outreach etc?

Yes, it would. We do not have an ETA for this item, though. We were hoping to get to it in the next three months, but we are not sure we can get there.

Sadads added a subscriber: Sadads.May 13 2016, 1:36 PM

@Nuria do we have a sense of when this will happen? There are a fair number of dependencies on having this data available.

Nuria added a comment.EditedMay 13 2016, 3:28 PM

@Sadads: not for at least 3 months. We are focusing on edit data after having worked on pageview data for a while.

As I said before (and I understand this is less convenient), we have data on the cluster for all wikis for the last 60 days at all times, so our work in this regard should not block you. You can get the data (in a less convenient fashion) right now.

Nuria moved this task from Backlog (Later) to Dashiki on the Analytics board.May 23 2016, 4:34 PM
Nuria renamed this task from Count requests for all wikis/systems behind varnish to Count pageviews for all wikis/systems behind varnish.May 30 2016, 4:38 PM
Nuria moved this task from Dashiki to Backlog (Later) on the Analytics board.Jul 4 2016, 5:01 PM

Per our conversation with research (cc @DarTar and @Erik_Zachte) we are going to add "not knowledge wikis" to our pageview pipeline, for two reasons:

  1. The magnitude of pageviews is really small, so they will not affect regular stats. Also, we still have a whitelist mechanism, so wikis that want to be excluded can be.
  2. Excluding these wikis creates more issues than it solves.

Thanks @Nuria, that's good to know: community programs and events could
really use the data coming off these wikis.

\o/ (I don't often spam phabricator, but when I do, I'm really excited) :D

Nuria added a comment.EditedAug 4 2016, 4:54 PM

Let's start with outreachwiki, nl.wikimedia, ru.wikimedia, be.wikimedia, strategy.

  • Refactor the code on pageview definition that restricts counting to certain urls
  • Add wikis to the whitelist.

Monitor propagation to pageview API. To be clear, we cannot count pageviews retroactively, as we do not have past data, but we can count them going forward.
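The two steps above (loosen the URL check, then gate on a whitelist) could look roughly like the sketch below. It is illustrative only and is not the actual pageview definition code: `WHITELIST`, `HOST_RE`, `project_for_host` and `is_countable` are hypothetical names, the host pattern is an assumption, and the whitelist entries are just the five wikis named above spelled as project names.

```python
import re

# Hypothetical whitelist holding the five wikis named in this comment.
# The exact spelling of each entry is an assumption for illustration.
WHITELIST = {
    "outreach.wikimedia",
    "nl.wikimedia",
    "ru.wikimedia",
    "be.wikimedia",
    "strategy.wikimedia",
}

# Hypothetical host pattern: any single label followed by wikimedia.org.
# The real definition also handles other project families and URL paths.
HOST_RE = re.compile(r"^(?P<project>[a-z0-9-]+\.wikimedia)\.org$")


def project_for_host(host):
    """Return the project name for a host, or None if the host does not match."""
    match = HOST_RE.match(host.lower())
    return match.group("project") if match else None


def is_countable(host):
    """A request counts only if its host matches the pattern and is whitelisted."""
    return project_for_host(host) in WHITELIST
```

With this shape, adding another wiki later is a one-line whitelist change and needs no change to the pattern.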

Nuria changed the point value for this task from 0 to 8.Aug 4 2016, 4:59 PM
Sadads added a comment.Sep 8 2016, 9:33 PM

@Nuria Cool! Is there a timeline for this: next couple sprints?

Nuria added a comment.Sep 8 2016, 11:15 PM

@Sadads: ETA is the end of next quarter, that is, end of December.

Sadads added a comment.Sep 9 2016, 6:39 PM

Good to know. That delays one of my projects then, which is fine: it looked
like it might have been too early in next quarter anyway.
Alex

Nuria edited projects, added Analytics-Kanban; removed Analytics.Oct 5 2016, 3:47 PM
Nuria moved this task from Next Up to In Progress on the Analytics-Kanban board.Oct 17 2016, 5:20 PM

Change 316838 had a related patch set uploaded (by Nuria):
Adding oureach wikipedia to Pageview whitelist

https://gerrit.wikimedia.org/r/316838

Change 316845 had a related patch set uploaded (by Nuria):
Enhancing regex to support pageviews to non-knowledge wikis

https://gerrit.wikimedia.org/r/316845

Nuria claimed this task.Oct 20 2016, 7:44 PM
Nuria changed the point value for this task from 8 to 5.

Change 316838 merged by Joal:
Adding several wikis to Pageview whitelist

https://gerrit.wikimedia.org/r/316838

Could se.wikimedia.org (Wikimedia Sverige) also be added?

Change 316845 merged by jenkins-bot:
Enhancing regex to support pageviews to non-knowledge wikis

https://gerrit.wikimedia.org/r/316845

Change 319084 had a related patch set uploaded (by Joal):
Update jar version for webrequest load job

https://gerrit.wikimedia.org/r/319084

Change 319084 merged by Ottomata:
Update jar version for webrequest load job

https://gerrit.wikimedia.org/r/319084

Change 319105 had a related patch set uploaded (by Milimetric):
Include pageviews for all wikis in whitelist

https://gerrit.wikimedia.org/r/319105

Change 319105 merged by Joal:
Include pageviews for all wikis in whitelist

https://gerrit.wikimedia.org/r/319105

Nuria renamed this task from Count pageviews for all wikis/systems behind varnish to Count pageviews for all wikis behind varnish.Nov 1 2016, 6:51 PM
Nuria moved this task from Ready to Deploy to Done on the Analytics-Kanban board.Nov 1 2016, 6:57 PM
Nuria moved this task from Done to Ready to Deploy on the Analytics-Kanban board.
Nuria added a comment.Nov 1 2016, 7:00 PM

Changes are in pageview hourly; waiting for them to appear in the pageview API before closing the ticket. cc @Sadads

Tested this morning through pivot and pageview-api -- Seems very ok :)

Nuria added a comment.Nov 2 2016, 2:57 PM

Here is an example of a pageview API query returning data for outreach: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/outreach.wikimedia/all-access/user/Main_Page/daily/2016103100/2016110200

The pageviews tool displays this data too: https://tools.wmflabs.org/pageviews-test/?project=outreach.wikimedia.org&platform=all-access&agent=user&range=latest-20&pages=Main_Page

Please keep in mind that data for the 1st of November will be partial.
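For anyone scripting against the newly added projects, the API query linked above can be assembled from its parts. This is a minimal sketch that only builds the URL; the helper name `per_article_url` and its parameter names are made up for illustration, and the path layout is inferred from the example link in this comment rather than from the API documentation.

```python
from urllib.parse import quote

# Base path taken from the example link in this comment.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, article, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageviews URL (hypothetical helper).

    start and end are timestamps in the YYYYMMDDHH form used by the example link.
    """
    parts = [
        project,
        access,
        agent,
        quote(article, safe=""),  # percent-encode titles containing "/" or spaces
        granularity,
        start,
        end,
    ]
    return BASE + "/" + "/".join(parts)


url = per_article_url("outreach.wikimedia", "Main_Page", "2016103100", "2016110200")
print(url)
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlopen`, to fetch the JSON response.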

cc @MusikAnimal so he is aware new projects have been added to the API

Nuria moved this task from Ready to Deploy to Done on the Analytics-Kanban board.Nov 2 2016, 2:58 PM

Woohoo! Thank you @Nuria and all who helped make this happen! :)