
Count pageviews for all wikis behind varnish
Closed, Resolved, Public, 5 Estimated Story Points

Description

Currently the pageview API only tracks what we call "knowledge" pageviews. We have an outstanding request
to also track requests to other systems and wikis (like "outreach").

Event Timeline

So you're only tracking details about some wikis, not others? You're not going to be able to get information about views on things like wikitech

Nuria renamed this task from Track Pageviews for all wikis/systems to Count requests for all wikis/systems. Mar 17 2016, 7:29 PM


Clarification: pageview API only serves counts about "knowledge" wikis, defined in pageview definition: https://meta.wikimedia.org/wiki/Research:Page_view

This task is about being able to count requests to other systems. All requests that hit Varnish are countable, if that makes sense.

Okay, so everything going via varnish, not all systems or all wikis.

Krenair renamed this task from Count requests for all wikis/systems to Count requests for all wikis/systems behind varnish. Mar 17 2016, 7:43 PM
Milimetric triaged this task as Medium priority. Mar 21 2016, 4:11 PM
Milimetric moved this task from Incoming to Backlog (Later) on the Analytics board.
Milimetric set the point value for this task to 0.

Would this then also include all of the chapters wikis, outreach etc?


Yes, it would. We do not have an ETA for this item, though. We were hoping to get to it in the next three months, but we are not sure we can get there.

@Nuria do we have a sense of when this will happen? There are a fair number of dependencies on having this data available.

@Sadads: not for at least 3 months; we are focusing on edit data after having worked on pageview data for a while.

As I said before (and I understand this is less convenient), we have data on the cluster for all wikis for the last 60 days at all times, so our work in this regard should not block you: you can get the data (in a less convenient fashion) right now.

Nuria renamed this task from Count requests for all wikis/systems behind varnish to Count pageviews for all wikis/systems behind varnish. May 30 2016, 4:38 PM

Per our conversation with research (cc @DarTar and @Erik_Zachte), we are going to add "non-knowledge wikis" to our pageview pipeline, for two reasons:

  1. The magnitude of these pageviews is really small, so they will not affect regular stats. Also, we still have a whitelist mechanism, so wikis that want to be excluded can be.
  2. Excluding these wikis creates more issues than it solves.
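To illustrate the whitelist mechanism mentioned in point 1, here is a minimal Python sketch, assuming pageview counts have already been aggregated per project hostname. The names `PAGEVIEW_WHITELIST` and `filter_by_whitelist` are hypothetical and this is not the actual refinery implementation; the idea shown is only that a project is counted when it appears in the whitelist, so a wiki can be excluded by leaving it out.

```python
from typing import Dict, Set

# Hypothetical whitelist: project hostnames whose pageviews are counted.
# A wiki that wants to be excluded is simply left out of this set.
PAGEVIEW_WHITELIST: Set[str] = {
    "en.wikipedia.org",
    "outreach.wikimedia.org",
    "nl.wikimedia.org",
}


def filter_by_whitelist(counts: Dict[str, int], whitelist: Set[str]) -> Dict[str, int]:
    """Keep only the per-project counts whose hostname is whitelisted."""
    return {host: views for host, views in counts.items() if host in whitelist}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    hourly_counts = {
        "en.wikipedia.org": 1000,
        "outreach.wikimedia.org": 12,
        "private.example.org": 5,
    }
    # Prints the two whitelisted entries; private.example.org is dropped.
    print(filter_by_whitelist(hourly_counts, PAGEVIEW_WHITELIST))
```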

Thanks @Nuria, that's good to know: community programs and events could really use the data coming off these wikis.

\o/ (I don't often spam phabricator, but when I do, I'm really excited) :D

Let's start with outreachwiki, nl.wikimedia, ru.wikimedia, be.wikimedia, strategy.

  • Refactor the code on pageview definition that restricts counting to certain urls
  • Add wikis to the whitelist.

  • Monitor propagation to the pageview API.

To be clear, we cannot count pageviews retroactively, as we do not have past data, but we can count them going forward.
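The first step in the plan (relaxing the pageview definition's restriction to certain URLs) can be sketched as a hostname check. This is a minimal Python sketch under stated assumptions: both patterns are illustrative only and are not the regexes actually used in the pageview definition. `KNOWLEDGE_HOST_RE` stands in for the original restriction to "knowledge" projects, `WIKIMEDIA_HOST_RE` is a hypothetical broader pattern that also admits hosts such as outreach.wikimedia.org, and `WHITELIST` is a hypothetical set that decides which matching wikis are actually counted.

```python
import re
from typing import Set

# Illustrative only: restricted to a few "knowledge" project families.
KNOWLEDGE_HOST_RE = re.compile(
    r"^[a-z0-9-]+\.(?:m\.)?(?:wikipedia|wiktionary|wikibooks|wikisource)\.org$"
)

# Illustrative only: broader pattern that also accepts *.wikimedia.org
# hosts such as outreach.wikimedia.org or nl.wikimedia.org.
WIKIMEDIA_HOST_RE = re.compile(
    r"^[a-z0-9-]+\.(?:m\.)?(?:wikipedia|wiktionary|wikibooks|wikisource|wikimedia)\.org$"
)

# Hypothetical whitelist of hosts whose pageviews should be counted.
WHITELIST: Set[str] = {"en.wikipedia.org", "outreach.wikimedia.org", "nl.wikimedia.org"}


def is_countable(host: str, pattern: "re.Pattern[str]", whitelist: Set[str]) -> bool:
    """A host is counted only if it matches the pattern AND is whitelisted."""
    return bool(pattern.match(host)) and host in whitelist


if __name__ == "__main__":
    print(is_countable("outreach.wikimedia.org", KNOWLEDGE_HOST_RE, WHITELIST))  # False
    print(is_countable("outreach.wikimedia.org", WIKIMEDIA_HOST_RE, WHITELIST))  # True
```

Keeping the pattern broad and letting the whitelist make the final decision means new wikis can be added by editing data rather than code.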

Nuria changed the point value for this task from 0 to 8. Aug 4 2016, 4:59 PM

@Nuria Cool! Is there a timeline for this: the next couple of sprints?

@Sadads: ETA is the end of next quarter, that is, end of December.

Good to know; that delays one of my projects, which is fine: it looked like it might have been too early in the next quarter anyway. Alex

Change 316838 had a related patch set uploaded (by Nuria):
Adding outreach wikipedia to Pageview whitelist

https://gerrit.wikimedia.org/r/316838

Change 316845 had a related patch set uploaded (by Nuria):
Enhancing regex to support pageviews to non-knowledge wikis

https://gerrit.wikimedia.org/r/316845

Nuria changed the point value for this task from 8 to 5.

Change 316838 merged by Joal:
Adding several wikis to Pageview whitelist

https://gerrit.wikimedia.org/r/316838

Could se.wikimedia.org (Wikimedia Sverige) also be added?

Change 316845 merged by jenkins-bot:
Enhancing regex to support pageviews to non-knowledge wikis

https://gerrit.wikimedia.org/r/316845

Change 319084 had a related patch set uploaded (by Joal):
Update jar version for webrequest load job

https://gerrit.wikimedia.org/r/319084

Change 319084 merged by Ottomata:
Update jar version for webrequest load job

https://gerrit.wikimedia.org/r/319084

Change 319105 had a related patch set uploaded (by Milimetric):
Include pageviews for all wikis in whitelist

https://gerrit.wikimedia.org/r/319105

Change 319105 merged by Joal:
Include pageviews for all wikis in whitelist

https://gerrit.wikimedia.org/r/319105

Nuria renamed this task from Count pageviews for all wikis/systems behind varnish to Count pageviews for all wikis behind varnish. Nov 1 2016, 6:51 PM

Changes are in pageview hourly; we are waiting for them to appear in the pageview API before closing the ticket. cc @Sadads

Tested this morning through pivot and pageview-api -- Seems very ok :)

Here is an example query returning outreach data from the pageview API: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/outreach.wikimedia/all-access/user/Main_Page/daily/2016103100/2016110200
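The per-article query above follows the pattern `per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}`. A minimal Python sketch that assembles such a URL is below; the helper name `per_article_url` and its default arguments are hypothetical, and only the URL shape is taken from the example query. It builds the string without making any network request.

```python
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project: str, article: str, start: str, end: str,
                    access: str = "all-access", agent: str = "user",
                    granularity: str = "daily") -> str:
    """Build a pageview API per-article URL (hypothetical helper).

    start/end use the YYYYMMDDHH form seen in the example query.
    The article title is percent-encoded; spaces should already be underscores.
    """
    title = quote(article, safe="")
    return "/".join([BASE, project, access, agent, title, granularity, start, end])


if __name__ == "__main__":
    # Reproduces the outreach example query from this ticket.
    print(per_article_url("outreach.wikimedia", "Main_Page", "2016103100", "2016110200"))
```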

The Pageviews tool also displays this data: https://tools.wmflabs.org/pageviews-test/?project=outreach.wikimedia.org&platform=all-access&agent=user&range=latest-20&pages=Main_Page

Please keep in mind that data for November 1st will be partial.

cc @MusikAnimal so he is aware new projects have been added to the API

Woohoo! Thank you @Nuria and all who helped make this happen! :)