Homepage: allow including yesterday's data when querying pageviews with WikimediaPageViewService
Closed, Resolved · Public

Description

WikimediaPageViewService won't query pageview data for today or yesterday. Excluding today makes sense, since the API doesn't seem to return any data for it, but yesterday's data is available and interesting even if it can be wrong due to processing lag. The cutoff is currently hardcoded as $this->lastCompleteDay = strtotime( '0:0 2 days ago' );
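To illustrate the cutoff described above, here is a minimal Python sketch of the date arithmetic (the extension itself is PHP). The function name `last_complete_day` and the `include_yesterday` flag are hypothetical, chosen only for this example, and the sketch assumes the server clock is UTC; it is not the extension's actual code or configuration.

```python
from datetime import datetime, timedelta, timezone


def last_complete_day(now, include_yesterday=False):
    """Return midnight (UTC) of the most recent day that would be queried.

    Mirrors the idea behind strtotime('0:0 2 days ago'): start from
    midnight today and step back a fixed number of days.
    `include_yesterday` is a hypothetical switch for this sketch.
    """
    days_back = 1 if include_yesterday else 2
    midnight_today = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return midnight_today - timedelta(days=days_back)


now = datetime(2019, 2, 27, 20, 27, tzinfo=timezone.utc)
print(last_complete_day(now).date())                          # 2019-02-25
print(last_complete_day(now, include_yesterday=True).date())  # 2019-02-26
```

With the current behaviour the most recent day queried is two days back; with the hypothetical flag enabled it moves forward by one day.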

The Growth team's newcomer homepage impact module (T216217) is using this service to show pageviews of recent contributions and we would like to have the option to include yesterday's data in order to show data as early as possible even if it is not always right.

We're flexible about how exactly this is configured and can work on the implementation with guidance from whoever is responsible for the PageViewInfo extension.

Event Timeline

SBisson created this task. Feb 27 2019, 8:27 PM
Restricted Application added a subscriber: Aklapper. Feb 27 2019, 8:27 PM
MMiller_WMF renamed this task from Allow including yesterday's data when querying pageviews with WikimediaPageViewService to Homepage: allow including yesterday's data when querying pageviews with WikimediaPageViewService. Feb 28 2019, 12:29 AM
MMiller_WMF added subscribers: kostajh, Cntlsn, kaldari and 4 others.

I'm surprised to see this task because in my own testing, I have pageview numbers start to show up in the impact module the day after my edits (not two days after). But if this is true, we should work on it so that the impact module is engaging one day sooner.

SBisson added a subscriber: Tgr. Jun 5 2019, 2:16 PM

PageViewInfo doesn't have official maintainers so pinging top contributors @Legoktm and @Tgr.

WikimediaPageViewService is explicitly excluding yesterday's data for reasons documented here.

Based on my understanding of the API's gotchas and the fact that the PageViews tool is showing yesterday's data with the expectation that it may sometimes be empty, I propose we change WikimediaPageViewService to include it. Assuming we don't change anything in CachedPageViewService, which caches the data for one day, the direct consequence is that we would have access to yesterday's data in most cases (when there is no processing lag), rather than never, as is the case now.

Please let me know what you think.

Tgr added a comment. Jun 5 2019, 2:25 PM

Sounds like a product owner decision more than a maintainer one, except no one owns it. I certainly don't see any technical problems with the change.
I'm not aware of anyone using the API, and I doubt the graphs in action=pageinfo get viewed much by editors, so as far as I'm concerned feel free to change it.

Change 514513 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[mediawiki/extensions/PageViewInfo@master] Include yesterday's data

https://gerrit.wikimedia.org/r/514513

Change 514513 merged by jenkins-bot:
[mediawiki/extensions/PageViewInfo@master] Include yesterday's data

https://gerrit.wikimedia.org/r/514513

Etonkovidova closed this task as Resolved. Jun 7 2019, 7:26 AM

Verified the fix in en betalabs.

> I'm surprised to see this task because in my own testing, I have pageview numbers start to show up in the impact module the day after my edits (not two days after). But if this is true, we should work on it so that the impact module is engaging one day sooner.

Since pageview data uses UTC time, "yesterday" in the PDT timezone starts 7 hours later than it does in UTC, which might give the impression that stats were always coming from yesterday.
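The timezone effect can be checked with a small worked example. This is a sketch under stated assumptions: PDT is modelled as a fixed UTC-7 offset, the helper names `utc_cutoff_day` and `local_yesterday` are hypothetical, and the dates are arbitrary. It compares the UTC "two days ago" cutoff with what a viewer in PDT would call "yesterday".

```python
from datetime import datetime, timedelta, timezone

# Assumption: PDT treated as a fixed UTC-7 offset for this illustration.
PDT = timezone(timedelta(hours=-7), "PDT")


def utc_cutoff_day(now_local, days_back=2):
    """Calendar day (UTC) that lies `days_back` days before the current UTC day."""
    now_utc = now_local.astimezone(timezone.utc)
    return (now_utc - timedelta(days=days_back)).date()


def local_yesterday(now_local):
    """Calendar day the viewer would call 'yesterday' in their own timezone."""
    return (now_local - timedelta(days=1)).date()


# 18:00 PDT on June 7 is already 01:00 UTC on June 8.
evening = datetime(2019, 6, 7, 18, 0, tzinfo=PDT)
# 09:00 PDT on June 7 is 16:00 UTC on June 7.
morning = datetime(2019, 6, 7, 9, 0, tzinfo=PDT)

print(utc_cutoff_day(evening), local_yesterday(evening))  # 2019-06-06 2019-06-06
print(utc_cutoff_day(morning), local_yesterday(morning))  # 2019-06-05 2019-06-06
```

In the last 7 hours of the PDT day, the UTC date has already rolled over, so "two days ago" in UTC carries the same calendar date as the viewer's local "yesterday". Earlier in the local day the two differ by one day.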