
Some pageviews data are missing for Oct 21, 2021
Closed, Resolved (Public)

Description

https://lists.wikimedia.org/hyperkitty/list/analytics@lists.wikimedia.org/thread/S3D72D4WYBFNGD3OHAV37MCN5MNESB5U/ says some pageviews data are missing for Oct 21, 2021. I quickly checked, and it looks like they're right: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/zh.wikipedia/all-access/all-agents/Cat/daily/2021102000/2021102200 does not return any data for Oct 21, 2021.
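
For anyone who wants to repeat this check on other pages or date ranges, here is a minimal sketch that looks for gaps in a daily per-article response. It assumes the response body is a JSON object whose `items` list has one entry per day, each with a `timestamp` string in `YYYYMMDDHH` form; the helper name `missing_days` and the sample items are hypothetical and only mirror the gap described above.

```python
from datetime import date, timedelta


def missing_days(items, start, end):
    """Return the days in [start, end] that have no entry in `items`.

    Assumption: each item carries a "timestamp" string such as
    "2021102100" (YYYYMMDDHH), as in the daily per-article response.
    """
    seen = {item["timestamp"][:8] for item in items}
    gaps = []
    current = start
    while current <= end:
        if current.strftime("%Y%m%d") not in seen:
            gaps.append(current)
        current += timedelta(days=1)
    return gaps


# Hypothetical stand-in for the parsed "items" list: entries exist for
# Oct 20 and Oct 22 but not for Oct 21 (view counts omitted on purpose).
sample_items = [
    {"timestamp": "2021102000"},
    {"timestamp": "2021102200"},
]

print(missing_days(sample_items, date(2021, 10, 20), date(2021, 10, 22)))
# prints [datetime.date(2021, 10, 21)]
```

The function takes already-parsed items rather than fetching the URL itself, so it can be run against a saved response without network access.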

Page "Cat" at zh.wikipedia was, however, visited six times on that day, according to multiple other data sources:

[urbanecm@stat1005 ~/tmp/zhwiki-pageviews-issue]$ grep '^zh.wikipedia ' pageviews-20211021-spider > pageviews-20211021-spider-zhwiki
[urbanecm@stat1005 ~/tmp/zhwiki-pageviews-issue]$ grep '^zh.wikipedia' pageviews-20211021-automated > pageviews-20211021-automated-zhwiki
[urbanecm@stat1005 ~/tmp/zhwiki-pageviews-issue]$ grep '^zh.wikipedia ' pageviews-20211021-user > pageviews-20211021-user-zhwiki
[urbanecm@stat1005 ~/tmp/zhwiki-pageviews-issue]$ grep ' Cat ' pageviews-20211021-spider-zhwiki
zh.wikipedia Cat 7535498 desktop 1 R1
[urbanecm@stat1005 ~/tmp/zhwiki-pageviews-issue]$ grep ' Cat ' pageviews-20211021-user-zhwiki
zh.wikipedia Cat 7535498 desktop 2 A1G1
zh.wikipedia Cat 7535498 mobile-web 3 K1L2
[urbanecm@stat1005 ~/tmp/zhwiki-pageviews-issue]$ grep ' Cat ' pageviews-20211021-automated-zhwiki
[urbanecm@stat1005 ~/tmp/zhwiki-pageviews-issue]$ # 2 user visits for desktop, 3 for mobile-web, 1 spider desktop visit => 6 visits in total
[urbanecm@stat1005 ~/tmp/zhwiki-pageviews-issue]$ hive --database=wmf
[...]
hive (wmf)> select sum(view_count) from pageview_hourly where year=2021 and month=10 and day=21 and project='zh.wikipedia' and page_title='Cat' limit 1;
Stage-Stage-1: Map: 68  Reduce: 1   Cumulative CPU: 2069.72 sec   HDFS Read: 6321078164 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 34 minutes 29 seconds 720 msec
OK
_c0
6
Time taken: 38.457 seconds, Fetched: 1 row(s)
hive (wmf)>
[urbanecm@stat1005 ~/tmp/zhwiki-pageviews-issue]$
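
The manual tally above (2 + 3 + 1 = 6) can also be sketched in a few lines of Python. This assumes each dump line has six space-separated fields, as in the grep output: project, page title, page ID, access method, daily view count, and an hourly breakdown string. The function name `daily_views` is hypothetical; the sample lines are copied from the grep output above.

```python
def daily_views(lines, project, title):
    """Sum the daily view counts for one page across dump lines.

    Assumption: each line has six space-separated fields, with the
    daily total in the fifth field (index 4).
    """
    total = 0
    for line in lines:
        fields = line.strip().split(" ")
        if len(fields) != 6:
            continue  # skip lines that do not match the assumed layout
        if fields[0] == project and fields[1] == title:
            total += int(fields[4])
    return total


# Lines taken from the spider and user dumps shown above
# (the automated dump had no matching line).
dump_lines = [
    "zh.wikipedia Cat 7535498 desktop 1 R1",
    "zh.wikipedia Cat 7535498 desktop 2 A1G1",
    "zh.wikipedia Cat 7535498 mobile-web 3 K1L2",
]

print(daily_views(dump_lines, "zh.wikipedia", "Cat"))  # prints 6
```

Matching on exact field values instead of a substring such as ' Cat ' avoids counting titles that merely contain the word.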

Maybe the job that loads data into the API (I think the API uses Cassandra internally?) failed for some reason?

Event Timeline

Shizhao reopened this task as Open.
Shizhao claimed this task.
Shizhao removed Shizhao as the assignee of this task.
Shizhao added a project: Chinese-Sites.
Shizhao subscribed.
odimitrijevic moved this task from Incoming to Ops Week on the Analytics board.

Hi, I reran this job and I believe it succeeded; Cat is showing pageviews now. Does this look right to you? If so, can we resolve this task?