see https://wikimedia.org/api/rest_v1/metrics/pageviews/top/zh.wikipedia/all-access/2017/08/02
and
https://zh.wikipedia.org/api/rest_v1/feed/featured/2017/08/02
In the metrics endpoint, the top two articles are "Wiki" and "維基媒體基金會", but in the feed endpoint they are not.
Because they are different APIs showing different content :)
Please re-open if you need more info (or just follow up in the #wikimedia-analytics IRC channel)
AFAIK, MCS manipulates the data received from the Pageviews API, so perhaps that's why there is a difference? Could MCS people take a look?
MCS does some filtering.
I should also mention that MCS responds with the previous day's data. So you'd want to compare https://zh.wikipedia.org/api/rest_v1/feed/featured/2017/08/02 with https://wikimedia.org/api/rest_v1/metrics/pageviews/top/zh.wikipedia/all-access/2017/08/01. That is because the list is part of the feed data calculated on 2017/08/02. At that time (shortly after 00:00 UTC) it only knows what the top pages of the previous days were. It cannot predict the current day's top viewed articles.
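The one-day offset described above can be sketched as a small helper that builds the two URLs to compare. This is only a sketch: the function name `urls_to_compare` is hypothetical, and the offset comes from the explanation in this comment, not from the MCS source code.

```python
from datetime import date, timedelta

FEED_URL = "https://{project}.org/api/rest_v1/feed/featured/{d:%Y/%m/%d}"
TOP_URL = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/top/"
           "{project}/all-access/{d:%Y/%m/%d}")


def urls_to_compare(feed_day: date, project: str = "zh.wikipedia"):
    """Return (feed URL, pageviews URL) that should describe the same day of traffic."""
    # Assumption taken from the comment above: the feed for day D reports day D-1.
    views_day = feed_day - timedelta(days=1)
    return (FEED_URL.format(project=project, d=feed_day),
            TOP_URL.format(project=project, d=views_day))


feed, top = urls_to_compare(date(2017, 8, 2))
print(feed)
print(top)
```

For the feed of 2017/08/02 this yields the pageviews URL for 2017/08/01, which is the pair named in this comment.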
I think @bearND explained it pretty well: data from the pageview API is subject to fluctuations due to bot traffic (not self-reported as such and thus labelled as "user"), and it also includes Main_Page, which is not an interesting entry. The Android app massages that data and creates a more interesting feed.
"Wiki" and "維基媒體基金會" are not the main page.
- It removes pages that are not in the main-namespace. (We mainly want to get rid of Special pages. One could argue that Category pages might be OK. That might be up for discussion.)
"Wiki" and "維基媒體基金會" are in the main namespace.
- It removes pages where we think the traffic is bot-inflated.
The data is user-only (no bots).
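The two filtering rules quoted above can be illustrated with a short sketch. Everything here is an assumption made for illustration: the prefix list, the `filter_top_pages` name, and the set of titles flagged as bot-inflated are hypothetical and are not MCS's actual logic.

```python
# Hypothetical prefixes; a real implementation would use the wiki's namespace data.
NON_MAIN_PREFIXES = ("Special:", "Category:", "Wikipedia:", "File:")


def filter_top_pages(pages, suspected_bot_titles):
    """Keep main-namespace pages that are not flagged as bot-inflated."""
    kept = []
    for page in pages:
        title = page["article"]
        if title.startswith(NON_MAIN_PREFIXES):
            continue  # rule 1: not in the main namespace
        if title in suspected_bot_titles:
            continue  # rule 2: traffic believed to be bot-inflated
        kept.append(page)
    return kept


top_pages = [
    {"article": "Wiki", "views": 66134},
    {"article": "Special:Search", "views": 50000},  # made-up row for illustration
    {"article": "楚乔传", "views": 53127},
]
# Which titles get flagged is an assumption here, not MCS's actual decision.
result = filter_top_pages(top_pages, suspected_bot_titles={"Wiki"})
print([p["article"] for p in result])
```

Under these assumptions a main-namespace title such as "Wiki" can still be dropped by the second rule, which is the case being discussed.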
https://zh.wikipedia.org/api/rest_v1/feed/featured/2017/08/02:
No1. "楚乔传": 2017-08-01Z views 53127
No2. "賭城群英會": 2017-08-01Z views 22306
......
https://wikimedia.org/api/rest_v1/metrics/pageviews/top/zh.wikipedia/all-access/2017/08/01
No1. "Wiki": 66134
No2. "維基媒體基金會": 61413
No3. "楚乔传": 53127
No4. "賭城群英會": 22306
......
But why does "https://zh.wikipedia.org/api/rest_v1/feed/featured/2017/08/02" contain no data for "Wiki" and "維基媒體基金會"?
and see
https://zh.wikipedia.org/api/rest_v1/feed/featured/2017/08/03
https://wikimedia.org/api/rest_v1/metrics/pageviews/top/zh.wikipedia/all-access/2017/08/02
https://wikimedia.org/api/rest_v1/metrics/pageviews/top/zh.wikipedia/all-access/2017/08/03
The same problem appears there.
"Wiki" is user-only data (no bots), and its view count for 20170801 is also 66134.
"維基媒體基金會" also has 61413 views for 20170801.
@Liuxinyu970226 The view counts for the same article on the same day are identical, but one of the sources lacks the data for "Wiki" and "維基媒體基金會".
If the answers above don't help, then I'm not sure what your aim is. There is a tool by @MusikAnimal called Tool-Pageviews, whose FAQ says:
Why can't I view data for today's date?
The Wikimedia pageviews API generally takes a full 24 hours to populate, sometimes longer. In some situations you may see data missing for yesterday's date as well, which will be left blank rather than showing a count of zero views.
Since you have re-opened this a second time, should I take it that you want this data to be populated dynamically?
@Liuxinyu970226 I mean that both sources give data for the same day, but one lacks the data for "Wiki" and "維基媒體基金會".
https://zh.wikipedia.org/api/rest_v1/feed/featured/2017/08/02 contains 20170801 data (without "Wiki" and "維基媒體基金會"); "楚乔传" has 53127 views.
https://wikimedia.org/api/rest_v1/metrics/pageviews/top/zh.wikipedia/all-access/2017/08/01 also contains 20170801 data; "楚乔传" also has 53127 views.
The data is for the same day even though the dates in the URLs differ. (This is not yesterday's data! Yesterday was 20170807.)
@Shizhao: Some pageviews that appear to be from users are actually from bots. Some of those are removed on the feeds endpoint but not on the pageview API endpoint. That is the case for "wiki" and likely for "維基媒體基金會" (Wikimedia Foundation).
So the diffs below are probably the reason why both are not shown on feeds but are shown on pageviews (note that "維基媒體基金會", a zh-hant title, is a redirect to the zh-hans title "维基媒体基金会"):
diff | username/bot name |
---|---|
39703037 | Antigng-bot (@Antigng) |
42891150 | 59.126.69.221 |
42891194 | 星耀晨曦(@RazeSoldier) |
43675263 | 小躍(@EagerLin, afair) |
43831579 | 2607:fb90:405:f60e:e7c3:9984:ce81:60a5 |
43831659 | @Tegel |
44175497 | @Zhanglide |
44294167 | @RabbitMeow |
45046375 | @Fxqf |
45231054 | InternetArchiveBot (@Cyberpower678) |
45298628 | @John123521 |
45298778 | @John123521 |
45463533 | 126.212.84.78 |
45463675 | @A2093064 |
42658624 | Yinweichen-bot (@Yinweichen) |
44703655 | InternetArchiveBot (@Cyberpower678) |
45047212 | @南极熊 |
At least one of the non-bot edits listed here is considered a (fake?) bot edit for some reason (do they use AWB? Wikiplus? third-party scripts?).
Actually yes, except for those that are rarely active. Using the 2017-08-07 data as an example:
...
Given those examples, I don't think this task should remain open. It's expected that main pages and special pages are counted as articles in pageviews but not in feeds, and there's nothing to be resolved.
Also, @Shizhao, your comments are still unclear, at least:
It removes pages where we think the traffic is bot-inflated.
only user data
Haven't you seen T172379#3510994? There are at least Antigng-bot and InternetArchiveBot there, and you say those are "only user data"?!
rank | article | views data (source 1) | views data (source 2) | views data (source 3) |
---|---|---|---|---|
#1 | Wiki | 66134 | none | 66134 |
#2 | 維基媒體基金會 | 61431 | none | 61431 |
#3 | 楚乔传 | 53127 | 53127 | 53127 |
#4 | 賭城群英會 | 22306 | 22306 | 22306 |
#5 | 河伯的新娘_2017 | 18716 | 18716 | 18716 |
#6 | 三生三世十里桃花_(电视剧) | 17094 | 17094 | 17094 |
#7 | 超時空男臣 | 16636 | 16636 | 16636 |
#8 | 劉文雄_(基隆市) | 16248 | 16248 | 16248 |
#9 | 浪漫醫生金師傅 | 14282 | 14282 | 14282 |
#10 | 稍息立正我愛你 | 13872 | 13872 | 13872 |
#11 | 終極三國_(2017年) | 12413 | 12413 | 12413 |
#12 | 孤單又燦爛的神-鬼怪 | 11040 | 11040 | 11040 |
Why does only source 2 lack the data for #1 and #2, while all the other data is the same? At least source 2 and source 3 are user-only data.
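The comparison in the table can be reproduced with a few lines of code. This is a sketch using only the first four rows of the table as plain dictionaries; the `compare` helper is hypothetical, and nothing is fetched from either endpoint.

```python
# Views per article, copied from the first four rows of the table above.
source_1 = {"Wiki": 66134, "維基媒體基金會": 61431, "楚乔传": 53127, "賭城群英會": 22306}
source_2 = {"楚乔传": 53127, "賭城群英會": 22306}


def compare(reference, other):
    """Return (titles missing from `other`, titles whose counts differ)."""
    missing = [t for t in reference if t not in other]
    differing = [t for t in reference if t in other and reference[t] != other[t]]
    return missing, differing


missing, differing = compare(source_1, source_2)
print("missing from source 2:", missing)
print("different counts:", differing)
```

On these rows the only difference is the two missing titles; every count that appears in both sources matches.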
Also @Liuxinyu970226
..... ....
Your comparison is wrong. The first link's data is from 20170807, and the second link's data is from 20170806.
@Shizhao I can only answer one question for now:
Why does only source 2 lack the data for #1 and #2, while all the other data is the same?
T172379#3509744 from nuria
Suppose only some bot data is removed. For "wiki", the view count is 66134 (pageview API endpoint, rank #1, including some bot data); for "六四事件" it is 6621 (feeds endpoint, rank #30 of the top 30, with some bot data removed; "wiki" is not in the top 30). So 66134 - 6621 = 59513. Does this mean that bots visited the "wiki" page at least 59k times on 20170801? Really? I can hardly believe there were 59k bot views. Can anyone provide details of these 59k bot views? Was it a single bot, or many bots?
I doubt the feed algorithm considers pageviews when removing a page; very likely, pages that show up a lot (over and over), regardless of traffic, will not get displayed.
That being said, the "wiki" article has a lot of bot traffic not identified as such. The user agent below is responsible for a lot of pageviews (~60,000) on August 1st.
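The arithmetic in this comment can be written out as below. It only reproduces the commenter's reasoning, under the commenter's own assumption that "wiki" was dropped purely for falling below rank #30 after bot removal; it is not a measurement of bot traffic.

```python
wiki_views = 66134    # "wiki" on the pageview API endpoint, 20170801
feed_rank_30 = 6621   # "六四事件", rank #30 on the feeds endpoint

# Under the assumption above, at least this many views would have to be bot traffic.
implied_bot_views = wiki_views - feed_rank_30
print(implied_bot_views)  # 59513
```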
{"browser_major":"9","os_family":"Windows 7","os_major":"-","device_family":"Other","browser_family":"IE","os_minor":"-","wmf_app_version":"-"}
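As a rough illustration of how views might be attributed to one parsed user agent, the sketch below groups records by their user-agent fields. The record layout, the `user_agent_map` field name, and the sample records are assumptions made for illustration; they are not data from this thread.

```python
import json
from collections import Counter

# The parsed user agent quoted above.
SUSPECT_UA = json.loads(
    '{"browser_major":"9","os_family":"Windows 7","os_major":"-",'
    '"device_family":"Other","browser_family":"IE","os_minor":"-",'
    '"wmf_app_version":"-"}'
)


def views_by_user_agent(records):
    """Count records per parsed user agent (dicts made hashable via sorted items)."""
    return Counter(tuple(sorted(r["user_agent_map"].items())) for r in records)


# Hypothetical sample records, made up for this example.
records = [
    {"page": "Wiki", "user_agent_map": SUSPECT_UA},
    {"page": "Wiki", "user_agent_map": SUSPECT_UA},
    {"page": "Wiki", "user_agent_map": {"browser_family": "Firefox", "os_family": "Linux"}},
]
counts = views_by_user_agent(records)
suspect_key = tuple(sorted(SUSPECT_UA.items()))
print(counts[suspect_key])
```

A single user agent accounting for most of a page's views, as reported above, is the kind of pattern such a grouping would surface.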
Thanks! It seems that the "wiki" data has not been normal for the last month: https://tools.wmflabs.org/pageviews/?project=zh.wikipedia.org&platform=all-access&agent=user&range=latest-60&pages=Wiki
"維基媒體基金會" seems to be?
Sorry, but I do not understand what the question is. I think we have provided quite a bit of info about why things differ between those endpoints, and we will be closing the ticket shortly.