
[REQUEST] WMTW Facebook page wants the zhwiki yearly-aggregate top 10 most viewed articles of 2016.
Closed, Resolved, Public


What's requested:

  • (en) The editor of the Wikimedia Taiwan Facebook page would like to request the year-aggregate (2016-01-01 to 2016-12-31) top 10 most viewed pages on Chinese Wikipedia.
  • (zh-hant) 台灣分會想知道中文維基百科2016-1-1到12-31全年總閱讀量的前十個條目

Why it's requested:

  • (en) Statistics about Chinese Wikipedia are always a hot topic on social media. The raw data would be a nice piece of material to post on our social media channels, where it could go viral.
  • (zh-hant) 中文維基百科相關的統計數據是社群媒體上很有趣的話題,如果有相關的數據,我們可以在社群媒體上發布,引起討論。

When it's requested:

  • (en) As soon as possible. The topic would still be trendy during the first week of 2017, so anything before the second week of January works.
  • (zh-hant) ASAP,在1月第二週前能夠出來,都還可以操作。

Other helpful information:

Event Timeline

I was wondering whether anyone could gather such data themselves, so I tried to find out how.
So I searched for "wiki page views".
One result was
That page links to .
That page has a link "topviews" to .
On Top Views, unfortunately, "Date type" does not offer "Yearly", only "Monthly".
So if this data can be gathered through some other UI, it is not very obvious how. :(
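For readers who only have monthly top lists available, a rough workaround is to sum them per title. The following is a minimal sketch, not the method used in this task: the function names and the sample data are hypothetical, and it assumes each monthly list is a sequence of (title, views) pairs. The result is only approximate, because a page that falls outside one month's top list contributes nothing for that month.

```python
from collections import Counter


def yearly_totals(monthly_lists):
    """Sum views per title across several monthly top lists.

    monthly_lists: iterable of lists of (title, views) pairs (assumed shape).
    """
    totals = Counter()
    for month in monthly_lists:
        for title, views in month:
            totals[title] += views
    return totals


def yearly_top(monthly_lists, n=10):
    """Return the n titles with the highest summed views, highest first."""
    return yearly_totals(monthly_lists).most_common(n)


# Hypothetical sample data for two months.
sample_months = [
    [("Example_A", 100), ("Example_B", 50)],
    [("Example_B", 80), ("Example_C", 20)],
]
print(yearly_top(sample_months, n=2))
```

Because of the truncation issue, a direct query over the full pageview data remains the reliable way to get an exact yearly ranking.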

@Shangkuanlc : The query for the top 100 is running; I will post the result here once it has completed. As mentioned earlier, it will need some manual cleanup to extract the top 10 articles (essentially just removing non-mainspace entries, which would be tricky to do within that database query).
@Aklapper: Thanks for your research - FYI, this task came out of a Facebook discussion where I had already noted that this particular question is probably easiest to answer via a direct query of the - internal - pageview_hourly database. It might indeed be worth adding a "yearly" option to as this is a recurring question (I ran a similar query for the WMF Communications team last year).

@Tbayer & @Aklapper : Thank you for the instant response. My feedback would be double yes. Yes we can manually screen the top 100, and yes the yearly button would help greatly if it is possible.

The query result is below (it actually took less than an hour to complete). Looks like entertainment topics were popular. This should be enough information for you to generate the actual top 10 articles list, by restricting to mainspace (and also removing the entry for the minus sign page, which does not correspond to real views for that page, as explained e.g. in a recent thread on Analytics-l). Let me know in case there are further questions.

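-- Top 100 zhwiki pages by total 2016 pageviews.
SELECT CONCAT('',page_title), SUM(view_count) AS views
FROM wmf.pageview_hourly
WHERE
   year = 2016
   AND project = 'zh.wikipedia'
   AND agent_type = 'user'  -- excludes traffic identified as spiders
GROUP BY page_title
ORDER BY views DESC
LIMIT 100;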

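| _c0 | views |
| 首页 | 78681290 |
| 搜索 | 31770108 |
|  | 19893954 |
| 链接搜索 | 7752836 |
|  | 6697490 |
| 太陽的後裔 | 5273920 |
| 瑯琊榜_(電視劇) | 3824878 |
| 我是歌手_(第四季) | 3697336 |
| 甘味人生 | 3528805 |
| 芈月传 | 3326844 |
| 馬惜珍家族 | 3264002 |
| 马澄坤 | 3136751 |
|  | 3032802 |
| 宋仲基 | 2992646 |
| 月之戀人-步步驚心:麗 | 2619959 |
| 监视列表 | 2531415 |
| 你的名字。 | 2379038 |
| 一念間 | 2324335 |
| 用户登录 | 2312893 |
|  | 2295200 |
| 電視劇) | 2146014 |
| 終極一班4 | 2140015 |
| 防彈少年團 | 2138730 |
| 雲畫的月光 | 2125985 |
| 奔跑吧兄弟 | 2117197 |
| 綜藝玩很大 | 2115774 |
| 從零開始的異世界生活 | 2098864 |
| 女医·明妃传 | 1821190 |
|  | 1805659 |
| 愛上哥們 | 1737683 |
| 周子瑜 | 1659986 |
| 一把青_(電視劇) | 1657532 |
| 城寨英雄 | 1612823 |
| 六四事件 | 1612169 |
| 我的老師叫小賀 | 1594128 |
| 霍建華 | 1576432 |
| 微微一笑很傾城 | 1542096 |
| 习近平 | 1538065 |
| 後菜鳥的燦爛時代 | 1530614 |
| 胡歌 | 1518953 |
| 自殺突擊隊 | 1513404 |
|  | 1500824 |
|  | 1488818 |
| 我的極品男友 | 1485012 |
| 朴寶劍 | 1478184 |
| 首页 | 1474260 |
|  | 1439116 |
| 蔡英文 | 1429324 |
| 年美國總統選舉 | 1407609 |
| 唐納德·川普 | 1407251 |
| 我和我的十七歲 | 1385721 |
| 任意依戀 | 1379579 |
| 殭 | 1373667 |
| 宋慧喬 | 1369708 |
| 節目列表 | 1349599 |
|  | 1347262 |
| 年Running_Man節目列表 | 1319260 |
| 精灵宝可梦系列 | 1317906 |
| 藍色海洋的傳說 | 1314804 |
| 又,吳海英 | 1280061 |
| 孤單又燦爛的神-鬼怪 | 1264338 |
| 名偵探柯南動畫集數列表 | 1247413 |
| 春花望露 | 1243772 |
| 中華民國 | 1236217 |
| 請回答1988 | 1231507 |
| 臺灣 | 1180789 |
| 玖壹壹 | 1160423 |
| 第88屆奧斯卡金像獎 | 1150202 |
| 花千骨 | 1147331 |
| 捕鼠器裡的奶酪 | 1141054 |
| 火影忍者 | 1135583 |
| 死侍 | 1128138 |
| 年夏季奧林匹克運動會 | 1115120 |
| 暗殺教室 | 1113045 |
| 青云志 | 1109497 |
| 中华人民共和国 | 1109089 |
| 飛魚高校生 | 1104828 |
| 朴信惠 | 1093430 |
|  | 1078969 |
| 李鍾碩 | 1073952 |
| 無限挑戰 | 1071005 |
| 年中華民國立法委員選舉 | 1062655 |
| 滾石愛情故事 | 1054103 |
| 趙麗穎 | 1050169 |
|  | 1047989 |
| 嫉妒的化身 | 1041020 |
| 江泽民 | 1035924 |
| 狼王子 | 1026486 |
| 逃避雖可恥但有用 | 1012396 |
| 香港 | 1008059 |
| 幕後玩家 | 999882 |
| 我們結婚了 | 999003 |
| 植劇場 | 998543 |
| 必娶女人 | 994676 |
| 日本 | 992883 |
| 周杰倫 | 988354 |
| 我是歌手 | 979544 |
| 黃致列 | 976600 |
|  | 970506 |
| 我的少女時代 | 969338 |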

100 rows selected (2830.778 seconds)
Beeline version 1.1.0-cdh5.5.2 by Apache Hive

(NB: For convenience I included the link to the desktop version for each page, but the numbers refer to the aggregate pageviews for desktop, mobile web and apps.)
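The manual cleanup mentioned in this thread (restricting to mainspace and dropping the minus sign page) could also be scripted once the result rows are available with full titles. The following is a minimal sketch under stated assumptions: the prefix list is hypothetical and incomplete (zhwiki also has localized namespace names and aliases), the function name is made up for illustration, and the sample rows are invented rather than taken from the result above.

```python
# Hypothetical, incomplete list of namespace prefixes; extend it with the
# localized namespace names that actually occur in the query result.
NON_MAINSPACE_PREFIXES = (
    "Special:", "Wikipedia:", "User:", "Talk:", "File:",
    "Template:", "Category:", "Help:", "Portal:",
)


def top_mainspace(rows, n=10, excluded_titles=frozenset({"-"})):
    """Return the n most viewed rows that look like mainspace articles.

    rows: iterable of (title, views) pairs (assumed shape).
    excluded_titles: titles dropped even without a namespace prefix,
    by default only the minus sign page.
    """
    kept = [
        (title, views)
        for title, views in rows
        if title
        and title not in excluded_titles
        and not title.startswith(NON_MAINSPACE_PREFIXES)
    ]
    kept.sort(key=lambda row: row[1], reverse=True)
    return kept[:n]


# Invented sample rows, not real pageview numbers.
sample_rows = [
    ("Special:Search", 300),
    ("-", 200),
    ("Example_A", 100),
    ("Example_B", 150),
]
print(top_mainspace(sample_rows, n=2))
```

Whether the main page should count as an article is an editorial choice; it can be dropped by passing its title in `excluded_titles`.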

Offtopic: @Aklapper, since you are here, any ideas why the string "1073952" in the previous comment auto-links to ? Is this a bug in Phabricator?

Thank you. I am forwarding this to Reke. The related post will appear on the Facebook page later this week.

(For the archives: the resulting FB post is here.)

Also, here is a FB video broadcast that used this data ("一直剝維基, Keep Peeling Wiki", a livecast talk show the chapter launched in December 2016); see 1:09-11:00.