arywiki view stats too low for agent = user?
Open, Needs Triage · Public · BUG REPORT

Description

Steps to replicate the issue:

  • See the page views stats for "agent = user" in this link
  • Compare to the page views stats for all agents in this link
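For anyone who prefers raw numbers to the dashboard, the same comparison can be sketched against the public Wikimedia Pageviews REST API. This is only a minimal sketch and not necessarily how Wikistats queries the data: the helper name `pageviews_url`, the date range, and the monthly granularity are assumptions chosen for illustration. It only builds the two request URLs; fetching them is left to any HTTP client.

```python
# Sketch: build Pageviews REST API URLs for the two agent filters being compared.
# The helper name and the date range below are illustrative assumptions.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"


def pageviews_url(project, agent, start, end,
                  access="all-access", granularity="monthly"):
    """Return the aggregate-pageviews URL for one project and one agent type.

    agent is one of "all-agents", "user", "spider", "automated";
    start and end are timestamps in YYYYMMDDHH form.
    """
    return f"{BASE}/{project}/{access}/{agent}/{granularity}/{start}/{end}"


# Same project and period, differing only in the agent filter.
user_url = pageviews_url("ary.wikipedia.org", "user", "2023100100", "2024040100")
all_url = pageviews_url("ary.wikipedia.org", "all-agents", "2023100100", "2024040100")
print(user_url)
print(all_url)
```

Comparing the monthly totals returned for the two URLs would show whether the "user" series dips while the "all-agents" series stays flat.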

What happens?:
The page views seem relatively stable for all agents across the last few months (January and February 2024), but somehow dip significantly in the last two months when only (human?) users are taken into consideration.

What should have happened instead?:
Since I don't know how the stats are collected, I don't know what to expect exactly. Are some user views currently miscategorized as automated views? Or maybe the opposite in the past, and the current views are more accurate? Do "user" views include or exclude registered bots? etc.

Event Timeline

Maurusian updated the task description.

@Maurusian thank you for reporting this. We will investigate and report back here on what we find.

After some preliminary review by @nshahquinn-wmf and @Mayakp.wiki we would like to observe the trend for another two months in order to more confidently determine if the drop is due to a real drop in user page views or an issue with the data itself. Since the pageview rate returned to roughly the October 2022 range, it is likely a real drop in user page views, but we need more time to pass in order to analyze the trend.

Thank you, again, @Maurusian for bringing this to our attention.

@Mayakp.wiki and @nshahquinn-wmf: added this to Movement-Insights so you can check for data quality in May or June.

@Maurusian I am removing the wikistats tag on this task since this is a data quality question. Please let me know if you have questions about this. Happy to answer them!

This appears to be a pattern across many wikis:

| wiki | 2024_Q1_human | 2023_Q1_human | 2024_Q1_spider | 2023_Q1_spider | %_human | %_spider | 2024_Q1_all | 2023_Q1_all | %_all |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| fiu-vro.wikipedia.org | 285357 | 809749 | 2138715 | 1648823 | 35.24 | 129.71 | 2424072 | 2458572 | 98.6 |
| nso.wikipedia.org | 253133 | 692979 | 1721612 | 1213556 | 36.53 | 141.87 | 1974745 | 1906535 | 103.58 |
| nah.wikipedia.org | 266590 | 675880 | 2208959 | 2267470 | 39.44 | 97.42 | 2475549 | 2943350 | 84.11 |
| mai.wikipedia.org | 649181 | 1488877 | 4326748 | 2487029 | 43.6 | 173.97 | 4975929 | 3975906 | 125.15 |
| stq.wikipedia.org | 211213 | 439921 | 1835801 | 1330798 | 48.01 | 137.95 | 2047014 | 1770719 | 115.6 |
| bat-smg.wikipedia.org | 350781 | 725602 | 3406390 | 1799362 | 48.34 | 189.31 | 3757171 | 2524964 | 148.8 |
| ary.wikipedia.org | 798894 | 1621398 | 3696494 | 2189621 | 49.27 | 168.82 | 4495388 | 3811019 | 117.96 |
| ay.wikipedia.org | 204502 | 399875 | 1673294 | 1327512 | 51.14 | 126.05 | 1877796 | 1727387 | 108.71 |
| wa.wikipedia.org | 378014 | 738127 | 3344979 | 2159542 | 51.21 | 154.89 | 3722993 | 2897669 | 128.48 |
| arc.wikipedia.org | 172872 | 330305 | 1142210 | 886829 | 52.34 | 128.8 | 1315082 | 1217134 | 108.05 |
| nap.wikipedia.org | 354586 | 673161 | 4355198 | 2636551 | 52.67 | 165.19 | 4709784 | 3309712 | 142.3 |
| smn.wikipedia.org | 283797 | 538392 | 1916947 | 1644543 | 52.71 | 116.56 | 2200744 | 2182935 | 100.82 |
| nrm.wikipedia.org | 234891 | 444407 | 1957523 | 1439132 | 52.85 | 136.02 | 2192414 | 1883539 | 116.4 |
| cbk-zam.wikipedia.org | 213881 | 403864 | 1266046 | 700792 | 52.96 | 180.66 | 1479927 | 1104656 | 133.97 |
| gn.wikipedia.org | 428453 | 802869 | 2879505 | 1813521 | 53.37 | 158.78 | 3307958 | 2616390 | 126.43 |
| nds-nl.wikipedia.org | 559766 | 1047645 | 2832503 | 2218514 | 53.43 | 127.68 | 3392269 | 3266159 | 103.86 |
| rm.wikipedia.org | 177481 | 330813 | 1708058 | 1166743 | 53.65 | 146.4 | 1885539 | 1497556 | 125.91 |
| eml.wikipedia.org | 395337 | 728754 | 3358611 | 2206884 | 54.25 | 152.19 | 3753948 | 2935638 | 127.88 |
| dsb.wikipedia.org | 236368 | 433288 | 3259474 | 2396793 | 54.55 | 135.99 | 3495842 | 2830081 | 123.52 |
| lij.wikipedia.org | 431796 | 790498 | 3720493 | 2571649 | 54.62 | 144.67 | 4152289 | 3362147 | 123.5 |

(table generated using the PAWS service, see source code)

One can see in the %_human column that human traffic went down significantly on many wikis, while the %_all column shows that overall traffic remained more or less the same. Quite a few projects are affected. Did we improve spider detection, or is something else going on?
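The percentage columns appear to be year-over-year ratios (Q1 2024 divided by Q1 2023, times 100), with each "all" count equal to the human count plus the spider count. The sketch below recomputes them for three rows of the table under that assumption. It is not the PAWS source code; the function names `pct` and `summarize` are made up for illustration.

```python
# Raw quarterly counts copied from three rows of the table:
# (human 2024 Q1, human 2023 Q1, spider 2024 Q1, spider 2023 Q1)
rows = {
    "fiu-vro.wikipedia.org": (285357, 809749, 2138715, 1648823),
    "mai.wikipedia.org": (649181, 1488877, 4326748, 2487029),
    "ary.wikipedia.org": (798894, 1621398, 3696494, 2189621),
}


def pct(new, old):
    """Year-over-year ratio as a percentage, rounded to two decimals."""
    return round(100 * new / old, 2)


def summarize(h24, h23, s24, s23):
    """Recompute the three percentage columns from the raw counts.

    Assumption: "all" is the sum of human and spider views, as in the table.
    """
    all24, all23 = h24 + s24, h23 + s23
    return {
        "%_human": pct(h24, h23),
        "%_spider": pct(s24, s23),
        "%_all": pct(all24, all23),
    }


for wiki, counts in rows.items():
    print(wiki, summarize(*counts))
```

For ary.wikipedia.org this reproduces the table's 49.27 / 168.82 / 117.96, which is what makes the shift look like a reclassification between agent types rather than a change in total traffic.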

Hello @VirginiaPoundstone, thanks for the feedback. The pageviews by human users certainly returned to the October 2022 range, but the strange part is that the total views are actually in line with Q4 of 2023. That's why I suspected a miscategorization issue, either in the past or today. The results provided by @Urbanecm_WMF show that this is a potential issue on several Wikipedias. 117 of them had at least a 20% drop in human views, whereas total views either remained stable or increased.
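The filter described here can be sketched as follows. This is a rough illustration rather than the query actually used to arrive at 117: the 80% cutoff follows from "at least a 20% drop", but the 95% floor for "stable" total views is an assumption, as are the names `HUMAN_MAX`, `ALL_MIN`, and `looks_miscategorized`.

```python
# (wiki, %_human, %_all) copied from three rows of the table in this task.
ratios = [
    ("fiu-vro.wikipedia.org", 35.24, 98.6),
    ("nah.wikipedia.org", 39.44, 84.11),
    ("ary.wikipedia.org", 49.27, 117.96),
]

HUMAN_MAX = 80.0  # "at least a 20% drop" in human views means %_human <= 80
ALL_MIN = 95.0    # assumption: "stable" read as total views within 5% of last year


def looks_miscategorized(pct_human, pct_all,
                         human_max=HUMAN_MAX, all_min=ALL_MIN):
    """True when human views fell sharply while total views held up or grew."""
    return pct_human <= human_max and pct_all >= all_min


flagged = [wiki for wiki, h, a in ratios if looks_miscategorized(h, a)]
print(flagged)  # ['fiu-vro.wikipedia.org', 'ary.wikipedia.org']
```

nah.wikipedia.org is not flagged because its total traffic also fell (84.11%), which is consistent with a real decline rather than a shift between agent categories.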