While investigating T299559, I found that the numbers of unique devices for the wikidata and wikimedia project families are a lot bigger than the sum of per-domain unique devices over all sub-domains of wikidata/wikimedia, when the contrary is expected (a device visiting several sub-domains should be counted only once at the family level).
This seems to be due to the offset part of the unique-devices metric, which accounts for users who made a single request to the domain and therefore have no last-visited cookie.
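To make the offset concrete, here is a minimal Python sketch of a simplified daily last-access count. It assumes each request is reduced to a hypothetical device `fingerprint` plus the date carried by its last-access cookie (or `None` when there is no cookie). It is not the refinery implementation: the `Request` type, `daily_unique_devices`, and the rule "a cookie-less fingerprint seen exactly once counts as one offset device" are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Request:
    # Hypothetical device identifier; how it would be derived is not modelled.
    fingerprint: str
    # Date carried by the last-access cookie, or None when no cookie was sent.
    last_access: Optional[date]


def daily_unique_devices(requests: list[Request], day: date) -> dict[str, int]:
    """Simplified daily estimate: cookie-based underestimate + cookie-less offset."""
    # Assumption: a cookie dated before `day` marks the device's first request
    # of the day; later requests that day carry a cookie dated `day`.
    underestimate = sum(
        1 for r in requests if r.last_access is not None and r.last_access < day
    )
    # Offset: cookie-less fingerprints seen exactly once during the day.
    no_cookie_hits = Counter(r.fingerprint for r in requests if r.last_access is None)
    offset = sum(1 for hits in no_cookie_hits.values() if hits == 1)
    return {
        "underestimate": underestimate,
        "offset": offset,
        "estimate": underestimate + offset,
    }


today = date(2022, 12, 1)
reqs = [
    Request("a", date(2022, 11, 30)),  # returning device, first hit today: counted
    Request("a", today),               # same device later today: not counted again
    Request("b", None),                # single cookie-less request: offset device
    Request("c", None),
    Request("c", None),                # cookie-less fingerprint seen twice: not in offset
]
print(daily_unique_devices(reqs, today))
```

In this toy model the offset is the only path by which a cookie-less device is counted, so anything that inflates the set of single, cookie-less requests inflates the estimate directly.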
Related Objects
- Mentioned In
  - T325544: Update refinery-source PageviewDefinition to better handle `Special:` pages
  - T299559: Wikistats reports no mobile unique devices for Wikidata and MediaWiki.org
- Mentioned Here
  - T276472: Odd behavior in unique device counts
  - T325544: Update refinery-source PageviewDefinition to better handle `Special:` pages
  - T299559: Wikistats reports no mobile unique devices for Wikidata and MediaWiki.org
Event Timeline
@JAllemandou thank you for finding this! What do you have in mind for Product Analytics to investigate? I don't think we have much understanding of the inner workings of unique device counting, so I'm not sure we will be able to help much.
@odimitrijevic and I discussed the priority for this and do not think it should be prioritized above current work. We're not regularly reporting on Wikidata unique devices, in part because we are aware that the data and definitions should be further explored.
Data Engineering are the stewards for the existing unique devices definition. At some point in the future, we would like to revisit the definition and measurement of unique devices to account for changes in technology that may be impacting the measurement of unique devices. Product Analytics would lead the process in partnership with Data Engineering and become the stewards for future definitions. However, we do not currently have the capacity to take this on.
@kzimmerman let's discuss prioritizing. A significantly larger overcount may exist for the wikimedia project family.
Investigation results:
The overcount in unique_devices_per_project_family, compared with unique_devices_per_domain, is due to an issue in how we check whether webrequests are Special: pages, which gets mixed up with whether we use only pageviews or pageviews plus redirects-to-pageviews.
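A toy illustration of why the Special: classification matters for the offset: a cookie-less device that only hits a page like Special:CentralAutoLogin becomes an offset device if Special: pages are kept, and disappears if they are filtered out. The `title` field, the `startswith("Special:")` check, and the two filter functions are hypothetical and do not reproduce PageviewDefinition.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Request:
    fingerprint: str  # hypothetical device identifier
    title: str        # requested page title
    has_cookie: bool  # whether a last-access cookie was sent


def offset(requests: list[Request], keep: Callable[[Request], bool]) -> int:
    """Count cookie-less fingerprints with exactly one request passing `keep`."""
    hits = Counter(r.fingerprint for r in requests if keep(r) and not r.has_cookie)
    return sum(1 for n in hits.values() if n == 1)


def excludes_special(r: Request) -> bool:
    # Hypothetical classifier: drop anything in the Special: namespace.
    return not r.title.startswith("Special:")


def includes_everything(r: Request) -> bool:
    # Hypothetical classifier: keep every request, Special: pages included.
    return True


reqs = [
    Request("d1", "Special:CentralAutoLogin/start", False),
    Request("d2", "Special:CentralAutoLogin/start", False),
    Request("d3", "Q42", False),
]
print(offset(reqs, excludes_special))     # 1
print(offset(reqs, includes_everything))  # 3
```

The effect can also go the other way: filtering out a device's Special: request can turn a two-request device into a single-request one that then lands in the offset, so the per-domain and per-family counts only agree if both use the same classification.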
The impact on wikidata is not huge: per_domain and per_project_family values are of the same order of magnitude (the extra comes from a lot of Special:CentralAutoLogin pages).
The impact on the wikimedia project family is huge: the daily per_project_family value is about 20 times the daily per_domain value (due to a very large number of banners on metawiki).
BUT: the wikimedia project family is not relevant as is; these numbers should only be provided per domain for projects such as commonswiki. We remove the wikimedia project family from the numbers we publish to the public.
For the Special: pages classification problem, a new task has been created: https://phabricator.wikimedia.org/T325544
Thank you @JAllemandou! That explains things clearly. I have added the follow up work to the planning board.
I believe this is the same problem discussed in https://phabricator.wikimedia.org/T276472. Can they both be closed at the same time?
Another finding: the WMF-Last-Access-Global cookies are not set for wikimedia projects. So not only do we have wrong numbers for the offsets due to the Special: pages, but we also have wrong numbers for cookie-computed values.
This makes it even clearer: we shouldn't use project-family numbers for the wikimedia family!
I suggest we remove the row altogether from the data with a comment.
Thanks @JAllemandou for getting to the bottom of this! So our top-level metrics then would be wikipedia, wiktionary, commonswiki etc.? And none of the alternatives make sense:
- Filter out the projects lacking WMF-Last-Access-Global cookies from the top-level wikimedia metric -- but maybe this would make the metric misleading because it'd be missing major wikimedia projects?
- Add support for WMF-Last-Access-Global on those wikis? But maybe this is more complicated than I think or wasn't set for a good reason on those wikis?
I'm assuming the challenge with Special: pages is fixable. For the record, I think it's reasonable to drop wikimedia if the technical issues aren't easily surmountable. It's tempting to report a wikimedia unique devices metric, and I feel like I often see requests from Comms etc. for this sort of number. But in reality, I assume that if you're directly visiting a wikimedia site, you're likely viewing Wikipedia at some point that month too, so adding in wiktionary, commonswiki, etc. doesn't give us much new information and, as you're pointing out, opens up the opportunity for bugs and misleading data as we double-count devices.
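The double-counting point can be seen with made-up device sets (all domain names and device IDs below are hypothetical). Summing per-domain counts counts a device once per domain it visits, while a family-level count should count it once. That is also why a project-family number is expected to be no larger than the sum of its per-domain numbers.

```python
# Hypothetical device IDs seen on each domain during one month.
visits = {
    "en.wikipedia.org": {"dev1", "dev2", "dev3"},
    "commons.wikimedia.org": {"dev2", "dev4"},
    "meta.wikimedia.org": {"dev2", "dev3"},
}

# Summing per-domain uniques counts dev2 three times and dev3 twice.
sum_of_domains = sum(len(devices) for devices in visits.values())
# Counting distinct devices across all domains counts each device once.
distinct_devices = len(set().union(*visits.values()))

print(sum_of_domains, distinct_devices)  # 7 4
```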