Unique Devices seasonal trends on small projects
Open, Low, Public

Description

While calculating the EU unique devices, we observed a trend in data covering a 5-year period: Unique Devices for smaller project families such as Wikisource, Wikibooks, Wikiquote, Wikiversity and Wikinews show a seasonal decline in the first half of the year.

From @MGerlach: I suspect the seasonal spike in unique devices during Nov/Dec is an artifact rather than genuine. Looking at one of the projects (wikibooks, but I observe the same with mediawiki or wikisource) and grouping unique devices by country, the main spike comes from Germany (DE), where each November around 30-40M unique devices access these smaller projects. This corresponds to roughly half the population of Germany (80M), which would be great but seems unrealistic. The other countries don't seem to experience similar spikes in unique devices. So I don't think this is related to campaigns or the like, but rather that there might be something wrong with the numbers.
Here is the query: https://superset.wikimedia.org/superset/explore/p/zkVb5gzQPLR/

Event Timeline

I'm starting to suspect that this might be related to how the number of unique devices is aggregated over different projects (domains) into a project family:

@MGerlach: I was going through the Unique Devices documentation and am wondering whether this is due to redirects, which are listed there as a caveat: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Unique_Devices/Last_access_solution#Redirects

I think you are right.
This matches the observation when looking at pageviews.

  • The number of unique devices (per project family) in Wikisource from Germany (Superset query from above) showed spikes above 40M.
  • The number of pageviews to Wikisource from Germany (Superset query) stays below 10M even when adding all agent types (user, automated, spider). So we don't have enough pageviews to account for all the unique devices.
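The mismatch in the two bullets above can be written down as a quick sanity check. This is only a sketch: the function name is made up for illustration, and the two figures are the rough bounds quoted in this thread, not exact query results.

```python
def devices_exceed_pageviews(unique_devices: int, pageviews: int) -> bool:
    """Return True when there are more unique devices than pageviews.

    A device that is counted because it viewed a page contributes at least
    one pageview, so more devices than pageviews hints that something other
    than pageviews is being counted.
    """
    return unique_devices > pageviews


# Rough figures from this thread for Wikisource from Germany (not exact values).
spike_unique_devices = 40_000_000   # "spikes above 40M"
pageviews_upper_bound = 10_000_000  # "stays below 10M", all agent types

print(devices_exceed_pageviews(spike_unique_devices, pageviews_upper_bound))
print(spike_unique_devices / pageviews_upper_bound)  # devices per pageview
```

With these bounds there would be at least four unique devices per pageview, which pageviews alone cannot produce.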

The unique-devices documentation mentions that:

Our per-domain computation filters 301/302 requests (as those are not pageviews). That works well in the per-domain case, as the cookie is set on the 200 response. But it doesn't work for the global domain uniques calculation, as the cookie is being set "earlier".

This suggests that the per-family unique devices computation considers 301/302 requests, which appear neither in the per-domain unique devices data nor in the pageviews data. To me it still seems that this approach inflates the unique devices per project family, because the numbers are unrealistically large, for example for the Wikisource/Germany pair.
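To make the suspected effect concrete, here is a toy model of it. It is not the actual Last-access computation (which works from cookies, not device identifiers); the `Request` type, the `device` field and both functions are hypothetical and only illustrate how dropping the 301/302 filter can push a family count above the per-domain counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    device: str       # hypothetical identifier, for illustration only
    domain: str
    http_status: int


def per_domain_uniques(requests, domain):
    """Count devices on one domain, skipping 301/302 as the docs describe."""
    return len({r.device for r in requests
                if r.domain == domain and r.http_status not in (301, 302)})


def per_family_uniques(requests, family_suffix):
    """Count devices across a project family WITHOUT the redirect filter."""
    return len({r.device for r in requests
                if r.domain.endswith(family_suffix)})


# Made-up requests: one real page load and three redirect-only hits.
requests = [
    Request("a", "de.wikisource.org", 200),
    Request("b", "de.wikisource.org", 302),
    Request("c", "de.wikisource.org", 302),
    Request("d", "en.wikisource.org", 302),
]

print(per_domain_uniques(requests, "de.wikisource.org"))  # 1
print(per_domain_uniques(requests, "en.wikisource.org"))  # 0
print(per_family_uniques(requests, ".wikisource.org"))    # 4
```

In this toy data the family count (4) is far above the sum of the per-domain counts (1), purely because of redirect-only requests.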

Another thought: given that these spikes occur in November each year, could they be related to fundraising banners? I think this is the typical time window for their campaign. Could the banners lead to the types of redirects that would cause such spikes in unique devices?

@Mayakp.wiki I think the spikes in unique devices are indeed caused by the German fundraising banners (mostly on German Wikipedia). Closing the banner on German Wikipedia seems to trigger a 302 request to some of the other projects (wikisource etc.), which in turn artificially inflates the unique devices per project family. From what I understand, the banners in Germany are run independently by WMDE and are thus different from the fundraising banners in other languages run by WMF, which might explain why we see these spikes only from Germany.

Here is how I came to this conclusion:

  • In the query in the task description (unique devices in Wikisource from Germany), you can see a substantial increase in July 2023 (~4M) compared to June or earlier (~0.5M).
  • I checked the calendar in CentralNotice and it seems that WMDE started to run fundraising banners in early July (but not June or earlier), such as C23_WMDE_Desktop_DE_01_2. Those have a low sampling rate (3-5%), which would explain why we see a smaller increase than the big spikes in Nov/Dec, when the sampling rate is much higher.
  • I checked the webrequest logs for 301/302 requests to Wikisource from Germany. I took a closer look at those requests in July (when there were banners):
    • In most cases the uri-query for the request in Wikisource is: ?title=Special:HideBanners&duration=604800&category=fundraising&reason=close. This suggests to me that the request comes from closing a fundraising banner.
    • In most cases the referer is: https://de.m.wikipedia.org/. This suggests that the requests come from German Wikipedia, and hence that the banners are German fundraising banners.
    • Comparing a single day in June (no banner) vs. July (banners), I find a 10-fold increase in the number of those requests.
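The check on the webrequest logs described above could be sketched roughly as follows. This is not the query that was actually run: the flat dict rows, their field names (`day`, `http_status`, `uri_query`) and the row counts are assumptions made up for illustration, chosen so that the toy data mirrors the 10-fold increase mentioned above. Only the query string itself is taken from the thread.

```python
from collections import Counter
from urllib.parse import parse_qs

# uri-query seen on most of the 301/302 requests (copied from this thread).
QUERY = ("?title=Special:HideBanners&duration=604800"
         "&category=fundraising&reason=close")


def is_banner_close(uri_query: str) -> bool:
    """True if the query string looks like a banner-close request."""
    params = parse_qs(uri_query.lstrip("?"))
    return (params.get("title") == ["Special:HideBanners"]
            and params.get("reason") == ["close"])


def hide_duration_days(uri_query: str) -> float:
    """Convert the duration parameter (seconds) to days."""
    params = parse_qs(uri_query.lstrip("?"))
    return int(params["duration"][0]) / 86400


def closes_per_day(rows):
    """Count redirect requests per day that look like banner closes."""
    return Counter(
        row["day"] for row in rows
        if row["http_status"] in (301, 302)
        and is_banner_close(row["uri_query"])
    )


# Made-up rows: 1 close on a June day, 10 on a July day, plus a normal hit.
rows = (
    [{"day": "june-day", "http_status": 302, "uri_query": QUERY}] * 1
    + [{"day": "july-day", "http_status": 302, "uri_query": QUERY}] * 10
    + [{"day": "july-day", "http_status": 200, "uri_query": ""}]
)

counts = closes_per_day(rows)
print(hide_duration_days(QUERY))                 # 7.0
print(counts["july-day"] / counts["june-day"])   # 10.0 (toy data)
```

The duration of 604800 seconds works out to 7 days, i.e. the request asks to hide fundraising banners for a week after the close.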

I don't know enough about banners to understand why this is happening or how this could be addressed.

Thank you @MGerlach for the investigation! I think this is sufficient to show that this is caused by banners. In order to fully understand

the closing of the banner on German Wikipedia seems to trigger a 302-request to some of the other projects (wikisource etc) which in turn artificially inflates the unique devices per project family

I'm wondering if we can tag someone from CentralNotice to look into this weird behaviour. I'll reach out for suggestions.

Mayakp.wiki added a subscriber: Pcoombe.

@Pcoombe: Can you please help us with an issue we're noticing, which is possibly due to how CentralNotice behaves?
We observed inflated numbers for unique devices in smaller projects like Wikisource during particular months of the year. The reason seems to be that closing the banner on German Wikipedia triggers a 302 request to some of the other projects (like wikisource etc.). We were hoping to validate whether this is indeed happening due to CentralNotice and then, if need be, open a new task for your team to fix.

Please let us know if there is a process we should follow to request help from your team.

One option could be to reach out to folks from WMDE, since this seems to be coming mainly from fundraising banners in Germany. The CentralNotice calendar lists Till Mletzko (WMDE) as point of contact for the German fundraising campaign banners.

Hi @tmletzko, can you please help us with an issue we're noticing, which is possibly due to how CentralNotice behaves?
We observed inflated numbers for unique devices in smaller projects like Wikisource during particular months of the year. The reason seems to be that closing the banner on German Wikipedia triggers a 302 request to some of the other projects (like wikisource etc.). We were hoping to validate whether this is indeed happening due to CentralNotice and then, if need be, open a new task for your team to fix.

Please let us know if there is a process we should follow to request help from your team.

Hi @Mayakp.wiki, we will take a look at this and get back to you asap. Thanks for reporting!

Sorry for the delay in responding here. Yes, it certainly sounds like this is related to the Special:HideBanners requests from CentralNotice. Note this isn't just for German fundraising banners: any banner which uses the standard mw.centralNotice.hideBanner() on closing will trigger the same requests. WMF fundraising banners are an exception; in those we're using a custom function which only attempts to set a cookie on the wikipedia.org domain (since we don't currently show banners on any other projects).

See also T117433: Spike: Investigate alternatives to Special:HideBanners cookie storm for cross-domain banner close-button and T244699: EPIC: Find alternative to Special:HideBanners cookies to mitigate the loss of 3rd-party cookie support. I think in most cases these attempts to set cookies aren't doing anything now, since browsers block them. We should decide if we want to change the behaviour of mw.centralNotice.hideBanner.

Adding Fundraising Tech (not actually my team) for input

One thing we can do for now with the German fundraising banners is to stop calling hideBanner(), then manually store the cookie and only report to the //en.wikipedia.org/w/index.php?title=Special:HideBanners tracker in our close event handler.

I'll look into it this morning and stick a test banner up.

Mayakp.wiki added subscribers: odimitrijevic, Milimetric.

cc: @odimitrijevic, @Milimetric
Tagging Data-platform-engineering as FYI. Once this is fixed by the WMDE Fundraising team, we would expect to see a drop in the unique devices to Wikisource from Germany (unique_devices_per_project_family_monthly table).

@AbbanWMDE I tested on aawiki. Looks good to me.

@CorinnaHillebrand_WMDE, can you please confirm whether this is completed, and when it was deployed?

@Mayakp.wiki Our current (and future) live banners don't call hideBanner() anymore. Two of four banners were updated on September 14th, the other two changed on September 18th. You should see a drop in the numbers.

Thanks for confirming, @kai.nissen!
I checked our dashboards and can see a drop in the unique devices from Germany: https://superset.wikimedia.org/superset/explore/p/DJVXakAnYMw/
@MGerlach: I see a rise in unique devices in India. It corresponds with a banner that ran in India between 9/1 and 9/20: https://meta.wikimedia.org/wiki/Special:CentralNotice

Wanted to note here that the smaller project families that were seeing increases during December every year did not see an increase in 2023-12, after this issue was fixed. See chart:

image.png (805×1 px, 284 KB)

We will monitor the data for 6 months until we can conclude that the data is stable and that there are no declines in the first half of the year (and that this was merely a perception due to the redirects increasing unique devices during the second half of the year).