Problem with delay caused by intake-analytics.wikimedia.org
Open, Low · Public · BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

  • I am building a list article with a large table in my sandbox.

What happens?:
When I save or preview the message "waiting for intake-analytics.wikimedia.org" is displayed, and Microsoft Edge says "Not Responding". After a delay of several minutes the save/preview is successful.

What should have happened instead?:
The save or preview should be almost instantaneous.

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc: Windows 10 with Microsoft Edge

Event Timeline

Reedy renamed this task from Problem with delay caused by input-analytics,wikimedia.org to Problem with delay caused by input-analytics.wikimedia.org. Nov 10 2021, 12:07 AM

Do you mean intake-analytics.wikimedia.org not input-analytics.wikimedia.org?

input-analytics.wikimedia.org is a NXDOMAIN, so it doesn't exist.

Reedy renamed this task from Problem with delay caused by input-analytics.wikimedia.org to Problem with delay caused by intake-analytics.wikimedia.org. Nov 10 2021, 12:17 AM
Reedy updated the task description.

yes, that is correct - sorry.

Sent from my iPad

@Downsize43: Does this also happen with another internet provider? Are there any add-ons or extensions installed in your browser?
Can you check the "network" tab of your web browser's developer tools for that call to intake-analytics.wikimedia.org?

Hi Aklapper,

My ISP is Telstra Australia. I do not have access to any other ISP.

The traffic from all Australian ISPs goes through the NBN.

This problem does not occur when I use Safari on iPad.

AFAIK there are no add-ons or extensions except Trend anti-virus.

I have no idea how to find the "network" tab.

Cheers,

John McGahan

This problem becomes more mysterious. It occurs on my HP laptop running Windows 10 and Microsoft Edge as supplied by HP, connecting to the internet by wifi.
It does not occur on my “old” desktop running MS Windows 10 and Edge, upgraded for free from an earlier MS OS, and connected directly to the modem.
Regards,
John McGahan

Sent from my iPad

Same problem in ru.wikisource. When trying to create a page in a page namespace, you have to wait a very long time for this intake-analytics.wikimedia.org to finish. Impossible to work.

See also here at de.wiki.
Workaround is to block intake-analytics.wikimedia.org via uBlock Origin.

For me it's just annoying: T304426 but apparently this causes problems on production too. Why is there no opt-out?

A guess: your wifi or associated router is crap, or your laptop has a low connection limit due to network configuration/firewall software. The request for intake-analytics pushes it over the edge and you're forced to wait for the timeout.

Crappy network performance just happens and the solution is for such low-importance requests (besides the ability to opt-out of them!) to happen after the page has loaded. It would also help if they were loaded from the same domain as the one you're visiting so you can use keep-alive for the connection. That could theoretically be realized by proxying the request from the project domain to intake-analytics.
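
The "send it after the page has loaded" idea can be sketched as a small queue. This is only an illustration of the suggestion above, not how Wikimedia's client code works; `DeferredQueue`, `enqueue` and `markLoaded` are hypothetical names.

```typescript
// Hypothetical sketch: hold low-priority work (such as analytics requests)
// until the page has finished loading, then run it.
type Task = () => void;

class DeferredQueue {
  private pending: Task[] = [];
  private loaded = false;

  // Run the task immediately if the page is already loaded, otherwise park it.
  enqueue(task: Task): void {
    if (this.loaded) {
      this.run(task);
    } else {
      this.pending.push(task);
    }
  }

  // Call once when the page has loaded; flushes everything parked so far.
  markLoaded(): void {
    this.loaded = true;
    const tasks = this.pending;
    this.pending = [];
    for (const task of tasks) {
      this.run(task);
    }
  }

  // A failing low-priority task must never break the page or later tasks.
  private run(task: Task): void {
    try {
      task();
    } catch {
      // Deliberately ignored: analytics is best-effort.
    }
  }
}

// In a browser this could be wired up as (assumption, not taken from the thread):
//   const queue = new DeferredQueue();
//   window.addEventListener("load", () => queue.markLoaded());
//   queue.enqueue(() => { /* send the analytics event here */ });
```

Note that this only helps for events raised while the page is loading; events sent at save or unload time would still depend on the transport not blocking navigation.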

The odd thing here is that it's (at least now, was this any different when the bug was filed?) a beacon: https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon. It shouldn't block anything, so it could point to a software bug. IMHO that doesn't absolve website operators from responsibility though. And even if the immediate issues would be resolved, there should be a way (that doesn't require digging through the code!) to opt-out. Not having an opt-out for this doesn't suit Wikimedia. Or having it hidden so well that nobody can find it - I found a possible hint towards its existence in the code, but I have no idea how to use it.

As @AlexisJazz has said, analytics events are sent using the Beacon API. Requests sent using the Beacon API should not block the page unloading (before the browser navigates to the next page). If the analytics events are being sent using the Beacon API, then this could be a bug in Microsoft Edge.

However, if the Beacon API isn't supported or a browser extension is disabling the Beacon API, then the analytics events are sent using a "detached Image request". As noted in https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon#description, most browsers will block the page unloading for these requests.

The latter seems more likely to me but it doesn't explain why the request to https://intake-analytics.wikimedia.org is not resolving quickly but timing out (and that timeout being on the order of minutes).
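
For readers unfamiliar with the two paths described above, here is a minimal sketch of "use the Beacon API, fall back to an image request". It is not Wikimedia's actual implementation; the function name, its parameters and the query-string encoding are assumptions made for illustration, and the browser facilities are passed in so the sketch also runs outside a browser.

```typescript
// Minimal shape of the browser object used, declared locally so the sketch
// type-checks without DOM typings.
interface BeaconCapable {
  sendBeacon?: (url: string, data?: string) => boolean;
}

type Transport = "beacon" | "image";

// Hypothetical helper: sends one event and reports which transport was used.
function sendAnalyticsEvent(
  nav: BeaconCapable,
  loadImage: (src: string) => void,
  url: string,
  payload: Record<string, unknown>,
): Transport {
  const body = JSON.stringify(payload);

  // Preferred path: sendBeacon queues the request and returns immediately;
  // it returns false if the browser refused to queue the data.
  if (typeof nav.sendBeacon === "function" && nav.sendBeacon(url, body)) {
    return "beacon";
  }

  // Fallback path: a detached image request carrying the payload in the
  // query string. This is the variant that can hold up page unloading.
  loadImage(`${url}?${encodeURIComponent(body)}`);
  return "image";
}

// In a browser this could be called as (assumption):
//   sendAnalyticsEvent(navigator, (src) => { new Image().src = src; }, endpoint, event);
```

If an extension or security product removes or breaks `navigator.sendBeacon`, code shaped like this would silently take the image path, which fits the blocking behaviour reported here.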

I'm wondering if maybe the issue could be caused by an unresponsive DNS (in which case proxying from the project domain to intake-analytics would help, that's probably a good idea anyway), a proxy that refuses the beacon, or some web accelerator software.
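
One way to tell a slow DNS lookup from a slow connection or a slow response, at least for requests made while the page is still open, is the browser's Resource Timing data. The sketch below splits one timing entry into phases; `breakdown` and its result fields are hypothetical names. For cross-origin requests the detailed fields are reported as zero unless the server sends a `Timing-Allow-Origin` header, and whether intake-analytics does so is not established in this thread, so the sketch reports `null` in that case.

```typescript
// Subset of PerformanceResourceTiming used here, declared locally so the
// sketch runs outside a browser as well.
interface TimingLike {
  name: string;
  duration: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  requestStart: number;
  responseStart: number;
}

interface Breakdown {
  url: string;
  totalMs: number;
  dnsMs: number | null;     // time spent resolving the host name
  connectMs: number | null; // time spent opening the connection (TCP and TLS)
  waitMs: number | null;    // time from sending the request to the first response byte
}

// Hypothetical helper: split one entry into DNS / connect / wait phases.
function breakdown(e: TimingLike): Breakdown {
  // Heuristic: restricted cross-origin entries expose requestStart as 0,
  // so only the total duration is meaningful for them.
  const detailed = e.requestStart > 0;
  return {
    url: e.name,
    totalMs: e.duration,
    dnsMs: detailed ? e.domainLookupEnd - e.domainLookupStart : null,
    connectMs: detailed ? e.connectEnd - e.connectStart : null,
    waitMs: detailed ? e.responseStart - e.requestStart : null,
  };
}

// In a browser console this could be used as (assumption):
//   performance.getEntriesByType("resource")
//     .filter((e) => e.name.includes("intake-analytics.wikimedia.org"))
//     .map((e) => breakdown(e as PerformanceResourceTiming));
```

A large `dnsMs` would point towards the resolver, while a large `connectMs` or `waitMs` would point towards the network path, a proxy, or the server.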

This is very interesting to me because I'm currently investigating several issues related to intake-analytics.wikimedia.org and to a lesser extent, intake-logging.wikimedia.org.

These issues are:

Any evidence of requests to these endpoints timing out or resulting in 503 errors for the client, especially if they are repeatable, is something that could well provide a useful reference point for me to investigate further.

So would it be fair to say that the two test cases where intake-analytics.wikimedia.org appears to time out and block the page loading are as follows?

  1. When editing a large table in Microsoft Edge on Windows 10 - but only some versions of Windows - as reported by @Downsize43
  2. When trying to create a new page in a page namespace on ru.wikisource.org - as reported by @Ratte

Unfortunately my German isn't good enough to translate the issue mentioned by @Torana on de.wikipedia.org

I'm really keen to see if I can get to the bottom of this issue, so please do share any other observations that you feel might be useful. Thanks also @phuedx for the information about the use of the Beacon API and the fall-back behaviour. That might indeed be useful.

503? Reminds me of all the connection issues on beta cluster: T289029, T303160, T303165, T302699, T300525. Probably not related: I assume intake-analytics doesn't run on the same stuff beta cluster runs on, but I don't really know.

https://de.wikipedia.org/w/index.php?title=Wikipedia:Fragen_zur_Wikipedia&oldid=220783342#Hamster_m%C3%BCde?_und_was_ist_diese_Seite_intake-analytics.wikimedia.org?

Hamster müde? und was ist diese Seite intake-analytics.wikimedia.org?
Sind die Hamster müde? und was ist diese Seite intake-analytics.wikimedia.org, über die meine Anfrage geschickt wird, bevor es eine Minute oder so dauert, bis Wikipedia antwortet? --Jbergner (Diskussion) 10:37, 4. Mär. 2022 (CET)
War bis grade eben extrem langsam. Nicht nur bei uns. --Der-Wir-Ing ("DWI") (Diskussion) 10:43, 4. Mär. 2022 (CET)
Fun fact: uBlock Origin hat bei mir die Url intake-analytics.wikimedia.org geblockt, daher gibt es keine Probleme... --Magnus (Diskussion) 10:47, 4. Mär. 2022 (CET)
Danke für den Tipp - kann ich bestätigen. Ich hatte uBlock bisher für wikipedia.org deaktiviert. Nach der Sperrung von intake-analytics.wikimedia.org flutscht es wieder. --Zinnmann d 10:56, 4. Mär. 2022 (CET)

Translation:

Is the hamster tired? [this refers to a hamster wheel, jokingly implying that the WMF servers are powered by hamster wheels] and what is this site intake-analytics.wikimedia.org?
Is the hamster tired? and what is this site intake-analytics.wikimedia.org, over which my requests are being sent, before it takes a minute or so, until Wikipedia answers? --Jbergner (Diskussion) 10:37, 4. Mär. 2022 (CET)
It was extremely slow until just now. Not just for us. --Der-Wir-Ing ("DWI") (Diskussion) 10:43, 4. Mär. 2022 (CET)
Fun fact: uBlock Origin has blocked the URL intake-analytics.wikimedia.org for me, so there are no problems... --Magnus (Diskussion) 10:47, 4. Mär. 2022 (CET)
Thanks for the hint - I can confirm. I had disabled uBlock for wikipedia.org. After blocking intake-analytics.wikimedia.org it's working smoothly. --Zinnmann d 10:56, 4. Mär. 2022 (CET)

@Downsize43 are you still able to replicate this problem with intake-analytics.wikimedia.org at all? I'd be really interested to find a reference case where intake-analytics:

a) takes a long time to respond
and
b) therefore prevents you from doing something

I've tried copying the table from your user page into my own and editing it with Microsoft Edge on Windows 11, but I can't get it to behave in the same way.
Many thanks.

Not sure what table of mine you used for your test. The problem remains on my HP Laptop, but the delay is much reduced on some quick tests, which may not be fully representative of the original cause of the problem.

The table in question is now contained in [[List of numbered roads in Queensland]]. When I was creating this table I initially added 3 rows (one screen full) for each save. As the table grew bigger the delay became significant. As a means of finishing the task with less stress I took to saving about 20 rows at a time, being happier with one long delay than 6 shorter ones. These long delays gave me time for toilet breaks and cups of coffee, often about 5 minutes.

Today I tested as follows:

  • Edit one row - 3 seconds
  • Copy the entire table to a sandbox - 6 seconds
  • Delete 20 rows - 6 seconds
  • Re-add same 20 rows - 6 seconds

As I said, not really reproducing the original circumstances, but still showing a delay for the same reason.

Regards,

Downsize43