Wikimedia projects not reachable for some Telecom Italia users
Closed, Resolved, Public

Description

This is the public umbrella ticket for issues of Telecom Italia users who can't connect to Wikimedia infrastructure.

There is also T262613, but it is private because it contains IP addresses that should not be made public.

We will link individual reports to this umbrella ticket and want to make sure it can be found in public.

Traceroutes will stay in private pastebins.

Event Timeline

An update on my last known disposition of the issue:

  • It appears to be an intermittent problem; individual outages are not long in duration, but there have been multiple recurrences
  • The last time we had a report (from @Daimona and then @Superpes15) was on Thu, Sept 10, at approximately 21:00 UTC, give or take half an hour.
  • At the time, we had user TCP traceroutes that showed the path not completing after entering TI Sparkle's (AS6762) backbone network
  • We contacted both Cloudflare and TI Sparkle that day, after the outage was resolved, and they were not aware of any issue

Today we had reports of an issue from @Andyrom75 that was happening all the time on their Wind (AS1267) mobile connection, and was happening under some circumstances on their Vodafone (AS30722) connection, but we did not get a full traceroute or an IP address, so it's very hard to say what was going on or if the issue was related.

For anyone running into this, please follow https://www.mediawiki.org/wiki/How_to_report_a_bug#Reporting_a_connectivity_issue (but please note that this ticket is public so you may not want to post your IP and other personal data) - thanks!

A better link -- please always pass this version to problem reporters -- is https://wikitech-static.wikimedia.org/wiki/Reporting_a_connectivity_issue

This is hosted on wikitech-static, which is on an independent hosting provider and network -- which anyone who is having connectivity issues almost certainly needs :)

but please note that this ticket is public so you may not want to post your IP and other personal data

If you are able to access Phabricator you can go to https://phabricator.wikimedia.org/paste/ and create a _private_ paste from there. Private pastes will keep your IP confidential but can still be included in public ticket comments. Subscribers of the private paste will be able to see the data and others will see the rest of the ticket but not the paste contents.

If you can't access Phabricator you can use https://share.riseup.net/ and paste the resulting URL in a private IRC window or send an email to noc@wikimedia.org.
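For reporters who still want to quote part of a traceroute in a public comment, the addresses can be masked first. The following is only a minimal sketch, assuming Python 3; `redact_ips` is a hypothetical helper and not part of any Wikimedia tooling. The full, unredacted output should still go into a private paste or an email to noc@wikimedia.org, because the real addresses are what make a report useful for debugging.

```python
import ipaddress
import re

# Candidate tokens: runs of hex digits, dots and colons that end in a hex digit.
# Each candidate is validated with the standard-library ipaddress module, so
# things like hop numbers ("3"), latencies ("12.345") or times ("21:00") are kept.
_CANDIDATE = re.compile(r"[0-9A-Fa-f:.]*[0-9A-Fa-f]")


def redact_ips(text: str, placeholder: str = "[redacted-ip]") -> str:
    """Replace every valid IPv4/IPv6 address in `text` with `placeholder`.

    Limitation of this sketch: an address embedded in a longer hostname
    (for example a reverse-DNS name) is not recognised and stays as-is.
    """

    def _replace(match: re.Match) -> str:
        token = match.group(0)
        try:
            ipaddress.ip_address(token)
        except ValueError:
            return token  # not an IP address, leave the text untouched
        return placeholder

    return _CANDIDATE.sub(_replace, text)


if __name__ == "__main__":
    # 192.0.2.0/24 and 2001:db8::/32 are documentation-only ranges.
    sample = " 3  192.0.2.1 (192.0.2.1)  12.345 ms\n 4  2001:db8::1  20.1 ms"
    print(redact_ips(sample))
```

Validating each candidate with `ipaddress.ip_address` rather than relying on a regular expression alone keeps the pattern simple and avoids masking numbers that merely look like addresses.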

Today we had reports of an issue from @Andyrom75 that was happening all the time on their Wind (AS1267) mobile connection, and was happening under some circumstances on their Vodafone (AS30722) connection, but we did not get a full traceroute or an IP address, so it's very hard to say what was going on or if the issue was related.

I have a personal laptop and a company one, and today I tested both. On my personal laptop everything works at first, but then it stops. My company laptop works correctly.
Both are connected through the same WindTre ISP; the only difference is the proxy.
My personal laptop connects directly, while the company one goes through a company proxy. If I disable the proxy, the company laptop stops working as well.

The connection was restored about 15 minutes ago. I'll test it again tomorrow.

A blog also mentions that connectivity returned at around 0:50. That might be only a temporary respite, as other users report that for the past few days they have had issues only during rush hours:
https://nitter.net/e_pagliarini/status/1304178420088217601
https://nitter.net/Ilfranck1/status/1305964937760706563

Today, all is fine. I would consider this issue closed and solved.

Thanks Andy, but we are going to keep it open and monitor it for a while now. There was another instance of it about 10 hours ago.

It also seems to be happening right now, at least for a TIM customer in Milan (at Wikimedia Italia's office).

If you are in touch with the user, please ask them to follow the steps at https://wikitech-static.wikimedia.org/wiki/Reporting_a_connectivity_issue and email noc@wikimedia.org with their traceroute results and public IP address!

Yes, we're telling that to everybody (including to journalists who called WMIT, social media, internal mailing lists and colleagues). Did you get any information so far?

CDanis claimed this task.

After extensive investigation by one of our network connectivity providers, we believe that the cause has been discovered and fixed as of about 15:30 UTC today.

All the monitoring data we have from RIPE Atlas probes confirms this, and we haven't received any further user complaints since that time (though we have received a few "it works now" reports).

We'll prepare at least a lightweight incident report in the coming days.

Many thanks to all the users who submitted traceroutes and other debugging data! Resolving this issue would have been much harder otherwise.

If you do see further connectivity issues, please contact noc@wikimedia.org or #wikimedia-sre on Freenode IRC.

We'll prepare at least a lightweight incident report in the coming days.

Did this happen? I couldn't find it. Sorry if I looked in the wrong places.

(I realise others may have higher priority. For instance this one https://wikitech.wikimedia.org/wiki/Incident_documentation/20200814-isp-unreachable .)

It did not -- it was waiting on a writeup from one of our connectivity providers.

We actually think we might be seeing some (likely smaller-scale) recurrences of the issue, from time to time. We're gathering more data and continuing to work with them on it.

Change 639949 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/dns@master] temporarily route Italy to codfw

https://gerrit.wikimedia.org/r/639949

Change 639949 merged by CDanis:
[operations/dns@master] temporarily route Italy to codfw

https://gerrit.wikimedia.org/r/639949