
en.wiki slow to respond when editing, and occasionally throws an error with Chrome search shortcuts, or blocked because missing HTTPS
Closed, Resolved · Public

Description

These might be 3 separate issues. I'm reporting them together because they have all occurred within the last 6 hours or so.

en.wiki slow to respond
  • When editing any page, after using Visual Editor templates for citation and/or links, the page slows to a crawl, and simple actions like typing take several seconds to respond.
occasionally throws an error with Chrome search shortcuts
blocked because missing HTTPS
  • When trying to visit wikipedia.org, where normally I get an instant and direct connection to the portal, Chrome says the site does not use HTTPS. After a moment, or a hard refresh, it resolves. This happens on other trusted HTTPS sites from time to time, too, so normally I wouldn't think twice, but I'm listing it here given the other performance issues I've encountered.

Environment: macOS 12.6 on a WMF-issued 16-inch M1 Max

Event Timeline

Have these issues been happening for a while, or only recently, in the last hour or so? Wikimedia Status does show a spike in errors about half an hour ago.

RhinosF1 triaged this task as Unbreak Now! priority. Edited · Jan 8 2023, 8:54 AM
RhinosF1 added subscribers: Intodesa, RhinosF1.
RhinosF1 added subscribers: SD_hehua, Stang, Ericliu1912.

There’s a marked spike in errors and a flurry of icinga alerts. Can someone please look?

07:37:20 <icinga-wm> PROBLEM - BGP status on cr2-eqsin is CRITICAL: BGP CRITICAL - No response from remote host 103.102.166.130 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
07:37:46 <icinga-wm> PROBLEM - BGP status on cr3-eqsin is CRITICAL: BGP CRITICAL - No response from remote host 103.102.166.131 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
07:38:24 <icinga-wm> PROBLEM - Host cp5024 is DOWN: PING CRITICAL - Packet loss = 100%

(and more hosts + mgmt)

07:39:18 <jinxer-wm> (ProbeDown) firing: Service text-https:443 has failed probes (http_text-https_ip6) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
(LOTS OF FLAPPING ALERTS, Pybal, hosts down, DNS errors)
07:50:20 <jinxer-wm> (ProbeDown) resolved: (5) Service text-https:443 has failed probes (http_text-https_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#text-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown

I can’t see any actual page firing in klaxon and no public response from SRE yet.

Joe lowered the priority of this task from Unbreak Now! to Low. Jan 8 2023, 9:05 AM
Joe removed a project: Wikimedia-Incident.
Joe subscribed.

This task reports issues that have continued for hours, so it is unrelated to the issue that happened about 1 hour ago, which was handled by SRE.

I can report today that while the site is still slower than I'd expect, there has been a marked improvement. OK to close this for now and re-open as needed (or as I can replicate better).

larissagaulia claimed this task.