We were notified that https://gerrit.wikimedia.org/r/1197694 broke Telegram previews for Wikipedia: https://w.wiki/FyaH
Description
| Status | Assigned | Task |
|---|---|---|
| Open | Krinkle | T214998 RFC: Serve mobile and desktop variants through the same URL (unified mobile routing) |
| Resolved | Krinkle | T405931 [Clean up] Redirect m-dot URLs to canonical domains |
| Open | None | T409575 Telegram previews broken since unified mobile routing |
Event Timeline
Looking at the hostname breakdown:
I notice Telegram exclusively makes connections to our m-dot hostnames. This is true historically, and after the Oct 7-8 unification, and after the Oct 23 m-dot cleanup.
- If their preview crawler was classified as a mobile browser, we should have seen at least some requests to the canonical link (when users share those in messages), which would have counted as an HTTP 302 redirect before Oct 7, followed by an m-dot HTTP 200. We see none of that. It's all HTTP 200 and all directly to m-dot.
- If their preview crawler wasn't classified as mobile, we should similarly have seen at least some requests to the canonical, but there are none. This suggests Telegram forcibly rewrites your link to an m-dot link even when you share a canonical link (why?).
- Given that the Oct 7 unification did not change anything, and that there are no HTTP 200s after Oct 23, and no change in hostnames throughout, that means their crawler both forcibly rewrites your link and neglects to follow HTTP redirects.
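The reasoning above rests on a per-hostname, per-status tally of the crawler's requests. A minimal sketch of how such a tally could be computed is below. It assumes records shaped like the webrequest JSON (the `uri_host`, `http_status` and `user_agent` fields); the helper name `hostname_breakdown` and the sample records are hypothetical and are not real traffic.

```python
from collections import Counter

def hostname_breakdown(records, ua_prefix="TelegramBot"):
    """Count requests per (uri_host, http_status) for one crawler user agent."""
    counts = Counter()
    for rec in records:
        if rec.get("user_agent", "").startswith(ua_prefix):
            counts[(rec["uri_host"], rec["http_status"])] += 1
    return counts

# Hypothetical sample records for illustration only.
sample = [
    {"uri_host": "en.m.wikipedia.org", "http_status": "200",
     "user_agent": "TelegramBot (like TwitterBot)"},
    {"uri_host": "en.m.wikipedia.org", "http_status": "301",
     "user_agent": "TelegramBot (like TwitterBot)"},
    {"uri_host": "en.wikipedia.org", "http_status": "200",
     "user_agent": "Mozilla/5.0"},
]

breakdown = hostname_breakdown(sample)
# Requests from the crawler that went to a non-m-dot (canonical) hostname.
canonical_hits = sum(n for (host, _), n in breakdown.items() if ".m." not in host)
```

If `canonical_hits` stays at zero across the whole period, the crawler never requested a canonical hostname, which is the observation the bullets above build on.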
The combination of these two factors makes a workaround non-trivial. If they had a problem in the past (with the mobile redirect), we could have classified them as desktop and served canonical HTML to their crawler without a redirect. But their forceful rewriting of URLs means they'd skip over such a fix.
Most Telegram client apps are open source, and from a quick glance I see no logic in any TelegramDesktop, tdlib, or TelegramMessenger repos relating to link preview generation or changing things for Wikipedia. I know these previews are by default generated server-side (given the link preview traffic is attributed to IPs owned by the Telegram ASN), but I figured perhaps the URL rewrite was app-side.
I've reported this upstream at https://bugs.telegram.org/c/56126. They seem reasonably responsive (various recent issue reports marked as "Fixed" or "Fix coming"), so let's hope they fix their link preview server soon!
A quick check using https://en.wikipedia.org/wiki/Main_Page?vgutierrez=tg resulted in the Telegram bot visiting https://en.m.wikipedia.org/wiki/Main_Page?vgutierrez=tg and retrying after getting a 301, instead of following the redirect.
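For illustration, the rewrite observed in this check can be modelled as below. This is only a guess at the externally visible behaviour: the actual logic on Telegram's side is not known, the function name `to_mdot` is hypothetical, and limiting the rewrite to three-label `*.wikipedia.org` hosts is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit

def to_mdot(url):
    """Model of the observed rewrite: insert an "m" label into <lang>.wikipedia.org."""
    parts = urlsplit(url)
    labels = parts.netloc.split(".")
    # Assumption: only hosts of the form <lang>.wikipedia.org are rewritten.
    if len(labels) == 3 and labels[1:] == ["wikipedia", "org"]:
        labels.insert(1, "m")
        parts = parts._replace(netloc=".".join(labels))
    # Path and query string are carried over unchanged.
    return urlunsplit(parts)

rewritten = to_mdot("https://en.wikipedia.org/wiki/Main_Page?vgutierrez=tg")
```

Under this model the query string survives the rewrite, which matches the `?vgutierrez=tg` marker showing up on the m-dot request.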
Telegram's link preview service does follow redirects for other websites. I tried it on my personal domain, and later on people.wikimedia.org and both work fine. This suggests their non-following of redirects is specific to their Wikipedia m-dot hack.
Redirect 301 /~krinkle/Quux.html https://people.wikimedia.org/~krinkle/Banana.html?from=Quux
GET https://people.wikimedia.org/~krinkle/Quux.html
HTTP/1.1 301
Location: https://people.wikimedia.org/~krinkle/Banana.html?from=Quux
krinkle at stat1011.eqiad.wmnet in ~ $ kafkacat -C -b kafka-jumbo1013.eqiad.wmnet:9092 -o -10 -t webrequest_frontend_text 2>/dev/null | fgrep 'people.wikimedia'
{ …, "hostname":"cp3068.esams.wmnet", "http_method":"GET", "uri_host":"people.wikimedia.org","uri_path":"/~krinkle/Quux.html","uri_query":"?a=7", "referer":"-", "user_agent":"TelegramBot (like TwitterBot)", … "http_status":"301", … }
{ …, "hostname":"cp3068.esams.wmnet", "http_method":"GET", "uri_host":"people.wikimedia.org","uri_path":"/~krinkle/Banana.html","uri_query":"?from=Quux", "referer":"-", "user_agent":"TelegramBot (like TwitterBot)", … "http_status":"200", … }
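The same check can be done mechanically: a 301 for a known redirect source, followed by a 200 for its target from the same user agent, indicates the crawler followed the redirect. The sketch below assumes one JSON record per line and uses cut-down copies of the two records above (elided fields are simply left out). The `REDIRECTS` map and the `followed_redirects` helper are hypothetical names.

```python
import json

# Assumed redirect map (source path -> target path), mirroring the Redirect rule above.
REDIRECTS = {"/~krinkle/Quux.html": "/~krinkle/Banana.html"}

def followed_redirects(lines, redirects=REDIRECTS):
    """Return source paths whose redirect target was later fetched by the same user agent."""
    pending = {}   # (user_agent, target_path) -> source_path
    followed = []
    for line in lines:
        rec = json.loads(line)
        key = (rec["user_agent"], rec["uri_path"])
        if rec["http_status"] == "200" and key in pending:
            followed.append(pending.pop(key))
        elif rec["http_status"] == "301" and rec["uri_path"] in redirects:
            target = redirects[rec["uri_path"]]
            pending[(rec["user_agent"], target)] = rec["uri_path"]
    return followed

# Reduced versions of the two log records, keeping only the fields used here.
log = [
    '{"uri_host":"people.wikimedia.org","uri_path":"/~krinkle/Quux.html",'
    '"user_agent":"TelegramBot (like TwitterBot)","http_status":"301"}',
    '{"uri_host":"people.wikimedia.org","uri_path":"/~krinkle/Banana.html",'
    '"user_agent":"TelegramBot (like TwitterBot)","http_status":"200"}',
]

result = followed_redirects(log)
print(result)
```

An empty result for the Wikipedia m-dot requests, next to a non-empty one here, would be consistent with the redirect handling differing only for Wikipedia links.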
And indeed, previews work fine for Wiktionary and other Wikimedia projects, such as https://en.wiktionary.org/wiki/box



