
Images served with text/html content type
Open, Normal, Public

Description

I do not know if this is an artifact of these images being proxied by googleweblight, but images such as https://commons.wikimedia.org//wiki/File:Arm_muscles_back_numbers.png

which should have content-type image/webp

appear in webrequest data as text/html:

cp1081.eqiad.wmnet 1938530061 2019-07-01T02:17:56 7.66E-4 66.102.8.231 hit-local 200 21344 GET commons.m.wikimedia.org /wiki/File:Arm_muscles_back_numbers.png text/html; charset=UTF-8 - NULL Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko; googleweblight) Chrome/some Mobile Safari/535.19 pt-BR ns=6;page_id=39031;https=1;nocookies=1 - true 0.0.21 some {"city":"Unknown","subdivision":"Unknown","longitude":"-97.822","timezone":"America/Chicago","country_code":"US","country":"United States","latitude":"37.751","continent":"North America","postal_code":"Unknown"} cp1075 hit/1, cp1081 miss {"browser_family":"Chrome Mobile","os_major":"4","wmf_app_version":"-","browser_major":"38","os_minor":"2","os_family":"Android","device_family":"Nexus 5"} {"page_id":"39031","ns":"6","nocookies":"1","https":"1"} 2019-07-01 02:17:56 mobile web user NULL none {"project_class":"wikimedia","project":"commons","qualifiers":["m"],"tld":"org","project_family":"wikimedia"} {"language_variant":"default","project":"commons.wikimedia","page_title":"File:Arm_muscles_back_numbers.png"} 39031 6 ["pageview"] {"organization":"Google Proxy","autonomous_system_organization":"Google LLC","isp":"Google Proxy","autonomous_system_number":"15169"} text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 text 201971

Event Timeline

Nuria created this task. Wed, Sep 11, 10:03 PM
Restricted Application added a project: Operations. Wed, Sep 11, 10:03 PM
Restricted Application added a subscriber: Aklapper.

As a result, these images are being counted as content pageviews when they are just asset requests.

I think we need to add proxy=googleweblight to X-Analytics.
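
To picture the proposal: X-Analytics is a semicolon-separated list of key=value pairs (the sample row above carries `ns=6;page_id=39031;https=1;nocookies=1`), so tagging would mean appending a `proxy=googleweblight` pair when the user agent contains the `googleweblight` token. A minimal Python sketch under that assumption; the function names are hypothetical and this is not the code that actually sets the header:

```python
def parse_x_analytics(header: str) -> dict:
    """Split a value such as 'ns=6;page_id=39031' into a dict of pairs."""
    pairs = {}
    for item in header.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            pairs[key.strip()] = value.strip()
    return pairs


def format_x_analytics(pairs: dict) -> str:
    """Serialize the pairs back into 'key=value;key=value' form."""
    return ";".join(f"{key}={value}" for key, value in pairs.items())


def tag_weblight(header: str, user_agent: str) -> str:
    """Add proxy=googleweblight when the UA carries the Weblight token.

    Assumption: matching on the user-agent substring is enough to
    identify the proxy; a real rule might also check the source IP.
    """
    pairs = parse_x_analytics(header)
    if "googleweblight" in user_agent.lower():
        pairs["proxy"] = "googleweblight"
    return format_x_analytics(pairs)
```

Matching on the user agent alone is easy to spoof, which is why the IP question comes up later in this thread.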

jbond triaged this task as Normal priority. Thu, Sep 12, 11:06 AM
jbond added a subscriber: jbond.

We need to add googleweblight to the proxy list to make sure it is treated appropriately. I think misc/trusted_proxies.json is outside my boundaries, so possibly @BBlack or @ema can do it.

cc @Ottomata just in case he can do the change too

Nuria updated the task description. Thu, Sep 12, 11:02 PM
jijiki added a subscriber: jijiki. Fri, Sep 13, 3:06 AM

The URL mentioned at the top isn't a media URL, it actually is HTML content and is a pageview. Try it in your browser: https://commons.wikimedia.org//wiki/File:Arm_muscles_back_numbers.png
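
To make the distinction concrete: the file description page lives under `/wiki/File:...` on the project domain and is served as HTML, while the image bytes themselves come from `upload.wikimedia.org`. A rough Python sketch of that check; `is_media_request` is a hypothetical helper, and the host test is a simplification rather than the pageview definition analytics actually uses:

```python
from urllib.parse import urlsplit

# Host that serves the actual media files (as opposed to wiki pages).
MEDIA_HOST = "upload.wikimedia.org"


def is_media_request(url: str) -> bool:
    """Return True only when the URL points at the media host."""
    return urlsplit(url).hostname == MEDIA_HOST


# The URL from the task description is a wiki page, not a media asset.
page_url = "https://commons.wikimedia.org//wiki/File:Arm_muscles_back_numbers.png"
print(is_media_request(page_url))
```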

Can we get a separate and appropriately-titled ticket about the Weblight addition to the trusted proxies list and rationale, and where the upstream source of IPs to whitelist is? Keep in mind our proxies database is only manually curated (thus will inevitably fall behind upstream changes), and currently lacks many proxies (IIRC, it only has OperaMini to date). It was an outgrowth of the now-defunct Zero project. We may want to consider a better system for managing "trusted" proxies for analytics purposes in the future.
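
For illustration, a manually curated list boils down to a lookup like the Python sketch below. Everything in it is an assumption: the dict layout is not the real format of misc/trusted_proxies.json, `trusted_proxy_for` is a hypothetical helper, and the CIDR range is a documentation placeholder (192.0.2.0/24), not a real Weblight range. Real ranges would have to come from the upstream source asked about above.

```python
import ipaddress

# Placeholder data for illustration only; NOT an official proxy list.
TRUSTED_PROXIES = {
    "googleweblight": ["192.0.2.0/24"],
}


def trusted_proxy_for(ip: str):
    """Return the proxy name whose ranges contain ip, or None."""
    addr = ipaddress.ip_address(ip)
    for name, ranges in TRUSTED_PROXIES.items():
        if any(addr in ipaddress.ip_network(cidr) for cidr in ranges):
            return name
    return None
```

Because the ranges are static, any upstream change silently turns into missed matches, which is the staleness problem with manual curation.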

Nuria added a comment. Fri, Sep 13, 4:58 PM

I have started another ticket that, as you mentioned, better explains the rationale behind having "trusted proxies". We really do not need them if we can capture the original IP: https://phabricator.wikimedia.org/T232795