Insecure POST traffic
Closed, Resolved (Public)

Description

For normal browser User-Agents, this isn't a big issue: they're fetching pages over GET, getting redirected to HTTPS at that point, and then POST-ing to protocol-relative URLs. However, there's other code out there that just does direct initial POSTs to us over HTTP apparently. Direct POST traffic has two issues wrt HTTPS:

  1. Redirecting POST traffic is tricky to begin with. Not all user agents understand how to do it correctly or consistently. General wisdom seems to be that 307 is the correct code (which is similar in nature to 302, but meant to avoid POST being transformed into GET - there's a 308 that's like 301 as well, but it's even less likely to be widely implemented).
  2. Redirecting POST traffic doesn't actually secure it if the clients don't remember to keep using HTTPS afterwards anyways, because they've already POSTed their data insecurely before a redirect can happen. It's really more of a stopgap / wake up call move, as a precursor to simply breaking it with 403 Forbidden or similar in an effort to get people to notice the breakage and fix their software/configurations. Arguably, we could just skip the redirect step and go straight to breaking them. Either one requires that we take what measures we can to notify the community and/or fix broken software where we can first.
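
To make the first point concrete, here is a minimal sketch of how a client that follows the usual redirect conventions would choose the method for the follow-up request. The helper name `method_after_redirect` is hypothetical, and the 301/302 branch models the common historical behaviour of rewriting POST to GET, which not every client shares.

```python
def method_after_redirect(status: int, method: str) -> str:
    """Return the HTTP method a typical client uses after a redirect.

    Hypothetical helper for illustration only; real clients differ.
    """
    method = method.upper()
    if status in (307, 308):
        # 307 and 308 require the method (and body) to be preserved.
        return method
    if status == 303:
        # 303 turns the follow-up into a GET (HEAD stays HEAD).
        return "HEAD" if method == "HEAD" else "GET"
    if status in (301, 302):
        # Many clients historically rewrite POST to GET here.
        return "GET" if method == "POST" else method
    raise ValueError(f"not a redirect status handled here: {status}")
```

Even in the 307 case, the second point still applies: the original POST body has already crossed the network in cleartext by the time the client sees the redirect.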

Currently we're not doing either one, and there's a fairly decent volume of insecure POST traffic flowing. I took a few minutes' sample on a single text cache server and turned up these counts of User-Agents doing it:

# cut -d: -f2- postua.log |sort|uniq -c|sort -rn
    499  Peachy MediaWiki Bot API Version 2.0 (alpha 8)
    234  php wikibot classes
    203  AnomieBOT/1.0 (TagDater; see [[User:AnomieBOT]])
    150  Jakarta Commons-HttpClient/3.1
     48  Kindle/1143472533 CFNetwork/711.4.6 Darwin/14.0.0
     40  www.productontology.org/1.0 (Contact: martin.heppATunibw.de) AppEngine-Google; (+http://code.google.com/appengine; appid: s~productontology)
     32  Java/phoneme_advanced-Core-1.3-b16 sjmc-b111
     29  plog4u.org/3.0
     28  ColdFusion
     23  Dalvik/1.6.0 (Linux; U; Android 4.4.3; KFTHWI Build/KTU84M)
     21  SineBot/1.5.19(User:SineBot)
     19  Kindle/1143472533 CFNetwork/711.3.18 Darwin/14.0.0
     13  Java/1.7.0_21
     13  Dalvik/1.6.0 (Linux; U; Android 4.4.3; KFSOWI Build/KTU84M)
     10  Dalvik/1.6.0 (Linux; U; Android 4.4.3; KFASWI Build/KTU84M)
      9  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0
      8  Kindle/1143214083 CFNetwork/711.3.18 Darwin/14.0.0
      8  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SM-T230NU Build/KOT49H)
      8  AnomieBOT/1.0 (MedComClerk; see [[User:MediationBot]])
      7  Zend_Http_Client
      7  WikiFunctions ApiEdit/5.2.0.1 (Microsoft Windows NT 6.1.7601 Service Pack 1; .NET CLR 2.0.50727.5420)
      7  Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
      7  Kindle/1143214083 CFNetwork/711.4.6 Darwin/14.0.0
      6  Kindle/1143472533 CFNetwork/711.1.16 Darwin/14.0.0
      6  Kindle/1143472533 CFNetwork/672.1.15 Darwin/14.0.0
      5  UKBot [[:no:Bruker:UKBot]] - MwClient/0.7.2.dev1 (https://github.com/mwclient/mwclient)
      5  Dalvik/2.1.0 (Linux; U; Android 5.0; SM-G900V Build/LRX21T)
      5  Dalvik/2.1.0 (Linux; U; Android 5.0.1; SCH-I545 Build/LRX22C)
      5  Dalvik/1.6.0 (Linux; U; Android 4.4.3; KFARWI Build/KTU84M)
      4  Snoopy v1.2.4
      4  MwClient/0.6.6 (https://github.com/mwclient/mwclient)
      4  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727 ; .NET CLR 4.0.30319)
      4  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) )
      4  JWBF DEVEL
      4  Dalvik/2.1.0 (Linux; U; Android 5.1.1; Nexus 7 Build/LMY48G)
      4  Dalvik/1.6.0 (Linux; U; Android 4.4.3; KFAPWI Build/KTU84M)
      3  node.js - nodemw - commons-maintenance-bot@toollabs - Maintainer: Rainer Rillke - rillke@wikipedia.de - [[:commons:User:Rillke]]
      3  Mozilla/5.0 (X11; U; Linux armv7l) AppleWebKit/999+ (KHTML, like Gecko) Version/5.0 Safari/999.9+ KindleWidgetUserAgent/1.0
      3  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
      3  Dalvik/1.6.0 (Linux; U; Android 4.4.4; SD4930UR Build/KTU84P)
      3  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SM-G900T Build/KOT49H)
      3  Dalvik/1.6.0 (Linux; U; Android 4.4.2; LG-V410 Build/KOT49I.V41010d)
      2  python-wikitools/1.2
      2  Python-urllib/2.7
      2  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
      2  Kindle/1143472533 CFNetwork/750.2 Darwin/15.0.0
      2  Kindle/1143472533 CFNetwork/711.1.12 Darwin/14.0.0
      2  Kindle/1143472533 CFNetwork/711.0.6 Darwin/14.0.0
      2  Kindle/1142165514 CFNetwork/711.0.6 Darwin/14.0.0
      2  DispensersTools (+http://dispenser.homenet.org/~dispenser/)
      2  Dalvik/2.1.0 (Linux; U; Android 5.1; XT1097 Build/LPE23.32-14)
      2  Dalvik/2.1.0 (Linux; U; Android 5.0.1; SM-N910V Build/LRX22C)
      2  Dalvik/2.1.0 (Linux; U; Android 5.0.1; SM-N910P Build/LRX22C)
      2  Dalvik/2.1.0 (Linux; U; Android 5.0.1; GT-I9505 Build/LRX22C)
      2  Dalvik/1.6.0 (Linux; U; Android 4.4.4; 2014818 MIUI/V6.3.5.0.KHJMIBL)
      2  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SM-T530NU Build/KOT49H)
      2  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SM-T520 Build/KOT49H)
      2  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SM-T330NU Build/KOT49H)
      2  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SM-T310 Build/KOT49H)
      2  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SM-G900H Build/KOT49H)
      2  Dalvik/1.6.0 (Linux; U; Android 4.2.2; GT-P5210 Build/JDQ39)
      2  Dalvik/1.6.0 (Linux; U; Android 4.2.2; GT-P5113 Build/JDQ39)
      2  Dalvik/1.6.0 (Linux; U; Android 4.2.1; M470BSA Build/JOP40D)
      2  Dalvik/1.6.0 (Linux; U; Android 4.1.1; HP Slate 7 Build/JRO03H)
      2  Dalvik/1.6.0 (Linux; U; Android 4.0.4; LG-SU640 Build/Tomato_SU640_V20D_0629)
      2  Avast SimpleHttp/3.0
      2  Apache-HttpClient/UNAVAILABLE (java 1.4)
      2  AnomieBOT/1.0 (BrokenRedirectDeleter; see [[User:AnomieBOT III]])
      1  Theo's Little Bot (http://en.wikipedia.org/wiki/User:Theo's_Little_Bot) / nodemw
      1  The Incutio XML-RPC PHP Library -- WordPress/4.2.2
      1  Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15
      1  Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.124 Safari/537.36
      1  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
      1  MediaWiki::API/0.39
      1  Kindle/1143472533 CFNetwork/711.2.23 Darwin/14.0.0
      1  Kindle/1143214083 CFNetwork/672.1.15 Darwin/14.0.0
      1  Kindle/1143210217 CFNetwork/711.1.16 Darwin/14.0.0
      1  Kindle/1143210217 CFNetwork/711.1.12 Darwin/14.0.0
      1  Kindle/1143210217 CFNetwork/672.1.13 Darwin/14.0.0
      1  Kindle/1142951941 CFNetwork/711.3.18 Darwin/14.0.0
      1  Kindle/1142951941 CFNetwork/711.1.16 Darwin/14.0.0
      1  Kindle/1142951941 CFNetwork/672.1.15 Darwin/14.0.0
      1  Kindle/1142706362 CFNetwork/711.1.16 Darwin/14.0.0
      1  Kindle/1142706362 CFNetwork/672.1.15 Darwin/14.0.0
      1  Kindle/1142706362 CFNetwork/672.1.14 Darwin/14.0.0
      1  Kindle/1142423837 CFNetwork/711.2.23 Darwin/14.0.0
      1  Kindle/1142423837 CFNetwork/711.1.16 Darwin/14.0.0
      1  Kindle/1142169601 CFNetwork/672.1.14 Darwin/14.0.0
      1  Kindle/1141899455 CFNetwork/609.1.4 Darwin/13.0.0
      1  HTTPRetriever/1.3.0.0
      1  GroupMeBotNotifier/1.0
      1  Dalvik/2.1.0 (Linux; U; Android 5.1.1; Nexus 10 Build/LMY47V)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0; SM-G900H Build/LRX21T)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0; SM-G900F Build/LRX21T)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0; SAMSUNG-SM-N900A Build/LRX21V)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0; NXA116QC164 Build/LRX21V)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0; ASUS_T00J Build/LRX21V)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0.2; XT1068 Build/LXB22.46-28)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0.2; XT1033 Build/LXB22.46-32)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0.2; VK810 4G Build/LRX22G)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0.2; SM-T800 Build/LRX22G)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0.2; SM-T705 Build/LRX22G)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0.2; SM-T550 Build/LRX22G)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0.2; SM-G925V Build/LRX22G)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0.2; SAMSUNG-SM-T807A Build/LRX22G)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0.2; LG-V400 Build/LRX22G)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0.2; LG-D801 Build/LRX22G)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0.2; HTC One Build/LRX22G)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0.2; D5803 Build/23.1.A.1.28)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0.2; C6802 Build/14.5.A.0.270)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0.1; SM-N910T Build/LRX22C)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0.1; SAMSUNG-SGH-I537 Build/LRX22C)
      1  Dalvik/2.1.0 (Linux; U; Android 5.0.1; HTC6525LVW Build/LRX22C)
      1  Dalvik/2.1.0 (Li
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.4; XT1254 Build/SU2-12)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.4; XT1056 Build/KXA21.12-L1.28)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.4; XT1049 Build/KXA21.12-L2.7)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.4; XT1030 Build/SU6-7.2)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.4; SO-02G Build/23.0.B.1.38)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.4; SO-01G Build/23.0.B.1.59)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.4; SM-T337V Build/KTU84P)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.4; SM-T337T Build/KTU84P)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.4; SM-N910T Build/KTU84P)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.4; SM-G530M Build/KTU84P)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.4; SAMSUNG-SM-G900A Build/KTU84P)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.4; MI 4LTE MIUI/V6.6.2.0.KXDCNCF)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.4; HTC Desire Eye Build/KTU84P)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.4; C6603 Build/10.5.1.A.0.292)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.3; Nexus 7 Build/KTU84L)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.3; KFTHWA Build/KTU84M)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.3; KFAPWA Build/KTU84M)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; XT811 Build/XT811)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; TM1088 Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SPH-L900 Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SPH-L710 Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SM-T320 Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SM-T311 Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SM-T231 Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SM-T217S Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SM-T210R Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SM-P600 Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SM-N900S Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SM-N9005 Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SCH-I435 Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SC-04E Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; SAMSUNG-SM-N900A Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; QMV7B Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; Panasonic P61 Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; LGMS323 Build/KOT49I.MS32310c)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; LG-LS995 Build/KOT49I.LS995ZVA)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; LG-D295 Build/KOT49I.A1411108394)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; HUAWEI P7-L10 Build/HuaweiP7-L10)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; HP 10 Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; GT-P5210 Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; D6503 Build/17.1.2.A.0.314)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; BLU STUDIO 5.0 C HD Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; ASUS_T00J Build/KVT49L)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; A1-830 Build/KOT49H)
      1  Dalvik/1.6.0 (Linux; U; Android 4.4.2; 306SH Build/SA300)
      1  Dalvik/1.6.0 (Linux; U; Android 4.3; SM-S765C Build/JLS36C)
      1  Dalvik/1.6.0 (Linux; U; Android 4.3; SM-N900T Build/JSS15J)
      1  Dalvik/1.6.0 (Linux; U; Android 4.3; SM-N9005 Build/JSS15J)
      1  Dalvik/1.6.0 (Linux; U; Android 4.3; GT-I9500 Build/JSS15J)
      1  Dalvik/1.6.0 (Linux; U; Android 4.2.2; SO-04E Build/10.3.1.B.2.42)
      1  Dalvik/1.6.0 (Linux; U; Android 4.2.2; SM-T110 Build/JDQ39)
      1  Dalvik/1.6.0 (Linux; U; Android 4.2.2; SH-08E Build/S8210)
      1  Dalvik/1.6.0 (Linux; U; Android 4.2.2; SGP311 Build/10.3.1.C.0.136)
      1  Dalvik/1.6.0 (Linux; U; Android 4.2.2; K00L Build/JDQ39)
      1  Dalvik/1.6.0 (Linux; U; Android 4.2.2; HUAWEI G750-T01 Build/HuaweiG750-T01)
      1  Dalvik/1.6.0 (Linux; U; Android 4.2.2; GT-I9152 Build/JDQ39)
      1  Dalvik/1.6.0 (Linux; U; Android 4.2.1; A240 Build/JOP40D)
      1  Dalvik/1.6.0 (Linux; U; Android 4.1.2; Xoom Build/JZO54K)
      1  Dalvik/1.6.0 (Linux; U; Android 4.1.2; SHW-M250K Build/JZO54K)
      1  Dalvik/1.6.0 (Linux; U; Android 4.1.2; SH-02E Build/S9290)
      1  Dalvik/1.6.0 (Linux; U; Android 4.1.2; MediaPad 7 Youth Build/HuaweiMediaPad)
      1  Dalvik/1.6.0 (Linux; U; Android 4.1.2; GT-P3100 Build/JZO54K)
      1  Dalvik/1.6.0 (Linux; U; Android 4.1.2; GT-I9100 Build/JZO54K)
      1  Dalvik/1.6.0 (Linux; U; Android 4.0.4; SAMSUNG-SGH-I727 Build/IMM76D)
      1  Dalvik/1.6.0 (Linux; U; Android 4.0.3; GT-P5100 Build/IML74K)
      1  Dalvik/1.6.0 (Linux; U; Android 4.0.3; F-01D Build/V08R31A)
      1  AnomieBOT/1.0 (BAGBot; see [[User:AnomieBOT]])
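
For anyone repeating this tally without the shell pipeline, the following is a rough Python equivalent of `cut -d: -f2- | sort | uniq -c | sort -rn`. It assumes each log line has the form `User-Agent: <value>`, which is a guess about the format of postua.log; the function name `count_user_agents` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_user_agents(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Count distinct values after the first colon, most frequent first."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Like `cut -d: -f2-`: keep everything after the first colon,
        # and pass lines without a colon through unchanged.
        _, sep, rest = line.partition(":")
        counts[rest if sep else line] += 1
    # Highest count first; equal counts are ordered by User-Agent string.
    return sorted(((n, ua) for ua, n in counts.items()),
                  key=lambda pair: (-pair[0], pair[1]))
```

The order of entries with equal counts may differ from what `sort -rn` prints, and the leading space after the colon is kept, as in the listing above.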

The obvious community bots will probably be easy to fix with some notification. I'm more worried about things like the Android/Kindle UAs in there. Why are they even in this list? It's possible they come from apps on these devices that are using POSTs to api.php to fetch article content snippets somehow. I would've thought those would be GETs, though. Could be our own apps for all I know.

Event Timeline

What is the issue with Java 1.7 that makes it difficult to support HTTPS for this case?

@Merl: Could you elaborate?

I am a Java expert, and I am using a library which normally needs Java 1.8. I rewrote some parts so that I can use the library with Java 1.7 on labs.
The problem is that this limited rewrite of the library does not know the SSL protocol, but I have added a trigger which adds SSL support when it gets a redirect answer from the server.
This was quite simple to implement, but rewriting the whole code to have full SSL support with Java 1.7 is IMO too much unnecessary work.
The first HTTP request of my bot is always a sitematrix request on meta, followed by an HTTP siteinfo request on the local wikis. While doing these first requests the bot changes to HTTPS, so when doing the login action later my bot is already using HTTPS.

I don't understand why I should invest so much work when I have a currently working solution, and one day when labs changes to Java 1.8 no additional work will be needed for full SSL support (except replacing the library with the original version).

The solution doesn't actually work if it relies on redirects for initial POST requests, as those can't be securely redirected. What would work would be one of:

  • Do an initial GET request to the API URL at startup to notice the redirect to HTTPS, and then cache that information to use HTTPS for all POST traffic from there on
  • Simply update whatever configuration is supplying the initial API URLs to use https:// URLs
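
Both options can be sketched in a few lines of Python using only the standard library. The names `force_https`, `probe_final_url` and `ApiEndpoint` are hypothetical, and the probe assumes the server answers a plain GET on the API URL with a redirect to HTTPS, as described above.

```python
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def force_https(url: str) -> str:
    """Option 2: rewrite a configured http:// URL to https://."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def probe_final_url(url: str, timeout: float = 10.0) -> str:
    """GET the URL, following redirects, and return the final URL reached."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.geturl()


class ApiEndpoint:
    """Option 1: probe once at startup, then reuse the result for all POSTs."""

    def __init__(self, configured_url: str, probe=probe_final_url):
        final = probe(configured_url)
        # Adopt the redirected URL only if the server sent us to HTTPS.
        if urlsplit(final).scheme == "https":
            self.url = final
        else:
            self.url = configured_url
```

The probe carries no credentials or edit data, so nothing sensitive is exposed by it; every later POST goes to the cached `ApiEndpoint.url`. Option 2 is simpler still and avoids the insecure request entirely.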

The patch at https://gerrit.wikimedia.org/r/#/c/221974/ has been updated to go straight to 403 when it's merged, as the 307 redirect doesn't buy us much really (zero in terms of security, and I guess it temporarily doesn't break some subset of clients who would later be broken by the 403, but it's not really softening the blow for any particular client, which would fall into one of the two breakage buckets).

I'm thinking at this point we should target non-labs first (enforce it for non-WMF IPs, basically), as that buys us a lot with the outside world, and then keep pushing on the labs bots issues a bit more (which are the bulk of the bot requests anyways).

A post to wikitech-l should probably happen soon as well, but we should decide on deadline dates to put in it. There's also some investigation to do on the Dalvik/Kindle requests...

Regarding the Dalvik, Kindle, and also the "Java/phoneme_advanced" (also mobile stuff), the POST requests seem to be hitting the Parse API:

action=parse&format=json&text=%27%27%27Timberlake%27%27%27+may+refer+to+the+following%3A%0A%0A%3D%3DPeople%3D%3D%0A*+%5B%5BBob+Timberlake+%28American+football%29%5D%5D%2C+former+All-American+college+and+NFL+football+player%0A*+%5B%5BBob+Timberlake+%28artist%29%5D%5D%2C+North+Carolina+painter%2C+artist+and+designer+of+clothing+and+furniture+%0A*+%5B%5BCharles+B.+Timberlake%5D%5D%2C+former+U.S.+Representative+from+Colorado%0A*+%5B%5BChris+Timberlake%5D%5D%2C+professional+basketball+player+for+the+Purefoods+Tender+Juicy+Giants+of+the+PBA%0A*+%5B%5BCraig+Timberlake%5D%5D%2C+American+stage+actor%0A*+%5B%5BGary+Timberlake%5D%5D%2C+former+Major+League+Baseball+pitcher%0A*+%5B%5BHenry+Timberlake%5D%5D%2C+colonial+American+officer%2C+journalist%2C+and+cartographer%0A*+%5B%5BHenry+Timberlake+%28merchant+adventurer%29%5D%5D%2C+prosperous+London+ship+captain%0A*+%5B%5BJames+Timberlake%5D%5D%2C+American+lawman%0A*+%5B%5BJustin+Timberlake%5D%5D%2C+American+singer+and+actor%0A*+%5B%5BJessica+Timberlake%5D%5D%2C+American+actress%0A*+%5B%5BPhilip+Hunter+Timberlake%5D%5D%2C+American+entomologist%0A*+%5B%5BRichard+Timberlake%5D%5D%2C+Free+Banking+economists%0A*+%5B%5BTimberlake+Wertenbaker%5D%5D%2C+British+playwright%0A%0A%3D%3DPlaces%3D%3D%0A*+%5B%5BErlanger%2C+Kentucky%5D%5D%2C+previously+known+as+Timberlake%0A*+%5B%5BTimberlake%2C+North+Carolina%5D%5D%0A*+%5B%5BTimberlake%2C+Ohio%5D%5D%0A*+%5B%5BTimberlake%2C+Virginia%5D%5D%0A*+%5B%5BTimberlake+High+School%5D%5D%2C+Spirit+Lake%2C+Idaho%0A%0A%3D%3DCourt+cases%3D%3D%0A*+%27%27%5B%5BTimberlake+v.+the+State%5D%5D%27%27+%281980%29%0A%0A%3D%3DSee+also%3D%3D%0A*+%5B%5BTimber+Lake+%28disambiguation%29%5D%5D%0A%0A%7B%7Bdisambig%7D%7D%0A%5B%5BCategory%3ASurnames%5D%5D&title=Timberlake
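
Decoding the form body makes the request easier to read. The sketch below uses an abbreviated copy of the body above (the `text` value is cut to its first sentence, which is an edit made here for brevity) and the standard library's `parse_qs`.

```python
from urllib.parse import parse_qs

# Abbreviated form body: the `text` value is truncated for illustration.
body = (
    "action=parse&format=json"
    "&text=%27%27%27Timberlake%27%27%27+may+refer+to+the+following%3A"
    "&title=Timberlake"
)

# parse_qs returns a list per key; each key appears once here.
params = {key: values[0] for key, values in parse_qs(body).items()}

for key, value in params.items():
    print(f"{key} = {value}")
```

So the client is sending raw wikitext in `text`, together with a `title`, and asking the API to render it.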

Does anyone know what generates these requests? Is it our own mobile app, or someone else's?

The format parameter applies to all API requests: https://en.wikipedia.org/w/api.php?action=help&modules=main

Heh, I clicked "Submit" before I was done editing, so now you see the process I go through, that wasn't supposed to be there :) I first document whatever completely ignorant thing I'm thinking, then I go back and figure out what I can and refine it :)

Latest updates on traffic sampling on the text cluster:

root@cp1065:~# grep User-Agent postua5.log|cut -d: -f2-|egrep 'Kindle|Dalvik'|wc -l
395
root@cp1065:~# grep User-Agent postua5.log|cut -d: -f2-|egrep -v 'Kindle|Dalvik'|sort|uniq -c|sort -rn
    242  Jakarta Commons-HttpClient/3.1
     48  Java/phoneme_advanced-Core-1.3-b16 sjmc-b111
     28  plog4u.org/3.0
     12  Zend_Http_Client
      8  Dispenserbot (+http://dispenser.homenet.org/~dispenser/)
      8  ColdFusion
      6  Apache-HttpClient/UNAVAILABLE (java 1.4)
      5  Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
      4  www.productontology.org/1.0 (Contact: martin.heppATunibw.de) AppEngine-Google; (+http://code.google.com/appengine; appid: s~productontology)
      4  P3-CMS LinqToWiki
      4  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727 ; .NET CLR 4.0.30319)
      3  Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15
      3  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
      2  MwClient/0.6.4
      2  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
      2  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) )
      2  GrandPadWikiAPIHelper/1.0 cobra/2.0 (grandpad.net; okhttp)
      1  python-requests/2.5.1 CPython/2.7.6 Linux/3.13.0-45-generic
      1  Mozilla/5.0 (Linux; U; Android 4.2.2; vi-vn; ALCATEL ONE TOUCH 7040D Build/JDQ39) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.2 Mobile Safari/534.30
      1  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 1.0.3705)
      1  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
      1  HTTPRetriever/1.3.0.0
      1  GroupMeBotNotifier/1.0

(Also, I've checked the mobile and upload clusters, and they're not getting any insecure POST traffic at all)

It's especially suspicious that the mobile-looking Dalvik/Kindle UAs are not hitting the mobile cluster (they're hitting the standard language wikis like en.wikipedia.org, etc). I suspect these are not our own mobile apps or they'd at least be using the mobile endpoints/hostnames. The Kindle UAs in particular are often from iPhones (rather than my initial suspicion of actual Kindle devices), so they may be from Amazon's Kindle app on the iPhone, pulling up content fragments as some kind of reference tool. I really don't know if they've coordinated this at all with us in the past.

My thoughts on how to proceed from here are:

  1. Go ahead and break insecure POST for all clusters except the text cluster: the others aren't getting any insecure POST traffic of note anyways, and this will prevent someone deploying any new code which relies on that and creates a new dependency.
  2. Make a post to wikitech-l summarizing the basics of the issue and the remaining requests, and announcing that we'll be breaking insecure POST 30 days after that email.
  3. Try to find an avenue to reach out to Amazon in particular about their Kindle app.
  4. Continue monitoring the shifts in the traffic levels and UAs, and then break it for everywhere but WMF IPs (labs) before breaking it for labs as well, but not before the announced date.

Change 221974 abandoned by BBlack:
HTTPS: Break insecure POST with 403

https://gerrit.wikimedia.org/r/221974

Change 237110 had a related patch set uploaded (by BBlack):
Limit insecure POST to text-cluster only

https://gerrit.wikimedia.org/r/237110

I've taken the liberty of sending a mail to the support department of grandpad.net

Glad to see my bots are no longer listed.

This is the team from grandpad.net - @TheDJ thank you for reaching out to us.

We've corrected the problem (changed http to https) and we're rolling out the change to all our users over the next few days.

@BBlack Can you clarify what the timeline is for the breaking change? You mentioned an 'announced date' but I don't seem to be able to find what that date is.

Thanks! We haven't made any announcement yet. The discussion above is more about planning to make such an announcement. We'd probably announce it 30 days out from the change, and then of course still have to re-evaluate remaining traffic again before we really do it...

Change 237110 merged by BBlack:
Limit insecure POST to text-cluster only

https://gerrit.wikimedia.org/r/237110

Change 266958 had a related patch set uploaded (by BryanDavis):
Log user-agents that are using HTTP when HTTPS is preferred

https://gerrit.wikimedia.org/r/266958

Change 266958 merged by jenkins-bot:
Log user-agents that are using HTTP when HTTPS is preferred

https://gerrit.wikimedia.org/r/266958

Change 267207 had a related patch set uploaded (by BryanDavis):
Log user-agents that are using HTTP when HTTPS is preferred

https://gerrit.wikimedia.org/r/267207

Change 267207 merged by jenkins-bot:
Log user-agents that are using HTTP when HTTPS is preferred

https://gerrit.wikimedia.org/r/267207

So, we've had the API warning up for a couple of months now. In general, we've continually fallen behind on promises to notify -> kill insecure POSTs, for lack of time to deal with this. In theory we should've been announcing a kill date back in September. Note to self: actually make an announcement to the relevant mailing lists giving a final cutoff date.

@BBlack If you want someone to remind you about it, I am happy to volunteer. ;)

The kibana dashboard at https://logstash.wikimedia.org/#/dashboard/elasticsearch/api-feature-usage-http (NDA required) shows the long tail of bots that are still exploiting the loophole.

Thinking ahead a little (because I'm sure the long tail will still be long after we go through the announcement -> cutoff date phase): it would be fairly trivial to make this a soft-ish landing. We can link to the bug in the error response, and we can also start out with random errors and slowly increase the percentage (e.g. fail out 1% of such requests randomly on the first day and go from there).

Announcement email (finally) sent! The cutoff dates/process are:

  • 2016-06-12 - We'll randomly reject 10% of insecure POST with "403 - Insecure POST Forbidden - use HTTPS"
  • 2016-07-12 - Reject all insecure POST similarly
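
The staged cutoff amounts to rejecting a configurable fraction of insecure POSTs. The real change lives in the cache layer's VCL; the Python below is only a sketch of the decision logic, with a hypothetical function name and an injectable random source so the behaviour can be checked deterministically.

```python
import random
from typing import Callable, Optional, Tuple

REJECT_MESSAGE = "Insecure POST Forbidden - use HTTPS"


def check_request(
    method: str,
    scheme: str,
    reject_fraction: float,
    rand: Callable[[], float] = random.random,
) -> Optional[Tuple[int, str]]:
    """Return (403, message) if the request should be rejected, else None."""
    # Only POSTs that arrived over plain HTTP are candidates for rejection.
    if method.upper() != "POST" or scheme.lower() != "http":
        return None
    # rand() is uniform in [0.0, 1.0), so 0.10 rejects roughly 10% of requests.
    if rand() < reject_fraction:
        return (403, REJECT_MESSAGE)
    return None
```

With `reject_fraction=0.10` this corresponds to the first stage, and `reject_fraction=1.0` to the second.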

ML Archive links:

Also linked to enwiki's Bot owner's noticeboard here:

The Community team will be helping us track down and notify bot account owners making insecure requests from the logstash data as well.

We are going to send notice to the labs-announce list as well.

Change 289205 had a related patch set uploaded (by BBlack):
VCL: block 10% insecure post on non-"secure_post" clusters

https://gerrit.wikimedia.org/r/289205

@Steinsplitter reported to me on irc that

for protocol relative urls in mwclient, scheme='https' must be set in the config to enable https otherwise it will use http

Hopefully that bit of knowledge can help others out in their migration.

@Steinsplitter : Could you explain a little more? mwclient does not accept a scheme parameter as I'm aware of.

When using mwclient.ex.ConfiguredPool() (deprecated, but old scripts...)
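
The pitfall reported here, a protocol-relative URL silently resolving to http, is easy to guard against in any client. The helper below is a generic sketch and not mwclient's code; the name `resolve_protocol_relative` and its default are assumptions made for illustration.

```python
def resolve_protocol_relative(url: str, default_scheme: str = "https") -> str:
    """Give a protocol-relative URL (//host/path) an explicit scheme."""
    if url.startswith("//"):
        # "//en.wikipedia.org/w/api.php" -> "https://en.wikipedia.org/w/api.php"
        return f"{default_scheme}:{url}"
    # Absolute URLs are returned unchanged.
    return url
```

Defaulting to `https` means a missing setting fails safe, where defaulting to http would send data in the clear.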

@Qgil

sort of a hail mary ping as I'm not sure where to raise the alarm on this.

It seems merlbot is very active on de.wikipedia. When it has been affected in the past we were told it is relied on. This bot as-is will cease to be able to function when this task comes to fruition. @Merl seems to be away for an extended period. The actual bot seems not to have a license and is very complex and nuanced. Without maintainer intervention Cloud-Services folks cannot carry this forward. It seems prudent to broadcast these facts to anyone appropriate in the WMDE or de.wp communities but we are unsure who that would be.

edit:

see also T121279 ;)

My team left a message on wiki for Merlissimo on 15 May, but the bot owner probably hasn't seen it and realistically cannot be expected to respond, given the situation.

Do you think that a note at Wikipedia:Technik/Werkstatt would produce any assistance or alternatives (e.g., bots that can replace part or all of what Merlbot is doing)?

@Steinsplitter is active in this task and he might have suggestions about next steps. CCing @Bmueller just in case she has other ideas.

@Qgil, thanks for letting me know. @Andrew already emailed me last night. I'm going to talk to some people and we'll try to figure something out.

Note T121279 is more-specific to the Merlbot/Java issues and has some recent traffic at the bottom too.

[Graph from logstash: insecure request rate over the past 28 days, in 12h increments (2016-06-03-185730_1907x438_scrot.png; image not included)]

Change 289205 merged by BBlack:
VCL: block 10% insecure post on non-"secure_post" clusters

https://gerrit.wikimedia.org/r/289205

Latest list of accounts still making insecure requests over the past ~24H: T136674#2385440

Change 298336 had a related patch set uploaded (by BBlack):
Insecure POST: 20% fail for labs, 100% for external

https://gerrit.wikimedia.org/r/298336

Change 298336 merged by BBlack:
Insecure POST: 20% fail for labs, 100% for external

https://gerrit.wikimedia.org/r/298336

Change 298539 had a related patch set uploaded (by BBlack):
Mailing list announcement link in 403 response for insecure-post

https://gerrit.wikimedia.org/r/298539

Change 298539 merged by BBlack:
Mailing list announcement link in 403 response for insecure-post

https://gerrit.wikimedia.org/r/298539

Change 299532 had a related patch set uploaded (by BBlack):
insecure post: 100% failure, loophole closed

https://gerrit.wikimedia.org/r/299532

Change 299532 merged by BBlack:
insecure post: 100% failure, loophole closed

https://gerrit.wikimedia.org/r/299532