Page MenuHomePhabricator

Review HttpsSupport logging code for re-deployment
Closed, ResolvedPublic

Description

When Tim reviewed the HTTPS support detection survey in 2013, he identified several issues with the instrumentation that made us doubt the quality of the results:

1As I was saying on IRC two days ago, I don't think that that data is
2correct. The test used upload.wikimedia.org, at a time when the
3browser would already have a keepalive connection open to port 80 but
4not port 443. Client-side aborts caused by navigating away from the
5page are not logged, and so if the HTTP request completes earlier than
6the HTTPS request, this opens a window for a systemic error in the
7results. If the user navigates away after the completion of the HTTP
8request, but before the completion of the HTTPS request, this would be
9logged as an HTTPS failure.
10
11My theory two days ago was that the size of the window would be the
12time it took the browser to do the connection setup, or possibly the
13whole request. But it occurs to me now that the browser may have its
14maximum number of concurrent connections open when the test starts. So
15the request might be queued by the browser until a connection slot
16opens up. That queue wait time might depend on network bandwidth, as
17well as RTT.
18
19If both the HTTP and the HTTPS tests used a hostname which is unlikely
20to have a keepalive connection open, and if the order of the two tests
21was randomized rather than always sending the HTTP request first, then
22I think you would get closer to accurate results. However, actual
23performance differences between HTTPS and HTTP would still cause a
24systemic error.
25
26-- Tim Starling

Since Erik would like us to re-run this test, I have revised the instrumentation code to address these issues.

  • The requests are now for //performance.wikimedia.org, which is not likely to have an open connection.
  • The order of the two tests is randomized.
  • Both tests are set to time out after 5s, but we never log before 6 seconds have elapsed. This ensures that if the user navigates away from the page before both tests have had 5 full seconds to succeed, we don't log anything at all. This should control for systemic error resulting from different performance characteristics of HTTP and HTTPS.

The original instrumentation code is available here:
https://raw.githubusercontent.com/wikimedia/mediawiki-extensions-WikimediaEvents/433cd6874b6f16a0d6940d19ffe01b75acd80580/modules/ext.coreEvents.httpsSupport.js

The diff containing my proposed changes is available here:

At Erik's request, I am keeping this private. I am obviously aware that the code from the previous round of data collection is publicly available, and that the changes will be public too if and when we start serving them to clients.

Event Timeline

ori assigned this task to tstarling.
ori raised the priority of this task from to Needs Triage.
ori updated the task description. (Show Details)
ori changed the visibility from "Public (No Login Required)" to "Custom Policy".
ori changed the edit policy from "All Users" to "Custom Policy".
ori changed Security from None to Software security bug.
ori added subscribers: ori, Eloquence, csteipp, faidon.

I read the code, it seems OK, although I didn't rigorously verify every line since I'm not very familiar with this sort of JavaScript at the moment. Ori's theory for controlling systematic errors seems sound.

I don't think this needs to be private. Erik said in the relevant email thread that the sitewide HTTPS-by-default project is not private.

Why is this still being handled as a security issue?

This ticket can be made public. Background: This data collection is part of preparing for a decision on the default HTTPS configuration across wikis, so we get a better sense of failure rates with different clients / in different geographies.

ori changed the visibility from "Custom Policy" to "Public (No Login Required)".
ori changed the edit policy from "Custom Policy" to "All Users".
Restricted Application changed the visibility from "Public (No Login Required)" to "Custom Policy". · View Herald TranscriptMar 4 2015, 11:33 PM
Restricted Application changed the edit policy from "All Users" to "Custom Policy". · View Herald Transcript
Restricted Application added a project: acl*security. · View Herald Transcript
Krenair changed the visibility from "Custom Policy" to "Public (No Login Required)".
Krenair changed the edit policy from "Custom Policy" to "All Users".
Krenair changed Security from Software security bug to None.
Krenair added a project: HTTPS-by-default.

Change 195817 had a related patch set uploaded (by Krinkle):
Re-introduce HTTPS support detection

https://gerrit.wikimedia.org/r/195817

Change 195817 merged by jenkins-bot:
Re-introduce HTTPS support detection

https://gerrit.wikimedia.org/r/195817

Change 195822 had a related patch set uploaded (by Ori.livneh):
Re-introduce HTTPS support detection

https://gerrit.wikimedia.org/r/195822

Change 195822 merged by Ori.livneh:
Re-introduce HTTPS support detection

https://gerrit.wikimedia.org/r/195822

Change 195823 had a related patch set uploaded (by Ori.livneh):
Re-introduce HTTPS support detection

https://gerrit.wikimedia.org/r/195823

Change 195823 merged by Ori.livneh:
Re-introduce HTTPS support detection

https://gerrit.wikimedia.org/r/195823

BBlack subscribed.