(www.)wmfusercontent.org should respond to HTTP
Closed, Resolved (Public)

Description

This domain doesn't currently have any HTTP response. Let's either redirect it to www.wikimedia.org, or at least have it respond with the default Apache doc root that indicates it points to a Wikimedia Foundation server.

Event Timeline

Krinkle created this task. Jul 3 2015, 10:13 PM
Krinkle raised the priority of this task from to Medium.
Krinkle updated the task description. (Show Details)
Krinkle added a project: acl*sre-team.
Krinkle changed the visibility from "Public (No Login Required)" to "Custom Policy".
Krinkle changed the edit policy from "All Users" to "Custom Policy".
Krinkle added subscribers: Krinkle, BBlack, JeanFred, Mike_Peel.
Restricted Application added a subscriber: Aklapper. Jul 3 2015, 10:13 PM
Krinkle renamed this task from (www.)wmfusercontent.org should respond to TTP to (www.)wmfusercontent.org should respond to HTTP. Jul 3 2015, 10:13 PM
Krinkle set Security to None.
Krinkle changed the visibility from "Custom Policy" to "Public (No Login Required)".
Krinkle changed the edit policy from "Custom Policy" to "All Users".
Restricted Application added a subscriber: Matanya. Jul 3 2015, 10:14 PM

Change 222859 had a related patch set uploaded (by John F. Lewis):
(www.)wmfusercontent.org point to text-lb

https://gerrit.wikimedia.org/r/222859

Change 222859 abandoned by John F. Lewis:
(www.)wmfusercontent.org point to text-lb

https://gerrit.wikimedia.org/r/222859

Change 222860 had a related patch set uploaded (by John F. Lewis):
(www.)wmfusercontent.org point to text-lb

https://gerrit.wikimedia.org/r/222860

Change 222860 abandoned by John F. Lewis:
(www.)wmfusercontent.org point to text-lb

https://gerrit.wikimedia.org/r/222860

jcrespo changed the task status from Open to Stalled. Sep 11 2015, 5:48 PM
jcrespo added a subscriber: jcrespo.

I seem to remember there was some disagreement on this issue.

jcrespo lowered the priority of this task from Medium to Low. Sep 11 2015, 5:48 PM
jcrespo removed a project: Patch-For-Review.

What disagreement?

Whether having an HTTP page was a good solution, or even a desired one (because of the lack of TLS). I didn't participate in the discussion, though, nor do I remember it very well, but I wanted to reflect the reality that "no one is working on this" by lowering the priority.

There's some confusion here due to the use of "HTTP". This issue isn't about protocol (HTTP vs HTTPS). It's just about whether, if a user were to browse to https://www.wmfusercontent.org (or the root domain as https://wmfusercontent.org), there should actually be an HTML page there, or some other indication of the domain's legitimacy or purpose. Right now there is no address for that, and users would have to type or guess https://www.wmfusercontent.org after noticing that the browser loaded some content from phab.wmfusercontent.org.

It sounds nice on the surface, but honestly it doesn't prove anything to the user in any legitimate sense, so it seems wasteful to even set it up.

https://phab.wmfusercontent.org is served with a wildcard certificate. So we do have one. And phab.wmfusercontent.org is served from varnish/misc so presumably the certificate is already installed as well.

Pointing wmfusercontent.org there would solve this I think. Given the below works:

$ curl 'https://phab.wmfusercontent.org' -H 'Host: wmfusercontent.org'
> ... Error: 404, Domain not served here ..

It's not forensic proof, but it's a courtesy to our users. (Perhaps akin to https://www.gstatic.com or https://googleapis.com)

Again, this issue has nothing to do with HTTPS/TLS issues. It's all fine on that level, yes.

I think the question is, why? There was some IRC discussion on this before. It doesn't really prove anything, and we're not publishing or linking those hostnames anywhere. This seems to be about satisfying the pointless curiosity of users who look at a browser developer console and wonder about an internally-used domain name, then try a completely different hostname from that domain in their browser and get annoyed that it doesn't do anything?

I am one of them.

I came across this issue when I was trying to use an extension that restricted access to third-party domains when I accessed websites (see T104730, which might be a duplicate of this report or vice-versa?). It seemed odd at the time that this website was linked to, even more so since it didn't provide any information over HTTP about what the URL was being used for.

It would be trivial to add a splash page to the domain, or to redirect it appropriately - probably far less time than the discussion on these two tickets has taken up!

The problem here is with people's perceptions mostly :/ It's a common pattern to use multiple domainnames to fetch sub-resources of a site. Aside from the obvious examples like gstatic, even here we traditionally had every wiki page on e.g. en.wikipedia.org also loading resources from bits.wikimedia.org.

The crux here is: how does one define whether these alternate domains are "3rd party" or legitimate? There really is no good way to know that, although you could get heuristically close with whois lookups perhaps (which this case would pass muster on). The bottom line is that the original site you visited (phabricator.wikimedia.org) sent you links to that "third party" domain (phab.wmfusercontent.org). Either you trust that the original site is uncompromised and operates that way normally and accept it as functional and fine, or... well, there's really no other alternative.

The alternative is you wonder if phab.wmfusercontent.org is actually owned by some malicious third party and phabricator.wikimedia.org was somehow compromised into linking you there. However, in that scenario, the malicious third party also controls what you're asking to see as verification: any other content also hosted at that malicious domain, which the attacker might fill in as "Hey this is totally a legit site operated by the WMF". That aside, the arguments for wmfusercontent.org and/or www.wmfusercontent.org make no fundamental sense to begin with. The links you're trying to investigate were to phab.wmfusercontent.org, which actually does load and show you a copy of the main phabricator site. You can't just go invent other non-existent hostnames that happen to be in the same domain to do your verification with, that makes no fundamental sense.

At the root of it all, if you (or by extension, some browser extension?) think that heading to https://wmfusercontent.org really tells you *anything* useful that you want or need to know to validate something about our phab pages, you're mistaken. There's no point in appeasing the mistaken and invalid assumptions of a small minority by introducing new hostnames and pages into our infrastructure. This is one of many many examples of a historical pattern here at the WMF of defaulting to creating any and everything anyone ever asks for without regard to the utility of it or any kind of long-term cost:benefit analysis...

The problem here is with people's perceptions mostly :/ It's a common pattern to use multiple domainnames to fetch sub-resources of a site. Aside from the obvious examples like gstatic, even here we traditionally had every wiki page on e.g. en.wikipedia.org also loading resources from bits.wikimedia.org.

That example would be a great way forward: keep the domain names to a minimum, and use subdomains to point towards different servers. That way, there are fewer distinct domain names to verify, and there's the confidence in them provided by their use for the main projects. As an example: phabricator.wikimedia.org is far more trustworthy than wikimediaphabricator.org would be.

Yeah but if you look closer, that example (en + bits) doesn't share anything but the trailing .org. It's wikiPedia vs wikiMedia. Similarly, all of our wikis under wikipedia.org also make hits at various times to e.g. login.wikimedia.org too.

I saw that before I replied. :-) We have meta.wikimedia.org, commons.wikimedia.org and other XX.wikimedia.org projects that provide confidence in the validity of that domain. Sure, it would be nice to use a single domain for everything, but Wikipedia vs. Wikimedia is a very different conversation!

But now we're off in the territory of human comfort levels, not software. It's still meaningless for any real verification to populate non-existent related hostnames just for people to look at them and feel re-assured by something that has zero re-assurance value in the real world.

We could have a philosophical debate about whether software exists to provide comfort for humans or not, if that would help. ;-) But it might be simpler to just move this dependency to a *.wikimedia.org domain, or at least give it a http response?

It does have an HTTP response. Go type phab.wmfusercontent.org into your browser. As for simpler, that all depends on your point of view about the costs here. I'm willing to waste time on this trivial issue because I want to reverse a trend that over time creates hundreds and thousands of little costs that seem simple up-front, but carry a long-term maintenance burden in the aggregate.

This ticket is about wmfusercontent.org . That doesn't have a HTTP response.

Why would anyone go there? That hostname doesn't exist, and has never been linked anywhere.

We seem to have come full circle... I'm still not convinced that it's worth spending more time discussing this than it would take to fix the issue!

BBlack added a comment. Edited Sep 14 2015, 9:04 PM

These are the kinds of things we've had to deal with over the past several months, most of which could've been avoided and are part of this larger philosophical problem, IMHO:

T101048 (we have 140+ junk redirect domains to deal with...)
T102815 (removed junk www subdomains from all the language wikis...)
T102814 (removed junk multi-level subdomains of wikipedia.org)
T102826 (as above for wikimedia.org)
T102827 (the special case of the oddball donate subdomain)
T107575 (problems caused by pointlessly aliasing a site from one domain to another)
T110511 (basically the same)

... And that's just the links I could dredge up quickly on the specific topic of pointless hostnames. Look how long it took to resolve those of them that are even resolved. Some will remain outstanding for months. Some have taken up pages of discussions, helped induce critical bugs due to the added complexities of working around them, wasted time pushing more changes through gerrit, etc. The time burden wasted on these kinds of things is significant in the long run when they can hold up, detract from, or make more-complicated other work years into the future.

hashar added a subscriber: hashar. Sep 14 2015, 9:14 PM

There is nothing requiring http on wmfusercontent.org and I am not sure what would be the use case.

Since the whole domain can host any arbitrary file (per design) and is solely used for inclusion from other sites, there is no point in having a landing page here. Human beings are never sent to /.

As @BBlack described, this task should just be declined imho.

But now we're off in the territory of human comfort levels, not software. It's still meaningless for any real verification to populate non-existent related hostnames just for people to look at them and feel re-assured by something that has zero re-assurance value in the real world.

Not if the contents are “This is a legitimate domain owned by the WMF, as you can verify by checking that it appears at https://wikitech.wikimedia.org/wiki/WMF_domains_you_didn't_know_about_but_are_legitimate”, which is precisely how I would fill such a domain. (And yes, we have Wikimedians nit-picky enough that they will check that it is indeed protected and will verify _who_ last edited it.)

Those domains that hide behind a dns error are red flags for phishing, scam, etc. Should our users be trusting subdomains like wmfuserscontent.org or wikimedia.mtk4988.com just because "something loaded content from there"?
Actually, it wouldn't be hard to prepare a social engineering attack in a wiki js with one of those...

And what percentage of users are ever going to go there, or click that link? Why would they ever even know or care? Their browser says everything's fine by default, and guess what, it is. We're talking about serving the special interest of a paranoid minority who aren't even being paranoid in the right direction. And we weren't really talking about putting a message there initially, but just sending it to a "domain not served here" page, like https://www.gstatic.com/ does (which is the equivalent situation for google.com). You *can* today navigate to https://phab.wmfusercontent.org/ and get better than that, and that's the only hostname that exists there. The other hostnames discussed here do not exist for a developer tool or extension to even inform you of.

Those domains that hide behind a dns error are red flags for phishing, scam, etc. Should our users be trusting subdomains like wmfuserscontent.org or wikimedia.mtk4988.com just because "something loaded content from there"?
Actually, it wouldn't be hard to prepare a social engineering attack in a wiki js with one of those...

They're not "hiding behind a DNS error". The DNS error is legitimate, and it's the user's fault in this case. The user has typed a non-existent hostname into an address bar, which they've invented based on guesswork and hope, and it didn't work.

If you're going to go through all that trouble, then do something more-legitimate like check the whois record for the domain that concerns you to see that it's registered to the Foundation, and/or navigate to phab.wmfusercontent.org and click the HTTPS icon in the top left to see that it has an Org Validation cert with our name on it.
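Both of those checks can also be run from a shell. The following is only a sketch under stated assumptions: `cert_subject` and `registrant_org` are hypothetical helper names, the whois field label varies by registry (and may be redacted), and both functions need network access.

```shell
# Hypothetical helper: print the subject of the certificate that a host
# presents on port 443. The host name is passed as the first argument.
cert_subject() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject
}

# Hypothetical helper: print the registrant organization lines from a
# whois record. The field label is a loose guess and differs by registry.
registrant_org() {
  whois "$1" | grep -i 'registrant organization'
}
```

For example, `cert_subject phab.wmfusercontent.org` prints the certificate subject; for an organization-validated certificate, that subject includes an organization (`O`) field.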

Bawolff closed this task as Resolved. Jul 10 2016, 5:40 PM
Bawolff added a subscriber: Bawolff.

Both http://phab.wmfusercontent.org and http://wmfusercontent.org respond with 301 Moved Permanently. Closing as fixed.
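A check like this can be reproduced with `curl -sI`, which prints only the response headers. The sketch below is an assumption about how one might verify it, not the command that was actually used; `status_of` and `location_of` are hypothetical helpers that read a header block on stdin.

```shell
# Hypothetical helper: print the status code from the first status line
# of an HTTP response header block read on stdin.
status_of() {
  awk 'NR == 1 && $1 ~ /^HTTP\// { print $2 }'
}

# Hypothetical helper: print the value of the Location header, if present,
# with the trailing carriage return removed.
location_of() {
  awk 'tolower($1) == "location:" { sub(/\r$/, "", $2); print $2 }'
}

# Usage against the live hosts (needs network access):
#   curl -sI http://wmfusercontent.org | status_of
#   curl -sI http://wmfusercontent.org | location_of
```

A `301` from `status_of` together with a target from `location_of` would match the behaviour described above.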

Thanks @Bawolff! wmfusercontent.org now works as expected. However, accessing https://phab.wmfusercontent.org/ gives the error message:

Unhandled Exception ("Exception")
This Phabricator install is configured as "https://phabricator.wikimedia.org", but you are using the domain name "phab.wmfusercontent.org" to access a page which is trying to set a cookie. Access Phabricator on the configured primary domain or a configured alternate domain. Phabricator will not set cookies on other domains for security reasons.

That's a minor issue, but it would be good to fix it if possible.

To be clear, I did not fix it; I just noticed it was fixed.

Arguably that's a reasonable error message. It's saying you should go to the other domain.

OK - thanks to whoever fixed it (I can't spot who that was from this conversation)!

It's saying that there was a cookie error with accessing this domain - which is different from saying that you should just access the other domain. We can be more transparent here (by just redirecting to the other domain), but this is definitely an improvement on what happened before!