Phabricator needs to expose notification daemon (websocket)
Closed, Resolved · Public

Description

Security review is done (See comment from @csteipp)

Now we need to figure out how to proxy the websocket through both nginx and varnish.

Phab setup docs:

https://secure.phabricator.com/book/phabricator/article/notifications/

Event Timeline

Dzahn created this task. Sep 16 2015, 3:59 PM
Dzahn raised the priority of this task from to Normal.
Dzahn updated the task description.
Dzahn added subscribers: Negative24, scfc, Matanya and 10 others.
greg updated the task description.
greg set Security to None.
mmodell changed the task status from Open to Stalled. Oct 6 2015, 9:18 AM
mmodell moved this task from To Triage to Administration (UI) on the Phabricator board.
mmodell added a project: Blocked-on-Security.

with all due respect, it has to be reviewed by security before ops can step in :)

csteipp added a subscriber: csteipp.

Security review is done. Note my comments about no aphlict.log, and making sure the Admin server is not exposed anywhere, when setting it up.

MZMcBride renamed this task from Pharicator needs to expose notification daemon (websocket) to Phabricator needs to expose notification daemon (websocket). Nov 18 2015, 6:00 PM
mmodell updated the task description. Nov 18 2015, 6:03 PM
mmodell edited projects, added Blocked-on-Operations; removed Blocked-on-Security.

The security task is closed. Is this still stalled and Blocked-on-Operations?

What would un-stall it? Should it be assigned to anyone in ops?

We need to make a plan to get connectivity through to the end host for this. This will probably fall on operations, yes, but approval to make such a plan is only 48 hours old. This may take a bit to get on the schedule.

The recent Phab upgrade chatter has had my teams ask me to check on this. I think it may have gotten swallowed by the holidays, among other conflicting priorities. Is there an update on the progress of getting connectivity through to the end host for this?

@chasemp just tagging you because your comment was last. Know of any progress towards the plan to unblock this?

greg added a comment. Mar 31 2016, 10:55 PM

We need to make a plan to get connectivity through to the end host for this. This will probably fall on operations, yes, but approval to make such a plan is only 48 hours old. This may take a bit to get on the schedule.

What can I do to help with this, @chasemp?

I wish I had time to get into this. I'm trying to jog my own memory here and recall the standoff. Someone from Release-Engineering-Team could put up changes for both the misc-web reverse proxy and the LVS changes, but someone from ops would have to punch holes in the firewall allowing it to come in, since this will operate on a new port. The configuration for passthrough or bypass of nginx/varnish on misc-web was TBD, iirc. One of the gents from Traffic (Brandon or Emanuele) is best suited to know what the Right Thing is; I would really just be asking them anyway.

@greg thanks for waking this thread up. :) My teams are still asking for live updated boards.

@chasemp I'm happy to help coordinate any communication or necessary meetings. Just lemme know.

greg added a comment. Apr 5 2016, 9:01 PM

So, as for next steps:

Someone from Release-Engineering-Team could put up changes for both the misc-web reverse proxy and the LVS changes,

I'll see what we can do here versus our other priorities. (aka: manager speak for "too much on our plates, will try to do it").

but someone from ops would have to punch holes in the firewall allowing it to come in, since this will operate on a new port. The configuration for passthrough or bypass of nginx/varnish on misc-web was TBD, iirc. One of the gents from Traffic (Brandon or Emanuele) is best suited to know what the Right Thing is; I would really just be asking them anyway.

We'll ping them when we (RelEng) have that first part above done.

fgiunchedi added a subscriber: fgiunchedi.

This doesn't seem to be blocked on ops ATM; let us know when the pieces are in place and if we can help.

Restricted Application added a subscriber: TerraCodes. Apr 27 2016, 8:42 AM

We've basically never configured any websockets stuff through our Traffic layer before. Phab isn't the only use-case, either. We also have stream.wikimedia.org (rcstream) which doesn't currently flow through cache_misc (but we wish it did) because of questions/complexities about websockets.

There are some obvious Google links on the topic here:

https://www.varnish-cache.org/docs/trunk/users-guide/vcl-example-websockets.html
https://www.nginx.com/blog/websocket-nginx/

The most important questions, to me anyways, are about whether we're trying to do websocket traffic in parallel with regular HTTPS traffic over the same basic channels (phabricator.wikimedia.org:443 -> cache_misc -> iridium.wikimedia.org:80), using upgrade/connection headers to signal the switch of a connection to WS-mode, or if phab's websocket stuff would be on a completely different public hostname and/or backend server and/or different port.
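For illustration only, here is a minimal Python sketch of the header check implied by "using upgrade/connection headers to signal the switch of a connection to WS-mode". It is not taken from the nginx or varnish configuration discussed here; the function name `is_websocket_upgrade` is hypothetical, and the only facts assumed are that a WebSocket handshake carries `Upgrade: websocket` plus a `Connection` header containing the `upgrade` token.

```python
from typing import Mapping


def is_websocket_upgrade(headers: Mapping[str, str]) -> bool:
    """Return True if the request headers ask to switch the connection to WebSocket.

    Hypothetical helper for illustration; a real proxy does this in its own config.
    """
    # Header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}

    upgrade = normalized.get("upgrade", "").strip().lower()

    # Connection may carry several comma-separated tokens, e.g. "keep-alive, Upgrade".
    connection_tokens = {
        token.strip().lower()
        for token in normalized.get("connection", "").split(",")
    }

    return upgrade == "websocket" and "upgrade" in connection_tokens
```

A proxy sharing one channel for both kinds of traffic would presumably pass such requests through without caching them, and treat everything else as regular HTTP.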

Dzahn added a comment. May 10 2016, 2:31 PM

... or if phab's websocket stuff would be on a completely different public hostname and/or backend server and/or different port.

As far as I understand it, this is to enable the "notification server" (T765), and that says "Enable real-time notifications. You must also run a Node.js based notification server for this to work.", so based on that I assume it will be a different backend: that Node.js based server.

@Dzahn: yes but it can run on the same hardware as the www service.

chasemp updated the task description. May 11 2016, 7:33 PM
Dzahn added a comment. May 11 2016, 8:27 PM

So then same hostname but different port.

@Dzahn yes I believe so, port 22280 by default.

See "Terminating SSL with a Load Balancer" in the setup instructions.

greg added a comment. Jun 14 2016, 8:22 PM

(All blockers are resolved)

Is the node.js notification service already running on iridium? Do we need some matching config in public DNS + private phab so that it knows its own public hostname/port?

(or, reading the docs, do we want to map phab.wm.o/ws/ to :22280? either way, it doesn't seem configured at all on the iridium side yet)

@BBlack: not set up on iridium because I wasn't entirely clear when/if it would become possible.

do we want to map phab.wm.o/ws/ to :22280

Yes I think that's how it should be set up.
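As a rough sketch of what "map phab.wm.o/ws/ to :22280" means for request routing, the Python below picks a backend by path prefix. It is an illustration under assumptions, not the actual proxy configuration: the backend host `iridium.wikimedia.org` and web port 80 are taken from the earlier comment in this thread, 22280 is the default port mentioned above, and the names `Backend`, `WS_PREFIX` and `pick_backend` are hypothetical.

```python
from typing import NamedTuple


class Backend(NamedTuple):
    host: str
    port: int


# Assumed values: host and web port as quoted earlier in this thread,
# 22280 as the notification server's default port.
WEB = Backend("iridium.wikimedia.org", 80)
APHLICT = Backend("iridium.wikimedia.org", 22280)

# Hypothetical public path prefix reserved for websocket traffic.
WS_PREFIX = "/ws/"


def pick_backend(path: str) -> Backend:
    """Send /ws/ requests to the notification server, everything else to the web app."""
    return APHLICT if path.startswith(WS_PREFIX) else WEB
```

With this shape the browser only ever sees the one public hostname, and the split between ports happens behind the proxy.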

There is minimal setup required to get the node.js service running. I can work on that soon but it will likely be a day or two before I have time to puppetize it.

greg awarded a token. Jun 14 2016, 8:40 PM
BBlack moved this task from Triage to Up Next on the Traffic board. Jun 14 2016, 10:00 PM
mmodell changed the task status from Stalled to Open. Jun 15 2016, 1:34 AM
mmodell awarded a token.

What is the status, please?

Having T765: Enable notification server (real-time pop-up notifications) in Phabricator (this task is its blocker) finally resolved would help a lot to prevent an increasing number of actions based on not knowing the task was updated in the meantime...

There's a little bit of refactoring work (already in-progress) to do on the Varnish side to support it "correctly", but even if that weren't ready in time we can use DNS hacks (aliased alternate hostname) to do this today. The service needs to exist on iridium so we have something to point the traffic at first, though.

Thanks for the update @BBlack. To be honest, not knowing all the context, I am a bit confused, so I would appreciate any ETA (in a week, in a month, this quarter, by the end of year, not likely ever...) as well. Thank you.

I wouldn't be the one working on turning up the service on iridium, and I'm not sure who would, so I can't really answer that.

BBlack moved this task from Up Next to Triage on the Traffic board. Sep 30 2016, 1:19 PM

There's a little bit of refactoring work (already in-progress) to do on the Varnish side to support it "correctly", but even if that weren't ready in time we can use DNS hacks (aliased alternate hostname) to do this today. The service needs to exist on iridium so we have something to point the traffic at first, though.

I don't think we can use an alternate host name (DNS hack) for websocket traffic because of browsers' same-origin policy. Also, we need to terminate SSL just like we do with phabricator web traffic.

@BBlack: I will configure the notification service on iridium.

BBlack added a comment. Oct 4 2016, 2:16 AM

... but even if that weren't ready in time we can use DNS hacks (aliased alternate hostname) to do this today.

I don't think we can use an alternate host name (DNS hack) for websocket traffic because of browsers' same-origin policy.

The hack isn't browser-facing, it's just internal request-routing stuff (e.g. setting up an "iridium-wss" hostname aliasing "iridium" to distinguish the varnish->iridium traffic for the distinct ports).

Change 313937 had a related patch set uploaded (by 20after4):
Configuration for Aphlict

https://gerrit.wikimedia.org/r/313937

@BBlack: https://gerrit.wikimedia.org/r/#/c/313937/ is a first-attempt at puppetizing the aphlict notification service

BBlack moved this task from Triage to General on the Traffic board. Oct 4 2016, 12:43 PM
Aklapper moved this task from Doing to Misc on the Phabricator board. Oct 7 2016, 6:43 PM

Change 313937 merged by Dzahn:
phabricator: Configuration for Aphlict

https://gerrit.wikimedia.org/r/313937

merged per prototype/"labs-only" no-op in prod http://puppet-compiler.wmflabs.org/4348/

jayvdb added a subscriber: jayvdb. Nov 15 2016, 11:55 AM

Change 345617 had a related patch set uploaded (by 20after4):
[operations/puppet@production] Phab: User base:service_unit for aphlict

https://gerrit.wikimedia.org/r/345617

Change 345617 merged by Dzahn:
[operations/puppet@production] Phabricator: Use base:service_unit for aphlict

https://gerrit.wikimedia.org/r/345617

There's a little bit of refactoring work (already in-progress) to do on the Varnish side to support it "correctly", but even if that weren't ready in time we can use DNS hacks (aliased alternate hostname) to do this today. The service needs to exist on iridium so we have something to point the traffic at first, though.

@BBlack https://gerrit.wikimedia.org/r/#/c/379005/ should be the last piece needed to get the notification server up and running on phab1001 so that we can test the networking pieces. I think I have a better understanding of the lvs & varnish parts now, so I can take a stab at writing the puppet config but I'll need someone from traffic to review and correct my mistakes.

Change 389782 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/dns@master] Add phab1001-aphlict alias

https://gerrit.wikimedia.org/r/389782

Change 389782 merged by BBlack:
[operations/dns@master] Add phab1001-aphlict alias

https://gerrit.wikimedia.org/r/389782

Change 389794 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] cache_misc: fix cookies websockets

https://gerrit.wikimedia.org/r/389794

Change 389794 merged by BBlack:
[operations/puppet@production] cache_misc: fix cookies websockets

https://gerrit.wikimedia.org/r/389794

mmodell closed this task as Resolved. (Edited) Nov 7 2017, 8:38 PM
mmodell assigned this task to BBlack.

YAY! it only took 2.16 years!

disclaimer: (no snark was intended with this comment)

@BBlack spent a bunch of time debugging issues with websockets + varnish, so thanks a lot for your time and expertise, Brandon! I definitely couldn't have done it.

Change 389799 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/dns@master] aphlict: add the CNAME in codfw, too

https://gerrit.wikimedia.org/r/389799

Change 389799 merged by BBlack:
[operations/dns@master] aphlict: add the CNAME in codfw, too

https://gerrit.wikimedia.org/r/389799