Security review is done (See comment from @csteipp)
Now we need to figure out how to proxy the websocket through both nginx and varnish.
Phab setup docs:
https://secure.phabricator.com/book/phabricator/article/notifications/
Status | Assigned | Task
---|---|---
Resolved | chasemp | T1047 Phabricator lacks a "notification" feed similar other project management software: Enable "Flame"
Resolved | mmodell | T75791 Following Notification from "bell" menu does not clear notification
Resolved | mmodell | T103444 Desktop notifications
Resolved | mmodell | T97650 Conpherence wont refresh chat messages automatically, needs manual reload
Resolved | mmodell | T765 Enable notification server (real-time pop-up notifications) in Phabricator
Resolved | BBlack | T112765 Phabricator needs to expose notification daemon (websocket)
Resolved | csteipp | T1286 Aphlict security review
Resolved | BBlack | T134870 Support websockets in cache_misc
Declined | BBlack | T107749 HTTP/1.1 keepalive for local nginx->varnish conns
Security review is done. Note my comments about no aphlict.log, and making sure the Admin server is not exposed anywhere, when setting it up.
The security task is closed. Is this still stalled and Blocked-on-Operations?
What would un-stall it? Should it be assigned to anyone in ops?
We need to make a plan to get connectivity through to the end host for this. This will probably fall on operations, yes, but approval to make such a plan is only 48 hours old. It may take a bit to get on the schedule.
The recent Phab upgrade chatter has had my teams ask me to check on this. I think it may have gotten swallowed by the holidays, among other conflicting priorities. Is there an update on the progress of getting connectivity through to the end host for this?
@chasemp just tagging you because your comment was last. Know of any progress towards the plan to unblock this?
I wish I had time to get into this. I'm trying to jog my own memory here and recall the standoff. Someone from Release-Engineering-Team could put up changes for both the misc-web reverse proxy and LVS, but someone from ops would have to punch holes in the firewall to let the traffic in, since this will operate on a new port. The configuration for passthrough or bypass of nginx/varnish on misc-web was TBD, iirc. One of the gents from Traffic (brandon or emanuele) is best suited to know what the Right Thing is; I would really just be asking them anyway.
So, as for next steps:
I'll see what we can do here versus our other priorities. (aka: manager speak for "too much on our plates, will try to do it").
> but someone from ops would have to punch holes in the firewall to let the traffic in, since this will operate on a new port. The configuration for passthrough or bypass of nginx/varnish on misc-web was TBD, iirc. One of the gents from Traffic (brandon or emanuele) is best suited to know what the Right Thing is; I would really just be asking them anyway.
We'll ping them when we (RelEng) have that first part above done.
This doesn't seem to be blocked on ops ATM. Let us know when the pieces are in place and whether we can help.
We've basically never configured any websockets stuff through our Traffic layer before. Phab isn't the only use-case, either. We also have stream.wikimedia.org (rcstream) which doesn't currently flow through cache_misc (but we wish it did) because of questions/complexities about websockets.
There's some obvious google links on the topic here:
https://www.varnish-cache.org/docs/trunk/users-guide/vcl-example-websockets.html
https://www.nginx.com/blog/websocket-nginx/
The most important questions, to me anyways, are about whether we're trying to do websocket traffic in parallel with regular HTTPS traffic over the same basic channels (phabricator.wikimedia.org:443 -> cache_misc -> iridium.wikimedia.org:80), using upgrade/connection headers to signal the switch of a connection to WS-mode, or if phab's websocket stuff would be on a completely different public hostname and/or backend server and/or different port.
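For the first option (WebSocket traffic sharing the regular HTTPS channel), each proxy layer has to recognize the upgrade request and pass it through intact. A minimal Python sketch of that check, based on the handshake defined in RFC 6455, might look like the following. The function names are hypothetical and this is not how nginx or Varnish implement it; it only illustrates which headers signal the switch to WS mode.

```python
import base64
import hashlib
from typing import Mapping

# Fixed GUID that RFC 6455 appends to the client key when computing
# the Sec-WebSocket-Accept response header.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def is_websocket_upgrade(headers: Mapping[str, str]) -> bool:
    """Return True if the request headers ask to switch the connection to WebSocket mode."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Connection can carry several comma-separated tokens, e.g. "keep-alive, Upgrade".
    connection_tokens = [
        token.strip().lower() for token in lowered.get("connection", "").split(",")
    ]
    upgrade_target = lowered.get("upgrade", "").strip().lower()
    return "upgrade" in connection_tokens and upgrade_target == "websocket"


def accept_value(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value the server returns in its 101 response."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

A proxy that drops or rewrites the `Connection` and `Upgrade` headers would make `is_websocket_upgrade` false at the backend, which is why both layers need explicit handling for these requests.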
As far as I understand it, this is to enable the "notification server" (T765), which says: "Enable real-time notifications. You must also run a Node.js based notification server for this to work." Based on that, I assume it will be a different backend: that Node.js based server.
@Dzahn yes I believe so, port 22280 by default.
See "Terminating SSL with a Load Balancer" in the setup instructions.
Is the node.js notification service already running on iridium? Do we need some matching config in public DNS + private phab so that it knows its own public hostname/port?
(or, reading the docs, do we want to map phab.wm.o/ws/ to :22280? either way, it doesn't seem configured at all on the iridium side yet)
@BBlack: not set up on iridium because I wasn't entirely clear when/if it would become possible.
> do we want to map phab.wm.o/ws/ to :22280
Yes I think that's how it should be set up.
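The mapping agreed on here amounts to a path-based routing rule: requests under `/ws/` go to the Aphlict port, everything else goes to the regular web port. A rough Python sketch of that decision follows, assuming the port numbers mentioned in this thread (80 and 22280). The constant and function names are hypothetical, and the real rule would live in the nginx/Varnish configuration rather than in Python.

```python
WEB_PORT = 80         # regular Phabricator web traffic (from the thread)
APHLICT_PORT = 22280  # Aphlict default client port (from the thread)
WS_PREFIX = "/ws/"    # public path proposed for websocket traffic


def backend_port(path: str) -> int:
    """Pick the backend port for a request path on the shared public hostname."""
    if path.startswith(WS_PREFIX):
        return APHLICT_PORT
    return WEB_PORT
```

Keeping both kinds of traffic on the same public hostname and port means TLS is still terminated in one place, and only the internal hop differs.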
There is minimal setup required to get the node.js service running. I can work on that soon but it will likely be a day or two before I have time to puppetize it.
What is the status, please?
Having T765: Enable notification server (real-time pop-up notifications) in Phabricator (this task is its blocker) finally resolved would help a lot to reduce the growing number of actions taken without knowing that the task was updated in the meantime...
There's a little bit of refactoring work (already in-progress) to do on the Varnish side to support it "correctly", but even if that weren't ready in time we can use DNS hacks (aliased alternate hostname) to do this today. The service needs to exist on iridium so we have something to point the traffic at first, though.
Thanks for the update @BBlack. To be honest, not knowing all the context, I am a bit confused, so I would appreciate any ETA (in a week, in a month, this quarter, by the end of year, not likely ever...) as well. Thank you.
I wouldn't be the one working on turning up the service on iridium, and I'm not sure who would, so I can't really answer that.
I don't think we can use an alternate host name (DNS hack) for websocket traffic because of browsers' same-origin policy. Also we need to terminate ssl just like we do with phabricator web traffic.
@BBlack: I will configure the notification service on iridium.
The hack isn't browser-facing, it's just internal request-routing stuff (e.g. setting up an "iridium-wss" hostname aliasing "iridium" to distinguish the varnish->iridium traffic for the distinct ports).
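As an illustration of the internal alias idea, the sketch below models two backend names that point at the same machine but differ in port, so the caching layer can tell the two kinds of traffic apart. This is a simplified Python model, not the actual Varnish or DNS configuration; the address is a documentation placeholder (192.0.2.10), and the type and function names are hypothetical.

```python
from typing import NamedTuple


class Backend(NamedTuple):
    address: str
    port: int


# Both aliases resolve to the same host; only the port differs.
# 192.0.2.10 is a placeholder from the documentation range, not the real address.
BACKENDS = {
    "iridium": Backend("192.0.2.10", 80),         # regular web traffic
    "iridium-wss": Backend("192.0.2.10", 22280),  # websocket (Aphlict) traffic
}


def resolve_backend(alias: str) -> Backend:
    """Look up the internal backend for an alias; raises KeyError if unknown."""
    return BACKENDS[alias]
```

Because the alias is only used between the cache layer and the backend, the browser still sees a single public origin.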
Change 313937 had a related patch set uploaded (by 20after4):
Configuration for Aphlict
@BBlack: https://gerrit.wikimedia.org/r/#/c/313937/ is a first-attempt at puppetizing the aphlict notification service
merged per prototype/"labs-only" no-op in prod http://puppet-compiler.wmflabs.org/4348/
Change 345617 had a related patch set uploaded (by 20after4):
[operations/puppet@production] Phab: User base:service_unit for aphlict
Change 345617 merged by Dzahn:
[operations/puppet@production] Phabricator: Use base:service_unit for aphlict
@BBlack https://gerrit.wikimedia.org/r/#/c/379005/ should be the last piece needed to get the notification server up and running on phab1001 so that we can test the networking pieces. I think I have a better understanding of the lvs & varnish parts now, so I can take a stab at writing the puppet config but I'll need someone from traffic to review and correct my mistakes.
Change 389782 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/dns@master] Add phab1001-aphlict alias
Change 389794 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] cache_misc: fix cookies websockets
Change 389794 merged by BBlack:
[operations/puppet@production] cache_misc: fix cookies websockets
@BBlack spent a bunch of time debugging issues with websockets + varnish, so thanks a lot for your time and expertise, Brandon! I definitely couldn't have done it.
Change 389799 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/dns@master] aphlict: add the CNAME in codfw, too
Change 389799 merged by BBlack:
[operations/dns@master] aphlict: add the CNAME in codfw, too