mediawiki - node SSR HTTP request follow-up [2*2h]
Closed, Resolved (Public)

Description

T206200 implemented the basics for getting the node SSR result into wikibase. Before this can be called ready for production, we should invest more time in measurability, edge cases, and requirements from operations, among other things.

Braindump Topics

  • timeout => T215912
  • log failed connections => T215913
  • what happens on failure? (e.g. render the root element and trust in client-side re-rendering 1, 2)
    • how do we avoid storing a suboptimal result in the ParserCache?
  • TLS => answer from ops: right now this isn't supported by them. Should we make a ticket to propose this?
  • custom user agent incl. version information of the mw/wb system performing the request T217399
  • log download time => right now we cannot get these metrics for free at the network level. In the future, maybe. For now we will have to log them from the service.
  • graceful service shutdown
    • add healthcheck T215920
    • configure helm T215921 question to ops: helm help, anyone? Configure /healthcheck to yield a useful result (e.g. during start, on graceful shutdown)
  • caching => until we decide we need to add a caching layer in between mediawiki and the SSR service this should be covered by T214679
  • node service performance
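
To make the "what happens on failure?" and healthcheck topics above more concrete, here is a minimal TypeScript sketch. It is not the actual implementation: `RenderOutcome`, `PageFragment`, `toFragment` and `HealthState` are hypothetical names invented for illustration, and the `cacheable` flag only stands in for whatever mechanism ends up keeping a fallback result out of the ParserCache.

```typescript
// Hypothetical outcome of one request from mediawiki to the SSR service.
type RenderOutcome =
  | { kind: "ok"; html: string }
  | { kind: "timeout" }
  | { kind: "error"; message: string };

// What the caller would embed in the page, plus whether it may be cached.
interface PageFragment {
  html: string;
  cacheable: boolean;
}

// On failure, fall back to an empty root element so that client-side
// rendering can take over, and mark the result as not cacheable so the
// suboptimal output is not stored.
// Assumption: rootElementId is a trusted, already-safe HTML id.
function toFragment(outcome: RenderOutcome, rootElementId: string): PageFragment {
  if (outcome.kind === "ok") {
    return { html: outcome.html, cacheable: true };
  }
  return { html: `<div id="${rootElementId}"></div>`, cacheable: false };
}

// Hypothetical lifecycle of the node service, as seen by /healthcheck.
type Phase = "starting" | "ready" | "shuttingDown";

class HealthState {
  private phase: Phase = "starting";

  // Called once the service can actually render.
  markReady(): void {
    if (this.phase === "starting") {
      this.phase = "ready";
    }
  }

  // Called when a termination signal arrives; never reverts to "ready".
  beginShutdown(): void {
    this.phase = "shuttingDown";
  }

  // 200 only while ready; 503 during start and graceful shutdown.
  check(): { statusCode: number; body: string } {
    return this.phase === "ready"
      ? { statusCode: 200, body: "ok" }
      : { statusCode: 503, body: this.phase };
  }
}
```

The design choice illustrated here is that a failed or timed-out render still yields usable page output, while the caller gets an explicit signal not to cache it. Likewise, the healthcheck reports "not ready" both before the service has started and once shutdown has begun, so traffic can be routed away in either phase.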

Comments

  • Traffic in "the opposite direction" is discussed in T209961

Event Timeline

Pablo-WMDE renamed this task from wikimedia - node SSR request follow-up to wikimedia - node SSR HTTP request follow-up. (Nov 1 2018, 9:33 AM)
Pablo-WMDE updated the task description.
Pablo-WMDE renamed this task from wikimedia - node SSR HTTP request follow-up to mediawiki - node SSR HTTP request follow-up. (Nov 20 2018, 3:12 PM)
Hanna_Petruschat_WMDE renamed this task from mediawiki - node SSR HTTP request follow-up to mediawiki - node SSR HTTP request follow-up [2*2h]. (Feb 6 2019, 3:15 PM)

Hi @WMDE-leszek and @Addshore,
we thought about the Topics, added to them, and created (superficial) tasks for the things we deem necessary before going live (mind you, this ticket is restricted to the traffic between mw and the node service, explicitly _not_ the other direction).
Could you please have a look at the questions indicated in red and see if you already have answers, or can point us to someone at the WMF to dive into these issues with?

Thanks

I had a quick chat with @Addshore, and I'll try to point to the WMF teams we believe are best suited to provide information on those topics, the particular team member who is, to our knowledge, the most suitable and approachable (e.g. based in Europe) in each topic area, and the IRC channels where the teams in question are generally present. This of course does not mean that IRC is the only way to get answers, or that the mentioned individuals are the only ones you can be in contact with.
Email addresses of WMF staff can be found through the staff page https://wikimediafoundation.org/role/staff-contractors/

  • TLS => question to ops: how is this configured?
  • log download time => question to ops: can we get these metrics somewhere on the network level?

Those two should be under the responsibility of the "Site Reliability Engineering" team. A possible contact person is Giuseppe L. He goes by _joe_ on IRC, or Joe here in phabricator. The team's IRC channel is #wikimedia-serviceops.

  • configure helm T215921 question to ops: helm help, anyone? Configure /healthcheck to yield a useful result (e.g. during start, on graceful shutdown)

That seems like the responsibility of the "Release Engineering" team. The person with helm expertise is definitely Tyler C. He goes by thcipriani on IRC and phabricator. Unfortunately, he is not based in a European timezone as far as I know. The team's IRC channel is #wikimedia-releng.