Page MenuHomePhabricator

Enable use of session storage service in MediaWiki
Open, NormalPublic2 Story Points

Description

Implement storage of MediaWiki sessions in the session storage service.

See: RESTBagOStuff
See also: T210717: Find an alternative to HHVM curl connection pooling for PHP 7

Event Timeline

Eevans created this task.Feb 7 2019, 4:58 PM
Eevans triaged this task as Normal priority.
Restricted Application removed a project: Patch-For-Review. · View Herald TranscriptFeb 7 2019, 4:58 PM
Clarakosi moved this task from Backlog to Next on the User-Clarakosi board.Feb 8 2019, 5:43 PM

Should this task go to @BPirkle now?

Eevans reassigned this task from Eevans to BPirkle.Mar 15 2019, 3:54 PM

Should this task go to @BPirkle now?

Yeah.

Relevant documentation: https://www.mediawiki.org/wiki/Requests_for_comment/SessionStorageAPI
Development instance is at deployment-sessionstore01.deployment-prep.eqiad.wmflabs

I have, on my local development wiki, a working KaskBagOStuff (naming suggestions welcome) that communicates with the development instance at deployment-sessionstore01.deployment-prep.eqiad.wmflabs (via some tricky port forwarding) and successfully allows basic session management for happy login and logout.

I reviewed the following pages, which have some wide-ranging discussion:

To make sure I understand the conclusions, I have a few questions:

  1. ERRORS: there is mention of using the more descriptive error information from the API. Within the context of SessionManager/BagOStuff/MultiHttpClient, does that mean adding that to the existing logging output? Or is there somewhere else that detailed error information should appear?
  2. FORMAT: does it matter what format I store the structured session data in? I saw mention of JSON, and that's what my exploratory code uses right now. I don't think the API/service really cares (that's the beauty of opaque key/value storage). But with the extensive discussion of future key/map stores, including the possibility of someday setting fields within sessions, etc. I'd like to confirm JSON is acceptable/desirable for the implementation I'm writing now.
  3. CHECK-AND-SET: BagOStuff/RESTBagOStuff has CAS support, but there's a FIXME tag related to the $casToken in RESTBagOStuff::doGet(). Is this something I need to dig into, or is the existing behavior sufficient for everything that will use the new service?
  4. COLLATERAL DAMAGE: all I've tested is login and logout, and that's just locally on my MediaWiki vagrant. I'm concerned there are major things I'm missing. I see discussion about CentralAuth and OAuth, but I'm frankly clueless about how/if these are affected by what I'm doing. What might I be breaking that I need to be concerned with?
  5. SET-IF-NOT-EXISTS: the API uses POST (rather than RESTBagOStuff's PUT) for sets, so my KaskBagOStuff uses POST as well. Per https://www.mediawiki.org/wiki/Requests_for_comment/SessionStorageAPI#Notes, I will do nothing with PUT at this time, and we will not initially have set-if-not-exists support. This isn't really a question, I just wanted to explicitly state it.
  6. TTL: similarly, there is significant discussion of per-write TTLs, probably via a Cache-Control header, but the notes say this is for the future and we will initially not have per-write TTLs.

I think most of these are probably best addressed by someone familiar with session management in MediaWiki, but...

I have, on my local development wiki, a working KaskBagOStuff (naming suggestions welcome) that communicates with the development instance at deployment-sessionstore01.deployment-prep.eqiad.wmflabs (via some tricky port forwarding) and successfully allows basic session management for happy login and logout.

Out of curiosity, what is it that prevents you from using RESTBagOStuff?

I reviewed the following pages, which have some wide-ranging discussion:

  • https://phabricator.wikimedia.org/T206010
  • https://www.mediawiki.org/wiki/Requests_for_comment/SessionStorageAPI
  • https://www.mediawiki.org/wiki/Talk:Requests_for_comment/SessionStorageAPI
  • https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-11-14-22.04.log.html To make sure I understand the conclusions, I have a few questions:
    1. ERRORS: there is mention of using the more descriptive error information from the API. Within the context of SessionManager/BagOStuff/MultiHttpClient, does that mean adding that to the existing logging output? Or is there somewhere else that detailed error information should appear?
    2. FORMAT: does it matter what format I store the structured session data in? I saw mention of JSON, and that's what my exploratory code uses right now. I don't think the API/service really cares (that's the beauty of opaque key/value storage). But with the extensive discussion of future key/map stores, including the possibility of someday setting fields within sessions, etc. I'd like to confirm JSON is acceptable/desirable for the implementation I'm writing now.
    3. CHECK-AND-SET: BagOStuff/RESTBagOStuff has CAS support, but there's a FIXME tag related to the $casToken in RESTBagOStuff::doGet(). Is this something I need to dig into, or is the existing behavior sufficient for everything that will use the new service?
    4. COLLATERAL DAMAGE: all I've tested is login and logout, and that's just locally on my MediaWiki vagrant. I'm concerned there are major things I'm missing. I see discussion about CentralAuth and OAuth, but I'm frankly clueless about how/if these are affected by what I'm doing. What might I be breaking that I need to be concerned with?
    5. SET-IF-NOT-EXISTS: the API uses POST (rather than RESTBagOStuff's PUT) for sets, so my KaskBagOStuff uses POST as well. Per https://www.mediawiki.org/wiki/Requests_for_comment/SessionStorageAPI#Notes, I will do nothing with PUT at this time, and we will not initially have set-if-not-exists support. This isn't really a question, I just wanted to explicitly state it.

I think I mentioned this before, but I'll leave it here for posterity sake. Cassandra does have proper CAS support (via paxos), and so implementing this in Kask for session storage is doable. That said, it comes at a cost (extra round trips which limit throughput and increase latency) and so I'd like to avoid implementing this in Kask unless we need it. During the RFC process there seemed to be consensus that the read-modify-write that BagOStuff does in this context was safe, if that's the case, I think it'd be preferable to continue doing that. It might be beneficial though to double-check that assumption.

  1. TTL: similarly, there is significant discussion of per-write TTLs, probably via a Cache-Control header, but the notes say this is for the future and we will initially not have per-write TTLs.

During the RFC process, I took away that we needed exactly two TTL periods, one for sessions, and one for central auth...state (if that's the write word), where the latter needed to be somewhat less than the former to ensure correctness. If this is correct (and again, it would be good to double-check this assumption), then we could support this by deploying a separate instance of Kask for central auth (I think there is an argument for this being a Good Idea™ anyway).

Again, I'd only push back on implementing TTL support in the event we don't really need it, but this is a far smaller deal than with CAS above. If it would be at all helpful, let me know and we can add it straightaway. And, for what it's worth, it has been suggested that if we do support TTLs, that we require them to be less-than-or-equal-to the configured default.

I have, on my local development wiki, a working KaskBagOStuff (naming suggestions welcome) that communicates with the development instance at deployment-sessionstore01.deployment-prep.eqiad.wmflabs (via some tricky port forwarding) and successfully allows basic session management for happy login and logout.

Out of curiosity, what is it that prevents you from using RESTBagOStuff?

RESTBagOStuff is hard-coded to use PUT, while the API uses POST. Also, RESTBagOStuff is hard-coded to use PHP serialization for the value data, which I'm guessing nobody is in favor of. (Well, Joe wanted "json storing base64-encoded serialized php"... <grin>)

I rearranged things a bit so that RESTBagOStuff and KaskBagOStuff share almost all their code, except overrides for the essential differences, so there's not really going to be a lot of new code for this, mostly just some shuffling.

TTL: similarly, there is significant discussion of per-write TTLs, probably via a Cache-Control header, but the notes say this is for the future and we will initially not have per-write TTLs.

During the RFC process, I took away that we needed exactly two TTL periods, one for sessions, and one for central auth...state (if that's the write word), where the latter needed to be somewhat less than the former to ensure correctness. If this is correct (and again, it would be good to double-check this assumption), then we could support this by deploying a separate instance of Kask for central auth (I think there is an argument for this being a Good Idea™ anyway).

Again, I'd only push back on implementing TTL support in the event we don't really need it, but this is a far smaller deal than with CAS above. If it would be at all helpful, let me know and we can add it straightaway. And, for what it's worth, it has been suggested that if we do support TTLs, that we require them to be less-than-or-equal-to the configured default.

Do the existing configuration variables allow us to point session storage at one Kask instance and Central Auth at a different instance, or is there a bit more work for me to do there? (Not sure who this question is pointed at, but someone is bound to know...)

I have, on my local development wiki, a working KaskBagOStuff (naming suggestions welcome) that communicates with the development instance at deployment-sessionstore01.deployment-prep.eqiad.wmflabs (via some tricky port forwarding) and successfully allows basic session management for happy login and logout.

Out of curiosity, what is it that prevents you from using RESTBagOStuff?

RESTBagOStuff is hard-coded to use PUT, while the API uses POST. Also, RESTBagOStuff is hard-coded to use PHP serialization for the value data, which I'm guessing nobody is in favor of. (Well, Joe wanted "json storing base64-encoded serialized php"... <grin>)

I rearranged things a bit so that RESTBagOStuff and KaskBagOStuff share almost all their code, except overrides for the essential differences, so there's not really going to be a lot of new code for this, mostly just some shuffling.

AFAIK, RESTBagOStuff was written for this purpose (though quite some time ago, long before the RFC established what the API would be). I wonder if you can just...change it?

TTL: similarly, there is significant discussion of per-write TTLs, probably via a Cache-Control header, but the notes say this is for the future and we will initially not have per-write TTLs.

During the RFC process, I took away that we needed exactly two TTL periods, one for sessions, and one for central auth...state (if that's the write word), where the latter needed to be somewhat less than the former to ensure correctness. If this is correct (and again, it would be good to double-check this assumption), then we could support this by deploying a separate instance of Kask for central auth (I think there is an argument for this being a Good Idea™ anyway).

Again, I'd only push back on implementing TTL support in the event we don't really need it, but this is a far smaller deal than with CAS above. If it would be at all helpful, let me know and we can add it straightaway. And, for what it's worth, it has been suggested that if we do support TTLs, that we require them to be less-than-or-equal-to the configured default.

Do the existing configuration variables allow us to point session storage at one Kask instance and Central Auth at a different instance, or is there a bit more work for me to do there? (Not sure who this question is pointed at, but someone is bound to know...)

I'm curious about this as well.

WDoranWMF set the point value for this task to 2.Mon, Apr 15, 3:23 PM