
RFC: Backend for synchronized data from Wikipedia mobile apps
Open, Needs Triage, Public

Description

  • Affected components: TBD.
  • Engineer for initial implementation: TBD.
  • Code steward: TBD.

Motivation

Our mobile apps (namely the Android app, for now) will need to synchronize certain data with the user's account on Wikipedia, so that the user's data will persist across different devices, and be accessible from different platforms. This includes various preferences that the user sets within the app, as well as more complex user-created structures like private reading lists, which are currently being developed client-side.

While it is possible to use userjs preferences for storing this information, it becomes impractical for the more complex data (such as reading lists), because all userjs options are transmitted with each pageview for a logged-in user, which would make pageview payloads inefficient for heavy users of these features.

Requirements

(Specify the requirements that a proposal should meet.)


Exploration

Proposal: Authenticated key-value store

Implement a simple, private, per user key-value storage API.

Each user will have their own keyspace, and the keyspace used will always be that for the currently-authenticated user. There will be no access to other users' storage other than by logging into the other user's account (e.g. in a separate session). This avoids one of the major complaints about Gather: since Gather lists were publicly visible, it required policing for violations of policies which the community was not inclined to perform.

The store will provide no "revision history" and no logging: when updating or deleting a value, the old value is erased without possibility of recovery. Logging and/or history are required when a resource may be changed by multiple users or is publicly visible. Neither is the case here, and omitting them significantly reduces the complexity of the implementation.

Operations supported on the store will minimally include get, set, add, and delete. Ideally, compare-and-swap (CAS) will be supported for modifications, and ideally batch operations (e.g. multiple gets or sets in one request) will be allowed.
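To make the proposed operation set concrete, here is a minimal in-memory sketch in Python. It is not the proposed implementation, which would live in the action API or RESTBase. It assumes the caller has already resolved the authenticated user id. The limits reuse the user_properties figures from the open questions, and the version-number scheme for CAS is an illustrative choice.

```python
import threading
from typing import Dict, Iterable, Optional, Tuple

# Illustrative limits borrowed from the user_properties comparison in the open
# questions; the RFC leaves the real limits undecided.
MAX_KEY_BYTES = 255
MAX_VALUE_BYTES = 65535


class UserKeyValueStore:
    """In-memory sketch of a private, per-user key-value store.

    Every call takes the id of the *authenticated* user, so a caller can only
    reach that user's keyspace. There is no history: set and delete simply
    overwrite or erase the old value.
    """

    def __init__(self) -> None:
        # user_id -> {key: (value, version)}
        self._data: Dict[int, Dict[str, Tuple[str, int]]] = {}
        self._lock = threading.Lock()
        # Store-wide counter, so a deleted and re-added key never reuses a version.
        self._last_version = 0

    def _space(self, user_id: int) -> Dict[str, Tuple[str, int]]:
        return self._data.setdefault(user_id, {})

    def _write(self, user_id: int, key: str, value: str) -> int:
        # Caller must hold the lock.
        self._last_version += 1
        self._space(user_id)[key] = (value, self._last_version)
        return self._last_version

    @staticmethod
    def _check(key: str, value: str) -> None:
        if len(key.encode("utf-8")) > MAX_KEY_BYTES:
            raise ValueError("key too long")
        if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
            raise ValueError("value too long")

    def get(self, user_id: int, key: str) -> Optional[Tuple[str, int]]:
        """Return (value, version), or None if the key is absent."""
        with self._lock:
            return self._space(user_id).get(key)

    def set(self, user_id: int, key: str, value: str) -> int:
        """Store unconditionally; returns the new version."""
        self._check(key, value)
        with self._lock:
            return self._write(user_id, key, value)

    def add(self, user_id: int, key: str, value: str) -> bool:
        """Store only if the key does not exist yet."""
        self._check(key, value)
        with self._lock:
            if key in self._space(user_id):
                return False
            self._write(user_id, key, value)
            return True

    def cas(self, user_id: int, key: str, value: str, expected_version: int) -> bool:
        """Compare-and-swap: write only if nobody changed the key since it was read."""
        self._check(key, value)
        with self._lock:
            current = self._space(user_id).get(key)
            if current is None or current[1] != expected_version:
                return False
            self._write(user_id, key, value)
            return True

    def delete(self, user_id: int, key: str) -> bool:
        """Erase a key; the old value is not recoverable."""
        with self._lock:
            return self._space(user_id).pop(key, None) is not None

    def get_many(self, user_id: int, keys: Iterable[str]) -> Dict[str, str]:
        """Batch get: returns values for the keys that exist."""
        with self._lock:
            space = self._space(user_id)
            return {k: space[k][0] for k in keys if k in space}
```

In this sketch, a client reads a value, keeps the returned version, and passes it to `cas`. If another device has written in the meantime, the version has moved on and the write is refused rather than silently lost.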

Open questions:

  • Should this be implemented as a MediaWiki action API endpoint or a restbase service?
    • As a MediaWiki action API endpoint, it would be available in all MediaWiki installations without further effort and could potentially reuse existing code for communicating with storage backends. @Anomie will likely write and maintain it in this case.
    • As a restbase service, it might be easier to integrate a backend that isn't already supported by MediaWiki, and the input format wouldn’t necessarily be constrained to being equivalent to HTTP form posts. A developer willing to create and maintain it would need to be found.
  • What backend should be used to store the data?
    • If we go the action API route: The easy solution would be an SQL table, much like the existing user_properties table. On the other hand, with a little effort we could abstract the backend so that different solutions can be plugged in without rewriting everything; in this case, would it be best to use an existing abstraction such as BagOStuff or create a new one?
  • What limits should be placed on the implementation?
    • Key length? (for comparison, user_properties limits to 255 bytes)
    • Value length? (for comparison, user_properties limits to 65535 bytes)
    • Total number of keys or total value size (per user)?
  • Should there be one store per wiki, or a global store? Or, in other words, should using the store require a centralized account?
  • Should expiration be supported?
  • Should enumeration of keys be supported? For example, "return all keys with prefix 'foo'".
  • Should non-string values be natively supported in some manner?
    • We recommend no. Clients may store non-string values in a serialized format (e.g. json), or they may use one key per value and an additional "index" key if necessary.
  • Should "tagging" be natively supported in some manner?
    • We recommend no. Clients wanting tagging can easily enough implement it on top of the existing storage by using a key to store the list of keys having a particular tag.
  • Does anyone have ideas for preventing misuse (cf. Commons being used for illegal file sharing) besides setting a relatively low limit on total data per user?
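The two "we recommend no" answers above assume that clients can layer richer structures over plain string values. The following sketch shows one way to do that. It uses a plain dict as a stand-in for one user's keyspace. The `tag:<name>` index-key convention and the helper names are hypothetical.

```python
import json
from typing import Any, Dict, List

# A plain dict stands in for one user's string-only keyspace.
Store = Dict[str, str]


def put_json(store: Store, key: str, obj: Any) -> None:
    """Non-string values are serialized client-side, here as JSON."""
    store[key] = json.dumps(obj)


def get_json(store: Store, key: str, default: Any = None) -> Any:
    raw = store.get(key)
    return default if raw is None else json.loads(raw)


def tag_key(tag: str) -> str:
    # Hypothetical naming convention for the per-tag "index" key.
    return "tag:" + tag


def add_tag(store: Store, key: str, tag: str) -> None:
    """Record `key` in the index key that lists everything carrying `tag`."""
    members: List[str] = get_json(store, tag_key(tag), [])
    if key not in members:
        members.append(key)
        put_json(store, tag_key(tag), members)


def keys_with_tag(store: Store, tag: str) -> List[str]:
    return get_json(store, tag_key(tag), [])
```

The index key is a read-modify-write hot spot. Without CAS, two devices that tag items at the same time can overwrite each other's update, which is one argument for supporting CAS in the store.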


Event Timeline


We originally set the priority and put this in the backlog before @Anomie wrote the description and sent his update to wikitech-l.

If the request here is to share mobile app configuration between devices, I would like to better understand why people feel that storing and managing outside/external client applications' preferences should be the purview or responsibility of MediaWiki.

In thinking a bit more about this task and as already alluded to in previous comments, if MediaWiki's involvement is needed here, it already has both authenticated and unauthenticated key–value stores. The revision/text/page/user database tables provide one version. The watchlist/page/user tables provide a second version. The categorylinks/category/page tables provide a third version. The user_properties/user tables provide a fourth version.

I think pinning Gather's failure on having public lists dramatically misses the point that both Gather's implementation and deployment were poorly managed. A better example to look at might be MassMessage and its use of ContentHandler. With the type of flexibility that I think is being sought here—support for both blobs of structured data and support for things that might be a bit more arbitrary—MediaWiki's page objects already provide this. As an added bonus, they also come with versioning, monitoring, content suppression capability, anti-abuse features, limited write restrictions, and even more limited read restrictions.

If the limited write and read restrictions are truly problematic, I'd much rather see these two very common feature requests properly addressed instead of trying to work around them by building yet another separately managed set of tables.

I can't help but think of https://www.mediawiki.org/wiki/Everything_is_a_wiki_page. While using wiki pages undeniably has its own set of challenges, it immediately answers open questions in this task about limiting key and value sizes (wiki pages have a maximum page.page_title length and a maximum page content size) or listing keys by prefix (wiki page titles already support listing by prefix due to the unique index we place on page.page_title). Wiki pages also support "tagging" via the categorization system. (The categorization system is another piece of MediaWiki infrastructure that could desperately use love instead of building out yet another feature to indefinitely support.)

brion added a comment. Jul 20 2016, 6:43 PM

User preferences are not good for this sort of usage as they're packed into a single blob that gets shipped around, etc. Here, we want a separate blob that doesn't get shipped around in other places and may grow arbitrarily large (though will usually be either not present for a given user, or relatively small).

Watchlist doesn't cover the case because it's not the watchlist, it's a separate list. (And I don't know how much other data folks may want to store.)

Pages are conceivable as a backing store, but our data management model makes pages public by default, whereas preferences and watchlists are not. It also introduces a revisioning model that is explicitly not asked for here. It also pushes the storage for the data into a combination of primary page/revision database and the primary ES text store, something data folks are asking to avoid.

As far as I know, tagging, categories, and sorting anything in any particular order are not requirements asked for here.

Could perhaps be server-"accelerated" by doing a bunch of page status checks from the server-side copy of the list and returning a short list of 'pages needing handling', but I don't know if that's something that's planned or necessary.

If it were something the apps teams want to do, they'd need to write an extension that hooked the appropriate events and did whatever it is they want done. It's well outside the scope of this RFC.

I find this task difficult to follow.

MediaWiki core already has:

  • a per-user list in the form of Special:Watchlist
  • public page lists in the form of categories

While "reading lists" is one of the reasons the app teams want this, this isn't an implementation of page lists.

  • per-user preferences that work with MediaWiki extensions and gadgets

This is discussed in the description. See the second paragraph under "Background".

Regarding user preferences, how do OAuth applications handle this?

If you mean the app preferences, that's outside the scope of MediaWiki and OAuth. This would be for something like the configuration of the font size in the mobile app.

If you're meaning permission for OAuth consumers to access the stored data, most likely there would be a new user right that would have to be granted for an OAuth consumer to be able to make use of the store.

Can you really say that the data won't ever need to be joined?

Yes.

It seems like joining against the page table to check for page existence for reading lists, for example, is an obvious use-case.

If they want to do that with their reading lists, then they would need to implement something else to store them. BTW, such a thing would be complicated by the fact that their "reading lists" plan is to have one list with titles from multiple wikis, so a join would be complicated in any case. That's all well outside the scope of this proposal.

Not having any ability to undo or track changes also seems foolish given that you're dealing with users and user input. (While watchlists are similar in not tracking changes or allowing removed entries to be easily re-added, this is more of a bug than a feature.)

User preferences don't have history or undo either. I can't think of anything that's user-private that does, actually.

RobLa-WMF triaged this task as High priority.

We originally set the priority and put this in the backlog before @Anomie wrote the description and sent his update to wikitech-l. We're speaking about it in E234

Tgr added a comment. Jul 20 2016, 8:27 PM

Watchlist doesn't cover the case because it's not the watchlist, it's a separate list. (And I don't know how much other data folks may want to store.)

Global watchlists (T126641: [RFC] Devise plan for a cross-wiki watchlist back-end) are close enough and abstracting them into something more generic that can store multiple lists would be another way to fulfill the reading list use case. Maybe worth discussing at E235: ArchCom RFC Meeting W29: Devise plan for a cross-wiki watchlist back-end (2016-07-20, #wikimedia-office)? It would probably be more complex (especially given the slightly different requirements for the different types of lists) than a simple key-value store, though.

brion added a comment. Jul 20 2016, 8:30 PM

One open question that I see coming up in quick discussions is the abuse question -- there are I think two main areas of potential abuse for single-user-access blob storage:

  1. denial of service (storing lots of data for the lulz)
  2. inappropriate file sharing (store copyrighted or illegal files as blobs, share an account)

I suspect both methods can already be abused somewhat with user prefs and other things, though an overly simplistic per-user blob store could increase size limits (not sure if/what limits are on prefs now offhand). In both cases, an abuse tool would need client-side code of some kind (JS running on site, or some client tool) since it's not directly HTTP-addressable.

Formatting requirements could make it harder to store arbitrary file blobs, but nothing makes it impossible. (Eg, store your evil data as base64 strings in a JSON structure.)
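The following sketch illustrates why format checks alone do not stop blob smuggling. It assumes a hypothetical rule that only accepts JSON objects with string fields, and shows that arbitrary binary data still passes once it is base64-encoded.

```python
import base64
import json


def is_valid_document(raw: str) -> bool:
    """Hypothetical format rule: value must be a JSON object of string fields."""
    try:
        doc = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(doc, dict) and all(isinstance(v, str) for v in doc.values())


# Arbitrary binary data (all 256 byte values, most not valid UTF-8 text) ...
payload = bytes(range(256))
# ... wrapped as base64 inside an innocent-looking JSON structure.
smuggled = json.dumps({"note": base64.b64encode(payload).decode("ascii")})

accepted = is_valid_document(smuggled)                      # passes the format check
recovered = base64.b64decode(json.loads(smuggled)["note"])  # original bytes come back intact
```

Base64 inflates data by roughly a third, but that is only a constant factor. A per-user byte quota is what actually bounds this kind of abuse.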

(not sure if/what limits are on prefs now offhand)

There's no limit on number of entries or total size, as far as I know.

How would everyone feel about discussing this in next week's ArchCom RFC office hour (E237)?

How would everyone feel about discussing this in next week's ArchCom RFC office hour (E237)?

Works for me.

User preferences are not good for this sort of usage as they're packed into a single blob that gets shipped around, etc. Here, we want a separate blob that doesn't get shipped around in other places and may grow arbitrarily large (though will usually be either not present for a given user, or relatively small).

We could fix/change this architecture. I filed T140858 specifically about reconsidering outputting every user option into the page HTML.

I find this task difficult to follow.

MediaWiki core already has:

  • a per-user list in the form of Special:Watchlist
  • public page lists in the form of categories

While "reading lists" is one of the reasons the app teams want this, this isn't an implementation of page lists.

Let's say that tomorrow you had this authenticated key–value store implemented, what would you use it for specifically? I get the feeling that every time someone asks for a use-case, there's weirdly a lot of hand-waving.

  • per-user preferences that work with MediaWiki extensions and gadgets

This is discussed in the description. See the second paragraph under "Background".

Quoting that paragraph:

While it is possible to use userjs preferences for storing this information, it becomes impractical for the more complex data (such as reading lists), because all userjs options are transmitted with each pageview for a logged-in user, which would make pageview payloads inefficient for heavy users of these features.

So we already have an authenticated per-user key–value store. Could you, for example, add a new key/prefix, similar to userjs, that simply gets omitted from the HTML?
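This sketch illustrates the suggested workaround. It assumes a hypothetical `_` prefix marks options that should be left out of the page HTML, similar to a convention floated later in the thread. The render function is only a stand-in, not how MediaWiki actually emits user options.

```python
import json
from typing import Dict

# Hypothetical prefix marking options that must never be embedded in pages.
HIDDEN_PREFIX = "_"


def options_for_page_html(options: Dict[str, str]) -> Dict[str, str]:
    """Subset of a user's options that would be serialized into each pageview."""
    return {k: v for k, v in options.items() if not k.startswith(HIDDEN_PREFIX)}


def render_options_script(options: Dict[str, str]) -> str:
    # Stand-in for page output; MediaWiki's real mechanism differs.
    payload = json.dumps(options_for_page_html(options), sort_keys=True)
    return "<script>var userOptions = %s;</script>" % payload
```

Filtering at output time only reduces HTML size. As objections later in the thread point out, the hidden options would still be loaded on every request and returned wholesale by the options API.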

Regarding user preferences, how do OAuth applications handle this?

If you mean the app preferences, that's outside the scope of MediaWiki and OAuth. This would be for something like the configuration of the font size in the mobile app.

You say that app preferences are outside the scope of MediaWiki... what do you want to use this authenticated key–value store for, exactly? I'm still lost.

Let's say that tomorrow you had this authenticated key–value store implemented, what would you use it for specifically?

What would I use it for? Nothing. I'm not involved with the mobile apps.

So we already have an authenticated per-user key–value store. Could you, for example, add a new key/prefix, similar to userjs, that simply gets omitted from the HTML?

Such a thing would be possible, but would make the code handling user options even more complex.

Regarding user preferences, how do OAuth applications handle this?

If you mean the app preferences, that's outside the scope of MediaWiki and OAuth. This would be for something like the configuration of the font size in the mobile app.

You say that app preferences are outside the scope of MediaWiki... what do you want to use this authenticated key–value store for, exactly? I'm still lost.

MediaWiki and the OAuth extension don't care how an app stores its user preferences, although this proposed API would be one way that an app could do so.

To clarify a few points, in anticipation of E237...

The Android app actually already uses user-js preferences for storing some of our user settings within the app. The only thing that prevented us from using user-js for saving reading lists is the fact that they're transmitted unconditionally with every pageview.

So, technically all we would need is something that's equivalent to user-js, except not sent with every pageview (sent only upon specific request). We wouldn't need it to support tagging / categories / sorting / expiration / etc. A lot of that can be done client-side, if necessary.

Either that, or...

If we single out reading lists as a use case, the "ultimate" way of saving them is to implement multiple watchlists in MediaWiki, and allow watchlists to be named, and allow watchlists to be cross-wiki. But AFAIK this is still a long way off.

So, a good intermediate solution should be a balance of something that's painless to implement in the backend, but flexible and general enough to support the needs of the apps (and other clients) in the short- to possibly-long term.

Tgr added a comment. Edited Jul 27 2016, 6:32 PM

We could fix/change this architecture. I filed T140858 specifically about reconsidering outputting every user option into the page HTML.

There are many problems with using user_properties:

  • they are loaded on every request, which would lead to memory and performance issues if they were used more heavily
  • they are output into the HTML on every request
  • there is no way to query them individually via the API (all options will be output into the API response)
  • the values are limited to 65535 bytes which is actually not that much for storing something like article lists (the limit for page titles is 255 bytes so a few hundred article names fit into a single record at most, even if you go for an efficient format and not e.g. JSON with metadata which would otherwise be a much more convenient choice)
  • they are not cross-wiki (you can get around that by selecting a specific wiki and using that, like the mobile apps do with meta, but there are many problems with that - users not being logged in on that wiki, users not having an attached account on that wiki, having to build in assumptions about Wikimedia's farm setup into supposedly generic software)
  • in general mixing a key-value store (userjs-*) with an internal configuration store seems like questionable design.
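The arithmetic behind the 65535-byte point can be checked with a small calculation. This sketch counts how many maximum-length (255-byte) titles fit in one value under two encodings: a compact newline-separated list, and JSON with per-entry metadata. The `wiki` and `added` fields are hypothetical examples of such metadata.

```python
import json
from typing import Callable, List

MAX_VALUE_BYTES = 65535  # user_properties value limit cited above


def newline_format(titles: List[str]) -> bytes:
    return "\n".join(titles).encode("utf-8")


def json_format(titles: List[str]) -> bytes:
    # Heavier but more convenient: one object per entry, with (hypothetical) metadata.
    entries = [{"wiki": "enwiki", "title": t, "added": "2016-07-27T18:32:00Z"}
               for t in titles]
    return json.dumps(entries).encode("utf-8")


def max_titles(title: str, encode: Callable[[List[str]], bytes]) -> int:
    """Largest n such that n copies of `title` encode within MAX_VALUE_BYTES."""
    fits = lambda n: len(encode([title] * n)) <= MAX_VALUE_BYTES
    lo, hi = 0, 1
    while fits(hi):          # grow until hi no longer fits
        lo, hi = hi, hi * 2
    while hi - lo > 1:       # binary search: lo fits, hi does not
        mid = (lo + hi) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

With 255-byte titles, the newline format fits 256 entries and the JSON-with-metadata format fits 204. Shorter real-world titles fit more, but the order of magnitude matches "a few hundred at most" for the worst case.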

In short, user_properties is aimed at storing a small amount of internal per-wiki configuration settings which are needed very often. The key-value store would be aimed at storing large amounts of global data which are needed infrequently. It could replace userjs-* keys, although those do not seem heavily used anyway.

(FWIW enwiki users now have 58 different userjs-* keys, of which 10 seem to be used by more than a handful of users. Most of those seem to be related to WMF products, with the exception of Wikipedia:igloo. Other large wikis use them even less.)

Anomie updated the task description. Jul 27 2016, 8:50 PM
Anomie updated the task description.
daniel added a subscriber: daniel. Jul 27 2016, 9:40 PM

There is a proposed standard called "RemoteStorage" that seems to fit the bill pretty well: https://remotestorage.io/. It defines an OAuth protected online storage interface based on a REST interface. The idea being that clients should be able to choose where they store their data, and storage providers should all talk the same protocol. The spec seems to be pretty mature, see https://datatracker.ietf.org/doc/draft-dejong-remotestorage/. I'm not sure how mature the available implementations are, but even if we end up writing our own K/V storage interface, we should perhaps follow this spec. Or at least evaluate it.

GWicke added a subscriber: GWicke. Edited Jul 27 2016, 9:41 PM

Another point we should consider is the requirement for data format versioning & migration. While it is fine to handle all validation & migration in a single client, doing the same consistently across a large number of clients would be a challenge at best. Essentially, unconstrained blobs would make the data private to a single client. Even a single client like the app can run into problems. For example, an old version of the Android app might not have the code to gracefully deal with newer formats unless formats are only ever changed in backwards-compatible ways, and unknown information is carefully preserved.

All of this would be less of an issue if we handled schema validation & migration on the server. Setting up a key-value bucket for each use case along with a schema & documentation is pretty easy, and I think would provide a better balance of flexibility, stability & usability. The per-usecase separation would allow us to document & version each API, deprecate & drop APIs that are no longer needed, and set appropriate quotas & rate limits per use case.
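The following sketch shows what server-side schema handling for one use-case bucket might look like. Stored documents carry a version, are upgraded step by step when read, and are validated when written. Unknown fields are carried through so that older clients do not destroy newer data. The "reading list" fields and both migrations are invented for illustration.

```python
from typing import Any, Callable, Dict

Doc = Dict[str, Any]


# Hypothetical migrations for a "reading list" bucket; each upgrades exactly
# one version and copies the document, keeping any fields it doesn't know.
def _v1_to_v2(doc: Doc) -> Doc:
    doc = dict(doc)
    doc["title"] = doc.pop("name")  # invented change: "name" renamed to "title"
    return doc


def _v2_to_v3(doc: Doc) -> Doc:
    doc = dict(doc)
    doc.setdefault("description", "")  # invented change: new optional field
    return doc


MIGRATIONS: Dict[int, Callable[[Doc], Doc]] = {1: _v1_to_v2, 2: _v2_to_v3}
CURRENT_VERSION = 3
REQUIRED_FIELDS = {"title": str, "description": str, "entries": list}


def migrate(doc: Doc) -> Doc:
    """Upgrade a stored document, one step at a time, to CURRENT_VERSION."""
    version = doc.get("version", 1)
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version += 1
        doc["version"] = version
    return doc


def validate(doc: Doc) -> None:
    """Reject writes that don't match the current schema; extra fields pass."""
    if doc.get("version") != CURRENT_VERSION:
        raise ValueError("clients must write version %d" % CURRENT_VERSION)
    for field, typ in REQUIRED_FIELDS.items():
        if not isinstance(doc.get(field), typ):
            raise ValueError("missing or invalid field: " + field)
```

Keeping the migrations on the server means each client only has to understand the current version. The per-bucket version number also provides a natural place to document, deprecate, and eventually drop an API, as the comment suggests.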

Great IRC conversation today! I've posted the log in the event (E237). Gergo's concerns outlined in T128602#2499662 sound like dealbreakers for the user_properties approach. There's a more careful summarization of E237 that one of us should do, but it seems like either a new DB table, a RESTbase backend, or some sort of 3rd party storage system are the most viable alternatives we discussed (right?)

Here's the beginning of a detailed comment @jcrespo made on E237:

In E237#2836, @jcrespo wrote:

For the specific use case of reading lists (this is not for the general case, everything you mentioned regarding generic store solution still stands):

Jaime then proposes a schema and shows how we might use it. If I read this correctly, Jaime's comments seem to comport with what @tstarling suggested during the meeting:

21:28:19 <TimStarling> my vote is to just add a table
...
21:28:46 <TimStarling> avoid joins so that you can hack up some cross-server thing later if need be, query groups or something

Both of you seem to be suggesting that we avoid overengineering something to try solving the specific case of better watchlist management, correct?

Hopefully these two goals aren't mutually exclusive:

  1. Accelerating progress on our watchlist data architecture
  2. Providing mobile apps (and other client software) generic key-value storage for ease of developer prototyping of account-specific features

daniel added a comment. Edited Aug 3 2016, 8:21 PM

I would like to push a bit more on looking into the IETF RemoteStorage spec proposal. It seems to fit the bill quite well. I believe that RemoteStorage should be evaluated to answer two questions:

  • can we use an existing RemoteStorage implementation to cover the needs put forth by this RFC? (And if not, why not?)
  • If we can't use an existing implementation, should we implement the RemoteStorage protocol as specified? (And if not, why not?)

From skimming the spec, it seems like a good approach, and designed exactly for our use case. So we should at least consider going with a (semi-)well known spec.

I would like to push a bit more on looking into the IETF RemoteStorage spec proposal.

It's worth pointing out a couple of things about that spec:

  • It's an Internet Draft. Those are fairly easy to produce, and all have this disclaimer: Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
  • It doesn't appear to be the product of a working group. One would have to dig further to figure out what that means.
  • The version you're citing "Expires: 15 December 2014". Fortunately, there appears to be a more up-to-date version of it available: https://datatracker.ietf.org/doc/draft-dejong-remotestorage/

My inclination is also to break our habit of attempting to improve upon crufty inventions like the wheel. That said, this particular spec doesn't appear to be ready (based on my cursory investigation).

There are probably some good lessons to be learned from the document, and from the discussions around it. It's probably worth at least a skim. Does it make sense to include this in the reference material for this RFC?

  • If we can't use an existing implementation, should we implement the RemoteStorage protocol as specified? (And if not, why not?)

If our own implementation is in the action API, "why not?" is because that draft's request/response paradigm (using HTTP verbs and statuses) doesn't match the action API's (using HTTP as a transport, with its own verbs and status reporting mechanism) at all. OTOH, the draft's paradigm seems ideally suited to restbase as I understand it.

Tgr added a comment. Edited Sep 14 2016, 3:18 AM

I took some time to look into remoteStorage. The community behind the spec seems reasonably active: there are multiple people involved in maintaining the draft, and regular pull requests / wiki edits / forum posts; there are at least four independent server implementations, and lots of clients. (See github:remotestorage/spec:source.txt for the latest version of the spec, community.remotestorage.io and wiki.remotestorage.io for the community, unhosted.org for the wider vision.)

There are two design choices that make it a poor match for our use case: that they are aiming for more of a file server than a key-value store, and that they are aiming for public stores (i.e. a store might be tied to a specific user, but not to a specific application or application provider). Which makes sense; TBH the point of trying to conform to a public data read/write standard (with capability discovery and a protocol for asking user permission and whatnot) for implementing our own data storage is entirely beyond me.

More specifically,

  • remoteStorage stores documents, not strings; that results in a bunch of requirements that we otherwise wouldn't need (the ability to store item metadata such as content-type, ability to handle chunked PUT requests, a rather complex versioning scheme)
  • remoteStorage implementations must be able to list the keys. It's implemented in a way that imitates a directory tree, which would force us to care about "directories" not growing too large, which would mean applications could not choose the keys freely and could be forced to use some sort of hashed subdir scheme.
  • Authentication is by CORS and OAuth 2.0 bearer tokens or Kerberos. CORS is problematic if we ever want to use the K-V store in browsers, since IE9 XDomainRequest does not support REST verbs; we can avoid that problem by hosting on the same domain but that would make it even more pointless than it already is. OAuth means we would have to host two OAuth servers, one for MediaWiki, the other for the remoteStorage service, both with their own authorization dialogs, which seems super confusing. (Or I suppose we could set up another endpoint where a client can exchange a session ID for a bearer token... ofc that would mean not using any of the reference implementations.) Kerberos is only mentioned in passing as an alternative and I'm not familiar with it, so no idea how that would work.
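The "hashed subdir scheme" mentioned above is the kind of workaround applications would have to adopt. This sketch shows what one might look like. Flat keys are mapped onto nested paths derived from a hash, so that no single directory listing grows too large. SHA-1, two levels, and two hex digits per level are arbitrary illustrative choices, not anything the remoteStorage spec prescribes.

```python
import hashlib
from urllib.parse import quote


def hashed_path(key: str, levels: int = 2, width: int = 2) -> str:
    """Map a flat key onto a nested "directory" path.

    The first `levels` chunks of a hex digest become directory names, so keys
    spread evenly over 16 ** (levels * width) leaf directories (65536 with the
    defaults) instead of piling up in one listing.
    """
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    dirs = [digest[i * width:(i + 1) * width] for i in range(levels)]
    # Percent-encode the key for the leaf name so characters like "/" cannot
    # create extra path segments.
    return "/".join(dirs + [quote(key, safe="")])
```

The cost is visible in the result: keys are no longer grouped meaningfully, so listing "all keys with prefix X" turns into a walk over every hashed directory. This is part of why a directory-tree model fits a plain key-value use case poorly.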

Tgr added a comment. Sep 14 2016, 4:02 AM

I'll try to summarize what are the options and their status as I understand them:

  1. do not build a key-value store, write a dedicated domain-specific API every time we need authenticated data storage (for the current apps need which resulted in this RfC, that would probably mean some sort of multiple watchlists feature in core).
    • That seems like the ideal long-term solution. I mainly see the key-value store as a rapid prototyping tool that would allow us to quickly build client features that need data storage, and cheaply modify or discard them as our understanding of use cases changes. (Yes, the WMF shares the standard problem where prototypes are promoted to final products without any change out of laziness / lack of resources. That does not mean prototypes are a bad idea; not following up on them is.)
    • In the shorter term, I worry that we would build an API without a good understanding of what we need of it (we certainly did that with Gather, which had a somewhat similar scope). Also, there is some major work happening on watchlists already; trying to work in parallel on cross-wiki watchlists and multiple watchlists would be unlikely to work out well.
  2. build it as a new action API module.
    • This still seems like the reasonable thing to me. We could free-ride on a lot of things that MediaWiki and the API already provide (authentication, DB handling with cross-wiki access, continuations/batching, centralauthtoken API etc.), so it would be fairly easy to do.
  3. add it to the options / userinfo APIs (the user_props table) with some hacks, e.g. don't embed options starting with a _ into the page HTML
    • still seems like a bad idea to me for the reasons expressed in T128602#2499662 (most fundamentally, it would be a big pile of hacks and I don't see what the advantage would be compared to doing it cleanly; it's doubtful that it would be significantly less work)
  4. build it as a RESTBase service.
    • This also looks like a reasonable option, although I don't know enough about RESTBase to really judge (we get a lot of things for free with the action API; not sure how much that'd be true for RESTBase). In any case, that would be out of scope for Reading Infrastructure; maybe MCS could do that.
  5. use some external tool, such as a server that implements remoteStorage.
    • remoteStorage seems to be a poor match per T128602#2635622; I don't think any other alternative came up.

The RfC was stalled on making a decision (and on reviewing the remoteStorage draft which is now done). What would be the process for moving it forward?

He7d3r added a subscriber: He7d3r. Oct 17 2016, 12:54 PM
dr0ptp4kt moved this task from Backlog to Next Up on the Reading-Admin board.

I see Remotestorage has some downsides, I guess it also depends on how much you'd like to give users the ability to choose their own server to store their data on rather than 'just' putting it on a wikimedia server.

If Remotestorage is still a contender, perhaps we could do something together in that area with Nextcloud - we'd like to give people the ability to store data in a location of their choice (be it a private Nextcloud server or one at a provider), and I can imagine a scenario where Wikimedia would run a Nextcloud instance to store this (and other?) data for users while enabling the users to pick a server of their own choosing as the store instead. There are already some 5-10 million Nextcloud users (I'm counting its predecessor, ownCloud, here too) and we would be interested in having Remotestorage support.

Even without Remotestorage, with a more custom API, Nextcloud might be a solution for data storage: we have a rather nice app development interface, which would perhaps make it easier to support various data storage needs. WebDAV, CalDAV, and CardDAV are already built in. And scaling Nextcloud is a pretty 'solved' problem. See https://docs.nextcloud.com/server/10/admin_manual/installation/deployment_recommendations.html - note that the numbers there assume every user connects to the server 2-3 times per minute (sync clients...). An app that only queries for bookmarks when it is used generates far less load, so the same setup could probably serve around 100x more users.

Just some food for thought ;-)

Aklapper removed RobLa-WMF as the assignee of this task. Nov 7 2016, 11:11 PM
daniel changed the task status from Open to Stalled. Jul 22 2019, 4:50 PM

Interest in this seems to have stalled. Please re-open if there is still a need for this.

Restricted Application added a project: Platform Engineering. Dec 18 2019, 8:28 AM
kostajh changed the task status from Stalled to Open. Dec 18 2019, 8:30 AM
kostajh added a subscriber: pmiazga.

Interest in this seems to have stalled. Please re-open if there is still a need for this.

I'd say we (Growth-Team) are interested in this (per T241037 and T223645), and I think Readers Web (cc @pmiazga) has an interest in it as well.

I had a Gerrit repo created for this a while back, thinking I'd work on it in the course of a separate project, but ended up going in a different direction.

https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/AuthenticatedKeyValueStore

daniel raised the priority of this task from High to Needs Triage. Dec 18 2019, 9:47 PM
Krinkle moved this task from Under discussion to Old on the TechCom-RFC board. Dec 18 2019, 9:48 PM
Krinkle added a subscriber: Krinkle.

Moving to the backlog until it is clearer which team(s) will tentatively commit to maintaining and resourcing the implementation. It sounds like the underlying product use case(s) may also be accommodated in the platform by other means outside this RFC. We'll await the outcome of that decision.

OK, it sounds like people want the feature, but for it to continue to be considered, it needs a team to adopt and resource it. Does one of the teams mentioned above want to move this forward?

eprodromou added a subscriber: eprodromou.

This proposal ticks a lot of the boxes we use to evaluate projects: there's a clear need, there's current interest from the Product department, and it also seems to be related to our recent work with Kask, MainStash, etc.

I've moved this project to our Future Initiatives board, and we'll be evaluating for resourcing and scheduling. If CPT decides not to pick up this project, we'll let everyone know ASAP (and why).

Joe added a subscriber: Joe. · Mar 4 2020, 8:52 AM

I think the task description here is mixing a feature request and implementation details.

AIUI, the feature request is to be able to store data (I'm not sure why key-value-only access is specified; we can revisit that) on the backend on a per-user basis, where "users" means logged-in wiki users.

My questions would be:

  • Why is a different storage system needed, compared to the current user preferences system?
  • Why a generic interface, rather than a more structured interface for interacting with user data via the MediaWiki API?
  • Have privacy considerations (in particular regarding regulations worldwide) been taken into account?

FWIW, I don't think we should focus on creating a generic "per-user storage space" unless it's specifically needed; we should instead either expand or reuse the current framework for storing user preferences.

bearND added a comment. · Mar 4 2020, 3:59 PM

Why is a different storage system needed, compared to the current user preferences system?

is addressed in the task description:

While it is possible to use userjs preferences for storing this information, it becomes impractical for the more complex data (such as reading lists), because all userjs options are transmitted with each pageview for a logged-in user, which would make pageview payloads inefficient for heavy users of these features.

Tgr added a comment. · Mar 5 2020, 5:09 AM

Why a generic interface, rather than a more structured interface for interacting with user data via the MediaWiki API?

Because storing user-specific application data is a generic need. It would be a waste of everyone's time to build a separate API every time someone needs to do that.
(In practice, people tend to use the user options API to avoid that, and as the RFC explains, that's an antipattern.)

Have privacy considerations (in particular regarding regulations worldwide) been taken into account?

To the extent they are taken into account for other features, yes.

FWIW, I don't think we should focus on creating a generic "per-user storage space" unless it's specifically needed; we should instead either expand or reuse the current framework for storing user preferences.

I'm not sure what the difference is between a generic per-user storage space and expanding the current framework for storing user data.
Also between generically needed and specifically needed. Generically needed means it's specifically needed for multiple specific things, I'd think.

Krinkle renamed this task from Create and deploy an extension that implements an authenticated key-value store. to RFC: Authenticated key-value store. · Mar 11 2020, 1:36 AM

Is there still a current specific product use case for this? That would help us evaluate if something generic vs. a specific implementation makes more sense.

Joe added a comment. · Mar 12 2020, 7:08 AM

I'm not sure what the difference is between a generic per-user storage space and expanding the current framework for storing user data.
Also between generically needed and specifically needed. Generically needed means it's specifically needed for multiple specific things, I'd think.

The difference is that expanding the current framework would not try to create a new storage paradigm with an API wide enough to be used for anything. To further explain my worry: I think that creating a general interface for storing user-related data in key-value format would push everyone toward a new anti-pattern of shoehorning everything into it, including data that has some relation to other data (reading lists are one example I can think of).

Tgr added a comment. · Mar 12 2020, 8:03 AM

Usually well worth the trade-off of not having to do schema changes, IMO.
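To make the trade-off in this exchange concrete, here is a minimal sketch, assuming an in-memory store and a hypothetical `readinglist:<name>` key scheme; `UserKeyValueStore`, `CasMismatch` and `add_to_reading_list` are illustrative names, not taken from the RFC or from any existing extension. It keeps a reading list as a single JSON value under a per-user key and updates it with a compare-and-set, one of the operations the task description lists as desirable.

```python
import json
from dataclasses import dataclass, field


class CasMismatch(Exception):
    """Raised when a compare-and-set finds the value changed since it was read."""


@dataclass
class UserKeyValueStore:
    """Hypothetical in-memory per-user store (illustration only).

    Entries are keyed by (user, key), so one user's keys never collide with
    another's. A real backend would take the user from the authenticated
    session instead of trusting a caller-supplied name.
    """
    _data: dict = field(default_factory=dict)  # (user, key) -> (version, value)

    def get(self, user: str, key: str):
        """Return (version, value), or (0, None) if the key is unset."""
        return self._data.get((user, key), (0, None))

    def cas(self, user: str, key: str, expected_version: int, value: str) -> int:
        """Write value only if the stored version still equals expected_version."""
        current_version, _ = self.get(user, key)
        if current_version != expected_version:
            raise CasMismatch(key)
        new_version = current_version + 1
        self._data[(user, key)] = (new_version, value)
        return new_version


def add_to_reading_list(store: UserKeyValueStore, user: str, list_name: str, title: str) -> int:
    """Read-modify-write of a whole reading list stored as one JSON value."""
    key = f"readinglist:{list_name}"  # hypothetical key naming scheme
    version, raw = store.get(user, key)
    entries = json.loads(raw) if raw is not None else []
    if title not in entries:
        entries.append(title)
    # The whole list is rewritten even though only one entry changed;
    # a concurrent write from another device makes this cas() fail.
    return store.cas(user, key, version, json.dumps(entries))


store = UserKeyValueStore()
add_to_reading_list(store, "Alice", "Birds", "Puffin")
add_to_reading_list(store, "Alice", "Birds", "Kiwi")
print(store.get("Alice", "readinglist:Birds"))  # (2, '["Puffin", "Kiwi"]')
print(store.get("Bob", "readinglist:Birds"))    # (0, None)
```

Adding a new kind of data needs no schema change, only a new key, which is the benefit Tgr points to. On the other hand, every update rewrites the whole list, and a question such as "which of my lists contain this article?" means loading and parsing every list, which is the kind of related data Joe is concerned about.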

Usually well worth the trade-off of not having to do schema changes, IMO.

For TechCom to advise on whether a dedicated or general solution is appropriate within our environment, it needs to know what feature or product the RFC is (initially) for.

Krinkle renamed this task from RFC: Authenticated key-value store to RFC: Backend for synchronized data from Wikipedia mobile apps. · Apr 3 2020, 9:06 PM
Krinkle updated the task description.
Krinkle moved this task from Old to P1: Define on the TechCom-RFC board.

(Updated to use the new RFC template. Re-triaging as phase 1 as the original requirements appear to no longer apply.)