ArchCom RFC Meeting W30: authenticated key-value store (2016-07-27, #wikimedia-office)

Hosted by daniel on Jul 27 2016, 9:00 PM - 10:00 PM.



  • Location: #wikimedia-office IRC channel
  • Topic: T128602: Create and deploy an extension that implements an authenticated key-value store
  • Meeting type: Problem definition

Meeting summary

  • LINK: https://phabricator.wikimedia.org/E237 Phab event link (robla, 21:01:38)
  • LINK: https://phabricator.wikimedia.org/T128602 authenticated key-value store RFC (robla, 21:02:09)
  • Question discussed: does the store need to be authenticated? (robla, 21:05:38)
  • conversation turned to discussion of use cases for Mobile App (robla, 21:08:47)
  • Discussion turned to authentication possibilities, and then to using user_props (robla, 21:24:36)
  • LINK: https://www.mediawiki.org/wiki/Manual:User_properties_table (Marybelle, 21:26:03)
  • LINK: https://remotestorage.io/ (DanielK_WMDE__, 21:27:11)
  • DanielK_WMDE__ makes the case for using remoteStorage IETF draft implementation; discussion about that will likely continue in the RFC (robla, 21:37:08)
  • ACTION: ArchCom needs to bump the priority on a watchlist specific RFC (robla, 21:56:18)
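
The user_props idea raised in the discussion (skip keys that start with an underscore when the properties bundle is written into the page config, but still allow single-key lookups) might look roughly like the sketch below. It is an illustrative in-memory model, not MediaWiki code: the class `UserProps`, its methods, and the key names are hypothetical, and the 65,535-byte limit is the figure quoted in the log, not one checked against the schema.

```python
# Hypothetical sketch only; nothing here is a real MediaWiki API.
MAX_VALUE_BYTES = 65535  # per-value limit as quoted in the meeting (assumption)
PRIVATE_PREFIX = "_"     # assumption: keys with this prefix are left out of bulk loads


class UserProps:
    """In-memory stand-in for a per-user property table."""

    def __init__(self):
        self._rows = {}  # (user_id, key) -> value

    def set(self, user_id, key, value):
        """Store one value, rejecting anything over the size limit."""
        if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
            raise ValueError("value exceeds the per-key size limit")
        self._rows[(user_id, key)] = value

    def get(self, user_id, key):
        """Fetch a single key on demand, whether prefixed or not."""
        return self._rows.get((user_id, key))

    def bulk_load(self, user_id):
        """Return the bundle loaded on every page view, skipping prefixed keys."""
        return {
            key: value
            for (uid, key), value in self._rows.items()
            if uid == user_id and not key.startswith(PRIVATE_PREFIX)
        }


store = UserProps()
store.set(1, "skin", "vector")
store.set(1, "_readinglist:favorites", '["Cat", "Dog"]')
bundle = store.bulk_load(1)                          # contains only "skin"
favorites = store.get(1, "_readinglist:favorites")   # fetched individually
```

The point of the split is that large, rarely needed values never travel with the per-request bundle, which is the performance concern raised against plain user_props.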

People present (lines said)

  • brion (78)
  • tgr (37)
  • Marybelle (36)
  • robla (31)
  • DanielK_WMDE__ (30)
  • anomie (28)
  • SMalyshev (25)
  • gwicke (22)
  • dbrant (14)
  • niedzielski (12)
  • Scott_WUaS (7)
  • wm-labs-meetbot` (3)
  • TimStarling (3)
  • mdholloway (2)
  • Pchelolo (1)

Full log

21:00:40 <robla> #startmeeting ArchCom 2016W30: authenticated key-value store
21:00:40 <wm-labs-meetbot`> Meeting started Wed Jul 27 21:00:40 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:40 <wm-labs-meetbot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:40 <wm-labs-meetbot`> The meeting name has been set to 'archcom_2016w30__authenticated_key_value_store'
21:01:01 <robla> #topic Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
21:01:25 <Marybelle> tgr: When you say "large amounts of global data which are needed infrequently", what do you mean specifically?
21:01:38 <robla> #link https://phabricator.wikimedia.org/E237 Phab event link
21:02:09 <robla> #link https://phabricator.wikimedia.org/T128602 authenticated key-value store RFC
21:02:13 <tgr> the specific use case that resulted in this RfC was reading lists
21:02:42 <tgr> ie lists of favorite articles which are synchronized across devices
21:03:03 <Marybelle> > While "reading lists" is one of the reasons the app teams want this, this isn't an implementation of page lists.
21:03:05 <robla> dbrant and anomie: thanks for your work on this!
21:03:07 <Marybelle> This task makes no sense to me.
21:03:28 <robla> I think maybe we can start with the list of questions in the task
21:03:37 <Marybelle> If you want a private list of favorite articles, that sounds like a watchlist.
21:03:41 <SMalyshev> question: does the store itself need to be authenticated? I mean, we store sessions relying just on long random ID to be secure. Can't we use another long random ID to secure prefs?
21:04:06 <Scott_WUaS> Hi All!
21:04:16 <niedzielski> o/
21:04:18 <brion> SMalyshev: by default you'd do it behind the API which is authenticated
21:04:23 <anomie> SMalyshev: Sessions are that way because of the bootstrapping problem: you have to start *somewhere*.
21:04:32 <brion> No need to invent new auth methods
21:04:47 <robla> alright, I guess we'll start with SMalyshev 's question, and then move to "Should this be implemented as a MediaWiki action API endpoint or a restbase service?"
21:04:51 <mdholloway> SMalyshev: at least for the apps' use case, the reading lists should be private and therefore we'd need authentication
21:04:54 <mdholloway> hi all, btw
21:04:55 <tgr> SMalyshev: not sure what you mean by the store itself being authenticated
21:05:06 <SMalyshev> brion: well, yes, if it's mediawiki API then sure. but understand it's one of the options?
21:05:10 <Marybelle> robla: I think focusing on implementation before defining use-cases is silly.
21:05:11 <tgr> the API should authenticate, just like sessions do
21:05:38 <robla> #info Question discussed: does the store need to be authenticated?
21:05:56 <brion> Marybelle: the implementation of the store? Or the client side app feature that uses it?
21:06:06 <anomie> If it's in the action API, it'll be authenticated just like anything else. If it's in restbase, I think that has support for the same sort of thing (but I don't know details).
21:06:16 <tgr> how that's represented in the backend is an implementation detail that is IMO low relevance compared to other factors for choosing backends
21:06:20 <brion> The app is a thing that already exists and already has a reading list feature.
21:06:23 <Marybelle> brion: The use-cases of having another authenticated store.
21:06:43 <Marybelle> If this is for private lists of articles, that sounds like a watchlist.
21:07:02 <brion> Marybelle ok. The use case is to store per user private data on the server and be able to retrieve and update it, with no versioning.
21:07:23 <Marybelle> We do that already in several places. :-)
21:07:47 <Marybelle> I'm not sure making yet another private place is needed.
21:07:55 <brion> Marybelle: yes, in ways that don't match this use case as already replied on the bug.
21:08:08 <Marybelle> Okay. I guess I'm the only person not getting it.
21:08:09 <brion> Either we can change one of those methods to match, or make a new one.
21:08:13 <tgr> note that the apps already store global authenticated private data, but they are currently using user_props for that, which is not terrible but suboptimal
21:08:13 <dbrant> Marybelle: this is a sort of generalization of watchlists, where the user can have multiple lists, each with its own attributes (name, description, whether it's saved for offline reading, etc)
21:08:26 <tgr> so reading lists are not in fact the only current use case
21:08:47 <robla> #info conversation turned to discussion of use cases for Mobile App
21:08:56 <niedzielski> tgr: we currently use user prefs to store theme info. we'd probably migrate that to this new store
21:09:02 <Marybelle> brion: Right. I'd really like to avoid yet another special thing to maintain indefinitely.
21:09:05 <niedzielski> (just as an example)
21:09:21 <brion> *nod* keep em simple when possible :)
21:09:22 <SMalyshev> reading https://phabricator.wikimedia.org/T128602#2499662 it looks like the delta vs. user_properties is mostly size/performance concerns?
21:09:30 <Marybelle> niedzielski: You think MediaWiki's database should store a user's app theme preference?
21:09:32 <gwicke> dbrant: do you see this eventually covering collection (book) functionality as well?
21:10:22 <tgr> SMalyshev: and keeping complexity down
21:10:36 <tgr> user prefs need to be loaded as a bundle, for perf reasons
21:10:50 <niedzielski> Marybelle: what i mean to say is that we currently use user options to store a theme preference. if we had a general purpose key / value store, that would be a more appropriate spot
21:10:52 <tgr> reading lists would have to be loaded one by one, again for perf reasons
21:11:02 <anomie> SMalyshev: On the back end, anyway, although there's a question of whether it should be cross-wiki or if users should just pick one "central" wiki. On the front end it'll support fetching individual keys instead of having to fetch the whole thing at once, and possibly some other bits.
21:11:14 <Marybelle> niedzielski: I'm not sure I get why a mobile app/client gets to use MediaWiki's database to store its preferences.
21:11:18 <tgr> mixing the two would probably result in more awkward code than having different services for them
21:11:25 <Marybelle> That seems weird.
21:11:28 <DanielK_WMDE__> from the RFC, it seems to me that one premise was "re-doing watchlists is hard, let's just do a key/value store, that's easy".
21:11:30 <DanielK_WMDE__> But perhaps it's not that easy to do right, and so we should perhaps re-do watchlists instead? After all, we discussed global watchlists last week! Perhaps we'll be doing that anyway...
21:11:52 <niedzielski> Marybelle: do you mean why store preferences on a server vs on device? or mediawiki specifically?
21:11:57 <dbrant> Marybelle: the MediaWiki db already stores plenty of user settings. Whether these settings apply to the desktop browsing experience or the app experience shouldn't matter.
21:12:00 <brion> Marybelle: because it's an app for the site which you log in with your site credentials, and it's that or invent a new storage service ?
21:12:15 <DanielK_WMDE__> We have several needs that drive changes to watchlists: global watchlists, multiple watchlists, automatic expiry...
21:12:25 <anomie> DanielK_WMDE__: Part of the idea was prototyping their reading lists on top of a basic key-value store, instead of designing something then having to redo it when they find out they need different behavior.
21:13:01 <SMalyshev> DanielK_WMDE__: I think it's bigger than just watchlists, watchlists would be one usecase for this?
21:13:24 <tgr> DanielK_WMDE__: actually the original proposal was to use the k-v store as a temporary solution that makes sense on its own as well and then eventually migrate to a dedicated lists API based on some sort of lists concept in core
21:13:35 <brion> Right, this'll be used for the other user data that's currently shoehorned into userjs prefs as I understand?
21:13:36 <tgr> we might have given up on that by now, not sure
21:13:39 <Marybelle> brion: I guess I think about non-Wikimedia Foundation apps. Would those clients also be using MediaWiki's/Wikimedia's database to store their user preferences and data?
21:13:52 <robla> anomie: having a flexible solution sounds really nice. what if we find out it didn't work the way we wanted it to? How does this not become tech debt?
21:13:53 <dbrant> gwicke: i wouldn't see why not; do you mean something like "turn my reading list into a pdf"?
21:13:55 <brion> Marybelle: sure, why not?
21:14:02 <anomie> DanielK_WMDE__: As for watchlists in particular, the apps want the ability to have multiple lists that each aren't limited to a single wiki, and extra metadata, but not (I think?) actually the recentchanges-filtering functionality watchlists have.
21:14:05 <DanielK_WMDE__> SMalyshev: the driving use case is (named) reading lists (aka bookmarks). which is very similar to watchlists. a k/v store could be used to cover this to some degree, as long as the lists don't become very large.
21:14:13 <gwicke> dbrant: yes
21:14:13 <Marybelle> It seems outside the scope and responsibility of MediaWiki a bit.
21:14:57 <brion> Marybelle: no more than watchlist and user prefs for the web UI surely
21:15:00 <anomie> robla: Are you asking about the key-value store, or a specialized reading-list service? If the former, that's the nice thing about a simple, generic key-value service.
21:15:05 <DanielK_WMDE__> anomie: cross-wiki watchlists are a lot of fun, as discussed here last week (or was it the week before)?
21:15:28 <Marybelle> brion: MediaWiki the application using the MediaWiki database isn't so crazy. Any random client application using the MediaWiki database seems a lot zanier.
21:15:29 <tgr> DanielK_WMDE__: apart from lists in core being a big and long project, I think it would be much more sane to go into it *after* we have a good understanding of the use cases, and a key-value store is great for prototyping
21:15:32 <brion> Those can all be implemented separately and could invent separate places to store their data, but I think it would not be super practical
21:16:03 <brion> Depends how narrowly or broadly you view MediaWiki IMO
21:16:15 <DanielK_WMDE__> i see two questions here, and we should perhaps pick one to discuss. a) do we want/need a generic k/v store, what needs does it address, what features does it need? and b) how do we best implement (or prototype) reading lists for mobile?
21:16:20 <brion> And how narrowly or broadly you view Wikipedia as a site or product or place
21:16:22 <DanielK_WMDE__> which of the two should we discuss?
21:16:27 <tgr> Gather tried to build its own API from the start and maintain it across use-case changes, and pretty much ended up with a key-value store (JSON blobs in an SQL table) with lots of cruft around it
21:16:33 <robla> anomie: more the latter. I'll simplify to 3 options: 1) wild success 2) questionable success 3) obvious failure. outcome 2 is where tech debt accrues
21:16:42 <gwicke> one issue with a generic key-value store without schema enforcement is that any client side app could write any kind of blob to any key
21:16:45 <anomie> And really, one of the open questions here is whether the key-value store should actually be in MediaWiki (action API) or should be a separate service for WMF app use (restbase).
21:17:04 <gwicke> this would put the burden of schema checking / validation squarely on the client
21:17:13 <Marybelle> brion: Do other big sites let client applications use their databases for arbitrary private data? Like Twitter and Facebook and friends?
21:17:51 <anomie> gwicke: You say that like it's a disadvantage. It could as well be an advantage.
21:17:57 <brion> Marybelle: are they platforms for sharing free knowledge?
21:18:02 <brion> :)
21:18:11 <gwicke> anomie: it's a trade-off
21:18:21 <brion> Anyway, we already can store tons of arbitrary data as you point out Marybelle
21:18:37 <brion> The question is can we do it in a way that's efficient and meets the needs of users
21:18:41 <Marybelle> I mean, if I were making a regular bookmark application, I wouldn't expect MediaWiki to be my back-end off-hand.
21:18:45 <gwicke> there is the related issue of schema migrations
21:18:58 <anomie> robla: Yeah, a specialized reading-list service would certainly have the danger of falling into #2. That's why I personally don't want to build one, at least not without decent planning to make it more likely to hit #1.
21:18:59 <SMalyshev> given that you can just create a wiki page and dump the data there I don't think it changes a lot
21:19:06 <gwicke> and format versioning
21:19:20 <brion> Schemas are out of scope for now, imo
21:19:32 <Scott_WUaS> (gwicke: can you please clarify what "collection (book) functionality" is? Thanks)
21:19:54 <anomie> Scott_WUaS: I'm guessing https://www.mediawiki.org/wiki/Extension:Collection
21:19:58 <gwicke> brion: they are necessarily in scope, the question is just where you handle them
21:20:03 <brion> Marybelle: maybe, or you might store it in one of the several places in the application servers user database that it makes available for that sort of thing
21:20:04 <Marybelle> brion: This use-case seems a bit against sharing free knowledge, if these are per-user and private, FWIW.
21:20:15 <Scott_WUaS> anomie: thnx
21:20:20 <gwicke> if you say it's out of scope on the server, then that implicitly means that they will need to be handled on the client
21:20:32 <Marybelle> Or a separate server.
21:20:34 <DanielK_WMDE__> would it be an option to just expose an existing K/V system to the public (with the necessary auth in place)?
21:20:39 <brion> gwicke: yes it's explicitly in the clients sphere of responsibility
21:21:00 <Marybelle> DanielK_WMDE__: Existing like Redis or something?
21:21:06 <anomie> gwicke: Once you start shoving schemas and stuff into it, it's no longer a generic key-value store. The client is free to implement a schema on top of a generic key-value store if it wants, which makes the store itself more flexible.
21:21:15 <gwicke> with multiple clients, this might be tricky to support
21:21:19 <brion> DanielK_WMDE__: if we can query individual items and not send them to every view, user props would work.
21:21:26 <SMalyshev> DanielK_WMDE__: I think that would be one of the solutions. If we have a suitable one
21:21:27 <DanielK_WMDE__> Marybelle: yes. though redis explicitly says it's designed to be accessed by trusted clients only (i just checked) http://redis.io/topics/security
21:21:31 <brion> That's basically the difference
21:21:46 <robla> Marybelle is not likely to be convinced about use cases this hour, but other folks seem more interested in talking implementation, so let's focus on implementation
21:21:53 <brion> :)
21:21:58 <SMalyshev> DanielK_WMDE__: it doesn't have to be redis directly, can be redis (or other non-auth k/v) behind Mediawiki API front
21:22:03 <tgr> anomie: I guess something similar to how EventLogging schemas are handled could be done, I doubt it's worth the effort though
21:22:05 <Marybelle> I asked on the task about just using a separate key/namespace in user_props and just filtering.
21:22:14 <Marybelle> You don't have to send every user option on every page load.
21:22:17 <Marybelle> I'm not sure why we do.
21:22:17 <DanielK_WMDE__> brion: ok, new plan: hack user props so that keys starting with an underscore will be skipped when writing props into jsconfig.
21:22:20 <SMalyshev> DanielK_WMDE__: which supports only API like "give me my data", not "give me her data"
21:22:21 <brion> Marybelle might work
21:22:29 <Marybelle> DanielK_WMDE__: +1
21:22:47 <brion> :)
21:23:05 <anomie> DanielK_WMDE__: Disadvantage: every existing user of user_props has to be updated to deal with the filtering, and we have to make sure the additional data doesn't have negative performance impact on the existing uses.
21:23:10 <DanielK_WMDE__> filtering user_props by prefix seems the simplest solution by far...
21:23:21 <brion> Are there any low level probs with how that's stored?
21:23:21 <niedzielski> would user options be able to hold thousands of arbitrary keys ok?
21:23:30 <DanielK_WMDE__> anomie: yes. existing uses need to be surveyed
21:23:35 <tgr> see https://phabricator.wikimedia.org/T128602#2499662 for a list of problems with user_props
21:23:39 <SMalyshev> DanielK_WMDE__: if there's separate keys for up_property that would work I think
21:24:01 <gwicke> setting up key-value storage is fairly easy, I would say quite a bit easier than handling format versioning & migrations correctly
21:24:08 <anomie> Also, jcrespo didn't like the idea of putting it in the main database much in https://phabricator.wikimedia.org/T128602#2476545
21:24:10 <brion> Ok there's a byte limit I think on those, is that a problem?
21:24:16 <brion> Could be lifted with a schema tweak
21:24:17 <SMalyshev> anomie: does existing API right now just dump all keys for the user, or it chooses specific ones?
21:24:32 <brion> Ah yes, and we did have req to move to separate db cluster
21:24:36 <robla> #info Discussion turned to authentication possibilities, and then to using user_props
21:24:38 <brion> Which is easy to do per table iirc
21:24:45 <Marybelle> tgr: First three bullets seem trivially solvable.
21:24:46 <anomie> SMalyshev: The existing API query only supports fetching all data for the user, unless I'm completely mistaken.
21:25:01 <DanielK_WMDE__> hm, i just found this: https://remotestorage.io/
21:25:02 <dbrant> brion: what's the limit, roughly?
21:25:03 <brion> Yeah API needs enhancement for query
21:25:07 <niedzielski> maybe it would be easier to drop a new generic key value store if the feature is unpopular than to clear user options
21:25:15 <DanielK_WMDE__> no idea if it's good, but it seems worth a look.
21:25:16 <brion> dbrant: 65534 bytes iirc
21:25:17 <SMalyshev> anomie: right, but that's only one API. I don't think it should be too hard to make this API skip certain keys in DB?
21:25:29 <brion> Should be a matter of changing column type
21:25:54 <gwicke> niedzielski: if there are clear patterns in how keys are structured, or there is only a single use case using this service, yes
21:25:56 <SMalyshev> anomie: also, if main DB is bad, we could make it two-stage - store opaque id in main db, store actual data in better storage
21:26:03 <Marybelle> https://www.mediawiki.org/wiki/Manual:User_properties_table
21:26:12 <anomie> SMalyshev: Then you're **really** making things complicated.
21:26:30 <SMalyshev> it's not *that* complicated I think... just one more call
21:26:31 <anomie> At that point, why not just use the actual-data storage?
21:26:33 <Marybelle> How is a filter complicated?
21:26:35 <anomie> directly
21:26:41 <SMalyshev> anomie: because auth, etc.
21:26:54 <tgr> again, if the goal is keeping things simple then having two separate systems with a very simple mode of operation seems better than having one that tries to be the mix of the two
21:26:56 <SMalyshev> I think maintaining two auth systems in sync is worse
21:26:58 <anomie> SMalyshev: No, not direct access to the backend.
21:27:00 <DanielK_WMDE__> Huh. interesting. https://remotestorage.io/ is sponsored by the Wau Holland Stiftung? that indicates it's not industry bullshit. doesn't tell me if it's any good for our use case, but
21:27:11 <DanielK_WMDE__> #link https://remotestorage.io/
21:27:22 <brion> Ooh that's neat
21:27:30 <anomie> SMalyshev: But the frontend directly accessing the better-storage backend instead of looking up in user_properties then in the better-storage.
21:27:32 * brion bookmarks for later
21:27:33 <SMalyshev> anomie: that means two APIs instead of one, otherwise the same. I'd rather have users learn one API :)
21:28:05 <anomie> SMalyshev: Ah, the Gather approach?
21:28:19 <TimStarling> my vote is to just add a table
21:28:35 <gwicke> data like reading lists would likely be interesting to multiple applications; I would be surprised if the app would end up being the only user
21:28:46 <TimStarling> avoid joins so that you can hack up some cross-server thing later if need be, query groups or something
21:29:01 * anomie is actually more interested in "BagOStuff vs some custom abstraction" over "where exactly do we hide the database table"
21:29:06 <TimStarling> this whole project seems like something that could be done in hardly any more time than it takes to have this meeting
21:29:16 <brion> Hehe
21:29:32 <gwicke> simple key-value storage, sure
21:29:34 <SMalyshev> defining what to do is often longer than doing it... it's normal :)
21:29:42 <gwicke> but that's hardly a solution
21:29:47 <dbrant> +1 to gwicke -- i can easily see mobile web and/or desktop surfacing reading lists for logged-in users.
21:29:50 <robla> anomie: should we try to answer that question now?
21:29:52 <Scott_WUaS> :)
21:30:25 <DanielK_WMDE__> Is this spec in line with what we need? https://datatracker.ietf.org/doc/draft-dejong-remotestorage/
21:30:29 <brion> That might be a more interesting question for the watchlist cross wiki magic future discussion :)
21:30:39 <anomie> robla: Wouldn't hurt. Although "action API vs restbase" is an even more interesting question, since it's the difference between me writing it and me not ;)
21:30:49 <DanielK_WMDE__> if so, perhaps we can rely on some ready-to-go solution.
21:31:07 <brion> I just don't want mobile apps to get bogged down in the meantime while we bikeshed reading lists
21:31:17 <gwicke> dbrant: once we have multiple clients, it might make more sense to provide an API that gives a bit more guarantees than just "it's a blob of bytes"
21:31:25 <SMalyshev> DanielK_WMDE__: does it have TLDR description?
21:31:47 <DanielK_WMDE__> SMalyshev: https://remotestorage.io/
21:32:48 <gwicke> updating the schema when all your schema handling is implemented in $n clients sounds hard
21:32:58 <SMalyshev> DanielK_WMDE__: this seems to be client-side storage?
21:33:26 <tgr> DanielK_WMDE__: how would auth work?
21:33:30 <SMalyshev> or the picture is misleading
21:33:41 <DanielK_WMDE__> SMalyshev: no, it's not.
21:33:50 <tgr> DanielK_WMDE__: "If <auth-dialog> is a URL, the user can supply their credentials for accessing the account (how, is out of scope)
21:33:58 <tgr> - not so promising
21:33:59 <DanielK_WMDE__> SMalyshev: the client picks its storage provider. local is one option. all providers implement the same protocol
21:34:00 <anomie> DanielK_WMDE__: At a glance, that looks like it would serve the purpose. It wouldn't be able to be done in the context of the action API, but that's not a requirement (assuming it's not me writing it, anyway).
21:34:00 <Marybelle> Is 65,535 bytes really insufficient?
21:34:25 <brion> Long names add up and it's hard to make guarantees :)
21:34:25 <DanielK_WMDE__> tgr: they seem to use oauth
21:34:29 <Marybelle> That sounds like a lot of page IDs.
21:34:43 <brion> 64k would likely be enough in practice until it breaks on some extreme case
21:34:43 <robla> SMalyshev: it looks like the storage is OAuth protected server storage (that basically provides similar constraints to most client storage solutions)
21:35:00 <tgr> forcing all apps to go through an extra oauth authorization screen might be suboptimal
21:35:14 <DanielK_WMDE__> using an actual standard to implement this would be nice.
21:35:27 <DanielK_WMDE__> furthering the idea of "clients pick where they store their data" is even nicer
21:35:29 <robla> DanielK_WMDE__ agreed
21:35:32 <tgr> anyway, that protocol seems like it might be a better solution but evaluating it within this IRC meeting is not realistic
21:35:49 <brion> Well ideally you'd probably want the same login for login and storage in a case like this
21:35:52 <DanielK_WMDE__> even if we implement this ourselves, perhaps we should implement the protocol defined there.
21:36:20 <DanielK_WMDE__> tgr: i agree. i'll link to it from the ticket
21:36:23 <brion> I am curious about it and it smells useful for unofficial apps and such, do bring it up in future :)
21:36:24 <tgr> and worth mentioning again that building a k-v store that relies on the action API is super simple
21:36:27 <dbrant> Marybelle: brion: for reference, the current record holder for the most pages in reading lists in our app is over 7000.
21:36:50 <tgr> building a new API that should match a draft protocol with OAuth 2 and whatnot is no
21:36:52 <niedzielski> i don't think 64k would be too small personally. it would encourage favoring distinct keys instead of json blobs
21:36:54 <tgr> *not*
21:37:08 <robla> #info DanielK_WMDE__ makes the case for using remoteStorage IETF draft implementation; discussion about that will likely continue in the RFC
21:37:12 <brion> 7000 dang! Might fit in int IDs but maybe not titles yeah :)
21:37:38 <brion> Anyway that bit's changeable if we need it
21:38:01 <brion> Easy to bump the column type up one
21:38:32 <SMalyshev> I understand that if we use IETF one we'd need a client for remoteStorage protocol and a backend. which should rely on some kind of k/v storage inside mediawiki
21:39:07 <SMalyshev> so we're kind of back to sq. 1? maybe with more standard client API
21:39:20 <robla> so...my understanding is that the Mobile Apps team plans to prototype *something* this quarter; should they use one of our existing tech, or should they explore something new?
21:39:38 <brion> So we have some interest in that protocol; and some talk about tweaking user props with a modified MW API method, and some talk about just adding a table still. Any other current main alternatives?
21:39:56 <brion> :)
21:40:11 <tgr> restbase I suppose?
21:40:15 * robla notes that the "IETF one" is a draft that hasn't made it to "Proposed Standard", and that submitting an IETF draft is trivial
21:40:28 <SMalyshev> tgr: does restbase authenticate?
21:41:35 <Pchelolo> SMalyshev: restbase can authenticate
21:42:02 <brion> Ok so that's another possibility yes :)
21:42:24 <brion> Though I think gwicke would prefer restbase services to be better specified for their data schemas ?
21:42:41 <brion> And this is still a young feature likely to change in details
21:42:46 <Marybelle> dbrant: How common is a reading list of 7000 pages? Just one user? A dozen users?
21:42:52 <dbrant> robla: to be clear, this isn't a blocker for our current goals this quarter. it would simply make our reading lists become a "complete" feature, the way we intended it to be (technically, two quarters ago :) ).
21:42:55 <brion> Eventually hopefully merging into fancy watchlists
21:43:25 <robla> dbrant: thanks for the clarification!
21:43:34 <brion> That helps us set timelines :)
21:43:38 <Marybelle> brion: I get pretty frustrated at how bad watchlists are, especially when I see people basically working around them instead of investing resources to fix them. :-/
21:43:51 <brion> Understandable!
21:44:13 <brion> I think we need to bring some more focus on that I agree
21:44:46 <gwicke> added a comment re versioning & API stability at https://phabricator.wikimedia.org/T128602#2500338
21:44:48 <Marybelle> For the general idea of private reading lists, it's kind of maddening that watchlists won't work.
21:44:54 <Marybelle> Sigh.
21:45:07 * anomie wouldn't mind working on watchlists, but there are currently enough other cooks in that kitchen and that's not the problem at hand here.
21:45:10 * robla tries to remember what ori and Steven Walling were pushing for a few years ago on the watchlist front
25921:45:14 <DanielK_WMDE__> SMalyshev: the point was that client libraries and backend implementations exist, we wouldn't have to write them (if the current ones are good - which i don't know)
26021:45:39 <SMalyshev> DanielK_WMDE__: I don't think backend which we need (i.e. with mediawiki auth) exists?
26121:45:42 <DanielK_WMDE__> (sorry for jumping back to this, don't let me distract you9
26221:45:53 <brion> dbrant: does the enhanced user pref model sound like a good short term for you as a sync mechanism? I think we all still agree fancier watchlist integration would be great for future
26321:45:55 <DanielK_WMDE__> SMalyshev: backends with oath exist
26421:46:25 <tgr> with a page lists API it really helps if you have a good idea what features you'll need exactly
26521:46:37 <dbrant> Marybelle: there are over 100 users with at least 1000 pages in lists. Over 4000 users with at least 100 pages, etc...
26621:46:42 <tgr> that's one place where Gather dug itself into the ground
26721:46:55 <tgr> a k-v store is ideal for prototyping
26821:47:21 <SMalyshev> DanielK_WMDE__: yeah but oauth against what? we need something on mediawiki side to do actual r/w... even if oauth plugs seamlessly there. Maybe I just don't understand yet what that API does :)
26921:47:24 <tgr> for reading lists, and for a number of (non-list-related) future features I'd imagine
27021:47:39 <dbrant> brion: i think that can definitely work, as long as it can handle a large number of keys.
27121:48:07 <brion> Large number of keys should work with that model yes. You'd need a bulk lookup thiugh too?
27221:48:12 <tgr> DanielK_WMDE__: note that this seems to be using OAuth 2 which MediaWiki does not support
27321:48:18 <anomie> dbrant, brion: "as long as it can handle a large number of keys" is a good question. Again, https://phabricator.wikimedia.org/T128602#2476545
27421:48:22 <DanielK_WMDE__> tgr: hm, right...
27521:48:24 <brion> Heh
27621:48:37 <tgr> probably not a huge undertaking to fix but way larger than the one proposed in this RfC
27721:48:44 <Marybelle> robla: https://www.mediawiki.org/wiki/Requests_for_comment/Support_for_user-specific_page_lists_in_core
27821:49:18 <robla> ah, that's the one, thanks Marybelle
27921:49:25 <dbrant> brion: right, lookup too.
28021:50:16 <dbrant> but then, if we're talking about a short term solution, we can limit things on the client end, too.
28121:50:32 <Scott_WUaS> (anomie: just noticed that you're mentioned in this BBC article "Meet the 'bots' that edit Wikipedia" - http://www.bbc.com/news/magazine-18892510 :)
28221:51:26 <tgr> one thing that hasn't been discussed is how much effort it would take to prevent abuse / how afraid we are it would happen
28321:51:38 <brion> dbrant: ok let's maybe model a couple variants. Large blob, vs row per title? Then confirm whether they make sense on a tweaked user props table, and decide whether to look more at the alternatives?
28421:51:53 <tgr> pirates using it for movie distribution or whatnot
28521:52:22 <gwicke> a quota can take care of that
28621:52:36 <dbrant> brion: +1 niedzielski: what do you think of that? &
28721:52:45 <brion> tgr: good question. There's little we can do to prevent use of user props as a file sharing or DoS space usage against us, other than "it's inconvenient and there are probably easier ways to abuse the system"
28821:53:01 <niedzielski> brion dbrant: one of the problems we consider with row vs blob were race conditions between clients. user options didn't seem well designed to handle that
28921:53:07 <robla> tgr: the security considerations section of https://datatracker.ietf.org/doc/draft-dejong-remotestorage/?include_text=1 looks like a good start on a list
29021:53:13 <brion> Changing API might make it a bit easier to abuse but not much
29121:53:49 <niedzielski> brion dbrant: for example in the blob scenario, if two clients try to update the same list, the last client wins. there's also bandwidth concerns for the 7000 title person
29221:53:53 <brion> niedzielski: yeah, you'd have to detect conflicts through some other means like a signal value
29321:54:10 <brion> Goes smoother with smaller bits, but that complicates the filtering
29421:54:15 <tgr> brion: we could do all kinds of usage tracking, user agent filtering etc
29521:54:19 <gwicke> conflict resolution is somewhat orthogonal from storage strategy
29621:54:30 <tgr> but building it is little effort and those things probably aren't
29721:54:33 <brion> Ok we're low on time :)
29821:54:48 <brion> robla: shall we plan next steps?
29921:55:06 * robla ponders what that would be
30021:55:25 <gwicke> overall, I'm honestly sceptical about the value of using a generic key-blob storage service for use cases like reading lists
30121:55:39 <brion> I think we want to bump pruoity on the more watchlist specific rfc!
30221:55:44 <gwicke> if the use case is so well defined, then I think it deserves a real API
30321:55:47 <brion> Priority
30421:56:00 <anomie> gwicke: "if"
30521:56:02 <brion> gwicke: agreed, medium to long term
30621:56:18 <robla> #action ArchCom needs to bump the priority on a watchlist specific RFC
30721:56:30 <Scott_WUaS> "if"
30821:56:35 <niedzielski> brion dbrant: IIRC, we also had concerns with a list of page IDs vs a page ID with a list of lists. i don't think we came up with a great way to handle that and had to use the list title as the ID
30921:56:58 <brion> Short term: I'll follow up with dbrant and niedzielski on using user prefs modified and see if that still makes sense
31021:57:29 <brion> And anyone else want to do more research on general user data storage with that protocol?
31121:57:45 <brion> It sounds potentially very useful for unofficial third-party tools and such
31221:57:52 <dbrant> brion: sounds good
31321:57:55 <anomie> brion: I'd suggest to run it by jcrespo too
31421:58:02 <brion> Ah yes good
31521:58:21 <robla> brion: I should probably take some of those action items from you, but yes, this all looks good
31621:58:28 <brion> Hehe ok
31721:58:30 <robla> (thank you for spelling this out!)
31821:58:45 <brion> :)
31921:59:09 <robla> anomie: dbrant - any last comments questions before we close this out?
32021:59:18 <brion> Good discussion folks!
32121:59:28 <Scott_WUaS> Yes!
32221:59:29 <dbrant> robla: nope, really glad to see this moving forward
32321:59:29 * anomie has no more comments at the moment
32421:59:40 <robla> great discussion indeed....thanks everyone!
32521:59:43 <niedzielski> \o
32621:59:45 <brion> Woohoo
32721:59:48 <robla> o/
32821:59:57 <brion> Ok I gotta run, catch y'all later
32922:00:02 <robla> #endmeeting


Recurring Event

Event Series
This event is an instance of E66: ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office), and repeats every week.

Event Timeline

RobLa-WMF renamed this event from ArchCom RFC Meeting: <topic TBD> (<see "Starts" field>, #wikimedia-office) to ArchCom RFC Meeting: Create and deploy an extension that implements an authenticated key-value store (2016-07-27, #wikimedia-office). Jul 21 2016, 10:10 PM
RobLa-WMF renamed this event from ArchCom RFC Meeting: Create and deploy an extension that implements an authenticated key-value store (2016-07-27, #wikimedia-office) to ArchCom RFC Meeting W30: Create and deploy an extension that implements an authenticated key-value store (2016-07-27, #wikimedia-office). Jul 22 2016, 12:13 AM
RobLa-WMF renamed this event from ArchCom RFC Meeting W30: Create and deploy an extension that implements an authenticated key-value store (2016-07-27, #wikimedia-office) to ArchCom RFC Meeting W30: authenticated key-value store (2016-07-27, #wikimedia-office). Jul 27 2016, 10:40 PM

For the specific use case of reading lists (this is not for the general case; everything you mentioned regarding a generic store solution still stands):

watchlist (
  user int,
  namespace int,
  title varchar(255),
  notificationtimestamp varchar(14)
)

(digievolves into)

watchlists (
  list_id int PK,
  user_id int, -- can be global user id
  list_type enum ('default watchlist for backwards compatibility', 'mobile bookmark stuff that may be (?) also viewable from desktop', 'any new list type we can think in the future (e.g. articles you like)', 'user defined list'),
  list_wiki enum ('global', 'enwiki', ...),
  list_name varchar(255)
)
watchlist_items ( -- or watchlist_titles
  list_id int,
  wiki enum('enwiki', ...), -- only for cross-wiki lists, if needed
  namespace int,
  title varchar(255),
  notificationtimestamp varchar(14)
)

Get all titles for a mobile list (sort of):

SELECT wli.wiki, wli.namespace, wli.title
FROM watchlist_items wli
JOIN watchlists wl ON wl.list_id = wli.list_id
WHERE wl.user_id = $user AND wl.list_type = 'mobile bookmark stuff that may be (?) also viewable from desktop'
ORDER BY wli.namespace, wli.title;

(Ignore the types such as enum; we do not really want to use that type, but it is my way of drafting and being understood easily.) All current code for watchlists (API, etc.) can apply to watchlist_items; only the multiplexing and the new functions for watchlists are needed. We already handle lists of 100,000 items for some bots.
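
As a rough way to try out the draft above, here is a minimal sketch using Python's built-in sqlite3 module. It is not a proposed implementation: SQLite has no enum type, so list_type and wiki are plain text columns, and the short list_type values ('mobile', 'default'), the helper names titles_for and demo, and the sample rows are all made up for illustration.

```python
import sqlite3

# Stand-in for the draft schema. Assumption: enum columns become TEXT,
# because SQLite has no enum type and the draft says not to rely on enum.
SCHEMA = """
CREATE TABLE watchlists (
    list_id   INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL,   -- can be a global user id
    list_type TEXT NOT NULL,      -- hypothetical short values: 'default', 'mobile', ...
    list_wiki TEXT NOT NULL,      -- 'global', 'enwiki', ...
    list_name TEXT NOT NULL
);
CREATE TABLE watchlist_items (
    list_id   INTEGER NOT NULL REFERENCES watchlists(list_id),
    wiki      TEXT,               -- only for cross-wiki lists, if needed
    namespace INTEGER NOT NULL,
    title     TEXT NOT NULL,
    notificationtimestamp TEXT
);
"""


def titles_for(conn, user_id, list_type):
    """Return (wiki, namespace, title) rows for one user's lists of a given type."""
    return conn.execute(
        """
        SELECT wli.wiki, wli.namespace, wli.title
        FROM watchlist_items wli
        JOIN watchlists wl ON wl.list_id = wli.list_id
        WHERE wl.user_id = ? AND wl.list_type = ?
        ORDER BY wli.namespace, wli.title
        """,
        (user_id, list_type),
    ).fetchall()


def demo():
    """Build an in-memory database with made-up rows and query the mobile list."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO watchlists VALUES (1, 42, 'mobile', 'global', 'Saved pages')")
    conn.execute("INSERT INTO watchlists VALUES (2, 42, 'default', 'enwiki', 'Watchlist')")
    conn.executemany(
        "INSERT INTO watchlist_items VALUES (?, ?, ?, ?, NULL)",
        [
            (1, "enwiki", 0, "Zebra"),
            (1, "enwiki", 0, "Apple"),
            (2, "enwiki", 0, "Main Page"),
        ],
    )
    return titles_for(conn, 42, "mobile")
```

Calling demo() returns only the two titles on the user's mobile list, sorted by namespace and title; the item on the default watchlist is filtered out by the join on list_type.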

If this is global, watchlist_items can be T126641, and we fix 2 RFCs at once. (Yes, I know it doesn't cover *ALL* of this nor all of that.) The more people working on similar features, the better, right?

These global lists can be on x1 now and future needs (integration with local watchlists, potentially allowing several user watchlists, etc) can be done later.

daniel renamed this event from ArchCom RFC Meeting W30: authenticated key-value store (2016-07-27, #wikimedia-office) to ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office). Nov 21 2016, 6:11 PM
daniel changed the host of this event from RobLa-WMF to daniel.
daniel updated the event description.
daniel renamed this event from ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office) to ArchCom RFC Meeting W30: authenticated key-value store (2016-07-27, #wikimedia-office).