ArchCom-RFC-2016W30-irc-E237.txt

Authored by • RobLa-WMF on Jul 27 2016, 10:03 PM.

Referenced Files

	F4313227: ArchCom-RFC-2016W30-irc-E237.txt
	Jul 27 2016, 10:03 PM

	21:00:40 <robla> #startmeeting ArchCom 2016W30: authenticated key-value store
	21:00:40 <wm-labs-meetbot`> Meeting started Wed Jul 27 21:00:40 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot.
	21:00:40 <wm-labs-meetbot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
	21:00:40 <wm-labs-meetbot`> The meeting name has been set to 'archcom_2016w30__authenticated_key_value_store'
	21:01:01 <robla> #topic Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
	21:01:25 <Marybelle> tgr: When you say "large amounts of global data which are needed infrequently", what do you mean specifically?
	21:01:38 <robla> #link https://phabricator.wikimedia.org/E237 Phab event link
	21:02:09 <robla> #link https://phabricator.wikimedia.org/T128602 authenticated key-value store RFC
	21:02:13 <tgr> the specific use case that resulted in this RfC was reading lists
	21:02:42 <tgr> ie lists of favorite articles which are synchronized across devices
	21:03:03 <Marybelle> > While "reading lists" is one of the reasons the app teams want this, this isn't an implementation of page lists.
	21:03:05 <robla> dbrant and anomie: thanks for your work on this!
	21:03:07 <Marybelle> This task makes no sense to me.
	21:03:28 <robla> I think maybe we can start with the list of questions in the task
	21:03:37 <Marybelle> If you want a private list of favorite articles, that sounds like a watchlist.
	21:03:41 <SMalyshev> question: does the store itself need to be authenticated? I mean, we store sessions relying just on long random ID to be secure. Can't we use another long random ID to secure prefs?
	21:04:06 <Scott_WUaS> Hi All!
	21:04:16 <niedzielski> o/
	21:04:18 <brion> SMalyshev: by default you'd do it behind the API which is authenticated
	21:04:23 <anomie> SMalyshev: Sessions are that way because of the bootstrapping problem: you have to start somewhere.
	21:04:32 <brion> No need to invent new auth methods
	21:04:47 <robla> alright, I guess we'll start with SMalyshev 's question, and then move to "Should this be implemented as a MediaWiki action API endpoint or a restbase service?"
	21:04:51 <mdholloway> SMalyshev: at least for the apps' use case, the reading lists should be private and therefore we'd need authentication
	21:04:54 <mdholloway> hi all, btw
	21:04:55 <tgr> SMalyshev: not sure what you mean by the store itself being authenticated
	21:05:06 <SMalyshev> brion: well, yes, if it's mediawiki API then sure. but understand it's one of the options?
	21:05:10 <Marybelle> robla: I think focusing on implementation before defining use-cases is silly.
	21:05:11 <tgr> the API should authenticate, just like sessions do
	21:05:38 <robla> #info Question discussed: does the store need to be authenticated?
	21:05:56 <brion> Marybelle: the implementation of the store? Or the client side app feature that uses it?
	21:06:06 <anomie> If it's in the action API, it'll be authenticated just like anything else. If it's in restbase, I think that has support for the same sort of thing (but I don't know details).
	21:06:16 <tgr> how that's represented in the backend is an implementation detail that is IMO of low relevance compared to other factors for choosing backends
	21:06:20 <brion> The app is a thing that already exists and already has a reading list feature.
	21:06:23 <Marybelle> brion: The use-cases of having another authenticated store.
	21:06:43 <Marybelle> If this is for private lists of articles, that sounds like a watchlist.
	21:07:02 <brion> Marybelle ok. The use case is to store per user private data on the server and be able to retrieve and update it, with no versioning.
	21:07:23 <Marybelle> We do that already in several places. :-)
	21:07:47 <Marybelle> I'm not sure making yet another private place is needed.
	21:07:55 <brion> Marybelle: yes, in ways that don't match this use case as already replied on the bug.
	21:08:08 <Marybelle> Okay. I guess I'm the only person not getting it.
	21:08:09 <brion> Either we can change one of those methods to match, or make a new one.
	21:08:13 <tgr> note that the apps already store global authenticated private data, but they are currently using user_props for that, which is not terrible but suboptimal
	21:08:13 <dbrant> Marybelle: this is a sort of generalization of watchlists, where the user can have multiple lists, each with its own attributes (name, description, whether it's saved for offline reading, etc)
	21:08:26 <tgr> so reading lists are not in fact the only current use case
	21:08:47 <robla> #info conversation turned to discussion of use cases for Mobile App
	21:08:56 <niedzielski> tgr: we currently use user prefs to store theme info. we'd probably migrate that to this new store
	21:09:02 <Marybelle> brion: Right. I'd really like to avoid yet another special thing to maintain indefinitely.
	21:09:05 <niedzielski> (just as an example)
	21:09:21 <brion> nod keep em simple when possible :)
	21:09:22 <SMalyshev> reading https://phabricator.wikimedia.org/T128602#2499662 it looks like the delta vs. user_properties is mostly size/performance concerns?
	21:09:30 <Marybelle> niedzielski: You think MediaWiki's database should store a user's app theme preference?
	21:09:32 <gwicke> dbrant: do you see this eventually covering collection (book) functionality as well?
	21:10:22 <tgr> SMalyshev: and keeping complexity down
	21:10:36 <tgr> user prefs need to be loaded as a bundle, for perf reasons
	21:10:50 <niedzielski> Marybelle: what i mean to say is that we currently use user options to store a theme preference. if we had a general purpose key / value store, that would be a more appropriate spot
	21:10:52 <tgr> reading lists would have to be loaded one by one, again for perf reasons
	21:11:02 <anomie> SMalyshev: On the back end, anyway, although there's a question of whether it should be cross-wiki or if users should just pick one "central" wiki. On the front end it'll support fetching individual keys instead of having to fetch the whole thing at once, and possibly some other bits.
	21:11:14 <Marybelle> niedzielski: I'm not sure I get why a mobile app/client gets to use MediaWiki's database to store its preferences.
	21:11:18 <tgr> mixing the two would probably result in more awkward code than having different services for them
	21:11:25 <Marybelle> That seems weird.
	21:11:28 <DanielK_WMDE__> from the RFC, it seems to me that one premise was "re-doing watchlists is hard, let's just do a key/value store, that's easy".
	21:11:30 <DanielK_WMDE__> But perhaps it's not that easy to do right, and so we should perhaps re-do watchlists instead? After all, we discussed global watchlists last week! Perhaps we'll be doing that anyway...
	21:11:52 <niedzielski> Marybelle: do you mean why store preferences on a server vs on device? or mediawiki specifically?
	21:11:57 <dbrant> Marybelle: the MediaWiki db already stores plenty of user settings. Whether these settings apply to the desktop browsing experience or the app experience shouldn't matter.
	21:12:00 <brion> Marybelle: because it's an app for the site which you log in with your site credentials, and it's that or invent a new storage service ?
	21:12:15 <DanielK_WMDE__> We have several needs that drive changes to watchlists: global watchlists, multiple watchlists, automatic expiry...
	21:12:25 <anomie> DanielK_WMDE__: Part of the idea was prototyping their reading lists on top of a basic key-value store, instead of designing something then having to redo it when they find out they need different behavior.
	21:13:01 <SMalyshev> DanielK_WMDE__: I think it's bigger than just watchlists, watchlists would be one usecase for this?
	21:13:24 <tgr> DanielK_WMDE__: actually the original proposal was to use the k-v store as a temporary solution that makes sense on its own as well and then eventually migrate to a dedicated lists API based on some sort of lists concept in core
	21:13:35 <brion> Right, this'll be used for the other user data that's currently shoehorned into userjs prefs as I understand?
	21:13:36 <tgr> we might have given up on that by now, not sure
	21:13:39 <Marybelle> brion: I guess I think about non-Wikimedia Foundation apps. Would those clients also be using MediaWiki's/Wikimedia's database to store their user preferences and data?
	21:13:52 <robla> anomie: having a flexible solution sounds really nice. what if we find out it didn't work the way we wanted it to? How does this not become tech debt?
	21:13:53 <dbrant> gwicke: i wouldn't see why not; do you mean something like "turn my reading list into a pdf"?
	21:13:55 <brion> Marybelle: sure, why not?
	21:14:02 <anomie> DanielK_WMDE__: As for watchlists in particular, the apps want the ability to have multiple lists that each aren't limited to a single wiki, and extra metadata, but not (I think?) actually the recentchanges-filtering functionality watchlists have.
	21:14:05 <DanielK_WMDE__> SMalyshev: the driving use cases is (named) reading lists (aka bookmarks). which is very similar to watchlists. a k/v store could be used to cover this to some degree, as long as the lists don't become very large.
	21:14:13 <gwicke> dbrant: yes
	21:14:13 <Marybelle> It seems outside the scope and responsibility of MediaWiki a bit.
	21:14:57 <brion> Marybelle: no more than watchlist and user prefs for the web UI surely
	21:15:00 <anomie> robla: Are you asking about the key-value store, or a specialized reading-list service? If the former, that's the nice thing about a simple, generic key-value service.
	21:15:05 <DanielK_WMDE__> anomie: cross-wiki watchlists are a lot of fun, as discussed here last week (or was it the week before)?
	21:15:28 <Marybelle> brion: MediaWiki the application using the MediaWiki database isn't so crazy. Any random client application using the MediaWiki database seems a lot zanier.
	21:15:29 <tgr> DanielK_WMDE__: apart from lists in core being a big and long project, I think it would be much more sane to go into it after we have a good understanding of the use cases, and a key-value store is great for prototyping
	21:15:32 <brion> Those can all be implemented separately and could invent separate places to store their data, but I think it would not be super practical
	21:16:03 <brion> Depends how narrowly or broadly you view MediaWiki IMO
	21:16:15 <DanielK_WMDE__> i see two questions here, and we should perhaps pick one to discuss. a) do we want/need a generic k/v store, what needs does it address, what features does it need? and b) how do we best implement (or prototype) reading lists for mobile?
	21:16:20 <brion> And how narrowly or broadly you view Wikipedia as a site or product or place
	21:16:22 <DanielK_WMDE__> which of the two should we discuss?
	21:16:27 <tgr> Gather tried to build its own API from the start and maintain it across use-case changes, and pretty much ended up with a key-value store (JSON blobs in an SQL table) with lots of cruft around it
	21:16:33 <robla> anomie: more the latter. I'll simplify to 3 options: 1) wild success 2) questionable success 3) obvious failure. outcome 2 is where tech debt accrues
	21:16:42 <gwicke> one issue with a generic key-value store without schema enforcement is that any client side app could write any kind of blob to any key
	21:16:45 <anomie> And really, one of the open questions here is whether the key-value store should actually be in MediaWiki (action API) or should be a separate service for WMF app use (restbase).
	21:17:04 <gwicke> this would put the burden of schema checking / validation squarely on the client
	21:17:13 <Marybelle> brion: Do other big sites let client applications use their databases for arbitrary private data? Like Twitter and Facebook and friends?
	21:17:51 <anomie> gwicke: You say that like it's a disadvantage. It could as well be an advantage.
	21:17:57 <brion> Marybelle: are they platforms for sharing free knowledge?
	21:18:02 <brion> :)
	21:18:11 <gwicke> anomie: it's a trade-off
	21:18:21 <brion> Anyway, we already can store tons of arbitrary data as you point out Marybelle
	21:18:37 <brion> The question is can we do it in a way that's efficient and meets the needs of users
	21:18:41 <Marybelle> I mean, if I were making a regular bookmark application, I wouldn't expect MediaWiki to be my back-end off-hand.
	21:18:45 <gwicke> there is the related issue of schema migrations
	21:18:58 <anomie> robla: Yeah, a specialized reading-list service would certainly have the danger of falling into #2. That's why I personally don't want to build one, at least not without decent planning to make it more likely to hit #1.
	21:18:59 <SMalyshev> given that you can just create a wiki page and dump the data there I don't think it changes a lot
	21:19:06 <gwicke> and format versioning
	21:19:20 <brion> Schemas are out of scope for now, imo
	21:19:32 <Scott_WUaS> (gwicke: can you please clarify what "collection (book) functionality" is? Thanks)
	21:19:54 <anomie> Scott_WUaS: I'm guessing https://www.mediawiki.org/wiki/Extension:Collection
	21:19:58 <gwicke> brion: they are necessarily in scope, the question is just where you handle them
	21:20:03 <brion> Marybelle: maybe, or you might store it in one of the several places in the application servers user database that it makes available for that sort of thing
	21:20:04 <Marybelle> brion: This use-case seems a bit against sharing free knowledge, if these are per-user and private, FWIW.
	21:20:15 <Scott_WUaS> anomie: thnx
	21:20:20 <gwicke> if you say it's out of scope on the server, then that implicitly means that they will need to be handled on the client
	21:20:32 <Marybelle> Or a separate server.
	21:20:34 <DanielK_WMDE__> would it be an option to just expose an existing K/V system to the public (with the necessary auth in place)?
	21:20:39 <brion> gwicke: yes it's explicitly in the clients sphere of responsibility
	21:21:00 <Marybelle> DanielK_WMDE__: Existing like Redis or something?
	21:21:06 <anomie> gwicke: Once you start shoving schemas and stuff into it, it's no longer a generic key-value store. The client is free to implement a schema on top of a generic key-value store if it wants, which makes the store itself more flexible.
	21:21:15 <gwicke> with multiple clients, this might be tricky to support
	21:21:19 <brion> DanielK_WMDE__: if we can query individual items and not send them to every view, user props would work.
	21:21:26 <SMalyshev> DanielK_WMDE__: I think that would be one of the solutions. If we have a suitable one
	21:21:27 <DanielK_WMDE__> Marybelle: yes. though redis explicitly says it's designed to be accessed by trusted clients only (i just checked) http://redis.io/topics/security
	21:21:31 <brion> That's basically the difference
	21:21:46 <robla> Marybelle is not likely to be convinced about use cases this hour, but other folks seem more interested in talking implementation, so let's focus on implementation
	21:21:53 <brion> :)
	21:21:58 <SMalyshev> DanielK_WMDE__: it doesn't have to be redis directly, can be redis (or other non-auth k/v) behind Mediawiki API front
	21:22:03 <tgr> anomie: I guess something similar to how EventLogging schemas are handled could be done, I doubt it's worth the effort though
	21:22:05 <Marybelle> I asked on the task about just using a separate key/namespace in user_props and just filtering.
	21:22:14 <Marybelle> You don't have to send every user option on every page load.
	21:22:17 <Marybelle> I'm not sure why we do.
	21:22:17 <DanielK_WMDE__> brion: ok, new plan: hack user props that keys starting with an underscore will be skipped when writing props into jsconfig.
	21:22:20 <SMalyshev> DanielK_WMDE__: which supports only API like "give me my data", not "give me her data"
	21:22:21 <brion> Marybelle might work
	21:22:29 <Marybelle> DanielK_WMDE__: +1
	21:22:47 <brion> :)
	21:23:05 <anomie> DanielK_WMDE__: Disadvantage: every existing user of user_props has to be updated to deal with the filtering, and we have to make sure the additional data doesn't have negative performance impact on the existing uses.
	21:23:10 <DanielK_WMDE__> filtering user_props by prefix seems the simplest solution by far...
	21:23:21 <brion> Are there any low level probs with how that's stored?
	21:23:21 <niedzielski> would user options be able to hold thousands of arbitrary keys ok?
	21:23:30 <DanielK_WMDE__> anomie: yes. existing uses need to be surveyed
	21:23:35 <tgr> see https://phabricator.wikimedia.org/T128602#2499662 for a list of problems with user_props
	21:23:39 <SMalyshev> DanielK_WMDE__: if there's separate keys for up_property that would work I think
	21:24:01 <gwicke> setting up key-value storage is fairly easy, I would say quite a bit easier than handling format versioning & migrations correctly
	21:24:08 <anomie> Also, jcrespo didn't like the idea of putting it in the main database much in https://phabricator.wikimedia.org/T128602#2476545
	21:24:10 <brion> Ok there's a byte limit I think on those, is that a problem?
	21:24:16 <brion> Could be lifted with a schema tweak
	21:24:17 <SMalyshev> anomie: does existing API right now just dumps all keys for the user, or it chooses specific ones?
	21:24:32 <brion> Ah yes, and we did have req to move to separate db cluster
	21:24:36 <robla> #info Discussion turned to authentication possibilities, and then to using user_props
	21:24:38 <brion> Which is easy to do per table iirc
	21:24:45 <Marybelle> tgr: First three bullets seem trivially solvable.
	21:24:46 <anomie> SMalyshev: The existing API query only supports fetching all data for the user, unless I'm completely mistaken.
	21:25:01 <DanielK_WMDE__> hm, i just found this: https://remotestorage.io/
	21:25:02 <dbrant> brion: what's the limit, roughly?
	21:25:03 <brion> Yeah API needs enhancement for query
	21:25:07 <niedzielski> maybe it would be easier to drop a new generic key value store if the feature is unpopular than to clear user options
	21:25:15 <DanielK_WMDE__> no idea if it's good, but it seems worth a look.
	21:25:16 <brion> dbrant: 65534 bytes iirc
	21:25:17 <SMalyshev> anomie: right, but that's only one API. I don't think it should be too hard to make this API skip certain keys in DB?
	21:25:29 <brion> Should be a matter of changing column type
	21:25:54 <gwicke> niedzielski: if there are clear patterns in how keys are structured, or there is only a single use case using this service, yes
	21:25:56 <SMalyshev> anomie: also, if main DB is bad, we could make it two-stage - store opaque id in main db, store actual data in better storage
	21:26:03 <Marybelle> https://www.mediawiki.org/wiki/Manual:User_properties_table
	21:26:12 <anomie> SMalyshev: Then you're really making things complicated.
	21:26:30 <SMalyshev> it's not that complicated I think... just one more call
	21:26:31 <anomie> At that point, why not just use the actual-data storage?
	21:26:33 <Marybelle> How is a filter complicated?
	21:26:35 <anomie> directly
	21:26:41 <SMalyshev> anomie: because auth, etc.
	21:26:54 <tgr> again, if the goal is keeping things simple then having two separate systems with a very simple mode of operation seems better than having one that tries to be the mix of the two
	21:26:56 <SMalyshev> I think maintaining two auth systems in sync is worse
	21:26:58 <anomie> SMalyshev: No, not direct access to the backend.
	21:27:00 <DanielK_WMDE__> Huh. interesting. https://remotestorage.io/ is sponsored by the Wau Holland Stiftung? that indicates it's not industry bullshit. doesn't tell me if it's any good for our use case, but
	21:27:11 <DanielK_WMDE__> #link https://remotestorage.io/
	21:27:22 <brion> Ooh that's neat
	21:27:30 <anomie> SMalyshev: But the frontend directly accessing the better-storage backend instead of looking up in user_properties then in the better-storage.
	21:27:32 * brion bookmarks for later
	21:27:33 <SMalyshev> anomie: that means two APIs instead of one, otherwise the same. I'd rather have users learn one API :)
	21:28:05 <anomie> SMalyshev: Ah, the Gather approach?
	21:28:19 <TimStarling> my vote is to just add a table
	21:28:35 <gwicke> data like reading lists would likely be interesting to multiple applications; I would be surprised if the app would end up being the only user
	21:28:46 <TimStarling> avoid joins so that you can hack up some cross-server thing later if need be, query groups or something
	21:29:01 * anomie is actually more interested in "BagOStuff vs some custom abstraction" over "where exactly do we hide the database table"
	21:29:06 <TimStarling> this whole project seems like something that could be done in hardly any more time than it takes to have this meeting
	21:29:16 <brion> Hehe
	21:29:32 <gwicke> simple key-value storage, sure
	21:29:34 <SMalyshev> defining what to do is often longer than doing it... it's normal :)
	21:29:42 <gwicke> but that's hardly a solution
	21:29:47 <dbrant> +1 to gwicke -- i can easily see mobile web and/or desktop surfacing reading lists for logged-in users.
	21:29:50 <robla> anomie: should we try to answer that question now?
	21:29:52 <Scott_WUaS> :)
	21:30:25 <DanielK_WMDE__> Is this spec in line with what we need? https://datatracker.ietf.org/doc/draft-dejong-remotestorage/
	21:30:29 <brion> That might be a more interesting question for the watchlist cross wiki magic future discussion :)
	21:30:39 <anomie> robla: Wouldn't hurt. Although "action API vs restbase" is an even more interesting question, since it's the difference between me writing it and me not ;)
	21:30:49 <DanielK_WMDE__> if so, perhaps we can rely on some ready-to-go solution.
	21:31:07 <brion> I just don't want mobile apps to get bogged down in the meantime while we bikeshed reading lists
	21:31:17 <gwicke> dbrant: once we have multiple clients, it might make more sense to provide an API that gives a bit more guarantees than just "it's a blob of bytes"
	21:31:25 <SMalyshev> DanielK_WMDE__: does it have TLDR description?
	21:31:47 <DanielK_WMDE__> SMalyshev: https://remotestorage.io/
	21:32:48 <gwicke> updating the schema when all your schema handling is implemented in $n clients sounds hard
	21:32:58 <SMalyshev> DanielK_WMDE__: this seems to be client-side storage?
	21:33:26 <tgr> DanielK_WMDE__: how would auth work?
	21:33:30 <SMalyshev> or the picture is misleading
	21:33:41 <DanielK_WMDE__> SMalyshev: no, it's not.
	21:33:50 <tgr> DanielK_WMDE__: "If <auth-dialog> is a URL, the user can supply their credentials for accessing the account (how, is out of scope)
	21:33:58 <tgr> - not so promising
	21:33:59 <DanielK_WMDE__> SMalyshev: the client picks its storage provider. local is one option. all providers implement the same protocol
	21:34:00 <anomie> DanielK_WMDE__: At a glance, that looks like it would serve the purpose. It wouldn't be able to be done in the context of the action API, but that's not a requirement (assuming it's not me writing it, anyway).
	21:34:00 <Marybelle> Is 65,535 bytes really insufficient?
	21:34:25 <brion> Long names add up and it's hard to make guarantees :)
	21:34:25 <DanielK_WMDE__> tgr: they seem to use oauth
	21:34:29 <Marybelle> That sounds like a lot of page IDs.
	21:34:43 <brion> 64k would likely be enough in practice until it breaks on some extreme case
	21:34:43 <robla> SMalyshev: it looks like the storage is OAuth protected server storage (that basically provides similar constraints to most client storage solutions)
	21:35:00 <tgr> forcing all apps to go through an extra oauth authorization screen might be suboptimal
	21:35:14 <DanielK_WMDE__> using an actual standard to implement this would be nice.
	21:35:27 <DanielK_WMDE__> furthering the idea of "clients pick where they store their data" is even nicer
	21:35:29 <robla> DanielK_WMDE__ agreed
	21:35:32 <tgr> anyway, that protocol seems like it might be a better solution but evaluating it within this IRC meeting is not realistic
	21:35:49 <brion> Well ideally you'd probably want the same login for login and storage in a case like this
	21:35:52 <DanielK_WMDE__> even if we implement this ourselves, perhaps we should implement the protocol defined there.
	21:36:20 <DanielK_WMDE__> tgr: i agree. i'll link to it from the ticket
	21:36:23 <brion> I am curious about it and it smells useful for unofficial apps and such, do bring it up in future :)
	21:36:24 <tgr> and worth mentioning again that building a k-v store that relies on the action API is super simple
	21:36:27 <dbrant> Marybelle: brion: for reference, the current record holder for the most pages in reading lists in our app is over 7000.
	21:36:50 <tgr> building a new API that should match a draft protocol with OAuth 2 and whatnot is no
	21:36:52 <niedzielski> i don't think 64k would be too small personally. it would encourage favoring distinct keys instead of json blobs
	21:36:54 <tgr> not
	21:37:08 <robla> #info DanielK_WMDE__ makes the case for using remoteStorage IETF draft implementation; discussion about that will likely continue in the RFC
	21:37:12 <brion> 7000 dang! Might fit in int IDs but maybe not titles yeah :)
	21:37:38 <brion> Anyway that bits changeable if we need it
	21:38:01 <brion> Easy to bump the column type up one
	21:38:32 <SMalyshev> I understand that if we use IETF one we'd need a client for remoteStorage protocol and a backend. which should rely on some kind of k/v storage inside mediawiki
	21:39:07 <SMalyshev> so we're kind of back to sq. 1? maybe with more standard client API
	21:39:20 <robla> so...my understanding is that the Mobile Apps team plans to prototype something this quarter; should they use one of our existing tech, or should they explore something new?
	21:39:38 <brion> So we have some interest in that protocol; and some talk about tweaking user props with a modified MW API method, and some talk about just adding a table still. Any other current main alternatives?
	21:39:56 <brion> :)
	21:40:11 <tgr> restbase I suppose?
	21:40:15 * robla notes that the "IETF one" is a draft that hasn't made it to "Proposed Standard", and that submitting an IETF draft is trivial
	21:40:28 <SMalyshev> tgr: does restbase authenticate?
	21:41:35 <Pchelolo> SMalyshev: restbase can authenticate
	21:42:02 <brion> Ok so that's another possibility yes :)
	21:42:24 <brion> Though I think gwicke would prefer restbase services to be better specified for their data schemas ?
	21:42:41 <brion> And this is still a young feature likely to change in details
	21:42:46 <Marybelle> dbrant: How common is a reading list of 7000 pages? Just one user? A dozen users?
	21:42:52 <dbrant> robla: to be clear, this isn't a blocker for our current goals this quarter. it would simply make our reading lists become a "complete" feature, the way we intended it to be (technically, two quarters ago :) ).
	21:42:55 <brion> Eventually hopefully merging into fancy watchlists
	21:43:25 <robla> dbrant: thanks for the clarification!
	21:43:34 <brion> That helps us set timelines :)
	21:43:38 <Marybelle> brion: I get pretty frustrated at how bad watchlists are, especially when I see people basically working around them instead of investing resources to fix them. :-/
	21:43:51 <brion> Understandable!
	21:44:13 <brion> I think we need to bring some more focus on that I agree
	21:44:46 <gwicke> added a comment re versioning & API stability at https://phabricator.wikimedia.org/T128602#2500338
	21:44:48 <Marybelle> For the general idea of private reading lists, it's kind of maddening that watchlists won't work.
	21:44:54 <Marybelle> Sigh.
	21:45:07 * anomie wouldn't mind working on watchlists, but there are currently enough other cooks in that kitchen and that's not the problem at hand here.
	21:45:10 * robla tries to remember what ori and Steven Walling were pushing for a few years ago on the watchlist front
	21:45:14 <DanielK_WMDE__> SMalyshev: the point was that client libraries and backend implementations exist, we wouldn't have to write them (if the current ones are good - which i don't know)
	21:45:39 <SMalyshev> DanielK_WMDE__: I don't think backend which we need (i.e. with mediawiki auth) exists?
	21:45:42 <DanielK_WMDE__> (sorry for jumping back to this, don't let me distract you)
	21:45:53 <brion> dbrant: does the enhanced user pref model sound like a good short term for you as a sync mechanism? I think we all still agree fancier watchlist integration would be great for future
	21:45:55 <DanielK_WMDE__> SMalyshev: backends with oauth exist
	21:46:25 <tgr> with a page lists API it really helps if you have a good idea what features you'll need exactly
	21:46:37 <dbrant> Marybelle: there are over 100 users with at least 1000 pages in lists. Over 4000 users with at least 100 pages, etc...
	21:46:42 <tgr> that's one place where Gather dug itself into the ground
	21:46:55 <tgr> a k-v store is ideal for prototyping
	21:47:21 <SMalyshev> DanielK_WMDE__: yeah but oauth against what? we need something on mediawiki side to do actual r/w... even if oauth plugs seamlessly there. Maybe I just don't understand yet what that API does :)
	21:47:24 <tgr> for reading lists, and for a number of (non-list-related) future features I'd imagine
	21:47:39 <dbrant> brion: i think that can definitely work, as long as it can handle a large number of keys.
	21:48:07 <brion> Large number of keys should work with that model yes. You'd need a bulk lookup though too?
	21:48:12 <tgr> DanielK_WMDE__: note that this seems to be using OAuth 2 which MediaWiki does not support
	21:48:18 <anomie> dbrant, brion: "as long as it can handle a large number of keys" is a good question. Again, https://phabricator.wikimedia.org/T128602#2476545
	21:48:22 <DanielK_WMDE__> tgr: hm, right...
	21:48:24 <brion> Heh
	21:48:37 <tgr> probably not a huge undertaking to fix but way larger than the one proposed in this RfC
	21:48:44 <Marybelle> robla: https://www.mediawiki.org/wiki/Requests_for_comment/Support_for_user-specific_page_lists_in_core
	21:49:18 <robla> ah, that's the one, thanks Marybelle
	21:49:25 <dbrant> brion: right, lookup too.
	21:50:16 <dbrant> but then, if we're talking about a short term solution, we can limit things on the client end, too.
	21:50:32 <Scott_WUaS> (anomie: just noticed that you're mentioned in this BBC article "Meet the 'bots' that edit Wikipedia" - http://www.bbc.com/news/magazine-18892510 :)
	21:51:26 <tgr> one thing that hasn't been discussed is how much effort it would take to prevent abuse / how afraid we are it would happen
	21:51:38 <brion> dbrant: ok let's maybe model a couple variants. Large blob, vs row per title? Then confirm whether they make sense on a tweaked user props table, and decide whether to look more at the alternatives?
	21:51:53 <tgr> pirates using it for movie distribution or whatnot
	21:52:22 <gwicke> a quota can take care of that
	21:52:36 <dbrant> brion: +1 niedzielski: what do you think of that? &
	21:52:45 <brion> tgr: good question. There's little we can do to prevent use of user props as a file sharing or DoS space usage against us, other than "it's inconvenient and there are probably easier ways to abuse the system"
	21:53:01 <niedzielski> brion dbrant: one of the problems we consider with row vs blob were race conditions between clients. user options didn't seem well designed to handle that
	21:53:07 <robla> tgr: the security considerations section of https://datatracker.ietf.org/doc/draft-dejong-remotestorage/?include_text=1 looks like a good start on a list
	21:53:13 <brion> Changing API might make it a bit easier to abuse but not much
	21:53:49 <niedzielski> brion dbrant: for example in the blob scenario, if two clients try to update the same list, the last client wins. there's also bandwidth concerns for the 7000 title person
	21:53:53 <brion> niedzielski: yeah, you'd have to detect conflicts through some other means like a signal value
	21:54:10 <brion> Goes smoother with smaller bits, but that complicates the filtering
	21:54:15 <tgr> brion: we could do all kinds of usage tracking, user agent filtering etc
	21:54:19 <gwicke> conflict resolution is somewhat orthogonal from storage strategy
	21:54:30 <tgr> but building it is little effort and those things probably aren't
	21:54:33 <brion> Ok we're low on time :)
	21:54:48 <brion> robla: shall we plan next steps?
	21:55:06 * robla ponders what that would be
	21:55:25 <gwicke> overall, I'm honestly sceptical about the value of using a generic key-blob storage service for use cases like reading lists
	21:55:39 <brion> I think we want to bump pruoity on the more watchlist specific rfc!
	21:55:44 <gwicke> if the use case is so well defined, then I think it deserves a real API
	21:55:47 <brion> Priority
	21:56:00 <anomie> gwicke: "if"
	21:56:02 <brion> gwicke: agreed, medium to long term
	21:56:18 <robla> #action ArchCom needs to bump the priority on a watchlist specific RFC
	21:56:30 <Scott_WUaS> "if"
	21:56:35 <niedzielski> brion dbrant: IIRC, we also had concerns with a list of page IDs vs a page ID with a list of lists. i don't think we came up with a great way to handle that and had to use the list title as the ID
	21:56:58 <brion> Short term: I'll follow up with dbrant and niedzielski on using user prefs modified and see if that still makes sense
	21:57:29 <brion> And anyone else want to do more research on general user data storage with that protocol?
	21:57:45 <brion> It sounds potentially very useful for unofficial third-party tools and such
	21:57:52 <dbrant> brion: sounds good
	21:57:55 <anomie> brion: I'd suggest to run it by jcrespo too
	21:58:02 <brion> Ah yes good
	21:58:21 <robla> brion: I should probably take some of those action items from you, but yes, this all looks good
	21:58:28 <brion> Hehe ok
	21:58:30 <robla> (thank you for spelling this out!)
	21:58:45 <brion> :)
	21:59:09 <robla> anomie: dbrant - any last comments questions before we close this out?
	21:59:18 <brion> Good discussion folks!
	21:59:28 <Scott_WUaS> Yes!
	21:59:29 <dbrant> robla: nope, really glad to see this moving forward
	21:59:29 * anomie has no more comments at the moment
	21:59:40 <robla> great discussion indeed....thanks everyone!
	21:59:43 <niedzielski> \o
	21:59:45 <brion> Woohoo
	21:59:48 <robla> o/
	21:59:57 <brion> Ok I gotta run, catch y'all later
	22:00:02 <robla> #endmeeting
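
The prefix-filtering idea raised around 21:22 (keys starting with an underscore are skipped when the options bundle is built, but stay fetchable one by one) can be illustrated with a minimal sketch. This is not MediaWiki code: the class, the method names, and the in-memory dict are hypothetical stand-ins for a per-user table, and the 65,535-byte cap simply follows the figure quoted in the discussion.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

MAX_VALUE_BYTES = 65_535   # assumed per-value cap, taken from the figure in the log
PRIVATE_PREFIX = "_"       # assumed convention: such keys are left out of the bundle


@dataclass
class UserKeyValueStore:
    """Hypothetical in-memory stand-in for a per-user key-value table."""

    rows: Dict[Tuple[int, str], str] = field(default_factory=dict)

    def set(self, user_id: int, key: str, value: str) -> None:
        # Enforce the byte limit on the encoded value, not the character count.
        if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
            raise ValueError("value exceeds the per-key byte limit")
        self.rows[(user_id, key)] = value

    def get(self, user_id: int, key: str) -> Optional[str]:
        # Single-key lookup: only the requesting user's own row is visible.
        return self.rows.get((user_id, key))

    def bundle(self, user_id: int) -> Dict[str, str]:
        # Bulk load for page views: skip keys carrying the private prefix.
        return {
            key: value
            for (uid, key), value in self.rows.items()
            if uid == user_id and not key.startswith(PRIVATE_PREFIX)
        }
```

With this shape, an ordinary preference such as "skin" travels with every bundle, while a key like "_readinglist:favorites" is only read when a client asks for it by name. It does nothing about the last-writer-wins problem niedzielski describes; that would need a separate conflict signal.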

Event Timeline

Discussed in E237: ArchCom RFC Meeting W30: authenticated key-value store (2016-07-27, #wikimedia-office)

daniel mentioned this in E237: ArchCom RFC Meeting W30: authenticated key-value store (2016-07-27, #wikimedia-office). Dec 9 2016, 7:42 AM
