Paste P3254

ArchCom-RFC-2016W24-irc-E213.txt
Authored by RobLa-WMF on Jun 15 2016, 10:09 PM.
21:02:08 <robla> #startmeeting T120452 Technical aspects of Data namespace blob storage
21:02:08 <wm-labs-meetbot`> Meeting started Wed Jun 15 21:02:08 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:08 <wm-labs-meetbot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:08 <wm-labs-meetbot`> The meeting name has been set to 't120452_technical_aspects_of_data_namespace_blob_storage'
21:02:08 <stashbot> T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML) - https://phabricator.wikimedia.org/T120452
21:02:09 <yurik> hi everyone, thx for making it to the discussion that shall change the face of the earth... again
21:02:35 <robla> #topic Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
21:02:45 <cscott> let's change all the things!
21:02:58 <yurik> cscott, oh no, not again!
21:04:12 <SMalyshev> question: so this thing is using content handler from JsonConfig?
21:04:38 <robla> so, Yuri, things ArchCom just discussed in our last meeting (E212) is "is this a question for ArchCom?" and "should this be a formal RFC?"
21:04:56 <yurik> SMalyshev, correct
21:05:34 <DanielK_WMDE> I propose that we don't discuss "on which domain shall it live" today. that's a community/product question which should be discussed elsewhere, i think
21:05:53 <yurik> robla, there are clearly two topics, just as brion said. The social aspect - should this be on commons - should probably be discussed with the community, and so far the majority of commons seems to be in favour. What I do want this meeting to address is the technical aspects
21:06:00 <DanielK_WMDE> that is to say, that's the part of this that i think is not a question for archcom
21:06:21 <yurik> ok, seems like we are in an agreement on that one
21:06:21 <SMalyshev> yurik: ok. It may be worth it to make it more search-friendly, but I guess that is generic for all JsonConfig things then
21:06:49 <yurik> SMalyshev, sure, search friendliness is definitely on the todo list
21:06:51 <DanielK_WMDE> SMalyshev: how well do you think will the structured search interface you are working on work with tabular data?
21:07:20 <DanielK_WMDE> would it be possible to search for values in specific columns, for example? if the content handler does it right, i mean
21:07:23 <SMalyshev> DanielK_WMDE: it might, but I wouldn't go that far for starters. I'd just start with something like being able to search the description of the dataset
21:07:28 <cscott> (i personally like the idea of making a namespace available from every wiki, then letting social pressures move stuff around as necessary. eg if i want to index the winners of the US "Dancing with the Stars" maybe that live on enwiki in Data:DancingWithTheStars.json, since that's US-specific and there's a different "dancing with the stars" in basically every country.)
21:07:36 <SMalyshev> DanielK_WMDE: search inside the data is much bigger fish to fry
21:08:12 <cscott> hm. i agree that search can be done later.
21:08:20 <DanielK_WMDE> it actually shouldn't be much harder than making the description searchable. but yea, let's start small
21:08:21 <SMalyshev> yurik: my next question: is it attached to specific namespace or can be on any namespace?
21:08:27 <cscott> like search of wiki articles is essentially separate from storage/viewing those articles
21:08:31 <brion> of course we'll have articles on several different countries' shows in each language...
21:08:49 <SMalyshev> cscott: not anymore :)
21:09:06 <cscott> brion: right, at that point it starts to make sense to move it to commons. but the community can decide this themselves organically.
21:09:12 <SMalyshev> cscott: at least not completely. I'm working on a patch that makes ContentHandler handle both
21:09:14 <brion> hmmm, interesting
21:09:35 <cscott> SMalyshev: "not anymore" re search? not sure what you were responding to.
21:09:36 <DanielK_WMDE> SMalyshev: i suggest to rely on the content model, not the namespace, for any special processing. currently, content models are mostly bound to namespaces, but that can change
21:09:38 <brion> cscott: so put the system on *every* wiki that doesn't opt-out, plus a common backing, like File: uploads?
21:09:50 <SMalyshev> cscott: yes, re search. But let's not get too offtopic :)
21:09:58 <cscott> brion: that would be my suggestion. it also ensures we're running the same software everywhere.
21:10:12 <cscott> i have a proposal for a new template system that could really use data blob storage. for example.
21:10:13 <SMalyshev> DanielK_WMDE: yes, I agree. But how you specify that you want to create page in certain content model?
21:10:19 <brion> *nod* sensible :D
21:10:27 <yurik> cscott, i am not too happy about allowing it everywhere. We already got burnt by it waaay too many times - i would much rather keep it in one place, allow multiple wikis to reuse it, and name it accordingly - "dancing with the stars US 2015"
21:10:28 <cscott> https://phabricator.wikimedia.org/T114454
21:10:50 <DanielK_WMDE> SMalyshev: currently, via the namespace or a title suffix (like .css or .js)
21:10:56 <cscott> yurik: sure, default should be commons, but that should be social pressure not technical restriction.
21:11:02 <SMalyshev> DanielK_WMDE: ok, got it
21:11:11 <SMalyshev> sounds reasonable
21:11:17 <DanielK_WMDE> SMalyshev: but that's just at the time of creation. after that, we should just look at the model associated with the page
21:11:26 <robla> #chair DanielK_WMDE brion robla TimStarling Krinkle
21:11:26 <wm-labs-meetbot`> Current chairs: DanielK_WMDE Krinkle TimStarling brion robla
21:11:32 <DanielK_WMDE> it's actually stored separately for every revision
21:11:41 <yurik> cscott, i would wait until community explicitly demands that feature. It is much easier to enable than to disable
21:11:55 <yurik> we can always enable it later if needed, but disabling is next to impossible
21:12:05 <yurik> so for the step 1 i would really like just 1 wiki
21:12:09 <DanielK_WMDE> yurik: T120452 says "CSV, TSV, JSON, XML". But we are down to just a specific JSON format now, right?
21:12:10 <stashbot> T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML) - https://phabricator.wikimedia.org/T120452
21:12:11 <yurik> and wait for the feedback
21:12:18 <SMalyshev> only thing is that now we are using model that's called JsonConfig for some things that are definitely not config... but I guess it's too late to rename...
21:12:30 <yurik> DanielK_WMDE, my first goal is to get two types of data out : tabular and maps
21:12:41 <cscott> yurik: will https://phabricator.wikimedia.org/T91162 let me refer to the data store on commons as if it were local? (like instantcommons does)?
21:12:50 <yurik> maps would allow geojson storage (so that all maps can overlay with extra stuff), and tabular - .... is tabular :)
21:13:00 <DanielK_WMDE> yurik: maps as in geo-shapes?
21:13:05 <yurik> DanielK_WMDE, yes
21:13:06 <DanielK_WMDE> or full maps?
21:13:08 * robla will brb
21:13:08 <cscott> yurik: one issue might be (for example) country-specific data, which then gets hung up on different countries/wiki's ideas of which are valid countries
21:13:09 <yurik> and pushpins
21:13:12 <cscott> ie, is "taiwan" a country?
21:13:37 <cscott> yurik: allowing zhwiki to override the table from commons with a zhwiki-specific table is a valuable way to defuse that situation
21:13:50 <cscott> and i think wikidata already has something like this, where certain facts are only true for certain wikis?
21:13:59 <DanielK_WMDE> cscott: they can just choose to use a different table. much simpler solution.
21:14:13 <cscott> DanielK_WMDE: how does wikidata handle this?
21:14:15 <yurik> cscott, no, not via shadow. The current implementation will only target Lua and Graph users at first, which means Lua will simply say mw.data.get('Page.tab') - and use that data
21:14:23 <SMalyshev> yurik: I wonder if we need a higher level API to operate with such data. I.e. if I want to store a tabular data set, I don't want to need to know specific JSON scheme (which could also change)...
21:14:23 <DanielK_WMDE> but i think the "where does it live" question is out of scope here.
21:14:28 <yurik> we can add shadow later if requested
21:15:10 <SMalyshev> yurik: and by operate I mean not just read (Lua probably covers that) but also write
21:15:15 <TimStarling> presumably Lua will see the decoded object, not the JSON-encoded format?
21:15:22 <cscott> i'm fine with deploying first without instantcommons/shadow and on a single wiki, but i'd like to state for the record that, if this functionality turns out to be useful, we'll eventually need that functionality. we should ensure that we're not *foreclosing* that possibility, even if we're not initially enabling it.
21:15:29 <DanielK_WMDE> cscott: in theory, wikis can pick specific statements over the default for a specific property, or they can filter by well-known authorities being cited as sources. i don't think this is actually being done, but the data model is specifically designed to allow this
21:15:29 <yurik> cscott, the idea here is to provide the most basic usage that will cover 80%. If we get strong desire for 1) multi-wiki storage, or 2) multi-wiki overrides, we can always add that
21:16:02 <yurik> TimStarling, correct
21:16:17 <yurik> TimStarling, more specifically, Lua will get the json as table
21:16:40 <DanielK_WMDE> yurik: i see some overlap between the use cases of wikidata queries and tabular data. it would be nice if the formats and interfaces would be very similar, if not identical.
21:16:40 <yurik> so that it has access to all the meta fields
21:16:47 <yurik> BUT, we can provide additional helpers to resolve the multi-lingual resolution
21:16:47 <cscott> again, not many people have grokked T114454 yet, but the basic idea there is to separate code, content, and presentation, so every template potentially will have a "data" component, along with the "code" and the "presentation" component.
21:16:47 <stashbot> T114454: [RFC] Visual Templates: Authoring templates with Visual Editor - https://phabricator.wikimedia.org/T114454
21:17:02 <yurik> DanielK_WMDE, agree - we have discussed it briefly with Lydia_WMDE
21:17:41 <cscott> assuming some basic separation of that happens in the future, we'll want the data namespace to be roughly on par with the template namespace. ie, shadowed from a default on commons, override-able from specific wikis.
21:17:56 <YairRand_> maybe have data blobs be a wikidata datatype, like commons file
21:18:05 <DanielK_WMDE> yurik: to be compatible with wikidata, the representation of data values would have to become more complex. but i'm not sure whether we should require that, or offer it as an optional feature.
21:18:12 <DanielK_WMDE> yurik: btw, multilingual is pretty complex when it gets to the nitty gritty. do you think it's really needed from the start?
21:18:14 <yurik> DanielK_WMDE, lets sync up afterwards to see if we can match wikidata api with this, or if they should go different routes
21:18:46 <brion> could also do things like instead of multilingual text, refer to a wikidata item and then look up its data by name .... though that may have perf issues with large batches
21:18:47 <brion> :D
21:18:49 <DanielK_WMDE> yurik: not tonight, i'm going to bed after this :) will you be at wikimania?
21:19:06 <yurik> DanielK_WMDE, absolutely - i really think simple "multi-lingual" feature that will allow a fallback is something we need from the start
21:19:07 <TimStarling> ideally Lua would see a read-only wrapper object like what is returned from mw.loadData()
21:19:12 <DanielK_WMDE> brion: yes, wikidata Q-id is a very useful datatype to have
21:19:16 <yurik> DanielK_WMDE, sadly, no Wikimania for me - no budget :(
21:19:28 <TimStarling> that way you don't have to clone it for each #invoke instance
21:19:39 <DanielK_WMDE> yurik: sad. let's find another time and place then.
21:20:01 * aude wavez
21:20:10 <DanielK_WMDE> yurik: it would be nice to re-use the language-fallback stuff we have in wikidata. we should factor it out into a library, i guess.
21:20:15 <yurik> DanielK_WMDE, indeed it is :( Yes, lets. I will schedule a hangout with you. Anyone else - pls poke me if you want to partake
21:20:19 <DanielK_WMDE> hey aude
21:20:33 <yurik> DanielK_WMDE, sure, but i already have something like that for the zero banners that i'm reusing
21:20:35 <yurik> but sure
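Note on the language-fallback exchange above: the following is a minimal sketch of what a simple multilingual string with fallback could look like. The fallback chains, the helper name `resolve_label`, and the sample data are assumptions made for illustration; this is neither the Wikidata fallback code nor what JsonConfig ships.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical fallback chains; a real deployment would take these from
# site configuration rather than hard-coding them.
FALLBACKS: Mapping[str, Sequence[str]] = {
    "de-at": ("de", "en"),
    "zh-tw": ("zh-hant", "zh", "en"),
}


def resolve_label(
    values: Mapping[str, str],
    lang: str,
    fallbacks: Mapping[str, Sequence[str]] = FALLBACKS,
) -> Optional[str]:
    """Return the string for `lang`, else the first available fallback, else None."""
    # Languages without an explicit chain fall back to English (an assumption).
    for code in (lang, *fallbacks.get(lang, ("en",))):
        if code in values:
            return values[code]
    return None


# A multilingual cell: language code -> text.
column_title = {"en": "Population", "de": "Bevölkerung"}
print(resolve_label(column_title, "de-at"))
print(resolve_label(column_title, "fr"))
```

Returning None when nothing matches leaves the choice of placeholder to the caller, for example a Lua module that renders the table.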
21:20:58 * robla returns to the meeting he's allegedly chairing :-)
21:21:11 <SMalyshev> yurik: so what do you think about having higher-level API to manipulate specifically the tabular data?
21:21:17 * yurik thinks chairing !== participating ;)
21:21:40 <yurik> SMalyshev, "manipulate" is out of the scope i think at this point. I'm all for it though :)
21:21:54 <DanielK_WMDE> yurik: do you want to start out with your own (simpler) data types for now (and spend time to specify them properly)? or do you want to go with the representation that is used by the wikidata? that's already in reusable libraries
21:21:55 <SMalyshev> yurik: thinking forward :)
21:21:57 <yurik> especially because i can totally see some pages being custom-defined to store data in the backing SQL
21:22:03 <yurik> SMalyshev, ^
21:22:38 <yurik> DanielK_WMDE, i would like to match datatypes in wikidata as much as possible, but probably only provide a subset of them from the start
21:23:04 <SMalyshev> yurik: well, there's two venues here: a) run external query, store results on wiki (I don't want to know too much details about how wiki stores it)
21:23:14 <DanielK_WMDE> yurik: yea, an "table aware" storage backend is an interesting idea, and fits in with the blob-store refactoring i'm thinking about. but it's for later.
21:23:14 <yurik> if that means re-implementing some of it first - lets, because otherwise we might spend years making it perfect only to realize that community needs are totally orthogonal to wikidata usage
21:23:25 <SMalyshev> yurik: and b) represent internal query as data set on wiki (e.g. WDQS query)
21:23:47 <yurik> SMalyshev, yep, that's what DanielK_WMDE is talking about i think. But again, lets not discuss it now :)
21:23:49 <SMalyshev> all that needs clean API so that clients don't know too much
21:23:57 <yurik> otherwise we might be redesigning SQL engine next ;)
21:24:16 <SMalyshev> that's why I mention it - if we make it too specific now, it'd be hard to change it ater
21:24:18 <SMalyshev> *later
21:24:47 <yurik> SMalyshev, that i agree. But remember, the use case here is for Lua to GET EVERYTHING, and deal with it. If we say we want SQL-like GET EVERYTHING THAT MATCHES THE WHERE CLAUSE, we might get into all sorts of weird issues
21:25:13 <yurik> especially because we might go the route that is not needed (yet or ever)
21:25:26 <brion> huge-data-set needs are quite different yes
21:25:27 <DanielK_WMDE> yurik: ok then. have you looked at how wikidata represents data values? e.g. look at https://www.wikidata.org/wiki/Special:EntityData/Q42.json
21:25:38 <SMalyshev> yurik: Lua is a good enough API if we don't get too specific about the structure
21:26:08 <yurik> exactly brion - that's what we actually discussed in the task earlier - dealing with large datasets is a very different beast, with a different reqs
21:26:11 <aude> DanielK_WMDE: i'm not sure about duplicating all the json for each value
21:26:12 <DanielK_WMDE> yurik: e.g. we have something like {"snaktype":"value","property":"P577","datavalue":{"value":{"time":"+2002-01-01T00:00:00Z","timezone":0,"before":0,"after":0,"precision":9,"calendarmodel":"http://www.wikidata.org/entity/Q1985727"},"type":"time"},"datatype":"time"}
21:26:13 <stashbot> P577 (An Untitled Masterwork) - https://phabricator.wikimedia.org/P577
21:26:24 <aude> if all the values have the same calendar, e.g.
21:26:30 <DanielK_WMDE> yurik: the "value" thing is what i think should be in your table fields.
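Note on the snak example above: a short sketch of pulling out the inner "value" object that DanielK_WMDE suggests storing in table fields. The JSON is the snak quoted in the log; the helper name `cell_value` is hypothetical.

```python
import json

# The snak quoted in the discussion, verbatim.
snak_json = (
    '{"snaktype":"value","property":"P577","datavalue":{"value":'
    '{"time":"+2002-01-01T00:00:00Z","timezone":0,"before":0,"after":0,'
    '"precision":9,"calendarmodel":"http://www.wikidata.org/entity/Q1985727"},'
    '"type":"time"},"datatype":"time"}'
)


def cell_value(snak: dict):
    """Return the inner datavalue "value" of a snak, or None if it carries no value."""
    if snak.get("snaktype") != "value":
        return None
    return snak["datavalue"]["value"]


value = cell_value(json.loads(snak_json))
print(value["time"], value["precision"])
```

Only the inner object would go into a table cell; the surrounding snak keys (property, datatype) describe the column rather than the individual field.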
21:26:54 <aude> then have calendar, before /after + array of timestamps?
21:27:15 <DanielK_WMDE> yurik: the "time" data type is kind of a nice nasty example. its json representation really isn't too great, and i'd love to change it... we'll probably have to use a new type id for the new version, not sure yet
21:27:24 <aude> maybe precision might vary though
21:27:37 <yurik> DanielK_WMDE, i would prefer to go with the tabular data as defined by the industry for tabular data (see the bug), but for specific datatypes like time - sure
21:28:01 <DanielK_WMDE> aude: yea, i at least wouldn't duplicate the time. we could have a "defaults" row, that gets merged into every value.
21:28:11 <aude> the json is very verbose
21:28:12 <DanielK_WMDE> a bit hacky, but would work...
21:28:21 <aude> just saying...
21:28:24 <yurik> DanielK_WMDE, btw, time is not part of this proposal just yet :)
21:28:32 <yurik> too complex to have it in ver1
21:28:42 * aude agrees with yurik
21:28:49 <aude> start simple
21:29:06 <yurik> it can be easily added later - simply add a new type, and make the value object mean as DanielK_WMDE described above
21:29:22 <DanielK_WMDE> yurik: yea, sure. for string-based types, simple literals work fine. for numbers, too. once we get into measured quantities, things get more complex
21:29:38 <yurik> DanielK_WMDE, default row is fairly complex - should it be the "null" that gets used as the default?
21:29:41 <aude> counts could be ok
21:29:45 <SMalyshev> oh let's not get into units :)
21:30:07 <yurik> agree, lets get back to overall strategy :)
21:30:13 <DanielK_WMDE> yurik: we should make sure to avoid naming conflicts. if you define a type name and use it with a different format than wikibase does, that will become annoying
21:30:21 <robla> #info much of the first half of the discussion was about defining datatypes
21:30:24 <yurik> DanielK_WMDE, agree
21:31:14 <DanielK_WMDE> yurik: the idea with the defaults row was that you can e.g. say that all dates in a column use the same calendar, or all coordinates refer to earth, without having to repeat that info for every field. but that's an optimization that can be added later.
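Note on the "defaults row" idea above: a minimal sketch of merging shared per-column defaults into each cell so that they need not be repeated. The function name `apply_defaults` and the cell layout are assumptions for illustration; nothing here is part of the proposal as discussed, which excludes the time type from version 1.

```python
def apply_defaults(defaults: dict, cells: list) -> list:
    """Merge column-wide defaults into every cell; keys set on a cell win."""
    return [{**defaults, **cell} for cell in cells]


# Shared properties stated once for the whole column.
defaults = {
    "calendarmodel": "http://www.wikidata.org/entity/Q1985727",
    "precision": 9,
}

# Cells carry only what differs; the second one overrides the precision.
cells = [
    {"time": "+2002-01-01T00:00:00Z"},
    {"time": "+2003-01-01T00:00:00Z", "precision": 11},
]

merged = apply_defaults(defaults, cells)
print(merged[0]["precision"], merged[1]["precision"])
```

Because explicit cell keys take precedence, a column can mix values that follow the default with the occasional exception, which covers aude's point that precision might vary.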
21:31:49 <yurik> ver1 datatypes: strings, numbers, multilingual strings. I don't even know if i want to allow bools for now.
21:32:11 <robla> #info DanielK_WMDE and yurik agree to try to avoid naming conflicts (e.g. with wikibase types)
21:32:13 <aude> multilingual gets somewhat complex
21:32:23 <yurik> these three types should cover almost 90% of the usecases from the start - simply because it will be Lua doing the processing and presenting of the dat
21:32:24 <yurik> data
21:32:25 <SMalyshev> yurik: bools are just numbers 1/0. Or strings yes/no :)
21:32:33 <aude> and numbers (units? no units / counts?)
21:32:34 <DanielK_WMDE> yurik: i think the way you represent multilingual is different from what the DataValues lib does. but wikidata doesn't use multilingual yet, so it can be changed
21:32:44 <yurik> aude, simple JSON numbers
21:32:55 <yurik> which means if you want units, you add a string column
21:32:56 <aude> yurik: like counts
21:33:00 <SMalyshev> aude: I don't think we need units and associated headache. We haven't properly figured them on wikidata even
21:33:14 <aude> SMalyshev: that's why i am asking :)
21:33:16 <DanielK_WMDE> robla: i think data types are a crucial issue. but i agree that we should leave some room for other topics ;)
21:33:41 <yurik> DanielK_WMDE, lets make this part of our wikidata-jsonconfig sync up meeting
21:34:06 <yurik> are there any other issues that people are concerned about?
21:34:22 <aude> yurik: btw, maybe you can visit berlin before SOTM in belgium?
21:34:27 * DanielK_WMDE thinks that values from a query api will actually be full "snaks"...
21:34:30 <SMalyshev> yurik: are there any limits on how big it can get?
21:34:36 <aude> and we can talk more of the details
21:34:44 <yurik> SMalyshev, 2mb - same as a wiki page
21:34:50 <DanielK_WMDE> oh, a visit sounds nice!
21:34:53 <SMalyshev> ok
21:34:55 <yurik> because it uses storage engine
21:35:36 <DanielK_WMDE> do you think we will want to expand to very large data sets later?
21:35:37 <robla> yurik: can/should you formalize T120452 as an ArchCom-RFC?
21:35:38 <stashbot> T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML) - https://phabricator.wikimedia.org/T120452
21:35:53 <yurik> DanielK_WMDE, aude, do we really want to wait until september to deploy this? JsonConfig has been in production for the past 2 years, for all wikis (as part of the zero system)
21:35:55 <robla> (at least the technical side)?
21:36:28 <yurik> DanielK_WMDE, i don't want to tackle large datasets until after this thing has had some usage, e.g. half a year
21:36:30 <robla> (perhaps T134426 is the right one to focus on)
21:36:30 <stashbot> T134426: Review shared data namespace (tabular data) implementation - https://phabricator.wikimedia.org/T134426
21:36:51 <DanielK_WMDE> yurik: that's a pretty brisk pace ;)
21:37:04 <yurik> DanielK_WMDE, agree, i will wait a year until large datasets :D
21:37:05 <DanielK_WMDE> if you really want to support data sets by then, you better start thinking about that early
21:37:33 <yurik> but yes, it should be in the back of our minds, but shouldn't be fully speced until later
21:38:03 <yurik> robla, i think there is another task that formalizes how the system works
21:38:05 * yurik looks
21:39:13 * robla waits patiently
21:40:23 <DanielK_WMDE> how about directly transcluding a table into a wiki page? how would that work? do we need that? or do we rely on lua for that?
21:40:29 <yurik> robla, i think its in https://www.mediawiki.org/wiki/Extension:JsonConfig/Tabular
21:40:55 <yurik> DanielK_WMDE, even though i do have it implemented (as a template expansion), i don't think its a usecase
21:41:02 <yurik> simply because there is really no big reason for it
21:41:18 * robla notes that the Extension:JsonConfig points to T120452
21:41:18 <yurik> it is always very usage dependent - e.g. show a list generated from a table
21:41:36 <DanielK_WMDE> yea, you'd always want some custom stuff anyway
21:41:39 <robla> yurik: is T120452 the right Phab task?
21:41:40 <stashbot> T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML) - https://phabricator.wikimedia.org/T120452
21:42:35 <DanielK_WMDE> can we remove the formats from the title? i think they are misleading now.
21:43:09 <yurik> DanielK_WMDE, agree, but please keep in mind that as part of this discussion i would really like geojson (maps overlays) to be agreed on as well
21:43:20 <yurik> actually geojson is much simpler than tabular
21:43:33 <yurik> it is a well established format, and we are already heavily using it in maps
21:43:36 <robla> yurik: what Phab task do you want to declare as an ArchCom-RFC?
21:43:55 <yurik> robla, that one is fine i think - we need some refining to the title and description
21:43:55 <cscott> I wonder if extensions are really the right way to select data type?
21:44:06 <yurik> cscott, what do you mean?
21:44:22 <cscott> since there is some discussion of types, for instance, it might be that we start with a very simple "json" but later have a more typeful "json" with the date-type figured out, etc.
21:44:31 <cscott> .json is going to get overloaded quickly
21:44:40 <cscott> mime types would be much nicer
21:44:47 <cscott> but then that begs the question of where they get stored
21:44:51 <yurik> cscott, i actually don't want .json - it will be heavily misused from the beginning, no?
21:44:58 <cscott> still, storing the data type separately from the data/article name is not a bad thing.
21:44:59 <yurik> and we won't be able to do some proper editing
21:45:15 <DanielK_WMDE> cscott: internally, it will be represented as a content model id. the extension is one way to indicate that.
21:45:17 <yurik> if we from the beginning define a rigid structure, we can add useful tools
21:45:33 <yurik> so for tabular, VE can have a nice editor of values (like a spreadsheet)
21:45:39 <SMalyshev> .json is way too generic... I think it'd be better if tables and geojson had their own spaces
21:45:40 <yurik> actually it won't even be a VE on commons
21:45:40 <cscott> DanielK_WMDE: will we be able to eventually just associate a mime type with the content?
21:45:43 <DanielK_WMDE> hm... will geo-shapes and tables live in the same namespace? with different suffixes/extensions?
21:45:53 <robla> yurik: If that becomes an ArchCom-RFC, then you won't be the assignee, and Danny Horn will be the author. Is that the desired outcome?
21:45:56 <yurik> DanielK_WMDE, yes
21:46:05 <cscott> i don't mind Data: as the namespace. i'd rather have that than GeoJson: Tables: etc etc
21:46:14 <yurik> robla, i hear you, ok, i will create a new task
21:46:37 <yurik> DanielK_WMDE, example: Data:Don Qixote Trip in Spain.geojson
21:46:37 <DanielK_WMDE> cscott: a content model id, not a mime type. the mime type specifies a serialization format, like json or xml. that's also stored, but kind of redundant. the important info is what model/vocabulary/scheme the data is using.
21:46:45 <cscott> fwiw Scribunto/JS has this same issue -- there's no way to specify which *language* the module is in, in the Module: namespace.
21:46:49 <MaxSem> I agree with cscott
21:46:51 <DanielK_WMDE> cscott: we already do that. that's how contenthandler works.
21:46:57 <brion> my one concern about separating type is that if a table changes type, will that break usage? :)
21:47:11 <yurik> that's why we from the beginning define extensions
21:47:13 <robla> #info yurik agrees to create a Phab task for use as an ArchCom-RFC
21:47:19 <brion> (eg if you change an image from .png to .svg you can still use it the same from wiki side, but for tables it may matter more)
21:47:22 <cscott> brion: possibly, but that's no different from a rename breaking usage, or any other edit breaking usage.
21:47:28 <yurik> JsonConfig will be set up to only allow pages that match REGEX
21:47:32 <brion> *nod*
21:47:36 <DanielK_WMDE> cscott: you could indeed use a file extension to indicate whether a module is JS or Lua. Just add .js or .lua
21:47:40 <brion> and we really should rename JsonConfig ;)
21:47:44 <yurik> so it will be Data:.*\.tab
21:47:54 * cscott is not a fan of file name extensions
21:47:56 <yurik> no other pages will be creatable in the data namespace
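Note on the title restriction above: a small sketch of checking page titles against the pattern yurik quotes (`Data:.*\.tab`). Matching against the whole title and the helper name `is_creatable` are assumptions; how JsonConfig actually applies its configured pattern is not stated in the discussion.

```python
import re

# Pattern quoted in the discussion. fullmatch() requires the entire title to
# match, so "Data:Foo.tab.bak" is rejected (an assumption about the intent).
TAB_TITLE = re.compile(r"Data:.*\.tab")


def is_creatable(title: str) -> bool:
    """Return True if a page with this title would be allowed under the pattern."""
    return TAB_TITLE.fullmatch(title) is not None


for title in ("Data:Population.tab", "Data:Population.json", "Data:Foo.tab.bak"):
    print(title, is_creatable(title))
```

Supporting geojson pages as well would mean one pattern per content model, or a single pattern with an alternation on the suffix.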
21:48:00 <cscott> not i18n friendly
21:48:04 <DanielK_WMDE> cscott: internally, that would just define the content model to use when creating the page
21:48:06 <cscott> not human friendly, really
21:48:18 <yurik> cscott, the only other option is to have multiple namespaces - and the community (and i personally) really hate that
21:48:35 <brion> well, the other option is to have some sort of content model selection in the creation process
21:48:40 <DanielK_WMDE> i kind of like to have that info in the title, cscott
21:48:41 <brion> which implies UI etc
21:48:44 <cscott> no, i'm just saying that the content model should be defined separately (as DanielK_WMDE indicates is already the case under the covers) and not rely on filename extensions
21:48:45 <yurik> brion, sure, that can also work
21:49:15 <cscott> DanielK_WMDE: but the info in the title doesn't mean anything unless you speak english -- or "hacker english" at least
21:49:29 <cscott> and "geojson" doesn't really mean anything to even english speakers
21:49:41 <DanielK_WMDE> cscott: i like to do both. we *can* handle different models without any indicator in the title, but it's *nice* to have that indicator there. we already do this for .css and .js in the MediaWiki and User namespaces
21:49:56 <yurik> cscott, brion, we could create an elaborate system for model selection - is that an absolute blocker/requirement? I really feel that since data will be very technically oriented, people will actually find it better usable
21:50:01 <cscott> DanielK_WMDE: that will probably have to be good enough for now.
21:50:02 <yurik> just like we have File:Blah.json
21:50:16 <yurik> exactly
21:50:21 <yurik> i really like that indicator
21:50:44 <TimStarling> "geojson" hopefully means something to the people who are writing lua modules
21:50:45 <cscott> yurik: i'd just like it clear during the document/evangelization process that filename extensions may be a convenient *shortcut* for specifying the data type, they are only a stopgap and not strictly speaking required. (especially if your native language is not english)
21:50:47 <brion> i'm happy enough with extensions given the existing ecosystem
21:51:09 <cscott> hopefully we'll eventually have more robust article metadata editors, so you can just directly edit the content model
21:51:12 <yurik> remember that we are targeting a very tech savvy community with this until a nice editor system is in place. And when it is, I wouldn't mind a VE to edit the data remotely, without even switching to commons (like we do in Wikidata)
21:51:28 <brion> mmmm, spreadsheet editor
21:51:30 <cscott> brion: and i'm lobbying against them based on where i'd like to see the ecosystem eventually go. ;)
21:51:31 <robla> cscott: file extensions and file types are tied up with one another, despite years of standards bodies trying to make that not be true
21:51:33 <yurik> brion, exactly
21:51:37 <DanielK_WMDE> i notice we are getting close to the end of the meeting.
21:51:46 <DanielK_WMDE> are there any thoughts or comments about geojson?
21:51:53 <yurik> brion, T134618
21:51:53 <stashbot> T134618: Implement spreadsheet-like cell editing for tabular data - https://phabricator.wikimedia.org/T134618
21:51:54 <DanielK_WMDE> yurik: how do you render geo shapes?
21:51:56 <cscott> "tech savvy community" == we systematically exclude potential community members who are not tech savvy
21:51:59 <cscott> that's what i hear, at least
21:52:04 <brion> and if you really want to have fun with file extensions <-> type, try dealing with video containers vs codecs! </runs away>
21:52:19 <yurik> DanielK_WMDE, easy - you just put that geojson inside the <mapframe>...</> wikitext element :)
21:52:34 <robla> brion: amen
21:53:06 <brion> cscott: a legit concern, yes
21:53:15 <DanielK_WMDE> yurik: so there is a hard dependency on the maps extension?
21:53:16 <yurik> cscott, i am by no means trying to exclude them, but rather understand the users. Non-tech savvy community is the ones that will provide the most value (simply because there is probably a bigger non-tech-savvy community there), but we should make it nicer and easier for them.
21:53:17 <brion> usability will become a bigger concern once there are tools built up on top of this system
21:53:26 <yurik> DanielK_WMDE, when supporting geojson as storage - yes
21:53:29 <cscott> so long as the file extensions aren't baked hard into the design, i'm happy.
21:53:32 <brion> eg if you already have graphing/table-formatting templates+lua modules ready to use
21:53:34 <DanielK_WMDE> yurik: what do you do if it's not there? just show json as text?
21:53:36 <brion> and a good editor
21:53:41 <DanielK_WMDE> would be ok-ish, i guess
21:53:42 <cscott> just like i'm happy so long as we can *eventually* enable shadow namespaces or instantcommons on this
21:53:44 <yurik> DanielK_WMDE, we could - as a backup
21:54:03 <yurik> cscott, i am having very big doubts about shadow namespaces to be honest
21:54:05 <DanielK_WMDE> cscott: +1
21:54:06 <robla> I think the file extension issue needs to go to wikitech-l
21:54:09 <yurik> but that's a separate discussion :)
21:54:22 <cscott> yurik: instantcommons then. or data: namespaces on every wiki. what you will.
21:54:55 <yurik> cscott, i'm not against it, just doubting the long term viability of it ;)
21:55:10 <cscott> i have faith in kartik ;)
21:55:11 <yurik> but again, we can totally support it if we decide that's the way forward
21:55:18 <DanielK_WMDE> can we confirm that geojson is good to go? i have no objection, but i also know next to nothing about it
21:55:32 <robla> so yurik, thanks for bringing this conversation up on wikitech-l generally. I think there's a lot more to discuss here...and I'm not sure how to do it
21:55:33 <DanielK_WMDE> is anyone around who actually knows something about geojson?
21:55:37 <cscott> sure. that's all i'm lobbying for at the moment. leave space for the future, don't do anything that would make it impossible later.
21:55:39 <yurik> DanielK_WMDE, https://www.mediawiki.org/wiki/Help:Extension:Kartographer
21:55:40 <brion> so the alternative on the extension is probably "don't enforce an extension, have everything in the Data: namespace be this tabular format _for now_"
21:55:42 <MaxSem> DanielK_WMDE, /me
21:55:43 <yurik> it has a geojson sample
21:55:43 <robla> DanielK_WMDE: I don't want to confirm anything in this meeting
21:55:51 <brion> with an eventual UI/API extension for picking different content model
21:56:14 <SMalyshev> DanielK_WMDE: well, I know a little about it... nothing that would prevent us from having it on wiki as format :)
21:56:17 <DanielK_WMDE> robla: ok, check that there are no objections at this time ;)
21:56:28 <yurik> brion, i'm not too happy about that - i would much rather say "for now, lets only allow pages in the Data: that match the extension"
21:56:39 <yurik> this way we can put geojson there as well
21:56:43 <yurik> and other formats
21:56:49 <brion> is geojson ready to go?
21:56:54 <yurik> brion, yep
21:56:57 <brion> ah fun
21:56:57 <yurik> it is much easier
21:57:02 <DanielK_WMDE> robla: as in humming ;)
21:57:06 * robla doesn't feel like he understands what's being proposed well enough to have had a chance to object
21:57:08 * aude and some other people implemented geojson content handler in Zurich
21:57:14 <aude> 2 years ago?
21:57:14 <yurik> geojson is very straightforward - we already have it as part of kartographer ext
21:57:27 <cscott> i'd say "the content model of the page is defined at page creation time by the extension. but nothing after that point tries to parse the article title for an extension"
21:57:37 <yurik> cscott, agree
21:57:43 <aude> not sure exactly how it would work now, but think it's not too complex
21:57:53 <DanielK_WMDE> cscott: yes, absolutely.
21:57:58 <brion> cscott: that seems sensible yeah
21:58:04 <cscott> it also potentially means you could work around the need for an extension by sneaky renames. ;)
21:58:10 <brion> and allows for the future to drop the extension at creation time
21:58:12 <yurik> cscott, the only limitation - jsonconfig will not allow renaming if the target page name does not match the original regex
21:58:13 <DanielK_WMDE> that's how it works for .js and friends
21:58:25 <cscott> in lieu of having a proper direct edit mechanism for the content model
21:58:31 <yurik> yep
21:58:55 <cscott> yurik: yeah, i'm okay with the rename limitation for now. i just don't want the code to have regexp matches against the page title scattered everywhere.
21:58:59 <robla> we're running out of time. very good discussion; I think I know how to pull open questions out, but I'm not volunteering to do it.
21:59:13 <aude> not sure we need hard dependency on kartographer
21:59:15 <yurik> cscott, oh, thats not there . The content id is stored with the page
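The rule agreed on above (the title extension picks the content model once, at page creation; the model is then stored with the page; renames are only allowed when the target title still matches) can be sketched roughly as follows. This is a minimal illustration in Python, not JsonConfig's actual code: the pattern table, the model ids "tabular" and "geojson", the ".tab" extension, and all function names are assumptions made up for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical title-pattern -> content-model table (illustrative names only,
# not JsonConfig's real configuration).
MODEL_PATTERNS = [
    (re.compile(r"^.+\.tab$"), "tabular"),
    (re.compile(r"^.+\.geojson$"), "geojson"),
]


@dataclass(frozen=True)
class Page:
    title: str
    content_model: str  # stored with the page; fixed at creation time


def model_for_title(title):
    """Return the content model whose pattern matches the title, or None."""
    for pattern, model in MODEL_PATTERNS:
        if pattern.match(title):
            return model
    return None


def create_page(title):
    """The only place where the title extension decides the content model."""
    model = model_for_title(title)
    if model is None:
        raise ValueError(f"no content model matches title {title!r}")
    return Page(title=title, content_model=model)


def rename_page(page, new_title):
    """Sketch of the rename limitation: the target must match the same model."""
    if model_for_title(new_title) != page.content_model:
        raise ValueError(f"{new_title!r} does not match model {page.content_model!r}")
    return Page(title=new_title, content_model=page.content_model)


def describe(page):
    """Later code dispatches on the stored model and never re-parses the title."""
    return f"{page.title} is handled as {page.content_model}"
```

With this shape, the regex match lives in one table consulted at creation and rename time, and everything downstream reads the stored `content_model` field, which is the property cscott is asking for.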
21:59:18 <brion> for the .js/.css subpages we also just have predictable naming which is what the exts are for, something not relevant for the primary item
21:59:20 <DanielK_WMDE> yurik: people got upset when they couldn't rename a misnamed foo.jd to foo.js, because the content model mismatched ;) so now they can re-declare the content. a bit scary...
21:59:22 <yurik> aude, its a soft dep
21:59:36 <aude> yurik: or some generic stuff could be separated and used by both things
21:59:38 <yurik> just like many extensions depend on syntax highlighter
21:59:38 <robla> yurik: where should people who are interested continue this discussion?
22:00:47 <yurik> robla, i guess I should create a new task "deploy" ?
22:00:52 * robla plans to type "#endmeeting" by 22:05 UTC
22:00:54 <yurik> as we discussed earlier ?
22:01:05 <cscott> (brion: .js/.css subpages are a bit weird since browsers and webservers still do content-type sniffing based on url extension and other factors; that shouldn't be relevant to the data namespace which is for internal mediawiki use, not for directly serving to web browsers)
22:01:11 <robla> yurik: could you file a quick placeholder task?
22:01:19 <yurik> sec
22:01:32 * robla wishes Phab allowed reassigning the submitter
22:01:51 <brion> cscott: when we serve them as JS/CSS content it's through RL's load.php; their URLs don't end in .js or .css at all :)
22:02:52 <yurik> https://phabricator.wikimedia.org/T137929
22:03:00 <robla> thanks!
22:03:22 <robla> #info conversation will continue at https://phabricator.wikimedia.org/T137929
22:03:24 <yurik> https://phabricator.wikimedia.org/T137930
22:03:31 <yurik> robla, ^ geojson
22:03:37 <yurik> should there be one common one?
22:03:51 <yurik> that discusses the underlying tech? like extensions, etc
22:03:56 <robla> #link https://phabricator.wikimedia.org/T137930 geojson
22:04:08 <aude> thanks yurik :)
22:04:22 <aude> suppose maybe we can also talk at SOTM US :)
22:04:36 <yurik> ok, if needed, i will create another task later
22:04:41 <robla> let's treat T137929 as the parent task
22:04:41 <stashbot> T137929: Enable shared tabular data storage on a shared wiki - https://phabricator.wikimedia.org/T137929
22:04:54 <robla> ok....let's end the meeting
22:04:57 <yurik> :)
22:05:04 <robla> thanks all!
22:05:08 <yurik> thanks robla!
22:05:08 <robla> #endmeeting