Paste P2968

2016-04-27 ArchCom-RFC meeting: Support language variants in the REST API (E168, T122942)

Authored by RobLa-WMF on Apr 27 2016, 10:06 PM.
21:00:24 <TimStarling> #startmeeting RFC meeting
21:00:24 <wm-labs-meetbot> Meeting started Wed Apr 27 21:00:24 2016 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:24 <wm-labs-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:24 <wm-labs-meetbot> The meeting name has been set to 'rfc_meeting'
21:00:35 <cscott> ah, i was wondering if I was in the right place
21:00:45 <TimStarling> #topic RFC: Support language variants in the REST API | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
21:01:21 <gwicke> hi
21:01:32 <Zppix> hello
21:01:49 <robla> o/
21:01:49 <Scott_WUaS> Hello
21:02:05 <gwicke> so, we have been thinking about the best way of exposing different languages and their variants in the REST API
21:02:07 <Scott_WUaS> Congratulations, Rob re Architecture Committee!!!
21:03:10 <gwicke> while the focus is on the REST API (which brings some requirements and constraints like caching), it is closely related to the bigger question of how we represent different language selections in URLs / requests
21:03:35 <robla> phab meeting: https://phabricator.wikimedia.org/E168 rfc task: T122942
21:03:35 <stashbot> T122942: RFC: Support language variants in the REST API - https://phabricator.wikimedia.org/T122942
21:03:49 <gwicke> DanielK_WMDE__ has started a related non-API discussion at https://phabricator.wikimedia.org/T114662
21:04:42 <gwicke> in that thread, one of the key questions that emerged was about the desired granularity of language selection
21:04:52 <cscott> i think we can come up with a reasonable consensus for the REST API. I don't know about settling any broader questions.
21:05:09 <gwicke> purodha summarizes this question well in https://phabricator.wikimedia.org/T114662#2005122
21:05:14 <TimStarling> can we have a link to the current REST API documentation?
21:05:33 <brion> to clarify, this rfc settles the URL interface for specifying when you're pulling a particular variant, which then opens the further separate question of how to implement the conversion in parsoid etc. correct?
21:05:35 <gwicke> https://en.wikipedia.org/api/rest_v1/?doc
21:05:54 <robla> #link https://en.wikipedia.org/api/rest_v1/?doc current REST API documentation
21:05:56 <gwicke> this RFC is about selecting content languages in the REST API
21:06:50 <cscott> T114662 is about URLs for mediawiki articles; I'm not sure that's strictly related. that is, we can pick some solution for the REST API w/o changing how we do article URLs, or vice-versa.
21:06:50 <stashbot> T114662: RFC: Per-language URLs for multilingual wiki pages - https://phabricator.wikimedia.org/T114662
21:07:04 <robla> I made an attempt to enumerate the options under consideration in T122942 . cscott tried to narrow it down
21:07:04 <stashbot> T122942: RFC: Support language variants in the REST API - https://phabricator.wikimedia.org/T122942
21:07:12 <gwicke> another aspect related to the granularity is whether we should expose whether something is a variant / auto-translated vs. a separate project
21:07:57 <cscott> is that really a question?
21:08:01 <gwicke> I personally think that we should have a consistent plan for selecting languages, and an idea for the granularity we are shooting for
21:08:35 <cscott> i think projects and variants are distinct. I don't see any reason to conflate them, or to obscure which project is responsible for a given bit of content.
21:08:42 <brion> to clarify further: we already have domains for projects, correct?
21:09:04 <cscott> brion: arguably too many, if I understand ops correctly.
21:09:08 <brion> :)
21:09:11 <gwicke> yes, we have domains for projects, but some of the projects are actually variants
21:09:14 <TimStarling> I agree with cscott
21:09:15 <gwicke> like zh-yue
21:09:22 <brion> just wondering how one would add a second layer of subdomains without ops killing us
21:09:23 <cscott> gwicke: that is not correct.
21:09:35 <gwicke> zh-yue is a variant of zh
21:09:43 <SMalyshev> but some projects have multiple languages?
21:09:43 <gwicke> in the language sense
21:09:44 <TimStarling> yeah yeah
21:09:49 <TimStarling> and italian is a variant of latin
21:09:51 <cscott> cantonese is a distinct language. the cantonese wiki is a distinct project.
21:10:17 <cscott> importantly, the decisions about which languages (or variants) get their own projects is not a technical matter, it's an issue decided by the community.
21:10:26 <TimStarling> but it's not really about linguistics, like cscott says, we have a very clear user-facing concept of a wiki and there's no sense conflating it with automatic translation
21:10:50 <TimStarling> trying to do so would potentially cause conflicts
21:10:55 <brion> secondarily we have commons, meta, mediawiki.org, etc which may carry pages of multiple different languages, have templates that render differently in different languages etc
21:11:05 <TimStarling> it would be quite possible to have a zh-yue variant of zh, and simultaneously a zh-yue wiki
21:11:20 <brion> i would tend towards a solution that treats language variants and alternate language renderings of the same page on the same project similarly
21:11:31 <cscott> brion: yes, that's more what T114662 is discussing. i'm not convinced it is best to discuss that at the same time as T112942.
21:11:31 <stashbot> T112942: [Regression] PHP version check broken in load.php and api.php - https://phabricator.wikimedia.org/T112942
21:11:31 <stashbot> T114662: RFC: Per-language URLs for multilingual wiki pages - https://phabricator.wikimedia.org/T114662
21:11:35 <gwicke> cscott brought up the prospect of conflicts, but I think it's not clear to me that we actually want to have many different ways of accessing content in a given language variant
21:11:36 <brion> while treating "project id" as totally separate
21:11:47 <brion> cscott: how would they differ?
21:12:23 <SMalyshev> I'd say we should treat zh-yue variant on zh wiki the same way as we treat zh language on commons - one way of many of representing same wiki's data
21:12:31 <cscott> brion: the commons issue introduces the notion of "interface language" as concept distinct from "source language" or "rendered variant".
21:13:07 <DanielK_WMDE> brion: +1
21:13:08 <cscott> the language-neutral parts of commons or metawiki is presented in an "interface language". the actual content has a "source language" which may (or may not) be rendered into a particular variant.
21:13:19 <brion> hmm
21:13:31 <DanielK_WMDE> brion: conflating project ids with language ids causes no end of pain for wikidata & co.
21:13:34 <cscott> templates are part of the "interface", and so separating "content" from "interface" is not always straightforward.
21:13:35 <DanielK_WMDE> it's a *bad* idea
21:13:45 <cscott> it's a very interesting discussion, i just don't think it's strictly related to the REST API discussion.
21:13:47 <SMalyshev> cscott: it's not only interface, as GUI - the data itself can have different language representations
21:13:58 <brion> are template localizations selected against $wgUserLang or something else?
21:13:59 <cscott> SMalyshev: right. wikidata has that issue as well.
21:14:15 <gwicke> you are discussing the granularity bit
21:14:18 <brion> i think it's related insofar as if they're not the same thing they're almost identical and need to be treated similarly
21:14:24 <DanielK_WMDE> some wikidata api modules allow language filters to be specified
21:14:25 <cscott> brion: i don't think that was quite decided. we discussed that at the last dev summit, w/o reaching consensus.
21:14:47 <DanielK_WMDE> but i don't see a good way to generalize this. the semantics and specifics really depend on the module
21:14:51 <brion> so localized templates today are done based on user language, so far as i know, as there is no other mechanism yet
21:15:11 <DanielK_WMDE> in some cases, you have a target language, in others, you specify a fallback chain. in some cases, the languages act as a filter, in others they trigger transliteration
21:15:21 <cscott> T114640 is also related to the interface language question.
21:15:21 <stashbot> T114640: RFC: make Parser::getTargetLanguage aware of multilingual wikis - https://phabricator.wikimedia.org/T114640
21:15:50 <cscott> and DanielK_WMDE has been the lead on those issues (just establishing context for others new to the discussion)
21:15:50 <DanielK_WMDE> cscott: btw, the api also has an interface message, for error messages.
21:15:56 <gwicke> generally, the REST API is exposing content in a given language, and optimizes for cacheability
21:15:58 <brion> so it sounds like we have two distinct language settings: content language and UI language, each of which may have a variant
21:16:11 <brion> *and* the namespace of variants overlaps with the namespace of languages potentially
21:16:19 <brion> can we verify that last point as true/false?
21:16:28 <cscott> brion: strictly speaking, you also have various fallbacks based on logged-in user preferences as well.
21:16:41 <cscott> brion: consider an en-gb variant on enwiki and simplewiki.
21:16:48 <gwicke> yes, variants can be either auto-translated, or separate projects as with zh-yue
21:17:11 <brion> cscott: right, so en-gb may be either a standalone language or a variant of en
21:17:12 <cscott> we *could* rename things as necessary to ensure projects and variants never overlap, but that's never been necessary before.
21:17:20 <subbu> brion, ah interesting reg. content and ui language and each of them having variants ... i hadn't realized that additional complexity. so, ui language only affects ui messages and content language affects content represented in wikitext?
21:17:23 <TimStarling> is there anyone other than gwicke who supports option #1?
21:17:33 <TimStarling> everyone else seems to have spoken against it
21:18:03 <brion> ui language affects any wikitext content that varies based on {{#userlang}} (is that the right function name?) or whatever equivalent lua magic, i suppose
21:18:24 <cscott> subbu: it should be said that "ui language" is partially a fiction at this point. that is, it exists in our minds but doesn't have a clear expression in mediawiki code... yet.
21:18:24 <TimStarling> I would like to propose that we reject option #1 and move on to discussing the other options
21:18:29 <gwicke> TimStarling, I think you are a bit premature with your question
21:18:52 <cscott> T114640 and T114662 are attempts to codify "ui language" in the codebase (among other things)
21:18:53 <stashbot> T114662: RFC: Per-language URLs for multilingual wiki pages - https://phabricator.wikimedia.org/T114662
21:18:53 <stashbot> T114640: RFC: make Parser::getTargetLanguage aware of multilingual wikis - https://phabricator.wikimedia.org/T114640
21:18:56 <robla> #info question discussed "do the namespace of variants overlap with the namespace of languages?"
21:18:58 <brion> i would reject option 1 (adding domains)
21:19:31 <brion> domains should map directly to high-level projects, which are separate and distinct places you can interact with
21:19:37 <brion> which may, or may not, have anything to do with languages
21:19:42 <cscott> brion: +1
21:19:54 <brion> eg meta and commons are not languages :)
21:20:03 <gwicke> those are different levels of the domain
21:20:03 <robla> #info TimStarling and brion propose rejecting the "adding domains" option
21:20:33 <brion> wikisource.org is one site (multilingual), de.wikisource.org is another (which happens to be centered on one content language)
21:20:40 <robla> I attempted to clarify what the 4 options under consideration are here: https://phabricator.wikimedia.org/T122942#2244988
21:21:34 <cscott> can i suggest renarrowing the discussion to the REST APIs? Or do we think that it's worthwhile to discuss DanielK_WMDE's more general questions about article URL paths? (T114*)
21:21:34 <stashbot> T114: The order of tasks in Phabricator Boards doesn't always save - https://phabricator.wikimedia.org/T114
21:21:37 <SMalyshev> I agree, I think domain should be project. for some projects it defines language, but for others it doesn't, so putting it there will only confuse matters
21:21:49 <cscott> ah, i was trying to shut up stashbot by using the wildcard. didn't work...
21:22:13 <TimStarling> ok, so can we discuss option #2 versus option #3?
21:22:31 <DanielK_WMDE> cscott: we should at least answer the question why the two should be different.
21:22:38 <subbu> in https://phabricator.wikimedia.org/T122942#2144512 .. bianjiang proposes http headers in case that is a candidate worth considering ...
21:22:38 <brion> cscott: article url paths are distinct from rest api urls, but language selection for on-wiki translations seems roughly identical to this problem in scope and rules and should probably be treated together
21:23:00 <DanielK_WMDE> or whether they should follow the same pattern. having two different solutions to the same problem isn't nice. if it is the same problem.
21:23:03 <DanielK_WMDE> that's the question
21:23:32 <cscott> DanielK_WMDE: the API paths in https://en.wikipedia.org/api/rest_v1/?doc don't look (to me) anything like article paths.
21:23:35 <gwicke> fwiw, we are using domains heavily internally in the REST API, and will very likely continue to use them for variants as well
21:23:51 <gwicke> the question is whether this should be hidden from view (only internal), or exposed
21:24:01 <cscott> domains specify the project aka database
21:24:10 <TimStarling> I don't think you should use domains internally for variants
21:24:16 <TimStarling> you could always fix that
21:24:24 <DanielK_WMDE> cscott: no, but if we decide to use subdomains for content variants, we probably should do the same for the api, no?
21:24:34 <gwicke> and whether we really want both https://zh.wikipedia.org/zh-yue/ * and https://zh-yue.wikipedia.org/zh-yue/*
21:24:53 * DanielK_WMDE does not think we should have variants in the subdomains
21:25:05 <gwicke> TimStarling, domains are unique ids for projects
21:25:06 <cscott> DanielK_WMDE: not sure. the article paths have all sorts of human-friendly features, like if you're logged in and hit the generic /wiki/{title} path you get content according to your user preferences.
21:25:36 <cscott> DanielK_WMDE: that should probably be a redirect or something eventually to preserve cacheability. but the point is that article URLs are meant for people to consume. the REST URLs are not.
21:25:39 <DanielK_WMDE> gwicke: to me, they mean different things: the first one denotes a variant transformation, the second one separate content.
21:26:05 <TimStarling> option 2 would be something like /en-gb/page/html/Australia , right?
21:26:08 <cscott> gwicke: i don't think you can decide "whether we really want both https://zh.wikipedia.org/zh-yue/ * and https://zh-yue.wikipedia.org/zh-yue/*". that's a matter for the communities of those wikis to decide.
21:26:19 <DanielK_WMDE> cscott: true, but why not follow the same patterns, and use the same mechanisms?
21:26:22 <brion> gwicke: if they exist, they exist
21:26:28 <gwicke> cscott, it is also a product question for wmf
21:26:32 <DanielK_WMDE> cscott: note that API responses *are* specific to the user language
21:26:42 <TimStarling> and option 3 something like /page/variant/en-gb/html/Australia ?
21:26:43 <cscott> gwicke: it is not a question that can be settled in an RFC meeting.
21:26:57 <gwicke> DanielK_WMDE: no, api responses are based on the wiki's content language
21:27:08 <TimStarling> i.e. with option 3 you avoid mixing language codes with API endpoint identifiers
21:27:20 <TimStarling> so you can have an Apiaka language or whatever
21:27:26 <TimStarling> which seems elegant to me
21:27:34 <DanielK_WMDE> gwicke: they support uselang. and i think it defaults to the user's ui language - but i could be wrong
21:27:45 <DanielK_WMDE> gwicke: i think it's only used for error messages, but still
21:27:49 <gwicke> DanielK_WMDE, the REST API does not support uselang
21:27:50 <cscott> i think your option 3 syntax is fine.
21:28:10 <gwicke> TimStarling, so the proposal is /zh-yue/api/rest_v1/...?
21:28:24 <DanielK_WMDE> gwicke: so, no localized error messages?
21:28:25 <gwicke> or /api/rest_v1/variant/zh-yue/...?
21:28:30 <gwicke> DanielK_WMDE, nope
21:28:38 <DanielK_WMDE> shame ;)
21:28:40 * gwicke shudders
21:28:40 <TimStarling> the second one
21:28:53 <cscott> /api/rest_v1/page/variant/en-gb for option 3. i'm not sure what the full path for option 2 is.
21:28:54 <TimStarling> /variant introduces the language code
21:29:02 <robla> #info question discussed: should we use the "option 3" syntax proposed in the meeting?
21:29:15 <DanielK_WMDE> i don't really like the "variant" bit. in my mind, you specify the desired target language
21:29:31 <DanielK_WMDE> if the desired target language is a variant of the actual content language, we can transform the content
21:29:39 <SMalyshev> this would be solution only for variants, not multilingual content, right?
21:29:41 <TimStarling> but like cscott says, and like my example just now, I'm not saying it should be a prefix for the whole REST API like /api/rest_v1/variant/zh-yue
21:29:42 <DanielK_WMDE> if it's something else, well, then we can't transform
21:29:52 <cscott> instead of "variant" maybe "langconvert"? LanguageConverter is the name of the code which is doing the transformation.
21:29:56 <TimStarling> I think it could be under /page
21:30:08 <brion> ok, so on zh.wikipedia.org i can confirm I can both set my ui language *and* pick a content variant
21:30:08 <DanielK_WMDE> how about just "language" instead of "variant"?
21:30:22 <DanielK_WMDE> if the original content is multilingual or language neutral, all kinds of target languages could be supported
21:30:26 <DanielK_WMDE> think commons or wikidata
21:30:34 <gwicke> ui language is a separate concept, and largely irrelevant for the REST API
21:30:46 <gwicke> generally, the REST API is aimed at client-side UI composition
21:30:52 <cscott> To take DanielK_WMDE's side for a moment -- what if you want to specify user interface language, not just a variant conversion.
21:30:58 <brion> gwicke: rest api can serve HTML of rendered pages correct? if so, it must know UI language to pass it through for rendering of templates
21:31:02 <gwicke> which means that clients can use whatever UI language they like, but consume data in another language
21:31:07 <brion> or else we need an alternate way to handle translatable templates
21:31:12 <TimStarling> ideally a REST API would have HATEOAS-style hyperlinks, right?
21:31:19 <cscott> unfortunately, the REST API *does* have UI stuff, in so far as there are templates on commons/etc which are part of the UX, not part of the "content" per se.
21:31:19 <DanielK_WMDE> SMalyshev: i really want variants and multilang content to work the same. translated content is different.
21:31:37 <TimStarling> so you should be able to get a listing of available variants with /variant/
21:31:46 <DanielK_WMDE> cscott: the target language isn't the "ui" language. it is the desired language for the content.
21:31:48 <SMalyshev> DanielK_WMDE: me too, but the proposed option 3 wouldn't do that, iiuc
21:32:15 <cscott> DanielK_WMDE/SMalyshev: right now I think we're agreed that translated content is handled as an article suffix, right? Foo is the article, Foo/en-gb is the translation.
21:32:25 <brion> TimStarling: if we have per-page content rev, need to make sure we can ask for list of variants per-page right?
21:32:40 <gwicke> cscott, that is already taken, so would break existing apis
21:32:51 <SMalyshev> cscott: it's not translation. I.e. commons description can be in English and German, they are not translations
21:33:06 <TimStarling> brion: yes, I suppose so
21:33:17 <SMalyshev> cscott: if you had API that gets commons image data, including description, wouldn't you want to specify which language description you want?
21:33:26 <cscott> DanielK_WMDE: you can be viewing zhwiki in the simplified variant, yet have your UX language set to english or german. the templates in File:* (in theory) should respect your UX language. as far as I understand it.
21:33:47 <subbu> I am getting confused by this discussion .. I thought the proposal that seemed that might work was "/page/variant/en-gb/html/Australia". Am I mistaken?
21:33:59 <gwicke> cscott, UI language is mostly irrelevant
21:34:03 <TimStarling> but having subpaths of articles is a bit awkward when they can contain slashes in the titles
21:34:25 <gwicke> TimStarling, as you know, slashes are encoded
21:34:26 <TimStarling> you would have to encode them as %2F
21:34:33 <cscott> subbu: i think the discussion on the REST api is pretty solid, Tim's suggestion seems good. but DanielK_WMDE wants to use a consistent mechanism for article URLs as well, and that's a harder problem.
21:34:55 <gwicke> but, as I said, the suffix path is already in use (for revision selection)
21:35:14 <DanielK_WMDE> cscott: yes, translated content uses suffixes. independent content uses subdomains. but the "target language" for multilang content should be as the "variant" for transformable trext
21:35:22 <cscott> i'm sympathetic to DanielK_WMDE's desire, of course. i think it's an interesting question. but it does make the structure of this meeting somewhat challenging. ;)
21:35:33 <cscott> gwicke: Foo%2Fen-gb
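
Note on the slash-encoding exchange above: a minimal Python sketch of how a title containing a slash can be carried as a single path segment. The helper names are hypothetical, and the /api/rest_v1/page/html/{title} layout is taken from the examples in this log; this is an illustration, not the REST API's own code.

```python
from urllib.parse import quote, unquote

# Path layout as quoted in this log; treated here as an assumption.
HTML_PREFIX = "/api/rest_v1/page/html/"


def encode_title(title: str) -> str:
    """Percent-encode a title so it occupies exactly one path segment.

    safe="" tells quote() to escape "/" too, so "Foo/en-gb" becomes
    "Foo%2Fen-gb" instead of being read as two segments.
    """
    return quote(title, safe="")


def html_path(title: str) -> str:
    """Build a (hypothetical) HTML path for a title."""
    return HTML_PREFIX + encode_title(title)


def decode_title(segment: str) -> str:
    """Reverse encode_title() for a single path segment."""
    return unquote(segment)


print(html_path("Foo/en-gb"))
```

With the slash escaped, a suffix segment after the title (a revision number today, possibly a keyword later) stays unambiguous.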
21:35:37 <TimStarling> ok, so if you have a /variant suffix then that can achieve brion's goal of listing variants on a per-page basis
21:36:11 <gwicke> the only thing I can see working so far is something like /api/rest_v1/page/variant/{something}/...
21:36:13 <brion> i'm not a big fan of overloading suffixes, but at least it's 100% distinguishable from revision numbers
21:36:25 <TimStarling> /page/Australia/variant/ could give a list of variants
21:36:27 <cscott> DanielK_WMDE: so in my example for zhwiki, how do you solve that problem? you can't localize text in a variant if your UX language is different?
21:36:43 <DanielK_WMDE> gwicke: i'm good with that if we replace "variant" with "language"
21:36:51 <TimStarling> /page/Australia/variant/en-gb/html could give the HTML in the en-gb variant
21:37:11 <brion> cscott, DanielK_WMDE: i tend to agree it'd be ideal to merge variant and language but i may be wildly incorrect ;)
21:37:14 <gwicke> ../page/html/Australia/12345 is already that revision of Australia
21:37:16 <brion> *UI language
21:37:34 <gwicke> and ../page/html/Australia/12345/ lists renders of that revision
21:37:35 <brion> gwicke: /\d+/ does not match "variant"
21:37:40 <cscott> brion: i'm just not sure how to actually render a zhwiki page if you set "en-gb" as your language.
21:37:40 <DanielK_WMDE> cscott: not if the target language is defined to be the UI language. this is the case for wikidata. for zhwiki, the target language is taken from the url path, so no problem, right?
21:37:49 <brion> it's very easy to distinguish those, though you may not wish to ;)
21:37:50 <TimStarling> gwicke: that's why you always need a keyword in the path, for extensibility
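
Note on the revision-versus-keyword point above: a small Python sketch of the distinction brion describes, where an all-digit suffix is a revision id and a fixed keyword such as "variant" introduces something else. The function name and the returned labels are hypothetical, chosen only for illustration.

```python
import re

# A revision suffix is all digits, e.g. the "12345" in
# ../page/html/Australia/12345
REVISION_RE = re.compile(r"\d+")

# Keywords that could follow a title; only "variant" comes up in this log.
KEYWORDS = {"variant"}


def classify_suffix(segment: str) -> str:
    """Classify the path segment that follows a page title."""
    if REVISION_RE.fullmatch(segment):
        return "revision"
    if segment in KEYWORDS:
        return "keyword"
    return "unknown"


for seg in ("12345", "variant", "en-gb"):
    print(seg, "->", classify_suffix(seg))
```

The two cases never collide because a keyword contains no digits only; whether overloading the suffix this way is desirable is the open question in the discussion.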
21:37:53 <gwicke> brion, that's.. ewww...
21:38:02 <brion> gwicke: yeah :)
21:38:11 <brion> there's a conflict between positional parameters and named parameters here
21:38:13 <DanielK_WMDE> cscott: for commons and wikidata, the target language should probably always be the user's ui language. but for zhwiki and co, perhaps it shouldn't. not sure
21:38:24 <brion> you can always add positional parameters but urls get ugly when there's a million empty ones
21:38:35 <brion> and named parameters in url path part pairs feel weird
21:38:48 <cscott> DanielK_WMDE: unfortunately, if you don't run languageconverter for *some* specific variant, you get text which is a mishmash of character sets which basically no one can read.
21:39:04 <TimStarling> it's not strictly named parameters, it's still hierarchical, there's a defined order
21:39:07 <brion> cscott: sure, in that case run to some reasonable default .... oh shit politics ;)
21:39:16 <cscott> DanielK_WMDE: hence my feeling that it's best to separate the "pick a variant" part from the "ux language" part.
21:39:26 <cscott> brion: yeah.
21:39:42 <DanielK_WMDE> brion: variant and target language are handled in the same place internally: Content::getParserOutput gets Content that is in language X (or multilang) and is asked for output in language Y. if Y is a variant of X, a transformation can be applied.
21:39:42 <brion> wow this is a way more controversial topic than i expected
21:39:55 <cscott> i mean, i could be convinced that we can just pick some behavior arbitrarily and this is a corner case and it won't matter in the end. i just haven't quite been convinced of that yet.
21:40:13 <subbu> brion, as far as i know this has always been a controversial topic.
21:40:17 <DanielK_WMDE> cscott: my point is that it's "pick a target language", not "pick a variant". the target language may or may not be tied to the ui language.
21:40:32 <gwicke> there's only two hard problems in computer science..
21:40:38 <TimStarling> for user language you can have /variant/en-gb/userlang/en-au
21:40:38 <subbu> brion, sorry misinterpreted .. you said: "way more" ..
21:40:44 <brion> DanielK_WMDE: i tend to like that model, but agree that we may not know what importance of corner cases will be
21:40:45 <cscott> DanielK_WMDE: yes, but isn't the point of the T114* bugs to try to separate those languages internally?
21:40:45 <stashbot> T114: The order of tasks in Phabricator Boards doesn't always save - https://phabricator.wikimedia.org/T114
21:40:54 <DanielK_WMDE> cscott: when viewing commons content, you want to specify the output language. that's not a variant. and it might be different from your ui language (though i find that a bit pointless)
21:40:58 <TimStarling> but it's hierarchical, it's not key-value, you can't have /userlang/en-au/variant/en-gb, it's not in the schema
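
Note on the hierarchical scheme above: a Python sketch of a parser that accepts /variant/{code} optionally followed by /userlang/{code}, and rejects the reverse order. The function name is hypothetical, and treating userlang as valid only beneath variant is an assumption drawn from TimStarling's two examples.

```python
def parse_language_prefix(path: str) -> dict:
    """Parse "/variant/{code}" or "/variant/{code}/userlang/{code}".

    The order is fixed: "userlang" may only appear nested under
    "variant", so this is a hierarchy rather than free key-value pairs.
    """
    segments = [s for s in path.split("/") if s]
    if len(segments) not in (2, 4) or segments[0] != "variant":
        raise ValueError(f"not in the schema: {path!r}")
    result = {"variant": segments[1]}
    if len(segments) == 4:
        if segments[2] != "userlang":
            raise ValueError(f"not in the schema: {path!r}")
        result["userlang"] = segments[3]
    return result


print(parse_language_prefix("/variant/en-gb/userlang/en-au"))
```

Passing "/userlang/en-au/variant/en-gb" raises ValueError, which mirrors the "it's not in the schema" remark.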
21:41:03 <brion> (subbu: url structure for apis is usually boring stuff)
21:41:14 * subbu nods
21:41:34 <brion> the actual details of the converter yeah :DD
21:41:52 <gwicke> what is the use case for this userlang stuff?
21:41:53 <DanielK_WMDE> cscott: the point is to internally have a clear notion of the (stored) content language, the desired target language, and the effective output language.
21:42:00 <DanielK_WMDE> ...and the UI language
21:42:15 <cscott> gwicke: labels for commons and wikidata metadata, like field labels, etc.
21:42:20 <gwicke> remember that this is an API exposing data
21:42:22 <gwicke> not UX
21:42:25 <DanielK_WMDE> four languages instead of two-plus-odd-bits
21:42:33 <cscott> https://phabricator.wikimedia.org/T114662 describes some of the use cases
21:43:00 <DanielK_WMDE> gwicke: i'm not sure, i'm talking about a target language. i don't see how the user language plays into this.
21:43:50 <gwicke> in MW terms, what we are interested here is the *content language*
21:43:54 <DanielK_WMDE> cscott: in wikidata, we would tie the target language to the UI language. but the api shouldn't know or care, and it could be different on other projects
21:44:00 <brion> the main reason to specify both would be to say 'i'm viewing in language X but need to look at content for language Y'... but i think in a world where UI is more separate from content things may change a bit in the semantics
21:44:28 <brion> eg is it ok for the template that links to translations to *not* be translated in french when i look at https://www.mediawiki.org/wiki/Manual:Extension_registration?uselang=fr ?
21:44:52 <brion> currently https://www.mediawiki.org/wiki/Manual:Extension_registration english and https://www.mediawiki.org/wiki/Manual:Extension_registration/fr french pages are distinct, but the template at the top localizes to whatever my uselang is
21:44:53 <cscott> gwicke: again, the problem is that some of our "content" contains "interface" elements. it sucks, but that's how it is.
21:45:06 <brion> is the template content? or is it meta-ui?
21:45:16 <gwicke> templates are content as far as I am concerned
21:45:21 <brion> even if we remove crap like labeling the "Table of contents" or "edit links" we still have those
21:45:22 <DanielK_WMDE> brion: yes, that's the question of when and how the target language should be tied to the ui language. it's an interesting one, but not one we need to answer in the context of today's rfc, i think
21:45:46 <brion> DanielK_WMDE: my concern is just that if we add "/variant" on the end do we have to scramble next week to add "/uselang" ?
21:45:55 <TimStarling> DanielK_WMDE: right, it doesn't need to be answered, and really a lot of your comments have been a distraction
21:46:01 <brion> hehe
21:46:04 <cscott> The {{int}} template/parser function is also interesting.
21:46:16 <brion> if we think it's ok to treat those at different times, then i withdraw much of my conversation for now :)
21:46:18 <TimStarling> what we need is to answer gwicke's actual implementation problem in a way that is reasonably forwards-compatible
21:46:39 <TimStarling> and we can discuss all the things we can do with that forwards-compatibility some other day
21:46:39 <cscott> i still like /page/variant/{foo}
21:46:56 <DanielK_WMDE> TimStarling: i'm sorry to hear that. all i want is really to not call it a variant, but a target language, and think in these terms. no further derailment intended
21:47:08 <gwicke> cscott, I think so far that's the only proposal that would not break existing apis
21:47:09 <cscott> sorry, /page/langconvert/{foo}
21:47:22 <cscott> that will be specific to "invoke the language converter as a post processor"
21:47:27 <gwicke> (apart from domains, which everybody seems to dislike)
21:47:32 <brion> does langconvert return html same as /page/html/{foo}?
21:47:33 <cscott> we can figure out some cool way to unify these later, maybe.
21:48:00 <cscott> brion: yeah, sorry. it should be like tim wrote it. /page/langconvert/en-gb/html/...
21:48:07 <gwicke> brion, it would be a mirror of the page hierarchy
21:48:21 <brion> hmm, that sounds ok for that
21:48:23 <cscott> or /page/langconvert/en-gb/page/html/... even.
21:48:24 <gwicke> so /api/rest_v1/page/variant/zh-yue/html/Foo
21:48:36 <brion> but if we add a second option, how do we reconcile the two tree prefixes?
21:49:19 <gwicke> a second option for language selection?
21:49:28 <brion> or is it safe to in future extend semantics of /page/variant/zh-yue/html/Foo to support /page/variant/fr/html/Foo ?
21:49:29 <cscott> brion: best case: I take everything after /langconvert/{code} and pass it back into REST, and do the language conversion on the output.
21:49:31 <gwicke> are you thinking about regions?
21:49:35 <brion> gwicke: for target language that isn't a variant
21:49:52 <cscott> so if /page/coolness/ is ever a thing, then /page/langconvert/en-gb/coolness/... will Just Work.
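
Note on the pass-through idea above: a Python sketch of cscott's "best case", where everything after /page/langconvert/{code} is handed back to the normal page hierarchy and the result is converted afterwards. All names are hypothetical; fetch and convert are stand-ins supplied by the caller, and re-rooting the remainder under /page/ is an assumption about how the inner request would be formed.

```python
from typing import Callable, Tuple

PREFIX = "/page/langconvert/"


def split_langconvert(path: str) -> Tuple[str, str]:
    """Split a langconvert path into (language code, inner page path)."""
    if not path.startswith(PREFIX):
        raise ValueError(f"not a langconvert path: {path!r}")
    code, _, rest = path[len(PREFIX):].partition("/")
    if not code or not rest:
        raise ValueError(f"missing code or inner path: {path!r}")
    # Assumption: the remainder is re-rooted under /page/.
    return code, "/page/" + rest


def handle(path: str,
           fetch: Callable[[str], str],
           convert: Callable[[str, str], str]) -> str:
    """Fetch the inner path, then post-process the body for `code`."""
    code, inner = split_langconvert(path)
    return convert(fetch(inner), code)


print(split_langconvert("/page/langconvert/en-gb/html/Australia"))
```

Because the wrapper never inspects the inner path, a future /page/coolness/ endpoint would pass through unchanged, which is the "Just Work" property being described.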
21:49:55 <gwicke> does it matter whether it's a variant?
21:50:03 * DanielK_WMDE is good with /page/langconvert/{foo}
21:50:08 <subbu> .. /langconvert/<content_lang>:<ui_lang>/ if ever that ui_lang needs to be added? otherwise /langconvert/<content_lang>/ works?
21:50:33 <subbu> .. /page/langconvert/... i mean
21:50:35 <cscott> subbu: ui_lang is actually part of template expansion, not language conversion. sadly.
21:50:36 <DanielK_WMDE> gwicke: what do you mean by "language selection" exactly?
21:50:42 <gwicke> zh.wikipedia.org/api/rest_v1/page/lang/en-gb/html/Foo
21:50:47 <cscott> ie, it influences how the {{int
21:50:53 <cscott> }} template is expanded.
21:51:04 <subbu> oye.
21:51:05 <gwicke> DanielK_WMDE, select the content language
21:51:19 <brion> langconvert feels like a very specialized filter, like mobile-text
21:51:33 <cscott> so it would be /page/langconvert/en-gb/ui_lang/de/html/ArticleTitle, in one version of the future.
21:51:39 <DanielK_WMDE> gwicke: does that select where the content is loaded from? i.e. the project?
21:51:40 <gwicke> one thing I'm concerned about with schemes like this is what it does to the API documentation
21:51:57 <gwicke> it will basically duplicate the bulk of the API docs in a second hierarchy
21:52:13 <TimStarling> I would be happy to approve a range of possible path-based schemes at this point
21:52:22 <cscott> gwicke: some of the api endpoints shouldn't be necessary for /langconvert/
21:52:26 <TimStarling> with the exact scheme at the discretion of the implementor
21:52:34 <cscott> ie, listing revisions. that can be done on the main /page endpoint.
21:53:04 <brion> cscott: revision comments need to be run through the converter don't they?
21:53:07 <gwicke> yeah, which makes it even more subtle
21:53:16 <cscott> i'd like to suggest that we discuss DanielK_WMDE's general language questions in a follow-up meeting, not too long from now.
21:53:33 <cscott> brion: those come from the action api, not from rest.
21:53:42 <robla> it sounds like there's a tradeoff between cachable URL schemes and ease of documentation with tools like Swagger
21:53:46 <brion> ugh
21:53:56 <cscott> brion: and parsoid doesn't really implement "revision comment" parsing, which differs from normal parsing in a bunch of obscure and painful ways.
21:53:59 <TimStarling> I don't want to bikeshed, I just want it to be done
21:54:20 <Scott_WUaS> cscott: sounds good - i'd like to suggest that we discuss DanielK_WMDE's general language questions
21:54:31 <gwicke> TimStarling, the reason we wrote this RFC is that we want to do this consistently with the general strategy of language selection
21:54:34 <gwicke> so let's not rush it
21:54:59 * subbu is happy with path-based schemes
21:55:27 <robla> I'm happy to help someone (cscott?) to come up with a concise list of open questions for this RFC
21:55:34 <cscott> well, i think that variant conversion is currently "next" on my plate, after balanced templates. but it will still be a while before any patch i write is actually ready to be deployed into production.
21:55:34 <gwicke> it wouldn't make sense to have several different path-based ways of selecting language variants, for example
21:56:02 <cscott> robla: i think we've got a reasonable consensus on an interim solution, but concern over the more general questions of DanielK_WMDE is preventing us from finalizing anything.
21:56:07 <cscott> (which i actually agree with)
21:56:29 <cscott> so i think the way to make further progress here is to actually grapple with the more general url scheme question, then return here and see if the solution to that problem bears on this one.
21:57:01 <gwicke> what we are looking for is basically option 2
21:57:08 <DanielK_WMDE> i don't want to derail or stonewall this or related rfcs.
21:57:18 <gwicke> a uniform path-based way of selecting language variants
21:57:21 <brion> are we otherwise happy with the notion of zh.wikipedia.org/api/rest_v1/page/lang/zh-hant/html/Foo with the open question of whether zh-hant can be replaced with en/fr/etc in a way that will be consistent?
21:57:29 <cscott> well, we're not holding anything up until i've actually got a patch in hand. which i don't yet.
21:57:43 <DanielK_WMDE> i just want to make sure we have a good concept of how we handle languages in general
21:57:54 <brion> or do we need to ponder more before committing to that model?
21:57:57 <subbu> brion, i think DanielK_WMDE preferred /langconvert/ over /lang/ i think unless i misunderstood it.
21:58:12 <brion> ist egal zu mir, as the germans say :D
21:58:15 <SMalyshev> I think /page/lang/ would be the most neutral one without overfocusing the semantics
21:58:15 <brion> i'll take langconvert
21:58:23 <gwicke> the issue with /langconvert/ et al is that it's a one-off solution for the REST API
21:58:24 <DanielK_WMDE> subbu, brion: i'm good with /lang/. "convert" is an implementation detail.
21:58:30 <brion> though lang is happy yeah
21:58:41 * brion "take it to #wikimedia-bikeshed!" ;)
21:58:42 <gwicke> rather than something that will work for articles as well
21:58:50 <DanielK_WMDE> subbu: i just don't want /variant/, because i think it's too narrow
21:58:50 <robla> DanielK_WMDE: cscott : is there an action item for DanielK_WMDE to write up a generalized RFC for URL policy?
21:59:06 <subbu> DanielK_WMDE, ok .. thanks for clarifying.
21:59:16 <cscott> gwicke: yeah, but a one off solution might be enough for now. it might turn out that the more general /page/html/lang/foo/balh solution internally dispatches to /page/langconvert/ to do the actual language conversion part.
21:59:26 <brion> TimStarling: what say you? we're coming up on time
21:59:36 <TimStarling> yes, fine
21:59:38 <DanielK_WMDE> robla: i could write an rfc that is just about terms and concepts, not about code at all.
21:59:41 <cscott> so maybe /page/langconvert doesn't actually have to be a part of the public api in the end. but it's a useful narrow solution to the immediate implementation issue.
21:59:44 <gwicke> cscott, that doesn't make sense
21:59:53 <gwicke> the url you propose is already in use
22:00:19 <cscott> i don't like /lang/ specifically because it's more general than i'm happy with right now. i'm not convinced that language converter and the other languages involved can be unified in the end.
22:00:25 <brion> agh did i mean /api/rest_v1/lang/zh-hant/page/html/Foo ?
22:00:32 <cscott> maybe they can be. but at the moment i'd like a narrow solution to a specific problem.
22:00:42 <TimStarling> time's up now
22:01:02 <gwicke> brion, it might make sense to shift it up one or more levels, yes
22:01:23 <brion> [it may be worth considering it a filter like /page/mobile-text/{title} that might go away some day in favor of a more general solution]
22:01:25 <gwicke> also in the running: /zh-yue/api/rest_v1/...
22:01:32 <gwicke> and /zh-yue/wiki/...
22:01:33 <TimStarling> who is going to update the RFC page? gwicke or cscott?
22:01:44 <robla> May 4 meeting: https://phabricator.wikimedia.org/E169 about PSR-6
22:01:49 <cscott> i think it's gwicke's RFC
22:02:04 <gwicke> we should perhaps update DanielK_WMDE's RFC as well
22:02:18 <cscott> brion: "[it may be worth considering it"... yes, that's what i'm suggesting.
22:02:24 <DanielK_WMDE> gwicke: in what way?
22:02:26 <TimStarling> #action gwicke to update T122942 to summarise the options discussed here and remove the rejected option
22:02:27 <stashbot> T122942: RFC: Support language variants in the REST API - https://phabricator.wikimedia.org/T122942
22:02:37 <brion> cscott: +1
22:03:25 <TimStarling> #action DanielK_WMDE to write an RFC discussing the philosophical nature of language
22:03:37 <robla> lol
22:03:44 <DanielK_WMDE> TimStarling: hehe ;)
22:03:49 <Scott_WUaS> :) +1
22:03:53 <brion> haha
22:03:58 <TimStarling> #endmeeting
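
A minimal sketch of the pass-through cscott describes at 21:49:29: take everything after /langconvert/{code}, hand it back to the normal /page/ hierarchy, and run language conversion on the output. This is only an illustration of the idea, not RESTBase or Parsoid code. The names split_langconvert, handle, fetch and convert are hypothetical, and the regular expression for variant codes is an assumption.

```python
import re

# Assumed URL shape from the discussion: /page/langconvert/{code}/{rest}
# The character class for {code} is a guess, not a spec.
_LANGCONVERT = re.compile(
    r"^/page/langconvert/(?P<code>[A-Za-z0-9-]+)/(?P<rest>.+)$"
)


def split_langconvert(path):
    """Return (variant_code, inner_path).

    If the path has no /page/langconvert/{code}/ prefix, the code is
    None and the path is returned unchanged.
    """
    m = _LANGCONVERT.match(path)
    if m is None:
        return None, path
    # Mirror the ordinary page hierarchy: drop "langconvert/{code}".
    return m.group("code"), "/page/" + m.group("rest")


def handle(path, fetch, convert):
    """Fetch the inner resource, then convert it if a variant was given.

    fetch(inner_path) and convert(body, code) are hypothetical callables
    standing in for the REST backend and the language converter.
    """
    code, inner = split_langconvert(path)
    body = fetch(inner)
    return body if code is None else convert(body, code)
```

With this shape, a future /page/coolness/... endpoint would be reachable as /page/langconvert/en-gb/coolness/... without extra routing work, which is the "Just Work" property mentioned at 21:49:52. It does not address gwicke's concern at 21:51:57 that the documentation would be duplicated in a second hierarchy.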