RFC Meeting: Support language variants in the REST API (2016-04-27, #wikimedia-office)

Hosted by daniel on Apr 27 2016, 9:00 PM - 10:00 PM.

Description

  • Location: #wikimedia-office IRC channel
  • Meeting type: Field narrowing
  • Time: Weekly, Wednesday 21:00 UTC (2pm PDT, 23:00 CEST)
  • Agenda:
    • T122942: RFC: Support language variants in the REST API

See the Architecture meetings page for more general information about this meeting (also: Phab query: list of upcoming RFC meetings, Phab query: list of all RFC meetings).

Recurring Event

Event Series
This event is an instance of E66: ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office), and repeats every week.

Event Timeline

RobLa-WMF renamed this event from RFC Meeting: <topic TBD> (<see "Starts" field>, #wikimedia-office) to RFC Meeting: Support language variants in the REST API (2016-04-27, #wikimedia-office). Apr 21 2016, 11:22 PM
RobLa-WMF updated the event description.

3:03 PM <wm-labs-meetbot> Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-04-27-21.00.html
3:03 PM <wm-labs-meetbot> Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-04-27-21.00.txt
3:03 PM <wm-labs-meetbot> Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-04-27-21.00.wiki
3:03 PM <wm-labs-meetbot> Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-04-27-21.00.log.html

Meeting summary

  • RFC: Support language variants in the REST API | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ (TimStarling, 21:00:45)
    • LINK: https://en.wikipedia.org/api/rest_v1/?doc (gwicke, 21:05:35)
    • LINK: https://en.wikipedia.org/api/rest_v1/?doc current REST API documentation (robla, 21:05:54)
    • question discussed "do the namespace of variants overlap with the namespace of languages?" (robla, 21:18:56)
    • TimStarling and brion propose rejecting the "adding domains" option (robla, 21:20:03)
    • question discussed: should we use the "option 3" syntax proposed in the meeting? (robla, 21:29:02)
    • LINK: https://phabricator.wikimedia.org/T114662 describes some of the use cases (cscott, 21:42:33)
    • ACTION: gwicke to update T122942 to summarise the options discussed here and remove the rejected option (TimStarling, 22:02:26)
    • ACTION: DanielK_WMDE to write an RFC discussing the philosophical nature of language (TimStarling, 22:03:25)

Meeting ended at 22:03:58 UTC.
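
As a rough illustration of the URL shapes compared in the meeting, the Python sketch below builds the "option 2" and "option 3" paths from the examples quoted in the full minutes (/en-gb/page/html/Australia and /page/variant/en-gb/html/Australia). The function names, the keyword parameter, and the choice to percent-encode slashes in titles as %2F are assumptions made for illustration only; this is not the REST API's actual implementation.

```python
from urllib.parse import quote


def encode_title(title: str) -> str:
    # Percent-encode every reserved character, including "/", so that a
    # title such as "Foo/en-gb" stays a single path segment (Foo%2Fen-gb).
    return quote(title, safe="")


def option2_path(variant: str, title: str) -> str:
    # Option 2 (hypothetical helper): the language code is a leading prefix,
    # so it shares a position with API endpoint identifiers.
    return f"/{variant}/page/html/{encode_title(title)}"


def option3_path(variant: str, title: str, keyword: str = "variant") -> str:
    # Option 3 (hypothetical helper): a keyword under /page introduces the
    # language code, keeping language codes apart from endpoint names.
    # The meeting also floated "langconvert" and "language" as keywords.
    return f"/page/{keyword}/{variant}/html/{encode_title(title)}"


if __name__ == "__main__":
    print(option2_path("en-gb", "Australia"))
    print(option3_path("en-gb", "Australia"))
    print(option3_path("en-gb", "Australia", keyword="langconvert"))
    print(option3_path("en-gb", "Foo/en-gb"))
```

Because the keyword always precedes the language code in the option 3 shape, a language whose code happened to match an endpoint name would not collide with it, which is the property TimStarling describes as avoiding "mixing language codes with API endpoint identifiers".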

Action items, by person

  • DanielK_WMDE
    • DanielK_WMDE to write an RFC discussing the philosophical nature of language
  • gwicke
    • gwicke to update T122942 to summarise the options discussed here and remove the rejected option

People present (lines said)

  • cscott (81)
  • gwicke (73)
  • brion (69)
  • DanielK_WMDE (51)
  • TimStarling (48)
  • robla (13)
  • subbu (12)
  • stashbot (11)
  • SMalyshev (9)
  • Scott_WUaS (4)
  • wm-labs-meetbot (3)
  • Zppix (1)

Full minutes:

121:00:24 <TimStarling> #startmeeting RFC meeting
221:00:24 <wm-labs-meetbot> Meeting started Wed Apr 27 21:00:24 2016 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot.
321:00:24 <wm-labs-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
421:00:24 <wm-labs-meetbot> The meeting name has been set to 'rfc_meeting'
521:00:35 <cscott> ah, i was wondering if I was in the right place
621:00:45 <TimStarling> #topic RFC: Support language variants in the REST API | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
721:01:21 <gwicke> hi
821:01:32 <Zppix> hello
921:01:49 <robla> o/
1021:01:49 <Scott_WUaS> Hello
1121:02:05 <gwicke> so, we have been thinking about the best way of exposing different languages and their variants in the REST API
1221:02:07 <Scott_WUaS> Congratulations, Rob re Architecture Committee!!!
1321:03:10 <gwicke> while the focus is on the REST API (which brings some requirements and constraints like caching), it is closely related to the bigger question of how we represent different language selections in URLs / requests
1421:03:35 <robla> phab meeting: https://phabricator.wikimedia.org/E168 rfc task: T122942
1521:03:35 <stashbot> T122942: RFC: Support language variants in the REST API - https://phabricator.wikimedia.org/T122942
1621:03:49 <gwicke> DanielK_WMDE__ has started a related non-API discussion at https://phabricator.wikimedia.org/T114662
1721:04:42 <gwicke> in that thread, one of the key questions that emerged was about the desired granularity of language selection
1821:04:52 <cscott> i think we can come up with a reasonable consensus for the REST API. I don't know about settling any broader questions.
1921:05:09 <gwicke> purodha summarizes this question well in https://phabricator.wikimedia.org/T114662#2005122
2021:05:14 <TimStarling> can we have a link to the current REST API documentation?
2121:05:33 <brion> to clarify, this rfc settles the URL interface for specifying when you're pulling a particular variant, which then opens the further separate question of how to implement the conversion in parsoid etc. correct?
2221:05:35 <gwicke> https://en.wikipedia.org/api/rest_v1/?doc
2321:05:54 <robla> #link https://en.wikipedia.org/api/rest_v1/?doc current REST API documentation
2421:05:56 <gwicke> this RFC is about selecting content languages in the REST API
2521:06:50 <cscott> T114662 is about URLs for mediawiki articles; I'm not sure that's strictly related. that is, we can pick some solution for the REST API w/o changing how we do article URLs, or vice-versa.
2621:06:50 <stashbot> T114662: RFC: Per-language URLs for multilingual wiki pages - https://phabricator.wikimedia.org/T114662
2721:07:04 <robla> I made an attempt to enumerate the options under consideration in T122942 . cscott tried to narrow it down
2821:07:04 <stashbot> T122942: RFC: Support language variants in the REST API - https://phabricator.wikimedia.org/T122942
2921:07:12 <gwicke> another aspect related to the granularity is whether we should expose whether something is a variant / auto-translated vs. a separate project
3021:07:57 <cscott> is that really a question?
3121:08:01 <gwicke> I personally think that we should have a consistent plan for selecting languages, and an idea for the granularity we are shooting for
3221:08:35 <cscott> i think projects and variants are distinct. I don't see any reason to conflate them, or to obscure which project is responsible for a given bit of content.
3321:08:42 <brion> to clarify further: we already have domains for projects, correct?
3421:09:04 <cscott> brion: arguably too many, if I understand ops correctly.
3521:09:08 <brion> :)
3621:09:11 <gwicke> yes, we have domains for projects, but some of the projects are actually variants
3721:09:14 <TimStarling> I agree with cscott
3821:09:15 <gwicke> like zh-yue
3921:09:22 <brion> just wondering how one would add a second layer of subdomains without ops killing us
4021:09:23 <cscott> gwicke: that is not correct.
4121:09:35 <gwicke> zh-yue is a variant of zh
4221:09:43 <SMalyshev> but some projects have multiple languages?
4321:09:43 <gwicke> in the language sense
4421:09:44 <TimStarling> yeah yeah
4521:09:49 <TimStarling> and italian is a variant of latin
4621:09:51 <cscott> cantonese is a distinct language. the cantonese wiki is a distinct project.
4721:10:17 <cscott> importantly, the decisions about which languages (or variants) get their own projects is not a technical matter, it's an issue decided by the community.
4821:10:26 <TimStarling> but it's not really about linguistics, like cscott says, we have a very clear user-facing concept of a wiki and there's no sense conflating it with automatic translation
4921:10:50 <TimStarling> trying to do so would potentially cause conflicts
5021:10:55 <brion> secondarily we have commons, meta, mediawiki.org, etc which may carry pages of multiple different languages, have templates that render differently in different lanugages etc
5121:11:05 <TimStarling> it would be quite possible to have a zh-yue variant of zh, and simultaneously a zh-yue wiki
5221:11:20 <brion> i would tend towards a solution that treats language variants and alternate language renderings of the same page on the same project similarly
5321:11:31 <cscott> brion: yes, that's more what T114662 is discussing. i'm not convinced it is best to discuss that at the same time as T112942.
5421:11:31 <stashbot> T112942: [Regression] PHP version check broken in load.php and api.php - https://phabricator.wikimedia.org/T112942
5521:11:31 <stashbot> T114662: RFC: Per-language URLs for multilingual wiki pages - https://phabricator.wikimedia.org/T114662
5621:11:35 <gwicke> cscott brought up the prospect of conflicts, but I think it's not clear to me that we actually want to have many different ways of accessing content in a given language variant
5721:11:36 <brion> while treating "project id" as totally separate
5821:11:47 <brion> cscott: how would they differ?
5921:12:23 <SMalyshev> I'd say we should treat zh-yue variant on zh wiki the same way as we treat zh language on commons - one way of many of representing same wiki's data
6021:12:31 <cscott> brion: the commons issue introduces the notion of "interface language" as concept distinct from "source language" or "rendered variant".
6121:13:07 <DanielK_WMDE> brion: +1
6221:13:08 <cscott> the language-neutral parts of commons or metawiki is presented in an "interface language". the actual content has a "source language" which may (or may not) be rendered into a particular variant.
6321:13:19 <brion> hmm
6421:13:31 <DanielK_WMDE> brion: confating project ids with language ids causes no end of pain for wikidata & co.
6521:13:34 <cscott> templates are part of the "interface", and so separating "content" from "interface" is not always straightforward.
6621:13:35 <DanielK_WMDE> it's a *bad* idea
6721:13:45 <cscott> it's a very interesting discussion, i just don't think it's strictly related to the REST API discussion.
6821:13:47 <SMalyshev> cscott: it's not only interface, as GUI - the data itself can have different language representations
6921:13:58 <brion> are template localizations selected against $wgUserLang or something else?
7021:13:59 <cscott> SMalyshev: right. wikidata has that issue as well.
7121:14:15 <gwicke> you are discussing the granularity bit
7221:14:18 <brion> i think it's related insofar as if they're not the same thing they're almost identical and need to be treated similarly
7321:14:24 <DanielK_WMDE> some wikidata api meodules allow language filters to be specified
7421:14:25 <cscott> brion: i don't think that was quite decided. we discussed that at the last dev summit, w/o reaching consensus.
7521:14:47 <DanielK_WMDE> but i don't see a good way to generalize this. the semantics and specifics really depend on the module
7621:14:51 <brion> so localized templates today are done based on user language, so far as i know, as there is no other mechanism yet
7721:15:11 <DanielK_WMDE> in some cases, you have a target language, in others, you specify a fallback chain. in some cases, the languages act as a filter, in others they trigger translitteration
7821:15:21 <cscott> T114640 is also related to the interface language question.
7921:15:21 <stashbot> T114640: RFC: make Parser::getTargetLanguage aware of multilingual wikis - https://phabricator.wikimedia.org/T114640
8021:15:50 <cscott> and DanielK_WMDE has been the lead on those issues (just establishing context for others new to the discussion)
8121:15:50 <DanielK_WMDE> cscott: btw, the api also has an interface message, for error messages.
8221:15:56 <gwicke> generally, the REST API is exposing content in a given language, and optimizes for cacheability
8321:15:58 <brion> so it sounds like we have two distinct language settings: content language and UI language, each of which may have a variant
8421:16:11 <brion> *and* the namespace of variants overlaps with the namespace of languages potentially
8521:16:19 <brion> can we verify that last point as true/false?
8621:16:28 <cscott> brion: strictly speaking, you also have various fallbacks based on logged-in user preferences as well.
8721:16:41 <cscott> brion: consider an en-gb variant on enwiki and simplewiki.
8821:16:48 <gwicke> yes, variants can be either auto-translated, or separate projects as with zh-yue
8921:17:11 <brion> cscott: right, so en-gb may be either a standalone language or a variant of en
9021:17:12 <cscott> we *could* rename things as necessary to ensure projects and variants never overlap, but that's never been necessary before.
9121:17:20 <subbu> brion, ah interesting reg. content and ui language and each of them having variants ... i hadn't realized that additional complexity. so, ui language only affects ui messages and content language affects content represented in wikitext?
9221:17:23 <TimStarling> is there anyone other than gwicke who supports option #1?
9321:17:33 <TimStarling> everyone else seems to have spoken against it
9421:18:03 <brion> ui language affects any wikitext content that varies based on {{#userlang}} (is that the right function name?) or whatever equivalent lua magic, i suppose
9521:18:24 <cscott> subbu: it should be said that "ui language" is partially a fiction at this point. that is, it exists in our minds but doesn't have a clear expression in mediawiki code... yet.
9621:18:24 <TimStarling> I would like to propose that we reject option #1 and move on to discussing the other options
9721:18:29 <gwicke> TimStarling, I think you are a bit premature with your question
9821:18:52 <cscott> T114640 and T114662 are attempts to codify "ui language" in the codebase (among other things)
9921:18:53 <stashbot> T114662: RFC: Per-language URLs for multilingual wiki pages - https://phabricator.wikimedia.org/T114662
10021:18:53 <stashbot> T114640: RFC: make Parser::getTargetLanguage aware of multilingual wikis - https://phabricator.wikimedia.org/T114640
10121:18:56 <robla> #info question discussed "do the namespace of variants overlap with the namespace of languages?"
10221:18:58 <brion> i would reject option 1 (adding domains)
10321:19:31 <brion> domains should map directly to high-level projects, which are separate and distinct places you can interact with
10421:19:37 <brion> which may, or may not, have anything to do with languages
10521:19:42 <cscott> brion: +1
10621:19:54 <brion> eg meta and commons are not languages :)
10721:20:03 <gwicke> those are different levels of the domain
10821:20:03 <robla> #info TimStarling and brion propose rejecting the "adding domains" option
10921:20:33 <brion> wikisource.org is one site (multilingual), de.wikisource.org is another (which happens to be centered on one content language)
11021:20:40 <robla> I attempted to clarify what the 4 options under consideration are here: https://phabricator.wikimedia.org/T122942#2244988
11121:21:34 <cscott> can i suggest renarrowing the discussion to the REST APIs? Or do we think that it's worthwhile to discuss DanielK_WMDE's more general questions about article URL paths? (T114*)
11221:21:34 <stashbot> T114: The order of tasks in Phabricator Boards doesn't always save - https://phabricator.wikimedia.org/T114
11321:21:37 <SMalyshev> I agree, I think domain should be project,. for some projects it defines language, but for others it doens't so putting it there will only confuse matters
11421:21:49 <cscott> ah, i was trying to shut up stashbot by using the wildcard. didn't work...
11521:22:13 <TimStarling> ok, so can we discuss option #2 versus option #3?
11621:22:31 <DanielK_WMDE> cscott: we should at least answer the question why the two should be different.
11721:22:38 <subbu> in https://phabricator.wikimedia.org/T122942#2144512 .. bianjiang proposes http headers in case that is a candidate worth considering ...
11821:22:38 <brion> cscott: article url paths are distinct from rest api urls, but language selection for on-wiki translations seems roughly identical to this problem in scope and rules and should probably be treated together
11921:23:00 <DanielK_WMDE> or whether they should follow the same pattern. having two different solutions to the same probloem isn't nice. if it is the same problem.
12021:23:03 <DanielK_WMDE> that's the question
12121:23:32 <cscott> DanielK_WMDE: the API paths in https://en.wikipedia.org/api/rest_v1/?doc don't look (to me) anything like article paths.
12221:23:35 <gwicke> fwiw, we are using domains heavily internally in the REST API, and will very likely continue to use them for variants as well
12321:23:51 <gwicke> the question is whether this should be hidden from view (only internal), or exposed
12421:24:01 <cscott> domains specify the project aka database
12521:24:10 <TimStarling> I don't think you should use domains internally for variants
12621:24:16 <TimStarling> you could always fix that
12721:24:24 <DanielK_WMDE> cscott: no, but if we decide to use subdomains for content variants, we probably should do the same for the api, no?
12821:24:34 <gwicke> and whether we really want both https://zh.wikipedia.org/zh-yue/ * and https://zh-yue.wikipedia.org/zh-yue/*
12921:24:53 * DanielK_WMDE does not think we should have variants in the subdomains
13021:25:05 <gwicke> TimStarling, domains are unique ids for projects
13121:25:06 <cscott> DanielK_WMDE: not sure. the article paths have all sorts of human-friendly features, like if you're logged in and hit the generic /wiki/{title} path you get content according to your user preferences.
13221:25:36 <cscott> DanielK_WMDE: that should probably be a redirect or something eventually to preserve cacheability. but the point is that article URLs are meant for people to consume. the REST URLs are not.
13321:25:39 <DanielK_WMDE> gwicke: to me, they mean different things: the first one is denotes a variant transformation, the second one separate content.
13421:26:05 <TimStarling> option 2 would be something like /en-gb/page/html/Australia , right?
13521:26:08 <cscott> gwicke: i don't think you can decide "whether we really want both https://zh.wikipedia.org/zh-yue/ * and https://zh-yue.wikipedia.org/zh-yue/*". that's a matter for the communities of those wikis to decide.
13621:26:19 <DanielK_WMDE> cscott: true, but why not follow the same patterns, and use the same mechanims?
13721:26:22 <brion> gwicke: if they exist, they exist
13821:26:28 <gwicke> cscott, it is also a product question for wmf
13921:26:32 <DanielK_WMDE> cscott: note that API responses *are* specific to the user language
14021:26:42 <TimStarling> and option 3 something like /page/variant/en-gb/html/Australia ?
14121:26:43 <cscott> gwicke: it is not a question that can be settled in an RFC meeting.
14221:26:57 <gwicke> DanielK_WMDE: no, api responses are based on the wiki's content language
14321:27:08 <TimStarling> i.e. with option 3 you avoid mixing language codes with API endpoint identifiers
14421:27:20 <TimStarling> so you can have an Apiaka language or whatever
14521:27:26 <TimStarling> which seems elegant to me
14621:27:34 <DanielK_WMDE> gwicke: they supprt uselang. and i think it defaults to the user's ui language - but i could be wrong
14721:27:45 <DanielK_WMDE> gwicke: i think it's only used for error messages, but still
14821:27:49 <gwicke> DanielK_WMDE, the REST API does not support uselang
14921:27:50 <cscott> i think your option 3 syntax is fine.
15021:28:10 <gwicke> TimStarling, so the proposal is /zh-yue/api/rest_v1/...?
15121:28:24 <DanielK_WMDE> gwicke: so, no localized error messages?
15221:28:25 <gwicke> or /api/rest_v1/variant/zh-yue/...?
15321:28:30 <gwicke> DanielK_WMDE, nope
15421:28:38 <DanielK_WMDE> shame ;)
15521:28:40 * gwicke shudders
15621:28:40 <TimStarling> the second one
15721:28:53 <cscott> /api/rest_v1/page/variant/en-gb for option 3. i'm not sure what the full path for option 2 is.
15821:28:54 <TimStarling> /variant introduces the language code
15921:29:02 <robla> #info question discussed: should we use the "option 3" syntax proposed in the meeting?
16021:29:15 <DanielK_WMDE> i don't really like the "variant" but. in my mind, you specify the desired target language
16121:29:31 <DanielK_WMDE> if the desired target language is a variant of the actual content language, we can transform the content
16221:29:39 <SMalyshev> this would be solution only for variants, not multilingual content, right?
16321:29:41 <TimStarling> but like cscott says, and like my example just now, I'm not saying it should be a prefix for the whole REST API like /api/rest_v1/variant/zh-yue
16421:29:42 <DanielK_WMDE> if it's something else, well, then we can't transform
16521:29:52 <cscott> instead of "variant" maybe "langconvert"? LanguageConverter is the name of the code which is doing the transformation.
16621:29:56 <TimStarling> I think it could be under /page
16721:30:08 <brion> ok, so on zh.wikipedia.org i can confirm I can both set my ui language *and* pick a content variant
16821:30:08 <DanielK_WMDE> how about just "language" instead of "variant"?
16921:30:22 <DanielK_WMDE> if the original content is multilingual or language neutral, all kinds of target languages could be supported
17021:30:26 <DanielK_WMDE> think commons or wikidata
17121:30:34 <gwicke> ui language is a separate concept, and largely irrelevant for the REST API
17221:30:46 <gwicke> generally, the REST API is aimed at client-side UI composition
17321:30:52 <cscott> To take DanielK_WMDE's side for a moment -- what if you want to specify user interface language, not just a variant conversion.
17421:30:58 <brion> gwicke: rest api can serve HTML of rendered pages correct? if so, it must known UI language to pass it thorugh for rendering of templates
17521:31:02 <gwicke> which means that clients can use whatever UI language they like, but consume data in another language
17621:31:07 <brion> or else we need an alternate way to handle translatable templates
17721:31:12 <TimStarling> ideally a REST API would have HATEOAS-style hyperlinks, right?
17821:31:19 <cscott> unfortunately, the REST API *does* have UI stuff, in so far as there are templates on commons/etc which are part of the UX, not part of the "content" per se.
17921:31:19 <DanielK_WMDE> SMalyshev: i really want variants and multilang content to work the same. translated content is different.
18021:31:37 <TimStarling> so you should be able to get a listing of available variants with /variant/
18121:31:46 <DanielK_WMDE> cscott: the target language isn't hte "ui" language. is the desired language for the content.
18221:31:48 <SMalyshev> DanielK_WMDE: me too, but the proposed option 3 wouldn't do that, iiuc
18321:32:15 <cscott> DanielK_WMDE/SMalyshev: right now I think we're agreed that translated content is handled as an article suffix, right? Foo is the article, Foo/en-gb is the translation.
18421:32:25 <brion> TimStarling: if we have per-page content rev, need to make sure we can ask for list of variants per-page right?
18521:32:40 <gwicke> cscott, that is already taken, so would break existing apis
18621:32:51 <SMalyshev> cscott: it's not translation. I.e. commons description can be in English and German, they are not translations
18721:33:06 <TimStarling> brion: yes, I suppose so
18821:33:17 <SMalyshev> cscott: if you had API that gets commons image data, including description, wouldn't you want to specify which language description you want?
18921:33:26 <cscott> DanielK_WMDE: you can be viewing zhwiki in the simplified variant, yet have your UX language set to english or german. the templates in File:* (in theory) should respect your UX language. as far as I understand it.
19021:33:47 <subbu> I am getting confused by this discussion .. I thought the proposal that seemed that might work was "/page/variant/en-gb/html/Australia". Am I mistaken?
19121:33:59 <gwicke> cscott, UI language is mostly irrelevant
19221:34:03 <TimStarling> but having subpaths of articles is a bit awkward when they can contain slashes in the titles
19321:34:25 <gwicke> TimStarling, as you know, slashes are encoded
19421:34:26 <TimStarling> you would have to encode them as %2F
19521:34:33 <cscott> subbu: i think the discussion on the REST api is pretty solid, Tim's suggestion seems good. but DanielK_WMDE wants to use a consistent mechanism for article URLs as well, and that's a harder problem.
19621:34:55 <gwicke> but, as I said, the suffix path is already in use (for revision selection)
19721:35:14 <DanielK_WMDE> cscott: yes, translated content uses suffixes. independent content uses subdomains. but the "target language" for multilang content should be as the "variant" for transformable trext
19821:35:22 <cscott> i'm sympathetic to DanielK_WMDE's desire, of course. i think it's an interesting question. but it does make the structure of this meeting somewhat challenging. ;)
19921:35:33 <cscott> gwicke: Foo%2Fen-gb
20021:35:37 <TimStarling> ok, so if you have a /variant suffix then that can achieve brion's goal of listing variants on a per-page basis
20121:36:11 <gwicke> the only thing I can see working so far is something like /api/rest_v1/page/variant/{something}/...
20221:36:13 <brion> i'm not a big fan of overloading suffixes, but at least it's 100% distinguishable from revision numbers
20321:36:25 <TimStarling> /page/Australia/variant/ could give a list of variants
20421:36:27 <cscott> DanielK_WMDE: so in my example for zhwiki, how do you solve that problem? you can't localize text in a variant if your UX language is different?
20521:36:43 <DanielK_WMDE> gwicke: i'm good with that if we replace "variant" with "language"
20621:36:51 <TimStarling> /page/Australia/variant/en-gb/html could give the HTML in the en-gb variant
20721:37:11 <brion> cscott, DanielK_WMDE: i tend to agree it'd be ideal to merge variant and language but i may be wildly incorrect ;)
20821:37:14 <gwicke> ../page/html/Australia/12345 is already that revision of Australia
20921:37:16 <brion> *UI language
21021:37:34 <gwicke> and ../page/html/Australia/12345/ lists renders of that revision
21121:37:35 <brion> gwicke: /\d+/ does not match "variant"
21221:37:40 <cscott> brion: i'm just not sure how to actually render a zhwiki page if you set "en-gb" as your language.
21321:37:40 <DanielK_WMDE> cscott: not if the target language is defined to be the UI language. this is the case for wikidata. for zhwiki, the target language is taken from the url path, so no problem, right?
21421:37:49 <brion> it's very easy to distinguish those, though you may not wish to ;)
21521:37:50 <TimStarling> gwicke: that's why you always need a keyword in the path, for extensibility
21621:37:53 <gwicke> brion, that's.. ewww...
21721:38:02 <brion> gwicke: yeah :)
21821:38:11 <brion> there's a conflict between positional parameters and named parameters here
21921:38:13 <DanielK_WMDE> cscott: for commons and wikidata, the target language should probably always be the user's ui language. but for zhwiki and co, perhaps it shouldn't. not sure
22021:38:24 <brion> you can always add positional parameters but urls get ugly when there's a million empty ones
22121:38:35 <brion> and named parameters in url path part pairs feel weird
22221:38:48 <cscott> DanielK_WMDE: unfortunately, if you don't run languageconverter for *some* specific variant, you get text which is a mishmash of character sets which basically no one can read.
22321:39:04 <TimStarling> it's not strictly named parameters, it's still hierarchical, there's a defined order
22421:39:07 <brion> cscott: sure, in that case run to some reasonable default .... oh shit politics ;)
22521:39:16 <cscott> DanielK_WMDE: hence my feeling that it's best to separate the "pick a variant" part from the "ux language" part.
22621:39:26 <cscott> brion: yeah.
22721:39:42 <DanielK_WMDE> brion: variant and target language are handled in the same place internally: Content::getParserOutput gets Content that is in language X (or multilang) and is asked for output in language Y. if Y is a variant of X, a transformation can be applied.
22821:39:42 <brion> wow this is a way more controversial topic than i expected
22921:39:55 <cscott> i mean, i could be convinced that we can just pick some behavior arbitrarily and this is a corner case and it won't matter in the end. i just haven't quite been convinced of that yet.
23021:40:13 <subbu> brion, as far as i know this has always been a controversial topic.
23121:40:17 <DanielK_WMDE> cscott: my point is that it's "pick a target language", not "pick a variant". the target language may or may not be tied to the ui language.
23221:40:32 <gwicke> there's only two hard problems in computer science..
23321:40:38 <TimStarling> for user language you can have /variant/en-gb/userlang/en-au
23421:40:38 <subbu> brion, sorry misinterpreted .. you said: "way more" ..
23521:40:44 <brion> DanielK_WMDE: i tend to like that model, but agree that we may not know what importance of corner cases will be
23621:40:45 <cscott> DanielK_WMDE: yes, but isn't the point of the T114* bugs to try to separate those languages internally?
23721:40:45 <stashbot> T114: The order of tasks in Phabricator Boards doesn't always save - https://phabricator.wikimedia.org/T114
23821:40:54 <DanielK_WMDE> cscott: when viewing commons content, you want to specify the output language. that's not a variant. and it might be different from your ui language (though i find that a bit pointless)
23921:40:58 <TimStarling> but it's hierarchical, it's not key-value, you can't have /userlang/en-au/variant/en-gb, it's not in the schema
24021:41:03 <brion> (subbu: url structure for apis is usually boring stuff)
24121:41:14 * subbu nods
24221:41:34 <brion> the actual details of the converter yeah :DD
24321:41:52 <gwicke> what is the use case for this userlang stuff?
24421:41:53 <DanielK_WMDE> cscott: the point is to internally have a clear notion of the (stored) content language, the desired target language, and the effective output language.
24521:42:00 <DanielK_WMDE> ...and the UI language
24621:42:15 <cscott> gwicke: labels for commons and wikidata metadata, like field labels, etc.
24721:42:20 <gwicke> remember that this is an API exposing data
24821:42:22 <gwicke> not UX
24921:42:25 <DanielK_WMDE> four languages instead of two-plus-odd-bits
25021:42:33 <cscott> https://phabricator.wikimedia.org/T114662 describes some of the use cases
21:43:00 <DanielK_WMDE> gwicke: i'm not sure, i'm talking about a target language. i don't see how the user language plays into this.
21:43:50 <gwicke> in MW terms, what we are interested in here is the *content language*
21:43:54 <DanielK_WMDE> cscott: in wikidata, we would tie the target language to the UI language. but the api shouldn't know or care, and it could be different on other projects
21:44:00 <brion> the main reason to specify both would be to say 'i'm viewing in language X but need to look at content for language Y'... but i think in a world where UI is more separate from content things may change a bit in the semantics
21:44:28 <brion> eg is it ok for the template that links to translations to *not* be translated in french when i look at https://www.mediawiki.org/wiki/Manual:Extension_registration?uselang=fr ?
21:44:52 <brion> currently https://www.mediawiki.org/wiki/Manual:Extension_registration english and https://www.mediawiki.org/wiki/Manual:Extension_registration/fr french pages are distinct, but the template at the top localizes to whatever my uselang is
21:44:53 <cscott> gwicke: again, the problem is that some of our "content" contains "interface" elements. it sucks, but that's how it is.
21:45:06 <brion> is the template content? or is it meta-ui?
21:45:16 <gwicke> templates are content as far as I am concerned
21:45:21 <brion> even if we remove crap like labeling the "Table of contents" or "edit links" we still have those
21:45:22 <DanielK_WMDE> brion: yes, that's the question of when and how the target language should be tied to the ui language. it's an interesting one, but not one we need to answer in the context of today's rfc, i think
21:45:46 <brion> DanielK_WMDE: my concern is just that if we add "/variant" on the end do we have to scramble next week to add "/uselang" ?
21:45:55 <TimStarling> DanielK_WMDE: right, it doesn't need to be answered, and really a lot of your comments have been a distraction
21:46:01 <brion> hehe
21:46:04 <cscott> The {{int}} template/parser function is also interesting.
21:46:16 <brion> if we think it's ok to treat those at different times, then i withdraw much of my conversation for now :)
21:46:18 <TimStarling> what we need is to answer gwicke's actual implementation problem in a way that is reasonably forwards-compatible
21:46:39 <TimStarling> and we can discuss all the things we can do with that forwards-compatibility some other day
21:46:39 <cscott> i still like /page/variant/{foo}
21:46:56 <DanielK_WMDE> TimStarling: i'm sorry to hear that. all i want is really to not call it a variant, but a target language, and think in these terms. no further derailment intended
21:47:08 <gwicke> cscott, I think so far that's the only proposal that would not break existing apis
21:47:09 <cscott> sorry, /page/langconvert/{foo}
21:47:22 <cscott> that will be specific to "invoke the language converter as a post processor"
21:47:27 <gwicke> (apart from domains, which everybody seems to dislike)
21:47:32 <brion> does langconvert return html same as /page/html/{foo}?
21:47:33 <cscott> we can figure out some cool way to unify these later, maybe.
21:48:00 <cscott> brion: yeah, sorry. it should be like tim wrote it. /page/langconvert/en-gb/html/...
21:48:07 <gwicke> brion, it would be a mirror of the page hierarchy
21:48:21 <brion> hmm, that sounds ok for that
21:48:23 <cscott> or /page/langconvert/en-gb/page/html/... even.
21:48:24 <gwicke> so /api/rest_v1/page/variant/zh-yue/html/Foo
21:48:36 <brion> but if we add a second option, how do we reconcile the two tree prefixes?
21:49:19 <gwicke> a second option for language selection?
21:49:28 <brion> or is it safe to in future extend semantics of /page/variant/zh-yue/html/Foo to support /page/variant/fr/html/Foo ?
21:49:29 <cscott> brion: best case: I take everything after /langconvert/{code} and pass it back into REST, and do the language conversion on the output.
21:49:31 <gwicke> are you thinking about regions?
21:49:35 <brion> gwicke: for target language that isn't a variant
21:49:52 <cscott> so if /page/coolness/ is ever a thing, then /page/langconvert/en-gb/coolness/... will Just Work.
21:49:55 <gwicke> does it matter whether it's a variant?
21:50:03 * DanielK_WMDE is good with /page/langconvert/{foo}
21:50:08 <subbu> .. /langconvert/<content_lang>:<ui_lang>/ if ever that ui_lang needs to be added? otherwise /langconvert/<content_lang>/ works?
21:50:33 <subbu> .. /page/langconvert/... i mean
21:50:35 <cscott> subbu: ui_lang is actually part of template expansion, not language conversion. sadly.
21:50:36 <DanielK_WMDE> gwicke: what do you mean by "language selection" exactly?
21:50:42 <gwicke> zh.wikipedia.org/api/rest_v1/page/lang/en-gb/html/Foo
21:50:47 <cscott> ie, it influences how the {{int
21:50:53 <cscott> }} template is expanded.
21:51:04 <subbu> oye.
21:51:05 <gwicke> DanielK_WMDE, select the content language
21:51:19 <brion> langconvert feels like a very specialized filter, like mobile-text
21:51:33 <cscott> so it would be /page/langconvert/en-gb/ui_lang/de/html/ArticleTitle, in one version of the future.
21:51:39 <DanielK_WMDE> gwicke: does that select where the content is loaded from? i.e. the project?
21:51:40 <gwicke> one thing I'm concerned about with schemes like this is what it does to the API documentation
21:51:57 <gwicke> it will basically duplicate the bulk of the API docs in a second hierarchy
21:52:13 <TimStarling> I would be happy to approve a range of possible path-based schemes at this point
21:52:22 <cscott> gwicke: some of the api endpoints shouldn't be necessary for /langconvert/
21:52:26 <TimStarling> with the exact scheme at the discretion of the implementor
21:52:34 <cscott> ie, listing revisions. that can be done on the main /page endpoint.
21:53:04 <brion> cscott: revision comments need to be run through the converter don't they?
21:53:07 <gwicke> yeah, which makes it even more subtle
21:53:16 <cscott> i'd like to suggest that we discuss DanielK_WMDE's general language questions in a follow-up meeting, not too long from now.
21:53:33 <cscott> brion: those come from the action api, not from rest.
21:53:42 <robla> it sounds like there's a tradeoff between cachable URL schemes and ease of documentation with tools like Swagger
21:53:46 <brion> ugh
21:53:56 <cscott> brion: and parsoid doesn't really implement "revision comment" parsing, which differs from normal parsing in a bunch of obscure and painful ways.
21:53:59 <TimStarling> I don't want to bikeshed, I just want it to be done
21:54:20 <Scott_WUaS> cscott: sounds good - i'd like to suggest that we discuss DanielK_WMDE's general language questions
21:54:31 <gwicke> TimStarling, the reason we wrote this RFC is that we want to do this consistently with the general strategy of language selection
21:54:34 <gwicke> so let's not rush it
21:54:59 * subbu is happy with path-based schemes
21:55:27 <robla> I'm happy to help someone (cscott?) to come up with a concise list of open questions for this RFC
21:55:34 <cscott> well, i think that variant conversion is currently "next" on my plate, after balanced templates. but it will still be a while before any patch i write is actually ready to be deployed into production.
21:55:34 <gwicke> it wouldn't make sense to have several different path-based ways of selecting language variants, for example
21:56:02 <cscott> robla: i think we've got a reasonable consensus on an interim solution, but concern over the more general questions of DanielK_WMDE is preventing us from finalizing anything.
21:56:07 <cscott> (which i actually agree with)
21:56:29 <cscott> so i think the way to make further progress here is to actually grapple with the more general url scheme question, then return here and see if the solution to that problem bears on this one.
21:57:01 <gwicke> what we are looking for is basically option 2
21:57:08 <DanielK_WMDE> i don't want to derail or stonewall this or related rfcs.
21:57:18 <gwicke> a uniform path-based way of selecting language variants
21:57:21 <brion> are we otherwise happy with the notion of zh.wikipedia.org/api/rest_v1/page/lang/zh-hant/html/Foo with the open question of whether zh-hant can be replaced with en/fr/etc in a way that will be consistent?
21:57:29 <cscott> well, we're not holding anything up until i've actually got a patch in hand. which i don't yet.
21:57:43 <DanielK_WMDE> i just want to make sure we have a good concept of how we handle languages in general
21:57:54 <brion> or do we need to ponder more before committing to that model?
21:57:57 <subbu> brion, i think DanielK_WMDE preferred /langconvert/ over /lang/ i think unless i misunderstood it.
21:58:12 <brion> ist egal zu mir, as the germans say :D
21:58:15 <SMalyshev> I think /page/lang/ would be the most neutral one without overfocusing the semantics
21:58:15 <brion> i'll take langconvert
21:58:23 <gwicke> the issue with /langconvert/ et al is that it's a one-off solution for the REST API
21:58:24 <DanielK_WMDE> subbu, brion: i'm good with /lang/. "convert" is an implementation detail.
21:58:30 <brion> though lang is happy yeah
21:58:41 * brion "take it to #wikimedia-bikeshed!" ;)
21:58:42 <gwicke> rather than something that will work for articles as well
21:58:50 <DanielK_WMDE> subbu: i just don't want /variant/, because i think it's too narrow
21:58:50 <robla> DanielK_WMDE: cscott : is there an action item for DanielK_WMDE to write up a generalized RFC for URL policy?
21:59:06 <subbu> DanielK_WMDE, ok .. thanks for clarifying.
21:59:16 <cscott> gwicke: yeah, but a one off solution might be enough for now. it might turn out that the more general /page/html/lang/foo/balh solution internally dispatches to /page/langconvert/ to do the actual language conversion part.
21:59:26 <brion> TimStarling: what say you? we're coming up on time
21:59:36 <TimStarling> yes, fine
21:59:38 <DanielK_WMDE> robla: i could write an rfc that is just about terms and concepts, not about code at all.
21:59:41 <cscott> so maybe /page/langconvert doesn't actually have to be a part of the public api in the end. but it's a useful narrow solution to the immediate implementation issue.
21:59:44 <gwicke> cscott, that doesn't make sense
21:59:53 <gwicke> the url you propose is already in use
22:00:19 <cscott> i don't like /lang/ specifically because it's more general than i'm happy with right now. i'm not convinced that language converter and the other languages involved can be unified in the end.
22:00:25 <brion> agh did i mean /api/rest_v1/lang/zh-hant/page/html/Foo ?
22:00:32 <cscott> maybe they can be. but at the moment i'd like a narrow solution to a specific problem.
22:00:42 <TimStarling> time's up now
22:01:02 <gwicke> brion, it might make sense to shift it up one or more levels, yes
22:01:23 <brion> [it may be worth considering it a filter like /page/mobile-text/{title} that might go away some day in favor of a more general solution]
22:01:25 <gwicke> also in the running: /zh-yue/api/rest_v1/...
22:01:32 <gwicke> and /zh-yue/wiki/...
22:01:33 <TimStarling> who is going to update the RFC page? gwicke or cscott?
22:01:44 <robla> May 4 meeting: https://phabricator.wikimedia.org/E169 about PSR-6
22:01:49 <cscott> i think it's gwicke's RFC
22:02:04 <gwicke> we should perhaps update DanielK_WMDE's RFC as well
22:02:18 <cscott> brion: "[it may be worth considering it"... yes, that's what i'm suggesting.
22:02:24 <DanielK_WMDE> gwicke: in what way?
22:02:26 <TimStarling> #action gwicke to update T122942 to summarise the options discussed here and remove the rejected option
22:02:27 <stashbot> T122942: RFC: Support language variants in the REST API - https://phabricator.wikimedia.org/T122942
22:02:37 <brion> cscott: +1
22:03:25 <TimStarling> #action DanielK_WMDE to write an RFC discussing the philosophical nature of language
22:03:37 <robla> lol
22:03:44 <DanielK_WMDE> TimStarling: hehe ;)
22:03:49 <Scott_WUaS> :) +1
22:03:53 <brion> haha
22:03:58 <TimStarling> #endmeeting
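
The "best case" dispatch cscott describes at 21:49:29 (take everything after /langconvert/{code}, pass it back into REST, convert the output) could be sketched roughly as below. This is only an illustration under stated assumptions: it uses the /page/langconvert/{code}/{rest} form from 21:48:00, which the meeting did not settle on, and the names split_langconvert and Dispatch are hypothetical and not part of RESTBase or Parsoid. The language conversion step itself is not shown.

```python
from typing import NamedTuple, Optional

# Assumed prefix: one of several path schemes floated in the meeting, not a decision.
PREFIX = "/page/langconvert/"


class Dispatch(NamedTuple):
    variant: Optional[str]  # e.g. "en-gb"; None when no conversion was requested
    inner_path: str         # path to hand back to the ordinary /page/... routes


def split_langconvert(path: str) -> Dispatch:
    """Split '/page/langconvert/{code}/{rest}' into (code, '/page/{rest}').

    Paths without the prefix are returned unchanged with variant=None.
    """
    if not path.startswith(PREFIX):
        return Dispatch(None, path)
    # Everything up to the next slash is the variant code; the remainder is
    # re-rooted under /page/ so the existing routes can serve it.
    code, sep, rest = path[len(PREFIX):].partition("/")
    if not code or not sep or not rest:
        raise ValueError(f"expected {PREFIX}{{code}}/{{rest}}, got {path!r}")
    return Dispatch(code, "/page/" + rest)


if __name__ == "__main__":
    print(split_langconvert("/page/langconvert/en-gb/html/Foo"))
    print(split_langconvert("/page/html/Foo"))
```

Because the remainder of the path is passed through untouched, a hypothetical future /page/coolness/ route would be reachable as /page/langconvert/en-gb/coolness/... without further changes, which is the property cscott points out at 21:49:52.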

daniel renamed this event from RFC Meeting: Support language variants in the REST API (2016-04-27, #wikimedia-office) to ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office).Nov 21 2016, 6:11 PM
daniel changed the host of this event from RobLa-WMF to daniel.
daniel updated the event description.
ssastry renamed this event from ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office) to RFC Meeting: Support language variants in the REST API (2016-04-27, #wikimedia-office).Nov 30 2016, 4:50 PM
ssastry changed the start date for this event from Apr 27 2016, 9:00 PM to Apr 27 2016, 9:00 PM.
ssastry changed the end date for this event from Apr 27 2016, 10:00 PM to Apr 27 2016, 10:00 PM.