
RFC Meeting: Overhaul Interwiki map, unify with Sites and WikiMap (2016-05-11, #wikimedia-office)
Active, Public

Hosted by daniel on May 11 2016, 9:00 PM - 10:00 PM.

Description

See the Architecture meetings page for more general information about this meeting (also: Phab query: list of upcoming RFC meetings, Phab query: list of all RFC meetings).

Recurring Event

Event Series
This event is an instance of E66: ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office), and repeats every week.

Event Timeline

2016-05-11:

@daniel nominated T113034 in E167, which seemed like a good idea to everyone at the time. The last RFC meeting we had on this subject was in October, and there's now patch 285018 awaiting review in Gerrit.

RobLa-WMF renamed this event from RFC Meeting: <topic TBD> (<see "Starts" field>, #wikimedia-office) to RFC Meeting: Overhaul Interwiki map, unify with Sites and WikiMap (2016-05-11, #wikimedia-office). May 4 2016, 10:35 PM

Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-11-21.00.html
Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-11-21.00.txt
Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-11-21.00.wiki
Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-11-21.00.log.html

Meeting summary

  • RFC: Overhaul Interwiki map, unify with Sites and WikiMap | Wikimedia meeting channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ (TimStarling, 21:00:47)
    • question discussed: which backends should InterwikiLookup support? (robla, 21:10:54)
    • i imagine every wiki would read three files actually (and perform a deep merge): one with info shared across the family, one with info shared across the language, and one with local overrides for the specific wiki (DanielK_WMDE, 21:22:54)
    • aude: also can interwiki ids be renamed? daniel: you can add prefixes. (DanielK_WMDE, 21:23:42)
    • an entry can have multiple global ids. they act as aliases. only one of them would be used as a key in the file, making it the *canonical* global id. (DanielK_WMDE, 21:24:05)
    • <aude> another thing we should have is configuration for sorting order of interwiki ids (maintained in a sane place) (DanielK_WMDE, 21:33:00)
    • LINK: https://meta.wikimedia.org/wiki/MediaWiki:Interwiki_config-sorting_order-native-languagename (aude, 21:33:23)
    • LINK: https://meta.wikimedia.org/wiki/MediaWiki:Interwiki_config-sorting_order-native-languagename-firstword (aude, 21:33:26)
    • <TimStarling> anyway, yes, the JSON format you propose looks very extensible and will presumably meet our needs (DanielK_WMDE, 21:33:39)
    • <TimStarling> I don't want to have m:Interwiki_map anymore (DanielK_WMDE, 21:36:27)
    • Tim is not convinced that interwiki info should be maintained by hand as json. Perhaps we still want dumpInterwiki (or equivalent) (DanielK_WMDE, 21:50:15)
    • Tim thinks we need to figure out what information can be taken from wgConf, and what should come from elsewhere, and how to maintain it. But it's not a blocker for now, we can figure it out later (DanielK_WMDE, 21:53:55)
    • next week's meeting: E184 RFC: Requirements for change propagation (T102476) (robla, 21:57:21)
    • Tim thinks it's ok to go ahead with implementing the proposed next steps, as they are non-threatening. But should we have a formal last call? (DanielK_WMDE, 22:02:40)

Meeting ended at 22:03:54 UTC.
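The deep merge recorded in the summary works as follows. Every wiki reads a family file, a language file and a local-overrides file, and merges them in that order. The short sketch below illustrates this. It is written in Python purely for illustration and is not the code in the Gerrit patch.

Several details are assumptions:
- The function names deep_merge and load_site_info.
- The sample entries.
- The rule that lists are replaced rather than concatenated.

The "ids", "groups" and "props" keys follow names used in the log, but the exact layout of P3044 is not reproduced here.

```python
# Hypothetical sketch: names, sample data and the list-replacement rule are
# illustrative assumptions, not taken from the MediaWiki patch or from P3044.
from copy import deepcopy
from typing import Any, Dict

SiteInfo = Dict[str, Any]


def deep_merge(base: SiteInfo, override: SiteInfo) -> SiteInfo:
    """Return a new dict with `override` recursively merged over `base`.

    Nested dicts are merged key by key. Any other value (string, number,
    list) in `override` replaces the one in `base` (assumed rule).
    Neither input is modified.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def load_site_info(*layers: SiteInfo) -> SiteInfo:
    """Merge layers in order; later layers win (family, language, local)."""
    result: SiteInfo = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Example layers (hypothetical data).
family = {  # shared by all Wikipedias
    "enwiki": {"ids": {"global": ["enwiki"], "interwiki": ["en"]},
               "groups": {"family": "wikipedia"}},
}
language = {  # shared by all English-language wikis
    "enwiktionary": {"ids": {"global": ["enwiktionary"],
                             "interwiki": ["wiktionary", "wikt"]},
                     "groups": {"language": "en"}},
}
local = {  # overrides for one specific wiki
    "enwiki": {"props": {"language": "en"}},
}

sites = load_site_info(family, language, local)
print(sites["enwiki"]["ids"]["interwiki"])   # ['en']
print(sites["enwiki"]["props"]["language"])  # en
```

Because later layers win, a local file only needs to list the keys it overrides. Replacing lists wholesale is just one possible choice. It lets a local override drop an inherited interwiki prefix, at the cost of having to repeat any prefixes it wants to keep.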

People present (lines said)

  • DanielK_WMDE (137)
  • aude (47)
  • TimStarling (47)
  • Krenair (21)
  • robla (12)
  • jzerebecki (9)
  • bd808 (8)
  • SMalyshev (6)
  • Scott_WUaS (4)
  • wm-labs-meetbot (3)
  • stashbot (3)
  • DanielK_WMDE__ (1)

Full log:

21:00:22 <TimStarling> #startmeeting RFC meeting
21:00:22 <wm-labs-meetbot> Meeting started Wed May 11 21:00:22 2016 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:22 <wm-labs-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:22 <wm-labs-meetbot> The meeting name has been set to 'rfc_meeting'
21:00:47 <TimStarling> #topic RFC: Overhaul Interwiki map, unify with Sites and WikiMap | Wikimedia meeting channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
21:01:17 * DanielK_WMDE__ wibbles
21:01:21 <robla> T113034
21:01:22 <stashbot> T113034: RFC: Overhaul Interwiki map, unify with Sites and WikiMap - https://phabricator.wikimedia.org/T113034
21:02:07 <DanielK_WMDE> So, shall we start?
21:02:15 * jzerebecki nibbles a bit
21:02:37 <DanielK_WMDE> jzerebecki: my word, just take a byte!
21:02:37 <TimStarling> yes
21:02:42 <robla> so.... DanielK_WMDE, what are you hoping to accomplish in this meeting?
21:03:04 <DanielK_WMDE> I'm hoping to get more clarity on the next steps to take.
21:03:25 <TimStarling> so the last meeting was in October?
21:03:26 <DanielK_WMDE> We have a need for a more flexible, and more easy to maintain, system of information about other wikis on the cluster, and elsewhere
21:03:44 <robla> DanielK_WMDE: what's stopping you from writing a patch?
21:03:55 <DanielK_WMDE> The very first step to achieve this is now up for review: https://gerrit.wikimedia.org/r/#/c/250150/
21:04:09 <Krenair> and non-wikis elsewhere
21:04:26 <Krenair> which the interwiki map includes
21:04:31 <DanielK_WMDE> robla: so the next step, i think, would be to write an InterwikiLookup (and/or SiteLookup) based on nested arrays (aka JSON)
21:04:36 <DanielK_WMDE> For that, we should agree on a format.
21:04:51 <DanielK_WMDE> This is what I have in mind: https://phabricator.wikimedia.org/P3044
21:04:53 <bd808> s/json/php/
21:05:28 <DanielK_WMDE> One of the things to note is that any site can have several kinds of IDs (global, interwiki, etc), and several ID values for each kind (aliases, basically)
21:06:01 <DanielK_WMDE> Also, any site can be in a number of groups of various types. Sites can be grouped by language, or by family (wikipedia, wikibooks, etc), or by the database cluster they reside on
21:06:12 * aude waves
21:06:17 <robla> DanielK_WMDE: is https://gerrit.wikimedia.org/r/#/c/250150/ close to getting merged?
21:06:30 <DanielK_WMDE> bd808: yes, the idea is to read from .php or .json files. json is easier to maintain, php faster to read
21:06:59 <TimStarling> you mean json instead of CDB?
21:07:04 <DanielK_WMDE> robla: yes, i think so. addshore and legoktm are looking into it
21:07:15 <DanielK_WMDE> TimStarling: yes, exactly
21:07:32 <DanielK_WMDE> or, well, php instead of cdb, really, if we are talking about the fast option.
21:07:44 <DanielK_WMDE> in my mind, we would maintain the json on gerrit, and generate php for production
21:07:56 <bd808> why?
21:08:11 <bd808> Are we short of php editing experience?
21:08:16 <TimStarling> from last meeting there is an info item: "TimStarling wants a CDB backend which would use the files generated by dumpInterwiki.php"
21:08:29 <DanielK_WMDE> bd808: no, but if we have a generation step, we can pre-compute indexes.
21:08:35 <DanielK_WMDE> we'd need to maintain these by hand otherwise
21:08:50 <TimStarling> I guess you can have serialized files if you have an APC cache on top, is that the idea?
21:08:53 <Krenair> didn't interwikis get migrated away from CDB?
21:09:11 <DanielK_WMDE> TimStarling: the APC cache would be automatic for php files
21:09:15 <DanielK_WMDE> that's the idea
21:09:31 <DanielK_WMDE> Krenair: they now use a php array that still uses the structure of the cdb
21:09:41 <TimStarling> right
21:09:49 <DanielK_WMDE> TimStarling: the CDB backend you want - i think ClassicInterwikiLookup in my patch is just that. it's basically the old code.
21:10:07 <DanielK_WMDE> can you confirm that?
21:10:53 <DanielK_WMDE> hey TrevorParscal!
21:10:54 <robla> #info question discussed: which backends should InterwikiLookup support?
21:11:02 <TimStarling> yes, fine
21:11:42 <DanielK_WMDE> robla: my plan: new structure in json and php. old structure via legacy code in php or cdb. old structure in the database.
21:11:48 <DanielK_WMDE> the patch already has support for the last three
21:12:25 <DanielK_WMDE> ok, so one question is whether the structure i propose is what we want.
21:12:58 <DanielK_WMDE> another question is whether we want to support the new features (multiple ids and groups, all kinds of extra info) in the db backend
21:13:26 <DanielK_WMDE> existing 3rd party wiki clusters may depend on a db based interwiki setup that can be edited from inside the wiki.
21:13:29 <robla> DanielK_WMDE: which (single) question do you want this group to focus on first?
21:13:33 <DanielK_WMDE> should they miss out on the new features?
21:14:03 <DanielK_WMDE> robla: single question: give me feedback on https://phabricator.wikimedia.org/P3044
21:14:29 <DanielK_WMDE> does that structure seem sensible? is it missing something structurally?
21:14:41 <TimStarling> that's not how you spell paths
21:15:07 <DanielK_WMDE> bd808: at the bottom of the example are the indexes. i'd want to generate them. but that can be done from json to php, or php to php, or php to json...
21:15:27 <DanielK_WMDE> TimStarling: oops ;) it's a wiki...
21:15:47 <TimStarling> so there will be one such json file per source wiki?
21:16:07 <aude> DanielK_WMDE: keep in mind the global ids can be renamed
21:16:18 <aude> e.g. be-x-old to be-tarask
21:16:30 <TimStarling> you have enwiktionary.ids.interwiki[0] == 'wikt' which only makes sense if the source wiki is in the english language
21:16:32 <DanielK_WMDE> TimStarling: i imagine every wiki would read three files actually (and perform a deep merge): one with info shared across the family, one with info shared across the language, and one with local overrides for the specific wiki
21:16:34 <aude> how would you handle such cases?
21:17:11 <aude> if the array is indexed by global id? ("enwiki": {)
21:17:22 <DanielK_WMDE> TimStarling: so you would have only one file saying that "wiktionary" is the interwiki prefix for enwikt on all english language wikis. And one that says that "en" is the prefix for enwiki on all wikipedias
21:17:32 <TimStarling> actual wiki IDs (DB names) have almost never been renamed
21:17:43 <bd808> DanielK_WMDE: line 88 cancels out line 87
21:17:48 <TimStarling> I did a few in the early days but it got more complicated later on, I don't think anyone has tried it lately
21:17:55 <bd808> you can't have two keys in a dict with the same value
21:17:56 <Krenair> in the past no, but in future I'd like to be able to do that
21:17:56 <DanielK_WMDE> aude: an entry can have multiple global ids. they act as aliases. only one of them would be used as a key in the file, making it the *canonical* global id.
21:18:16 <DanielK_WMDE> bd808: hm, when I try to edit the paste, i get an empty text box?...
21:18:18 <aude> DanielK_WMDE: also can interwiki ids be renamed?
21:18:18 <DanielK_WMDE> silly
21:18:37 <DanielK_WMDE> aude: that would break existing page content, right? but you can add prefixes.
21:18:52 <TimStarling> DanielK_WMDE: that sounds fine, but it's not in the proposal you're asking me to review
21:18:54 <DanielK_WMDE> Krenair: sorry, what was that?
21:19:02 <Krenair> renaming of wikis DanielK_WMDE
21:19:13 <DanielK_WMDE> ah, right
21:19:21 <aude> DanielK_WMDE: maybe a site can have multiple interwiki ids?
21:19:29 <DanielK_WMDE> TimStarling: sorry, what isn't? the thing about combining three files?
21:19:34 <DanielK_WMDE> aude: sure
21:19:49 <TimStarling> DanielK_WMDE: yes, just one file on P3044
21:19:49 <stashbot> P3044 interwiki.json - https://phabricator.wikimedia.org/P3044
21:20:46 * DanielK_WMDE can't edit
21:21:33 <DanielK_WMDE> TimStarling: it's on the RFC page. it's not reflected in the JSON file, that's true. The "three files" thing isn't baked into the logic. It would just be the thing i'd do for the wikimedia cluster.
21:21:37 <jzerebecki> Krenair: you mean renaming of the db name? in addition to renaming the language code which was what aude talked about.
21:21:46 <DanielK_WMDE> TimStarling: as far as the software is concerned, it reads any number of files, and deep-merges them
21:22:04 <DanielK_WMDE> do you think i should make three pastes for a more elaborate example?
21:22:32 <aude> maybe dbname could be another type of id?
21:22:34 <DanielK_WMDE> # i imagine every wiki would read three files actually (and perform a deep merge): one with info shared across the family, one with info shared across the language, and one with local overrides for the specific wiki
21:22:46 <DanielK_WMDE> aude: it could, yes
21:22:54 <DanielK_WMDE> #info i imagine every wiki would read three files actually (and perform a deep merge): one with info shared across the family, one with info shared across the language, and one with local overrides for the specific wiki
21:23:42 <DanielK_WMDE> #info aude: also can interwiki ids be renamed? daniel: you can add prefixes.
21:24:05 <DanielK_WMDE> #info an entry can have multiple global ids. they act as aliases. only one of them would be used as a key in the file, making it the *canonical* global id.
21:24:12 <DanielK_WMDE> so...
21:24:19 <DanielK_WMDE> any comments on the ugly indexes at the bottom?
21:24:24 <aude> prefixes or aliases?
21:24:27 <DanielK_WMDE> i stole that idea from how the cdb files work
21:25:08 <jzerebecki> eew to maintain by hand
21:25:09 <DanielK_WMDE> aude: you can have any number of any "kind" of id. so you can have multiple global ids, multiple interwiki prefixes, etc.
21:25:53 <DanielK_WMDE> jzerebecki: yea :) The idea is that if they are not there, they are computed on the fly. and when we write the file back out (as php or json), we also output the indexes, for quicker reading
21:26:01 <bd808> DanielK_WMDE: do those indices really need to be precomputed? That seems like the sort of thing that could be cached in APC
21:26:15 <DanielK_WMDE> bd808: the file is cached in apc.
21:26:15 <aude> DanielK_WMDE: ok, think i understand what you mean by "interwiki prefix"
21:26:17 <TimStarling> DanielK_WMDE: no need for a more elaborate example, I think the RFC more or less covers it
21:26:27 <DanielK_WMDE> ok
21:26:51 <DanielK_WMDE> bd808: to me the question is: would it be ok to just re-compute the indexes when reading (or when needed)?
21:26:51 <aude> DanielK_WMDE: is "_by_id" -> global -> and then enwiki twice correct?
21:27:01 <bd808> "when we write the file back out" -- that I don't like if you mean via a specialpage
21:27:02 <DanielK_WMDE> aude: yes
21:27:25 <DanielK_WMDE> bd808: right. if you want on-wiki editing, you want the db backend. which doesn't support the nice new features.
21:27:26 <aude> or should be "some-old-alias": "enwiki"
21:27:35 <aude> and "enwiki": "enwiki"?
21:28:02 <DanielK_WMDE> bd808: so one of my questions for today is: do we need to add interwiki_ids, interwiki_groups, and interwiki_props in the database? perhaps that could be an extension.
21:28:13 <DanielK_WMDE> aude: exactly
21:28:20 <aude> DanielK_WMDE: ok
21:28:37 <aude> i can't edit the paste now, but think it should be fixed
21:28:58 <DanielK_WMDE> aude: yes... i wonder what's wrong with the edit feature :(
21:29:08 <DanielK_WMDE> i'll make a new one later
21:29:10 <bd808> DanielK_WMDE: well, what will need them? Are you making wikibase or some other extension dependent on a particular backend if you don't make them all the same?
21:29:16 <Scott_WUaS> (DanielK_WMDE, jzerebecki, TimStarling and All: In what ways might this - https://phabricator.wikimedia.org/P3044 - help Content Translation become more precise in all of Wikidata/Wikipedia's ~300 languages?)
21:30:32 <TimStarling> there are security concerns with allowing web users to write PHP files, it's done by one of the template engines but they use cryptographic signatures to make sure nothing other than MW writes those files
21:30:33 <DanielK_WMDE> bd808: the current db backend doesn't support everything we need for wikibase (that's why we added the sites table). For wikidata, we can just use the file based backend. but 3rd parties that rely on on-wiki editing may want the db backend *and* the new features.
21:30:57 <DanielK_WMDE> TimStarling: my thought is: if you want on-wiki editing, use the db backend.
21:31:12 <TimStarling> do we want on-wiki editing?
21:31:20 <DanielK_WMDE> on the wmf cluster, no
21:31:39 <DanielK_WMDE> many other wiki farms do. especially the smallish ones
21:31:41 <TimStarling> ok, I'll relax now in that case
21:31:47 <DanielK_WMDE> :)
21:32:20 <aude> another thing we should have is configuration for sorting order of interwiki ids (maintained in a sane place)
21:32:43 <aude> for legacy reasons, this information is in wikibase but totally doesn't belong there
21:32:45 <DanielK_WMDE> aude: good point... not sure where to put that in this design, but i'll think about it
21:32:49 <Krenair> we do have m:Interwiki_map on meta
21:33:00 <DanielK_WMDE> #info <aude> another thing we should have is configuration for sorting order of interwiki ids (maintained in a sane place)
21:33:01 <Krenair> which has to be transferred to the cluster manually by a deployer
21:33:11 <aude> there are pages on meta wiki where this information is maintained also for pywikibot and other stuff
21:33:22 <TimStarling> anyway, yes, the JSON format you propose looks very extensible and will presumably meet our needs
21:33:23 <aude> https://meta.wikimedia.org/wiki/MediaWiki:Interwiki_config-sorting_order-native-languagename
21:33:24 <DanielK_WMDE> Krenair: files on gerrit would be a *lot* nicer
21:33:26 <aude> https://meta.wikimedia.org/wiki/MediaWiki:Interwiki_config-sorting_order-native-languagename-firstword
21:33:33 <Krenair> DanielK_WMDE, it goes via gerrit
21:33:39 <DanielK_WMDE> #info <TimStarling> anyway, yes, the JSON format you propose looks very extensible and will presumably meet our needs
21:33:41 <aude> then there is something special for serbian and another language
21:34:03 <aude> west frisian
21:34:04 <Krenair> there's a script that you run on tin to pull the page contents and generate the file, which you download and upload to the git repository, then commit and push it out as a wmf-config change
21:34:21 <DanielK_WMDE> #link https://meta.wikimedia.org/wiki/MediaWiki:Interwiki_config-sorting_order-native-languagename-firstword
21:34:38 <DanielK_WMDE> Krenair: I propose to just maintain the files in the git repo
21:34:46 <Krenair> without an on-wiki page?
21:35:00 <aude> Krenair: the script sounds good
21:35:03 <DanielK_WMDE> personally, yes. but if we want to, we can still pull info from that page
21:35:11 <DanielK_WMDE> but the information about each site is getting increasingly complex
21:35:18 <DanielK_WMDE> so it becomes nasty to maintain as wikitext
21:35:39 <DanielK_WMDE> anyway...
21:35:53 <TimStarling> I don't want to have m:Interwiki_map anymore
21:35:59 <DanielK_WMDE> \o/
21:36:01 <TimStarling> it was a bad idea (of mine) to start with
21:36:19 <DanielK_WMDE> there is some overlap between the interwiki/site info, and wgConf. Do we want to integrate them? Or keep them separate?
21:36:27 <DanielK_WMDE> #info <TimStarling> I don't want to have m:Interwiki_map anymore
21:36:36 <aude> wgConf is quite complex
21:36:56 <aude> what do you mean by integrate?
21:37:00 <DanielK_WMDE> yea... i'm not proposing to replace it with the interwiki stuff completely.
21:37:04 <jzerebecki> there are at least 3 places for the language: wgLanguageCode, groups:language, props:language. may all three be different?
21:37:17 <DanielK_WMDE> but e.g. which wiki uses which database could come from the interwiki/site json.
21:37:28 * aude notes we also have sitematrix
21:37:34 <jzerebecki> ^^
21:37:50 <DanielK_WMDE> aude: that uses wgConf, right?
21:37:52 <TimStarling> on WMF generating interwiki JSON will be scripted, so wgConf could be used as a data source by the script
21:37:59 <aude> DanielK_WMDE: i think so
21:38:13 <aude> and then we (currently) populate sites from site matrix
21:38:17 <aude> a bit evil
21:38:19 <DanielK_WMDE> jzerebecki: they can, though they would usually be the same. Actually, wgLanguageCode in wgConf should always be the same as props:language in the interwiki info
21:38:39 <jzerebecki> DanielK_WMDE: can we deup those two then?
21:38:44 <Krenair> It has some hardcoded stuff in it :(
21:38:46 <Krenair> (sitematrix)
21:38:47 <jzerebecki> s/deup/dedup/
21:39:26 <Krenair> e.g. this:
21:39:29 <Krenair> if ( in_array( $lang, array( 'cz', 'dk', 'epo', 'jp', 'minnan', 'nan', 'nb', 'zh-cfr' ) ) ) {
21:39:29 <Krenair> continue;
21:39:34 <DanielK_WMDE> jzerebecki: if we closely integrate wgConf with the sites info, maybe. But $wgLanguageCode will potentially need to be available very early during the init process. Not sure the interwiki info will be available early enough
21:39:39 <DanielK_WMDE> it's a bit of a chicken-and-egg issue
21:39:51 <DanielK_WMDE> (you need wgConf to know where to read the interwiki info from)
21:40:22 <TimStarling> some of the hard coded stuff in dumpInterwiki.php can be replaced with wgConf
21:40:55 <TimStarling> it's a chicken-and-egg issue to find the database that corresponds with a given language/domain?
21:40:56 <jzerebecki> DanielK_WMDE: for wmf production both will be in mediawiki-config.git so that they can be changed atomically?
21:41:31 <DanielK_WMDE> TimStarling: i think it's solvable, but we have to think about initialization order, yes
21:41:46 <DanielK_WMDE> jzerebecki: i suppose so
21:42:11 <TimStarling> yeah, it is just an index inversion problem, you just iterate through wgLanguageCode and flip it
21:42:29 * jzerebecki was thinking about what needs to be done for something like https://gerrit.wikimedia.org/r/#/c/277519/
21:42:36 <DanielK_WMDE> ok. i'd like to re-iterate the next steps i propose, so you all can tell me whether you approve, or have questions, or what.
21:42:48 <DanielK_WMDE> 1) implement array-based InterwikiLookup (loads from multiple JSON or PHP files)
21:43:03 <DanielK_WMDE> 2) implement maintenance script that can convert between different interwiki representations.
21:43:10 <DanielK_WMDE> 3) Provide a config variable for specifying which files to read interwiki info from. If not set, use old settings and old interwiki storage.
21:43:20 <DanielK_WMDE> bonus) split CDB from SQL implementation
21:43:33 <DanielK_WMDE> more info on these is on the ticket, https://phabricator.wikimedia.org/T113034
21:43:44 <TimStarling> 1. this is non-threatening since we're not required to use it in production, right?
21:43:51 <DanielK_WMDE> yes
21:44:13 <TimStarling> 2. also mostly non-WMF?
21:44:14 <DanielK_WMDE> that's the nice thing about dependency injection. just swap stuff out :)
21:44:38 <TimStarling> this is not the dumpInterwiki.php equivalent yet?
21:44:59 <DanielK_WMDE> TimStarling: the conversion script would be used for deployment, or at least when migrating from CDB. But by itself, it doesn't change anything about how things work
21:45:32 <Scott_WUaS> (DanielK_WMDE, jzerebecki, TimStarling and All: In what ways might this session and this - https://phabricator.wikimedia.org/P3044 - help create a robust translator building on, for example, Content Translation, Google Translate and adding a sophisticated Wiktionary between all of Wikidata/Wikipedia's ~300 languages - and even lead to a Universal Translator between all 7,943+ languages? Thank you.)
21:45:37 <DanielK_WMDE> no, this doesn't replace dumpInterwiki.php. it takes multiple json (or php) files, combines them, indexes them, and then writes json (or php, or sql)
21:45:53 <TimStarling> you would use dumpInterwiki.php to generate CDB and then convert from CDB to PHP during deployment?
21:46:19 <DanielK_WMDE> Scott_WUaS: i don't think it would. we are discussing the management of meta-information about websites.
21:46:48 <Scott_WUaS> DanielK_WMDE: Thanks
21:47:06 <TimStarling> 3. sounds fine and conventional
21:47:18 <DanielK_WMDE> TimStarling: i would use the conversion script to generate a JSON from the CDB, then split that manually, once, and put it into gerrit. I'd then use the conversion script to turn the json into indexed php during deployment
21:47:48 <aude> like we do for wikiversions.json?
21:48:03 <aude> as part of scap?
21:48:14 <DanielK_WMDE> TimStarling: we may have a dumpInterwiki equivalent that builds interwiki info from config, or use the old dumpInterwiki, and then convert. but i'd prefer to just stop using it, and maintain the interwiki info directly
21:48:26 <aude> not sure it needs to be regenerated that often
21:48:31 <TimStarling> so the JSON would then be human-edited configuration?
21:48:32 <DanielK_WMDE> aude: possibly. i'm blurry on the details
21:48:39 <DanielK_WMDE> TimStarling: exactly
21:48:55 <DanielK_WMDE> we could support yaml ;)
21:49:11 <TimStarling> we don't have much time to discuss this now, but I'm not sold on that part
21:50:12 <TimStarling> the reason for introducing rebuildInterwiki.php was to reduce the number of points in the configuration file that need to be simultaneously edited when a new wiki is introduced
21:50:15 <DanielK_WMDE> #info Tim is not convinced that interwiki info should be maintained by hand as json. Perhaps we still want dumpInterwiki (or equivalent)
21:50:28 <TimStarling> or when you introduce an alias or whatever
21:50:57 <DanielK_WMDE> TimStarling: yea, but does the config have a good place for all the extra info we want in the interwiki info? hm, perhaps it's all there.
21:51:37 <TimStarling> it's not all in wgConf, dumpInterwiki.php is canonical for some things
21:51:40 <DanielK_WMDE> TimStarling: the entire thing doesn't require us to stop using dumpInterwiki. It would *allow* us to stop using it, if we want to.
21:51:51 <DanielK_WMDE> But we can keep using it - and we don'
21:51:52 <TimStarling> you would have to work out which things and migrate them somewhere else
21:52:05 <SMalyshev> is there some connection between dumpInterwiki and MW* setup?
21:52:42 <Krenair> dumpInterwiki is a script in Extension:WikimediaMaintenance
21:52:43 <DanielK_WMDE> TimStarling: yes... do you think we need to have a detailed plan for that in order to move forward? or can we figure that out when we get to it?
21:52:43 * aude has to run in a few minutes
21:52:57 <Krenair> it creates our interwiki.php file that controls interwiki link prefixes
21:53:03 <TimStarling> we can figure it out later
21:53:04 <aude> but also wants feedback on https://phabricator.wikimedia.org/T90617 (and wants to solve this more short term)
21:53:10 <DanielK_WMDE> I think the system I propose is flexible enough to allow us to use whatever bits and pieces are convenient
21:53:18 <aude> what to do with the populate sites script
21:53:19 <TimStarling> yeah, you can go ahead with the implementation
21:53:55 <DanielK_WMDE> #info Tim thinks we need to figure out what information can be taken from wgConf, and what should come from elsewhere, and how to maintain it. But it's not a blocker for now, we can figure it out later
21:54:44 <DanielK_WMDE> aude: we have a file-based implementation of SiteLookup, right? Can't we just stop using the sites table?
21:54:52 <DanielK_WMDE> then we wouldn't need that script at all.
21:55:20 <aude> DanielK_WMDE: then how do we generate and maintain the files?
21:55:32 <aude> file-based is just a caching layer
21:55:39 <aude> at the moment
21:55:46 <DanielK_WMDE> in my mind, the files are the canonical info.
21:55:50 <DanielK_WMDE> or should be
21:56:06 <DanielK_WMDE> I'd want FileBasedSiteLookup to basically be the same as the file based InterwikiLookup. Could actually be the same class
21:56:12 <aude> DanielK_WMDE: ok
21:56:23 <aude> would this be easy for third parties to use?
21:56:48 <aude> e.g. for development wiki, i want all wikimedia sites in my site store / interwiki
21:56:49 <DanielK_WMDE> easier to edit a json file than to manually insert stuff into the database, right?
21:57:06 <SMalyshev> can't wg* be primary info at least for local wikis?
21:57:15 <DanielK_WMDE> aude: for a dev setup, we can just have a default sites.json on gerrit somewhere
21:57:18 <DanielK_WMDE> just download it, done
21:57:19 * aude really has to run now
21:57:21 <robla> #info next week's meeting: E184 RFC: Requirements for change propagation (T102476)
21:57:21 <stashbot> T102476: RFC: Requirements for change propagation - https://phabricator.wikimedia.org/T102476
21:57:29 <Krenair> primary info SMalyshev?
21:57:57 <DanielK_WMDE> SMalyshev: yes... but it's incomplete. So we'd have to somehow combine it with other info. That's a bit tricky. It's one of the questions I had for today.
21:58:02 <DanielK_WMDE> but I guess we are out of time.
21:58:02 <SMalyshev> Krenair: primary source of information about what wikis are there etc.
21:58:05 <aude> DanielK_WMDE: then we need a way to add new wikis to the sites.json (idally not manualy(
21:58:09 <aude> ideally*
21:58:15 <aude> as part of addWiki.php
21:58:18 <TimStarling> yeah, out of time now
21:58:23 <DanielK_WMDE> aude: i'd add them manually - and only there.
21:58:32 <Scott_WUaS> Thank you, All!
21:58:37 <DanielK_WMDE> TimStarling: do you think the next steps are approved?
21:58:58 <aude> thanks everyone :)
21:59:02 <TimStarling> approved or last call?
21:59:04 <Krenair> SMalyshev, you're suggesting all the wg* config would be part of this?
21:59:04 * aude runs away
21:59:26 <TimStarling> we can say informally approved I guess (pending code review)
21:59:39 <DanielK_WMDE> TimStarling: good question... do we do last calls for parts of rfcs? or should i split that bit out into a separate ticket, and we do a last call on that?
21:59:43 <robla> I'm a little confused as to what we just approved
21:59:53 <SMalyshev> Krenair: yes basically what I am wondering - we have a lot of info in wg* configs which MW* scripts are using, but then we seem to have a lot of this info duplicated, if I understand right
22:00:02 <TimStarling> robla: DanielK_WMDE's numbered points
22:00:07 <SMalyshev> so I wonder if it can't be in one place only.
22:00:10 <Krenair> SMalyshev, some of wg* comes from PrivateSettings
22:00:20 <DanielK_WMDE> robla: a go-ahead for the three next steps i posted earlier, at xx:42
22:00:50 <Krenair> we have wgConf->get to get data about other local wikis but it seems to have issues when you configure things per-dblist/family instead of per-wiki
22:00:52 <DanielK_WMDE> SMalyshev: yes. wgConf or the json file should be canonical, and one should somehow reference the other.
22:01:05 <SMalyshev> Krenair: we don't need all wg*'s of course just those related to wiki definitions. Basically whatever would MW* see when it does its thing
22:01:17 <DanielK_WMDE> robla: do you think a formal last call for these three points, in a separate ticket, would be in order?
22:01:27 <DanielK_WMDE> or is an informal go-ahead sufficient?
22:02:04 <robla> DanielK_WMDE I'm not sure.
22:02:40 <DanielK_WMDE> #info Tim thinks it's ok to go ahead with implementing the proposed next steps, as they are non-threatening. But should we have a formal last call?
22:03:01 <robla> so....let's not have a last call
22:03:07 <TimStarling> I think last call is just for closing an RFC
22:03:17 <robla> TimStarling: that sounds good
22:03:30 <DanielK_WMDE> robla: i tend to think we don't need one, since these steps don't actually incur any changes on existing wikis. they only add options.
22:03:44 <TimStarling> alright, that's it then
22:03:48 <DanielK_WMDE> yay :)
22:03:50 <DanielK_WMDE> thanks, all
22:03:54 <robla> yup, thanks!
22:03:54 <TimStarling> #endmeeting
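Step 1 of the next steps above is an array-based InterwikiLookup that computes the "_by_id" indexes on the fly when a file does not already contain them. It could look roughly like the sketch below. Again, this is a hypothetical Python sketch, not the patch under review:
- build_id_index and lookup are invented names.
- Rejecting an id claimed by two different sites is a design assumption.

The sketch follows the alias model agreed in the log. Each site is stored under its canonical global id. Every id of every kind, including global-id aliases such as "some-old-alias", maps back to that key, and the canonical id maps to itself.

```python
# Hypothetical sketch: function names and the conflict rule are assumptions,
# not the API of the MediaWiki patch under review.
from typing import Any, Dict, Optional

Sites = Dict[str, Dict[str, Any]]   # canonical global id -> site entry
Index = Dict[str, Dict[str, str]]   # id kind -> id value -> canonical global id


def build_id_index(sites: Sites) -> Index:
    """Compute a "_by_id"-style index from the site entries.

    Every id of every kind (global-id aliases, interwiki prefixes, ...)
    maps to the canonical global id, i.e. the key the entry is stored
    under. The canonical id always maps to itself. Raises ValueError if
    two different sites claim the same id of the same kind.
    """
    index: Index = {}
    for canonical, entry in sites.items():
        ids = dict(entry.get("ids", {}))  # shallow copy; input stays untouched
        global_ids = list(ids.get("global", []))
        if canonical not in global_ids:
            global_ids.insert(0, canonical)
        ids["global"] = global_ids
        for kind, values in ids.items():
            by_kind = index.setdefault(kind, {})
            for value in values:
                owner = by_kind.get(value)
                if owner is not None and owner != canonical:
                    raise ValueError(
                        f"{kind} id {value!r} claimed by {owner!r} and {canonical!r}")
                by_kind[value] = canonical
    return index


def lookup(sites: Sites, index: Index, kind: str, value: str) -> Optional[Dict[str, Any]]:
    """Resolve an id of the given kind to its site entry, or None."""
    canonical = index.get(kind, {}).get(value)
    return None if canonical is None else sites[canonical]


sites = {
    "enwiki": {"ids": {"global": ["enwiki", "some-old-alias"], "interwiki": ["en"]}},
    "enwiktionary": {"ids": {"interwiki": ["wiktionary", "wikt"]}},
}
index = build_id_index(sites)
print(index["global"])  # {'enwiki': 'enwiki', 'some-old-alias': 'enwiki', 'enwiktionary': 'enwiktionary'}
print(lookup(sites, index, "interwiki", "wikt") is sites["enwiktionary"])  # True
```

A conversion script along the lines of step 2 could write such an index back out next to the site entries, so that readers of the generated PHP or JSON can skip the computation. When the index is absent, it is simply rebuilt at read time.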

daniel renamed this event from RFC Meeting: Overhaul Interwiki map, unify with Sites and WikiMap (2016-05-11, #wikimedia-office) to ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office). Nov 21 2016, 6:11 PM
daniel changed the host of this event from RobLa-WMF to daniel.
daniel updated the event description.
daniel renamed this event from ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office) to RFC Meeting: Overhaul Interwiki map, unify with Sites and WikiMap (2016-05-11, #wikimedia-office).