RFC Meeting: RFC: Requirements for change propagation (2016-05-18, #wikimedia-office)

Hosted by daniel on May 18 2016, 9:00 PM - 10:00 PM.

Description

Architecture meetings
13:00 PT ArchCom Planning Meetings (upcoming; all since 2016-03-30)
14:00 PT ArchCom-RFC Meetings (upcoming; all since 2015-09-09)

Recurring Event

Event Series
This event is an instance of E66: ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office), and repeats every week.

Event Timeline

TechCom discussed this in E170. @daniel and @GWicke agreed that T102476 would be a good topic to discuss (with no objections). That RFC lacked a shepherd, so I agreed to shepherd it.

I'm not entirely clear as to what meeting type we should strive for in this meeting. What should we strive to accomplish in this meeting?

RobLa-WMF renamed this event from RFC Meeting: <topic TBD> (<see "Starts" field>, #wikimedia-office) to RFC Meeting: RFC: Requirements for change propagation (2016-05-18, #wikimedia-office). May 11 2016, 9:53 PM
RobLa-WMF updated the event description.

Last week, I wrote:

I'm not entirely clear as to what meeting type we should strive for in this meeting. What should we strive to accomplish in this meeting?

It seems that we are on track for a "Problem definition" conversation in the upcoming meeting. I haven't had a chance to coordinate with @GWicke on this yet, but I think the next step is to move the prose of the RFC over to MediaWiki.org, as I suggested in T102476#2296335 (I even have the prose copied over; it just needs to be RFC-ized).

3:03 PM <wm-labs-meetbot> Meeting ended Wed May 18 22:03:45 2016 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
3:03 PM <wm-labs-meetbot> Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-18-21.00.html
3:03 PM <wm-labs-meetbot> Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-18-21.00.txt
3:03 PM <wm-labs-meetbot> Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-18-21.00.wiki
3:03 PM <wm-labs-meetbot> Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-18-21.00.log.html

#wikimedia-office: RFC meeting

Meeting started by TimStarling at 21:00:02 UTC. The full logs are available at https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-05-18-21.00.log.html

Meeting summary

  • RFC: Requirements for change propagation | Wikimedia meeting channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ (TimStarling, 21:00:14)
    • LINK: https://phabricator.wikimedia.org/T102476 (gwicke, 21:01:03)
    • robla asks for response on T102476#2296335. gwicke answers "I'm not opposed to moving the general portion especially to mw.org, lets just make sure we don't end up with multiple copies" (robla, 21:12:37)
    • LINK: https://commons.wikimedia.org/wiki/Template:LangSwitch (matt_flaschen, 21:24:59)
    • significant use: track dependencies when rendering pages with {{int}} and <translate> so that they can be purged when conditional dependencies change (TimStarling, 21:33:47)
    • what needs purging is *generated* content, so we need to track what it depends on (DanielK_WMDE, 21:37:47)
    • LINK: https://phabricator.wikimedia.org/T130528 (DanielK_WMDE, 21:50:39)
    • ACTION: update the RFC to clarify the anticipated concrete dependency relations (TimStarling, 22:01:38)
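
As an illustration of the two #info items above, here is a minimal Python sketch of tracking what generated content depends on, so that only the affected renderings are purged. It is not taken from the RFC or from any Wikimedia code: the class name, method names, and URI strings are all hypothetical, and an in-memory dictionary stands in for whatever storage would actually be used.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Toy in-memory graph; edges point from a resource to artifacts generated from it."""

    def __init__(self):
        self._dependents = defaultdict(set)

    def depends_on(self, artifact, resource):
        """Record that `artifact` was generated using `resource`."""
        self._dependents[resource].add(artifact)

    def affected_by(self, changed):
        """Return every artifact that transitively depends on `changed`."""
        seen = set()
        queue = deque([changed])
        while queue:
            node = queue.popleft()
            for artifact in self._dependents.get(node, ()):
                if artifact not in seen:
                    seen.add(artifact)
                    queue.append(artifact)
        return seen


graph = DependencyGraph()
# Hypothetical URIs: each language rendering is tracked as its own artifact.
graph.depends_on("html:Foo?lang=fr", "wikitext:Foo")
graph.depends_on("html:Foo?lang=en", "wikitext:Foo")
# Assumption: only the French rendering pulls in Template:X (e.g. via a language switch).
graph.depends_on("html:Foo?lang=fr", "wikitext:Template:X")

to_purge = graph.affected_by("wikitext:Template:X")
```

In this sketch a change to Template:X selects only the French rendering, while a change to the wikitext of Foo selects both renderings. Nothing ever depends on a rendering's wikitext being purged; only generated artifacts appear as purge targets.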

Meeting ended at 22:03:45 UTC.

People present (lines said)

  • DanielK_WMDE (89)
  • gwicke (53)
  • TimStarling (28)
  • matt_flaschen (27)
  • AaronSchulz (16)
  • robla (11)
  • mobrovac (9)
  • Scott_WUaS (5)
  • subbu (4)
  • bd808 (3)
  • wm-labs-meetbot (3)
  • stashbot (2)
  • jzerebecki (1)

Log:

21:00:02 <TimStarling> #startmeeting RFC meeting
21:00:02 <wm-labs-meetbot> Meeting started Wed May 18 21:00:02 2016 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:02 <wm-labs-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:02 <wm-labs-meetbot> The meeting name has been set to 'rfc_meeting'
21:00:14 <TimStarling> #topic RFC: Requirements for change propagation | Wikimedia meeting channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
21:00:44 <gwicke> hi
21:00:58 <mobrovac> hello
21:01:03 <gwicke> #link https://phabricator.wikimedia.org/T102476
21:02:11 <gwicke> DanielK_WMDE & myself thought that it would be useful to catch up on what has been going on in eventbus & change propagation land, and talk about needs and open questions
21:02:52 <Scott_WUaS> Great, Gabriel
21:03:39 <gwicke> while most of the RFC is older & aimed at providing high-level background to the general issue, the section "Current status" is new & has information on current work, as well as some notes on next steps & open questions
21:05:02 <gwicke> the biggest of those open questions is probably cross-project dependency tracking & change propagation
21:05:51 <robla> gwicke: what are your thoughts about my last comment? https://phabricator.wikimedia.org/T102476#2296335 I think it'd be easier to keep track of the prose for long RFCs (like this one) using our software.
21:06:20 <DanielK_WMDE> so, for the sake of stating the obvious: the general idea is to track dependencies between "things" (identified by URIs or some such) as a DAG, so we know what to purge or re-generate when. at least, that seems to be the core of it.
21:06:21 <gwicke> I know DanielK_WMDE is involved in that area, so I hope that he can tell us a bit more about the situation & needs for wikidata
21:06:52 <DanielK_WMDE> gwicke: yea. so, one important issue is that the artifact we track may be created on the fly, during a page view (GET request)
21:07:19 <DanielK_WMDE> if we track that in an SQL db, that means a master write during a get request, possibly a cross-DC operation.
21:07:25 <gwicke> robla: I'm not opposed to moving the general portion especially to mw.org, lets just make sure we don't end up with multiple copies
21:07:25 <DanielK_WMDE> aaron and ori don't like that...
21:07:42 <matt_flaschen> DanielK_WMDE, there are workarounds for it, as long as it can be queued.
21:07:55 <matt_flaschen> DanielK_WMDE, what is an example of triggering on a GET
21:07:55 <mobrovac> DanielK_WMDE: and the artefact dies with the req or?
21:08:09 <bd808> cross-dc master writes on GET certainly aren't optimal
21:08:25 <DanielK_WMDE> matt_flaschen: yes, we currently use the job queue for that - but the JQ doesn't seem to cope well with the load.
21:08:41 <gwicke> DanielK_WMDE: is this about queries?
21:08:49 <DanielK_WMDE> mobrovac: no, the artifact should be persisted. think parser cache.
21:09:10 <robla> DanielK_WMDE: "aaron and ori don't like that...", I'm assuming you mean AaronSchulz, right?
21:09:30 <DanielK_WMDE> gwicke: no, just tracking which rendering of a page (think parser cache key) uses which bits of which wikidata entity
21:09:36 <DanielK_WMDE> robla: indeed
21:10:12 <jzerebecki> DanielK_WMDE: the job queue doesn't cope well with cycles
21:10:18 <DanielK_WMDE> gwicke: in case of the parser cache, we currently always purge all renderings if any of them needs purging. we haven't found a good mechanism to avoid that
21:10:18 <subbu> DanielK_WMDE, what kind of stuff might get generated during GET reqs? that sounds non-ideal.
21:10:21 <DanielK_WMDE> that's another open issue
21:10:28 <mobrovac> ok, so it's not critical that the write happens during the life of the request, it just needs to be recorded somehow
21:10:46 <robla> which question are we trying to answer here?
21:10:48 <DanielK_WMDE> subbu: rendering in any of the possible user languages. multilingual wikis can't pre-generate all possible renderings on save.
21:10:50 <matt_flaschen> DanielK_WMDE, why does that need to be updated on a GET request? I would think that dependencies only change on POST (either update to Wikidata or update to Lua module or update to Wikipedia article)?
21:11:19 <DanielK_WMDE> matt_flaschen: we are tracking *renderings* of a page. they are generated on demand.
21:11:46 <bd808> "tracking"?
21:12:02 <bd808> like a log event or something else entirely?
21:12:06 <DanielK_WMDE> matt_flaschen: if page Foo uses template X, the wikitext of Foo doesn't depend on X (it does reference X, but that's another issue). the HTML rendering depends on X (and on the wikitext of Foo)
21:12:25 <DanielK_WMDE> bd808: like link tables.
21:12:28 <matt_flaschen> DanielK_WMDE, right, i understand that, but the dependencies in that case only change on POST.
21:12:29 <mobrovac> exactly the issue we have in RESTBase
21:12:34 <matt_flaschen> This part from the RFC seems relevant:
21:12:36 <matt_flaschen> "Our current approach of re-rendering all seven million articles can easily result in large backlogs of template updates. It might be useful to consider pull based or hybrid solutions (where only a timestamp is propagated and polled) as an alternative to pure push."
21:12:37 <robla> #info robla asks for response on T102476#2296335. gwicke answers "I'm not opposed to moving the general portion especially to mw.org, lets just make sure we don't end up with multiple copies"
21:12:37 <stashbot> T102476: RFC: Requirements for change propagation - https://phabricator.wikimedia.org/T102476
21:12:37 <DanielK_WMDE> but also for generated artifacts. possibly stuff that is generated on demand
21:12:42 <mobrovac> in that we care more about renders than wikitext
21:13:09 <matt_flaschen> So maybe not all the HTML renderings should be pre-generated on save, but all the *dependencies* should be tracked on POST, and when to actually re-render is a different question, just as with templates.
21:13:17 <DanielK_WMDE> matt_flaschen: the dependencies of the de rendering of page Foo are unknown until it is actually rendered for de output, and then cached. that happens on demand, during a get request.
21:13:32 <DanielK_WMDE> matt_flaschen: image description pages have conditionals that depend on the user language.
21:13:41 <DanielK_WMDE> not to speak of wikidata, where all of the output depends on the user language
21:14:13 <DanielK_WMDE> matt_flaschen: we can't know the dependencies from the wikitext. not even after resolving templates.
21:14:16 <TimStarling> currently we have a canonical ParserOptions which is used to determine dependencies, and then those dependencies are used to purge all renderings
21:14:25 <gwicke> DanielK_WMDE: I'm trying to understand how this is different from links, templates & media; is the main issue that dependencies differ between languages?
21:14:34 <DanielK_WMDE> gwicke: yes.
21:14:38 <TimStarling> even though renderings may have different dependencies to the canonical one
21:14:44 <DanielK_WMDE> gwicke: which is already broken for links, templates, and media.
21:14:57 <DanielK_WMDE> we don't track dependencies that only occur in non-canonical renderings
21:15:04 <gwicke> yeah, I was just going to say.. language variants are happily messing with that too
21:15:08 <DanielK_WMDE> so we sometimes fail to purge
21:15:50 <DanielK_WMDE> TimStarling: yea, currently, we just ignore the nastiness, because it's not very visible. but with better support for multilingual content, we need a better solution
21:16:15 <TimStarling> there is a proposal in here to migrate all link tables to cassandra, is that correct? T105766
21:16:15 <stashbot> T105766: RFC: Dependency graph storage; sketch: adjacency list in DB - https://phabricator.wikimedia.org/T105766
21:16:24 <DanielK_WMDE> gwicke: i want to point to another open question: usage tracking for 3rd parties. think InstantCommons. and perhaps InstantWikidata in the future.
21:16:59 <gwicke> TimStarling: as I said on the task, it's largely theoretical & premature at this point
21:18:24 <DanielK_WMDE> how about redis?
21:18:35 <gwicke> DanielK_WMDE: we have structured some of the recent change propagation work around URLs, with a view to possibly supporting outside resources
21:18:37 <DanielK_WMDE> i think this raises the question of how reliable our tracking needs to be
21:19:01 <DanielK_WMDE> is it ok to have this transient? or should we be sure that it's persistent?
21:19:02 <gwicke> but we have not tackled any dependency tracking so far
21:19:29 <DanielK_WMDE> gwicke: 3rd party dependency tracking would basically require a pubsub service. but with very high granularity
21:19:37 <DanielK_WMDE> it's not hard to do, but hard to scale
21:19:38 <subbu> DanielK_WMDE, i am confused by your response to matt_flaschen's qn .. " the dependencies of the de rendering of page Foo are unknown until it is actually rendered for de output" .. that seems independent of when a page is rendered .. on a POST or on a GET, right?
21:20:06 <subbu> how do you re-render all affected pages on a POST then?
21:20:15 <subbu> or is that what you mean it is currently broken.
21:20:18 <DanielK_WMDE> subbu: correct. so in some cases, we will only have this information in a GET request - and need to somehow store it
21:20:26 <gwicke> DanielK_WMDE: to me, the other big question is how to structure interfaces for dependency updates, so that they are both usable & efficient
21:20:55 <matt_flaschen> subbu, I think DanielK_WMDE's point is that it's computationally infeasible to determine the dependencies for all languages ahead of time. Since you need to parse it for each language just to determine what the dependencies of e.g. the Spanish language version are.
21:21:12 <matt_flaschen> With e.g. Commons and {{int:}}, etc.
21:21:38 <gwicke> the XKey Varnish work is heading in this direction as well
21:21:50 <DanielK_WMDE> gwicke: well, basically, you need add(x,y), remove(x,y), lget(x), rget(y), lpurge(x), rpurge(y). Plus perhaps a batch interface.
21:22:27 <gwicke> DanielK_WMDE: alternatively, you could post the dependencies each time & let the API figure out diffs
21:22:47 <DanielK_WMDE> matt_flaschen: also, we only need that info if we actually have a rendering cached. if the page was never rendered in hebrew, i don't care what dependencies the hebrew version might have
21:23:19 <DanielK_WMDE> gwicke: yes. that would be part of the batch interface.
21:23:35 <gwicke> DanielK_WMDE: could you describe an example for such language variance?
21:23:45 <gwicke> which kind of content would be pulled in conditionally?
21:24:11 <DanielK_WMDE> gwicke: all renderings of any item page on wikidata. all of it is language dependent.
21:24:26 <DanielK_WMDE> gwicke: license templates on file description pages on commons
21:24:36 <DanielK_WMDE> image descriptions on commons
21:24:44 <DanielK_WMDE> (blame the translate extension)
21:24:59 <matt_flaschen> https://commons.wikimedia.org/wiki/Template:LangSwitch
21:25:05 <DanielK_WMDE> anything that uses {{int}}
21:25:10 <DanielK_WMDE> matt_flaschen: indeed
21:25:29 <matt_flaschen> DanielK_WMDE, for Wikidata, how are the dependencies language-dependent? Doesn't one page always depend on the same thing regardless of language. It's just the actual output would be different, but how is the dependency graph different?
21:25:32 <gwicke> fun ;/
21:25:36 <matt_flaschen> For Q pages?
21:25:50 <gwicke> for item pages, is there a concern beyond CDN purging?
21:25:50 <matt_flaschen> Since Q pages don't have LangSwitch AFAIK
21:26:11 <matt_flaschen> gwicke, it's also the ParserOutput itself, right?
21:26:48 <DanielK_WMDE> matt_flaschen: in case of wikidata, the dependency graph is probably the same, that is true. that's not true for anything that uses {{int}} or <translate> (or variants)
21:26:50 <gwicke> yeah, but afaik that's still keyed on the Q page & it's possible to purge all at once
21:27:19 <gwicke> at first sight, item pages sound like a somewhat simpler case to me
21:27:49 <DanielK_WMDE> matt_flaschen: but it depends on the granularity. If Q1 references Q7, that will be the same for all languages. But the de-ch rendering of Q1 would depend on Q7.label.de, and the en rendering would depend on Q7.label.en
21:28:10 <gwicke> conditional dependencies will always be tricky, and languages are not the only source of complexity there
21:28:13 <TimStarling> gwicke: is work on this underway? which parts of the RFC need comment most urgently?
21:28:13 <DanielK_WMDE> I'd like to have a system that can at least in theory handle this kind of thing.
21:28:21 <gwicke> lua code can transclude things based on the phase of the moon
21:28:37 <matt_flaschen> gwicke +1 on reference
21:28:47 <DanielK_WMDE> heh :)
21:29:14 <DanielK_WMDE> gwicke: they are not conditional. different artifacts depend on different artifacts. the dependency tracking doesn't care how or why.
21:29:35 <gwicke> TimStarling: work is underway on eventbus & changeprop, but for dependency tracking we are mostly trying to better understand the issues at this point
21:29:37 <DanielK_WMDE> if you have a uri for "phase of the moon" and you touch it every day, the dependency management should be able to handle this
21:29:58 <DanielK_WMDE> uri = artifact, here
21:30:00 <gwicke> TimStarling: this discussion is very helpful in that regard
21:30:23 <matt_flaschen> DanielK_WMDE, that could be abstracted, though. You could say Q1[de-ch] depends on Q7.label[de-ch], and Q7.label[de-ch] depends on Q7.actual-label[de-ch] and Q7.actual-label[de].
21:30:26 <robla> gwicke: which question are you hoping to get consensus on?
21:30:55 <gwicke> robla: I wasn't hoping for any decisions in this meeting
21:30:59 <matt_flaschen> With label depending on which if any actual-label are available.
21:31:09 <TimStarling> robla: I think this is more requirements gathering
21:31:19 <DanielK_WMDE> matt_flaschen: yes, exactly. in case of wikibase with nice structured data, it would probably be feasible to do this in advance. for wikitext, it isn't - so we don't know until we render on demand.
21:32:20 <DanielK_WMDE> TimStarling, robla: since the entire rfc is about gathering requirements, it's kind of meta... how could it ever be "approved" or "implemented"? what does that mean?
21:32:25 * robla is trying to figure out if/when he should be capturing any of this with #info commands, and can't figure out how to make that useful
21:33:10 <DanielK_WMDE> i have tried to formulate the requirements we have for wikidata. i can try and put that into #info tags, but i try not to do that too often with my own comments...
21:33:23 <robla> DanielK_WMDE: I think that's why I'm hoping to move it to MediaWiki. The finished RFC can be a clear description of the options.
21:33:42 <DanielK_WMDE> robla: right, but the rfc process seems useful for gathering requirements
21:33:47 <TimStarling> #info significant use: track dependencies when rendering pages with {{int}} and <translate> so that they can be purged when conditional dependencies change
21:33:53 <matt_flaschen> The biggest open question I see is how to handle the case where the dependency graph itself varies depending on user language.
21:34:07 <matt_flaschen> Sorry, didn't see Tim's #info before saying that.
21:34:14 <robla> DanielK_WMDE: gathering them in your head?
21:34:34 <gwicke> I'm not sure that this is a problem we can or should try to solve in general
21:34:47 <DanielK_WMDE> matt_flaschen: just thing of renderings as natural parts of the dependency graph. that should Just Work (tm).
21:34:53 <DanielK_WMDE> *think
21:35:17 <gwicke> unpredictable dependencies are always going to be a pain to handle, and even if we fully handled them, it would be fairly expensive to do so
21:35:18 <DanielK_WMDE> one consequence is actually: wikitext *never* depends on wikitext.
21:35:21 <matt_flaschen> DanielK_WMDE, yeah, I'm trying to get my mind around that perspective.
21:35:24 <DanielK_WMDE> you never purge wikitext
21:35:38 <TimStarling> I think MW's link tables need to be moved out of the core DBs fairly urgently, I've thought so for years
21:35:46 <DanielK_WMDE> what needs purging is *generated* content, so we need to track what it depends on
21:36:29 <TimStarling> I guess it is fine to start with a new thing like shadow namespaces, but the existing LinksUpdate system is not so awesome that I would want to see it preserved
21:36:42 <DanielK_WMDE> TimStarling: i think there are two use cases for those: dependency tracking for purging (which should be moved and improved), and tracking references (in whatlinkshere, related changes, etc) for maintenance (which we should keep, i think)
21:36:45 <matt_flaschen> TimStarling, could you explain about moving the links tables out?
21:36:49 <gwicke> one work-around for the unpredictable dependency problem is to eliminate the direct dependency by a) making things modular, or b) rendering things dynamically on the client
21:37:18 <DanielK_WMDE> TimStarling: i think we need to consider these two things separately. with more dynamic content, we can no longer treat them the same
21:37:33 <TimStarling> matt_flaschen: the *links tables are large and heavily-updated, it's not scalable
21:37:47 <DanielK_WMDE> #info what needs purging is *generated* content, so we need to track what it depends on
21:38:51 <TimStarling> DanielK_WMDE: yes, they can be split up
21:38:56 <DanielK_WMDE> gwicke: but in order to do that, i need to know the dependencies, right? the client somehow needs to know what resources to load.
21:39:11 <TimStarling> maybe we can stop putting entries in whatlinkshere when someone uses #ifexist
21:39:14 <DanielK_WMDE> TimStarling: i think that is actually the cure issue this rfc is about
21:39:18 <DanielK_WMDE> *core
21:39:24 <gwicke> DanielK_WMDE: yes, but the response of each part can vary independently, which avoids a lot of dependencies to the composite
21:39:54 <DanielK_WMDE> it avoids materializing the composite
21:40:07 <DanielK_WMDE> the dependencies that need tracking are the same
21:40:09 <Scott_WUaS> (DanielK_WMDE, matt_flaschen, gwicke: re https://commons.wikimedia.org/wiki/Template:LangSwitch - long term question - in what ways could this be extended to include translation from WMF Content Translation as well as later with Wiktionary - and in combination conceptually with Google Translate?)
21:40:19 <TimStarling> well, the RFC is requirements gathering, so I am stating my support for a use case which is on the list of things this could solve
21:40:26 <gwicke> DanielK_WMDE: no, you no longer need to track the dependency to the composite
21:40:33 <gwicke> instead, you can just purge one component
21:40:49 <gwicke> & have it update wherever that component is pulled in dynamically
21:41:10 <gwicke> the same way CSS can be updated without re-rendering everything, or tracking dependencies
21:41:33 <DanielK_WMDE> Scott_WUaS: this is one small step towards allowing people to view pages in their favorite language without logging in. but it's fairly technical/low level, not directly related to translation.
21:41:39 <Scott_WUaS> (leading eventual very sophisticated machine translation with artificial intelligence re natural language processing in extending WMF ~300 languages to all languages even?) Thanks!
21:41:52 <gwicke> of course, this only works with reasonable efficiency for a limited number of components
21:41:54 <Scott_WUaS> DanielK_WMDE: Thanks
21:42:37 <DanielK_WMDE> gwicke: the dependencies still have to be known somewhere. but perhaps they can just live on the client
21:43:23 <gwicke> DanielK_WMDE: the difference is that you only need to store one edge (the reference), and don't need to track the backwards edge
21:43:41 <DanielK_WMDE> ah, yea... i guess you are right
21:43:41 <gwicke> it's an example of polling
21:43:50 <AaronSchulz> 2:25 PM <gwicke> for item pages, is there a concern beyond CDN purging?
21:43:55 <AaronSchulz> what's the answer to that?
21:44:20 <DanielK_WMDE> gwicke: anyway - do you agree that the main issue is tracking dynamically generated artifacts, in order to determine when they need to be re-generated? and the proposal is to no longer try to do this based on link tables?
21:44:21 * AaronSchulz is curious about xkey too
21:45:07 <AaronSchulz> if per language renderings were bucketed in cdn, they could be purged via a single URL purge, without tracking each language variant, right?
21:45:08 <DanielK_WMDE> AaronSchulz, gwicke: about item pages and cdn purging: yes. that's why anons can only view wikidata in english.
21:45:33 <gwicke> DanielK_WMDE: that is the general problem, yes -- but I think we'll need to work on this from both sides: a) avoiding some dependencies by making things more modular, and b) improving our infrastructure for tracking dependencies
21:45:46 <AaronSchulz> avoiding the need to track new renderings on GET
21:46:33 <gwicke> AaronSchulz: for item pages, even a Vary might be able to do it right now
21:46:40 <TimStarling> "anons can only view wikidata in english" -- what a stupid problem for a project like ours to have
21:46:45 <matt_flaschen> AaronSchulz, but how do you know which pages to purge, if it varies by language what the dependencies are?
21:46:59 <TimStarling> embarrassing, we should fix that
21:47:03 <DanielK_WMDE> AaronSchulz: related to the cdn issue: https://phabricator.wikimedia.org/T114662
21:47:10 <AaronSchulz> gwicke: yeah, if we validate the language and 500 on bogus ones, that limits the hash-chain to only a few 100s of possibilities
21:47:10 <matt_flaschen> Also, regarding the CDN question, parser cache itself also needs to be purged.
21:47:36 <gwicke> AaronSchulz: yeah, only a teeny bit of fragmentation ;)
21:47:41 <AaronSchulz> once you hit MW, there are lots of ways to handle validating variant caches of a single source
21:48:10 <AaronSchulz> (e.g. checking page_touched, some other field, WAN cache check keys,...)
21:48:21 <DanielK_WMDE> AaronSchulz: no, we would still need that info, so we can purge the parser cache
21:48:27 <matt_flaschen> AaronSchulz, never mind, "varies by language what the dependencies are" doesn't apply to Wikidata Q pages.
21:48:59 <AaronSchulz> DanielK_WMDE: why?
21:49:13 <DanielK_WMDE> AaronSchulz: also, we want selective purging. if only the french rendering depends on resource X, and X changes, it would be nice if we could purge only the french version.
21:49:19 <DanielK_WMDE> not all of them
21:49:36 <AaronSchulz> "would be nice" or "actually is worth it"?
21:49:46 <DanielK_WMDE> AaronSchulz: you'd need multi-dimensional buckets. not just a bucket for "all renderings of Foo", but also a bucket for "all renderings that depend on X"
21:49:56 <DanielK_WMDE> both on the CDN, and for the parser cache
21:50:12 <DanielK_WMDE> (that's actually closely related to the recently rejected PSR6 proposal)
21:50:39 <DanielK_WMDE> https://phabricator.wikimedia.org/T130528
21:50:39 <AaronSchulz> what is X? like magic words and templates that vary on language?
21:50:59 <DanielK_WMDE> AaronSchulz: a template, or a wikidata item. or the phase of the moon. whatever the rendering depends on
21:51:35 <AaronSchulz> is there a list of concrete use cases for wikidata?
21:51:46 <DanielK_WMDE> AaronSchulz: the fact that only the french rendering depends on X would be due to {{langswitch}} or <translate> or something, yes
21:51:59 <AaronSchulz> for MW core, I'd rather discourage/deprecate stuff like that (use lower TTLs where it is needed)
21:52:25 <DanielK_WMDE> AaronSchulz: the one that has the most need for language dependent tracking is file description pages on commons.
21:52:39 <DanielK_WMDE> that's why i implemented on-render usage tracking for entities.
21:52:54 <matt_flaschen> DanielK_WMDE, couldn't that be solved (the same way it's non-dynamic for Q pages like discussed above) once Commons uses Wikibase?
21:53:03 <matt_flaschen> To avoid the "actual dependency graph varies by language" issue?
21:53:08 <matt_flaschen> At least for file description pages.
21:53:11 <gwicke> there are a lot of trade-offs here between cost of dependency tracking, cost of purges, accuracy of tracking non-deterministic dependencies etc
21:53:13 <DanielK_WMDE> AaronSchulz: well, if i understand gwicke's vision correctly, he would like to have a *lot* more of this kind of thing.
21:53:21 <AaronSchulz> it adds a lot of complexity to go from just varying on rendering of entities/pages to using different helper entities/pages to render an asset/entity
21:53:48 <DanielK_WMDE> matt_flaschen: once commons *only* uses wikibase, and doesn't generate wikitext from it - then yes, to an extent.
21:54:00 <DanielK_WMDE> i expect that transition period to take about 5 to 10 years.
21:54:15 <gwicke> for wikipedia, we kind of track dependencies per language already by virtue of having one project per language
21:54:26 <AaronSchulz> heh, another decade anniversary cake ;)
21:55:09 <DanielK_WMDE> AaronSchulz: this rfc is about a generic mechanism to allow this kind of tracking, not just for wikidata, but for all kinds of content. this allows us to re-use rendered snippets/widgets, and purge them when appropriate
21:55:17 <TimStarling> you know {{int:}} varying by user language was an accident
21:55:31 <DanielK_WMDE> gwicke: indeed. the language issue arises for multilingual pages.
21:55:38 <DanielK_WMDE> TimStarling: yea :D
21:55:43 <TimStarling> if I hadn't made that implementation error then commons would have been stuck with JS hacks to hide languages other than the current one
21:55:51 <TimStarling> which is what they did to start with
21:55:52 <gwicke> or CSS
21:56:16 <DanielK_WMDE> they would just have extended <translate> to cover this
21:57:16 <gwicke> my personal inclination is still to reduce the reasons for such variance in the content
21:57:45 <gwicke> but it's clear that there are so many sources and use cases for this already that it won't be possible to avoid it altogether
21:58:11 <AaronSchulz> right
21:58:36 <TimStarling> any action items or #info for the notes before we wrap up?
21:59:03 <gwicke> I want to thank you all for participating
21:59:16 <Scott_WUaS> (Glad for this focus on language and translation)
21:59:25 <DanielK_WMDE> I want to thank you for listening to my rants ;)
21:59:33 <TimStarling> next week we don't have a particular RFC scheduled, sowea triage session
21:59:33 <mobrovac> one action would seem to be to clarify the anticipated concrete dependency relations
21:59:42 <mobrovac> lang variants seem to be the most painful point so far
21:59:44 <gwicke> this kind of discussion is a perfect reminder of all the tricky issues that we are trying so hard to forget
21:59:58 * robla looks up link to next week
22:00:00 <TimStarling> s/sowea/so we are planning on doing a
22:00:29 <gwicke> mobrovac: yes, along with update volume & suitable APIs
22:00:30 <robla> next week: https://phabricator.wikimedia.org/E187
22:00:43 <robla> as TimStarling said, it's a triage
22:00:52 <TimStarling> mobrovac: that is an action item for gwicke?
22:01:21 <mobrovac> i'd say this is an action item for all of us that want to get the most out of this
22:01:34 <gwicke> yeah, I think we'll look into this further as a team
22:01:35 <mobrovac> the clearer the problem, the simpler the solution as always
22:01:38 <TimStarling> #action update the RFC to clarify the anticipated concrete dependency relations
22:02:18 <gwicke> we might also want to split out the dependency tracking part
22:02:24 <TimStarling> I think it's usually best if an action item is assigned to a single person, since shared responsibility is equivalent to no responsibility
22:02:41 <TimStarling> you know the bystander effect
22:03:06 <gwicke> TimStarling: I can be that person, but mobrovac is leading changeprop development, so is heavily involved
22:03:16 <gwicke> along with pchelolo
22:03:22 <TimStarling> ok
22:03:45 <TimStarling> #endmeeting
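
The interface named at 21:21:50 (add, remove, lget, rget, lpurge, rpurge) and the batch variant suggested at 21:22:27 (post the full dependency set and let the API work out the diff) could look roughly like the Python sketch below. The log does not define the semantics, so this assumes that add(x, y) means "artifact x depends on resource y", that lget(x) returns what x depends on, and that rget(y) returns what depends on y. The class name, the replace() method, and the in-memory storage are hypothetical and do not reflect any existing service.

```python
from collections import defaultdict


class DependencyStore:
    """Toy in-memory edge store; assumes add(x, y) means 'x depends on y'."""

    def __init__(self):
        self._by_left = defaultdict(set)   # x -> resources that x depends on
        self._by_right = defaultdict(set)  # y -> artifacts that depend on y

    def add(self, x, y):
        self._by_left[x].add(y)
        self._by_right[y].add(x)

    def remove(self, x, y):
        self._by_left[x].discard(y)
        self._by_right[y].discard(x)

    def lget(self, x):
        """Everything x depends on."""
        return set(self._by_left.get(x, ()))

    def rget(self, y):
        """Everything that depends on y."""
        return set(self._by_right.get(y, ()))

    def lpurge(self, x):
        """Forget all dependencies of x, e.g. when its cached rendering is dropped."""
        for y in self._by_left.pop(x, set()):
            self._by_right[y].discard(x)

    def rpurge(self, y):
        """Forget all edges pointing at y, e.g. when the resource is deleted."""
        for x in self._by_right.pop(y, set()):
            self._by_left[x].discard(y)

    def replace(self, x, new_deps):
        """Batch form: post the full dependency set of x; the store applies the diff."""
        new_deps = set(new_deps)
        old_deps = self.lget(x)
        added, removed = new_deps - old_deps, old_deps - new_deps
        for y in added:
            self.add(x, y)
        for y in removed:
            self.remove(x, y)
        return added, removed


store = DependencyStore()
# Hypothetical keys, loosely following the Q1/Q7 example from the discussion.
store.replace("Q1[de-ch]", {"Q1", "Q7.label.de"})
added, removed = store.replace("Q1[de-ch]", {"Q1", "Q7.label.de-ch"})
```

Keeping both directions indexed is what makes rget cheap when a resource changes; the pull-based alternative raised at 21:43:23 would store only the forward edge and skip the reverse index.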

daniel renamed this event from RFC Meeting: RFC: Requirements for change propagation (2016-05-18, #wikimedia-office) to ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office). Nov 21 2016, 6:11 PM
daniel changed the host of this event from RobLa-WMF to daniel.
daniel updated the event description.
daniel renamed this event from ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office) to RFC Meeting: RFC: Requirements for change propagation (2016-05-18, #wikimedia-office).