Paste P4819

Redacted log of the RFC IRC meeting on 2017-01-25

Authored by daniel on Jan 26 2017, 3:50 PM.
Referenced Files
F5363139: Redacted log of the RFC IRC meeting on 2017-01-25
Jan 26 2017, 3:50 PM
<Marybelle> My goal for this meeting is to figure out if there are any immediate action items we can take to eliminate edits such as https://en.wikipedia.org/w/index.php?title=Talk:Thomas_W._O%27Brien&diff=757480527&oldid=610994123
<DanielK_WMDE_> ...don't store meta-data in wikitext, let's make MCR happen :)
<legoktm> we could use templatelinks for cache invalidation right?
<Marybelle> You could think of the default category sort key as a partial page transclusion.
<Marybelle> We should acknowledge that we already have stale data and volatile wikitext.
<DanielK_WMDE_> Marybelle: 1) page-props are not a stable interface. Pages that rely on specific page-props may break unexpectedly. a more specific interface would give more control, and prevent access to nasty things.
<DanielK_WMDE_> Marybelle: 2) we should look at concrete use cases instead of insisting on a general solution, if that general solution is problematic
<Krinkle> I'll mention two things: Firstly, it seems to me that accessing a page's own pageprops has a less clear usecase so far compared to accessing other page's props. Especially considering the ones set by the wikitext itself (makes it rather fragile). E.g. a template that varies based on whether the page is a disambiguation page.
<Krinkle> Secondly, we do have a few foreign-page magic words already: some work for both the current page and other pages, others only for other pages. Examples include {{PAGESINCATEGORY:}} and {{PAGESIZE:}}. However, it seems PAGESINCATEGORY, for example, has no cache invalidation strategy (no link table entry).
<jackmcbarn> Krinkle: i 100% agree. we shouldn't open the can of worms of letting pages access their own props
<Marybelle> You'd just treat the page as a regular transclusion.
<DanielK_WMDE_> legoktm: oh, not a specific page prop, just all of it? and any change to that page will then cause the talk page to be purged?...
<DanielK_WMDE_> would work, but seems wasteful, since *any* edit would trigger the purge
<DanielK_WMDE_> even though most edits don't change the respective prop
<jackmcbarn> we already have plenty of cases where we do that
<Krinkle> That seems acceptable.
<gwicke> wasteful invalidation is a significant concern
<gwicke> as well as the complexity of tracking even more dependencies
<Krinkle> Given how internal many page properties are I'd also say that whatever solution we come up with should not allow arbitrary access to them, but rather be a stable interface with just a subset of values we can support and have solid use cases. So that they don't depend on the current implementation.
<Krinkle> e.g. {{#pagemeta:Sandbox|pagesize}} and {{#pagemeta:Sandbox|disambig}} or some such.
<legoktm> Krinkle: so whitelist the pageprops we allow?
<legoktm> that seems pretty reasonable
<DanielK_WMDE_> legoktm: whitelist would make it a lot more sane.
<Marybelle> Whitelist would accompany a generic parser function?
<Krinkle> Marybelle: That's an idea yeah :)
<jackmcbarn> i feel like whitelisting isn't really very helpful, since most of the props that Marybelle would want to be whitelisted are ones i have concerns with exposing
<Marybelle> I'm not sure whitelisting solves much.
<DanielK_WMDE_> jackmcbarn: what pre-parse page-props are there? and are we talking about things that are actually in the page_props table? Because page size isn't there, is it?
<Krinkle> [...] in theory someone could vary __DISAMBIG__ or DEFAULTSORT based on the time.
<Marybelle> DanielK_WMDE_: You can use magic words with parser functions to create instability in any *links table.
<Marybelle> [[Category:{{CURRENTTIMESTAMP}}]]
<legoktm> {{DEFAULTSORT:{{CURRENTTIMESTAMP}}}} but links tables... yeah
<Marybelle> DanielK_WMDE_: It's very frustrating to store page properties and then not be able to use them.
<DanielK_WMDE_> there are a lot of frustrating things that nonetheless are as they are for a reason :)
<Marybelle> DanielK_WMDE_: One option is to put sort keys into Wikidata.
<DanielK_WMDE_> i don't think that makes any sense
<DanielK_WMDE_> it's specific to the *page*, not the thing described by the page
<Krinkle> I think there's a genuine use case for meta data that does not belong in Wikidata.
<bawolff> i dont support putting data that functionally depends on a page in wikidata
<bawolff> I dont see why purging is an issue, we already have lots of experience dealing with that for templates
<Marybelle> bawolff: Would you re-use templatelinks or make a new table?
<bawolff> i like templatelinks, less complexity to reuse, but no strong opinions
<DanielK_WMDE_> is there a compelling use case for such meta-data in wikitext?
<Marybelle> Why can't I use that page image in an article?
<Marybelle> {{#getpageimage:Barack Obama}}
<DanielK_WMDE_> Marybelle: because that would be circular, if you do it on the page itself.
<Marybelle> gwicke: Same as templates?
<Marybelle> We already have all these problems.
<gwicke> templates can currently be processed in parallel, without circular dependencies
<gwicke> changing that would have major performance implications
<gwicke> any data that can potentially be added by a template cannot be accessed by the same page during parse without introducing a circular dependency
<legoktm> the preprocessor blocks recursive templates
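The recursion guard legoktm mentions can be illustrated with a minimal sketch of an expansion-stack check: while expanding a page, keep the chain of pages currently being expanded, and refuse to re-enter any page already on it. This is not MediaWiki's preprocessor; the toy `TEMPLATES` store, the simplified `{{Name}}` syntax, the `expand` function, and the `[loop: ...]` marker are all hypothetical.

```python
import re

# Hypothetical toy store: page title -> wikitext body (only plain {{Name}} calls).
TEMPLATES = {
    "A": "a({{B}})",
    "B": "b({{C}})",
    "C": "c",
    "Loop1": "x{{Loop2}}",
    "Loop2": "y{{Loop1}}",
}

TRANSCLUSION = re.compile(r"\{\{([^{}|]+)\}\}")


def expand(title: str, stack: tuple[str, ...] = ()) -> str:
    """Expand {{Name}} transclusions, refusing to re-enter a page already on the stack."""
    if title in stack:
        # Cycle detected: emit a marker instead of recursing forever.
        return f"[loop: {title}]"
    body = TEMPLATES.get(title, f"[missing: {title}]")
    # Each nested expansion sees the current page added to the stack.
    return TRANSCLUSION.sub(lambda m: expand(m.group(1).strip(), stack + (title,)), body)


if __name__ == "__main__":
    print(expand("A"))      # a(b(c))
    print(expand("Loop1"))  # xy[loop: Loop1]
```

A guard like this only catches cycles that appear during a single expansion. As gwicke notes, a page reading data that its own parse produces is a different kind of cycle, which is why the discussion below forbids access to a page's own props outright.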
<Marybelle> Computer science has grappled with circular dependencies previously, right?
<gwicke> nope, and the solution is predictable
<DanielK_WMDE_> gwicke: namely? Don't Do That Then?
<gwicke> that's the normal solution, yes
<gwicke> or try to break the cycle
<gwicke> trying to break the cycle is a whole other can of worms, of course
<Marybelle> gwicke: When I look at the various ways that wikitext is already volatile, I have difficulty caring.
<gwicke> in any case, I don't see us sacrificing parallelism for a feature like this
<DanielK_WMDE_> Consider {{#pageprop:name|page}} with 1) a whitelist of props 2) no access to the page's own props 3) an entry in templatelinks.
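The three conditions in this strawman can be sketched as a single lookup function. This is a minimal sketch under stated assumptions: the `ALLOWED_PROPS` whitelist contents, the `ParseContext` class, the in-memory `page_props` dict, and the `pageprop` function are hypothetical stand-ins rather than MediaWiki APIs. The prop names are modeled on ones mentioned in the meeting (sort key, disambiguation status, page image).

```python
from dataclasses import dataclass, field

# Hypothetical whitelist; which props belong here was still an open question.
ALLOWED_PROPS = frozenset({"defaultsort", "disambiguation", "page_image"})


@dataclass
class ParseContext:
    """State for one parse: the page being rendered and the dependencies it records."""
    title: str
    templatelinks: set[str] = field(default_factory=set)


def pageprop(ctx: ParseContext, page_props: dict[str, dict[str, str]],
             name: str, page: str) -> str:
    """Sketch of {{#pageprop:name|page}} with the three strawman restrictions."""
    if name not in ALLOWED_PROPS:
        return ""  # 1) only whitelisted props are exposed
    if page == ctx.title:
        return ""  # 2) no access to the page's own props (avoids circularity)
    # 3) record a dependency (as templatelinks would) so edits to `page` purge this one.
    # Recorded even if the page does not exist yet, similar to #ifexist.
    ctx.templatelinks.add(page)
    return page_props.get(page, {}).get(name, "")


if __name__ == "__main__":
    props = {"Sandbox": {"defaultsort": "Sandbox, The", "internal_flag": "1"}}
    ctx = ParseContext("Talk:Sandbox")
    print(pageprop(ctx, props, "defaultsort", "Sandbox"))  # Sandbox, The
    print(ctx.templatelinks)                               # {'Sandbox'}
```

Recording the dependency in the same place templatelinks entries come from is what makes this "basically equivalent to #ifexist": invalidation rides on existing machinery, at the cost of purging on every edit to the target.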
* bawolff likes
<Marybelle> DanielK_WMDE_: Fine with me.
<DanielK_WMDE_> ftr, i'm kind of ok with the strawman, though i'm worried that it will cause a lot of pointless purging, if used pervasively.
* bawolff sees this as basically equivalent to #ifexist
<DanielK_WMDE_> bawolff: yes, it seems to be pretty much the same to me, too.
<Marybelle> DanielK_WMDE_: For sort keys, you're talking about millions of uses.
<Marybelle> Since every talk page will presumably want access to the subject-space's sort key.
<DanielK_WMDE_> Marybelle: so every edit to every page will then purge two pages instead of one, effectively doubling the rendering load. ugh...
<gwicke> we currently re-render about 400 pages per second
<DanielK_WMDE_> Marybelle: at an educated guess, using templatelinks for purging, the sortkey use case, if used on all talk pages, would add 10 to 20 percent to the rendering load. maybe more.
<gwicke> we will also need better means for tracking fine-grained dependencies
<gwicke> all those issues are not specific to this instance
<gwicke> but they aren't simple ones, and it will take some time
<Marybelle> We could make another links table.
<legoktm> new links table if we're going to do something weird
<DanielK_WMDE_> ok, new link table.
<DanielK_WMDE_> been there, done that, broke the site...
<gwicke> but the proposed scheme would purge on each edit
<gwicke> that's the "fine grained" part
<gwicke> which is missing
<DanielK_WMDE_> if we go for a separate link table, it would of course contain the propname
<DanielK_WMDE_> and only purge when that prop changes
<DanielK_WMDE_> like wbc_entity_usage
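The contrast between templatelinks-style purging (any edit to the target purges every dependent) and a separate link table that also stores the prop name can be sketched as follows. The `PropUsage` class and its methods are hypothetical; `wbc_entity_usage` is only the analogy raised above, and this sketch does not reproduce its schema.

```python
from collections import defaultdict


class PropUsage:
    """Hypothetical link table: (target page, prop name) -> pages that read that prop."""

    def __init__(self) -> None:
        self._usage: dict[tuple[str, str], set[str]] = defaultdict(set)

    def record(self, user: str, target: str, prop: str) -> None:
        """Called at parse time when `user` reads `prop` of `target`."""
        self._usage[(target, prop)].add(user)

    def coarse_purge(self, target: str) -> set[str]:
        """Coarse: any edit to `target` purges every page that read any of its props."""
        return {u for (t, _), users in self._usage.items() if t == target for u in users}

    def fine_purge(self, target: str, old: dict[str, str], new: dict[str, str]) -> set[str]:
        """Fine-grained: purge only pages that read a prop whose value actually changed."""
        changed = {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}
        return {u for p in changed for u in self._usage.get((target, p), set())}


if __name__ == "__main__":
    usage = PropUsage()
    usage.record("Talk:Sandbox", "Sandbox", "defaultsort")
    usage.record("Portal:Sandboxes", "Sandbox", "page_image")
    old = {"defaultsort": "Sandbox", "page_image": "Box.jpg"}
    new = {**old, "defaultsort": "Sandbox, The"}
    print(sorted(usage.coarse_purge("Sandbox")))      # ['Portal:Sandboxes', 'Talk:Sandbox']
    print(usage.fine_purge("Sandbox", old, old))      # set()
    print(usage.fine_purge("Sandbox", old, new))      # {'Talk:Sandbox'}
```

The fine-grained variant needs the old and new prop values at save time, which is the extra bookkeeping gwicke refers to as the missing "fine grained" part. In exchange, the typical edit that changes no props purges nothing.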
<gwicke> I would honestly recommend to take a second look at the actual use cases, and see if they absolutely need general access to random page properties
<Marybelle> gwicke: There are three use-cases: page image, disambiguation status, and category sort key.
<Marybelle> We could also just change MediaWiki behavior.
<Marybelle> So that Talk pages sort under the subject-space sort key all the time.
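Changing MediaWiki so that a talk page always sorts under its subject page's sort key would need no new parser function. A minimal sketch follows, assuming MediaWiki's namespace convention that each talk namespace has an odd ID one above its subject namespace (e.g. Talk = 1, main = 0; User talk = 3, User = 2). The `subject_namespace` and `talk_sort_key` functions and the `defaultsort` dict are hypothetical. The fallback to the bare title text is a simplification.

```python
def subject_namespace(ns: int) -> int:
    """Map a talk namespace (odd ID) to its subject namespace (even ID)."""
    if ns < 0:
        return ns  # virtual namespaces (e.g. Special, Media) have no talk pages
    return ns & ~1  # clear the lowest bit: 1 -> 0, 3 -> 2; subject IDs are unchanged


def talk_sort_key(ns: int, text: str, defaultsort: dict[tuple[int, str], str]) -> str:
    """Sort key under the proposal: use the subject page's DEFAULTSORT, else the title text."""
    return defaultsort.get((subject_namespace(ns), text), text)


if __name__ == "__main__":
    keys = {(0, "Example Person"): "Person, Example"}  # hypothetical DEFAULTSORT value
    print(talk_sort_key(1, "Example Person", keys))  # Person, Example
    print(talk_sort_key(3, "Alice", keys))           # Alice
```

In this sketch a talk page's own DEFAULTSORT is ignored entirely, which is one reading of "all the time". As noted below, it does nothing for page images or disambiguation status.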
<Marybelle> That wouldn't solve the other use-cases.
<Marybelle> Like page images.
<DanielK_WMDE_> Marybelle: i think the "sort talk pages under the subject page's sort key" actually has merit.
<Marybelle> And leave page images and disambiguation status for a different day?
<gwicke> page image and disambig status seem likely to become separate metadata
<legoktm> Can we set out requirements of what any solution has to do? Like prevent against recursion, have proper cache invalidation via the job queue, etc.
<legoktm> 1) Prevent against recursion 2) Have proper invalidation via the job queue/whatever method 3) Only allow whitelisted page properties
<legoktm> I think a new *links table should meet all of those
<legoktm> DanielK_WMDE_: what's the high cost exactly?
<DanielK_WMDE_> legoktm: developer time, lines of code to maintain, dba time, storage, i/o. not *extremely* high. but if the use case isn't very compelling, probably not worth it.
<DanielK_WMDE_> legoktm: with a generic dependency tracking system, the cost would be much lower
<bawolff> All features have cost, i wouldnt call this proposal excessively high
<DanielK_WMDE_> bawolff: i'm personally undecided on that question. it just may be worth it. or not.
<legoktm> #agree minimum requirements are 1) Prevent against recursion 2) Have proper invalidation via the job queue/whatever method 3) Only allow whitelisted page properties