
Add wikitext grammar for embedding properties from other pages
Open, Needs Triage · Public

Event Timeline

MZMcBride created this task. Jan 6 2017, 3:48 AM
Restricted Application added a subscriber: Aklapper. Jan 6 2017, 3:48 AM

I'd like to meet to discuss the options listed on the wiki page and any other ideas that would resolve the linked tasks.

TTO added a subscriber: TTO. Jan 7 2017, 6:19 AM

According to https://lists.wikimedia.org/pipermail/wikitech-l/2017-January/087406.html, this task is up for discussion at the next IRC meeting. I guess that would be Wednesday, January 25, 2017 (E464)?

daniel added a subscriber: daniel. Jan 24 2017, 3:38 PM

@MZMcBride yes, indeed. I was traveling and forgot to ping you about this, sorry. Will you be available for the IRC meeting at 2pm PST on January 25?

> @MZMcBride yes, indeed. I was traveling and forgot to ping you about this, sorry. Will you be available for the IRC meeting at 2pm PST on January 25?

No worries. Yep, I'm available then; I have the meeting on my calendar. I pinged @Jackmcbarn and @Legoktm about attending as well.

daniel added a comment. Edited Jan 25 2017, 8:06 PM

It seems to me that the special case of a wiki page accessing its own page-props deserves another look. The problem is that page props come from the ParserOutput object: they are generated during and after parsing. That means that some page-props may already be there when some construct in wikitext asks for them, while others may not exist yet. Which properties already exist and which don't depends on many things, including implementation details that may change, cache state, and whether we are using Parsoid.

Page props are a result of parsing. Because of this, I see no good way for a page to access its own page props during parsing. The only way around this would be to access information that is stored in a slot different from the wikitext, once we have T107595: [RFC] Multi-Content Revisions.

As to accessing other pages' page-props: if we don't care about the information going stale, this should be trivial. If we do care, we need to track which page uses which property of which page, and then purge the pages whenever a property they use changes. That would fit into the scope of T102476: RFC: Requirements for change propagation. (Ab)using templatelinks would be an option, though I think we should rather not.
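
The bookkeeping described above (which page uses which property of which page) could look roughly like the following sketch. It is an in-memory illustration only: `PropUsageTracker` and its methods are hypothetical names, not MediaWiki APIs, the page titles and values are made up, and a real implementation would sit on a database table and the job queue.

```python
from collections import defaultdict


class PropUsageTracker:
    """Hypothetical in-memory stand-in for a (target page, property) usage table."""

    def __init__(self):
        # (target_page, prop_name) -> set of pages that read that property
        self._users = defaultdict(set)

    def record_usage(self, using_page, target_page, prop_name):
        """Remember that using_page read prop_name of target_page while parsing."""
        self._users[(target_page, prop_name)].add(using_page)

    def pages_to_purge(self, target_page, old_props, new_props):
        """Return pages whose output may be stale after target_page was re-parsed.

        Only properties whose value changed (including being added or
        removed) contribute to the result.
        """
        changed = {
            name
            for name in old_props.keys() | new_props.keys()
            if old_props.get(name) != new_props.get(name)
        }
        stale = set()
        for name in changed:
            stale |= self._users.get((target_page, name), set())
        return stale


tracker = PropUsageTracker()
tracker.record_usage("Talk:Example", "Example", "defaultsort")
tracker.record_usage("Portal:Foo", "Example", "page_image")

# An edit that leaves defaultsort alone but changes the page image.
old = {"defaultsort": "Example", "page_image": "A.jpg"}
new = {"defaultsort": "Example", "page_image": "B.jpg"}
print(sorted(tracker.pages_to_purge("Example", old, new)))  # ['Portal:Foo']
```

In this sketch the talk page is left alone because the property it reads did not change, which is the behaviour the "if we do care" branch asks for.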

GWicke added a subscriber: GWicke. Edited Jan 25 2017, 10:27 PM

Some issues:

  • The parser processing model currently allows for parallelism. This means that transclusions cannot access properties that are only discovered / built up during parsing. Some of those properties are added by templates, creating a circular dependency.
  • The issue of change propagation is only alluded to in the RFC. I would expect a rough proposal of how this would work, as well as an estimate of how many additional updates this would trigger.
  • Page properties are not defined as a stable API, so wikitext referencing them would break on any change to internal data structures.

MariaDB [enwiki_p]> select distinct pp_propname from page_props;
+------------------------------+
| pp_propname                  |
+------------------------------+
| defaultsort                  |
| disambiguation               |
| displaytitle                 |
| forcetoc                     |
| graph_specs                  |
| hiddencat                    |
| index                        |
| jsonconfig_getdata           |
| kartographer                 |
| kartographer_links           |
| newsectionlink               |
| nocontentconvert             |
| noeditsection                |
| noexternallanglinks          |
| nogallery                    |
| noindex                      |
| nonewsectionlink             |
| notitleconvert               |
| notoc                        |
| page_image                   |
| page_image_free              |
| page_top_level_section_count |
| score                        |
| staticredirect               |
| templatedata                 |
| wikibase-badge-Q17437796     |
| wikibase-badge-Q17437798     |
| wikibase-badge-Q17506997     |
| wikibase-badge-Q17580674     |
| wikibase-badge-Q20748091     |
| wikibase-badge-Q20748092     |
| wikibase-badge-Q20748093     |
| wikibase-badge-Q20748094     |
| wikibase_item                |
| wpb_banner                   |
| wpb_banner_focus_x           |
| wpb_banner_focus_y           |
+------------------------------+

As an alternative to relying on templatelinks, a more specialized dependency tracking mechanism could be implemented, similar to the usage tracking mechanism for Wikibase: https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/docs/usagetracking.wiki

daniel added a comment. Edited Jan 26 2017, 2:45 PM

This was discussed on IRC on Wednesday, January 25. Full meetbot log: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-01-25-22.02.log.html

Some notable comments/exchanges:

<Marybelle> My goal for this meeting is to figure out if there are any immediate action items we can take to eliminate edits such as https://en.wikipedia.org/w/index.php?title=Talk:Thomas_W._O%27Brien&diff=757480527&oldid=610994123

<DanielK_WMDE_> ...don't store meta-data in wikitext, let's make MCR happen :)

<legoktm> we could use templatelinks for cache invalidation right?
<Marybelle> You could think of the default category sort key as a partial page transclusion.
<Marybelle> We should acknowledge that we already have stale data and volatile wikitext.

<DanielK_WMDE_> Marybelle: 1) page-props are not a stable interface. Pages that rely on specific page-props may break unexpectedly. a more specific interface would give more control, and prevent access to nasty things.
<DanielK_WMDE_> Marybelle: 2) we should look at concrete use cases instead of insisting on a general solution, if that general solution is problematic

<Krinkle> I'll mention two things: Firstly, it seems to me that accessing a page's own pageprops has a less clear usecase so far compared to accessing other page's props. Especially considering the ones set by the wikitext itself (makes it rather fragile). E.g. a template that varies based on whether the page is a disambiguation page.
<Krinkle> Secondly, we do have a few foreign-page magic words already. Both ones that work for both current and other pages, and those for other pages only. Such as {{PAGESINCATEGORY:}} and {{PAGESIZE:}} however it seems PAGESINCATEGORY for example has no cache invalidation strategy (no link table entry).
<jackmcbarn> Krinkle: i 100% agree. we shouldn't open the can of worms of letting pages access their own props

<Marybelle> You'd just treat the page as a regular transclusion.
<DanielK_WMDE_> legoktm: oh, not a specific page prop, just all of it? and any change to that page will then cause the talk page to be purged?...
<DanielK_WMDE_> would work, but seems wasteful, since *any* edit would trigger the purge
<DanielK_WMDE_> even though most edits don't change the respective prop
<jackmcbarn> we already have plenty of cases where we do that
<Krinkle> That seems acceptable.
<gwicke> wasteful invalidation is a significant concern
<gwicke> as well as the complexity of tracking even more dependencies

<Krinkle> Given how internal many page properties are I'd also say that whatever solution we come up with should not allow arbitrary access to them, but rather be a stable interface with just a subset of values we can support and have solid use cases. So that they don't depend on the current implementation.
<Krinkle> e.g. {{#pagemeta:Sandbox|pagesize}} and {{#pagemeta:Sandbox|disambig}} or some such.
<legoktm> Krinkle: so whitelist the pageprops we allow?
<legoktm> that seems pretty reasonable
<DanielK_WMDE_> legoktm: whitelist would make it a lot more sane.
<Marybelle> Whitelist would accompany a generic parser function?
<Krinkle> Marybelle: That's an idea yeah :)

<jackmcbarn> i feel like whitelisting isn't really very helpful, since most of the props that Marybelle would want to be whitelisted are ones i have concerns with exposing
<Marybelle> I'm not sure whitelisting solves much.

<DanielK_WMDE_> jackmcbarn: what pre-parse page-props are there? and are we talking about things that are actually in the page_props table? Because page size isn't there, is it?

<Krinkle> [...] in theory someone could vary __DISAMBIG__ or DEFAULTSORT on the time.
<Marybelle> DanielK_WMDE_: You can use magic words with parser functions to create instability in any *links table.
<Marybelle> [[Category:{{CURRENTTIMESTAMP}}]]
<legoktm> {{DEFAULTSORT:{{CURRENTTIMESTAMP}}}} but links tables... yeah

<Marybelle> DanielK_WMDE_: It's very frustrating to store page properties and then not be able to use them.
<DanielK_WMDE_> there is a lot of frustrating things that nonetheless are as they are for a reason :)

<Marybelle> DanielK_WMDE_: One option is to put sort keys into Wikidata.
<DanielK_WMDE_> i don't think that makes any sense
<DanielK_WMDE_> it's specific to the *page*, not the thing described by the page
<Krinkle> I think there's a genuine use case for meta data that does not belong in Wikidata.
<bawolff> i dont support putting data that functionally depends on a page in wikidata

<bawolff> I dont see why purging is an issue, we already have lots of experience dealing with that for templates

<Marybelle> bawolff: Would you re-use templatelinks or make a new table?
<bawolff> i like templatelinks, less complexity to reuse, but no strong opinions

<DanielK_WMDE_> is there a compelling use case for such meta-data in wikitext?
<Marybelle> Why can't I use that page image in an article?
<Marybelle> {{#getpageimage:Barack Obama}}
<DanielK_WMDE_> Marybelle: because that would be circular, if you do it on the page itself.
<Marybelle> gwicke: Same as templates?
<Marybelle> We already have all these problems.
<gwicke> templates can currently be processed in parallel, without circular dependencies
<gwicke> changing that would have major performance implications
<gwicke> any data that can potentially be added by a template cannot be accessed by the same page during parse without introducing a circular dependency
<legoktm> the preprocessor blocks recursive templates

<Marybelle> Computer science has grappled with circular dependencies previously, right?
<gwicke> nope, and the solution is predictable
<DanielK_WMDE_> gwicke: namely? Don't Do That Then?
<gwicke> that's the normal solution, yes
<gwicke> or try to break the cycle
<gwicke> trying to break the cycle is a whole other can of worms, of course
<Marybelle> gwicke: When I look at the various ways that wikitext is already volatile, I have difficulty caring.
<gwicke> in any case, I don't see us sacrificing parallelism for a feature like this

<DanielK_WMDE_> Consider {{#pageprop:name|page}} with 1) a whitelist of props 2) no access to the page's own props 3) an entry in templatelinks.
* bawolff likes
<Marybelle> DanielK_WMDE_: Fine with me.
<DanielK_WMDE_> ftr, i'm kind of ok with the strawman, though i'm worried that it will cause a lot of pointless purging, if used pervasively.

* bawolff sees this as basically equivalent to #ifexist
<DanielK_WMDE_> bawolff: yes, it seems to be pretty much the same to me, too.
<Marybelle> DanielK_WMDE_: For sort keys, you're talking about millions of uses.
<Marybelle> Since every talk page will presumably want access to the subject-space's sort key.
<DanielK_WMDE_> Marybelle: so every edit to every page will then purge two pages instead of one. effectively doubling the rendering load. ugh...
<gwicke> we currently re-render about 400 pages per second
<DanielK_WMDE_> Marybelle: at an educated guess, using templatelinks for purging, the sortkey use case, if used on all talk pages, would add 10 to 20 percent to the rendering load. maybe more.

<gwicke> we will also need better means for tracking fine-grained dependencies
<gwicke> all those issues are not specific to this instance
<gwicke> but they aren't simple ones, and it will take some time

<Marybelle> We could make another links table.
<legoktm> new links table if we're going to do something weird
<DanielK_WMDE_> ok, new link table.
<DanielK_WMDE_> been there, done that, broke the site...

<gwicke> but the proposed scheme would purge on each edit
<gwicke> that's the "fine grained" part
<gwicke> which is missing
<DanielK_WMDE_> if we go for a separate link table, it would of course contain the propname
<DanielK_WMDE_> and only purge when that prop changes
<DanielK_WMDE_> like wbc_entity_usage

<gwicke> I would honestly recommend to take a second look at the actual use cases, and see if they absolutely need general access to random page properties
<Marybelle> gwicke: There are three use-cases: page image, disambiguation status, and category sort key.

<Marybelle> We could also just change MediaWiki behavior.
<Marybelle> So that Talk pages sort under the subject-space sort key all the time.
<Marybelle> That wouldn't solve the other use-cases.
<Marybelle> Like page images.
<DanielK_WMDE_> Marybelle: i think the "sort talk pages under the subject page's sort key" actually has merit.
<Marybelle> And leave page images and disambiguation status for a different day?
<gwicke> page image and disambig status seem likely to become separate metadata

<legoktm> Can we set out requirements of what any solution has to do? Like prevent against recursion, have proper cache invalidation via the job queue, etc.
<legoktm> 1) Prevent against recursion 2) Have proper invalidation via the job queue/whatever method 3) Only allow whitelisted page properties
<legoktm> I think a new *links table should meet all of those

<legoktm> DanielK_WMDE_: what's the high cost exactly?
<DanielK_WMDE_> legoktm: developer time, lines of code to maintain, dba time, storage, i/o. not *extremely* high. but if the use case isn't very compelling, probably not worth it.
<DanielK_WMDE_> legoktm: with a generic dependency tracking system, the cost would be much lower
<bawolff> All features have cost, i wouldnt call this proposal excessively high
<DanielK_WMDE_> bawolff: i'm personally undecided on that question. it just may be worth it. or not.

<legoktm> #agree minimum requirements are 1) Prevent against recursion 2) Have proper invalidation via the job queue/whatever method 3) Only allow whitelisted page properties
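
To make the three agreed minimum requirements concrete, here is a minimal Python sketch of how a handler for the strawman `{{#pageprop:name|page}}` might enforce them. Everything in it is an assumption for illustration: the function `page_prop`, the `ALLOWED_PROPS` whitelist (chosen to match the three use cases named in the meeting), the dictionary standing in for the page_props table, and the sample data. It only blocks direct self-access, which is the simplest reading of "prevent against recursion".

```python
# Hypothetical whitelist, modelled on the three use cases from the meeting.
ALLOWED_PROPS = frozenset({"defaultsort", "disambiguation", "page_image"})


class PagePropError(Exception):
    """Raised when a lookup violates one of the agreed requirements."""


def page_prop(current_page, target_page, prop_name, prop_store, usage_log):
    """Hypothetical handler for {{#pageprop:name|page}}.

    prop_store maps page title -> dict of stored properties (a stand-in for
    the page_props table). usage_log collects (using page, target page,
    property) tuples that a purge mechanism could consume later.
    """
    # Requirement 3: only whitelisted properties are exposed.
    if prop_name not in ALLOWED_PROPS:
        raise PagePropError(f"property not whitelisted: {prop_name}")
    # Requirement 1: a page may not read its own props (that would be circular).
    if target_page == current_page:
        raise PagePropError("a page cannot read its own properties")
    # Requirement 2: record the dependency so the using page can be purged.
    usage_log.append((current_page, target_page, prop_name))
    return prop_store.get(target_page, {}).get(prop_name, "")


# Made-up sample data.
store = {"Example": {"defaultsort": "Example, The"}}
log = []
print(page_prop("Talk:Example", "Example", "defaultsort", store, log))
print(log)
```

Whether the recorded usage ends up in templatelinks or in a dedicated table is left open here, as it was in the meeting.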

daniel added a comment. Edited Jan 26 2017, 3:48 PM

The essence of the meeting, as summarized above, seems to be:

  1. Allowing a page to access its own properties does not seem feasible.
  2. Access to page_props should be filtered, to prevent page content relying on unstable implementation details.
  3. Usage of page properties needs to be tracked, so pages that use them can be purged when the respective property changes.
    • We already do this for PAGESIZE and #ifexist, by recording them in templatelinks, like template transclusions.
    • Using templatelinks for the DEFAULTSORT use case would mean purging most talk pages on every edit of their respective subject page, even though most of these edits will not touch DEFAULTSORT.
    • Introducing a specialized *links table that tracks the page and the property name used would allow targeted purging, but comes with costs for engineering/deployment/maintenance. This approach is used by Wikibase to track entity usage on wikitext pages. Some kind of cost/benefit analysis would be needed to decide on this option.
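
The difference between the last two options can be illustrated by counting purges for the same sequence of edits. This is a toy Python sketch with made-up data; the function names and dictionaries are hypothetical stand-ins for templatelinks-style tracking and for a table keyed by page and property name, and the counts say nothing about real rendering load.

```python
def purges_page_level(edits, users_of_page):
    """templatelinks-style tracking: every edit to a used page purges all its users."""
    return sum(len(users_of_page.get(page, ())) for page, _changed in edits)


def purges_prop_level(edits, users_of_prop):
    """Table keyed by (page, property): purge only users of a property that changed."""
    total = 0
    for page, changed_props in edits:
        stale = set()
        for prop in changed_props:
            stale |= users_of_prop.get((page, prop), set())
        total += len(stale)
    return total


# Made-up example: one talk page reads the subject page's defaultsort.
users_of_page = {"Example": {"Talk:Example"}}
users_of_prop = {("Example", "defaultsort"): {"Talk:Example"}}

# Three edits to "Example"; only the last one changes defaultsort.
edits = [("Example", set()), ("Example", set()), ("Example", {"defaultsort"})]

print(purges_page_level(edits, users_of_page))  # 3
print(purges_prop_level(edits, users_of_prop))  # 1
```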

For the DEFAULTSORT use case, it may be best to simply change MediaWiki so that it always uses the subject page's DEFAULTSORT key for the associated talk page as well.
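
As a rough illustration of that behaviour change, the Python sketch below resolves a talk page's sort key from its subject page. It is not MediaWiki code: the helper names are hypothetical, only the main namespace's "Talk:" prefix is handled, and falling back to the subject title when no DEFAULTSORT is set is an assumption made for the example.

```python
TALK_PREFIX = "Talk:"  # simplification: only main-namespace talk pages


def subject_title(title):
    """Return the subject page for a main-namespace talk page, else the title itself."""
    if title.startswith(TALK_PREFIX):
        return title[len(TALK_PREFIX):]
    return title


def effective_sort_key(title, default_sort_keys):
    """Pick the category sort key, always going through the subject page.

    default_sort_keys maps subject page title -> key set via DEFAULTSORT.
    Pages without an entry fall back to the subject title (an assumption).
    """
    subject = subject_title(title)
    return default_sort_keys.get(subject, subject)


# Made-up sample data.
keys = {"Jane Example": "Example, Jane"}
print(effective_sort_key("Talk:Jane Example", keys))  # Example, Jane
print(effective_sort_key("Talk:Sandbox", keys))       # Sandbox
```

With this approach talk pages would no longer need to repeat the subject page's sort key in their own wikitext, so no cross-page property lookup or extra purge tracking is involved for this use case.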

MZMcBride changed the task status from Open to Stalled. Feb 11 2017, 4:48 AM
daniel moved this task from Request IRC meeting to Old on the TechCom-RFC board. Feb 11 2017, 10:43 AM
Aklapper changed the task status from Stalled to Open. May 24 2020, 7:29 PM

The previous comments don't explain what/who exactly this task is stalled on ("If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on"). Hence resetting task status.

(Smallprint, as general orientation for task management: If you wanted to express that nobody is currently working on this task, then the assignee should be removed and/or priority could be lowered instead. If work on this task is blocked by another task, then that other task should be added via Edit Related Tasks... → Edit Subtasks. If this task is stalled on an upstream project, then the Upstream tag should be added. If this task requires info from the task reporter, then there should be instructions which info is needed. If this task is out of scope and nobody should ever work on this, then task status should have the "Declined" status.)

Krinkle renamed this task from RFC: Accessing page properties from wiki pages to Add wikitext grammar for embedding properties from other pages. Sep 16 2020, 7:31 PM
Krinkle edited projects, added MediaWiki-Parser, Parsoid; removed TechCom-RFC.
Krinkle added a subscriber: Krinkle.

Untagging an old RFC predating our current process. It appears to be a feature request for the parser, which I've tagged accordingly. If and when it is accepted and turns out to be cross-cutting or strategic, feel free to turn it into an RFC.