Page MenuHomePhabricator

Audit SOL state of subpipelines
Open, NormalPublic

Description

In https://gerrit.wikimedia.org/r/#/c/220698/, it seems there's only one place we explicitly set the SOL state, and that's while serializing! Why isn't that necessary or done for subpipelines? Just a thought that occurred to me. Leaving it here so it isn't forgotten.

Event Timeline

Arlolra created this task.Jul 21 2015, 10:05 PM
Arlolra raised the priority of this task from to Needs Triage.
Arlolra updated the task description. (Show Details)
Arlolra added a subscriber: Arlolra.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 21 2015, 10:05 PM
Arlolra triaged this task as Normal priority.Jul 21 2015, 10:05 PM
Arlolra set Security to None.

Because of T14974. Although there might be some subtleties / edge case behavior there.

ssastry added a comment.EditedJul 22 2015, 1:04 PM

Given that templates can change sol state in ways not visible at the top level in the tokenizer, the only place where this matters is for sol-sensitive wikitext constructs that get identified in the tokenizer vs. in stage 3 transforms after everything has been expanded. The list of sol-sensitive constructs are given in HTMLTagsRequiringSOLContext in mediawiki.wikitext.constants.js

This matters in 2 ways: (a) the construct coming from a template x{{echo|*a}} (b) the sol-context affected by a preceding template {{echo|\n}}*a

  • So, indent-pre and lists work in both scenarios because they are identified in stage 3 (effectively they are handled as if there were a preprocessor).
  • But headings are not. Try {{echo|\n}}=foo= or a{{echo|1==foo=}}
  • Tables and table-construct wikitext are also sol-sensitive (if you strip all whitespace start of line). So, for the purposes of this bug, we should look at them as well.
  • Table start ("{|" ) in scenario (a) works because of T14974. It works in scenario (b) because of special handling in TSP which is a hack for exactly some of these scenarios.
  • But, I am sure that table row, table cell, table end won't work.

Part of the reason for not introducing additional context / state into subpipelines is because of the desire of wanting to keep the processing of them independent. Otherwise, all subpipeline processing becomes effectively serialized which kills perf in terms of latency. As noted in T14974, and in other discussions (ex: Scott's template talk at Wikimania '15), with some templating improvements, some of this can be handled that way.

It is also worth examining which of these scenarios are edge cases seen in real production pages and considering adding support to TSP, for ex. (that is how we added the TSP and special-case support for some scenarios). With additional templating changes, we might slowly be able to kill the TSP and other such hacks.

In any case, this should cover all the scenarios. Worth creating separate tickets for individual edge cases that need to be supported (because of being seen in production frequently enough), and also documenting this behavior and rationale as outlined in this bug on a wiki page.

The current values were made explicit in,
https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/482761

but the request auditing was not done.