Editor since 2005; WMF developer since 2013. I work on Parsoid and OCG, and dabble with VE, real-time collaboration, and OOjs.
On github: https://github.com/cscott
See https://en.wikipedia.org/wiki/User:cscott for more.
@Fuzzy you may be interested in T254522: Set appropriate wikitext limits for Parsoid to ensure it doesn't OOM, which will eventually replace the limits in the legacy parser. Appropriate metrics are not easy to find, because ideally they must be computed *before* spending the compute resources that a full computation of the desired result would require. That is why there are separate limits on article size and expanded size (and CPU time, and expansion depth, and expensive function count, and visited postprocessor nodes, etc). At every point we try to avoid spending the resources to do the actual expansion if it is likely, based on "what we already know", that the other limits would fail. If we do the entire expansion and rendering to HTML and *then* check to see if it turned out to be too big, we're already too late to reclaim the resources spent.
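Sketched in PHP, the pattern looks something like this (a minimal sketch with hypothetical names, not the actual Parsoid API; the wall-clock check stands in for the real CPU-time accounting):

```php
<?php
// Check cheap limits before spending expensive resources: refuse as
// early as "what we already know" allows.
function expandWithLimits( string $wikitext, array $limits, callable $expandStep ): ?string {
	// Article size is known before any work is done: a pure byte check.
	if ( strlen( $wikitext ) > $limits['maxArticleBytes'] ) {
		return null; // refuse before spending any expansion CPU
	}
	$start = microtime( true );
	$output = '';
	// Expansion proceeds incrementally, so the expanded-size and time
	// limits can trip *during* the work rather than after it is all spent.
	foreach ( $expandStep( $wikitext ) as $fragment ) {
		$output .= $fragment;
		if ( strlen( $output ) > $limits['maxExpandedBytes'] ) {
			return null; // expanded-size limit tripped mid-expansion
		}
		if ( microtime( true ) - $start > $limits['maxSeconds'] ) {
			return null; // time limit tripped mid-expansion
		}
	}
	return $output;
}
```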
@stjn you are correct that this particular issue is a mix of social and technical factors as I pointed out in T275319#6884320. The technical factors absolutely scale with bytes; the social factors scale with <a more complicated metric related to information entropy>.
This discussion risks going in circles. As I wrote previously in T275319#6884320:
zhwiki for example should have 4x the character limit if this is to be the new rule. Unlike what is claimed above, many of the performance metrics *do* scale with bytes rather than characters -- most wikitext processing is at some point regexp-based, and that works on bytes (Unicode characters are desugared to the appropriate byte sequences), and of course network bandwidth, database storage size, database column limits, etc, all scale with bytes, not characters. We should be careful, before bumping the limit, that we're not going to run into problems with the database schema, etc.
In short: it's not obvious what the new limit "should be", and in fact it's fairly certain that whatever the new limit is, there will still be source texts which will exceed it.
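To make the bytes-vs-characters distinction concrete, a quick illustration (standard PHP behavior assuming UTF-8 content, not code from the task):

```php
<?php
// PHP's strlen() counts bytes; mb_strlen() counts Unicode code points.
// CJK text is three bytes per character in UTF-8, so byte-based limits
// are reached much sooner on CJK wikis than on Latin-script ones.
$latin = "wiki";
$cjk   = "维基百科"; // "Wikipedia" in Chinese
var_dump( strlen( $latin ) );             // int(4)  -- 4 bytes
var_dump( mb_strlen( $latin, 'UTF-8' ) ); // int(4)  -- 4 characters
var_dump( strlen( $cjk ) );               // int(12) -- 12 bytes
var_dump( mb_strlen( $cjk, 'UTF-8' ) );   // int(4)  -- 4 characters
```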
You can work around this by setting border-top-color, border-bottom-color, etc., independently.
See T84937#957838 for justification when support for nbsp was added. The new entities added here aren't likely to be generated by Visual Editor so it's not entirely clear why we need to support them; wikitext is not a superset of HTML5.
I think this is complete; let me know if additional work needs to be done.
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/TemplateData/+/819740, T54607, and T55413 are related -- right now we have no way to associate desired styling information with extension tags.
The 1031918 patch was backported and worked, but the information gleaned was a little disappointing: https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2024.05.15?id=qKHwfY8BS8vmb5K1Na1f
Questions are: /who/ is adding non-serializable data to the ParserOutput, and /when/. Our suspicion is that somehow ParserOutput::getText() is being called on the ParserOutput *before* it is written to the cache, so that modifications made in the OutputTransform pipeline are being retroactively applied to the cached content.
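Sketched with simplified, hypothetical calling code (not an identified culprit), the suspected sequence is:

```php
<?php
// Suspected ordering problem, in outline:
$parserOutput = $parser->parse( $wikitext, $title, $parserOptions );

// If some caller does this *before* the cache write, the OutputTransform
// pipeline may modify $parserOutput in place...
$html = $parserOutput->getText();

// ...so the post-transform (and possibly non-serializable) object is
// what ends up being written to the parser cache:
$parserCache->save( $parserOutput, $wikiPage, $parserOptions );
```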
We are now tagging with both the parsoid library version (eg v0.20.0-a4) and the "HTML version", eg https://www.mediawiki.org/wiki/Specs/HTML/2.8.0 in the data-mw-parsoid-version and data-mw-html-version attributes, respectively, set on the wrapper div.
See also https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1027012 which I forgot to tag with this task #.
In T282499#9731215, @Func wrote:
In T282499#9730874, @cscott wrote:
- Current Parsoid calls back to the legacy Parser to handle various types of content. That can cause multiple $parser objects to be used on a given page. Most *present* issues with "linear parsing" are due to the fact that, even though the parse is technically in order, data attached to the Parser object is lost because there are multiple parsers in play.
I saw that the DataAccess class uses the same legacy parser object for all extension-tag calls as long as the PageConfig remains unchanged; basically that means within the same page? Do you mean this is going to change soon, or did I miss some concept about the PageConfig, etc.?
In T13555#9738532, @Jdlrobson wrote:
Just to make sure the web team schedules sufficient time for this work, when are we expecting to merge https://gerrit.wikimedia.org/r/c/mediawiki/core/+/842859 ?
Seems like extension data being dropped during clone is the root cause?
```
$ php tools/regression-testing.php -u cscott --url https://parsoid-rt-tests.wikimedia.org/regressions/between/d16b7c75/b1d94b70
ssh cscott@testreduce1002.eqiad.wmnet sudo rm -f /tmp/titles
scp /tmp/titles cscott@testreduce1002.eqiad.wmnet:/tmp/titles
titles 100% 507 13.3KB/s 00:00
----- Checking out d16b7c75 on scandium -----
ssh cscott@scandium.eqiad.wmnet cd /srv/parsoid-testing && git fetch && git checkout d16b7c75 && sudo systemctl restart php7.4-fpm.service
HEAD is now at d16b7c757 Match core styles for gallery line media errors
----- Checking out d16b7c75 on testreduce1002 -----
ssh cscott@testreduce1002.eqiad.wmnet cd /srv/parsoid-testing && git fetch && git checkout d16b7c75
HEAD is now at d16b7c757 Match core styles for gallery line media errors
----- Running tests -----
ssh cscott@testreduce1002.eqiad.wmnet sudo rm -f /tmp/results.d16b7c75.json && cd /srv/parsoid-testing && node tools/runRtTests.js --proxyURL http://scandium.eqiad.wmnet:80 --parsoidURL http://DOMAIN/w/rest.php -f /tmp/titles -o /tmp/results.d16b7c75.json
REDIRECT: mznwiki:کورو -> mznwiki:کوروف
REDIRECT: jawiki:Protest_Songs -> jawiki:プロテスト・ソングス
REDIRECT: enwiki:AK74M_with_universal_upgrade_kit -> enwiki:AK-74#AK-74M_UUK_(Universal_Upgrade_Kit)
REDIRECT: enwiki:Chadžibėjus -> enwiki:Odesa
scp cscott@testreduce1002.eqiad.wmnet:/tmp/results.d16b7c75.json /tmp/
results.d16b7c75.json 100% 2529 35.3KB/s 00:00
----- Checking out b1d94b70 on scandium -----
ssh cscott@scandium.eqiad.wmnet cd /srv/parsoid-testing && git fetch && git checkout b1d94b70 && sudo systemctl restart php7.4-fpm.service
Previous HEAD position was d16b7c757 Match core styles for gallery line media errors
HEAD is now at b1d94b70f Move mock option to the top in CLI
----- Checking out b1d94b70 on testreduce1002 -----
ssh cscott@testreduce1002.eqiad.wmnet cd /srv/parsoid-testing && git fetch && git checkout b1d94b70
Previous HEAD position was d16b7c757 Match core styles for gallery line media errors
HEAD is now at b1d94b70f Move mock option to the top in CLI
----- Running tests -----
ssh cscott@testreduce1002.eqiad.wmnet sudo rm -f /tmp/results.b1d94b70.json && cd /srv/parsoid-testing && node tools/runRtTests.js --proxyURL http://scandium.eqiad.wmnet:80 --parsoidURL http://DOMAIN/w/rest.php -f /tmp/titles -o /tmp/results.b1d94b70.json
REDIRECT: mznwiki:کورو -> mznwiki:کوروف
REDIRECT: jawiki:Protest_Songs -> jawiki:プロテスト・ソングス
REDIRECT: enwiki:AK74M_with_universal_upgrade_kit -> enwiki:AK-74#AK-74M_UUK_(Universal_Upgrade_Kit)
REDIRECT: enwiki:Chadžibėjus -> enwiki:Odesa
scp cscott@testreduce1002.eqiad.wmnet:/tmp/results.b1d94b70.json /tmp/
results.b1d94b70.json 100% 2529 38.6KB/s 00:00
----- Comparing results -----
jawiki:仮面ライダーギーツ No changes!
metawiki:Movement_Charter/Ambassadors_Program/Conversations/en No changes!
pnbwiki:پہلا_صفہ No changes!
zhwiki:阮明哲 No changes!
mznwiki:کورو No changes!
mznwiki:بحره_(شهر) No changes!
mznwiki:کوت No changes!
jawiki:坂善商事 No changes!
jawiki:Protest_Songs No changes!
ptwiki:Sociedad_Deportiva_Ponferradina No changes!
enwiki:Talk:Kristi Noem No changes!
jawiki:ブローニュ=シュル=メール No changes!
enwiki:AK74M_with_universal_upgrade_kit No changes!
enwiki:Chadžibėjus No changes!
metawiki:Wiki Loves Africa 2024/Participating communities No changes!
eswiki:Thunnus No changes!
ruwiki:TC-PAM No changes!
viwiki:Mẹ No changes!
---------------------
*** No pages need investigation ***
```
\o/
The logic for combining slot html in ContentRenderer (I think) is probably pretty broken from an editing standpoint. The composition should probably be moved to OutputTransform, and the separate slot content stored separately in ParserOutput, which is more or less what @subbu and @daniel are proposing -- except @subbu is saying one ParserOutput contains many slots, and @daniel is saying ParserCache should be "per slot", with the combining of ParserOutputs into a final page happening post-cache. I think!
I'd prefer not to include any additional __MAGIC_WORDS__ and instead use case-sensitive {{#parserfunction}} syntax for new additions, as discussed in T204370: Behavior switch/magic word uniformity. The current proposals for "no categories" / "but wait, some categories" seem quite confusing. One of the benefits of parser function syntax is that we can add arguments! So it seems like {{#cat:Foo|always}} would be (eg) an alternative way to ensure that the page is /always/ added to Category:Foo, instead of making it a property of the category itself. Alternatively, an extensible {{#categoryproperties|always|hidden|....}} on the category page would be more sustainable than a plethora of __HIDDENCAT__, __ALWAYSCAT__, etc. magic words.
What remains to be done on this task? Is this still a blocker for rolling out Parsoid read views on some wikis?
In T3605#6851298, @BrandonXLF wrote:
One major use case would be the {{Empty section}} and {{Expand section}} templates at the English Wikipedia. The templates would be able to use this magic word in edit links by adding &section={{SECTION}} to the links. This would make it easier for editors to help expand these sections without having to navigate to them by scrolling through the editor for the entire page. See this section on the Village Pump.
In T363484#9762805, @Ladsgroup wrote:
I might be wrong, but it feels like: if the notice comes and goes without me clicking the close button, it shows up again; but if I hit the x button, it stays gone.
In T363484#9762730, @Jdlrobson wrote:
Have you considered using a banner, either using siteNotice or at the footer? Since there is a "rendered by Parsoid" indicator, I am curious about the goal of also having the notification...?
Some design review notes at https://docs.google.com/document/d/1xWztBEE2E414IEEAJAFqn_xKX2szMIS0jNqKhkVIWz4/edit (WMF only, sorry).
For future reference, the PHP side code for this looks like:
https://github.com/wikimedia/mediawiki-extensions-ParserMigration/blob/e2bf2f59f00dac44b053de5aabdae0f38dfad435/src/Hooks.php#L132
and
https://github.com/wikimedia/mediawiki-extensions-ParserMigration/blob/e2bf2f59f00dac44b053de5aabdae0f38dfad435/extension.json#L78
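In rough outline, the shape is something like the following (a hedged sketch only; the hook choice and values here are illustrative, and the real code is in the links above):

```php
<?php
// extension.json registers a handler (see the extension.json link above);
// the handler then exposes a config var that client-side JS can check.
class Hooks implements \MediaWiki\Hook\BeforePageDisplayHook {
	public function onBeforePageDisplay( $out, $skin ): void {
		$renderedWithParsoid = true; // stand-in for the real check
		if ( $renderedWithParsoid ) {
			$out->addJsConfigVars( 'parsermigration-parsoid', true );
		}
	}
}
```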
See also T42307: mw.notification Usability Improvements, which is approaching the same question from the other direction.
See also T303612: ToastNotification: Add ToastNotification component to Codex which is related.
No issues from the deploy.
No problems from the train, resolving.
If you wanted to make this "cleaner" you could also say "in wikitext, unlike in html, <!-- --> style comments nest": for example, <!-- outer <!-- inner --> still comment --> would then be a single comment instead of ending at the first -->. We have other places where we diverge from html syntax (wikitext is not a superset of html), and I think you could plausibly argue this better matches how most human editors /expect/ delimited comments to work.
Yet another option is to use {{Empty template|.....}} to comment out content. This isn't super attractive now, but should get nicer once the proposed {{Empty template|<<< .... >>>}} syntax is available.
Also worth double-checking that no one is explicitly calling unset($parser->mInParse), since that's a pattern I had to fix in a number of places. Well-meaning folks were thinking "I'll reduce memory overhead a little bit" (even though that's completely pointless: the hash table PHP allocates for dynamic properties was allocated when the dynamic property was set and doesn't go away, it just sits there empty instead) and unsetting their dynamic properties "when they were done with them", which then makes them dynamic again and subject to a complaint from DeprecationHelper.
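Sketched (simplified; the mechanism as I understand it):

```php
<?php
// The well-meaning but counterproductive pattern described above.
// While $parser->mInParse physically exists, PHP never consults the
// DeprecationHelper __get/__set magic methods:
$parser->mInParse = true;
// ... work ...
unset( $parser->mInParse ); // frees ~nothing: the object's dynamic-property
                            // table stays allocated, just empty
$parser->mInParse = false;  // property no longer exists, so this goes
                            // through __set and DeprecationHelper complains
```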
Is there any way for DeprecationHelper to determine that the object is being destructed, and suppress its warning in that case?
Note that setting the ParserOption to suppress edit links at the present time splits the parser cache. That's probably not an issue since the page will *always* be rendered with that setting.
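For reference, the split happens when an option is flagged as part of the parser cache key; a hypothetical sketch using the ParserOptionsRegister hook (the option name below is made up, not the real one):

```php
<?php
// Any option flagged in $inCacheKey becomes part of the parser cache key,
// so each distinct value gets its own cache entry (i.e. splits the cache).
class Hooks implements \MediaWiki\Hook\ParserOptionsRegisterHook {
	public function onParserOptionsRegister( &$defaults, &$inCacheKey, &$lazyLoad ) {
		$defaults['suppressEditLinks'] = false;   // hypothetical option
		$inCacheKey['suppressEditLinks'] = true;  // true => splits the cache
	}
}
```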
A compromise is to make the Translate-specific tokens *temporary*. The big benefit of adding support to the Tokenizer is that it makes it easier to emit a linter error for them if they are found inside a comment region.
FWIW JS code can figure out if Parsoid is being used to render the page by looking for parsermigration-parsoid in the JsConfigVars (e.g. via mw.config.get( 'parsermigration-parsoid' )), but that's a temporary thing put there by the ParserMigration extension, not a "real" solution.
My concerns are:
Clarified the current compatibility policy with @daniel (prompted by a discussion with @Krinkle on https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1023098) and documented the policy at https://www.mediawiki.org/wiki/Manual:Parser_cache/Serialization_compatibility:
DT is turning on for these wikis on Thursday (Apr 25).