Editor since 2005; WMF developer since 2013. I work on Parsoid and OCG, and dabble with VE, real-time collaboration, and OOjs.
On github: https://github.com/cscott
See https://en.wikipedia.org/wiki/User:cscott for more.
I think it would also be helpful to determine why the template in question (assuming it is a template) is adding id attributes. These are supposed to be unique by the HTML spec. Perhaps id is not the right attribute to generate, and these should either be removed or changed to something like data-mw-.., which don't have to be unique.
Ok, on investigation, Parsoid does invoke guessVariant(), but the legacy Parsoid invokes it *twice*: once on the overall text of any string to be converted (including the embedded HTML tags and attributes) and then again on the text substrings between tags. That seems to be a bug: if the topmost 'guess' returns false, then nothing on the page will be converted at all. It seems like the intended behavior is for the individual strings / paragraphs / etc. to be the proper subjects of "guessing".
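To make the "guess per text run" idea concrete, here is a minimal sketch. It is not the actual LanguageConverter or Parsoid code: convertTextSegments, and the guess and convert callbacks passed to it, are hypothetical names, and the regex tokenizer is deliberately naive (it assumes well-formed tags with no ">" inside attribute values).

```javascript
// Sketch only: run a variant "guess" on each text run between tags,
// never on the tags or attributes themselves.
// `guess` and `convert` are hypothetical callbacks, not MediaWiki APIs.
function convertTextSegments(html, guess, convert) {
  return html
    .split(/(<[^>]*>)/) // the capture group keeps the tags in the result array
    .map((part) => {
      // Tags and empty strings pass through untouched.
      if (part === '' || part.startsWith('<')) {
        return part;
      }
      // Each text run is guessed, and converted, independently.
      return guess(part) ? convert(part) : part;
    })
    .join('');
}
```

With this shape, a "false" guess on one text run leaves only that run unconverted, and attribute values never influence the guess.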
It's possible dewiki can work around the Firefox bug with some client-side JS using jQuery like:
$('span[typeof="mw:Entity"]:contains("\u00AD")').css('word-break', 'break-all');
I haven't tested that; it's just a gesture in the general direction of a possible workaround for Firefox users.
Oh, it appears to be a Firefox bug, and I'm testing on Chrome. Not a parser bug, I don't think, just a Firefox bug.
I think the issue is that the browser doesn't choose to break at the soft hyphen (&shy;) if it is inside a <span>. The HTML for the table in the report definitely has the entity inside:
<th id="mwBDM">Frühstücksdirektorenkonferenz<span typeof="mw:Entity" id="mwBDQ">­</span>tagesordnungspunkt</th>
but the browser doesn't break the word there. Someone's going to have to study the Unicode word-break algorithm as implemented in browsers to figure out why, and whether there's a fix. Maybe just some CSS would do the trick.
The character is certainly present in:
$ echo 'a­b' | php bin/parse.php <p data-parsoid='{"dsr":[0,7,0,0]}'>a<span typeof="mw:Entity" data-parsoid='{"src":"&shy;","srcContent":"","dsr":[1,6,null,null]}'></span>b</p>
and I see it in the output of
<section data-mw-section-id="0" id="mwAQ"><p id="mwAg">a<span typeof="mw:Entity" id="mwAw"></span>b</p>
as well. I think your terminal just didn't choose to display it, since technically it's an 'optional hyphen'.
That's a soft hyphen, your terminal might not display it.
This appears to be fixed, likely via 1a304c1526c8caf8f70a86ee929f69e4fcaa8e7b.
I don't think you need to resort to a maintenance script for this. If you're able to maintain a bit more storage, you can just represent each entry in the bloom filter with a few-bit counter instead of an int, and update the counter value whenever a new disambiguation page is created/removed (aka, whenever DISAMBIG is added/removed from a page). Edits which create/remove disambiguation pages should be rare.
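A minimal sketch of that counting approach, for illustration only. The class name, the sizes, and the seeded FNV-1a-style hash are all assumptions, not anything from an existing implementation; each slot is an 8-bit saturating counter so that removals can be undone without a rebuild.

```javascript
// Counting Bloom filter sketch: every slot holds a small counter instead of
// a single flag, so keys can be removed as well as added.
// All names and parameters here are illustrative assumptions.
class CountingBloomFilter {
  constructor(size = 1024, hashCount = 3) {
    this.size = size;
    this.hashCount = hashCount;
    this.counters = new Uint8Array(size); // 8-bit counters, saturating at 255
  }

  // Seeded FNV-1a-style string hash, reduced to a slot index.
  hash(key, seed) {
    let h = (0x811c9dc5 ^ seed) >>> 0;
    for (let i = 0; i < key.length; i++) {
      h ^= key.charCodeAt(i);
      h = Math.imul(h, 0x01000193) >>> 0;
    }
    return h % this.size;
  }

  slots(key) {
    const result = [];
    for (let i = 0; i < this.hashCount; i++) {
      result.push(this.hash(key, i * 0x9e3779b1));
    }
    return result;
  }

  // Call when a page becomes a disambiguation page.
  add(key) {
    for (const s of this.slots(key)) {
      if (this.counters[s] < 255) {
        this.counters[s]++;
      }
    }
  }

  // Call when a page stops being a disambiguation page.
  // Only remove keys that were previously added.
  remove(key) {
    for (const s of this.slots(key)) {
      // A saturated counter stays at 255: its true count is unknown.
      if (this.counters[s] > 0 && this.counters[s] < 255) {
        this.counters[s]--;
      }
    }
  }

  // May return false positives, never false negatives.
  mightContain(key) {
    return this.slots(key).every((s) => this.counters[s] > 0);
  }
}
```

The trade-off is storage: 8 bits per slot here, although 4-bit counters are usually considered enough in practice if you are willing to pack them.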
Tested & verified fixed. You might have to purge the cache in some cases, but most pages shouldn't need this.
I created a subtask, T422866: Migrate parser tests to new phpunit:config mechanism, for the parser test-related piece of this. CTT isn't likely to get to this before next quarter though.
This behavior has been in place for 3 years now; hopefully all of the pages involved have been fixed.
Somewhat related to T17941: create magic word __NOCATEGORY__/T204370#11805009 which similarly wanted metadata to apply "to pages the template is included on" not "on the template page itself". Often <includeonly> and friends are used to manage this in the template context. Presumably you'd want to suppress the lint only for the template page, not for the pages it was included on.
T17941: create magic word __NOCATEGORY__ could be solved by using {{#category:...}} syntax for [[Category:...]], which would allow additional options of the sort requested there to be added.
It looks like the display:flex in the style attribute on the wrapper is at fault. Parsoid generates <span> wrappers around the HTML entities in the wikitext, by design. The whitespace is present in the HTML, so that's not a fault of Parsoid.
I think this is entirely on the VE side. Parsoid generates a ParserOutput, and the warnings should be present in ParserOutput::getWarningMsgs(). It's likely that VE's preview API doesn't actually return these to VE and/or VE doesn't have UX to display it, but I believe all the necessary information *should* be present from Parsoid.
Aren't there already functions to customize number formatting? We have Language::getNumberFormatter(), scribunto has number formatting instructions, etc. Technically I guess what's being asked is to surround any numbers in the output with a <span class="mw-number" data-mw-value="111"> so that client-side javascript can replace the number with the user's preferred format. You can already do this with templates in mediawiki and a user gadget. It requires buy-in from editors to add all this markup, which seems like the heavy lift here.
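For illustration, the gadget side of that could be as small as the sketch below. It assumes the markup described above (a mw-number class with a data-mw-value attribute), which is the proposal under discussion and not something MediaWiki emits today; localizeNumbers is a hypothetical name, and Intl.NumberFormat is the standard browser API.

```javascript
// Sketch of a user gadget: rewrite marked-up numbers into the reader's
// preferred format. The mw-number / data-mw-value markup is the assumption
// from the comment above, not an existing MediaWiki feature.
function localizeNumbers(elements, locale) {
  const formatter = new Intl.NumberFormat(locale);
  for (const el of elements) {
    const raw = el.getAttribute('data-mw-value');
    const value = Number(raw);
    // Leave the editor-supplied text alone if the value is missing or bad.
    if (raw === null || raw === '' || Number.isNaN(value)) {
      continue;
    }
    el.textContent = formatter.format(value);
  }
}

// In a browser, a gadget might call it like this:
// localizeNumbers(
//   document.querySelectorAll('span.mw-number[data-mw-value]'),
//   navigator.language
// );
```

The hard part remains the one noted above: the code is trivial, but editors have to add the markup everywhere.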
Parsing without a title has been deprecated since MW 1.34 (T245129). I'm surprised this hasn't shown up before, but I don't think this is related to this week's roll out: it's probably just that some spider has decided to hit some special page (?) which triggers this error.
Should be fixed when wmf.23 rolls out tomorrow (Apr 9).
Just for info: Move createwithcontentmodel to autoconfirmed (1268225) was deployed just now as a config change, which adds a new permission which is not present in wmf.22 but is present in wmf.23 : Add createwithcontentmodel permission (1222750).
Also, aliasing a built-in class is legal in PHP 8.3, so this error (if reproducible) would only be applicable to PHP 8.2.
Parsoid in MW 1.45 should require wikimedia/remex-html ^5.1.0 and core requires *exactly* 5.1.0. That version of remex doesn't seem to correspond to the line numbers you've given here, and in any case it shouldn't be trying to instantiate Parsoid's DOMImplementation but instead Remex's:
https://github.com/wikimedia/mediawiki-libs-RemexHtml/blob/c75f653afdfc42040e27d311236e7856ebedaa25/src/DOM/DOMBuilder.php#L121
public function __construct( $options = [] ) {
    $options += [
        'errorCallback' => null,
        'domImplementation' => null,
        'domExceptionClass' => null,
    ] + ( class_exists( '\Dom\Document' ) ? [
        'domImplementationClass' => '\Dom\Implementation',
    ] : [
        'domImplementationClass' => \DOMImplementation::class,
    ] );
    $this->errorCallback = $options['errorCallback'];
    $this->domImplementation = $options['domImplementation'] ??
        new $options['domImplementationClass'];
And Parsoid's DOMFragmentBuilder shouldn't be mentioning a DOMImplementation class at all:
https://github.com/wikimedia/mediawiki-services-parsoid/blob/REL1_45/src/Wt2Html/TreeBuilder/ParsoidDOMFragmentBuilder.php#L20
/** @param Document $ownerDocument */
public function __construct( $ownerDocument ) {
    '@phan-var \DOMDocument $ownerDocument'; // Remex pretends everything is \DOM
    parent::__construct( $ownerDocument, [
        'suppressIdAttribute' => DOMCompat::isUsingDodo(),
    ] );
}
In T314399#11350611, @cscott wrote:
The display title should probably include a language component as well, so we can properly set lang/dir attributes: T36514: The language and the direction of the title in first heading should depend on page content language instead of user interface language.
Two separate bugs: first, the code in ParserOutputAccess which dumps the parser cache key shows the key as it would be used in the primary (latest revision) cache, but in this case output was coming from the secondary (old revision) cache so the dumped key was misleading. The old revision cache was actually omitting all the postprocessing options from the key, which caused it to fetch output for the "wrong" skin from the cache. Fixed in Ensure RevisionOutputCache uses post-processing options where appropriate (1267124).
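A toy illustration of the second bug, not the real RevisionOutputCache code: makeCacheKey and the option lists are hypothetical, and the key format is only loosely modeled on the "name=value!name=value" style MediaWiki uses. If an option that affects the output is left out of the key, two different renderings share one cache entry.

```javascript
// Hypothetical helper: build a cache key from the options that are
// considered to affect the output.
function makeCacheKey(prefix, revId, options, keyedOptions) {
  const parts = keyedOptions
    .filter((name) => name in options)
    .sort()
    .map((name) => `${name}=${options[name]}`);
  return `${prefix}:${revId}:${parts.join('!')}`;
}

const monobook = { useParsoid: 1, userlang: 'en', skin: 'monobook' };
const vector = { useParsoid: 1, userlang: 'en', skin: 'vector-2022' };

// Key built without post-processing options: both skins collide.
const parseOnly = ['useParsoid', 'userlang'];
const collidingA = makeCacheKey('rcache', 1074722, monobook, parseOnly);
const collidingB = makeCacheKey('rcache', 1074722, vector, parseOnly);

// Key that includes the skin: each skin gets its own entry.
const withPostproc = ['useParsoid', 'userlang', 'skin'];
const distinctA = makeCacheKey('rcache', 1074722, monobook, withPostproc);
const distinctB = makeCacheKey('rcache', 1074722, vector, withPostproc);
```

Here collidingA and collidingB are the same string, so whichever skin populates the cache first wins; distinctA and distinctB differ.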
Looking at, eg, https://als.wikipedia.org/w/index.php?title=Photosynthese&oldid=1074722&useskin=monobook the limit report says:
<!--
NewPP limit report
Parsed by mw‐web.eqiad.main‐6dbf997859‐jj4pp
Cached time: 20260402143044
Cache expiry: 2592000
Reduced expiry: false
Complications: [show‐toc, use‐parsoid]
CPU time usage: 0.942 seconds
Real time usage: 1.948 seconds
Preprocessor visited node count: 836/1000000
Revision size: 79883/2097152 bytes
Post‐expand include size: 10979/2097152 bytes
Template argument size: 2186/2097152 bytes
Highest expansion depth: 15/100
Expensive parser function count: 8/500
Unstrip recursion depth: 0/20
Unstrip post‐expand size: 1152/5000000 bytes
Lua time usage: 0.093/10.000 seconds
Lua memory usage: 4417372/52428800 bytes
Number of Wikibase entities loaded: 1/500
-->
<!-- Saved in RevisionOutputCache with key alswiki:parsoid-rcache:1074722:dateformat=default!useParsoid=1!userlang=en and timestamp 20260402143044 and revision id 1074722. -->
<!-- Post‐processing cache key alswiki:postproc‐parsoid‐pcache:44770:|#|:idhash:enableSectionEditLinks=0!injectTOC=0!postproc=1!skin=vector‐2022!useParsoid=1, generated at 20260402143045 -->
Note that the skin in the postproc cache key is vector-2022 despite useskin=monobook in the URL. This suggests that the way we are fetching the skin here isn't working. I'd expect the result would be "vector-2022 style" section edit links, even though we're supposed to be using monobook, in addition to the TOC differences.
Here's another interesting test case:
T55784 is marching along, we're shortly going to be able to close this 20-year-old feature request.
This might be fixed by Arlo's patch this week (Html headings aren't section wrapped (1244822)), since it appears this is styling around raw <h2> tags generated by
https://fr.wikipedia.org/w/index.php?title=Template:Section%20d%C3%A9roulante%20d%C3%A9but&action=edit
Some suggestions for reducing load:
Maybe add a lint for "mixing wikitext and table syntax" to make it clear this is considered not a good thing to do?
What version of PHP were you using? I suspect it was not PHP 8.
The fix was posted a few months ago in https://es.wikipedia.org/wiki/Usuario_discusi%C3%B3n:Qwertyytrewqqwerty/DisamAssist-core.js#c-Cscott-20251218143100-Updating_for_Parsoid_read_views but hasn't been applied by the gadget author yet.
This is most likely a bug in the gadget, and not an issue in Parsoid per se.
The patch "WIP: Use template3 tokenization for native parser functions/template expansion" (1189554) uses template3 tokenization for this, which should fix the problem eventually.
Shall we do this collaboratively, or does WMDE want to make some edits first and ping Content-Transform-Team for review, or what? This should be documented at https://www.mediawiki.org/wiki/Specs/HTML/2.8.0/Extensions/Cite .
Parsoid uses TemplateData and/or the pre-existing order (preserved in data-parsoid) to order template arguments. @thiemowmde is correct that Visual Editor plays little role in this.
Request from @Ottomata is to include limit report data, including cache key info, as well. Basically a version of the RenderDebugInfo stage, but putting it into a <script> tag in the <head> or something like that.
This is a known "missing feature" but not a priority for either editing or Content-Transform-Team at this time.
VE represents this as a <mw:signature> node, and VE tricks Parsoid into accepting this as an "inline transclusion" with the contents "~~~~", and apparently that works during html2wt. So this is fixed (or works) in the VE-to-wikitext transition; what's apparently not working is the wikitext-to-VE transition.
Oh, @hashar says in T421206: PHP Deprecated: Use of MediaWiki\Parser\ParserOutput::setOutputFlag with non-standard flag was deprecated in MediaWiki 1.45. [Called from MediaWiki\Parser\ParserOutput::initFromJson]:
There was roughly 85 of them happening while I was promoting group 1 and stopped once the train command had completed. I can imagine they are entries cached by newly promoted wmf.21 which are read by old wmf.20 processes?
Hm. The deprecation notice was added in I6363016b8bf1a09f104e475bfd949697d0df9a5c in Sep 2025. The warning should be triggered whenever any entry is added to the cache, which hasn't been happening for months now, far longer than a cache expiration time. And we've never deprecated or removed an existing parser output flag. So I'd understand if this happened during roll-back, when a flag existed in the "new" version that the "old" version didn't know about after the roll-back. But this shouldn't ever be generated by roll-forward as far as I know.
In T419328#11696990, @Winston_Sung wrote:
We should drop guessVariant and decide a way to set different Wikitext source code language instead.
In T407379#11641290, @A_smart_kitten wrote:
(^ re the move to 'To Verify') I can still personally repro this from the instructions in T407379#11287964.