Editor since 2005; WMF developer since 2013. I work on Parsoid and OCG, and dabble with VE, real-time collaboration, and OOjs.
On github: https://github.com/cscott
See https://en.wikipedia.org/wiki/User:cscott for more.
I think it's high time that the Parser/Linker maintain a list of interface-reserved prefixes (like n-, p-, and mw-), as well as a (short) list of legacy IDs (such as footer), that are automatically mapped to a different name to avoid clashes with interface styles.
For example, by prepending h- (for "heading"), or something like that. For compatibility this would of course be limited to cases where it causes a potential conflict. Doing this for the other 99.9% of headings is out of scope for this task.
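Something like this, as a rough sketch (not actual Parser/Linker code; the lists and the h- escape prefix here are just illustrative):

// Hypothetical helper: remap user-generated heading IDs that collide with
// interface-reserved prefixes or legacy interface IDs; everything else is untouched.
function remapReservedId( string $id ): string {
	$reservedPrefixes = [ 'n-', 'p-', 'mw-' ]; // interface-reserved prefixes
	$legacyIds = [ 'footer' ]; // plus whatever else ends up on the (short) legacy list
	foreach ( $reservedPrefixes as $prefix ) {
		if ( strpos( $id, $prefix ) === 0 ) {
			return 'h-' . $id; // h- for "heading"
		}
	}
	if ( in_array( $id, $legacyIds, true ) ) {
		return 'h-' . $id;
	}
	return $id; // the other 99.9% of headings pass through unchanged
}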
\Wikimedia\DoDo maybe? Makes "DOm DOcument" clearer? OTOH, maybe reads as the repeated imperative "do do" instead of the bird.
Can we open a new phab task for this? I apologize for not noticing/flagging this earlier. There are a number of tasks already in phab to deprecate and remove the old MediaWiki-internal language codes (including sr-ec, sr-el, etc.), and it would be a significant step backwards to have the old names written into article wikitext, which would require manually updating all that wikitext in the future.
Not necessarily going to work on this immediately (I've got higher-priority parser test tasks), but since I added the GetLinkColors hook to core/Parsoid I'll provisionally claim this task.
@GWicke's idea about putting the "document identity" in the CSS is interesting, so that a link could be styled as a self-link (or not) depending on the CSS that is applied to it.
I think addDBDataOnce is more fundamentally broken, and shouldn't be used.
Related Q: how can we make core CI run your test suite so that it doesn't just break Parsoid CI? Core CI *does* run some tests in a mode where Parsoid is installed -- can we add your tests to that group?
Yes, the Parser test runner setup creates its own interwiki table (using wgInterwikiCache) so that test results are not dependent on the host wiki configuration.
The local/global/site interwiki tables are implemented in the CDB caching layer; that's not expected to change.
Some comments left on https://gerrit.wikimedia.org/r/c/mediawiki/core/+/617294 -- see if you can determine if the $deps array is correct or not.
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/VisualEditor/+/649755 is my recommended fix here. It's been waiting for review for a while.
Yeah, this is a bug in the lua code. I've attempted to contact the author: https://fr.wikipedia.org/w/index.php?title=Discussion_module%3ACoordinates&type=revision&diff=177884216&oldid=173976505
I strongly suspect that someone is converting "-71.3" degrees to "71.3 S" by chopping off the first *byte*, instead of the first *character*.
The unicode minus sign is from formatnum -- it shouldn't be getting chopped up into bad UTF-8, unless someone somewhere is doing a naive substr(1, ...) or something like that. I'll look.
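A quick illustration of the failure mode I suspect (sketched in PHP for brevity; the actual module is Lua, where byte-oriented string.sub() vs character-aware mw.ustring.sub() is the analogous distinction):

// U+2212 MINUS SIGN (the character formatnum emits) is 3 bytes in UTF-8 (E2 88 92).
$lat = "\u{2212}71.3";
var_dump( substr( $lat, 1 ) );                    // chops one *byte*: leaves invalid UTF-8
var_dump( mb_substr( $lat, 1, null, 'UTF-8' ) );  // chops one *character*: "71.3"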
I bet something like __NO_P_WRAP__ would be fairly easy to support. Would it get enough adoption to get us closer to our goal of turning it off by default?
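For context on why I think it'd be easy: double-underscore behavior switches are cheap to register. A purely hypothetical sketch (nothing like this exists today; the names are made up):

class NoPWrapHooks {
	// The magic word itself (__NO_P_WRAP__) would also need an entry in an
	// i18n *.magic.php file; the Parser records which double-underscore
	// switches matched, and a p-wrapping pass could then check for this one.
	public static function onGetDoubleUnderscoreIDs( array &$doubleUnderscoreIDs ) {
		$doubleUnderscoreIDs[] = 'no_p_wrap';
	}
}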
But it looks like this isn't necessary; it happens already (as long as you don't hit the window between branch cut and branch commit merge). I don't know why/how, but it doesn't look like 618808 is required.
Oh, and Special:Version also gives credit to all of our active developers, which has a social importance that shouldn't be underestimated.
I use Special:Version as an active mediawiki developer all the time. Just sayin'.
See T270444: Parsoid needs a bidirectional interwiki map (and hooks) -- this mapping would have to be bidirectional to support Parsoid.
Any chance this is going to be taken up again?
Agreed!
Note that core already does a lot of 'expensive' startup work wrt loading extensions/etc on every request. So it doesn't really make sense to super-optimize this when we're still sitting behind core startup. Although latency is certainly additive, we should get a quantitative sense for what percentage of the request startup time Parsoid is responsible for.
I don't particularly like option 4, because I think it's a little too 'magical'. It covers up the NFC normalization under selser, which makes it less likely to cause dirty diffs (good!) but more surprising when the same bug creeps into edited HTML. But maybe defense in depth is warranted. I think option 2 is necessary because I think there are plenty of cases where the action API should *not* be trying to normalize the input string -- just immediately adjacent to the area changed in option 1 we see an attempt to pass *compressed* HTML to ApiVisualEditorEdit. I'm sure that ran into all sorts of mysterious problems because the binary deflated string was being NFC normalized...
^ this pair of patches implements "option 2" above.
Option 1: ^ the above is one possible fix here.
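To make the compressed-HTML point concrete, here's a tiny sketch (illustrative only, not the actual ApiVisualEditorEdit code path) of why blanket NFC normalization corrupts a non-text payload:

$html = '<p>Hellö wörld</p>';
$compressed = gzdeflate( $html );   // arbitrary binary, almost never well-formed UTF-8
$normalized = \Normalizer::normalize( $compressed, \Normalizer::FORM_C );
// Depending on the PHP/ICU version this is false or a mangled byte string;
// either way gzinflate( $normalized ) no longer round-trips back to $html.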
This is a very strange bug:
>>> json_encode(Validator::cleanUp("abc\u{2001}\u{2003}"))
=> ""abc\u2003\u2003""
>>> json_encode(Validator::NFD("abc\u{2001}\u{2003}"))
=> ""abc\u2003\u2003""
And of course since those are invisible characters, I need to look *real close* to see where that happened...
(but not all of them are invisible characters, I thought ed's test case had an omega...)
Oh, and indeed it has changed: the srcContent is \342\200\201 but the actual span contents are \342\200\203. How did *that* happen, I wonder?
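A plausible explanation, consistent with the cleanUp()/NFD() output above: U+2001 (EM QUAD) has a singleton canonical decomposition to U+2003 (EM SPACE), so any Unicode normalization pass rewrites it. Easy to check:

echo json_encode( \Normalizer::normalize( "\u{2001}", \Normalizer::FORM_C ) );
// => "\u2003"   (same result for FORM_D; the original character is gone either way)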
Getting there (slowly):
OLD DOM:
<p id="mwAw" data-parsoid='{"dsr":[17,343,0,0]}'>Vi kan förvänta oss att bilden är komplicerad när det gäller huruvida individer från Göteborg har ett tionde fonem, och i vilka ord de i så fall uttalar med ett tionde fonem. Det kan finnas infödda människor med arbetaryrken som uttalar många typiska <i id="mwBA" data-parsoid='{"dsr":[279,285,2,2]}'>ô</i><span typeof="mw:Entity" id="mwBQ" data-parsoid='{"src":"&#x2011;","srcContent":"‑","dsr":[285,293,null,null]}'>‑</span>ord med en regional form av <i id="mwBg" data-parsoid='{"dsr":[321,327,2,2]}'>å</i><span typeof="mw:Entity" id="mwBw" data-parsoid='{"src":"&#x2001;","srcContent":" ","dsr":[327,335,null,null]}'> </span>fonemet.</p>
After DOM diff:
<p id="mwAw" data-parsoid='{"dsr":[17,343,0,0]}' data-parsoid-diff='{"id":4946,"diff":["subtree-changed"]}'>Vi kan förvänta oss att bilden är komplicerad när det gäller huruvida individer från Göteborg har ett tionde fonem, och i vilka ord de i så fall uttalar med ett tionde fonem. Det kan finnas infödda människor med arbetaryrken som uttalar många typiska <i id="mwBA" data-parsoid='{"dsr":[279,285,2,2]}'>ô</i><span typeof="mw:Entity" id="mwBQ" data-parsoid='{"src":"&#x2011;","srcContent":"‑","dsr":[285,293,null,null]}'>‑</span>ord med en regional form av <i id="mwBg" data-parsoid='{"dsr":[321,327,2,2]}'>å</i><span typeof="mw:Entity" id="mwBw" data-parsoid='{"src":"&#x2001;","srcContent":" ","dsr":[327,335,null,null]}' data-parsoid-diff='{"id":4946,"diff":["children-changed","subtree-changed"]}'><meta typeof="mw:DiffMarker/deleted" data-parsoid="{}"/> </span>fonemet.</p>
Everything looks good, but selser is marking the entity as deleted for some reason.
In my local test w/ RESTBase, I got this:
I can confirm this is necessary to edit page titles containing slashes (whether they are subpages or not). I've added the apache information to the main VE configuration section: https://www.mediawiki.org/w/index.php?title=Extension:VisualEditor&type=revision&diff=4285160&oldid=4258839&diffmode=source
In T47096#6688455, @abi_ wrote: Thanks for your work, left a comment on the patch.
From an accessibility standpoint, there may be reasons to emit a descriptive <figcaption> even if it is not visible to a sighted user.
T118517: [RFC] Use <figure> for media, coming soon to a wiki near you.
Opened T270116: Figures should support `inline-start` and `inline-end` alignments in addition to `left` and `right`, for the general issue of supporting start and end as image alignment options.
I *think* what we should be doing is adding a class like mw-align-start instead of choosing left or right in the Linker. That would be float: inline-start, which could be simulated with:
body[dir=ltr] .mw-align-start { float: left }
body[dir=rtl] .mw-align-start { float: right }
We already have the GetLinkColors hook, called from LinkHolderArray, which Disambiguator uses to add the appropriate class.
Probably that hook is sufficient; we just need to restructure how the Parsoid DataAccess works. This would still require Disambiguator-specific information in the Parsoid 'API' backend, but that's probably reasonable.
We're probably getting to the same place from different directions: you're adding the media options to LST, I'm adding LST-like transclusion abilities to media. But yeah, that's the basic idea one way or the other. Key point is to specify the semantics rather than just add HTML tags onto the whitelist.
Here's a strawman proposal, just to wrap up the discussion for the moment: we have a float and size mechanism for media, which uses <figure>. I'd be interested in thinking about how we might add 'text' as a different sort of 'media'. You could imagine syntax like: {{Text:/Foo|aside|left}} (which maybe would include text from PageName/Foo) which would set the proper wrapper tag (<aside>), role, and styling.
I think <aside> like <section> is arguably part of the skin / meta-layout, not part of the article content. I've added lots of HTML5 elements to the whitelist, but I'd lean towards declining this one for now -- wikitext doesn't have a good page layout mechanism (although there are phab tasks for this, eg T90914: Provide semantic wiki-configurable styles for media display). It seems like a future page layout mechanism might want to generate <aside> itself, which would be complicated if we allowed wikitext to contain those tags directly.
@abi_ the latest version of https://gerrit.wikimedia.org/r/c/mediawiki/core/+/617294 should be ready to go; can you verify that it will work for your patch set?
Currently, if you comment "check experimental" on a gerrit patch, it will run "npm run api-testing".
So we'll deploy -a19 to group0 with the usual train at 2000 UTC, and verify that Parsoid -a19 at least doesn't crash and burn and break group0 before we then backport -a19 early to group1 and group2 in the backport window 2 hrs later at 0000 UTC. Does that timing work? If not, we can do the backport immediately after the train deploy, but we would like to see -a19 live on group0 at least for smoke testing before we go ahead and push it to all prod machines.
Ok, adding a patch to tonight's backport window which should resolve the issue (by early-deploying Parsoid -a19).
In T259832#6603251, @cscott wrote: Not sure why this didn't work for wmf.4 -- maybe it was another case where our cherry-pick landed between the time the branch was cut and the branch commit merged, and so the automatic update didn't work properly.
\Wikimedia\DoDo\Document ? Or \Wikimedia\DODO\Document?
In T269508#6674241, @matmarex wrote: Alternatively, maybe it's already possible to do this if you hard-code the HTTP username and password into the URL configured in $wgVirtualRestConfig['modules']['parsoid']['url']?
I'm boring. I suggested \Wikimedia\DOM\Document, and calling it just the "Wikimedia DOM library" or something like that.
In Parsoid this is WTUtils::isRenderingTransparentNode(), which seems to include:
@Esanders suspects that when they parse the page using Remex they are somehow losing the entity. I'm not convinced, but ed's going to try to trace the html into and out of discussion tools to figure out more precisely what's going on.
Foo {{category template}} becomes <p>Foo</p><p><meta....></p>, but in a comment, :Foo {{category template}} puts the <meta> tag inside the list item as expected?
@tstarling left a comment saying he was fine with my approach on https://gerrit.wikimedia.org/r/c/mediawiki/core/+/617294 so it looks like I just have to update that patch and get it merged.