Editor since 2005; WMF developer since 2013. I work on Parsoid and OCG, and dabble with VE, real-time collaboration, and OOjs.
On github: https://github.com/cscott
See https://en.wikipedia.org/wiki/User:cscott for more.
In T300325#7660824, @matmarex wrote: Thanks. So… should that use 'deduplicateStyles' => false, instead? Because Parsoid already deduplicates styles using a slightly different implementation?
I think this is a won't fix, although I'm open to hearing otherwise. The TOC can have no sections for a number of reasons, including that the author has explicitly requested it (as in this case), or that any sections seen are bogus (i.e., the content model is JavaScript), etc. Assuming that a page will always have sections is a misunderstanding, and from the above it seems a fix has already been committed to the affected project.
Yes, that's what all the patches do: split the cache. 800769 splits the cache via the key (parser option) and 883501 forces a split via the parser cache name. The question is how to describe how/why the parser cache is being split, and to do it in a way that doesn't depend on "magical" properties of the code.
I'm having a bit of trouble figuring out a coherent state for the useParsoid option. https://gerrit.wikimedia.org/r/c/mediawiki/core/+/800769 "uses" the useParsoid option on all paths through the parser, which is /safe/ (in terms of avoiding corruption of a single non-forked parser cache) but (as we discussed previously) means that after that patch were to land in production, any new parse of page [[X]] would invalidate all other parses of [[X]] in the parser cache -- i.e., if you look at the Parsoid version of X then the legacy versions of X get invalidated; if you do a new parse for a Spanish-speaking user of [[en:X]] it has the side effect of invalidating the existing English-language parse of [[X]]. (Briefly, this is because the new parse will have 'useParsoid' in its 'used options' set, with 'useParsoid' either true or false, and this will cause existing entries without 'useParsoid' in their used-options set to be invalidated.)
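To make that invalidation mechanism concrete, here is a minimal, self-contained sketch (not the actual ParserCache code; function and option names are made up for illustration) of why an entry written without 'useParsoid' in its used-options set stops matching once a new parse records that option:

<?php
// Toy model of a parser-cache options key: the key is derived only from the
// options the parse actually reported "using". An older entry that never
// consulted 'useParsoid' hashes differently from a new parse that consults it
// (even with useParsoid=false), so the older entry no longer matches.
function optionsKey( array $usedOptions, array $optionValues ): string {
	$parts = [];
	foreach ( $usedOptions as $name ) {
		$parts[] = $name . '=' . var_export( $optionValues[$name] ?? null, true );
	}
	sort( $parts );
	return md5( implode( '!', $parts ) );
}

// Existing legacy entry for [[X]]: 'useParsoid' was never consulted.
$oldKey = optionsKey( [ 'userlang' ], [ 'userlang' => 'en' ] );

// New parse after 800769: 'useParsoid' is consulted on every code path,
// so it lands in the used-options set even when its value is false.
$newKey = optionsKey(
	[ 'userlang', 'useParsoid' ],
	[ 'userlang' => 'en', 'useParsoid' => false ]
);

var_dump( $oldKey === $newKey ); // bool(false) -- the old entry is effectively invalidated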
Just as a quick add-on -- I think the real blocking child task here is to configure CI (phan?) to enforce some of these guidelines. If we want to have fine-grained stability markers ("some methods in Util are @stable and some are @internal") instead of the big-hammer "everything in this namespace is @stable") then we need some tooling support.
My interpretation would be that @internal is acceptable to use in core, but maybe we should be a little more precise, because it's not clear that we want to use *all* of this stuff indiscriminately in core -- at the very least we ought to mark the things used in core so that we don't 'accidentally' break core by changing them in Parsoid.
I have to make a decision about default case for T204370, but we might want to make this {{#toc}} (aka lowercase) instead of {{#TOC}}. The recommendation from Unicode is to make the word case-sensitive, not case-insensitive, since the rules for case folding are hard to localize well.
This parse contains the literal wikitext:
{{#REDIRECT [[]]}}
for some reason, in the place where a signature would generally appear. The edit was done by an IP user in March 2014:
https://en.wikipedia.org/w/index.php?title=Talk%3AMehath&diff=598632387&oldid=579155148&diffmode=source
using the wikitext source editor, apparently (ie, it is not tagged with Visual Editor or any other editor tag).
@daniel's team should probably be tagged here (I don't know the exact tag to use) as well as DiscussionTools -- seems related to the 'direct parsoid access' patches, but fundamentally seems to be about trying to invoke discussion tools on something which is not actually a talk page?
Should be fixed.
Ah, the 891358 patch should have been listed as depending on https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/888116 -- I'd never tested them in isolation and didn't expect one to be merged without the other. We probably need to tag a new parsoid with the miscellaneous test runner fixes in 888116 as well.
Probably caused by the merge of https://gerrit.wikimedia.org/r/c/mediawiki/core/+/891358 at 03:40 Mar 13.
I'm just going to mention composer update here so it shows up in my search (since I always forget) and mention https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/612431 as a patch which is blocked on the resolution of this bug.
This bug should be retitled if the actual problem to be solved is "hide template content from search engine snippets".
In T318433#8458276, @Arlolra wrote: In T318433#8434348, @cscott wrote: In T318433#8256434, @Izno wrote: Tangentially (or maybe not), editors should have access to <figure> and <figcaption> in wikitext per my comment in T25932#7070297.
That's a separate debate, one on which I think there is a wide gap in consensus.
Without figure/figcaption literals though, templates won't actually be able to mimic the parser output for the block form. We're back to div soup:
<div class="enwiki-figure">
  <a href=""><img src="" /></a>
  <div class="enwiki-figcaption"></div>
</div>
Briefly: "takes a load off /you/" -- I am a big fan of enabling the community to solve problems the WMF can't resource. But I think we should also acknowledge that a large part of /our/ burden (and why it takes so long to do things on wiki in general) is that we have a huge amount of legacy content to port forward with every change, and by Hyrum's Law, every feature and corner case and bug in our implementation *will* get used by the community and *will* cause us additional effort/porting time next time we need to change or fix that.
I'm looking at this.
Not sure why Parsoid is tagged here? @Arlolra might be able to offer some insight w/r/t the media styles; it's possible you can use a simplified version of the stylesheet if you are using *only* the "new style media" output; I think we currently have rules for both legacy media output and new media output enabled because there are transclusion scenarios which might cause 'legacy' markup to appear on 'new' pages.
The Content-Transform-Team never actually worked on ReadingLists, as far as I know. We could use some help here, since we're not familiar with these code bases at all.
But the way to do that is just to reorder the aliases. The whole point of the message in MessagesEn.php is to give the wiki community control over which of the alternatives is used by Visual Editor, by ordering the aliases in their localization. If you want VE to use English rather than localized messages, just list the English ones first. It's a SHOULD, not a MUST.
I'm not sure why this is needed/wanted? The guidance in MessagesEn.php states:
 * Note to localisers:
 * - Include the English magic words as synonyms. This allows people from
 *   other wikis that do not speak the language to contribute more easily.
 * - The first alias listed MUST be the preferred alias in that language.
 *   Tools (like Visual Editor) are expected to use the first listed alias
 *   when editing or creating new content.
 * - Order the other aliases so that common aliases occur before more rarely
 *   used aliases. The aliases SHOULD be sorted by the following convention:
 *   1. Local first, English last, then
 *   2. Most common first, least common last.
 * @phpcs-require-sorted-array
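To make that ordering concrete, here is a purely hypothetical fragment of a Messages*.php localisation (the local-language aliases are invented, not copied from any real file) following the convention above: preferred local alias first, other local aliases next, English alias last.

// Hypothetical example only -- the local-language aliases below are invented.
// Format: magic word id => [ case-sensitivity flag, alias1, alias2, ... ]
$magicWords = [
	'toc' => [ 0, '__LOCAL_TOC_ALIAS__', '__OTHER_LOCAL_ALIAS__', '__TOC__' ],
];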
It turns out that this issue *is* related to T306862, via a side-effect of calling LanguageConverter::convertTo(). The linked patch above describes the situation in its commit message.
In T306862#8671076, @Jdlrobson wrote: [mediawiki/core@master] WIP: Don't clear LanguageConverter display title when converting ToC
@cscott perhaps we could capture the remaining work in a new ticket?
I don't *think* that parsoid would ever split an existing list in two during selser, so my intuition is that this is probably more likely a DT-side bug w/ the DOM normalization that you are doing.
Ideally we should be using MediaWiki::getTitle() itself to do title lookup in the REST APIs, instead of trying to (partially) reimplement it.
(T26072 is a much older but perhaps related bug, although it was apparently fixed back in 2010.)
A somewhat cleaner diff:
Which is presently the latest version, aka the same as:
But note that
(aka the "no conversion" output) seem to appear correctly.
@Diskdance That does seem like expected behavior. We'd like to suggest that "best style" is to ensure all conversion rules are at the top of the document (ideally in glossaries, as is typical practice on zhwiki I believe), which avoids this particular discrepancy.
Probably https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/594557 would help, although the summary endpoint needs to actually ask for the content to be converted to zh (the base language) for that patch to help out.
From chat with WMDE:
Great, thanks!
We probably need to set up a conversation with Wikidata/WMDE about whether they are comfortable with representing "no short description wanted" as the empty string in Wikidata. If so, then there's a trivial change to https://en.wikipedia.org/wiki/Template:Short_description which will set the short description to '' if none is passed, and @vadim-kovalenko can simply look for '' in the short description API output to indicate that the "add short description" button should be suppressed (or whatever other representation of this situation the Wikidata team would prefer). If Wikidata /doesn't/ want to represent this case, making it "just" an enwiki feature, then I guess there's no better solution than to have @vadim-kovalenko look through the Parsoid HTML output for the template arguments and match against the English string 'none'. As documented at https://www.mediawiki.org/wiki/Specs/HTML/2.7.0#Template_markup you'd be looking for nodes with document.querySelectorAll('[typeof="mw:Transclusion"]') and then deserializing the data-mw attribute and looking for mw.parts[0].template.target.href === './Template:Short_description' and mw.parts[0].template.params["1"].wt === 'none' -- which is pretty hacky and ugly. A regexp on the raw wikitext is also possible and also ugly. Probably "better" would be to update [[Template:Short description]] to add pages with 'none' to a special category, and then we could check for that category instead of trying to pull out the template arguments.
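If someone does end up going the hacky data-mw route, a rough sketch of what that check might look like (here in PHP with DOMDocument, assuming $html already holds the page's Parsoid HTML; this follows the spec linked above but is illustrative, not a supported API):

<?php
// Illustrative only: scan Parsoid HTML for a {{Short description|none}} transclusion.
$doc = new DOMDocument();
@$doc->loadHTML( $html ); // $html: the page's Parsoid HTML (assumed to be in scope)
$xpath = new DOMXPath( $doc );
$suppressButton = false;
foreach ( $xpath->query( '//*[contains(@typeof, "mw:Transclusion")]' ) as $node ) {
	$dataMw = json_decode( $node->getAttribute( 'data-mw' ), true );
	$tpl = $dataMw['parts'][0]['template'] ?? null;
	if ( $tpl !== null
		&& ( $tpl['target']['href'] ?? '' ) === './Template:Short_description'
		&& ( $tpl['params']['1']['wt'] ?? '' ) === 'none'
	) {
		$suppressButton = true; // 'none' was passed: hide the "add short description" button
		break;
	}
}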
@vadim-kovalenko points out that https://en.wikipedia.org/api/rest_v1/page/mobile-html/User%3Acscott%2FTogetherJS.js "works" in production -- which apparently is just because production is still pointed to RESTBase while the development instance is pointed to the core APIs which are expected to replace RESTBase. So once that switchover is complete, apps will start displaying the "dummy content" message instead of any output, which is likely a bug in itself. Generally apps should be using the same mechanism for "non-wikitext content" pages as they do for (e.g.) Special pages. The question is how apps should know that a given Title corresponds to a non-wikitext content type.
The content on the app doesn't seem to be generated by the mobile html service, as the mobile html service shows the output that @vadim-kovalenko shows above and returns 400 for these pages (T324711). So this is probably not a mobile html service bug but instead a bug in the app itself, specifically whatever fallback path the app is using to generate content for these pages after the mobile html service returns 400. (I note that it is rendering the javascript "as if it were wikitext" as well, so there are bigger problems here than "just" the presence of the edit button.)
A free external link is an unbracketed url in wikitext; it is grouped with the other unbracketed link types (RFC/PMID/ISBN) in Parser::handleMagicLinks().
Please visit https://cscott.net <!-- free external link -->
There's a related conversation about tools for managing deploys and rollbacks -- it would be useful to have robust tools for purging the cache of content generated by the 'rolled back revision' when a rollback is needed during a deploy.
https://en.wikipedia.org/wiki/User:Cscott/TogetherJS.js is a user script on my page, so it should be protected, shouldn't it? And I think that https://en.wikipedia.org/wiki/User:GhostInTheMachine/SDsummary.js is the link to the protected page from @vadim-kovalenko's example.
This should ride the train this week; added User-notice to get a mention on tech news. If someone could post in the zhwiki village pump that would be helpful!
This was fixed and then reverted, so there's some work to do here still.
I think the subtask T329067 covers Parsoid collecting and propagating this information via ParserOutput (or Parsoid's own name for that, ContentMetadataCollector). /This/ task probably should cover whether to expose some of this metadata in the PageBundle API. Generally most of the metadata collected in ParserOutput probably doesn't belong in the page bundle (?) but I think there's a reasonable argument that cache lifetime in particular is likely to be of interest to anyone consuming the content from the REST APIs.
I think I fixed this via the subtask, but it should be tested -- I don't know of any explicit checks in core for this (and there are some related patches pending in core, esp. T320668 / https://gerrit.wikimedia.org/r/c/mediawiki/core/+/569628).
This is incorrect -- the page language is *never* a variant. Only the user interface language can be a variant.
Might be worth double-checking language-variant redirects as well.
This is blocked because JsonCodec is core-only, and can't be used from the Parsoid library without a cyclic dependency. So this task is effectively blocked until JsonCodec is brought out of core. There's a proposal for doing this at https://github.com/cscott/json-codec but I'm having trouble getting consensus on what the library should look like.
In core this is handled via LanguageConverter::findVariantLink(), which is invoked early in the web request dispatch, in MediaWiki::parseTitle(), and in the action API endpoint, in ApiPageSet::processTitlesArray(). IMO it should be done in a top-of-stack manner in the REST APIs as well.
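For reference, a hedged sketch of what doing that near the top of a REST handler might look like, modeled on what MediaWiki::parseTitle() does in core; the surrounding handler variables (e.g. $titleText) are hypothetical.

<?php
use MediaWiki\MediaWikiServices;

// Sketch only: resolve a variant spelling of a title the way core's web dispatch does.
$services = MediaWikiServices::getInstance();
$title = $services->getTitleFactory()->newFromText( $titleText ); // $titleText: the raw {title} path parameter (assumed)

if ( $title !== null && !$title->exists() ) {
	// Let the content-language converter look for a variant form of the title,
	// as MediaWiki::parseTitle() and ApiPageSet::processTitlesArray() do.
	$converter = $services->getLanguageConverterFactory()
		->getLanguageConverter( $services->getContentLanguage() );
	$converter->findVariantLink( $titleText, $title, true );
	// $title now points at the existing variant form, if one was found.
}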
We should be using the Wikidata api to get short descriptions, instead of trying to look for a specific template (or magic word).
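For illustration, a minimal sketch of fetching a description from the Wikidata API (action=wbgetentities); the item id $qid is a placeholder, error handling is omitted, and whether Wikidata is the right source for enwiki short descriptions is exactly the open question here.

<?php
// Minimal sketch: look up the English description of a Wikidata item.
// $qid (e.g. 'Q42') is a placeholder; real code would resolve the page's item id
// first and use MediaWiki's HTTP client rather than file_get_contents().
$url = 'https://www.wikidata.org/w/api.php?action=wbgetentities&format=json'
	. '&props=descriptions&languages=en&ids=' . urlencode( $qid );
$data = json_decode( file_get_contents( $url ), true );
$description = $data['entities'][$qid]['descriptions']['en']['value'] ?? null;
// $description === null could then mean "no (or suppressed) short description".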
So the state of play is that we need Ed's patches deployed, and then we can make the config change to turn on native galleries. My understanding is that we've shipped nativeGalleriesEnabled=>true as the default configuration for "quite a while" even though it's not enabled in production everywhere quite yet.
Merged and deployed in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Kartographer/+/888754
Merged & deployed in:
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ReadingLists/+/888038
The patch was applied to page bundles, but it appears the bug is with the summary endpoint? I'm a little confused. AIUI the title redirection should happen in core, common to all endpoints, and it's not clear a patch to do that has landed yet.
In T329170#8607072, @thiemowmde wrote: Possible plan:
- Add backwards compatibility code that keeps the old hashes working for a while.
- Note we plan to change the hashes anyway because of parse time expansion. So we need this either way.
- Make the hashes stable, only based on the raw wikitext, but not on language variant and whatnot.
- We are now in a situation where users will see maps with the wrong language variant, wrong thumbsize and whatnot. This is already a massive improvement compared to the broken maps from before.
- Make sure the mapdata API is called with all necessary information to be able to deliver the map in the correct variant.
Ok, https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/889641 is merged to mediawiki-vendor and https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/889607 is the cherry-pick to the wmf.23 branch. We'll leave it to ops' discretion whether they want to merge that to wmf.23 and deploy the backport, or leave that cherry-pick unmerged and just suppress the errors.
I'm starting the patch-and-tag-and-release-to-vendor process with https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/889637 and we'll get that merged to mainline mediawiki-vendor, then I'll post here and leave it up to you all on ops whether you want to backport that mediawiki-vendor patch to wmf.23.
The spike seems related to the transition between wmf.22 and wmf.23, perhaps? There are logs like https://logstash.wikimedia.org/goto/a21be3de9c29c8905f2621f63dbb0c92 which are against wmf.22 and refer to properties which were removed from wmf.23, so it's possible the front end machines saw an inconsistent vendor directory briefly during the deploy?
As a quick-and-dirty backport fix we could also just comment out the call to ::computeSectionMetadata(), couldn't we? That information isn't used by anything yet. That would give us more time to do a proper fix, including a test case, for the next train.
The Content-Transform-Team has continued to work on the Table of Contents. Potentially risky changes:
I think you just need to run composer update to get the latest Parsoid version; that should stop the warning for you.