
cscott (C. Scott Ananian)
Parser whisperer

Projects (15)

User Details

User Since
Oct 21 2014, 6:47 PM (400 w, 3 d)
Availability
Available
IRC Nick
cscott
LDAP User
Unknown
MediaWiki User
Cscott [ Global Accounts ]

Editor since 2005; WMF developer since 2013. I work on Parsoid and OCG, and dabble with VE, real-time collaboration, and OOjs.

On github: https://github.com/cscott

See https://en.wikipedia.org/wiki/User:cscott for more.

Recent Activity

Today

cscott updated subscribers of T293512: ParserOutput::getText() should be removed from ParserOutput.

More discussion from slack:

@daniel By the way, I'm inclined to replace ?stash=true with a format, html_for_editing or some such. Or we could have both: html_for_editing has parsoid data, but if you combine it with stash, it has IDs. The stash param would do nothing with normal /html.
@cscott I think my preference is for a "format" option, because I think there will be (eventually) lots of different post processing combinations.
@cscott (Although I'd /like/ to avoid that.)
@cscott for example, "mobile HTML" is a thing
@cscott and "TOC included, yes/no" is already a thing
@cscott basically all the options in ParserOutput::getText() and then some are probably going to show up in the API endpoint at some point.
@cscott although one hopes that they are all coming from the same cached content/ParserOutput and are just various postprocessing done on it
@daniel ParserOutput::getText needs to be factored out urgently, before it accumulates more logic
@cscott Yes, I've even got a phab task for it
@cscott https://phabricator.wikimedia.org/T293512
@cscott any bikeshedding you want to do about the name/location/etc of the future "post processing parserOutput" class/hierarchy would be welcome. I've speculated that the Content hierarchy might be a good place, but I'm not really familiar with everything that lives there now.
@cscott I think my vague idea is that we might have a small number of different "format" options like "read", "edit", "mobile", "discussiontools" (!) which each map internally into a set of postprocessing options/steps like "discardDataParsoid", "no-toc", etc, to avoid exposing the entire postprocessing chain as part of the API.
@daniel Yes, that sounds reasonable.
@cscott The other option is just to allow an extensible number of "transforms" and ask for them explicitly. That is a bit more future-proof, in that it allows you to (say) transition the mobile app to no-toc content by having the latest version of the app explicitly request options=mobile,no-toc or whatever, while not breaking older versions of the app which expect toc inline in the HTML.
@daniel I'd not put the post processing in the content hierarchy. I'd prefer them to be orthogonal. The same post-processing could be applied to output generated from various kinds of content.
@daniel The thought experiment I'd make is "if someone wrote extensive markdown support for mediawiki, would they want this transformation to work on their output"?
@cscott maybe something under MediaWiki\Parser then?
@cscott Red link tagging is a good thought experiment, you'd probably want that to work on markdown.
@cscott and it's specific to mediawiki & requires database access, eg, so fits better in core than in Parsoid.
@daniel I'd want variant conversion to work on markdown too.
@daniel In my mind, these transformations have nothing to do with wikitext. They are html-to-html transformations. They are independent of the content type, and should be oblivious of whatever generates the HTML, or what from.
@cscott "strip infoboxes" is something that happens in mobile HTML, I guess that's something you'd do for markdown as well? It's really an artifact of the fact that we never fully pursued MCR so the infoboxes are still "inline" with the main article content instead of being a separate content type.
@daniel Not sure MCR would remove infoboxes completely from the content. The layout gets tricky if you want to do it generically, especially if there are many kinds of infobox on the same page
@cscott (i'll just say that "page layout" wants to be an MCR type of its own.)
@daniel I'd probably start a new top level namespace for "output transformation".
@cscott MediaWiki\OutputTransform ?
@daniel MediaWiki\OutputTransform sounds good to me
@ssastry ( oh ...lots of discussion here ... will catch up later) .. but MediaWiki\html2html is less cumbersome?
@ssastry but, i suppose that ties it to html as a format.
@cscott (it's also a bit tricky by the specific way we use "html" and "dom" in parsoid, where "html" is usually "html as a string" and "dom" is "html as a parsed DOM tree"; to be consistent I'd hope this would be "dom2dom")
@ssastry I will also add another variable into this space ... does a html2html transform always have to live as core code? Are they / can they be "micro"services?
@daniel html2html would also work for me. I want it to be tied to HTML as a format.
@daniel Other kinds of transformations are conceptually different.
@daniel DOMTransform also works for me. Anything that doesn't bind it to wikitext or parsing :slightly_smiling_face:
@ssastry scott, dom/html is a detail that can be combined in html2html, in my view.
@cscott the word "output" is more-or-less specific in mediawiki: ParserOutput/OutputPage/etc.
@daniel ParserOutputTransform would be nice and specific, but maybe overly so.
@cscott the api question is just "does the set of postprocessing transforms want to be exposed as a limited number of 'format's, or as an infinitely-extensible set of boolean flags"
@daniel for wikidata rdf exports, we invented a system of "flavors" for this purpose
@cscott (subbu doesn't like ParserOutputTransform, but I will submit that it is extremely meaningful to a Mediawiki core developer, and it would be 100% clear that ParserOutput::getText() should live there.)
@daniel I agree that it's both very clear and very ugly 😉
@cscott yes, it shares the "very ugly" (but not the "very clear") with most parts of core. :slightly_smiling_face:
@daniel hehe...
@cscott My gut feeling is that "format" should refer to "cachable or database" things, eg wikitext from the db, html from the cache, future-expensive-to-recompute-derived-content-type.
@cscott and some other word should be used for the set of transformations you can make to that 'format'. I like 'flavor', fwiw.
@cscott format=html,flavor=edit or format=html,flavor=mobile,edit
@cscott those dimensions seem to make intuitive sense
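
The "flavor" idea discussed above (a small set of public names, each mapping internally to a list of html-to-html postprocessing steps) might be sketched roughly as follows. Everything here is hypothetical: the names `transforms`, `flavors` and `applyFlavors` are invented for illustration, and the regex-based steps are crude stand-ins for real DOM transforms, not how core or Parsoid implement them.

```javascript
// Hypothetical postprocessing steps: each one is a plain
// HTML-string-to-HTML-string function (stand-ins for real DOM passes).
const transforms = new Map( [
	[ 'discardDataParsoid', ( html ) => html.replace( / data-parsoid="[^"]*"/g, '' ) ],
	[ 'no-toc', ( html ) => html.replace( /<div id="toc">[\s\S]*?<\/div>/, '' ) ]
] );

// Hypothetical mapping from a public "flavor" name to internal steps,
// so the full postprocessing chain is not exposed through the API.
const flavors = new Map( [
	[ 'read', [ 'discardDataParsoid' ] ],
	[ 'edit', [] ],
	[ 'mobile', [ 'discardDataParsoid', 'no-toc' ] ]
] );

// Apply the union of the steps implied by the requested flavors,
// e.g. flavor=mobile,edit in the notation used above.
function applyFlavors( html, flavorNames ) {
	const steps = new Set();
	for ( const name of flavorNames ) {
		if ( !flavors.has( name ) ) {
			throw new Error( 'Unknown flavor: ' + name );
		}
		flavors.get( name ).forEach( ( step ) => steps.add( step ) );
	}
	let out = html;
	for ( const step of steps ) {
		out = transforms.get( step )( out );
	}
	return out;
}
```

The alternative raised in the discussion (asking for transforms explicitly) would amount to exposing the keys of `transforms` directly instead of the keys of `flavors`.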

Fri, Jun 24, 10:48 PM · MediaWiki-Parser, Parsoid
cscott updated subscribers of T293512: ParserOutput::getText() should be removed from ParserOutput.
Fri, Jun 24, 10:30 PM · MediaWiki-Parser, Parsoid
cscott updated subscribers of T293512: ParserOutput::getText() should be removed from ParserOutput.

Some discussion copied from slack:

@cscott: The key refactor there is ParserOutput::getText() https://phabricator.wikimedia.org/T293512 which doesn't belong there -- it does a bunch of content post-processing that shouldn't belong in a "dumb array of keys".
@ihurbain: and where would that belong, ultimately?
@cscott: Not sure! The phab task speculates "somewhere in the Content* hierarchy"? But maybe we'll want a Postprocess.php class on par with Parser.php? Or else this goes into OutputPage, which generally contains that sort of thing. That's a big question mark. But it's definitely "not in ParserOutput".

Fri, Jun 24, 10:26 PM · MediaWiki-Parser, Parsoid

Yesterday

cscott created T311274: Graph extension should use modern Hooks style.
Thu, Jun 23, 9:00 PM · MediaWiki-extensions-Graph
cscott added a comment to T310197: Move editing toolbar below page toolbar.

See also: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+blame/refs/heads/REL1_38/includes/parser/Sanitizer.php#567

Thu, Jun 23, 4:39 PM · MW-1.39-notes (1.39.0-wmf.18; 2022-06-27), Patch-For-Review, Readers-Web-Backlog (Kanbanana-FY-2021-22), Desktop Improvements

Wed, Jun 22

cscott added a comment to T309660: VisualEditor not working under fresh install of mediawiki-1.37.2 when HTTP is not enabled.

Should be fixed in MW1.39 due to T305108: Zero Config Install of VE + Parsoid for MW 1.39.

Wed, Jun 22, 3:18 PM · MW-1.39-release, Editing-team (Tracking), VisualEditor
cscott added a subtask for T305108: Zero Config Install of VE + Parsoid for MW 1.39: T269508: VisualEditor gives 401 when behind basic auth.
Wed, Jun 22, 3:18 PM · MW-1.39-notes (1.39.0-wmf.14; 2022-05-30), Editing-team (Tracking), Parsoid (Third-party), MW-1.39-release, VisualEditor
cscott added a parent task for T269508: VisualEditor gives 401 when behind basic auth: T305108: Zero Config Install of VE + Parsoid for MW 1.39.
Wed, Jun 22, 3:18 PM · Parsoid (Tracking), VisualEditor
cscott added a subtask for T305108: Zero Config Install of VE + Parsoid for MW 1.39: T309660: VisualEditor not working under fresh install of mediawiki-1.37.2 when HTTP is not enabled.
Wed, Jun 22, 3:17 PM · MW-1.39-notes (1.39.0-wmf.14; 2022-05-30), Editing-team (Tracking), Parsoid (Third-party), MW-1.39-release, VisualEditor
cscott added a parent task for T309660: VisualEditor not working under fresh install of mediawiki-1.37.2 when HTTP is not enabled: T305108: Zero Config Install of VE + Parsoid for MW 1.39.
Wed, Jun 22, 3:17 PM · MW-1.39-release, Editing-team (Tracking), VisualEditor
cscott added a comment to T269508: VisualEditor gives 401 when behind basic auth.

This shouldn't be an issue in MW 1.39, which doesn't require a separate REST request for zeroconf VE: T305108: Zero Config Install of VE + Parsoid for MW 1.39

Wed, Jun 22, 3:17 PM · Parsoid (Tracking), VisualEditor

Tue, Jun 21

mpopov awarded T149667: Amazing Article Annotations a Baby Tequila token.
Tue, Jun 21, 3:43 PM · Wikispeech-Text-to-Speech, Parsing-Team--ARCHIVED, Cite, VisualEditor, MediaWiki-extensions-Translate, Wikimedia-Developer-Summit (2017)

Fri, Jun 17

cscott added a comment to T153801: File and global user pages should not be redirected.

It seems like we're not *necessarily* saying that the redirect shouldn't happen, but that instead perhaps it ought to include a language parameter? T301372#8011652 discusses other places where the redirect ought to depend on the user language.

Fri, Jun 17, 4:08 PM · Platform Team Legacy (Later), Patch-For-Review, Product-Infrastructure-Team-Backlog, Services (next), Parsoid, RESTBase, Mobile-Content-Service
cscott added a comment to T301372: Core HTML REST API should follow redirects.

Global user pages are another instance which *perhaps* should be handled via the generic redirect service, despite the title of this task: T153801: File and global user pages should not be redirected. (I think the title is mostly referring to the fact that the language settings should be taken into account when the redirect is done, which was mentioned in the comment above.)

Fri, Jun 17, 4:06 PM · Platform Team Workboards (MW Expedition), Code-Health-Objective, VisualEditor, Platform Engineering Roadmap
cscott added a comment to T69486: Links: Add support for self-links to Parsoid.

I think the current plan is to do this in the redlink-marking pass (or in a new similar postprocessing pass) so that we can reuse core parsoid html as fragments in other pages, but the selflinks are replaced "at the last minute".

Fri, Jun 17, 2:09 PM · Parsoid-Read-Views (Phase 2 - testwiki Main namespace support), Growth-Team-Filtering, Parsoid-Rendering, Growth-Team, StructuredDiscussions, Parsoid
cscott added a comment to T301372: Core HTML REST API should follow redirects.

Note that the redirects functionality in RESTBase has some known bugs as well. There are the following separate aspects to how redirects "should" work:

Fri, Jun 17, 1:47 PM · Platform Team Workboards (MW Expedition), Code-Health-Objective, VisualEditor, Platform Engineering Roadmap
cscott added a subtask for T277824: [EPIC] Language variant issues: T69486: Links: Add support for self-links to Parsoid.
Fri, Jun 17, 1:41 PM · Epic, Wikipedia-iOS-App-Backlog
cscott added a parent task for T69486: Links: Add support for self-links to Parsoid: T277824: [EPIC] Language variant issues.
Fri, Jun 17, 1:41 PM · Parsoid-Read-Views (Phase 2 - testwiki Main namespace support), Growth-Team-Filtering, Parsoid-Rendering, Growth-Team, StructuredDiscussions, Parsoid
cscott added a comment to T182351: Make HTML dumps available.

HTML dumps are already available in https://dumps.wikimedia.org/other/enterprise_html/ ; see also T302237: Outreachy Project (Round 24): Build Python library to work with html-dumps.

Fri, Jun 17, 1:35 PM · Research, Analytics-Radar, Datasets-Archiving
cscott added a comment to T182351: Make HTML dumps available.

This has been requested by the kiwix team multiple times over the years. Hopefully this would be parsoid-format HTML dumps.

Fri, Jun 17, 1:28 PM · Research, Analytics-Radar, Datasets-Archiving

Wed, Jun 15

cscott added a comment to T307251: [ToC] Show new/modified sections after publishing an edit (new floating ToC).

I think this part of the task can be unblocked, as we've got some consensus around the ParserOutput portion of the task. We need a patch which sets the ParserOptions flag to finish the work on T307691, and I've got an initial version here: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/805897

Wed, Jun 15, 9:15 PM · Patch-For-Review, VisualEditor, Desktop Improvements, Readers-Web-Backlog (Kanbanana-FY-2021-22)

Tue, Jun 14

cscott added a project to T273505: Merge mw:Image|mw:Audio|mw:Video into a single mw:File: User-notice.
Tue, Jun 14, 5:54 PM · User-notice, MW-1.39-notes (1.39.0-wmf.17; 2022-06-20), Parsoid-Read-Views (Phase 0 - Parsoid-Media-Structure), Parsoid, Parsoid-Media-Structure, Patch-For-Review

Mon, Jun 13

cscott created T310544: Ensure MobileFrontend works with Parsoid read views for discussion tools.
Mon, Jun 13, 7:41 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Editing-team (Tracking), Parsing-Active-Work, DiscussionTools, Parsoid
cscott created T310526: Parsoid read views doesn't support -{T|...}- page title markup.
Mon, Jun 13, 4:59 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Editing-team (Tracking), Parsing-Active-Work, DiscussionTools, Parsoid
cscott added a subtask for T310512: Parsoid and the legacy parser should emit exactly the same ParserOutput metadata: T305159: Improve coverage of the ContentMetadataCollector interface.
Mon, Jun 13, 4:45 PM · Parsoid-Read-Views (Phase 4 - Parsoid generates metadata needed by core), Parsoid
cscott added a parent task for T305159: Improve coverage of the ContentMetadataCollector interface: T310512: Parsoid and the legacy parser should emit exactly the same ParserOutput metadata.
Mon, Jun 13, 4:45 PM · Patch-For-Review, Parsoid
cscott created T310520: Parsoid content not compatible with `index.php?title=` URLs.
Mon, Jun 13, 4:00 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Editing-team (Tracking), Parsing-Active-Work, DiscussionTools, Parsoid
cscott updated the task description for T310512: Parsoid and the legacy parser should emit exactly the same ParserOutput metadata.
Mon, Jun 13, 3:29 PM · Parsoid-Read-Views (Phase 4 - Parsoid generates metadata needed by core), Parsoid
cscott edited projects for T310512: Parsoid and the legacy parser should emit exactly the same ParserOutput metadata, added: Parsoid-Read-Views (Phase 4 - Parsoid generates metadata needed by core); removed Parsoid-Read-Views (Phase 1 - DiscussionTools support).
Mon, Jun 13, 3:04 PM · Parsoid-Read-Views (Phase 4 - Parsoid generates metadata needed by core), Parsoid
cscott created T310512: Parsoid and the legacy parser should emit exactly the same ParserOutput metadata.
Mon, Jun 13, 3:03 PM · Parsoid-Read-Views (Phase 4 - Parsoid generates metadata needed by core), Parsoid
cscott created T310511: Metadata comparison testing between Parsoid and the legacy parser.
Mon, Jun 13, 3:01 PM · Parsoid-Read-Views (Phase 4 - Parsoid generates metadata needed by core), Parsoid
cscott updated the task description for T297840: Populate ParserCache with Parsoid output as a post-edit hook.
Mon, Jun 13, 3:00 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Parsoid
cscott created T310510: Discussion tools visual diff testing between parsoid read views and legacy read views.
Mon, Jun 13, 2:59 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Parsoid

Fri, Jun 10

cscott added a comment to T206765: config-mysql-old not being substituted by update.php.

Regression in 1.39: T310378: Messages don't get localized in the updater.

Fri, Jun 10, 4:34 PM · Patch-For-Review, MW-1.31-release-notes, MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), MW-1.31-release, MW-1.32-release, MediaWiki-Installer
cscott created T310378: Messages don't get localized in the updater.
Fri, Jun 10, 4:32 PM · MediaWiki-Internationalization, I18n, MediaWiki-Installer, Patch-For-Review
Debber1 awarded T305108: Zero Config Install of VE + Parsoid for MW 1.39 a Love token.
Fri, Jun 10, 5:34 AM · MW-1.39-notes (1.39.0-wmf.14; 2022-05-30), Editing-team (Tracking), Parsoid (Third-party), MW-1.39-release, VisualEditor

Thu, Jun 9

Bertvandepoel awarded T305108: Zero Config Install of VE + Parsoid for MW 1.39 a Party Time token.
Thu, Jun 9, 9:35 PM · MW-1.39-notes (1.39.0-wmf.14; 2022-05-30), Editing-team (Tracking), Parsoid (Third-party), MW-1.39-release, VisualEditor
cscott created T310283: Performance: improve speed of SiteConfig creation.
Thu, Jun 9, 1:56 PM · MW-1.39-notes (1.39.0-wmf.16; 2022-06-13), Parsoid

Tue, Jun 7

cscott added a comment to T306112: Cloud VPS "wikitextexp" project Stretch deprecation.

(off topic: why does it say "developer account not linked to phabricator" by my name? I'm @cscott here... is there some hidden preference I need to update to make that correspondence?)

Tue, Jun 7, 8:58 PM · Parsoid, Cloud-VPS (Debian Stretch Deprecation)
cscott created T310083: ToC issues.
Tue, Jun 7, 3:53 PM · MW-1.39-notes (1.39.0-wmf.17; 2022-06-20), Patch-For-Review, MediaWiki-Parser
cscott added a comment to T306876: Android app edit summary becomes URL encoded.

The bug seems to be in lib/transformations/wrapSections.js in mobileapps, which is trying to recreate section information from the legacy HTML output.
https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/mobileapps/+/refs/heads/master/lib/transformations/wrapSections.js#33

const makeSection = ( heading, id, text ) => {
	const headingLabel = heading ? heading.querySelector( 'span[id]' ) : null;
	return {
		id,
		anchor: headingLabel ? headingLabel.getAttribute( 'id' ) : '',
		toclevel: heading ? getTocLevel( heading ) : undefined,
		line: heading ? heading.textContent : undefined,
		text
	};
};

I'm pretty sure the heading.querySelector( 'span[id]' ) should be heading.querySelector( 'span.mw-headline[id]' ) or something like that.
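
A rough sketch of the suggested change, for illustration only: the only intended difference from the snippet above is the stricter selector, so that some other `span` carrying an `id` inside the heading is presumably no longer mistaken for the headline anchor. The `getTocLevel` shown here is a stand-in so the sketch is self-contained; it is an assumption and not the real helper from mobileapps.

```javascript
// Stand-in for the real getTocLevel helper in mobileapps (assumption):
// derive a ToC level from the heading tag name, so that H2 gives 1.
const getTocLevel = ( heading ) => parseInt( heading.tagName.slice( 1 ), 10 ) - 1;

const makeSection = ( heading, id, text ) => {
	// Match only the headline span, not any other span that has an id.
	const headingLabel = heading ? heading.querySelector( 'span.mw-headline[id]' ) : null;
	return {
		id,
		anchor: headingLabel ? headingLabel.getAttribute( 'id' ) : '',
		toclevel: heading ? getTocLevel( heading ) : undefined,
		line: heading ? heading.textContent : undefined,
		text
	};
};
```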

Tue, Jun 7, 3:20 PM · Product-Infrastructure-Team-Backlog (Kanban), Wikipedia-Android-App-Backlog, Chinese-Sites
cscott added a comment to T306876: Android app edit summary becomes URL encoded.

This appears correctly on desktop, eg: https://zh.wikipedia.org/w/index.php?title=2011%E5%B9%B4%E5%A4%A7%E8%A5%BF%E6%B4%8B%E9%A3%93%E9%A3%8E%E5%AD%A3%E6%97%B6%E9%97%B4%E8%BD%B4&action=history

Tue, Jun 7, 3:03 PM · Product-Infrastructure-Team-Backlog (Kanban), Wikipedia-Android-App-Backlog, Chinese-Sites

Mon, Jun 6

cscott added a watcher for Parsoid-Read-Views: cscott.
Mon, Jun 6, 3:06 PM

Thu, Jun 2

cscott added a comment to T91154: {{=}} should be a parser function.

Thanks so much @AlexisJazz !

Thu, Jun 2, 8:43 PM · MW-1.39-notes (1.39.0-wmf.14; 2022-05-30), MW-1.36-notes (1.36.0-wmf.10; 2020-09-22), MediaWiki-Installer, MW-1.35-notes (1.35.0-wmf.40; 2020-07-07), MediaWiki-Page-editing, User-notice, MediaWiki-Parser
cscott added a comment to T302237: Outreachy Project (Round 24): Build Python library to work with html-dumps.

The Content-Transform-Team maintains the HTML format specifications (informally "Parsoid HTML" as opposed to the HTML currently displayed on the web site), and may be a useful resource for questions about (eg) how templates are represented in the HTML dump. Without distracting too much, the following projects might be an inspiration for how an "easy to use" API might look:

The Kiwix project also uses "Parsoid HTML" format dumps: https://www.kiwix.org/en/

Thu, Jun 2, 4:13 PM · Research (FY2021-22-Research-April-June), Outreach-Programs-Projects, Outreachy (Round 24)

Wed, Jun 1

cscott added a comment to T91154: {{=}} should be a parser function.

@AlexisJazz thanks, the tech news entry looks good. I'd probably add another sentence or two mentioning that this has been "in process" since July 2020 (maybe link to the previous tech news entries) and giving credit to the many editors who helped make this happen: https://meta.wikimedia.org/wiki/Equals_sign_parser_function_template_conflicts

Wed, Jun 1, 4:37 PM · MW-1.39-notes (1.39.0-wmf.14; 2022-05-30), MW-1.36-notes (1.36.0-wmf.10; 2020-09-22), MediaWiki-Installer, MW-1.35-notes (1.35.0-wmf.40; 2020-07-07), MediaWiki-Page-editing, User-notice, MediaWiki-Parser

Tue, May 31

Jdlrobson awarded T293513: Deprecate and remove ParserOutput::setTOCHTML() a Like token.
Tue, May 31, 3:34 PM · MW-1.38-notes (1.38.0-wmf.5; 2021-10-19), Platform Team Workboards (MW Expedition), Parsoid-Read-Views (Phase 2 - testwiki Main namespace support), Parsoid
Jdlrobson awarded T293513: Deprecate and remove ParserOutput::setTOCHTML() a Dislike token.
Tue, May 31, 3:34 PM · MW-1.38-notes (1.38.0-wmf.5; 2021-10-19), Platform Team Workboards (MW Expedition), Parsoid-Read-Views (Phase 2 - testwiki Main namespace support), Parsoid
cscott added a comment to T307251: [ToC] Show new/modified sections after publishing an edit (new floating ToC).

I think this should be handled similarly to how we handle categories, specifically:

  • Reading web should add a tochtml param to action=parse (like categorieshtml), that returns the ToC HTML iff the ToC is not present in the content (i.e. in Vector-2022)
  • When VE sees data in tochtml result, we can render it to a specific skin element (again, like categorieshtml).
  • wikipage.content hook fires and Vector adds interactivity back
Tue, May 31, 3:23 PM · Patch-For-Review, VisualEditor, Desktop Improvements, Readers-Web-Backlog (Kanbanana-FY-2021-22)
cscott added a subtask for T293513: Deprecate and remove ParserOutput::setTOCHTML(): T218330: Table of contents HTML may be unbalanced.
Tue, May 31, 3:20 PM · MW-1.38-notes (1.38.0-wmf.5; 2021-10-19), Platform Team Workboards (MW Expedition), Parsoid-Read-Views (Phase 2 - testwiki Main namespace support), Parsoid
cscott added a parent task for T218330: Table of contents HTML may be unbalanced: T293513: Deprecate and remove ParserOutput::setTOCHTML().
Tue, May 31, 3:20 PM · Parsing-Team--ARCHIVED, MediaWiki-Parser

Fri, May 27

cscott added a comment to T307691: JS pages show ToCs in vector-2022.

Could someone clarify the code path here? We're rendering as Wikitext and generating a ParserOutput for category/backlink/etc purposes, then throwing away the generated HTML and rendering the code as preformatted text (or some such) instead?

Fri, May 27, 4:00 PM · Readers-Web-Backlog, MediaWiki-Parser, Patch-For-Review, Desktop Improvements

May 25 2022

matmarex awarded T305108: Zero Config Install of VE + Parsoid for MW 1.39 a Party Time token.
May 25 2022, 1:56 AM · MW-1.39-notes (1.39.0-wmf.14; 2022-05-30), Editing-team (Tracking), Parsoid (Third-party), MW-1.39-release, VisualEditor

May 24 2022

cscott updated the task description for T308013: Assign SPDX headers to puppet.git.
May 24 2022, 4:07 PM · Patch-For-Review, Infrastructure-Foundations, SRE

May 23 2022

cscott added a comment to T299528: Deprecate and remove `ParserFirstCallInit` hook (move hook/tag registration out of Parser constructor).

Because there are some legacy extensions that may deliberately register a parser function or extension tag *only* in a specific context, our "expand this extension tag/parser function" API should explicitly have a "skip this registration and look for another" return value. That way those legacy extensions can (in the new system) *always* register the extension tag/parser function, but return the "skip this registration" value in all cases except those of the specific context in which it wants to be active.
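
The "skip this registration and look for another" idea might look roughly like the sketch below. All names here (`SKIP`, `TagRegistry`, `register`, `expand`) are hypothetical and invented for illustration; this is not the existing Parser or Parsoid extension API.

```javascript
// Hypothetical sentinel a handler returns to say "not me, try the next one".
const SKIP = Symbol( 'skip-registration' );

class TagRegistry {
	constructor() {
		this.handlers = new Map();
	}

	// Handlers are always registered up front, regardless of context.
	register( name, handler ) {
		if ( !this.handlers.has( name ) ) {
			this.handlers.set( name, [] );
		}
		this.handlers.get( name ).push( handler );
	}

	// Try each registration in order; the first one that does not
	// return SKIP wins. Returns null if nobody claims the tag.
	expand( name, input, context ) {
		for ( const handler of this.handlers.get( name ) || [] ) {
			const result = handler( input, context );
			if ( result !== SKIP ) {
				return result;
			}
		}
		return null;
	}
}
```

A legacy extension that used to register its tag only on (say) a special page would instead register unconditionally and return `SKIP` whenever `context` is not the one it cares about.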

May 23 2022, 3:50 PM · MediaWiki-extensions-Variables, Parsing-Active-Work, Parsoid, MediaWiki-Parser
cscott added a comment to T268144: Add setFunctionHook equivalent support to Parsoid Extension API.

Some additional thoughts: it would be desirable to have a single implementation of the core parser functions (in includes/parser/CoreParserFunctions.php) that can be used by both the legacy parser and Parsoid. This shouldn't be *too* hard: most are simple text-in/text-out functions, so a wrapper which massaged the input arguments (converted from Text node or Tokens to a flat string), called the implementation function, and then converted the output from a string into a Text Node should probably be sufficient to provide for Parsoid compatibility.
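
A minimal sketch of the wrapper described above, under stated assumptions: the node shape (`{ type: 'text', value }`), and the helpers `textNode`, `flatten` and `wrapForParsoid` are all hypothetical, and `lc` is only a stand-in for a simple text-in/text-out implementation function.

```javascript
// Hypothetical Text node shape, for illustration only.
const textNode = ( value ) => ( { type: 'text', value } );

// Massage an argument (string, Text node, or a list of those)
// into a flat string.
function flatten( arg ) {
	if ( typeof arg === 'string' ) {
		return arg;
	}
	if ( Array.isArray( arg ) ) {
		return arg.map( flatten ).join( '' );
	}
	if ( arg && arg.type === 'text' ) {
		return arg.value;
	}
	throw new TypeError( 'Cannot flatten argument to a string' );
}

// Wrap a string-in/string-out implementation so it accepts
// node-ish arguments and returns a Text node.
function wrapForParsoid( impl ) {
	return ( ...args ) => textNode( impl( ...args.map( flatten ) ) );
}

// Stand-in for a simple text-in/text-out parser function.
const lc = ( s ) => s.toLowerCase();
```

With this shape, `wrapForParsoid( lc )` could be handed node-style arguments while the same `lc` keeps serving callers that pass plain strings.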

May 23 2022, 3:48 PM · Parsoid-Read-Views (Phase 3 - Main namespace of officewiki / mediawiki.org renders with Parsoid), Patch-For-Review, Parsoid
cscott added a comment to T25138: Create a flag to setFunctionHook to force the parser not to parse parameters.

A lot of existing code implements this by playing games with strip state, ie asking the user to pass the argument surrounded by <nowiki> and then pulling the raw text out of the strip state.

May 23 2022, 3:35 PM · WorkType-NewFunctionality, MediaWiki-Parser

May 18 2022

cscott added a comment to T308663: LogicException: This ParserOutput contains no text!.

Did we identify the root cause? (ie, what was the change being deployed that caused this to trigger?)

May 18 2022, 4:04 PM · MW-1.39-notes (1.39.0-wmf.13; 2022-05-23), Platform Engineering, CommonsMetadata, MediaWiki-Parser, Wikimedia-production-error

May 16 2022

cscott added a parent task for T194815: InputBox using interface language in parser hook causing cache pollution.: T308487: Article content (in the "content language") often has user-interface elements ("in the UX language") mixed in.
May 16 2022, 7:54 PM · User-TheDJ, Patch-For-Review, MediaWiki-extensions-InputBox, Commons
cscott added a subtask for T308487: Article content (in the "content language") often has user-interface elements ("in the UX language") mixed in: T194815: InputBox using interface language in parser hook causing cache pollution..
May 16 2022, 7:54 PM · MediaWiki-Parser
cscott added a subtask for T308487: Article content (in the "content language") often has user-interface elements ("in the UX language") mixed in: T109705: [Task] Consistently and correctly get target or 'cached' language in ParserOptions when userlang option is used.
May 16 2022, 7:50 PM · MediaWiki-Parser
cscott added a parent task for T109705: [Task] Consistently and correctly get target or 'cached' language in ParserOptions when userlang option is used: T308487: Article content (in the "content language") often has user-interface elements ("in the UX language") mixed in.
May 16 2022, 7:50 PM · Wikibase-Lua, Wikidata-Sprint-2015-11-17, Patch-For-Review, Wikidata, MediaWiki-Parser
cscott added a subtask for T308487: Article content (in the "content language") often has user-interface elements ("in the UX language") mixed in: T114640: make Parser::getTargetLanguage aware of multilingual wikis.
May 16 2022, 7:49 PM · MediaWiki-Parser
cscott added a parent task for T114640: make Parser::getTargetLanguage aware of multilingual wikis: T308487: Article content (in the "content language") often has user-interface elements ("in the UX language") mixed in.
May 16 2022, 7:49 PM · Patch-Needs-Improvement, User-Daniel, Wikidata-Sprint-2016-01-19, Wikidata-Sprint-2015-12-01, MediaWiki-Internationalization
cscott added a comment to T249361: Is there a better way for Translate to disable PST on messages?.

Somewhat related to T299316 (w/r/t rendering JS as wikitext), T307691 (TOC generated for wikitext). A Content based way to disable TOC/PST on certain content types would be useful.

May 16 2022, 7:49 PM · Technical-Debt, Platform Team Initiatives (MCR), MediaWiki-Parser, MediaWiki-extensions-Translate
cscott added a project to T308487: Article content (in the "content language") often has user-interface elements ("in the UX language") mixed in: MediaWiki-Parser.
May 16 2022, 7:44 PM · MediaWiki-Parser
cscott created T308487: Article content (in the "content language") often has user-interface elements ("in the UX language") mixed in.
May 16 2022, 7:43 PM · MediaWiki-Parser
cscott updated subscribers of T307691: JS pages show ToCs in vector-2022.

This is an interesting issue.

May 16 2022, 3:46 PM · Readers-Web-Backlog, MediaWiki-Parser, Patch-For-Review, Desktop Improvements

May 13 2022

cscott created T308367: Integrate ParserTestTopLevelSuite $flags with $parserTestFlags.
May 13 2022, 9:42 PM · Parsoid (Tracking), MediaWiki-Parser
cscott added a comment to T307720: Transition !!config sections in ParserTests to use JSON-compatible syntax.

https://codesearch.wmcloud.org/deployed/?q=!!%5Cs*config%5Cs*%24&i=nope&files=&excludeFiles=&repos= shows that it is Parsoid, core, Cite, ImageMap, SyntaxHighlight, Kartographer, TimedMediaHandler that will need updates for Wikimedia-deployed code (and possible syncs to/from Parsoid repo to ensure CI continues to pass).

May 13 2022, 2:27 AM · MW-1.39-notes (1.39.0-wmf.15; 2022-06-06), Patch-For-Review, Parsoid, MediaWiki-Parser

May 12 2022

cscott added a comment to T272942: Make Graph extension compatible with Parsoid.

Dump from chat transcript outlining the issues:

The problematic code is in Graph:includes/ParserTag.php

May 12 2022, 10:09 PM · Patch-For-Review, Parsoid-Read-Views (Phase 1 - DiscussionTools support), MediaWiki-extensions-Graph, Parsoid-Rendering, Parsoid
cscott updated the task description for T307720: Transition !!config sections in ParserTests to use JSON-compatible syntax.
May 12 2022, 1:15 AM · MW-1.39-notes (1.39.0-wmf.15; 2022-06-06), Patch-For-Review, Parsoid, MediaWiki-Parser

May 11 2022

cscott created T308094: Document Parsoid replacement for MediaWiki:cite_link_label-* customization.
May 11 2022, 3:19 AM · Parsoid, Parsoid-Read-Views, Cite

May 6 2022

cscott added a comment to T307618: mediawiki/libs/Dodo test failure for php8.1.

There's a known-failures mechanism for broken tests. I'll look into it.

May 6 2022, 6:33 PM · Parsoid (Dodo), PHP 8.1 support

May 5 2022

cscott updated the task description for T307720: Transition !!config sections in ParserTests to use JSON-compatible syntax.
May 5 2022, 4:17 PM · MW-1.39-notes (1.39.0-wmf.15; 2022-06-06), Patch-For-Review, Parsoid, MediaWiki-Parser
cscott created T307720: Transition !!config sections in ParserTests to use JSON-compatible syntax.
May 5 2022, 4:13 PM · MW-1.39-notes (1.39.0-wmf.15; 2022-06-06), Patch-For-Review, Parsoid, MediaWiki-Parser

Apr 26 2022

cscott added a comment to T266361: Minerva should use a single ResourceLoader module for shipping its styles.

@Krinkle would it make sense for https://gerrit.wikimedia.org/r/c/mediawiki/services/mobileapps/+/701574 to do an HTTP request to the production ResourceLoader for /just/ the minerva module, and use that as the basis for the compiled-in styles that mobileapps ships, instead of trying to fetch the various .less files from checked out sources and manually build them?

Apr 26 2022, 2:56 PM · MW-1.38-notes (1.38.0-wmf.6; 2021-10-26), Patch-For-Review, Readers-Web-Backlog (Needs Prioritization (Tech)), MW-1.37-notes (1.37.0-wmf.12; 2021-06-28), Performance-Team (Radar), patch-welcome, MinervaNeue, Technical-Debt
cscott added a comment to T301600: REST endpoints cannot handle requests from ka.wikipedia.org with Georgian titles.

Four-ish options, all not great:

  • Do a simple client-side script that runs ?action=purge on every title in Georgian Wikipedia over the course of a week or so. "Big hammer", but reasonable since it's a small-ish wiki. (We could be fancy and only purge titles with problematic letters.)
  • Use a patch like https://github.com/wikimedia/restbase/pull/1297/commits/10aa15501e5f747bce891b174ca7fb12f46c4179 which effectively blacklists content from a particular timestamp range in restbase. (So a "rolling purge" as it were.)
  • Do some more network snooping to try to figure out exactly what requests are being made from the mobile client. (Either run Android Studio in a VM and use the VM facilities to capture traffic from the VM, or use pcap on Linux, or figure out if Android Studio has built-in network capture utilities...)
  • Throw this over to the mobile team and try to figure out if this is an app-side caching issue, or at least narrow this down to a specific request which is failing, so we can further dig into the precise nature of the problem (aka try to figure out *why* action=purge works).
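
A rough sketch of what the first option (the "big hammer" purge script) might look like, using the Action API's purge module rather than index.php's ?action=purge. The API URL, the User-Agent string, the batch size of 50, and the one-second delay are all assumptions for illustration, not tested values, and the list of titles is expected to come from somewhere else (e.g. a dump or list=allpages).

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed endpoint for Georgian Wikipedia; adjust for the wiki in question.
API_URL = "https://ka.wikipedia.org/w/api.php"


def batches(titles, size=50):
    """Yield lists of at most `size` titles (50 is an assumed per-request cap)."""
    batch = []
    for title in titles:
        batch.append(title)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def purge_request_body(titles):
    """Build the POST body for one action=purge call covering several titles."""
    return urllib.parse.urlencode({
        "action": "purge",
        "titles": "|".join(titles),
        "format": "json",
    }).encode("ascii")


def purge_all(titles, delay_seconds=1.0, opener=urllib.request.urlopen):
    """Purge every title, sleeping between batches to spread out the load."""
    for batch in batches(titles):
        request = urllib.request.Request(
            API_URL,
            data=purge_request_body(batch),
            # Hypothetical User-Agent; use one that identifies the real operator.
            headers={"User-Agent": "purge-sketch/0.1 (example only)"},
        )
        with opener(request) as response:
            json.load(response)  # parse the reply so malformed responses fail loudly
        time.sleep(delay_seconds)
```

Stretching this over "a week or so" is just a matter of raising delay_seconds; filtering to titles with problematic letters would be a predicate applied to the title list before calling purge_all.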
Apr 26 2022, 2:30 PM · Parsoid (Tracking), RESTBase, Patch-For-Review, Wikipedia-Android-App-Backlog (Android Release FY2021-22), Product-Infrastructure-Team-Backlog, Page Content Service
cscott added a comment to T301600: REST endpoints cannot handle requests from ka.wikipedia.org with Georgian titles.

RESTBase content doesn't have a TTL, so waiting for the TTL to expire won't work. Generally pages are edited pretty quickly, which has the effect of purging RESTBase, but Georgian wiki is low-traffic so that probably won't work either (at least not in a reasonable time frame).

Apr 26 2022, 2:21 PM · Parsoid (Tracking), RESTBase, Patch-For-Review, Wikipedia-Android-App-Backlog (Android Release FY2021-22), Product-Infrastructure-Team-Backlog, Page Content Service

Apr 14 2022

cscott added a comment to T294612: Raw HTML from Language Converters' title conversion displayed in plaintext.

I reviewed @Fomafix's patch shortly after it was written and made some comments, but they were never followed up on. Further, after my initial review some new code landed on master (and made it to 1.38) which would simplify what he was doing. So the patch needs to be updated, either by @Fomafix or @me or someone else.

Apr 14 2022, 7:01 PM · MW-1.39-notes (1.39.0-wmf.10; 2022-05-02), Parsoid (Tracking), MW-1.38-release, Chinese-Sites, Regression, MediaWiki-Parser, MediaWiki-Language-converter
cscott added a comment to T302117: ZeroConf VE for MW 1.38.

Yep, I tested locally, but I'm glad it works for at least one other person as well! Thanks, Mark!

Apr 14 2022, 6:58 PM · MW-1.38-release, Parsoid (Third-party)
cscott added a comment to T306186: TypeError: Argument 1 passed to DOMNode::appendChild() must be an instance of DOMNode, null given.

(Might be worth considering adding more "pages of this type" (whatever that is) to rt testing?)

Apr 14 2022, 6:14 PM · MW-1.39-notes (1.39.0-wmf.9; 2022-04-25), Parsoid, Wikimedia-production-error

Apr 11 2022

cscott added a comment to T214994: PP passes that run `atTopLevel` omit HTML stashed in data-mw.

Yeah, I'd expect language converter markup to be affected too (as another source of "a DOM tree serialized and embedded in data-mw"). There's a similar issue w/r/t traversal -- some users will want to be sure to traverse even embedded HTML. I have a vague recollection there's a phab task and/or a generic traversal mechanism somewhere for that.

Apr 11 2022, 6:07 PM · Patch-For-Review, Parsoid

Apr 8 2022

cscott added a comment to T243854: Evaluate how to structure internal calls to TemplateData in PHP.

I created a "proper" hook interface in core for TemplateData in 1c3216bf92342e9f141946693f27c75e9c78a646 and filed T304899: Clean up the ParserFetchTemplateData hook because the interface didn't seem quite right. Then a week later I found this task. :)

Apr 8 2022, 4:44 AM · VisualEditor, Technical-Debt, Parsoid, MediaWiki-Parser, TemplateData

Apr 7 2022

cscott removed a project from T265518: Move Parsoid ServiceWorker.php and extension/src/Config into core: MW-1.38-release.
Apr 7 2022, 11:46 PM · Parsoid
cscott added a comment to T265518: Move Parsoid ServiceWorker.php and extension/src/Config into core.

Yeah, it took a while to get the trains all aligned, so I decided I'm not going to try to backport this. Thanks.

Apr 7 2022, 11:46 PM · Parsoid
cscott updated the task description for T297840: Populate ParserCache with Parsoid output as a post-edit hook.
Apr 7 2022, 9:00 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Parsoid
cscott merged T300546: Remove ParsoidSiteConfigInit hook once backport to 1.35 is done into T303029: Revert ParsoidSiteConfigInit hook creation.
Apr 7 2022, 8:07 PM · MW-1.39-notes (1.39.0-wmf.7; 2022-04-11), MW-1.38-release, MediaWiki-extensions-Translate, Parsoid
cscott merged task T300546: Remove ParsoidSiteConfigInit hook once backport to 1.35 is done into T303029: Revert ParsoidSiteConfigInit hook creation.
Apr 7 2022, 8:06 PM · Parsoid
cscott created T305658: Parsoid shouldn't do the mw:DisplaySpace transform (french spacing) on preformatted text.
Apr 7 2022, 8:02 PM · Patch-For-Review, Parsoid-Read-Views (Phase 3 - Main namespace of officewiki / mediawiki.org renders with Parsoid), Parsoid
cscott added a comment to T305597: Spaces before a colon in a preformatted block replaced with random other letters.

Arguably we should not be adding display space in <pre> tags. I don't know if that avoids the bug, but displayspace is a formatting hack for french : style " punctuation and is completely unnecessary for preformatted text.

Apr 7 2022, 2:11 PM · Parsoid
cscott added a comment to T216003: Linter fails to detect multiple "upright" parameters as a Bogus file option.

Caption is last one wins, but also remember that "anything not recognized as a valid option is a caption". That behavior actually makes the syntax very hard to extend & very hard to deprecate no-longer-accepted options without creating a bunch of inadvertent captions.
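
To make the two rules concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not the legacy parser's or Parsoid's actual code: KNOWN_OPTIONS is a small illustrative subset of real option names, and split_file_options is a hypothetical helper.

```python
# Illustrative subset only; the real list of file options is much longer
# and includes parameterized options such as sizes and "upright".
KNOWN_OPTIONS = {"thumb", "frameless", "left", "right", "center", "border"}


def split_file_options(parts, known=KNOWN_OPTIONS):
    """Split the pipe-separated parts after the file name into (options, caption).

    Anything not recognized as an option is treated as a caption, and the
    last such part wins.
    """
    options = []
    caption = None
    for part in parts:
        key = part.strip()
        if key in known:
            options.append(key)
        else:
            caption = part  # overwrites any earlier caption candidate
    return options, caption


# Two unrecognized parts: only the last one survives as the caption.
print(split_file_options(["thumb", "first caption", "left", "second caption"]))

# Dropping "border" from the recognized set silently turns it into a caption.
print(split_file_options(["thumb", "border"], known=KNOWN_OPTIONS - {"border"}))
```

The second call is the deprecation hazard in miniature: existing wikitext that used the retired option keeps parsing, but now displays the option name as its caption.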

Apr 7 2022, 1:55 AM · MW-1.39-notes (1.39.0-wmf.8; 2022-04-18), Parsoid, MediaWiki-extensions-Linter

Apr 5 2022

cscott changed the status of T189140: Parsoid doesn't support FunctionTags from Duplicate to Resolved.

Turns out I accomplished this task by getting rid of the feature in core! :)

Apr 5 2022, 4:57 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Parsoid-Rendering, Parsoid
cscott added a comment to T268144: Add setFunctionHook equivalent support to Parsoid Extension API.

See also T25138: Create a flag to setFunctionHook to force the parser not to parse parameters ("raw" parameter mode) and T204307: Parser Functions should support named parameters (named parameters).

Apr 5 2022, 4:52 PM · Parsoid-Read-Views (Phase 3 - Main namespace of officewiki / mediawiki.org renders with Parsoid), Patch-For-Review, Parsoid
cscott merged task T189140: Parsoid doesn't support FunctionTags into T305493: Deprecate and remove setFunctionTagHook().
Apr 5 2022, 4:51 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Parsoid-Rendering, Parsoid
cscott merged T189140: Parsoid doesn't support FunctionTags into T305493: Deprecate and remove setFunctionTagHook().
Apr 5 2022, 4:51 PM · Parsoid (Tracking), Parsoid-Read-Views, MediaWiki-Parser
cscott created T305493: Deprecate and remove setFunctionTagHook().
Apr 5 2022, 4:50 PM · Parsoid (Tracking), Parsoid-Read-Views, MediaWiki-Parser

Apr 4 2022

cscott added a comment to T305254: Transform HTML to wikitext from a maintenance script.

There's also the /transform/html/to/wikitext endpoint exported by RESTBase (and by Parsoid if you turn on $wgParsoidEnableREST, although that endpoint is experimental and subject to change). That has the benefit of using the wiki configuration of the wiki which is exporting that endpoint. You can write a simple curl or other script to hit the REST endpoint; I think the rate limits are quite generous.
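
For anyone who would rather script it than use curl, a minimal Python sketch of hitting that endpoint follows. The base URL in the usage note is a placeholder, and sending the HTML as a JSON body with an "html" field is an assumption about the request shape, so check the endpoint's own documentation before relying on it.

```python
import json
import urllib.request


def build_html2wt_request(base_url, html):
    """Build (but do not send) a POST to <base_url>/transform/html/to/wikitext."""
    url = base_url.rstrip("/") + "/transform/html/to/wikitext"
    # Assumed body shape: a JSON object carrying the HTML in an "html" field.
    body = json.dumps({"html": html}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def html_to_wikitext(base_url, html):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_html2wt_request(base_url, html)) as response:
        return response.read().decode("utf-8")
```

Usage would be something like html_to_wikitext("https://example.org/api/rest_v1", "<p>Hello</p>"), with the placeholder host replaced by whichever wiki exports the endpoint, so that its configuration is the one applied to the conversion.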

Apr 4 2022, 9:50 PM · Parsoid
cscott added a comment to T305254: Transform HTML to wikitext from a maintenance script.

Parsoid itself has a bin/parse.php that takes an argument, --html2wt
https://github.com/wikimedia/parsoid/blob/master/bin/parse.php#L59

Does that suffice for your purposes?

Apr 4 2022, 9:48 PM · Parsoid
cscott added a comment to T302114: Parsoid ignores manualthumb for non-image media.

When you use a static image thumb for a video, we're not *really* doing <a href="video"><img src="thumb"></a>, we're using a hack that works only for video-(and-audio?)-with-static-image-thumbnails: <video src="video" poster="thumb">.

To be clear, that's not something that the legacy parser does, it was just something I did in the Parsoid implementation because it seems reasonable and I hadn't explored how manualthumb worked in the timedmedia case.

Apr 4 2022, 9:40 PM · MW-1.39-notes (1.39.0-wmf.8; 2022-04-18), Parsoid-Read-Views (Phase 0 - Parsoid-Media-Structure), Parsoid