
cscott (C. Scott Ananian)
Parser whisperer

Projects (17)

User Details

User Since
Oct 21 2014, 6:47 PM (466 w, 4 d)
Availability
Available
IRC Nick
cscott
LDAP User
Unknown
MediaWiki User
Cscott [ Global Accounts ]

Editor since 2005; WMF developer since 2013. I work on Parsoid and OCG, and dabble with VE, real-time collaboration, and OOjs.

On github: https://github.com/cscott

See https://en.wikipedia.org/wiki/User:cscott for more.

Recent Activity

Yesterday

cscott committed rMLBCf4c180cedc7d: Bump version in HISTORY.md after release (authored by cscott).
Bump version in HISTORY.md after release
Sat, Sep 30, 12:24 AM
cscott committed rMLBC0d2204a8c3bb: Release wikimedia/bcp-47-code v2.0.0 (authored by cscott).
Release wikimedia/bcp-47-code v2.0.0
Sat, Sep 30, 12:24 AM
cscott committed rMLBCce658c675243: Add wikimedia/update-history to require-dev (authored by cscott).
Add wikimedia/update-history to require-dev
Sat, Sep 30, 12:24 AM

Fri, Sep 29

cscott committed rMLJCd2043e0d7894: Update HISTORY.md after release (authored by cscott).
Update HISTORY.md after release
Fri, Sep 29, 6:54 PM
cscott committed rMLJC3c82557669f7: Update build dependencies (authored by cscott).
Update build dependencies
Fri, Sep 29, 6:54 PM
cscott committed rMLJC090d90dc8846: Add support for manually encoding/decoding components with implicit types (authored by cscott).
Add support for manually encoding/decoding components with implicit types
Fri, Sep 29, 4:56 PM
cscott committed rMLJC475358c203ea: Standardize the file header for source files (authored by cscott).
Standardize the file header for source files
Fri, Sep 29, 4:56 PM
cscott committed rMLJCf3771b302e9d: Add support for class hints in the JsonCodec to save space (authored by cscott).
Add support for class hints in the JsonCodec to save space
Fri, Sep 29, 4:56 PM

Mon, Sep 25

cscott added a comment to T122934: Section-scope declarations for Wiktionary template invocations.

I'm not in favor of T331906 at all. This task (or something like it) is a much better solution.

Mon, Sep 25, 4:49 PM · All-and-every-Wiktionary, Parsing-Team--ARCHIVED
cscott added a comment to T331906: Add Lua function to read out previous section heading.

I also greatly prefer T122934 (or something similar) to this hacky workaround.

Mon, Sep 25, 4:38 PM · MediaWiki-extensions-Scribunto, All-and-every-Wiktionary

Thu, Sep 21

cscott closed T310378: Messages don't get localized in the updater as Resolved.
Thu, Sep 21, 7:15 PM · MW-1.39-notes, MW-1.40-notes, MW-1.41-notes (1.41.0-wmf.28; 2023-09-26), MW-1.39-release, MediaWiki-Internationalization, I18n, MediaWiki-Installer
cscott updated the task description for T305161: Hard-deprecate and remove @deprecated methods from ParserOutput.
Thu, Sep 21, 4:39 PM · MW-1.41-notes (1.41.0-wmf.28; 2023-09-26), Patch-For-Review, MW-1.40-release, MediaWiki-Parser
cscott added a comment to T305161: Hard-deprecate and remove @deprecated methods from ParserOutput.

Got most of this done in 1.41; get/setFlag will have to wait until 1.42, and hard-deprecation of ::addJsConfigVars() is still awaiting review.

Thu, Sep 21, 4:31 PM · MW-1.41-notes (1.41.0-wmf.28; 2023-09-26), Patch-For-Review, MW-1.40-release, MediaWiki-Parser
cscott updated the task description for T305161: Hard-deprecate and remove @deprecated methods from ParserOutput.
Thu, Sep 21, 4:30 PM · MW-1.41-notes (1.41.0-wmf.28; 2023-09-26), Patch-For-Review, MW-1.40-release, MediaWiki-Parser
cscott updated subscribers of T347009: PHP Notice: Undefined index: OutputHooks.

This notice is harmless, since OutputHooks hasn't been used in production since at least 1.38 (when deprecation notices were added to it). However, @Nikerabbit points out on Slack that I really should have split this patch in two, and kept writing an empty array for $jsonData['OutputHooks'] until "the next train" after the patch which stopped reading $jsonData['OutputHooks'], so that we wouldn't get any of these notices if we needed to roll back. This was an oversight on my part, mea culpa.
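A minimal Python sketch of that two-step approach, assuming a JSON blob and three illustrative functions (`serialize()`, `deserialize_old()` and `deserialize_new()` are hypothetical; only the `OutputHooks` key name comes from the comment). During the transitional train the writer still emits an empty array, so an older reader restored by a rollback finds the key it expects.

```python
import json

# Hypothetical sketch of a rollback-safe key removal. Only the key name
# "OutputHooks" comes from the task; the functions are illustrative.


def serialize(text: str) -> str:
    """Writer used during the transitional train."""
    data = {"Text": text}
    # Keep emitting an empty array so an older reader still works after a rollback.
    data["OutputHooks"] = []
    return json.dumps(data)


def deserialize_old(blob: str) -> dict:
    """Older reader: indexes the key directly.

    In Python a missing key raises KeyError; the PHP analogue logs
    "Undefined index" and carries on.
    """
    data = json.loads(blob)
    return {"Text": data["Text"], "OutputHooks": data["OutputHooks"]}


def deserialize_new(blob: str) -> dict:
    """Newer reader: no longer looks at OutputHooks at all."""
    data = json.loads(blob)
    return {"Text": data["Text"]}
```

Only one train later would a follow-up patch drop the `OutputHooks` line from the writer.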

Thu, Sep 21, 4:27 PM · Parsoid, Wikimedia-production-error
cscott added a subtask for T293512: ParserOutput::getText() should be removed from ParserOutput: T347062: Create HtmlHolder interface.
Thu, Sep 21, 2:36 PM · Patch-For-Review, Parsoid-Read-Views (Phase 1 - DiscussionTools support), Content-Transform-Team-WIP, MediaWiki-Parser, Parsoid
cscott added a parent task for T347062: Create HtmlHolder interface: T293512: ParserOutput::getText() should be removed from ParserOutput.
Thu, Sep 21, 2:35 PM · MediaWiki-Parser, Parsoid-Read-Views, Parsoid
cscott added a comment to T347062: Create HtmlHolder interface.

Adding T346829 as a subtask since the HtmlHolder interface needs to be able to serialize/deserialize itself in order to be stored in ParserCache. Technically we could probably get away without this by using explicit serialization/deserialization code in ParserOutput, but (as described in the HtmlHolder proposal) ideally we would like to be able to customize the on-disk representation for fast access and use, independent of the details of the HTML string/DOM model formats defined by the HtmlHolder abstraction.

Thu, Sep 21, 2:35 PM · MediaWiki-Parser, Parsoid-Read-Views, Parsoid
cscott added a parent task for T346829: JsonCodec should be a standalone library: T347062: Create HtmlHolder interface.
Thu, Sep 21, 2:32 PM · JsonCodec, Librarization, MediaWiki-General
cscott added a subtask for T347062: Create HtmlHolder interface: T346829: JsonCodec should be a standalone library.
Thu, Sep 21, 2:32 PM · MediaWiki-Parser, Parsoid-Read-Views, Parsoid
cscott created T347062: Create HtmlHolder interface.
Thu, Sep 21, 2:32 PM · MediaWiki-Parser, Parsoid-Read-Views, Parsoid
cscott added a comment to T346996: Move page view language detection logic / functions out of LanguageConverter class and stop using global state.

I'm not sure exactly what you mean here. Could you explain further? The parent task seems to be much more concrete.

Thu, Sep 21, 2:16 PM · Technical-Debt (Deprecation process), MediaWiki-Language-converter, Parsoid, MediaWiki-Parser

Wed, Sep 20

cscott added a comment to T267067: Make language variant a parser option.

Also worth noting: since LanguageConverter maintains persistent parse state, it really should not be a singleton the way that LanguageConverterFactory::getLanguageConverter() makes it; instead it should be instantiated and kept live only for the duration of a particular parse and then discarded. Probably a good step there would be adding a `$forceNew` or `$forceFresh` option to LanguageConverterFactory::getLanguageConverter() and insisting that the parser (and/or anything else which can call convert or convertTo and thus add new rules to LanguageConverter state) start with a fresh language converter.
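A minimal Python sketch of the problem and the proposed option, assuming a factory that caches one converter per language. The class names mirror the comment, but `force_new` (standing in for the proposed `$forceNew`/`$forceFresh`), `add_rule()` and the string-replacement `convert()` are hypothetical simplifications of the real PHP classes.

```python
from typing import Dict


class LanguageConverter:
    """Stand-in for a converter that accumulates rules while parsing."""

    def __init__(self, lang: str) -> None:
        self.lang = lang
        # Persistent state: rules added by conversion markup during a parse.
        self.rules: Dict[str, str] = {}

    def add_rule(self, src: str, dst: str) -> None:
        self.rules[src] = dst

    def convert(self, text: str) -> str:
        for src, dst in self.rules.items():
            text = text.replace(src, dst)
        return text


class LanguageConverterFactory:
    def __init__(self) -> None:
        self._cache: Dict[str, LanguageConverter] = {}

    def get_language_converter(self, lang: str, force_new: bool = False) -> LanguageConverter:
        # Hypothetical option: hand out a fresh, uncached instance for one parse.
        if force_new:
            return LanguageConverter(lang)
        # Default behaviour: one shared (singleton-like) instance per language.
        if lang not in self._cache:
            self._cache[lang] = LanguageConverter(lang)
        return self._cache[lang]
```

With the shared instance, a rule added while parsing one page leaks into the next parse; a converter obtained with `force_new=True` starts empty and can be discarded when the parse ends.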

Wed, Sep 20, 3:03 PM · MediaWiki-Language-converter, Parsoid, MediaWiki-General, MediaWiki-Parser
cscott created T346911: LanguageConverter::autoConvert() claims it "would not parse the conversion rules" but in fact it can.
Wed, Sep 20, 3:02 PM · MediaWiki-Language-converter
cscott added a comment to T267067: Make language variant a parser option.

Yeah, the way LanguageConverter::getPreferredVariant() works breaks all sorts of abstractions, and I'm frankly surprised it works at all.

Wed, Sep 20, 2:49 PM · MediaWiki-Language-converter, Parsoid, MediaWiki-General, MediaWiki-Parser
cscott created T346900: Create project tag for MediaWiki-libs-JsonCodec.
Wed, Sep 20, 2:12 PM · Project-Admins

Tue, Sep 19

cscott added a comment to T311523: Move code which depends on wfMessage out to core.

Sounds right to me. If possible you should use Parser::msg() to ensure that the page context language is set correctly. (This will be part of the Parser base class so you can use it on ParsoidParser as well sooner or later.) See T202481.

Tue, Sep 19, 8:00 PM · Patch-For-Review, Content-Transform-Team-WIP, Parsoid-Read-Views (Phase 1 - DiscussionTools support), Parsoid
cscott added a comment to T346829: JsonCodec should be a standalone library.

I believe the initial feedback on https://github.com/cscott/json-codec was that creating a new class codec for every class to be deserialized (as in https://github.com/cscott/json-codec/blob/f7bbad011d084dc4ede30bf6d1a47b853b8c64dd/src/JsonCodecableTrait.php#L48) might not meet the performance goals, but I think that can be easily patched.

Tue, Sep 19, 7:17 PM · JsonCodec, Librarization, MediaWiki-General
cscott updated the task description for T346829: JsonCodec should be a standalone library.
Tue, Sep 19, 7:14 PM · JsonCodec, Librarization, MediaWiki-General
cscott created T346829: JsonCodec should be a standalone library.
Tue, Sep 19, 7:04 PM · JsonCodec, Librarization, MediaWiki-General
cscott added a comment to T13555: .mw-editsection links should not be part of the <h#> element.

Proposed markup is, more-or-less:

<section>
 <div class="heading-stuff">
  <h2>Heading</h2>
  <div>[edit] (etc)</div>
 </div>
 <div class="content-stuff">
  Lorem ipsum...
 </div>
 <section>
    ... nested section...
 </section>
</section>
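Producing that nesting from the flat sequence of headings is a small stack exercise. A Python sketch, assuming headings arrive as `(level, title, body)` tuples; the class names come from the proposed markup, while `[edit]` is a placeholder for the real section-edit links and `nest_sections()` is hypothetical.

```python
from html import escape
from typing import List, Tuple


def nest_sections(headings: List[Tuple[int, str, str]]) -> str:
    out: List[str] = []
    open_levels: List[int] = []  # heading levels of the currently open <section>s
    for level, title, body in headings:
        # A heading closes every open section at the same or a deeper level.
        while open_levels and open_levels[-1] >= level:
            out.append("</section>")
            open_levels.pop()
        out.append("<section>")
        out.append('<div class="heading-stuff">')
        out.append(f"<h{level}>{escape(title)}</h{level}>")
        out.append("<div>[edit]</div>")  # placeholder for edit links
        out.append("</div>")
        out.append(f'<div class="content-stuff">{escape(body)}</div>')
        open_levels.append(level)
    # Close whatever is still open at the end of the document.
    while open_levels:
        out.append("</section>")
        open_levels.pop()
    return "".join(out)
```

For an h2, h3, h2 sequence, the h3 section ends up inside the first h2 section, and both are closed before the second h2 opens.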
Tue, Sep 19, 3:41 PM · User-notice, Patch-For-Review, Editing-team (Kanban Board), Web-Team-Backlog (Needs Prioritization (Tech)), Technical-Debt, Epic, Accessibility, MediaWiki-Parser
cscott added a comment to T269630: Parsoid should support section editing links.

It occurs to me that the legacy ID tags (with the alternate encoding) could also be added in read views; they are just stripped by VE as far as I know.

Tue, Sep 19, 3:36 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Parsoid
cscott added a comment to T269630: Parsoid should support section editing links.

See also T13555: .mw-editsection links should not be part of the <h#> element which proposes a new organization for section edit links.

Tue, Sep 19, 3:31 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Parsoid

Mon, Sep 18

cscott added a comment to T200915: Allow SlotRoleHandlers to control page layout.

See also T151952 and linked tasks, including T90914. I suspect what we want is a proper "page layout specification" that you can "pour" the article text, sidebars, media, etc. into. But it's hard to find the right balance that allows some customization while maintaining a uniform overall appearance for the site which is compact enough to allow reasonable reskinning for print/mobile/etc. (i.e., without having to recode a bazillion different page layouts).

Mon, Sep 18, 7:52 PM · Platform Team Initiatives (MCR), Multi-Content-Revisions (New Features)

Fri, Sep 15

cscott added a comment to T222807: Sandbox Graph extension into an iframe.

^ above are some initial implementations of an API in OutputPage for discussion.

Fri, Sep 15, 9:41 PM · Patch-For-Review, MediaWiki-extensions-Graph

Thu, Sep 14

cscott added a comment to T345932: DiscussionTools: Can not reply to comments.

That seems correct.

cananian:~/Wikimedia/Extensions/DiscussionTools$ git log origin/REL1_39
commit 2d87dd9f54a9a0974a7b2750a90e13aa49ccbc6f (origin/REL1_39)
Author: Translation updater bot <l10n-bot@translatewiki.net>
Date:   Tue Sep 12 07:51:44 2023 +0200
Thu, Sep 14, 3:52 PM · Editing-team (Tracking), Parsoid (Third-party), DiscussionTools
cscott added a comment to T345932: DiscussionTools: Can not reply to comments.

Since this is an older version of MediaWiki (1.39.x) I would first check that the version of DiscussionTools properly corresponds to the version of MediaWiki you are using. Version skew between Parsoid (which should be bundled with core) and the DiscussionTools extension would be a good explanation of errors of this sort.

Thu, Sep 14, 2:24 PM · Editing-team (Tracking), Parsoid (Third-party), DiscussionTools
cscott updated subscribers of T345866: Include important Core / HTML / CSS / JS changes into future MediaWiki changelogs.
Thu, Sep 14, 2:20 PM · Maintenance-Worktype, Content-Transform-Team-WIP, MediaWiki-Documentation
cscott added a comment to T345866: Include important Core / HTML / CSS / JS changes into future MediaWiki changelogs.

There was an entry in 1.40, which links to on-wiki documentation of the changes:

* (T314318) $wgParserEnableLegacyMediaDOM – This setting has been changed, so
  the alternative modern HTML structure for media is now the default. You can
  disable it for now by over-riding this back to `true` in LocalSettings, but
  this configuration will be removed in future versions of MediaWiki. For more
  details, see the documentation at:
  https://www.mediawiki.org/wiki/Parsoid/Parser_Unification/Media_structure/FAQ

This was also mentioned back in 1.37:

* $wgParserEnableLegacyMediaDOM - This setting defaults to true, and enables
  the legacy media HTML structure in the output from the Parser.  The
  alternative modern HTML structure for media is described at
  https://www.mediawiki.org/wiki/Parsing/Media_structure
  In a future release of MediaWiki this option will default to false,
  so it's a good idea to test this setting on your wiki early and report
  any issues.
Thu, Sep 14, 2:18 PM · Maintenance-Worktype, Content-Transform-Team-WIP, MediaWiki-Documentation
cscott added a comment to T345895: all_writing parameter in {{Track listing}} template renders incorrectly.

The bug does appear in the mobile-html output:
https://en.wikipedia.org/api/rest_v1/page/mobile-html/Abbey_Road#Track_listing

Thu, Sep 14, 2:13 PM · good first task, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, Page Content Service

Tue, Sep 12

cscott added a comment to T334940: All Graphs broken on Wikimedia wikis (due to security issue T336556).

@Sj Briefly from the technical side: generating offline graphs would practically speaking mean resurrecting the Graphoid project, which would be more work than just re-enabling graphs (using the iframe sandboxing method currently being proposed).

Tue, Sep 12, 9:23 PM · User-zeljkofilipin, Regression, User-notice, Tech Ambassadors & Translators, MediaWiki-extensions-Graph
cscott added a comment to T341009: syntaxhighlight and translate don't play well together in Parsoid.

https://www.mediawiki.org/wiki/Parsoid/fr is an example of translation inside SyntaxHighlight working as expected.
Interestingly, https://www.mediawiki.org/wiki/Parsoid/fr?useparsoid=1 works properly too. Worth investigating.

Tue, Sep 12, 2:46 PM · Patch-For-Review, Content-Transform-Team-WIP, MediaWiki-extensions-Translate, SyntaxHighlight, Parsoid-Read-Views (Phase 2 - testwiki Main namespace support), Parsoid
cscott added a comment to T343120: Investigate cache needs based on existing logs/metrics.

Ballpark figures: previously we found that 35% of the traffic hit the 1st level Varnish cache, and that the misses from the 1st level cache caused a slowdown of ~20%. If a cache with a lifetime of ~1 week and size of ~8.2GB results in a 2nd level hit rate of 78%, then we expect that 14% ((1-.35)*(1-.78)) of the traffic still misses. If that 14% causes a 20% slowdown (and all the cache hits cause 0% slowdown), waving hands a bit, then we'd expect something like 3% overall slowdown after deploying this cache. There could be some surprises in p10 vs p90 etc. if our misses turn out to cluster on the tail of the latency distribution, but there's no particular reason to think that would be the case.
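The same back-of-the-envelope estimate as a few lines of Python, using only the figures quoted above and the same simplifying assumption that cache hits add no slowdown:

```python
l1_hit = 0.35          # share of traffic served by the 1st level Varnish cache
l2_hit = 0.78          # projected hit rate of the new 2nd level cache
miss_slowdown = 0.20   # slowdown observed on requests that miss

# Traffic that misses both caches, and its contribution to overall slowdown.
still_missing = (1 - l1_hit) * (1 - l2_hit)
overall_slowdown = still_missing * miss_slowdown

print(f"{still_missing:.1%} of traffic misses both caches")
print(f"~{overall_slowdown:.1%} expected overall slowdown")
```

This prints 14.3% and ~2.9%, matching the "something like 3%" estimate.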

Tue, Sep 12, 2:38 PM · User-jijiki, serviceops-radar, Content-Transform-Team-WIP, RESTBase Sunsetting, Epic, Page Content Service

Mon, Sep 11

cscott added a comment to T346094: InvalidArgumentException: Multiple conflicting values given for ScribuntoErrors.

Probably caused by https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Scribunto/+/946633 as @Jdlrobson notes. Worth pointing out that this exposed an underlying bug in Scribunto, where some of the errors on the page were being lost because ScribuntoErrors was being overwritten. Will look into this a little bit to make sure I understand fully what's going on before reverting.

Mon, Sep 11, 8:43 PM · Patch-For-Review, MW-1.41-notes (1.41.0-wmf.26; 2023-09-12), MediaWiki-extensions-Scribunto

Fri, Sep 8

cscott added a comment to T345319: TypeError: Argument 1 passed to HtmlFormatter\HtmlFormatter::onHtmlReady() must be of the type string, null given, called in /srv/mediawiki/php-1.41.0-wmf.24/vendor/wikimedia/html-formatter/src/HtmlFormatter.php on line 314.

My advice would be to stop using HtmlFormatter and its dodgy regexps in WikiTextStructure (see also T255586 and T258964), and instead process the HTML using Wikimedia\Parsoid\Utils methods (like DOMUtils::parseHTML() / DOMCompat::querySelector()) and standard DOM methods (like removeChild()).

Fri, Sep 8, 5:02 PM · HtmlFormatter, MediaWiki-Parser, Discovery-Search, CirrusSearch, Wikimedia-production-error
cscott added a comment to T345319: TypeError: Argument 1 passed to HtmlFormatter\HtmlFormatter::onHtmlReady() must be of the type string, null given, called in /srv/mediawiki/php-1.41.0-wmf.24/vendor/wikimedia/html-formatter/src/HtmlFormatter.php on line 314.

There's another bug related to PREG_BACKTRACK_LIMIT_ERROR -- T341320. The fix for this (an incorrectly too-low backtrack limit) was deployed relatively recently; worth a double check that this isn't the same cause.

Fri, Sep 8, 4:59 PM · HtmlFormatter, MediaWiki-Parser, Discovery-Search, CirrusSearch, Wikimedia-production-error

Aug 31 2023

cscott added a comment to T141971: Within extension tags (<ref>, <gallery>), the <noinclude> and <includeonly> tags are ignored.

This seems like "working as designed" to me. The whole point of the angle-brackets-extension-tag syntax in wikitext is that *everything inside is ignored* until you get to the end tag. That allows you to encapsulate arbitrary content inside an extension tag, which is pretty fundamental. The {{#tag}} syntax is explicitly intended to be the opposite: to allow exposing the contents to the wikitext parser (so that <noinclude> etc work). So I'd lean strongly toward "won't fix" here.
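A deliberately simplified Python model of that rule, assuming only `<ref>` and `<gallery>` as extension tags and a regex pass (the real preprocessor does not work this way; `transclude()` is hypothetical). `<noinclude>` blocks are stripped from ordinary wikitext, but an extension tag's body is passed through verbatim up to its end tag. The `{{#tag}}` form is the escape hatch because its argument is ordinary wikitext that the preprocessor sees first.

```python
import re

# Simplified stand-ins: two extension tags, non-nested, matched by regex.
EXT_TAG = re.compile(r"<(ref|gallery)(\s[^>]*)?>(.*?)</\1>", re.S)
NOINCLUDE = re.compile(r"<noinclude>.*?</noinclude>", re.S)


def transclude(wikitext: str) -> str:
    """Strip <noinclude> blocks, leaving extension-tag bodies untouched."""
    parts = []
    pos = 0
    for m in EXT_TAG.finditer(wikitext):
        # Outside extension tags the preprocessor handles <noinclude>.
        parts.append(NOINCLUDE.sub("", wikitext[pos:m.start()]))
        # Inside, everything up to the end tag is opaque and kept as-is.
        parts.append(m.group(0))
        pos = m.end()
    parts.append(NOINCLUDE.sub("", wikitext[pos:]))
    return "".join(parts)
```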

Aug 31 2023, 2:30 PM · MediaWiki-Templates, All-and-every-Wikisource, Cite
cscott added a comment to T344945: Disable storage of Parsoid content in RESTbase.

Presumably the same hooks we use for pregeneration could be preserved, just hollowed out so that they "only" did cache invalidation?

Aug 31 2023, 2:20 PM · Parsoid (Tracking), RESTBase Sunsetting, Epic

Aug 30 2023

cscott added a member for MediaWiki-Parser: cscott.
Aug 30 2023, 6:38 PM

Aug 29 2023

cscott added a comment to T342352: Create ParsoidExperimentalDeprecatedPostProcessingHookDoNotUse hook for DiscussionTools testing.

Note that DT really cares a lot that the post-DT output is cached; see discussion in https://www.mediawiki.org/w/index.php?title=Parsoid%2FOutputTransform&diff=6086524&oldid=6051987 -- the experimental hook will probably not be cached, and that will have to be resolved before read views deployment.

Aug 29 2023, 3:46 PM · Content-Transform-Team-WIP, Parsoid-Read-Views (Phase 1 - DiscussionTools support), DiscussionTools, Parsoid
cscott added a comment to T342352: Create ParsoidExperimentalDeprecatedPostProcessingHookDoNotUse hook for DiscussionTools testing.

The full proposal is at https://www.mediawiki.org/wiki/Parsoid/OutputTransform, "Add a hook in FlavorDispatcher to allow inserting additional passes into the chain for any flavor; this will replace the ParserOutputAfterTidy hook."

Aug 29 2023, 3:22 PM · Content-Transform-Team-WIP, Parsoid-Read-Views (Phase 1 - DiscussionTools support), DiscussionTools, Parsoid
cscott updated subscribers of T328695: Parsoid's Cite output could break gadgets, bots, user scripts.

Briefly summarizing a discussion on CTT's tech forum, there are some specific changes which could move Parsoid and legacy output closer together:

  1. Add a span[rel=mw:referencedBy] wrapper to Parsoid's non-named refs output, which both brings the two cases of Parsoid output closer together and makes Parsoid's non-named refs output structurally more similar to legacy (this is proposed at T328695#9048889 as well)
  2. Add .mw-cite-backlink to the span[rel=mw:referencedBy] wrapper
  3. Rename .mw-reference-text in Parsoid to .reference-text
  4. Rename .reference-text in core to .mw-reference-text (yes, this is the mirror image of the previous proposal)
  5. Add [rel=mw:referencedBy] to the legacy output (@Arlolra believes there are some issues with cut-and-paste of RDF-containing output from legacy pages to VE, and has a proposal on how to deal with it)
  6. Add .mw-linkback-text to the legacy output
Aug 29 2023, 3:12 PM · MW-1.41-notes (1.41.0-wmf.18; 2023-07-18), Cite, Content-Transform-Team-WIP, Parsoid-Read-Views (Phase 1 - DiscussionTools support), Parsoid

Aug 11 2023

cscott added a comment to T287419: `mediawiki-core-php72-phan-docker` job runs `composer install` instead of using packages from mediawiki/vendor.

T344032: CI failing on wmf/1.41.0-wmf.20 due to Parsoid\Config\SiteConfigTest is another instance of this issue, with a little wrinkle. In addition to composer install taking the wrong versions in the window between our tagging a new Parsoid release and merging the corresponding patch in mediawiki-vendor, in T344032 as I understand it the wmf.21 train was delayed, and so patches were backported to wmf.20 even after Parsoid had tagged and released for wmf.21. As a result, the composer install step in the wmf.20 CI ended up using the wmf.21 version of Parsoid instead of the proper version from mediawiki-vendor.

Aug 11 2023, 3:14 PM · Parsoid (Tracking), Release-Engineering-Team (Radar), Continuous-Integration-Config, ci-test-error (WMF-deployed Build Failure), MediaWiki-Vendor
cscott added a comment to T344032: CI failing on wmf/1.41.0-wmf.20 due to Parsoid\Config\SiteConfigTest.

Parsoid seems not to use semver right now, so pinning the version would help for master and wmf branches when bringing out breaking changes, but would make the release process complicated.

Aug 11 2023, 3:11 PM · MW-1.41-notes (1.41.0-wmf.20; 2023-08-01), ci-test-error (WMF-deployed Build Failure), Content-Transform-Team, Parsoid
cscott added a comment to T344032: CI failing on wmf/1.41.0-wmf.20 due to Parsoid\Config\SiteConfigTest.

Ok, let me try to see what's going on. Parsoid tagged -a21 and released it to mediawiki-vendor on Monday (https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/946615), well before the wmf.20 branch I believe. There's a "well known" bug in CI that means certain jobs don't run from mediawiki-vendor: T287419: `mediawiki-core-php72-phan-docker` job runs `composer install` instead of using packages from mediawiki/vendor. As a result, we broke CI for some period between when we tagged -a21 and when we actually merged it into mediawiki-vendor. Merging into mediawiki-vendor was "slow" (took a couple of hours) because there were some leftover dependencies in core on the old version, which the gate-and-submit for mediawiki-vendor correctly discovered. Once the mediawiki-vendor job merged, there weren't any further CI issues, AFAIK.

Aug 11 2023, 3:03 PM · MW-1.41-notes (1.41.0-wmf.20; 2023-08-01), ci-test-error (WMF-deployed Build Failure), Content-Transform-Team, Parsoid

Aug 10 2023

cscott added a comment to T343843: Remove fixed width handling for cached HTML.

Sorry for the phab spam, the above patches belonged with T343849.

Aug 10 2023, 7:03 PM · MW-1.41-notes (1.41.0-wmf.24; 2023-08-29), Patch-For-Review, Web-Team-Backlog (Web Team FY2023-24 Q1 Sprint 5)
cscott updated the task description for T343994: OutputPage::setPageTitle() should call Message::escaped() when given a Message.
Aug 10 2023, 3:35 PM · MW-1.41-notes (1.41.0-wmf.29; 2023-10-03), Patch-For-Review, MediaWiki-General, MediaWiki-Internationalization
cscott created T343997: Message should support FORMAT_HTML.
Aug 10 2023, 3:35 PM · Security, MediaWiki-Internationalization
cscott created T343994: OutputPage::setPageTitle() should call Message::escaped() when given a Message.
Aug 10 2023, 3:31 PM · MW-1.41-notes (1.41.0-wmf.29; 2023-10-03), Patch-For-Review, MediaWiki-General, MediaWiki-Internationalization
cscott created T343992: Message: remove the distinction between FORMAT_TEXT and FORMAT_PARSE.
Aug 10 2023, 3:18 PM · MediaWiki-Internationalization, MediaWiki-Parser
cscott moved T343874: Tables utilising the {{album chart}} template render incorrectly from Backlog to In Progress on the Content-Transform-Team-WIP board.
Aug 10 2023, 2:27 PM · Parsoid-Read-Views (Phase 2 - testwiki Main namespace support), Content-Transform-Team-WIP, Parsoid, Wikipedia-Android-App-Backlog
cscott added a comment to T341587: Redirect to transcoded version of [[Media:file]] links.

IMO this is a feature request ("detect a direct browser request for an .ogg file and transparently redirect to a transcoded version") which needs to be refined a bit further before it is actionable. I'm not convinced this is a parser issue -- it seems perhaps this is something the iOS app could do and/or the media servers.

Aug 10 2023, 2:24 PM · Parsoid, MediaWiki-Parser
cscott added a comment to T341587: Redirect to transcoded version of [[Media:file]] links.

When the author puts a [[Media:]] link, they are explicitly requesting a direct file link. It's not entirely clear to me that we should be excessively clever here and in the process deny Safari users any way to download the original .ogg file (and perhaps play it with a proper app).

Aug 10 2023, 2:20 PM · Parsoid, MediaWiki-Parser
cscott added a comment to T343518: Bug: Turkish language not displaying "on this day" card.

The selector seems to be broken and shows trwiki as having no feed content, even though it is enabled on trwiki.

Aug 10 2023, 2:16 PM · Maintenance-Worktype, Content-Transform-Team-WIP, Wikipedia-Android-App-Backlog (Android Release - FY2023-24), Wikifeeds, Wikipedia-iOS-App-Backlog
cscott added a comment to T343880: Horizontal lists within tables do not follow the style guidelines of the tables they are within.

I guess we need a stylesheet update in PCS?

Aug 10 2023, 2:11 PM · Maintenance-Worktype, Content-Transform-Team-WIP, Page Content Service, Wikipedia-Android-App-Backlog

Aug 8 2023

cscott added a comment to T308471: CVE-2022-34911: Username is not escaped in the "welcomeuser" message.

Oh, wow, that sure /looks/ like a bug in ::stripAllTags(), but in fact is exactly as it is documented (as you linked):

	/**
	 * Take a fragment of (potentially invalid) HTML and return
	 * a version with any tags removed, encoded as plain text.
	 *
	 * Warning: this return value must be further escaped for literal
	 * inclusion in HTML output as of 1.10!
	 *
	 * @param string $html HTML fragment
	 * @return string
	 * @return-taint tainted
	 */
	public static function stripAllTags( $html ) {
Aug 8 2023, 9:09 PM · Patch-For-Review, MW-1.35-notes, MW-1.38-notes, MW-1.37-notes, MW-1.39-notes (1.39.0-wmf.16; 2022-06-13), user-sbassett, SecTeam-Processed, MediaWiki-User-login-and-signup, Vuln-XSS, Security, Security-Team
cscott added a comment to T308471: CVE-2022-34911: Username is not escaped in the "welcomeuser" message.

I'm contemplating changing the first parameter of LoginSignupSpecialPage::showSuccessPage() from string|Message to just plain Message, at least in part to improve phan-taint-check's accuracy: T343849. I could use some help understanding the impact of this bug, however (and some similar code in SpecialContributions). It seems to me that either you're double-escaping here (maybe deliberately?) or we should be using something other than Message::text() when OutputPage::setPageTitle is given a Message as its argument.
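A Python sketch of the escaping contract in question, assuming a simplified `Message` stand-in whose `text()` returns unescaped plain text and whose `escaped()` returns HTML-escaped text (loosely mirroring MediaWiki's `Message::text()` and `Message::escaped()`; the `{}` placeholder syntax and `set_page_title_html()` are hypothetical). Escaping exactly once at the HTML sink avoids both XSS and double-escaping.

```python
from html import escape


class Message:
    """Hypothetical stand-in for a localized message with two output forms."""

    def __init__(self, template: str, *params: str) -> None:
        self.template = template
        self.params = params

    def text(self) -> str:
        # Plain text: parameters substituted, nothing escaped.
        return self.template.format(*self.params)

    def escaped(self) -> str:
        # HTML-safe: the plain-text form with HTML special characters escaped.
        return escape(self.text())


def set_page_title_html(title: Message) -> str:
    # The sink emits HTML, so the Message is escaped here, exactly once.
    return f"<h1>{title.escaped()}</h1>"
```

Passing `text()` straight into HTML would let a username like `<b>Eve</b>` through as markup, while escaping an already-escaped string would show `&lt;` literally on the page.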

Aug 8 2023, 8:05 PM · Patch-For-Review, MW-1.35-notes, MW-1.38-notes, MW-1.37-notes, MW-1.39-notes (1.39.0-wmf.16; 2022-06-13), user-sbassett, SecTeam-Processed, MediaWiki-User-login-and-signup, Vuln-XSS, Security, Security-Team
cscott created T343849: OutputPage::setPageTitle() should take a Message in all cases.
Aug 8 2023, 7:19 PM · MW-1.41-notes (1.41.0-wmf.29; 2023-10-03), Patch-For-Review, phan-taint-check-plugin, MediaWiki-Internationalization
cscott added a comment to T310526: Parsoid read views doesn't support -{T|...}- page title markup.

Note that we likely *do* support the {{DISPLAYTITLE}} markup, since that is processed by CoreParserFunctions in integrated mode and writes directly into the ParserOutput. However, we should double check that handling while we're writing test cases for -{T|....}-.

Aug 8 2023, 6:14 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Editing-team (Tracking), Parsing-Active-Work, DiscussionTools, Parsoid
cscott added a comment to T316424: There is an additional <span> tag in displaytitle field of the mostread.

The spans were added to the displaytitle in T306440 (f7158c396d376fa12689c39fc3e7b3fffe34c184).

Aug 8 2023, 3:20 PM · Patch-For-Review, RESTBase, Wikifeeds, MediaWiki-extensions-FeaturedFeeds

Aug 7 2023

cscott added a comment to T341320: Wikimedia\RemexHtml\Tokenizer\TokenizerError: Wikimedia\RemexHtml\Tokenizer\Tokenizer: pcre.backtrack_limit exhausted.

Hm, weird. Can't reproduce it locally even if I crank pcre.backtrack_limit all the way down to 9, although it does crash if pcre.backtrack_limit is 8 -- but the default PHP value is 1000000.

Aug 7 2023, 5:16 PM · MW-on-K8s, Maintenance-Worktype, RemexHtml, Wikimedia-production-error
cscott added a comment to T341320: Wikimedia\RemexHtml\Tokenizer\TokenizerError: Wikimedia\RemexHtml\Tokenizer\Tokenizer: pcre.backtrack_limit exhausted.

This should be the HTML in question:


But I don't seem to have any trouble parsing it with DOMUtils::parseHTML() from the maintenance/run shell.php command line. Perhaps my CLI pcre backtrack limit is higher than the one set in production?

Aug 7 2023, 5:08 PM · MW-on-K8s, Maintenance-Worktype, RemexHtml, Wikimedia-production-error
cscott updated subscribers of T341320: Wikimedia\RemexHtml\Tokenizer\TokenizerError: Wikimedia\RemexHtml\Tokenizer\Tokenizer: pcre.backtrack_limit exhausted.
Aug 7 2023, 3:32 PM · MW-on-K8s, Maintenance-Worktype, RemexHtml, Wikimedia-production-error
cscott added a comment to T341320: Wikimedia\RemexHtml\Tokenizer\TokenizerError: Wikimedia\RemexHtml\Tokenizer\Tokenizer: pcre.backtrack_limit exhausted.

ooh, i would love to see the HTML involved in this one, since it looks like an "ordinary" call to ::parseHTML triggered it.

Aug 7 2023, 3:32 PM · MW-on-K8s, Maintenance-Worktype, RemexHtml, Wikimedia-production-error
cscott added a comment to T339195: ArgumentCountError: Too few arguments to function Wikimedia\LangConv\BacktrackState::__construct(), 4 passed in /srv/mediawiki/php-1.41.0-wmf.12/vendor/wikimedia/langconv/src/FST.php on line 168 and exactly 4 expected.

Possibly a transient during production scap when the library is briefly out of sync? Although there haven't been any changes to the langconv library since Jan 30, 2023, so I don't see how we'd even get into a bad sync situation.

Aug 7 2023, 3:31 PM · Maintenance-Worktype, Content-Transform-Team-WIP, Parsoid, MediaWiki-Language-converter, Wikimedia-production-error
cscott moved T339195: ArgumentCountError: Too few arguments to function Wikimedia\LangConv\BacktrackState::__construct(), 4 passed in /srv/mediawiki/php-1.41.0-wmf.12/vendor/wikimedia/langconv/src/FST.php on line 168 and exactly 4 expected from Backlog to In Progress on the Content-Transform-Team-WIP board.
Aug 7 2023, 3:24 PM · Maintenance-Worktype, Content-Transform-Team-WIP, Parsoid, MediaWiki-Language-converter, Wikimedia-production-error
cscott claimed T339195: ArgumentCountError: Too few arguments to function Wikimedia\LangConv\BacktrackState::__construct(), 4 passed in /srv/mediawiki/php-1.41.0-wmf.12/vendor/wikimedia/langconv/src/FST.php on line 168 and exactly 4 expected.
Aug 7 2023, 3:24 PM · Maintenance-Worktype, Content-Transform-Team-WIP, Parsoid, MediaWiki-Language-converter, Wikimedia-production-error
cscott moved T341320: Wikimedia\RemexHtml\Tokenizer\TokenizerError: Wikimedia\RemexHtml\Tokenizer\Tokenizer: pcre.backtrack_limit exhausted from Backlog to In Progress on the Content-Transform-Team-WIP board.
Aug 7 2023, 3:23 PM · MW-on-K8s, Maintenance-Worktype, RemexHtml, Wikimedia-production-error
cscott moved T333179: (Re)deploy ParserMigration extension to production from To Deploy to To Verify on the Content-Transform-Team-WIP board.
Aug 7 2023, 3:20 PM · Patch-For-Review, Parsoid-Read-Views (Phase 1 - DiscussionTools support), Wikimedia-extension-review-queue, Wikimedia-Extension-setup, MediaWiki-extensions-ParserMigration, Content-Transform-Team-WIP, Parsoid

Aug 3 2023

cscott added a comment to T341013: Pages that were edited very recently may show the old revision content, but with the new revision ID in permalinks/wgRevisionId etc..

The fact that logged-out users (and other low-priority accesses) can see "old" revisions is a deliberate decision made for performance reasons. There is a flag to insist that a request go to the primary DB (not a secondary read replica), which we use in some very specific places to ensure that we don't run into this problem in editing workflows, but in general APIs are allowed to return stale revisions to protect against the Michael Jackson effect, as @Krinkle notes.
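To illustrate the trade-off described above, here is a toy model, not MediaWiki's actual DB layer: writes land on the primary immediately, the replica only catches up when replication runs, and a hypothetical `$readLatest` flag forces a read from the primary. `ToyRevisionStore` and its methods are invented for illustration.

```php
<?php
// Toy model of primary/replica replication lag. ToyRevisionStore and its
// methods are hypothetical; MediaWiki's real DB layer is far more involved.
final class ToyRevisionStore
{
    /** @var array<string,int> latest revision ID per page, as seen by the primary */
    private array $primary = [];
    /** @var array<string,int> possibly stale copy, as seen by a read replica */
    private array $replica = [];

    public function saveRevision(string $page, int $revId): void
    {
        // Writes always go to the primary first.
        $this->primary[$page] = $revId;
    }

    public function replicate(): void
    {
        // Replication eventually copies the primary's state to the replica.
        $this->replica = $this->primary;
    }

    public function getLatestRevId(string $page, bool $readLatest = false): ?int
    {
        // Ordinary reads hit the cheap, possibly stale replica; callers that
        // must see their own writes (e.g. editing workflows) ask for the primary.
        $source = $readLatest ? $this->primary : $this->replica;
        return $source[$page] ?? null;
    }
}

$store = new ToyRevisionStore();
$store->saveRevision('Main_Page', 100);
$store->replicate();
$store->saveRevision('Main_Page', 101); // not yet replicated

echo $store->getLatestRevId('Main_Page'), "\n";       // 100 (stale)
echo $store->getLatestRevId('Main_Page', true), "\n"; // 101
```

Keeping most reads on replicas is what shields the primary from a burst of traffic right after a popular edit; the cost is exactly the staleness this task describes.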

Aug 3 2023, 2:48 PM · MW-1.41-notes (1.41.0-wmf.22; 2023-08-15), Wikimedia Enterprise, MediaWiki-Platform-Team, MediaWiki-General
cscott updated subscribers of T214530: Need a stable marker for title pronunciation sources in HTML.

Related to a huge number of other bugs making tweaks in summary stripping, e.g., T330188: Remove duplicate parenthesis stripping in /page/summary logic.

Aug 3 2023, 2:23 PM · Page Content Service
cscott renamed T259893: Content template using EasyTimeline not working properly in the app from Content template not working properly in the app to Content template using EasyTimeline not working properly in the app.
Aug 3 2023, 2:18 PM · Parsoid-Read-Views, Parsoid, Page Content Service, Product-Infrastructure-Team-Backlog-Deprecated
cscott added a comment to T272946: Make timeline extension compatible with Parsoid.

See T259893: Content template using EasyTimeline not working properly in the app, which seems to indicate an incompatibility between the <area> tags generated by EasyTimeline and Parsoid output. If so, then "the legacy HTML is good enough", as @ssastry wrote above, might /not/ be accurate? Needs investigation.

Aug 3 2023, 2:17 PM · EasyTimeline, Parsoid-Rendering, Parsoid
cscott added a subtask for T259893: Content template using EasyTimeline not working properly in the app: T272946: Make timeline extension compatible with Parsoid.
Aug 3 2023, 2:17 PM · Parsoid-Read-Views, Parsoid, Page Content Service, Product-Infrastructure-Team-Backlog-Deprecated
cscott added a parent task for T272946: Make timeline extension compatible with Parsoid: T259893: Content template using EasyTimeline not working properly in the app.
Aug 3 2023, 2:16 PM · EasyTimeline, Parsoid-Rendering, Parsoid
cscott added a comment to T259893: Content template using EasyTimeline not working properly in the app.

The actual "bug" here is "Parsoid support for Extension:EasyTimeline"; I don't know if that's been created yet.

Aug 3 2023, 2:15 PM · Parsoid-Read-Views, Parsoid, Page Content Service, Product-Infrastructure-Team-Backlog-Deprecated
cscott added a comment to T343245: PHP Notice: Undefined index: mwf6.

From RefGroup.php in the Cite implementation:

		if ( $refContentId ) {
			// `sup` is the wrapper created by Ref::sourceToDom()'s call to
			// `extApi->extTagToDOM()`.  Only its contents are relevant.
			$sup = $extApi->getContentDOM( $refContentId )->firstChild;
			DOMUtils::migrateChildren( $sup, $reftextSpan );

The refContentId is mwf6, which *looks* reasonable as a content ID, but it is not being found in the lookup. It could be due to VE giving us bogus input, but it's also possible it's our fault. To investigate.
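The notice itself is what PHP 7 emits when an array is indexed with a key it doesn't contain (PHP 8 reports it as the warning "Undefined array key"). As a hedged sketch only, with a hypothetical `ToyFragmentBank` standing in for whatever map backs `getContentDOM()`, a guarded lookup turns that silent notice into an explicit failure that names the missing ID:

```php
<?php
// Hypothetical stand-in for a content-ID -> fragment map such as the one
// behind getContentDOM(); names here are illustrative, not Parsoid's code.
final class ToyFragmentBank
{
    /** @var array<string,string> */
    private array $fragments = [];

    public function add(string $id, string $html): void
    {
        $this->fragments[$id] = $html;
    }

    public function get(string $id): string
    {
        // Indexing a missing key directly ($this->fragments[$id]) is what
        // produces "Undefined index" (PHP 7) / "Undefined array key" (PHP 8).
        // Check first so the failure names the bad ID and the known ones.
        if (!array_key_exists($id, $this->fragments)) {
            throw new OutOfBoundsException(sprintf(
                'Unknown content ID %s (known: %s)',
                $id,
                implode(', ', array_keys($this->fragments))
            ));
        }
        return $this->fragments[$id];
    }
}

$bank = new ToyFragmentBank();
$bank->add('mwf5', '<sup>ref</sup>');
try {
    $bank->get('mwf6');
} catch (OutOfBoundsException $e) {
    echo $e->getMessage(), "\n"; // Unknown content ID mwf6 (known: mwf5)
}
```

Listing the known IDs in the message is a cheap way to tell "VE sent a bogus ID" apart from "we never registered the fragment".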

Aug 3 2023, 2:14 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Parsoid, Wikimedia-production-error
cscott added a comment to T343387: /page/summary/ and /page/mobile-html/ do not get the latest article description in zhwiki.

I wonder if this would work if you used Accept-Language: zh-hant-tw instead of zh-tw? Both ought to be valid, but the former is the "more correct" BCP-47 code.
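For context on why `zh-hant-tw` is the more explicit code: a BCP 47 tag is a sequence of subtags (language, then optional script, then optional region, ...), and RFC 5646 recommends lowercase language, titlecase script and uppercase region. The sketch below applies only that casing convention; it does not validate tags and is not the API of the wikimedia/bcp-47-code library. `formatBcp47Case` is a hypothetical name.

```php
<?php
// Sketch of RFC 5646 section 2.1.1 recommended casing: two-letter subtags
// (regions) uppercase and four-letter subtags (scripts) titlecase, but only
// when they are not the first subtag and do not follow a singleton such as
// "x" (private use) or "u" (extension). Everything else is lowercase.
// No validation is performed.
function formatBcp47Case(string $tag): string
{
    $subtags = explode('-', strtolower($tag));
    $seenSingleton = false;
    foreach ($subtags as $i => $subtag) {
        if (strlen($subtag) === 1) {
            $seenSingleton = true; // everything after this stays lowercase
        } elseif ($i > 0 && !$seenSingleton) {
            if (strlen($subtag) === 2) {
                $subtags[$i] = strtoupper($subtag); // region, e.g. "TW"
            } elseif (strlen($subtag) === 4) {
                $subtags[$i] = ucfirst($subtag);    // script, e.g. "Hant"
            }
        }
    }
    return implode('-', $subtags);
}

echo formatBcp47Case('zh-hant-tw'), "\n"; // zh-Hant-TW
echo formatBcp47Case('zh-tw'), "\n";      // zh-TW
```

`zh-TW` only names a region, leaving the script implied; `zh-Hant-TW` states the Traditional script explicitly, which is what a content negotiator choosing between zh-hans and zh-hant variants actually needs.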

Aug 3 2023, 2:09 PM · Maintenance-Worktype, Page Content Service, Chinese-Sites, Content-Transform-Team-WIP

Aug 2 2023

cscott committed rEPPRa171347a2c78: Use ParserOutput::{get,set}ExtensionData instead of deprecated methods (authored by cscott).
Use ParserOutput::{get,set}ExtensionData instead of deprecated methods
Aug 2 2023, 5:41 PM
cscott added a comment to T343314: Refactor Linter config values into one method getLinterConfig().

Sorry, I did a quick first draft of this, patch above. It still needs some work, though, so happy to hand it off:

  • no test suite!
  • not actually hooked up in core.
Aug 2 2023, 4:00 PM · Maintenance-Worktype, Content-Transform-Team-WIP, Patch-For-Review, Parsoid, Web-Team-Backlog
cscott added a comment to T343226: TemplateStyles's use of dynamic Parser::$extTemplateStylesCache should be rewritten/removed.

TemplateStyles parsing is not actually context-independent: we want the first occurrence of a <templatestyles> reference to a given CSS page to turn into a <style> tag, but any subsequent ones to turn into a no-op reference to that style tag. Parsoid will have to accommodate that somehow, but this specific attribute is not related to that.
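A hypothetical sketch of the context dependence described above: the deduplicator remembers which stylesheets it has already emitted for the page being rendered, so the output for a given `<templatestyles>` reference depends on what came before it. `ToyStyleDeduplicator` and the `<link>` marker format are invented for illustration and are not TemplateStyles' actual output.

```php
<?php
// Hypothetical sketch: first reference to a stylesheet emits <style>, later
// references to the same page emit only a lightweight marker. The marker
// format is invented for illustration.
final class ToyStyleDeduplicator
{
    /** @var array<string,bool> stylesheet pages already emitted on this page */
    private array $emitted = [];

    public function render(string $stylePage, string $css): string
    {
        $ref = htmlspecialchars($stylePage);
        if (isset($this->emitted[$stylePage])) {
            // Subsequent occurrence: no-op reference to the earlier <style>.
            return '<link rel="stylesheet-ref" data-ref="' . $ref . '">';
        }
        $this->emitted[$stylePage] = true;
        return '<style data-ref="' . $ref . '">' . $css . '</style>';
    }
}

$dedup = new ToyStyleDeduplicator();
echo $dedup->render('Template:Foo/styles.css', '.foo{color:red}'), "\n";
echo $dedup->render('Template:Foo/styles.css', '.foo{color:red}'), "\n";
```

Because the `$emitted` state spans the whole page, rendering one fragment in isolation cannot know whether it is the "first" occurrence, which is why this behaviour needs special handling when fragments are processed independently.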

Aug 2 2023, 3:29 AM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), TemplateStyles, Parsoid

Aug 1 2023

cscott added a comment to T114640: make Parser::getTargetLanguage aware of multilingual wikis.

One strawdog proposal is something like:

{{#wrapLang|<new-lang-code>|<content>}}

(the syntax would improve with heredocs, T114432). In addition to properly setting the lang and dir attributes on a <div> or <span> wrapper around the content, this would also reset Parser::getTargetLanguage() while parsing the content.
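A rough sketch of the semantics of the strawdog `{{#wrapLang}}` proposal, under stated assumptions: the class, its stack of target languages, and the hard-coded RTL list are all hypothetical, not MediaWiki APIs. The point it shows is that the target language is switched only for the duration of parsing the wrapped content and restored afterwards, while the wrapper carries matching `lang`/`dir` attributes.

```php
<?php
// Hypothetical sketch of {{#wrapLang|code|content}}: wrap the content in an
// element carrying lang/dir and make the wrapped language the "target
// language" only while that content is parsed. Names and the tiny RTL list
// are illustrative assumptions, not MediaWiki APIs.
final class ToyLangContext
{
    /** @var string[] stack of target language codes; the top is current */
    private array $stack;

    public function __construct(string $pageLang)
    {
        $this->stack = [$pageLang];
    }

    public function getTargetLanguage(): string
    {
        return $this->stack[count($this->stack) - 1];
    }

    /** @param callable(ToyLangContext):string $parseContent */
    public function wrapLang(string $code, callable $parseContent, bool $block = false): string
    {
        // Illustrative only: real code would consult language data for direction.
        $dir = in_array($code, ['ar', 'he', 'fa', 'ur'], true) ? 'rtl' : 'ltr';
        $tag = $block ? 'div' : 'span';
        $this->stack[] = $code;
        try {
            $inner = $parseContent($this);
        } finally {
            array_pop($this->stack); // restore the outer target language
        }
        return sprintf('<%s lang="%s" dir="%s">%s</%s>',
            $tag, htmlspecialchars($code), $dir, $inner, $tag);
    }
}

$ctx = new ToyLangContext('en');
echo $ctx->wrapLang('he', fn(ToyLangContext $c) => 'target=' . $c->getTargetLanguage()), "\n";
// <span lang="he" dir="rtl">target=he</span>
echo $ctx->getTargetLanguage(), "\n"; // en
```

Using a stack (rather than a single saved value) lets `{{#wrapLang}}` calls nest, with each one restoring its caller's language on exit.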

Aug 1 2023, 2:52 PM · Patch-Needs-Improvement, User-Daniel, Wikidata-Sprint-2016-01-19, Wikidata-Sprint-2015-12-01, MediaWiki-Internationalization
cscott updated the task description for T343227: Remove #[AllowDynamicProperties] from Parser class.
Aug 1 2023, 2:35 PM · Parsoid, MediaWiki-Parser, Parsoid-Read-Views
cscott added a project to T343229: Scribunto's use of dynamic Parser::$scribunto_engine should be rewritten/removed: Parsoid-Read-Views.
Aug 1 2023, 2:33 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Parsoid, MediaWiki-extensions-Scribunto
cscott added a parent task for T324891: Deprecated: Creation of dynamic property Parser::$scribunto_engine is deprecated: T343229: Scribunto's use of dynamic Parser::$scribunto_engine should be rewritten/removed.
Aug 1 2023, 2:33 PM · MediaWiki-extensions-Scribunto, PHP 8.2 support
cscott added a subtask for T343229: Scribunto's use of dynamic Parser::$scribunto_engine should be rewritten/removed: T324891: Deprecated: Creation of dynamic property Parser::$scribunto_engine is deprecated.
Aug 1 2023, 2:33 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Parsoid, MediaWiki-extensions-Scribunto
cscott renamed T343226: TemplateStyles's use of dynamic Parser::$extTemplateStylesCache should be rewritten/removed from Cite's use of dynamic Parser::$extTemplateStylesCache should be rewritten/removed to TemplateStyles's use of dynamic Parser::$extTemplateStylesCache should be rewritten/removed.
Aug 1 2023, 2:32 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), TemplateStyles, Parsoid
cscott added a parent task for T324901: Deprecated: Creation of dynamic property Parser::$extCite is deprecated in CiteParserTagHooks.php on line 94: T343230: Cite's use of dynamic Parser::$extCite should be rewritten/removed.
Aug 1 2023, 2:31 PM · MW-1.39-notes, MW-1.35-notes, MW-1.38-notes, MW-1.40-notes (1.40.0-wmf.14; 2022-12-12), Cite, PHP 8.2 support
cscott added a subtask for T343230: Cite's use of dynamic Parser::$extCite should be rewritten/removed: T324901: Deprecated: Creation of dynamic property Parser::$extCite is deprecated in CiteParserTagHooks.php on line 94.
Aug 1 2023, 2:31 PM · Parsoid-Read-Views (Phase 1 - DiscussionTools support), Parsoid, Cite
cscott added a subtask for T343227: Remove #[AllowDynamicProperties] from Parser class: T343230: Cite's use of dynamic Parser::$extCite should be rewritten/removed.
Aug 1 2023, 2:31 PM · Parsoid, MediaWiki-Parser, Parsoid-Read-Views
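The tasks above are about extensions (Scribunto, Cite, TemplateStyles) hanging their own state off `Parser` via dynamic properties, which PHP 8.2 deprecates. One generic PHP 8.0+ technique, shown here only as a sketch and not necessarily the fix chosen for any of these tasks, is to keep per-object state in a `WeakMap` keyed by the object. `ToyParser` and `ToyExtensionState` are hypothetical stand-ins.

```php
<?php
// Requires PHP 8.0+ (WeakMap). ToyParser and ToyExtensionState are
// hypothetical stand-ins, not the real Parser or any extension's fix.
final class ToyParser {}

final class ToyExtensionState
{
    private static ?WeakMap $byParser = null;

    /** Per-parser state; the entry vanishes once the parser is destroyed. */
    public static function forParser(ToyParser $parser): ArrayObject
    {
        self::$byParser ??= new WeakMap();
        if (!isset(self::$byParser[$parser])) {
            self::$byParser[$parser] = new ArrayObject();
        }
        return self::$byParser[$parser];
    }

    public static function trackedParsers(): int
    {
        return self::$byParser === null ? 0 : count(self::$byParser);
    }
}

// Instead of the deprecated `$parser->extFoo = ...` (a dynamic property):
$parser = new ToyParser();
$state = ToyExtensionState::forParser($parser);
$state['engine'] = 'lua';
echo ToyExtensionState::forParser($parser)['engine'], "\n"; // lua
echo ToyExtensionState::trackedParsers(), "\n";              // 1
unset($parser);
echo ToyExtensionState::trackedParsers(), "\n";              // 0
```

The WeakMap holds its keys weakly, so the extension's state does not keep a finished `Parser` alive, which mirrors the lifetime a dynamic property would have had.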