
cscott (C. Scott Ananian)
Parser whisperer

User Details

User Since
Oct 21 2014, 6:47 PM (314 w, 1 d)
Availability
Available
IRC Nick
cscott
LDAP User
Unknown
MediaWiki User
Cscott [ Global Accounts ]

Editor since 2005; WMF developer since 2013. I work on Parsoid and OCG, and dabble with VE, real-time collaboration, and OOjs.

On github: https://github.com/cscott

See https://en.wikipedia.org/wiki/User:cscott for more.

Recent Activity

Today

cscott updated the task description for T266707: MediaHandler::addMeta() can't decide if values are escaped HTML, literal strings, or wikitext.
Wed, Oct 28, 9:03 PM · Patch-For-Review, Commons, MediaWiki-File-management
cscott created T266707: MediaHandler::addMeta() can't decide if values are escaped HTML, literal strings, or wikitext.
Wed, Oct 28, 8:29 PM · Patch-For-Review, Commons, MediaWiki-File-management
cscott updated subscribers of T266677: Use of FormatMetadata::formatNum with non-numeric value was deprecated in MediaWiki 1.36. [Called from FormatMetadata::makeFormattedData].

Investigation of this class of warning by @Reedy in T263592#6575833.

Wed, Oct 28, 3:02 PM · Patch-For-Review, Commons, MediaWiki-File-management, MediaWiki-extensions-PdfHandler, Wikimedia-production-error
cscott added a comment to T263592: Use of Language::commafy with a non-numeric string was deprecated in MediaWiki 1.36. [Called from Language::formatNum].

This is now sending deprecation warnings on the beta cluster. 4 seen in the last 24 hours: https://logstash-beta.wmflabs.org/goto/bf153a71f0c6f9d43bb9c20a112cccf4

Wed, Oct 28, 3:01 PM · MW-1.35-notes, MW-1.31-release-notes, MediaWiki-extensions-PdfHandler, Commons, MediaWiki-File-management, MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), MediaWiki-extensions-Scribunto, MediaWiki-Parser, MediaWiki-General, Patch-For-Review
cscott created T266677: Use of FormatMetadata::formatNum with non-numeric value was deprecated in MediaWiki 1.36. [Called from FormatMetadata::makeFormattedData].
Wed, Oct 28, 2:57 PM · Patch-For-Review, Commons, MediaWiki-File-management, MediaWiki-extensions-PdfHandler, Wikimedia-production-error
cscott updated the task description for T266666: Parsoid needs access to basic localization functionality (DOM postprocessing).
Wed, Oct 28, 1:47 PM · Parsoid
cscott created T266666: Parsoid needs access to basic localization functionality (DOM postprocessing).
Wed, Oct 28, 1:20 PM · Parsoid
cscott added a comment to T236811: Parser creation should always use factory.

See also T257800: Replace direct constructor of Parser with calls to ParserFactory in extensions -- there seems to be some overlap here! Help always appreciated, of course...

Wed, Oct 28, 1:00 PM · MW-1.35-notes (1.35.0-wmf.30; 2020-04-28), Patch-For-Review, Parsoid, MediaWiki-Parser

Mon, Oct 26

cscott added a comment to T263592: Use of Language::commafy with a non-numeric string was deprecated in MediaWiki 1.36. [Called from Language::formatNum].

Change 636120 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/core@master] [DNM] Add some more debugging for badly formatted numbers which probably come from unknown exif tags

https://gerrit.wikimedia.org/r/636120

Something like this would potentially be useful... Maybe rather than deprecating here and in Language::commafy, we should disable the deprecation again and add some better structured logging, so we can play whack-a-mole for a while; then, when those logs are cleaner, re-enable the deprecation. At least in this case in FormatMetadata (more so than Language::commafy, with its many callers all over the place), we can easily add a parameter referencing the EXIF tag (as done in the DNM patch at the moment, which also fudges the wfDeprecated text; that's kinda nasty, hence DNM). And as the function is private, we can easily remove that parameter again a little way down the line without causing any breaking changes.

Or, in the case of formatNum in FormatMetadata, maybe passing non-numbers to this private method is fine... It could just return them verbatim if they're not numeric and therefore can't be formatted.

Certainly, it's going to need something like the aforementioned hook to allow the extra stuff that MediaWiki-extensions-PdfHandler adds in to be able to deal with formatting them. Or we could pass in something like a Message object, and then not call formatNum() on it (we're in the switch ( $tag ) default condition if $val instanceof Message).
Otherwise we're shooting fairly blind with these deprecation warnings, which in WMF prod could be fairly prolific.

Mon, Oct 26, 4:06 PM · MW-1.35-notes, MW-1.31-release-notes, MediaWiki-extensions-PdfHandler, Commons, MediaWiki-File-management, MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), MediaWiki-extensions-Scribunto, MediaWiki-Parser, MediaWiki-General, Patch-For-Review
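A minimal sketch of the "return non-numbers verbatim, and log which EXIF tag they came from" idea discussed in this comment, written in Python rather than PHP for brevity. The function name format_num_lenient and its tag parameter are hypothetical (they are not FormatMetadata's actual API), and the numeric check is only a rough ASCII-only approximation of what counts as a numeric string.

```python
import logging
import re
from decimal import Decimal

logger = logging.getLogger("format_metadata_sketch")

# ASCII-only approximation of a "numeric string": optional sign, digits,
# optional fraction. Anything else is treated as non-numeric.
_NUMERIC = re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?")


def format_num_lenient(value, tag=None):
    """Group the digits of numeric values; return anything else verbatim.

    `tag` is a hypothetical extra parameter naming the EXIF tag the value
    came from, so non-numeric input can be logged with enough context to
    track down its source (instead of emitting a deprecation warning).
    """
    text = str(value).strip()
    if _NUMERIC.fullmatch(text) is None:
        logger.info("non-numeric metadata value %r (tag %r)", value, tag)
        return value  # passed through unchanged, never formatted
    # Decimal keeps trailing zeros ("1234.50" -> "1,234.50"), unlike float.
    return format(Decimal(text), ",")


print(format_num_lenient("1234567"))                # 1,234,567
print(format_num_lenient("Canon EOS", tag="Model"))  # Canon EOS
```

Logging at info level (rather than warning) reflects the "whack-a-mole" goal: the log records carry the tag name, so noisy sources can be fixed before a real deprecation is reinstated.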

Fri, Oct 23

cscott added a comment to T263592: Use of Language::commafy with a non-numeric string was deprecated in MediaWiki 1.36. [Called from Language::formatNum].

Look to be media file related, and 3 are MediaWiki-extensions-PdfHandler related.

Fri, Oct 23, 4:29 PM · MW-1.35-notes, MW-1.31-release-notes, MediaWiki-extensions-PdfHandler, Commons, MediaWiki-File-management, MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), MediaWiki-extensions-Scribunto, MediaWiki-Parser, MediaWiki-General, Patch-For-Review
cscott committed rELUA6a999fb6aaa0: Fix a lua test which uses a deprecated Message formatter feature (authored by cscott).
Fix a lua test which uses a deprecated Message formatter feature
Fri, Oct 23, 4:24 PM
cscott added a comment to T263592: Use of Language::commafy with a non-numeric string was deprecated in MediaWiki 1.36. [Called from Language::formatNum].

Thanks for looking into this. Just to throw it out there, another option is to undefine the -value message, which will ensure the output is reported literally and not formatted. This would work for "scribunto-limitreport-estmemusage-value", which currently has the value $1. For scribunto-limitreport-virtmemusage-value and scribunto-limitreport-memusage-value, you could do the $1/$2 formatting when those values are defined, and then undefine the i18n message to prevent the formatting from being applied twice.

Fri, Oct 23, 2:37 PM · MW-1.35-notes, MW-1.31-release-notes, MediaWiki-extensions-PdfHandler, Commons, MediaWiki-File-management, MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), MediaWiki-extensions-Scribunto, MediaWiki-Parser, MediaWiki-General, Patch-For-Review
cscott updated subscribers of T237467: Invariant failed: Bad UTF-8 (full string verification).

@Quiddity, it's worth notifying the wikis about the new Category:Pages with non-numeric formatnum arguments. The English wiki page has a good description of what causes pages to be added to this category and how to fix them; thanks x100 to @Jonesey95 for writing that up so well. The initiating bug (T237467) was some poorly-written code dealing with languages with non-default numeric grouping, which could produce invalid UTF-8 when fed non-numeric data (which it wasn't expecting). Cleaning up this crufty code (in part by more rigorously defining the expected inputs to formatNum/commafy etc.) allowed, at the end of a long chain of cleanup patches, https://gerrit.wikimedia.org/r/c/mediawiki/core/+/384006, which brings MediaWiki's number formatting up to date with TR35 and the latest CLDR using the native i18n features of PHP7 (T167088).

Fri, Oct 23, 1:53 AM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), User-notice, Parsing-Active-Work, WMDE-QWERTY-Sprint-2020-08-26, Patch-For-Review, Wikimedia-production-error, User-brennen, Parsoid
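To illustrate the class of bug described in this comment (this is a hypothetical reconstruction, not the actual MediaWiki code), the Python sketch below inserts grouping separators by counting UTF-8 bytes instead of characters. ASCII digits survive because each is one byte; a non-numeric argument such as a U+2212 minus sign, which is three bytes in UTF-8, gets split mid-sequence and the result is no longer valid UTF-8. The function names and the two-digit default group size are assumptions chosen for the illustration.

```python
def naive_group_bytes(text: str, group: int = 2, sep: bytes = b",") -> bytes:
    """Deliberately buggy: groups by UTF-8 *bytes*, not by characters.

    Hypothetical illustration only. `group` defaults to 2 to mimic a
    non-default (two-digit) grouping size.
    """
    data = text.encode("utf-8")
    chunks = []
    end = len(data)
    while end > 0:
        start = max(0, end - group)
        chunks.append(data[start:end])  # may cut a multibyte character in half
        end = start
    return sep.join(reversed(chunks))


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# ASCII digits: one byte per character, so the output is fine.
print(naive_group_bytes("123456"))                         # b'12,34,56'
# U+2212 MINUS SIGN is three bytes; a separator lands inside it.
print(is_valid_utf8(naive_group_bytes("\u22129000000")))   # False
```

Validating that the input is a plain numeric string before grouping (the "bit of strictness" discussed elsewhere in this feed) avoids this failure mode entirely.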
cscott closed T167088: Replace formatnum implementation with PHP NumberFormatter as Resolved.
Fri, Oct 23, 1:49 AM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Language-Team, MediaWiki-Internationalization, I18n
cscott closed T167088: Replace formatnum implementation with PHP NumberFormatter , a subtask of T213072: Language tools maintenance intervention: Improve processes for i18n support to be more fluent, as Resolved.
Fri, Oct 23, 1:48 AM · MediaWiki-Internationalization

Thu, Oct 22

cscott added a comment to T259832: mediawiki-vendor submodule doesn't get automatically bumped on release branches.

This is going to be an issue again on Monday (2020-10-26) as we've got a Parsoid patch to swat via mediawiki/vendor.

Thu, Oct 22, 10:24 PM · Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)), User-brennen, Release-Engineering-Team (Deployment services), Patch-For-Review, Parsoid
cscott created T266285: Deploy 6-element DSR to prod.
Thu, Oct 22, 9:50 PM · Parsoid
cscott renamed T262500: MediaWiki $minimumGroupingDigits is differs from CLDR for hy, ru, uk from MediaWiki $minimumGroupingDigits is off-by-one to MediaWiki $minimumGroupingDigits is differs from CLDR for hy, ru, uk.
Thu, Oct 22, 3:19 PM · Patch-For-Review, I18n, MediaWiki-Internationalization
cscott added a comment to T262500: MediaWiki $minimumGroupingDigits is differs from CLDR for hy, ru, uk.

Huh. You're right. CLDR does define minimumGroupingDigits as 1 for hy, ru, and uk. It looks like I misinterpreted the comments left in the patch I took over (ce8d0e9599a84565d53965481d1c163a90c4e6dd) as commentary on the definition of minimumGroupingDigits, not on the correctness of our hy/ru/uk settings. I'll update the patches and this task title to reflect correcting the discrepancy between CLDR and MediaWiki for hy/ru/uk.

Thu, Oct 22, 3:16 PM · Patch-For-Review, I18n, MediaWiki-Internationalization
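A sketch of the minimumGroupingDigits rule as we read it from Unicode TR35: the first grouping separator is only emitted when the most significant group would contain at least minimumGroupingDigits digits. The function and parameter names are hypothetical. With min_grouping=1 (the value CLDR defines for hy, ru, and uk, per this comment) a four-digit number is grouped; with min_grouping=2 it is left alone.

```python
def group_digits(digits: str, sep: str = ",", group: int = 3,
                 min_grouping: int = 1) -> str:
    """Insert grouping separators into a string of ASCII integer digits.

    Sketch of the TR35 minimumGroupingDigits rule: no separator at all
    unless the most significant group would have at least `min_grouping`
    digits, i.e. unless len(digits) >= group + min_grouping.
    """
    if len(digits) < group + min_grouping:
        return digits
    head = len(digits) % group or group  # size of the leading group
    parts = [digits[:head]]
    parts += [digits[i:i + group] for i in range(head, len(digits), group)]
    return sep.join(parts)


print(group_digits("1000"))                   # 1,000
print(group_digits("1000", min_grouping=2))   # 1000
print(group_digits("10000", min_grouping=2))  # 10,000
```

The separator is a parameter because real locales use different characters (spaces, no-break spaces, periods); which one a given locale uses comes from CLDR, not from this sketch.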

Wed, Oct 21

cscott closed T237467: Invariant failed: Bad UTF-8 (full string verification) as Resolved.

Resolving this issue, which was bad UTF-8 generated by commafy (indirectly from formatnum). I opened a new task, T266129: Invariant failed: Bad UTF-8 (full string verification) -- bad UTF-8 from database, for the case which @thcipriani described in T237467#6543711, as that has a different root cause (bad UTF-8 in wikitext stored in the DB). As @Jonesey95 pointed out, T10327: Language::formatNum() should prefix negative values with − (minus sign U+2212) already exists for the issue of U+2212 in the *output* of formatnum; I suppose that's a reasonable place to continue the discussion of U+2212 in the *input* of formatnum as well, if more needs to be said on that.

Wed, Oct 21, 1:37 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), User-notice, Parsing-Active-Work, WMDE-QWERTY-Sprint-2020-08-26, Patch-For-Review, Wikimedia-production-error, User-brennen, Parsoid
cscott closed T237467: Invariant failed: Bad UTF-8 (full string verification), a subtask of T229015: Tracking: Direct live production traffic at Parsoid/PHP, as Resolved.
Wed, Oct 21, 1:37 PM · User-notice, Platform Engineering, User-WDoran, Parsoid-PHP
cscott added a subtask for T229015: Tracking: Direct live production traffic at Parsoid/PHP: T266129: Invariant failed: Bad UTF-8 (full string verification) -- bad UTF-8 from database.
Wed, Oct 21, 1:35 PM · User-notice, Platform Engineering, User-WDoran, Parsoid-PHP
cscott added a parent task for T266129: Invariant failed: Bad UTF-8 (full string verification) -- bad UTF-8 from database: T229015: Tracking: Direct live production traffic at Parsoid/PHP.
Wed, Oct 21, 1:35 PM · Parsing-Active-Work, Parsoid, Wikimedia-production-error
cscott created T266129: Invariant failed: Bad UTF-8 (full string verification) -- bad UTF-8 from database.
Wed, Oct 21, 1:33 PM · Parsing-Active-Work, Parsoid, Wikimedia-production-error
cscott added a comment to T237467: Invariant failed: Bad UTF-8 (full string verification).

@Jonesey95 the issue is that the underlying Unicode number-formatting code doesn't recognize a non-ASCII sign there. As we move into the global templates era, where the same template code is (hopefully) going to be shared by multiple wikis, it's important to clearly document the expected input and output so that localization works correctly. If you use a non-ASCII minus sign, it will get copied literally to the output, but it will not be understood as a minus sign and thus not localized. This means that (for instance) an Arabic ALM character will be generated in the wrong place: between the U+2212 and the number, instead of before the minus sign. As it turns out, we currently strip the ALM/RLM/LRM characters, but this set of bug fixes is motivated by issues in commafy where folks were passing non-numeric data to commafy and it ended up generating bad UTF-8 as a result, but only on wikis which didn't use the standard "3 digit grouping" rules for commas. Those sorts of bugs are hard to identify because they disproportionately affect non-US/non-Euro wikis. A bit of strictness here helps ensure we can adequately support wikis in all languages, even if they have "unusual" requirements for how negative numbers are formatted, how commas are inserted, etc.

Wed, Oct 21, 1:22 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), User-notice, Parsing-Active-Work, WMDE-QWERTY-Sprint-2020-08-26, Patch-For-Review, Wikimedia-production-error, User-brennen, Parsoid
cscott added a comment to T10327: Language::formatNum() should prefix negative values with − (minus sign U+2212).

I'd support changing the output of {{formatnum}} to use the U+2212 minus sign. This will probably break a bunch of pages which pass the output of formatnum back to expr, but as T10327#126094 says, this is broken anyway for (a) fractional values when the user preference is for comma as the decimal separator, (b) values greater than 100 when the locale specifies two-digit grouping, and (c) values greater than 1000. Adding "(d) when value is negative" may actually prompt needed cleanup.

Wed, Oct 21, 1:22 PM · User-notice, Patch-For-Review, MediaWiki-Interface
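A rough Python illustration of why feeding formatnum output back into expr is fragile. Both functions are hypothetical stand-ins, not the real formatnum or expr implementations: format_localized uses Python's "," format spec and then swaps in locale symbols, and expr_can_parse accepts only plain ASCII literals such as 12.5 or -5. The sketch reproduces cases (a), (c), and the proposed (d) from this comment; two-digit grouping (b) is not modeled.

```python
import re

# Stand-in for an expression evaluator that only accepts plain ASCII
# literals: optional "-", digits, optional "." fraction, no grouping.
_EXPR_LITERAL = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")


def expr_can_parse(text: str) -> bool:
    return _EXPR_LITERAL.fullmatch(text) is not None


def format_localized(value, decimal_sep=".", group_sep=",", minus="-"):
    """Format with Python's ',' grouping, then swap in locale symbols."""
    ascii_form = f"{value:,}"
    # str.maketrans with a dict replaces all mapped characters in one pass,
    # so swapping "," and "." does not clobber itself.
    table = str.maketrans({",": group_sep, ".": decimal_sep, "-": minus})
    return ascii_form.translate(table)


samples = {
    "plain": format_localized(12.5),
    "(a) comma decimal": format_localized(12.5, decimal_sep=",", group_sep="."),
    "(c) > 1000": format_localized(1234),
    "(d) U+2212 minus": format_localized(-5, minus="\u2212"),
}
for label, text in samples.items():
    print(label, repr(text), "round-trips:", expr_can_parse(text))
```

Only the "plain" sample survives the round trip; every localized variant is rejected by the strict parser, which is the point made above about formatnum output not being a safe input to expr.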
cscott added a comment to T237467: Invariant failed: Bad UTF-8 (full string verification).

Still seen in production today for wmf.11:

reqId: f6017f38-e1bf-4bf5-a3a1-5388d1acfa6e

#0 /srv/mediawiki/php-1.36.0-wmf.11/vendor/wikimedia/parsoid/src/Utils/PHPUtils.php(258): Wikimedia\Assert\Assert::invariant(boolean, string)
#1 /srv/mediawiki/php-1.36.0-wmf.11/vendor/wikimedia/parsoid/src/Wt2Html/PegTokenizer.php(115): Wikimedia\Parsoid\Utils\PHPUtils::assertValidUTF8(string)
#2 /srv/mediawiki/php-1.36.0-wmf.11/vendor/wikimedia/parsoid/src/Wt2Html/TokenTransformManager.php(189): Wikimedia\Parsoid\Wt2Html\PegTokenizer->processChunkily(string, array)
#3 /srv/mediawiki/php-1.36.0-wmf.11/vendor/wikimedia/parsoid/src/Wt2Html/TokenTransformManager.php(189): Wikimedia\Parsoid\Wt2Html\TokenTransformManager->processChunkily(string, array)
#4 /srv/mediawiki/php-1.36.0-wmf.11/vendor/wikimedia/parsoid/src/Wt2Html/TokenTransformManager.php(189): Wikimedia\Parsoid\Wt2Html\TokenTransformManager->processChunkily(string, array)
#5 /srv/mediawiki/php-1.36.0-wmf.11/vendor/wikimedia/parsoid/src/Wt2Html/HTML5TreeBuilder.php(420): Wikimedia\Parsoid\Wt2Html\TokenTransformManager->processChunkily(string, array)
#6 [internal function]: Wikimedia\Parsoid\Wt2Html\HTML5TreeBuilder->processChunkily(string, array)
#7 /srv/mediawiki/php-1.36.0-wmf.11/vendor/wikimedia/parsoid/src/Wt2Html/DOMPostProcessor.php(900): Generator->current()
#8 /srv/mediawiki/php-1.36.0-wmf.11/vendor/wikimedia/parsoid/src/Wt2Html/ParserPipeline.php(152): Wikimedia\Parsoid\Wt2Html\DOMPostProcessor->processChunkily(string, array)
#9 /srv/mediawiki/php-1.36.0-wmf.11/vendor/wikimedia/parsoid/src/Wt2Html/ParserPipeline.php(202): Wikimedia\Parsoid\Wt2Html\ParserPipeline->parseChunkily(string, array)
#10 /srv/mediawiki/php-1.36.0-wmf.11/vendor/wikimedia/parsoid/src/Wt2Html/ParserPipelineFactory.php(299): Wikimedia\Parsoid\Wt2Html\ParserPipeline->parseToplevelDoc(string, array)
#11 /srv/mediawiki/php-1.36.0-wmf.11/vendor/wikimedia/parsoid/src/Core/WikitextContentModelHandler.php(81): Wikimedia\Parsoid\Wt2Html\ParserPipelineFactory->parse(string)
#12 /srv/mediawiki/php-1.36.0-wmf.11/vendor/wikimedia/parsoid/src/Parsoid.php(161): Wikimedia\Parsoid\Core\WikitextContentModelHandler->toDOM(Wikimedia\Parsoid\Config\Env)
#13 /srv/mediawiki/php-1.36.0-wmf.11/vendor/wikimedia/parsoid/src/Parsoid.php(193): Wikimedia\Parsoid\Parsoid->parseWikitext(MWParsoid\Config\PageConfig, array)
#14 /srv/mediawiki/php-1.36.0-wmf.11/vendor/wikimedia/parsoid/extension/src/Rest/Handler/ParsoidHandler.php(588): Wikimedia\Parsoid\Parsoid->wikitext2html(MWParsoid\Config\PageConfig, array, NULL)
#15 /srv/mediawiki/php-1.36.0-wmf.11/vendor/wikimedia/parsoid/extension/src/Rest/Handler/PageHandler.php(88): MWParsoid\Rest\Handler\ParsoidHandler->wt2html(MWParsoid\Config\PageConfig, array)
#16 /srv/mediawiki/php-1.36.0-wmf.11/vendor/wikimedia/parsoid/extension/src/Rest/Handler/ParsoidHandler.php(1047): MWParsoid\Rest\Handler\PageHandler->realExecute()
#17 /srv/mediawiki/php-1.36.0-wmf.11/includes/Rest/Router.php(381): MWParsoid\Rest\Handler\ParsoidHandler->execute()
#18 /srv/mediawiki/php-1.36.0-wmf.11/includes/Rest/Router.php(316): MediaWiki\Rest\Router->executeHandler(MWParsoid\Rest\Handler\PageHandler)
#19 /srv/mediawiki/php-1.36.0-wmf.11/includes/Rest/EntryPoint.php(155): MediaWiki\Rest\Router->execute(MediaWiki\Rest\RequestFromGlobals)
#20 /srv/mediawiki/php-1.36.0-wmf.11/includes/Rest/EntryPoint.php(119): MediaWiki\Rest\EntryPoint->execute()
#21 /srv/mediawiki/php-1.36.0-wmf.11/rest.php(31): MediaWiki\Rest\EntryPoint::main()
#22 /srv/mediawiki/w/rest.php(3): require(string)
#23 {main}
Wed, Oct 21, 3:26 AM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), User-notice, Parsing-Active-Work, WMDE-QWERTY-Sprint-2020-08-26, Patch-For-Review, Wikimedia-production-error, User-brennen, Parsoid
cscott added a comment to T237467: Invariant failed: Bad UTF-8 (full string verification).

Sorry, that's not how formatnum works. It takes a PHP numeric string and formats it for human consumption, as per http://unicode.org/reports/tr35/tr35-numbers.html#Number_Format_Patterns and https://www.php.net/manual/en/class.numberformatter.php. It is intended for making the output of computed values more human-friendly.
This is how it has always worked, it just used to be more tolerant and pass through non-numeric characters (like U+2212) unmodified, bypassing localization and user numeric preferences. Please open a different phab task if you would like to change it (I'd be supportive of making the *output* of formatnum use a U+2212 for instance).

Wed, Oct 21, 3:06 AM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), User-notice, Parsing-Active-Work, WMDE-QWERTY-Sprint-2020-08-26, Patch-For-Review, Wikimedia-production-error, User-brennen, Parsoid
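A sketch of the stricter contract described in this comment: only a PHP-style numeric string is treated as a number, and anything else is passed through unchanged but flagged for the tracking category. The regex is an approximation of PHP's is_numeric() that ignores some of its edge cases, the locale-aware formatting step is omitted, and check_formatnum_arg is a hypothetical name.

```python
import re
from typing import Tuple

# Rough approximation of PHP's is_numeric() for this sketch: ASCII sign,
# ASCII digits, optional fraction and exponent. PHP has further edge
# cases (e.g. surrounding whitespace) that are only partly modeled here.
_PHP_NUMERIC = re.compile(
    r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
)


def check_formatnum_arg(arg: str) -> Tuple[str, bool]:
    """Return (output, needs_tracking_category) for a formatnum argument.

    Numeric input would go on to the locale-aware formatter (omitted
    here). Anything else, including a U+2212 minus sign or an already
    grouped "1,234", is passed through unchanged but flagged.
    """
    if _PHP_NUMERIC.fullmatch(arg.strip()):
        return arg, False
    return arg, True


print(check_formatnum_arg("-9.1"))        # ('-9.1', False)
print(check_formatnum_arg("\u22129.1"))   # ('−9.1', True)
```

Flagging instead of failing keeps existing pages rendering while still surfacing inputs like "−9.1" (U+2212) that bypass localization.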
cscott added a comment to T237467: Invariant failed: Bad UTF-8 (full string verification).

The following wikicode samples assign the non-numeric error category on en.WP:

{{formatnum:−9.1}}
{{formatnum:−9000000}}

The input, negative 9.1 or negative nine million, are valid numbers in the appropriate format. Should they throw an error? My apologies if I am reporting this in the wrong ticket.

I can't reproduce this: https://en.wikipedia.org/wiki/User:Cscott/T237467

I can reproduce it in article space. Try searching for insource:/formatnum:−/

The problem currently appears to manifest here:
https://en.wikipedia.org/wiki/Savitzky%E2%80%93Golay_filter
https://en.wikipedia.org/wiki/Almirante_Latorre-class_battleship

Wed, Oct 21, 2:40 AM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), User-notice, Parsing-Active-Work, WMDE-QWERTY-Sprint-2020-08-26, Patch-For-Review, Wikimedia-production-error, User-brennen, Parsoid

Tue, Oct 20

cscott added a comment to T237467: Invariant failed: Bad UTF-8 (full string verification).

For {{formatnum:1,234}} it adds the warning tracking category, so to preserve the old behavior, would editors need to reverse-format first and then format, e.g.: {{formatnum:{{{x}}}}} -> {{formatnum:{{formatnum:{{{x}}}|R}}}} ?

Or is there a different, better way to format unknown input (with or without commas) into a nicely formatted number?

Tue, Oct 20, 6:16 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), User-notice, Parsing-Active-Work, WMDE-QWERTY-Sprint-2020-08-26, Patch-For-Review, Wikimedia-production-error, User-brennen, Parsoid
cscott added a comment to T237467: Invariant failed: Bad UTF-8 (full string verification).

The following wikicode samples assign the non-numeric error category on en.WP:

{{formatnum:−9.1}}
{{formatnum:−9000000}}

The input, negative 9.1 or negative nine million, are valid numbers in the appropriate format. Should they throw an error? My apologies if I am reporting this in the wrong ticket.

Tue, Oct 20, 6:12 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), User-notice, Parsing-Active-Work, WMDE-QWERTY-Sprint-2020-08-26, Patch-For-Review, Wikimedia-production-error, User-brennen, Parsoid

Mon, Oct 19

cscott added a comment to T263594: CI: Ensure SonarQubeBot code coverage metrics captures all test modes run during Parsoid CI tasks.

One of the issues here is that (AIUI) coverage metrics are currently only computed for *unit tests*, not *integration tests*. The vast majority of Parsoid's tests use the parserTests infrastructure, which is an integration test framework. This was done deliberately in the coverage configuration, for reasons I don't fully understand.

Mon, Oct 19, 10:46 PM · Parsoid, Quality-and-Test-Engineering-Team (QTE)

Thu, Oct 15

cscott added a comment to T263928: VisualEditor in 1.35 not working (404 / Permanent Loading).

Here is the log of the server:

162.158.156.190 - - [08/Oct/2020:15:18:06 -0400] "GET /Main_Page HTTP/1.1" 200 8475 "https://[domain]/Main_Page?action=edit&veswitched=1" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36 OPR/71.0.3770.228 (Edition avira-2)"
[...]
162.158.73.250 - - [08/Oct/2020:15:18:11 -0400] "GET /rest.php/[domain]/v3/page/html/Main_Page/95?redirect=false&stash=true HTTP/1.1" 403 455 "-" "VisualEditor-MediaWiki/1.35.0"
Thu, Oct 15, 9:23 PM · Editing-team (Tracking), RESTBase-API, RESTBase, Parsoid, VisualEditor, MW-1.35-release

Wed, Oct 14

cscott updated the task description for T265518: Move Parsoid ServiceWorker.php and extension/src/Config into core.
Wed, Oct 14, 6:08 PM · Platform Team Workboards (Green), Parsoid
cscott updated the task description for T265518: Move Parsoid ServiceWorker.php and extension/src/Config into core.
Wed, Oct 14, 6:07 PM · Platform Team Workboards (Green), Parsoid
cscott created T265518: Move Parsoid ServiceWorker.php and extension/src/Config into core.
Wed, Oct 14, 6:04 PM · Platform Team Workboards (Green), Parsoid

Thu, Oct 8

cscott merged task T264643: Whitespace trimming after rendering transparent nodes (comments, category links, etc.) into T264921: Deprecate/remove "whitespace stripped after HTML-style comment in wikitext".
Thu, Oct 8, 4:59 PM · Parsoid
cscott merged T264643: Whitespace trimming after rendering transparent nodes (comments, category links, etc.) into T264921: Deprecate/remove "whitespace stripped after HTML-style comment in wikitext".
Thu, Oct 8, 4:59 PM · Parsoid
cscott reopened T264643: Whitespace trimming after rendering transparent nodes (comments, category links, etc.) as "Open".
Thu, Oct 8, 4:58 PM · Parsoid

Wed, Oct 7

cscott added a parent task for T264921: Deprecate/remove "whitespace stripped after HTML-style comment in wikitext": T264919: Deprecate and remove 'rendering-transparent nodes'.
Wed, Oct 7, 6:12 PM · Parsoid
cscott added a subtask for T264919: Deprecate and remove 'rendering-transparent nodes': T264921: Deprecate/remove "whitespace stripped after HTML-style comment in wikitext".
Wed, Oct 7, 6:12 PM · Parsoid
cscott created T264921: Deprecate/remove "whitespace stripped after HTML-style comment in wikitext".
Wed, Oct 7, 6:12 PM · Parsoid
cscott created T264919: Deprecate and remove 'rendering-transparent nodes'.
Wed, Oct 7, 6:09 PM · Parsoid

Tue, Oct 6

cscott added a comment to T264804: Flesh out Parsoid's interface / boundary wrt MediaWiki that lets it operate in standalone mode in the face of increasing MediaWiki integration.

As a meta-comment, this phab task seems to be conflating two different things. One of them is abstracting out a number of different interfaces *in core* in a clean way (and even there, we have Parser, ParserOutput, CacheTime, Content, etc), and the other is abstracting runtime modes *in Parsoid* (standalone, integrated, mocked, API testing, etc). The former is the 'Parser API'; the latter is the 'Config API' (well, it's the stuff living in Parsoid\Config namespace right now).

Tue, Oct 6, 9:32 PM · MediaWiki-Parser, Parsoid
cscott renamed T264782: Parsoid.php entry points should accept PageBundles for (html/dom)2wikitext from Parsoid.php entry points should accept DOM as well as HTML to Parsoid.php entry points should accept DOM objects as well as HTML strings.
Tue, Oct 6, 7:42 PM · Technical-Debt, Performance Issue, Parsoid
cscott added a comment to T264782: Parsoid.php entry points should accept PageBundles for (html/dom)2wikitext.

I think our entry points should actually take PageBundles, and we should have factory methods to create PageBundles efficiently from DOM without serializing to string.

Tue, Oct 6, 7:38 PM · Technical-Debt, Performance Issue, Parsoid
cscott added a comment to T262409: Space at end of another list item removed.

According to editing team, this patch *halved* the number of dirty diffs, from 3.7% before this patch rolled out to less than 1%. Hopefully the rest of the patches in the 'preserve trimmed whitespace' series will crush this even lower.

Tue, Oct 6, 4:00 PM · DiscussionTools, Parsoid
cscott added a comment to T263928: VisualEditor in 1.35 not working (404 / Permanent Loading).

I think this explains why the Parsoid routes were not loaded when I had

$wgVirtualRestConfig['modules']['parsoid']['forwardCookies'] = true;

in LocalSettings.php

Tue, Oct 6, 5:00 AM · Editing-team (Tracking), RESTBase-API, RESTBase, Parsoid, VisualEditor, MW-1.35-release

Mon, Oct 5

cscott added a comment to T263928: VisualEditor in 1.35 not working (404 / Permanent Loading).

It looks like a number of separate issues are being mixed together in this task now. Some of them are apparently due to folks following outdated instructions in [[Parsoid/PHP]] instead of the updated docs on the main [[Parsoid]] page. @ti_infotrad seems to have managed to address their issues. @Ciencia_Al_Poder was discussing differences between the configuration when RESTBase is used and when it is not.

Mon, Oct 5, 5:47 PM · Editing-team (Tracking), RESTBase-API, RESTBase, Parsoid, VisualEditor, MW-1.35-release
cscott updated the task description for T264643: Whitespace trimming after rendering transparent nodes (comments, category links, etc.).
Mon, Oct 5, 4:43 PM · Parsoid
cscott created T264643: Whitespace trimming after rendering transparent nodes (comments, category links, etc.).
Mon, Oct 5, 4:41 PM · Parsoid

Wed, Sep 30

cscott added a comment to T264241: Class 'LathMathML' not found.

@ssastry noticed that the crash is reported on line 356. That's the line with MathMathML::batchEvaluate *in wmf.10*. @DannyS712's patches changed the line numbers just enough so that the line with MathMathML::batchEvaluate is now line 352.

Wed, Sep 30, 9:26 PM · serviceops, Math, Wikimedia-production-error
cscott added a comment to T264241: Class 'LathMathML' not found.

This might be a corruption bug. There's a call to MathMathML::batchEvaluate on line 352 of MathHooks::onParserAfterTidy. Seems like PHP is corrupting its strings somehow to turn MathMathML into LathMathML, and then crashing?

Wed, Sep 30, 9:20 PM · serviceops, Math, Wikimedia-production-error
cscott updated subscribers of T264241: Class 'LathMathML' not found.

Maybe 3f16b9f1c2af13dfc4e9758e17fa9d086f9c35a4 (@DannyS712 )? That was the most recent change to MathHooks...

Wed, Sep 30, 9:17 PM · serviceops, Math, Wikimedia-production-error
cscott added a comment to T264241: Class 'LathMathML' not found.

This is a bug in the Math extension's ParserAfterTidy hook:

#0 /srv/mediawiki/php-1.36.0-wmf.11/includes/HookContainer/HookContainer.php(333): MathHooks::onParserAfterTidy(Parser, string)
Wed, Sep 30, 9:15 PM · serviceops, Math, Wikimedia-production-error
cscott added a comment to T264241: Class 'LathMathML' not found.

Hm, strange, codesearch doesn't know anything about LathMathML...
https://codesearch.wmcloud.org/search/?q=LathMathML&i=nope&files=&repos=
Must be a constructed class name somehow?

Wed, Sep 30, 9:14 PM · serviceops, Math, Wikimedia-production-error

Tue, Sep 29

cscott reopened T258719: Parsoid adds mw-content-ltr / mw-content-rtl class to <body> tag unconditionally whereas core parser doesn't seem to, a subtask of T264123: Ensure Parsoid's outermost element matches expectations of CSS and skins, as Open.
Tue, Sep 29, 8:02 PM · Readers-Web-Backlog, Parsoid-Rendering, Parsoid
cscott reopened T258719: Parsoid adds mw-content-ltr / mw-content-rtl class to <body> tag unconditionally whereas core parser doesn't seem to as "Open".

I'm going to reopen this as a child of T264123, instead of closing as a dup, since I think this is a more actionable task than the parent.

Tue, Sep 29, 8:02 PM · Parsoid, Parsoid-Rendering
cscott added a parent task for T258719: Parsoid adds mw-content-ltr / mw-content-rtl class to <body> tag unconditionally whereas core parser doesn't seem to: T264123: Ensure Parsoid's outermost element matches expectations of CSS and skins.
Tue, Sep 29, 8:00 PM · Parsoid, Parsoid-Rendering
cscott added a subtask for T264123: Ensure Parsoid's outermost element matches expectations of CSS and skins: T258719: Parsoid adds mw-content-ltr / mw-content-rtl class to <body> tag unconditionally whereas core parser doesn't seem to.
Tue, Sep 29, 8:00 PM · Readers-Web-Backlog, Parsoid-Rendering, Parsoid

Sep 22 2020

cscott added a comment to T263592: Use of Language::commafy with a non-numeric string was deprecated in MediaWiki 1.36. [Called from Language::formatNum].

The culprit here seems to be Scribunto. E.g.:

Sep 22 2020, 10:07 PM · MW-1.35-notes, MW-1.31-release-notes, MediaWiki-extensions-PdfHandler, Commons, MediaWiki-File-management, MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), MediaWiki-extensions-Scribunto, MediaWiki-Parser, MediaWiki-General, Patch-For-Review

Sep 21 2020

cscott added a comment to T260960: Visual editor adds unwanted whitespaces at the end of section headings using french-spacing at each edit.

Since the typeof changes, the heading is considered as "children-changed" and selser isn't used for it. [...] However, any cached pages will still have that problem.

Sep 21 2020, 10:18 PM · User-Ryasmeen, Patch-For-Review, Editing-team (FY2020-21 Kanban Board), Parsing-Active-Work, Regression, Parsoid, VisualEditor
cscott updated the task description for T260960: Visual editor adds unwanted whitespaces at the end of section headings using french-spacing at each edit.
Sep 21 2020, 10:16 PM · User-Ryasmeen, Patch-For-Review, Editing-team (FY2020-21 Kanban Board), Parsing-Active-Work, Regression, Parsoid, VisualEditor
cscott renamed T260960: Visual editor adds unwanted whitespaces at the end of section headings using french-spacing at each edit from Visual editor adds unwanted whitespaces at the end of some section headings at each edition to Visual editor adds unwanted whitespaces at the end of section headings using french-spacing at each edit.
Sep 21 2020, 10:16 PM · User-Ryasmeen, Patch-For-Review, Editing-team (FY2020-21 Kanban Board), Parsing-Active-Work, Regression, Parsoid, VisualEditor
cscott added a comment to T260960: Visual editor adds unwanted whitespaces at the end of section headings using french-spacing at each edit.

So only a bug in headings which contain french spacing, most often a space before a colon. The mention of plain == Heading == in the original bug summary was throwing me off.

Sep 21 2020, 10:15 PM · User-Ryasmeen, Patch-For-Review, Editing-team (FY2020-21 Kanban Board), Parsing-Active-Work, Regression, Parsoid, VisualEditor
cscott assigned T263500: phan-taint-check-plugin: Undefined constant 'ast\AST_LIST' to hashar.
Sep 21 2020, 9:53 PM · Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)), Patch-For-Review, Release-Engineering-Team (CI & Testing services), Quibble, Platform Team Initiatives (Parsoid REST API in PHP (CDP2)), phan-taint-check-plugin
cscott added a comment to T263500: phan-taint-check-plugin: Undefined constant 'ast\AST_LIST'.
[17:42:21] <subbu> cscott, yes .. i saw a phab email over the weekend .. hashar iirc.
[17:43:50] <subbu> https://phabricator.wikimedia.org/T227352#6474369
[17:45:30] <James_F> cscott: Ah, yes.
[17:45:44] <cscott> https://phabricator.wikimedia.org/T263500 <- parsoid CI breaking
[17:46:26] <wikibugs> (CR) jerkins-bot: [V: -1] WIP: phan: Remove --allow-polyfill-parser [services/parsoid] - https://gerrit.wikimedia.org/r/628952 (owner: C. Scott Ananian)
[17:46:39] <cscott> i removed the --allow-polyfill-parser option in https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/628952 we'll see if that's enough to fix it
[17:47:54] <James_F> It won't, it'll just fail earlier.
[17:49:03] <cscott> James_F: gr8
[17:49:42] <James_F> The polyfill isn't complete, for some reason.
[17:49:46] -*- James_F rolls his eyes.
[17:50:09] <cscott> it's actually failing when running `composer test` AFAICT, but only in the quibble-noselenium case, not in the parsoidsvv-composer-package jobs, etc.
[17:50:30] <James_F> Yeah, the new quibble job is probably mis-built by hashar.
[17:50:43] <James_F> Mark the task as UBN and make it his problem when he wakes up tomorrow?
[17:51:40] <cscott> James_F: Daimona already closed this as a dup of T262451
[17:51:41] <stashbot> T262451: Release bugfix for ast\AST_LIST in phan-taint-check-plugin to unstuck libup updates on some repos - https://phabricator.wikimedia.org/T262451
[17:51:51] <James_F> cscott: Tough.
[17:52:04] <cscott> which is two weeks old
[17:52:38] <James_F> Yeah, that's a minor "nice to have". This is UBN.
Sep 21 2020, 9:53 PM · Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)), Patch-For-Review, Release-Engineering-Team (CI & Testing services), Quibble, Platform Team Initiatives (Parsoid REST API in PHP (CDP2)), phan-taint-check-plugin
cscott added a comment to T263313: Provide a job template for phan jobs for php libraries.

Running phan from composer test is fine for us -- we just need to (apparently) make sure ext-ast is installed when running composer test.

Sep 21 2020, 9:48 PM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (2020-07-01 to 2020-09-30 (Q1)), phan, Continuous-Integration-Config
cscott reopened T263500: phan-taint-check-plugin: Undefined constant 'ast\AST_LIST' as "Open".
Sep 21 2020, 9:47 PM · Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)), Patch-For-Review, Release-Engineering-Team (CI & Testing services), Quibble, Platform Team Initiatives (Parsoid REST API in PHP (CDP2)), phan-taint-check-plugin
cscott created T263500: phan-taint-check-plugin: Undefined constant 'ast\AST_LIST'.
Sep 21 2020, 9:45 PM · Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)), Patch-For-Review, Release-Engineering-Team (CI & Testing services), Quibble, Platform Team Initiatives (Parsoid REST API in PHP (CDP2)), phan-taint-check-plugin
cscott added a comment to T262726: testreduce server fails inserting results in db in some cases.

Three options:

  1. The field is declared as text, and maybe declaring as binary would fix this w/o further hard work
  2. We're truncating test results in a UTF-8-unsafe manner, like w/ JS surrogates
  3. The core parser is actually giving us bad UTF-8 due to some preexisting bug (like the formatNum bug I recently fixed) and so even though we're not doing anything "wrong" we still end up with bad UTF-8 in our output.
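Option 2 can be illustrated with a minimal Python sketch. This is not the testreduce code, which isn't shown here, and the helper names are hypothetical. Cutting an encoded string at an arbitrary byte limit can split a multi-byte sequence, much as slicing a JS string can split a surrogate pair. Backing off to a character boundary keeps the result valid.

```python
def truncate_utf8_naive(text: str, limit: int) -> bytes:
    # Slices raw bytes: may cut a multi-byte character in half.
    return text.encode("utf-8")[:limit]


def truncate_utf8_safe(text: str, limit: int) -> bytes:
    # Slice raw bytes, then drop a trailing partial sequence. The input came
    # from a valid str, so only the cut point can be invalid, and
    # errors="ignore" removes exactly that incomplete tail.
    cut = text.encode("utf-8")[:limit]
    return cut.decode("utf-8", errors="ignore").encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "ab" followed by KANNADA DIGIT SEVEN, which is 3 bytes in UTF-8.
sample = "ab\u0ced"
print(truncate_utf8_naive(sample, 3), truncate_utf8_safe(sample, 3))
```

With a 3-byte limit the naive version returns b"ab\xe0", a dangling lead byte. The safe version returns b"ab".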
Sep 21 2020, 4:45 PM · Parsoid, Internet-Archive, Parsoid-Tests

Sep 17 2020

cscott added a comment to T249743: PHP Notice: Uninitialized string offset: 0.

7bdf506bce9d6e8e0572b30b2de0dd1f3498b79a and 202b6a31a72f9f7608a0070702a66ba97daa0641 did land in Parsoid this week, so it's not impossible the most recent log incidents are related to that.

Sep 17 2020, 7:03 PM · User-brennen, Wikimedia-production-error, Parsoid
cscott updated the task description for T263033: Remove fallback to `$wgUser` in LocalRepo::findFiles(), FileRepo::findFiles(), and FileRepo::findFileFromKey().
Sep 17 2020, 3:17 PM · MW-1.36-notes (1.36.0-wmf.11; 2020-09-29), AbuseFilter, Platform Team Workboards (External Code Reviews), User-DannyS712, Technical-Debt (Deprecation process), MediaWiki-File-management

Sep 16 2020

cscott renamed T263033: Remove fallback to `$wgUser` in LocalRepo::findFiles(), FileRepo::findFiles(), and FileRepo::findFileFromKey() from Remove fallback to `$wgUser` in LocalRepo::findFiles() to Remove fallback to `$wgUser` in LocalRepo::findFiles(), FileRepo::findFiles(), and FileRepo::findFileFromKey().
Sep 16 2020, 4:30 PM · MW-1.36-notes (1.36.0-wmf.11; 2020-09-29), AbuseFilter, Platform Team Workboards (External Code Reviews), User-DannyS712, Technical-Debt (Deprecation process), MediaWiki-File-management
cscott added a subtask for T245331: Remove core fallbacks to global $wgUser [1.36]: T263033: Remove fallback to `$wgUser` in LocalRepo::findFiles(), FileRepo::findFiles(), and FileRepo::findFileFromKey().
Sep 16 2020, 3:12 PM · MW-1.36-release, Technical-Debt (Deprecation process), User-DannyS712, MediaWiki-General
cscott added a parent task for T263033: Remove fallback to `$wgUser` in LocalRepo::findFiles(), FileRepo::findFiles(), and FileRepo::findFileFromKey(): T245331: Remove core fallbacks to global $wgUser [1.36].
Sep 16 2020, 3:12 PM · MW-1.36-notes (1.36.0-wmf.11; 2020-09-29), AbuseFilter, Platform Team Workboards (External Code Reviews), User-DannyS712, Technical-Debt (Deprecation process), MediaWiki-File-management
cscott created T263033: Remove fallback to `$wgUser` in LocalRepo::findFiles(), FileRepo::findFiles(), and FileRepo::findFileFromKey().
Sep 16 2020, 3:12 PM · MW-1.36-notes (1.36.0-wmf.11; 2020-09-29), AbuseFilter, Platform Team Workboards (External Code Reviews), User-DannyS712, Technical-Debt (Deprecation process), MediaWiki-File-management
cscott added a comment to T263014: Argument 2 passed to File::userCan() must be an instance of User, null given, called in /srv/mediawiki/php-1.36.0-wmf.9/includes/filerepo/LocalRepo.php on line 275.

Ah, found it: the issue is that File::userCan(...$user=null) wasn't previously hard-deprecated, even though the null case in ArchivedFile::userCan and OldLocalFile::userCan were. So we weren't seeing the hard-deprecation message (and probably there are a bunch of other users which weren't previously tickling the hard deprecation).

Sep 16 2020, 2:43 PM · MW-1.35-notes, MW-1.36-notes (1.36.0-wmf.10; 2020-09-22), User-DannyS712, Patch-For-Review, Commons, MediaWiki-File-management, Wikimedia-production-error
cscott added a comment to T263014: Argument 2 passed to File::userCan() must be an instance of User, null given, called in /srv/mediawiki/php-1.36.0-wmf.9/includes/filerepo/LocalRepo.php on line 275.

A simple fix would be to use $user = $wgUser in that section in LocalRepo, which would match the old behavior. But it seems like we should figure out where those wfDeprecated messages disappeared to in case there are other lurking issues here which weren't previously caught.
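The "simple fix" pattern combines two steps. First, fall back to the global user so the old behaviour is kept. Second, emit a deprecation warning so the remaining callers still get reported. The Python analogy below sketches this using the standard `warnings` module. It is not MediaWiki code: `GLOBAL_USER` and `resolve_user` are hypothetical stand-ins for `$wgUser` and the null-user handling around `File::userCan()` / `wfDeprecated()`.

```python
import warnings
from typing import Optional

GLOBAL_USER = "ExampleUser"  # hypothetical stand-in for the $wgUser global


def resolve_user(user: Optional[str]) -> str:
    """Return the user to check permissions for (hypothetical helper)."""
    if user is None:
        # Hard-deprecate the null case so the remaining callers show up in
        # logs, but keep the old behaviour by falling back to the global.
        warnings.warn(
            "Passing user=None is deprecated; pass an explicit user",
            DeprecationWarning,
            stacklevel=2,
        )
        return GLOBAL_USER
    return user
```

The key point is that the warning fires on *every* null path. Here, that would mean File::userCan as well as the ArchivedFile and OldLocalFile variants. Otherwise some callers never show up in the deprecation logs.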

Sep 16 2020, 2:36 PM · MW-1.35-notes, MW-1.36-notes (1.36.0-wmf.10; 2020-09-22), User-DannyS712, Patch-For-Review, Commons, MediaWiki-File-management, Wikimedia-production-error
cscott added a comment to T263014: Argument 2 passed to File::userCan() must be an instance of User, null given, called in /srv/mediawiki/php-1.36.0-wmf.9/includes/filerepo/LocalRepo.php on line 275.

Lowering urgency and removing as train blocker after @Jdforrester-WMF backported the revert as a fix for the train. This still needs a proper fix in Parsoid, though.

Sep 16 2020, 2:27 PM · MW-1.35-notes, MW-1.36-notes (1.36.0-wmf.10; 2020-09-22), User-DannyS712, Patch-For-Review, Commons, MediaWiki-File-management, Wikimedia-production-error

Sep 15 2020

cscott added a comment to T262838: Creating a new page on Wikitech using VE fails with a Parsoid/RESTBase error message.

@cscott We may want to backport to 1.35 as this affects the claimed VE native support in that release.

Sep 15 2020, 10:13 PM · User-Ryasmeen, Skipped QA, MW-1.36-notes (1.36.0-wmf.10; 2020-09-22), Editing-team (FY2020-21 Kanban Board), VisualEditor, wikitech.wikimedia.org, Parsoid
cscott updated subscribers of T262943: Parser tests does not cover the "indented table" case.

@Esanders, @matmarex: this difference in behavior might be relevant to your comment-parsing code. If you parse the page using the legacy parser output, you'll end up with a different list structure than when you re-parse it using the Parsoid output.

Sep 15 2020, 7:28 PM · Patch-For-Review, MediaWiki-Parser, Parsoid
cscott updated subscribers of T262943: Parser tests does not cover the "indented table" case.

@Arlolra thanks! I guess the regexp search I was using in parserTests.txt wasn't correct. We don't seem to cover the case the DiscussionTools team was interested in:

: parent
:: here is a comment
:: {|
|foo
|bar
|}

As I understand it (need to actually write the test case), the legacy parser closes the parent list context and then opens a brand new list for the indented comment, instead of including the table in the list containing 'parent' and 'here is a comment'. I don't know if Parsoid matches that behavior or not.

Sep 15 2020, 3:36 PM · Patch-For-Review, MediaWiki-Parser, Parsoid
cscott created T262943: Parser tests does not cover the "indented table" case.
Sep 15 2020, 3:17 PM · Patch-For-Review, MediaWiki-Parser, Parsoid

Sep 14 2020

cscott added a comment to T262838: Creating a new page on Wikitech using VE fails with a Parsoid/RESTBase error message.

Probably has more to do with RESTBase (not in use on officewiki) proxying the connection and "fixing" VisualEditor's "revision 0" URL.

Sep 14 2020, 6:23 PM · User-Ryasmeen, Skipped QA, MW-1.36-notes (1.36.0-wmf.10; 2020-09-22), Editing-team (FY2020-21 Kanban Board), VisualEditor, wikitech.wikimedia.org, Parsoid
cscott created T262833: Refactor Link handling out of <figure> handling.
Sep 14 2020, 4:50 PM · Parsoid

Sep 10 2020

cscott added a comment to T262410: Space at start of parent list item (before template) removed.

My thinking was perhaps to revisit the strict hierarchy of selser and recognize that in some situations we can still use selser on a parent element even if the child was dirtied. ie:

foo
* bar
** bat

becomes:

foo
<ul><li>bar
<ul><li>bat
</li></ul></li></ul>

and the <ul> which contains bar contains bat as well, but there's perhaps a sort of tail-call optimization we could do to selser the * bar part of that, as long as the modified child meets certain requirements.

Sep 10 2020, 7:25 PM · DiscussionTools, Parsoid
cscott updated the task description for T262500: MediaWiki $minimumGroupingDigits is differs from CLDR for hy, ru, uk.
Sep 10 2020, 4:51 PM · Patch-For-Review, I18n, MediaWiki-Internationalization
cscott committed rELUA2bec230e3d08: Use Language::formatNumNoSeparators where appropriate (authored by cscott).
Sep 10 2020, 9:08 AM
cscott created T262500: MediaWiki $minimumGroupingDigits is differs from CLDR for hy, ru, uk.
Sep 10 2020, 1:54 AM · Patch-For-Review, I18n, MediaWiki-Internationalization

Sep 9 2020

cscott added a comment to T167088: Replace formatnum implementation with PHP NumberFormatter .

An interesting issue discovered during T237467 is that commafy/formatNum have been applied to arbitrary strings, not just numeric strings, and there's a decent amount of sloppiness w.r.t. whether the string they are applied to is in Latin digits or native digits/separators. This is presumably a side effect of there being no difference between the two on US/European wikis, so most people "don't worry" about whether they are using formatNum correctly. Anyway, the cleanups to formatNum/commafy for T237467 are tightening the interfaces, which will hopefully make this task easier as well.

Sep 9 2020, 1:57 AM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Language-Team, MediaWiki-Internationalization, I18n

Sep 3 2020

cscott added a comment to T237467: Invariant failed: Bad UTF-8 (full string verification).

However, I would like to point out that Language::formatNum() previously accepted any string, not only numeric ones. While this is not obvious from the name formatNum – even counter-intuitive – it is how formatNum always worked, and how it is used in several places. It doesn't really format numbers. For example, it's unable to change the number of leading or trailing zeros, unable to change the number of decimal places, unable to round numbers, unable to add units. That's what we expect from something called "format number", but that's not what formatNum does. What it does is convert numeric characters and delimiters into localized ones, and optionally add delimiters. For example, it will happily turn a string like "In May, 2500 users did 1900000 edits" into "In May, 2,500 users did 1,900,000 edits".

formatNum is just badly named.

As the patch is now, it returns an empty string instead. As far as I'm concerned, this is a breaking change that would need to be announced as such. I'm not arguing against this change. We can make it. But when reading this task's description, it appears that it's not necessary to make this change. Maybe we can fix the UTF-8 issue without that additional change? I will happily give it a try if it helps.
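To make the described behaviour concrete, here is a rough Python approximation of what the quoted comment says formatNum does to arbitrary text. It groups every run of ASCII integer digits by thousands, then optionally maps digits to a native script. This is an illustrative sketch, not a port of Language::formatNum(). The regex, the fixed three-digit grouping and the function name are assumptions, and the sketch does not localize the "." and "," characters themselves.

```python
import re
from typing import Dict, Optional

# Native digits for Kannada (U+0CE6..U+0CEF), used only as an example script.
KANNADA_DIGITS: Dict[str, str] = {str(d): chr(0x0CE6 + d) for d in range(10)}

# A run of ASCII digits not preceded by a digit or ".", so the fractional
# part of "3.14159" is left alone.
_INTEGER_RUN = re.compile(r"(?<![0-9.])[0-9]+")


def format_num_like(text: str, sep: str = ",",
                    digits: Optional[Dict[str, str]] = None) -> str:
    """Group every integer digit run in `text` by thousands, then map digits."""
    def group(match: "re.Match[str]") -> str:
        run = match.group(0)
        head = len(run) % 3 or 3  # size of the leftmost group
        parts = [run[:head]] + [run[i:i + 3] for i in range(head, len(run), 3)]
        return sep.join(parts)

    grouped = _INTEGER_RUN.sub(group, text)
    if digits:
        grouped = "".join(digits.get(ch, ch) for ch in grouped)
    return grouped


print(format_num_like("In May, 2500 users did 1900000 edits"))
```

Note that text already written in native digits does not match `[0-9]` and passes through ungrouped. This is one way the Latin-versus-native sloppiness can go unnoticed.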

Sep 3 2020, 4:28 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), User-notice, Parsing-Active-Work, WMDE-QWERTY-Sprint-2020-08-26, Patch-For-Review, Wikimedia-production-error, User-brennen, Parsoid

Aug 31 2020

cscott updated the task description for T250500: ParserCache / RESTBase / Parsoid integration.
Aug 31 2020, 10:33 PM · Platform Engineering Roadmap Decision Making, Code-Health-Objective, MW-1.36-notes (1.36.0-wmf.12; 2020-10-05; NEVER DEPLOYED), Platform Engineering Roadmap, MediaWiki-Parser, Platform Engineering, Parsoid
cscott added a comment to T250500: ParserCache / RESTBase / Parsoid integration.

The ParserOutput object also extends a base class, CacheTime, which contains a bunch of ParserCache-specific expiry code. If this is appropriate for the new MPC implementation, we can include it in the base class we'd like to factor out of ParserOutput; if it is not, then we should keep it out of the base class of ParserOutput and include it (maybe as a trait) in the LegacyParserOutput used by the legacy parsercache and legacy parser.
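One way to read this suggestion is sketched below in Python. The idea is to keep the cache-expiry state in a mixin that only the legacy output class includes, so a new shared base class carries no ParserCache-specific code. PHP would use a trait where Python uses a mixin. Only CacheTime, ParserOutput and LegacyParserOutput come from the comment. `CacheExpiryMixin`, `ParserOutputBase` and the method names are hypothetical.

```python
import time
from typing import Optional


class CacheExpiryMixin:
    """Hypothetical stand-in for CacheTime's ParserCache-specific expiry code."""
    cache_time: float = 0.0
    cache_expiry: Optional[float] = None  # seconds; None means no explicit expiry

    def update_cache_expiry(self, seconds: float) -> None:
        # Keep the shortest expiry requested by anything contributing output.
        if self.cache_expiry is None or seconds < self.cache_expiry:
            self.cache_expiry = seconds

    def is_expired(self, now: Optional[float] = None) -> bool:
        if self.cache_expiry is None:
            return False
        current = time.time() if now is None else now
        return current - self.cache_time > self.cache_expiry


class ParserOutputBase:
    """Hypothetical base class factored out of ParserOutput: content only."""

    def __init__(self, text: str) -> None:
        self.text = text


class LegacyParserOutput(CacheExpiryMixin, ParserOutputBase):
    """Legacy output: the shared base plus the expiry behaviour."""

    def __init__(self, text: str, cache_time: float) -> None:
        super().__init__(text)  # resolves to ParserOutputBase.__init__
        self.cache_time = cache_time
```

If the new MPC implementation turns out to need the expiry logic too, the mixin would simply move into the base class instead.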

Aug 31 2020, 10:24 PM · Platform Engineering Roadmap Decision Making, Code-Health-Objective, MW-1.36-notes (1.36.0-wmf.12; 2020-10-05; NEVER DEPLOYED), Platform Engineering Roadmap, MediaWiki-Parser, Platform Engineering, Parsoid
cscott added a comment to T167088: Replace formatnum implementation with PHP NumberFormatter .

One thing to consider is to have consistent formatting in both PHP and JavaScript. For this we actually need generic formatters in both languages, to which we can pass the actual format string (defined in CLDR or specified locally in MediaWiki).
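A generic, pattern-driven formatter could look roughly like the Python sketch below. It reads the primary and secondary group sizes from a CLDR-style decimal pattern such as `#,##0.###` or `#,##,##0.###`. It also applies a minimum-grouping-digits threshold. It only handles the integer part and does not reproduce CLDR or PHP's NumberFormatter in full. The function names and parameters are assumptions, and no per-locale values are implied.

```python
from typing import Tuple


def group_sizes(pattern: str) -> Tuple[int, int]:
    """Return (primary, secondary) group sizes from a CLDR-style pattern.

    (0, 0) means the pattern contains no grouping separator.
    """
    groups = pattern.split(".")[0].split(",")
    if len(groups) < 2:
        return 0, 0
    primary = len(groups[-1])
    secondary = len(groups[-2]) if len(groups) > 2 else primary
    return primary, secondary


def group_integer(digits: str, pattern: str, min_grouping: int = 1,
                  sep: str = ",") -> str:
    """Insert grouping separators into a string of integer digits."""
    primary, secondary = group_sizes(pattern)
    # No grouping if the pattern has none, or the number is too short to
    # reach the minimum-grouping-digits threshold.
    if primary == 0 or len(digits) < primary + min_grouping:
        return digits
    head, parts = digits[:-primary], [digits[-primary:]]
    while len(head) > secondary:
        parts.insert(0, head[-secondary:])
        head = head[:-secondary]
    if head:
        parts.insert(0, head)
    return sep.join(parts)


print(group_integer("1234567", "#,##,##0.###"))
```

The same pattern string could be handed to a JavaScript implementation, which is the point of driving both sides from the format string rather than from hard-coded rules.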

Aug 31 2020, 7:33 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Language-Team, MediaWiki-Internationalization, I18n
cscott added a comment to T167088: Replace formatnum implementation with PHP NumberFormatter .

See also T237467: Invariant failed: Bad UTF-8 (full string verification) which was tracked down to a bug in the current commafy implementation that produced corrupt UTF-8 sequences.

Aug 31 2020, 7:31 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Language-Team, MediaWiki-Internationalization, I18n
cscott added a comment to T258743: Change what symbol is prepended to pings on ar.wiki.

Our recommendation is to forward this issue to Language-Team (Language-2020-October-December) to address. The "proper" solution (in our opinion) is:

  1. Delegate the Language Team to interact w/ the local wiki communities on each of the <13 wikis which have link prefixes enabled but are currently using the (bogus) default link prefix, to have them either turn off link prefixes or else define a more appropriate prefix charset for their language.
  2. Once zero wikis are using the default link prefix setting, patch core to change the default to match nothing; this would force any future wiki which wants to use link prefixes to define a sane prefix.
  3. Use RELEASE-NOTES and a note in DefaultSettings.php to explain which characters should definitely *not* be included in a link prefix (e.g., defining a link prefix that matches &nbsp; is a bad idea).
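The effect of a too-broad prefix charset can be shown with a small Python sketch. It makes two assumptions. First, the link prefix is the longest run of charset characters immediately before the `[[`. Second, the default charset is `a-zA-Z` plus every code point from U+0080 upward. Check both against MessagesEn.php/DefaultSettings.php before relying on them. `split_link_prefix` is a hypothetical helper. Because U+00A0 (no-break space) falls inside that assumed range, a word followed by `&nbsp;` gets pulled into the link text.

```python
import re
from typing import Tuple

# Assumed default: ASCII letters plus every code point from U+0080 upward.
ASSUMED_DEFAULT_CHARSET = "a-zA-Z\u0080-\U0010FFFF"


def split_link_prefix(text_before_link: str, charset: str) -> Tuple[str, str]:
    """Split the text preceding [[...]] into (remaining text, link prefix)."""
    # Lazy first group + greedy trailing run: the prefix is the longest
    # suffix made only of charset characters (possibly empty).
    match = re.fullmatch(rf"(.*?)([{charset}]*)", text_before_link, re.DOTALL)
    if match is None:  # cannot happen: both groups may be empty
        return text_before_link, ""
    return match.group(1), match.group(2)


print(split_link_prefix("say hi\u00a0", ASSUMED_DEFAULT_CHARSET))
```

With the assumed default, "hi" plus the no-break space become the prefix. With a narrower, hypothetical charset such as `a-zA-Z`, the no-break space stops the match and the prefix is empty.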
Aug 31 2020, 7:27 PM · User-Ryasmeen, User-Dyolf77, OWC2020 (Reply Tool Opt Out), Editing-team (FY2020-21 Kanban Board), DiscussionTools, VisualEditor
cscott added a comment to T237467: Invariant failed: Bad UTF-8 (full string verification).
$ php maintenance/shell.php
>>> $l=Language::factory('kn')
>>> $l->formatNum('7')
=> "೭"
>>> $l->formatNum('a')
=> ""
>>> $l->formatNum("\xE0\xB3\xAD\xEF\xBF\xBD\x30")
=> b"à"

And of course (b"à" is \xE0)... at least this is easy to reproduce, I guess.

Aug 31 2020, 5:58 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), User-notice, Parsing-Active-Work, WMDE-QWERTY-Sprint-2020-08-26, Patch-For-Review, Wikimedia-production-error, User-brennen, Parsoid
cscott added a comment to T237467: Invariant failed: Bad UTF-8 (full string verification).

Looks like a problem with https://kn.wikipedia.org/wiki/Template:Rnd. First off, that template is unnecessary, because #expr has a round operator, so you should be able to just use that. But it appears that the rnd template is invoking a Scribunto module:

{{#invoke:Math|precision_format| {{{1}}} | {{{2|0}}}}}}}

in this case:

{{#invoke:Math|precision_format| 7.0435637216 | 1}}}}

which then emits \xE0\xB3\xAD\xEF\xBF\xBD\x30, which is still valid UTF-8, although certainly not what is expected (which would be 7 AFAICT). That valid-but-bogus string is then given to {{formatnum}}, which expects a string in *Latin-script numerals*, and somehow massacres it down to \xE0 (just the first byte of the input, breaking apart the UTF-8 multibyte character).
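Decoding the emitted bytes shows what the Math module actually produced. Python is used here only as a convenient way to inspect the bytes. The helper names are illustrative.

```python
import unicodedata
from typing import List

# Bytes emitted by {{#invoke:Math|precision_format| 7.0435637216 | 1}}.
EMITTED = b"\xE0\xB3\xAD\xEF\xBF\xBD\x30"


def describe_utf8(data: bytes) -> List[str]:
    """Decode strictly and name each code point; raises on invalid UTF-8."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}"
            for ch in data.decode("utf-8")]


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for line in describe_utf8(EMITTED):
    print(line)
# The first byte on its own, which is what {{formatnum}} returned:
print(is_valid_utf8(EMITTED[:1]))
```

This prints U+0CED KANNADA DIGIT SEVEN, U+FFFD REPLACEMENT CHARACTER and U+0030 DIGIT ZERO, then False. So the input was a localized seven, a replacement character and an ASCII zero. The lone \xE0 is just the lead byte of the three-byte Kannada digit.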

Aug 31 2020, 5:47 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), User-notice, Parsing-Active-Work, WMDE-QWERTY-Sprint-2020-08-26, Patch-For-Review, Wikimedia-production-error, User-brennen, Parsoid
cscott added a comment to T237467: Invariant failed: Bad UTF-8 (full string verification).

It appears that in https://kn.wikipedia.org/wiki/%E0%B2%9F%E0%B3%86%E0%B2%82%E0%B2%AA%E0%B3%8D%E0%B2%B2%E0%B3%87%E0%B2%9F%E0%B3%81:USCensusPop the fragment

{{#ifexpr: {{#if:{{{1810|}}}|{{{1810}}}|0}} | {{formatnum:{{rnd| (100 * {{{1820}}}/{{{1810}}} - 100) | 1}}}}% | <center>—</center> }}

is generating \340% (aka "\xE0\x25"), which is bad UTF-8.
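A full-string check of the kind named in this task's title ("Bad UTF-8 (full string verification)") can be approximated in Python. The sketch attempts a strict decode and reports the offset of the first bad byte. It is not Parsoid's actual check, and `first_bad_utf8_offset` is a hypothetical name. Note that the octal escape \340 is the same byte as \xE0. A lead byte for a three-byte sequence followed by "%" (0x25) is therefore rejected at offset 0.

```python
from typing import Optional


def first_bad_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        return err.start
    return None


print(first_bad_utf8_offset(b"\340%"))
```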

Aug 31 2020, 5:21 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), User-notice, Parsing-Active-Work, WMDE-QWERTY-Sprint-2020-08-26, Patch-For-Review, Wikimedia-production-error, User-brennen, Parsoid