cscott (C. Scott Ananian)
Parser whisperer

User Details

User Since
Oct 21 2014, 6:47 PM (221 w, 5 d)
Availability
Available
IRC Nick
cscott
LDAP User
Unknown
MediaWiki User
Cscott

Editor since 2005; WMF developer since 2013. I work on Parsoid and OCG, and dabble with VE, real-time collaboration, and OOjs.

On GitHub: https://github.com/cscott

See https://en.wikipedia.org/wiki/User:cscott for more.

Recent Activity

Fri, Jan 11

cscott added a comment to T25932: Enable, whitelist, and incorporate semantic HTML5 elements.

Parsoid uses <section>, <figcaption>, and <figure> already in its output (and thus the main parser will too, as Parsoid is merged into core). <picture> could be considered as part of media layout, but we are using other better-supported mechanisms for responsive images at the moment. These tags should not be whitelisted in article content as they conflict with wikitext features.

Fri, Jan 11, 3:08 AM · Epic, Accessibility, MediaWiki-Parser

Tue, Jan 8

Man77 awarded T209236: "&params" URL parameter (used in a link parameter in [[File]] markup) incorrectly parsed as "¶ms" (%C2%B6ms) a Heartbreak token.
Tue, Jan 8, 7:38 PM · MW-1.33-notes (1.33.0-wmf.6; 2018-11-27), Patch-For-Review, Regression, MediaWiki-Parser

Thu, Jan 3

cscott added a comment to T101841: Value-less extension attributes not preserved.

I think HTML5 says that an empty attribute is actually itself, not the empty string? (Let me check...)
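
As far as the HTML syntax spec goes, the "empty attribute syntax" (just the attribute name) implicitly gives the attribute the empty string as its value, so preserving value-less attributes depends on a tokenizer that records the difference. A minimal sketch of that distinction, using Python's standard-library html.parser purely as an illustration (it is not what Parsoid uses, and the <ext> tag and foo attribute are made up):

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect (name, value) pairs from every start tag seen."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


def collect_attrs(markup):
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.attrs


# Hypothetical extension tag: value-less attribute vs. explicit empty value.
valueless = collect_attrs("<ext foo>")
empty = collect_attrs('<ext foo="">')
```

This particular parser reports the value-less attribute with a value of None and the explicit empty value as an empty string, which is the kind of information a serializer would need to round-trip the original form.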

Thu, Jan 3, 10:39 PM · Parsoid

Wed, Jan 2

cscott added a comment to T212124: Consider adding decoding=async to our img tags.

These sorts of patches should be ported to Parsoid, or at least the Parsoid team should be pinged about them.

Wed, Jan 2, 9:28 PM · MW-1.33-notes (1.33.0-wmf.13; 2019-01-15), Patch-For-Review, MediaWiki-General-or-Unknown, MediaWiki-extensions-General, Parsing-Team, Performance-Team

Dec 18 2018

cscott added a comment to T148274: Implement a convenient way to link to ISBNs without magic links.

If you're using Visual Editor, it would auto-complete the proper ISBN markup without anyone having to learn it specifically.

Dec 18 2018, 9:28 PM · Patch-For-Review, MediaWiki-Parser
cscott added a comment to T204694: cloudvps: telnet project trusty deprecation.

I'm inclined to let this age out. In addition to being based on the OCG framework and the old VM infrastructure, it's based on the Parsoid/JS service, which is also due to be replaced in the new year. It would be a good April 1, 2019 project to resurrect it based on Parsoid/PHP (and no OCG), perhaps.

Dec 18 2018, 4:38 PM · Cloud-VPS (Ubuntu Trusty Deprecation)

Dec 11 2018

cscott added a comment to T211527: Notice: Undefined variable: wgTidyConf in /srv/mediawiki/wmf-config/CommonSettings.php on line 3672.

Thanks, all. Step by step...

Dec 11 2018, 1:28 PM · Core Platform Team Backlog (Watching / External), Parsing-Team

Dec 9 2018

takidelfin awarded T208620: Parsoid should support SVG thumbnails in page language a Like token.
Dec 9 2018, 3:43 PM · Parsoid

Dec 6 2018

cscott added a comment to T197242: Transition citoid to use Zotero's translation-server-v2.

If you could provide more details, I'd certainly be interested in helping debug the XPath library interaction. Domino is pretty heavily performance-optimized at this point.

Dec 6 2018, 10:49 PM · Patch-For-Review, Services (done), VisualEditor (Current work), Citoid, Operations
cscott committed rMLLC19084f38081a: Enforce "no-buffer-constructor" (authored by cscott).
Dec 6 2018, 10:44 PM

Dec 4 2018

cscott closed T209236: "&params" URL parameter (used in a link parameter in [[File]] markup) incorrectly parsed as "¶ms" (%C2%B6ms) as Resolved.

I think this bug is fixed everywhere now. Resolving; reopen if you find a problem that can't be fixed by purging parsercache.

Dec 4 2018, 4:18 PM · MW-1.33-notes (1.33.0-wmf.6; 2018-11-27), Patch-For-Review, Regression, MediaWiki-Parser
cscott renamed T211074: Support BCP 47 codes in Parsoid from Sync baseconfigs to Support BCP 47 codes in Parsoid.
Dec 4 2018, 5:17 AM · Parsoid
cscott added a comment to T211074: Support BCP 47 codes in Parsoid.

Yeah, I need to port https://gerrit.wikimedia.org/r/445664 to Parsoid. The BCP 47 codes are exported from siteinfo as of https://gerrit.wikimedia.org/r/460038 which was merged in October, so if we haven't sync'ed the baseconfigs since then we probably need to.

Dec 4 2018, 5:17 AM · Parsoid

Dec 3 2018

Krinkle awarded T209902: Requesting +2 rights for Prtksxna on jsdoc/wmf-theme an Orange Medal token.
Dec 3 2018, 3:08 AM · JSDoc WMF theme, Repository-Ownership-Requests

Nov 29 2018

cscott added a comment to T208620: Parsoid should support SVG thumbnails in page language.

Starting points, copied from IRC:

Nov 29 2018, 4:46 PM · Parsoid

Nov 28 2018

cscott committed rMLLCa90967f7eba2: Add AUTHORS.txt file (authored by cscott).
Nov 28 2018, 10:56 PM
cscott committed rMLLC630d4a200d5e: Update repo URL in package.json after migration to WMF gerrit (authored by cscott).
Nov 28 2018, 10:56 PM
cscott added a comment to T210548: gzip-encoded page properties can't be exported from the API.

Yes, true. But we need some way to disambiguate between returning \xFF meaning the literal byte 255 and \u00FF meaning the UTF-8 byte sequence denoting codepoint 255.
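
The ambiguity can be sketched in a few lines of Python. This is only an illustration of the byte-versus-codepoint distinction; the Latin-1 style mapping below is a hypothetical naive exporter, not something the MediaWiki API is known to do:

```python
import json

raw_byte = b"\xff"                            # the literal byte 255
codepoint = "\u00ff"                          # the character U+00FF
codepoint_utf8 = codepoint.encode("utf-8")    # two bytes: 0xC3 0xBF

# JSON strings carry codepoints, so "\u00ff" in a response names the character.
json_text = json.dumps(codepoint)

# If an exporter mapped each raw byte straight to the codepoint of the same
# number (a Latin-1 style mapping), byte 255 would serialize identically.
naive_json_text = json.dumps(raw_byte.decode("latin-1"))
ambiguous = json_text == naive_json_text
```

Both values serialize to the same JSON text here, so a consumer could not tell whether the stored property held one byte or the two-byte UTF-8 sequence.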

Nov 28 2018, 10:05 PM · Patch-For-Review, Core Platform Team Kanban (Waiting for Review), Maps (Kartographer), MediaWiki-API
cscott added a comment to T210490: Some names like "Aachen" are sorted wrongly in Norwegian.

I don't have any problem with using the Unicode codepoint. I just don't think we should invent a bogus entity name for it. You can use the codepoint as a hex or decimal character reference in wikitext. Perhaps even file a bug with the W3C/WHATWG to add the &cgj; entity upstream. Many places in MW assume that the MW entity names correspond to valid HTML entity names; I don't think wikitext should have its own nonstandard entities.

Nov 28 2018, 8:24 PM · MediaWiki-Categories, MediaWiki-Internationalization
cscott added a comment to T209236: "&params" URL parameter (used in a link parameter in [[File]] markup) incorrectly parsed as "¶ms" (%C2%B6ms).

https://en.wikipedia.org/wiki/User:Cscott/T209236 is my test case, but I agree: even after purging the page I'm still seeing a &para; on the figure link. But that makes sense 'cuz you rolled it back...

Nov 28 2018, 1:14 AM · MW-1.33-notes (1.33.0-wmf.6; 2018-11-27), Patch-For-Review, Regression, MediaWiki-Parser

Nov 27 2018

cscott added a comment to T210548: gzip-encoded page properties can't be exported from the API.

Probably the most straightforward solution would be to deprecate the "properties" value in ApiParse.php and replace it with a "binProperties" value which is the bin2hex'ed value of each property. That would ensure that there was some means of exporting even binary page properties, such as those used by Kartographer, TemplateData, Graph, etc.
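
A rough sketch of why a hex-encoded export would be lossless where a text export is not. The function names and the sample payload are hypothetical; Python's bytes.hex() stands in for PHP's bin2hex, and the gzip payload stands in for a binary page property:

```python
import gzip


def export_property_hex(raw: bytes) -> str:
    """Hex-encode a binary property value (the moral equivalent of bin2hex)."""
    return raw.hex()


def import_property_hex(encoded: str) -> bytes:
    """Recover the original bytes from the hex form."""
    return bytes.fromhex(encoded)


# Stand-in for a gzip-compressed page property.
raw = gzip.compress(b'{"example": "payload"}')

# Treating the bytes as UTF-8 text replaces invalid sequences with U+FFFD,
# which cannot be undone.
lossy = raw.decode("utf-8", errors="replace")

# The hex form is plain ASCII and survives JSON serialization unchanged.
hexed = export_property_hex(raw)
restored = gzip.decompress(import_property_hex(hexed))
```

The cost is a doubling in size, which seems acceptable for an export path whose main requirement is that the data survives at all.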

Nov 27 2018, 10:42 PM · Patch-For-Review, Core Platform Team Kanban (Waiting for Review), Maps (Kartographer), MediaWiki-API
cscott created T210550: "Empty JSON response" from ParsoidBatchAPI when content includes <mapframe>.
Nov 27 2018, 9:41 PM · MW-1.33-notes (1.33.0-wmf.8; 2018-12-11), Patch-For-Review, Maps (Kartographer), Parsoid
cscott added a comment to T210548: gzip-encoded page properties can't be exported from the API.

As an example, PHP var_export says the raw value is:

'kartographer' => ' � ' . "\0" . '' . "\0" . '' . "\0" . '' . "\0" . '' . "\0" . '' . "\0" . ' E�� �0 E�e�HJKm� \\�7��1 "P2� ���e�nϹ��pW�h!�uZ��y�E�}� 3��ʣc�83!� ���}�GC �S �@ 3&�g�R�>̼���' . "\0" . '' . "\0" . '' . "\0" . '',

but the value exported by the API for this is:

"kartographer": "\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffdE\ufffd\ufffd\ufffd\ufffd0\ufffdE\ufffde\ufffdHJKm\ufffd\ufffd\\\ufffd7\ufffd\ufffd1\ufffd\"P2\ufffd\ufffd\ufffd\ufffd\ufffde\ufffdn\u03f9\ufffd\ufffdpW\ufffdh!\ufffduZ\ufffd\ufffdy\ufffdE\ufffd}\ufffd\ufffd3\ufffd\ufffd\u02a3\u436e\ufffdU\ufffd\ufffdv\ufffd\u06c2\ufffd\ufffd\ufffdM|\ufffdB\ufffd\ufffd\ufffd\ufffdF\u06a0\ufffd\ufffdk\ufffd\u6615\ufffd\ufffd\ufffd0\ufffd\ufffdk\ufffd\ufffdR\ufffdJ\ufffd\ufffdY\ufffd\ufffdn\ufffd\ufffdR\ufffd\ufffd\ufffd\ufffd)\ufffd\ufffd`\ufffd\ufffd\ufffd\ufffd\ufffdn\ufffd\rc\ufffd83!\ufffd\ufffd\ufffd\ufffd}\ufffdGC\ufffd\ufffdS\ufffd\ufffd@\ufffd3&\ufffdg\ufffdR\ufffd>\u033c\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd",

Note that all the \0 have become \ufffd.

Nov 27 2018, 9:30 PM · Patch-For-Review, Core Platform Team Kanban (Waiting for Review), Maps (Kartographer), MediaWiki-API
cscott created T210548: gzip-encoded page properties can't be exported from the API.
Nov 27 2018, 9:27 PM · Patch-For-Review, Core Platform Team Kanban (Waiting for Review), Maps (Kartographer), MediaWiki-API
cscott created T210511: Parser Performance Benchmark: Short Strings.
Nov 27 2018, 3:39 PM · Parsoid, MediaWiki-Parser

Nov 26 2018

cscott added a comment to T209236: "&params" URL parameter (used in a link parameter in [[File]] markup) incorrectly parsed as "¶ms" (%C2%B6ms).

Script to generate regexp matching all semicolon-less HTML entities is at P7844; this was used in the patch linked above.

Nov 26 2018, 8:57 PM · MW-1.33-notes (1.33.0-wmf.6; 2018-11-27), Patch-For-Review, Regression, MediaWiki-Parser
cscott created P7844 Generate regexp for semicolon-less HTML entities.
Nov 26 2018, 8:56 PM · MediaWiki-Parser
cscott created T210437: Sanitizer::stripAllTags shouldn't expand legacy "semicolon-less" HTML5 entities.
Nov 26 2018, 7:34 PM · Patch-For-Review, MediaWiki-Parser
cscott added a comment to T209236: "&params" URL parameter (used in a link parameter in [[File]] markup) incorrectly parsed as "¶ms" (%C2%B6ms).

Confirmed the bug exists in core (but not in <gallery>) and in Parsoid (both in native links and <gallery>). &para[A-Za-z0-9=] should never be entity decoded, according to https://www.w3.org/TR/html5/syntax.html#character-reference-state ; investigating what's going on here.
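
The attribute-value rule referred to here can be sketched as follows. This is a simplified illustration, not the patch itself: LEGACY_ENTITIES is a tiny hand-picked subset of the semicolon-less entities in the spec, and decode_legacy_in_attribute is a hypothetical helper name.

```python
import re

# Small illustrative subset of the legacy entities that HTML5 allows
# without a trailing semicolon; the full list is in the HTML spec.
LEGACY_ENTITIES = {
    "amp": "&",
    "lt": "<",
    "gt": ">",
    "copy": "\u00a9",
    "para": "\u00b6",
}

_LEGACY_RE = re.compile(r"&(" + "|".join(LEGACY_ENTITIES) + r")(;?)")


def decode_legacy_in_attribute(value: str) -> str:
    """Decode entities in an attribute value, leaving a semicolon-less
    match alone when it is followed by '=' or an ASCII alphanumeric."""

    def repl(match):
        name, semicolon = match.group(1), match.group(2)
        next_char = value[match.end():match.end() + 1]
        if not semicolon and re.fullmatch(r"[A-Za-z0-9=]", next_char):
            return match.group(0)  # e.g. "&params" stays as written
        return LEGACY_ENTITIES[name]

    return _LEGACY_RE.sub(repl, value)


kept = decode_legacy_in_attribute("?a=1&params=2")
decoded = decode_legacy_in_attribute("x &para; y")
```

With this rule a URL such as ?a=1&params=2 passes through untouched, while an explicit &para; is still decoded to the pilcrow.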

Nov 26 2018, 4:56 PM · MW-1.33-notes (1.33.0-wmf.6; 2018-11-27), Patch-For-Review, Regression, MediaWiki-Parser
cscott added a comment to T209236: "&params" URL parameter (used in a link parameter in [[File]] markup) incorrectly parsed as "¶ms" (%C2%B6ms).

Let me take a quick look. Apologies to German WP, this was a long holiday weekend in the US which probably delayed attention to this problem.

Nov 26 2018, 4:31 PM · MW-1.33-notes (1.33.0-wmf.6; 2018-11-27), Patch-For-Review, Regression, MediaWiki-Parser

Nov 22 2018

abian awarded T207168: Provide JSON-LD support for Wikidata a Like token.
Nov 22 2018, 2:41 PM · MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), Wikidata, MediaWiki-extensions-WikibaseRepository

Nov 21 2018

freephile awarded T100841: Support for dynamically enabling new wikis a Mountain of Wealth token.
Nov 21 2018, 2:43 PM · Patch-For-Review, Parsoid
Kghbln awarded T100841: Support for dynamically enabling new wikis a Yellow Medal token.
Nov 21 2018, 8:46 AM · Patch-For-Review, Parsoid

Nov 20 2018

cscott closed T209902: Requesting +2 rights for Prtksxna on jsdoc/wmf-theme as Resolved.

Well, I added you as co-owner of the jsdoc-wmf-theme group. That should do it, I hope?

Nov 20 2018, 5:36 PM · JSDoc WMF theme, Repository-Ownership-Requests
cscott added a comment to T209902: Requesting +2 rights for Prtksxna on jsdoc/wmf-theme.

Fine with me! I'm the owner of the group; how do I go about giving you +2 rights, I wonder?

Nov 20 2018, 5:34 PM · JSDoc WMF theme, Repository-Ownership-Requests

Nov 17 2018

ToBeFree awarded T113004: Make it easy to fork, branch, and merge pages (or more) a Piece of Eight token.
Nov 17 2018, 12:33 AM · Community-Wishlist-Survey-2015, Contributors-Team, Wikimedia-Developer-Summit-2016

Nov 13 2018

cscott created T209420: Parsoid should either support wgAllowExternalImages or we should deprecate it in core.
Nov 13 2018, 9:38 PM · MediaWiki-Parser, Parsoid

Nov 11 2018

Liuxinyu970226 awarded T149667: Amazing Article Annotations a Love token.
Nov 11 2018, 8:14 AM · Parsing-Team, Cite, VisualEditor, ContentTranslation, MediaWiki-extensions-Translate, Wikispeech, Wikimedia-Developer-Summit (2017)
Liuxinyu970226 awarded T112984: Real Time Collaborative Editing a Like token.
Nov 11 2018, 7:33 AM · Contributors-Team, Wikimedia-Developer-Summit-2016
Liuxinyu970226 awarded T113004: Make it easy to fork, branch, and merge pages (or more) a Like token.
Nov 11 2018, 7:11 AM · Community-Wishlist-Survey-2015, Contributors-Team, Wikimedia-Developer-Summit-2016

Nov 8 2018

cscott added a comment to T199332: PHP Warning: count(): Parameter must be an array or an object that implements Countable in Serializer.php.

I need to dig into PHP semantics. Is it possible you might be suppressing the notice on $parent->children because of the =& instead of plain =? (line 246)

Nov 8 2018, 4:22 PM · Core Platform Team Kanban (Blocked Externally), Core Platform Team (Security, stability, performance and scalability (TEC1)), RemexHtml

Nov 7 2018

cscott added a comment to T206940: Quote marks in "alt" text break media attribute parsing.

Did you check the other issues from https://phabricator.wikimedia.org/T206940#4670526 ? Sounds like you're saying that wt2html is fixed (both PHP and Parsoid agree and do something sensible) but that html2wt for video specifically is still broken since it treats the (invisible) embedded alt attribute as plaintext rather than HTML. Or maybe html2wt is alright but we shouldn't be embedding the invisible alt as HTML but should be doing the same tag-stripping that we would do for a "real" alt attribute. (I think I'm a little partial to this latter.)

Nov 7 2018, 10:45 PM · MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), Parsoid, VisualEditor

Nov 3 2018

Seb35 awarded T100841: Support for dynamically enabling new wikis a Yellow Medal token.
Nov 3 2018, 4:00 PM · Patch-For-Review, Parsoid

Nov 2 2018

cscott added a comment to T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments).

Copied from a discussion at https://gerrit.wikimedia.org/r/#/c/mediawiki/services/parsoid/+/467531/6/lib/wt2html/tt/LinkHandler.js@1014 wrt how [[File:Foo.jpg|{{sometemplate}}]] gets parsed:

Nov 2 2018, 8:31 PM · Patch-For-Review, Parsing-Team, Wikimedia-Developer-Summit-2016, TechCom-RFC
cscott created T208620: Parsoid should support SVG thumbnails in page language.
Nov 2 2018, 7:51 PM · Parsoid
cscott created T208619: Parsoid missing an empty row in table output.
Nov 2 2018, 7:36 PM · Parsoid
cscott added a comment to T208549: HHVM CPU usage when deploying MediaWiki.
  • I didn't check the size of the bytecode cache databases to see if that exploded. If that's the case, I might want to purge it today.
Nov 2 2018, 5:19 PM · Wikimedia-production-error, Release-Engineering-Team, Operations
cscott added a comment to T208549: HHVM CPU usage when deploying MediaWiki.

A bunch of new messages are getting tidied and going through remex now ( https://gerrit.wikimedia.org/r/#/q/topic:deprecate-wgtidy ) -- although that *shouldn't* cause problems, here are a couple of wild theories:

  1. MessageCache is missing somehow ( 4b1db1190bb8f2a115c6a81a5ee487b7d18cd303 ) and tying up CPU (but I'm pretty sure the misses would show in logs). Rolling back would cause misses on the updated messages so "not solve the issue".
  2. ParserCache is missing somehow ( 58abac2d1489cdfaaf2ffdf2f9e1214509760b31 ) but I'm pretty sure that would show up in analytics. Same thing, roll back could cause misses of the previous misses.
  3. Some system message is triggering some weird infinite-loop bug in Remex. Of course, remex has been used for tidy in all article content for months now, but I'm always ready to be surprised. If there's a specific URL which triggers the CPU hog behavior, that would be a big clue. But wouldn't explain why rolling back didn't help.
  4. It's not my fault at all. (I like this one the best, but it doesn't help you fine folks in ops any.) ;)
Nov 2 2018, 5:18 PM · Wikimedia-production-error, Release-Engineering-Team, Operations

Oct 30 2018

cscott added a comment to T198970: Epic: Implement SEO improvements suggested by Go Fish Digital.

SEO optimization came up on the Audiences 1 QCI presentation, and it was mentioned that one question we had was whether Google used the same ingestion pipeline for all languages / wikis, or whether there were certain things that would work differently on (say) English wikipedia -vs- Spanish wikisource.

Oct 30 2018, 3:52 PM · SEO, Epic
cscott added a comment to T202481: Parser should have a msg() helper function so people don't localize messages improperly.

I think I could be satisfied with a good documentation comment for this method elaborating these points:
(a) state explicitly that this uses the content language, not the user interface language
(b) is intended for parser functions and tag hooks which appear in article content, not UX (special pages, warnings, etc)
(c) the result will be subject to language conversion (which is unusual for system messages, which are more usually pre-converted to a specific variant)

Oct 30 2018, 3:40 PM · Patch-For-Review, Google-Code-in-2018, MediaWiki-Parser

Oct 29 2018

cscott added a comment to T202481: Parser should have a msg() helper function so people don't localize messages improperly.

See also T114640: RFC: make Parser::getTargetLanguage aware of multilingual wikis, which goes into more detail about the problems one has in general when trying to make "user interface elements" from "content" markup.

Oct 29 2018, 9:50 PM · Patch-For-Review, Google-Code-in-2018, MediaWiki-Parser
cscott added a comment to T202481: Parser should have a msg() helper function so people don't localize messages improperly.

I'm not convinced the helper should be in Parser.php. Parser functions and tag hooks are a bit of a special case, since they actually are in the content language. In 95% of the cases where system messages are used, they should be in the user interface language -- compare https://codesearch.wmflabs.org/search/?q=addWikiTextAsContent with https://codesearch.wmflabs.org/search/?q=addWikiTextAsInterface for example. Even when you do want the content language, it's probably because the thing you're adding would be better off as an interface message, for example see bacd87e4942baa34808a1b77d3b29bfdb566cc17.

Oct 29 2018, 9:48 PM · Patch-For-Review, Google-Code-in-2018, MediaWiki-Parser

Oct 25 2018

cscott added a comment to T207930: Moving or deleting a translatable page on mediawiki.org triggers an error message.

Verified that rETRAb2586aebd94d: Avoid untidy calls to OutputPage::addWikiText() didn't make it to the REL1_32 branch of Translate, and so no further backporting should be needed.

Oct 25 2018, 8:59 PM · MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Language-Team (Language-2018-October-December), Wikimedia-production-error, MediaWiki Language Extension Bundle, Operations, MediaWiki-extensions-Translate

Oct 24 2018

cscott added a comment to T144467: Security review for Google MT for Content Translation.

I think you still need to sanitize -- for example, the <script> tag has its payload delivered by the node children, not the attributes. Potentially <style> is dangerous in the same way, if there is a CSS vulnerability. But again, a tag whitelist should be sufficient for this.
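
A minimal sketch of what a tag whitelist could look like, assuming all attributes are dropped and the contents of <script> and <style> are discarded along with the tags. ALLOWED_TAGS is a made-up list and the standard-library parser is used only for illustration; a production sanitizer would need a spec-compliant HTML5 parser.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "b", "i", "span"}    # hypothetical whitelist
DROP_WITH_CONTENT = {"script", "style"}   # payload lives in the children


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")   # attributes are never copied

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))


def sanitize(markup: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


clean = sanitize(
    '<p onclick="x()">Hi <script>alert(1)</script><b>there</b> &amp; <img src=x></p>'
)
```

Stripping attributes alone would have left the script body in place; the whitelist removes both the element and its children.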

Oct 24 2018, 8:40 PM · Core Platform Team Backlog (Watching / External), Language-Team (Language-2018-October-December), Security, CX-deployments, Language-2017-Oct-Dec, Services (watching), Parsing-Team, Language-Q1-2016-17 Sprint 6, Language-Engineering July-September 2016, Security-Team-Reviews, Security-Extensions

Oct 23 2018

cscott added a project to T207791: Can't select Simple English wikipedia from ULS in MW 1.32: MW-1.32-release.
Oct 23 2018, 8:24 PM · UniversalLanguageSelector
cscott added a comment to T207791: Can't select Simple English wikipedia from ULS in MW 1.32.

Patch at https://github.com/wikimedia/language-data/pull/37

Oct 23 2018, 8:15 PM · UniversalLanguageSelector
cscott renamed T207791: Can't select Simple English wikipedia from ULS in MW 1.32 from Can't select Simple English wikipedia from ULS to Can't select Simple English wikipedia from ULS in MW 1.32.
Oct 23 2018, 8:14 PM · UniversalLanguageSelector
cscott created T207791: Can't select Simple English wikipedia from ULS in MW 1.32.
Oct 23 2018, 8:13 PM · UniversalLanguageSelector
cscott awarded T190129: Consolidate language metadata into language-data and use it in MediaWiki core a Like token.
Oct 23 2018, 7:55 PM · TechCom-RFC, Epic, MediaWiki-Installer, I18n
cscott updated the task description for T191925: Discuss use of Finite State Transducer based formalism for language variant implementations.
Oct 23 2018, 2:42 PM · Core Platform Team Backlog (Watching / External), Services (watching), TechCom, Parsoid

Oct 22 2018

cscott committed rEMOOCdf40b22d8a27: Replace deprecated untidy OutputPage::addWikiText() method (authored by cscott).
Oct 22 2018, 7:14 PM
cscott added a comment to T196968: Re-organize the apache configuration for MediaWiki in puppet.

w00t! Now we can do https://gerrit.wikimedia.org/r/368248 (T117845) ?

Oct 22 2018, 5:54 PM · User-Joe, Patch-For-Review, Wikimedia-Apache-configuration, Operations
cscott added a comment to T191771: [REL1_30] Some parserTests fail on debian stretch using Tidy, because of a new version of libtidy.

There's a lot of backlog to read through but -- yeah, Wikimedia has *always* used their own patched version of tidy. The stock debian libtidy has never exactly matched the WMF version, although it's possible the precise differences weren't previously well-covered by parserTests. All use of libtidy is deprecated and is being removed ( https://gerrit.wikimedia.org/r/467972 ) so it's just a matter of keeping our tests of long-term-supported releases sane by maintaining access to the WMF-patched-version-of-old-tidy.

Oct 22 2018, 5:04 PM · Release-Engineering-Team (Kanban), Patch-For-Review, Quibble, Tidy, MediaWiki-Core-Tests, MediaWiki-Parser
cscott committed rEPFM0fb9ec493211: Replace deprecated untidy OutputPage::addWikiText() method (authored by cscott).
Oct 22 2018, 4:23 PM
cscott closed T207483: Release remex 2.0.1 as Resolved.
Oct 22 2018, 3:12 PM · RemexHtml
cscott assigned T207483: Release remex 2.0.1 to Legoktm.

Thanks, @Legoktm!

Oct 22 2018, 3:11 PM · RemexHtml

Oct 19 2018

Restricted Application updated subscribers of T94826: Don't crash when MediaWiki returns a page title different from the query because of normalization (Arabic and Malayalam normalization in particular).
Oct 19 2018, 7:37 PM · Pywikibot
cscott added a comment to T207433: uselang=sr-cyrl causes fatal exception of type "MWException".

OK, two patches to fix the issue: belt and suspenders. Both of them are potential candidates for cherry-picking to 1.32, but let's get them merged on 1.33 first.

Oct 19 2018, 7:21 PM · MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Wikidata, Wikimedia-production-error
cscott added a comment to T207483: Release remex 2.0.1.

I bet I technically have all the required permission bits to do this myself -- looks like @Legoktm made the last release -- but it would be helpful to have some instructions, either on-wiki or in the README, since I've never published a composer package before.

Oct 19 2018, 3:49 PM · RemexHtml
cscott created T207483: Release remex 2.0.1.
Oct 19 2018, 3:48 PM · RemexHtml
cscott added a comment to T207433: uselang=sr-cyrl causes fatal exception of type "MWException".

Ah, ok, thanks! That means I don't have to worry about this being an "unbreak now" sort of bug.

Oct 19 2018, 2:46 PM · MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Wikidata, Wikimedia-production-error
cscott added a comment to T207433: uselang=sr-cyrl causes fatal exception of type "MWException".

Arrrrgh. WikiBase/lib/includes/LanguageWithConversion.php contains code cut-and-pasted from mediawiki-core. Anyone want to guess the odds that it wasn't updated when the code from core was updated?

Oct 19 2018, 2:23 PM · MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Wikidata, Wikimedia-production-error
cscott added a comment to T207433: uselang=sr-cyrl causes fatal exception of type "MWException".

Ok, merged T207447: uselang=zh-hant-hk causes fatal exception of type "BadMethodCallException" into this one, because the stack trace is pretty much identical:

Oct 19 2018, 2:17 PM · MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Wikidata, Wikimedia-production-error
cscott merged T207447: uselang=zh-hant-hk causes fatal exception of type "BadMethodCallException" into T207433: uselang=sr-cyrl causes fatal exception of type "MWException".
Oct 19 2018, 2:09 PM · MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Wikidata, Wikimedia-production-error
cscott merged task T207447: uselang=zh-hant-hk causes fatal exception of type "BadMethodCallException" into T207433: uselang=sr-cyrl causes fatal exception of type "MWException".
Oct 19 2018, 2:09 PM · Wikidata, Wikimedia-production-error
cscott added a comment to T207433: uselang=sr-cyrl causes fatal exception of type "MWException".

Also -- sr-cyrl, zh-hans-tw, etc are not actually the mediawiki-internal names for these languages. https://gerrit.wikimedia.org/r/460039 would have added support for using the standard names, but I'm a little bit surprised that anything is generating links to these "non-mediawiki" (but BCP 47 standard) codes. I mean, it's good -- we *should* be trying to move to using the proper BCP 47 codes -- but it's worth trying to track down where these links are coming from, since they wouldn't have worked prior to 460039.

Oct 19 2018, 11:38 AM · MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Wikidata, Wikimedia-production-error
cscott added a comment to T207433: uselang=sr-cyrl causes fatal exception of type "MWException".

Given that uselang processing is involved, perhaps https://gerrit.wikimedia.org/r/460039 is the proximate cause. Can we get a stack trace for that exception?

Oct 19 2018, 11:28 AM · MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Wikidata, Wikimedia-production-error

Oct 18 2018

cscott added a comment to T206066: Wikimedia Technical Conference 2018 Session - Identifying the requirements and goals for the parser.

Re question #3 wrt JS-vs-PHP: it's important to understand the role of the social ecosystem involved. One reason why markdown is widespread (and wikitext is not used at all outside the WMF environment) is that we've never had a really good/fast standalone parser. Even our own WMF research department uses mwparserfromhell (written in Python) instead of either of the two "official" WMF parsers. (And they probably won't migrate to an official parser even once/if Parsoid is in PHP.)

Oct 18 2018, 7:43 PM · Parsing-Team, Wikimedia-Technical-Conference-2018
cscott added a comment to T206066: Wikimedia Technical Conference 2018 Session - Identifying the requirements and goals for the parser.

Re the first question, I feel the "significance" section presupposes a particular answer (ie, that WYSIWYG editors aren't appropriate for templates, and so wikitext editing should be limited). I don't agree (T114454), but regardless, here's my attempt at rephrasing the prompt in a neutral manner:

Oct 18 2018, 7:17 PM · Parsing-Team, Wikimedia-Technical-Conference-2018
cscott updated the task description for T206066: Wikimedia Technical Conference 2018 Session - Identifying the requirements and goals for the parser.
Oct 18 2018, 7:09 PM · Parsing-Team, Wikimedia-Technical-Conference-2018
cscott added a comment to T197690: Check the status of v8.js PHP extension, and assess its applicability for the needs of service-side rendering of Wikibase UI.

Any updates on this task? As I said, I'd like to write Scribunto/JS using v8js at some point. If v8js would be helpful to wikibase, perhaps we can pool efforts.

Oct 18 2018, 6:41 PM · Wikidata, Wikidata-Frontend
cscott added a comment to T204945: Deprecate one of the Preprocessor implementations for 1.33.

It's not happening in 1.32, we just branched that. Hopefully for 1.33!

Oct 18 2018, 6:40 PM · Technical-Debt (Deprecation), MediaWiki-Parser, Patch-For-Review
cscott added subtasks for T204945: Deprecate one of the Preprocessor implementations for 1.33: T176370: Migrate to PHP 7 in WMF production, T192166: Drop HHVM support from MediaWiki.
Oct 18 2018, 6:37 PM · Technical-Debt (Deprecation), MediaWiki-Parser, Patch-For-Review
cscott added a parent task for T176370: Migrate to PHP 7 in WMF production: T204945: Deprecate one of the Preprocessor implementations for 1.33.
Oct 18 2018, 6:37 PM · Core Platform Team Kanban (Doing), Core Platform Team (PHP7 (TEC4)), Patch-For-Review, TechCom-RFC (TechCom-Approved), User-ArielGlenn, HHVM, Operations
cscott added a parent task for T192166: Drop HHVM support from MediaWiki: T204945: Deprecate one of the Preprocessor implementations for 1.33.
Oct 18 2018, 6:37 PM · Core Platform Team Backlog (Watching / External), Patch-For-Review, HHVM
cscott renamed T204945: Deprecate one of the Preprocessor implementations for 1.33 from Deprecate one of the Preprocessor implementations for 1.32 to Deprecate one of the Preprocessor implementations for 1.33.
Oct 18 2018, 6:36 PM · Technical-Debt (Deprecation), MediaWiki-Parser, Patch-For-Review

Oct 17 2018

cscott committed rECKTca01b6929470: Replace deprecated untidy OutputPage::addWikiText() method (authored by cscott).
Oct 17 2018, 4:40 PM
cscott added a comment to T205972: Fixup Phan errors in SecurePoll.

I filed T207297: Phan SecurityCheck-XSS and SecurityCheck-SQLInjection errors in SecurePoll extension out of an abundance of caution, but it's probably a dup. Slightly different errors, though...

Oct 17 2018, 4:28 PM · MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), Patch-For-Review, phan-taint-check-plugin, MediaWiki-extensions-SecurePoll
cscott updated subscribers of T207297: Phan SecurityCheck-XSS and SecurityCheck-SQLInjection errors in SecurePoll extension.
Oct 17 2018, 4:26 PM · Patch-For-Review, MediaWiki-extensions-SecurePoll, Security
cscott added a project to T207297: Phan SecurityCheck-XSS and SecurityCheck-SQLInjection errors in SecurePoll extension: MediaWiki-extensions-SecurePoll.
Oct 17 2018, 4:18 PM · Patch-For-Review, MediaWiki-extensions-SecurePoll, Security
cscott created T207297: Phan SecurityCheck-XSS and SecurityCheck-SQLInjection errors in SecurePoll extension.
Oct 17 2018, 4:17 PM · Patch-For-Review, MediaWiki-extensions-SecurePoll, Security
cscott added a comment to T100841: Support for dynamically enabling new wikis.

@Osnard Could you see if my updated patchset works for you? Rather than explicitly test reverseMwApiMap again, I just unconditionally called ParsoidConfig#getPrefixFor() to set the prefix (even if it was already set). That's a little more consistent with the direction we really want to go (T206764: Remove `prefix` from Parsoid and use `domain` consistently as configuration key).

Oct 17 2018, 1:43 PM · Patch-For-Review, Parsoid
cscott updated the task description for T206066: Wikimedia Technical Conference 2018 Session - Identifying the requirements and goals for the parser.
Oct 17 2018, 2:37 AM · Parsing-Team, Wikimedia-Technical-Conference-2018
cscott added a comment to T63993: Babel language codes should be normalised to lower case when used in categories.

In https://gerrit.wikimedia.org/r/446766 I introduced BabelLanguageCodes::getCategoryCode() which maps mediawiki-internal language codes to appropriate category names. The current algorithm is to use the (lowercased) mediawiki internal code if it doesn't contain a hyphen (eg en, simple, de), otherwise use the properly-capitalized BCP 47 code (zh-Hans, etc). This matched previous expectations as canonized in the extension's PHPUnit tests. If we wanted some other behavior for category codes, it ought to be straightforward to patch getCategoryCode() for whatever is desired.
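
The algorithm described above can be sketched roughly like this. It is not the Babel extension's code: category_code and bcp47_case are hypothetical names, and the sketch only applies the conventional BCP 47 letter casing (script subtags in title case, two-letter regions in upper case), ignoring any remapping beyond letter case that the real implementation may perform.

```python
def bcp47_case(code: str) -> str:
    """Apply conventional BCP 47 casing to a hyphenated language code."""
    parts = code.split("-")
    out = [parts[0].lower()]               # primary language subtag
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            out.append(part.title())       # script, e.g. Hans
        elif len(part) == 2 and part.isalpha():
            out.append(part.upper())       # region, e.g. HK
        else:
            out.append(part.lower())       # anything else left lowercase
    return "-".join(out)


def category_code(internal_code: str) -> str:
    """Lowercase simple codes; use BCP 47 casing for hyphenated ones."""
    if "-" not in internal_code:
        return internal_code.lower()
    return bcp47_case(internal_code)


simple_result = category_code("Simple")
hans_result = category_code("zh-hans")
```

Changing the category naming policy would then be a matter of editing one small function, which is the point made in the comment.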

Oct 17 2018, 2:30 AM · MW-1.28-release (WMF-deploy-2016-08-16_(1.28.0-wmf.15)), MediaWiki-extensions-Babel
cscott added a comment to T207088: Remex double-decodes HTML entities on PHP (not HHVM).

The patch is merged, but we're going to need a new version of remex released to composer and mediawiki-core updated to require the new version before this bug is actually fixed in production uses of php 7 (ie, when running the php 7 jenkins tests).

Oct 17 2018, 2:12 AM · MW-1.31-release-notes, MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), MW-1.32-release, MW-1.31-release, PHP 7.3 support, PHP 7.0 support, Patch-For-Review, RemexHtml
cscott updated the task description for T93715: [EPIC] Make Parsoid HTML output completely deterministic.
Oct 17 2018, 2:10 AM · Parsoid

Oct 16 2018

cscott added a comment to T207168: Provide JSON-LD support for Wikidata.

We only emit the "@graph" form in purtle if the API is used to annotate more than a single entity: https://github.com/wikimedia/purtle/blob/master/src/JsonLdRdfWriter.php#L37
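
In outline, the behavior described might look like the following. This is a loose sketch, not a port of JsonLdRdfWriter: to_jsonld is a hypothetical function, entities are assumed to be plain dicts that already carry an "@id", and the example.org identifiers are placeholders.

```python
def to_jsonld(entities, context):
    """Return a single node object for one entity, or the "@graph" form
    when more than one entity is being described."""
    if len(entities) == 1:
        doc = {"@context": context}
        doc.update(entities[0])
        return doc
    return {"@context": context, "@graph": list(entities)}


context = {"name": "http://schema.org/name"}
one = to_jsonld([{"@id": "http://example.org/entity/E1", "name": "First"}], context)
many = to_jsonld(
    [
        {"@id": "http://example.org/entity/E1", "name": "First"},
        {"@id": "http://example.org/entity/E2", "name": "Second"},
    ],
    context,
)
```

Consumers that expect a top-level "@id" therefore only need to handle the "@graph" wrapper when they request several entities at once.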

Oct 16 2018, 7:52 PM · MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), Wikidata, MediaWiki-extensions-WikibaseRepository
cscott added a comment to T189966: Audit and simplify MediaWiki initialisation code (Spring 2018).
  • Special CDN integration with the Key header (formerly X-Vary-Options). X-Vary-Options was a feature I introduced to avoid splitting the CDN cache between IE and everything else, which were sending slightly different Accept-Encoding request headers (differing by a space)
Oct 16 2018, 2:23 PM · Core Platform Team Backlog (Watching / External), Technical-Debt, MediaWiki-General-or-Unknown, Performance-Team
cscott added a comment to T206940: Quote marks in "alt" text break media attribute parsing.

This turned into a little rathole, but I've come out the other end fixing (a) how Parsoid parses alt/link options (wikitext markup including <nowiki> is allowed), (b) how Parsoid renders alt/link options (consistent stripping), (c) how core renders link options (<nowiki> expansion and stripping consistent with alt), (d) how core handles ampersands in alt/link options (bug in remex), and (e) how Parsoid handles ampersands in alt/link options. Now we've just got to get those three patches merged, starting with the remex bug (T207088: Remex double-decodes HTML entities on PHP (not HHVM)) because the newly-added test cases won't pass on jenkins until remex is fixed and the fix is packaged and released so composer can get to it.

Oct 16 2018, 1:34 PM · MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), Parsoid, VisualEditor

Oct 15 2018

cscott updated subscribers of T207088: Remex double-decodes HTML entities on PHP (not HHVM).
Oct 15 2018, 8:32 PM · MW-1.31-release-notes, MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), MW-1.32-release, MW-1.31-release, PHP 7.3 support, PHP 7.0 support, Patch-For-Review, RemexHtml