cscott (C. Scott Ananian)
Parser whisperer

User Details

User Since
Oct 21 2014, 6:47 PM (212 w, 6 d)
Availability
Available
IRC Nick
cscott
LDAP User
Unknown
MediaWiki User
Cscott [ Global Accounts ]

Editor since 2005; WMF developer since 2013. I work on Parsoid and OCG, and dabble with VE, real-time collaboration, and OOjs.

On github: https://github.com/cscott

See https://en.wikipedia.org/wiki/User:cscott for more.

Recent Activity

Sat, Nov 17

ToBeFree awarded T113004: Make it easy to fork, branch, and merge pages (or more) a Piece of Eight token.
Sat, Nov 17, 12:33 AM · Community-Wishlist-Survey-2015, Contributors-Team, Wikimedia-Developer-Summit-2016

Tue, Nov 13

cscott created T209420: Parsoid should either support wgAllowExternalImages or we should deprecate it in core.
Tue, Nov 13, 9:38 PM · MediaWiki-Parser, Parsoid

Sun, Nov 11

Liuxinyu970226 awarded T149667: Amazing Article Annotations a Love token.
Sun, Nov 11, 8:14 AM · Parsing-Team, Cite, VisualEditor, ContentTranslation, MediaWiki-extensions-Translate, Wikispeech, Wikimedia-Developer-Summit (2017)
Liuxinyu970226 awarded T112984: Real Time Collaborative Editing a Like token.
Sun, Nov 11, 7:33 AM · Contributors-Team, Wikimedia-Developer-Summit-2016
Liuxinyu970226 awarded T113004: Make it easy to fork, branch, and merge pages (or more) a Like token.
Sun, Nov 11, 7:11 AM · Community-Wishlist-Survey-2015, Contributors-Team, Wikimedia-Developer-Summit-2016

Thu, Nov 8

cscott added a comment to T199332: PHP Warning: count(): Parameter must be an array or an object that implements Countable in Serializer.php.

I need to dig into PHP semantics. Is it possible you might be suppressing the notice on $parent->children because of the =& instead of plain =? (line 246)

Thu, Nov 8, 4:22 PM · Core Platform Team Kanban, RemexHtml

Wed, Nov 7

cscott added a comment to T206940: Quote marks in "alt" text break media attribute parsing.

Did you check the other issues from https://phabricator.wikimedia.org/T206940#4670526 ? Sounds like you're saying that wt2html is fixed (both PHP and Parsoid agree and do something sensible) but that html2wt for video specifically is still broken since it treats the (invisible) embedded alt attribute as plaintext rather than HTML. Or maybe html2wt is alright but we shouldn't be embedding the invisible alt as HTML but should be doing the same tag-stripping that we would do for a "real" alt attribute. (I think I'm a little partial to this latter.)
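
A minimal Python sketch of the "tag-stripping" option. It assumes that a "real" alt value is just the whitespace-collapsed text content of the HTML with every tag removed. `alt_from_html` is a hypothetical name, not a Parsoid function, and Parsoid's actual stripping rules may differ in details such as entity handling or nested media.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects character data and ignores every tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def alt_from_html(fragment: str) -> str:
    """Hypothetical helper: plain-text alt from an HTML fragment, tags stripped."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()  # flush any buffered trailing text
    # Collapse runs of whitespace (including newlines) to single spaces.
    return " ".join("".join(collector.parts).split())


print(alt_from_html("<i>alt</i> text with <b>bold</b> &amp; more"))
```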

Wed, Nov 7, 10:45 PM · MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), Parsoid, VisualEditor

Sat, Nov 3

Seb35 awarded T100841: Support for dynamically enabling new wikis a Yellow Medal token.
Sat, Nov 3, 4:00 PM · Patch-For-Review, Parsoid

Fri, Nov 2

cscott added a comment to T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments).

Copied from a discussion at https://gerrit.wikimedia.org/r/#/c/mediawiki/services/parsoid/+/467531/6/lib/wt2html/tt/LinkHandler.js@1014 wrt how [[File:Foo.jpg|{{sometemplate}}]] gets parsed:

Fri, Nov 2, 8:31 PM · Patch-For-Review, Parsing-Team, Wikimedia-Developer-Summit-2016, TechCom-RFC
cscott created T208620: Parsoid should support SVG thumbnails in page language.
Fri, Nov 2, 7:51 PM · Parsoid
cscott created T208619: Parsoid missing an empty row in table output.
Fri, Nov 2, 7:36 PM · Parsoid
cscott added a comment to T208549: HHVM CPU usage when deploying MediaWiki.
  • I didn't check the size of the bytecode cache databases to see if that exploded. If that's the case, I might want to purge it today.
Fri, Nov 2, 5:19 PM · Wikimedia-production-error, Release-Engineering-Team, Operations
cscott added a comment to T208549: HHVM CPU usage when deploying MediaWiki.

A bunch of new messages are getting tidied and going through remex now ( https://gerrit.wikimedia.org/r/#/q/topic:deprecate-wgtidy ) -- although that *shouldn't* cause problems, here are a couple of wild theories:

  1. MessageCache is missing somehow ( 4b1db1190bb8f2a115c6a81a5ee487b7d18cd303 ) and tying up CPU (but I'm pretty sure the misses would show in logs). Rolling back would cause misses on the updated messages so "not solve the issue".
  2. ParserCache is missing somehow ( 58abac2d1489cdfaaf2ffdf2f9e1214509760b31 ) but I'm pretty sure that would show up in analytics. Same thing, roll back could cause misses of the previous misses.
  3. Some system message is triggering some weird infinite-loop bug in Remex. Of course, remex has been used for tidy in all article content for months now, but I'm always ready to be surprised. If there's a specific URL which triggers the CPU hog behavior, that would be a big clue. But wouldn't explain why rolling back didn't help.
  4. It's not my fault at all. (I like this one the best, but it doesn't help you fine folks in ops any.) ;)
Fri, Nov 2, 5:18 PM · Wikimedia-production-error, Release-Engineering-Team, Operations

Tue, Oct 30

cscott added a comment to T198970: Epic: Implement SEO improvements suggested by Go Fish Digital.

SEO optimization came up on the Audiences 1 QCI presentation, and it was mentioned that one question we had was whether Google used the same ingestion pipeline for all languages / wikis, or whether there were certain things that would work differently on (say) English wikipedia -vs- Spanish wikisource.

Tue, Oct 30, 3:52 PM · SEO, Epic
cscott added a comment to T202481: Parser should have a msg() helper function so people don't localize messages improperly.

I think I could be satisfied with a good documentation comment for this method elaborating these points:
(a) state explicitly that this uses the content language, not the user interface language
(b) is intended for parser functions and tag hooks which appear in article content, not UX (special pages, warnings, etc)
(c) the result will be subject to language conversion (which is unusual for system messages, which are more usually pre-converted to a specific variant)

Tue, Oct 30, 3:40 PM · Patch-For-Review, Google-Code-in-2018, MediaWiki-Parser

Mon, Oct 29

cscott added a comment to T202481: Parser should have a msg() helper function so people don't localize messages improperly.

See also T114640: RFC: make Parser::getTargetLanguage aware of multilingual wikis, which goes into more detail about the problems one has in general when trying to make "user interface elements" from "content" markup.

Mon, Oct 29, 9:50 PM · Patch-For-Review, Google-Code-in-2018, MediaWiki-Parser
cscott added a comment to T202481: Parser should have a msg() helper function so people don't localize messages improperly.

I'm not convinced the helper should be in Parser.php. Parser functions and tag hooks are a bit of a special case, since they actually are in the content language. In 95% of the cases where system messages are used, they should be in the user interface language -- compare https://codesearch.wmflabs.org/search/?q=addWikiTextAsContent with https://codesearch.wmflabs.org/search/?q=addWikiTextAsInterface for example. Even when you do want the content language, it's probably because the thing you're adding would be better off as an interface message, for example see bacd87e4942baa34808a1b77d3b29bfdb566cc17.

Mon, Oct 29, 9:48 PM · Patch-For-Review, Google-Code-in-2018, MediaWiki-Parser

Thu, Oct 25

cscott added a comment to T207930: Moving or deleting a translatable page on mediawiki.org triggers an error message.

Verified that rETRAb2586aebd94d: Avoid untidy calls to OutputPage::addWikiText() didn't make it to the REL1_32 branch of Translate, and so no further backporting should be needed.

Thu, Oct 25, 8:59 PM · MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Language-Team (Language-2018-October-December), Wikimedia-production-error, MediaWiki Language Extension Bundle, Operations, MediaWiki-extensions-Translate

Wed, Oct 24

cscott added a comment to T144467: Security review for Google MT for Content Translation.

I think you still need to sanitize -- for example, the <script> tag has its payload delivered by the node children, not the attributes. Potentially <style> is dangerous in the same way, if there is a CSS vulnerability. But again, a tag whitelist should be sufficient for this.
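
An attribute-only check misses <script> because the dangerous payload is child text. Below is a minimal Python sketch of a tag allowlist built on the standard library html.parser. It drops disallowed tags, discards the children of <script> and <style> entirely, and re-escapes all surviving text. The allowlist is purely illustrative and is not the one Content Translation uses. A production sanitizer would also need an attribute allowlist, whereas this sketch simply drops all attributes.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative allowlist only; not the list Content Translation actually uses.
ALLOWED_TAGS = {"p", "b", "i", "span"}
# Elements whose payload lives in their children, so the children are dropped too.
DROP_WITH_CONTENT = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"<{tag}>")  # all attributes dropped for brevity

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text outside <script>/<style> is kept (even inside disallowed
        # tags), but re-escaped so it cannot become markup again.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(fragment: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="x()">Hi <script>alert(1)</script><b>there</b></p>'))
```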

Wed, Oct 24, 8:40 PM · Language-Team (Language-2018-October-December), Security, CX-deployments, Language-2017-Oct-Dec, Services (watching), Parsing-Team, Language-Q1-2016-17 Sprint 6, Language-Engineering July-September 2016, Security-Reviews, Security-Extensions

Tue, Oct 23

cscott added a project to T207791: Can't select Simple English wikipedia from ULS in MW 1.32: MW-1.32-release.
Tue, Oct 23, 8:24 PM · UniversalLanguageSelector
cscott added a comment to T207791: Can't select Simple English wikipedia from ULS in MW 1.32.

Patch at https://github.com/wikimedia/language-data/pull/37

Tue, Oct 23, 8:15 PM · UniversalLanguageSelector
cscott renamed T207791: Can't select Simple English wikipedia from ULS in MW 1.32 from Can't select Simple English wikipedia from ULS to Can't select Simple English wikipedia from ULS in MW 1.32.
Tue, Oct 23, 8:14 PM · UniversalLanguageSelector
cscott created T207791: Can't select Simple English wikipedia from ULS in MW 1.32.
Tue, Oct 23, 8:13 PM · UniversalLanguageSelector
cscott awarded T190129: Consolidate language metadata into language-data and use it in MediaWiki core a Like token.
Tue, Oct 23, 7:55 PM · Epic, MediaWiki-Installer, I18n
cscott updated the task description for T191925: Discuss use of Finite State Transducer based formalism for language variant implementations.
Tue, Oct 23, 2:42 PM · Services (watching), TechCom, Parsoid

Mon, Oct 22

cscott committed rEMOOCdf40b22d8a27: Replace deprecated untidy OutputPage::addWikiText() method (authored by cscott).
Replace deprecated untidy OutputPage::addWikiText() method
Mon, Oct 22, 7:14 PM
cscott added a comment to T196968: Re-organize the apache configuration for MediaWiki in puppet.

w00t! Now we can do https://gerrit.wikimedia.org/r/368248 (T117845) ?

Mon, Oct 22, 5:54 PM · User-Joe, Patch-For-Review, Wikimedia-Apache-configuration, Operations
cscott added a comment to T191771: [REL1_30] Some parserTests fail on debian stretch using Tidy, because of a new version of libtidy.

There's a lot of backlog to read through but -- yeah, Wikimedia has *always* used their own patched version of tidy. The stock debian libtidy has never exactly matched the WMF version, although it's possible the precise differences weren't previously well-covered by parserTests. All use of libtidy is deprecated and is being removed ( https://gerrit.wikimedia.org/r/467972 ) so it's just a matter of keeping our tests of long-term-supported releases sane by maintaining access to the WMF-patched-version-of-old-tidy.

Mon, Oct 22, 5:04 PM · Release-Engineering-Team (Kanban), Patch-For-Review, Quibble, Tidy, MediaWiki-Core-Tests, MediaWiki-Parser
cscott committed rEPFM0fb9ec493211: Replace deprecated untidy OutputPage::addWikiText() method (authored by cscott).
Replace deprecated untidy OutputPage::addWikiText() method
Mon, Oct 22, 4:23 PM
cscott closed T207483: Release remex 2.0.1 as Resolved.
Mon, Oct 22, 3:12 PM · RemexHtml
cscott assigned T207483: Release remex 2.0.1 to Legoktm.

Thanks, @Legoktm!

Mon, Oct 22, 3:11 PM · RemexHtml

Oct 19 2018

Restricted Application updated subscribers of T94826: Don't crash when MediaWiki returns a page title different from the query because of normalization (Arabic and Malayalam normalization in particular).
Oct 19 2018, 7:37 PM · Pywikibot
cscott added a comment to T207433: uselang=sr-cyrl causes fatal exception of type "MWException".

OK, two patches to fix the issue: belt and suspenders. Both of them are potential candidates for cherry-picking to 1.32, but let's get them merged on 1.33 first.

Oct 19 2018, 7:21 PM · MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Wikidata, Wikimedia-production-error
cscott added a comment to T207483: Release remex 2.0.1.

I bet I technically have all the required permission bits to do this myself -- looks like @Legoktm made the last release -- but it would be helpful to have some instructions, either on-wiki or in the README, since I've never published a composer package before.

Oct 19 2018, 3:49 PM · RemexHtml
cscott created T207483: Release remex 2.0.1.
Oct 19 2018, 3:48 PM · RemexHtml
cscott added a comment to T207433: uselang=sr-cyrl causes fatal exception of type "MWException".

Ah, ok, thanks! That means I don't have to worry about this being an "unbreak now" sort of bug.

Oct 19 2018, 2:46 PM · MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Wikidata, Wikimedia-production-error
cscott added a comment to T207433: uselang=sr-cyrl causes fatal exception of type "MWException".

Arrrrgh. Wikibase/lib/includes/LanguageWithConversion.php contains code cut-and-pasted from mediawiki-core. Anyone want to guess the odds that it wasn't updated when the code from core was updated?

Oct 19 2018, 2:23 PM · MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Wikidata, Wikimedia-production-error
cscott added a comment to T207433: uselang=sr-cyrl causes fatal exception of type "MWException".

Ok, merged T207447: uselang=zh-hant-hk causes fatal exception of type "BadMethodCallException" into this one, because the stack trace is pretty much identical:

Oct 19 2018, 2:17 PM · MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Wikidata, Wikimedia-production-error
cscott merged T207447: uselang=zh-hant-hk causes fatal exception of type "BadMethodCallException" into T207433: uselang=sr-cyrl causes fatal exception of type "MWException".
Oct 19 2018, 2:09 PM · MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Wikidata, Wikimedia-production-error
cscott merged task T207447: uselang=zh-hant-hk causes fatal exception of type "BadMethodCallException" into T207433: uselang=sr-cyrl causes fatal exception of type "MWException".
Oct 19 2018, 2:09 PM · Wikidata, Wikimedia-production-error
cscott added a comment to T207433: uselang=sr-cyrl causes fatal exception of type "MWException".

Also -- sr-cyrl, zh-hans-tw, etc., are not actually the mediawiki-internal names for these languages. https://gerrit.wikimedia.org/r/460039 would have added support for using the standard names, but I'm a little bit surprised that anything is generating links to these "non-mediawiki" (but BCP 47 standard) codes. I mean, it's good -- we *should* be trying to move to using the proper BCP 47 codes -- but it's worth trying to track down where these links are coming from, since they wouldn't have worked prior to 460039.
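
A rough sketch of the normalization step. It assumes a reverse lookup from BCP 47 tags to MediaWiki-internal codes. Only a tiny illustrative subset of the mapping is shown, since the authoritative table lives in MediaWiki core, and `to_internal_code` is a hypothetical name. BCP 47 tags are case-insensitive, so the lookup is keyed on lowercased tags, and both sr-cyrl and sr-Cyrl resolve to sr-ec.

```python
# Assumed, illustrative subset of MediaWiki-internal -> BCP 47 mappings;
# the authoritative table lives in MediaWiki core, not here.
INTERNAL_TO_BCP47 = {
    "sr-ec": "sr-Cyrl",
    "sr-el": "sr-Latn",
    "zh-tw": "zh-Hant-TW",
    "zh-hk": "zh-Hant-HK",
}

# BCP 47 tags are case-insensitive, so key the reverse map on lowercased tags.
BCP47_TO_INTERNAL = {bcp.lower(): internal
                     for internal, bcp in INTERNAL_TO_BCP47.items()}


def to_internal_code(code: str) -> str:
    """Map a uselang value (internal or BCP 47) to a MediaWiki-internal code."""
    lowered = code.lower()
    return BCP47_TO_INTERNAL.get(lowered, lowered)


print(to_internal_code("sr-cyrl"), to_internal_code("zh-Hant-HK"))
```

Codes that are not in the table pass through unchanged apart from lowercasing.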

Oct 19 2018, 11:38 AM · MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Wikidata, Wikimedia-production-error
cscott added a comment to T207433: uselang=sr-cyrl causes fatal exception of type "MWException".

Given that uselang processing is involved, perhaps https://gerrit.wikimedia.org/r/460039 is the proximate cause. Can we get a stack trace for that exception?

Oct 19 2018, 11:28 AM · MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Wikidata, Wikimedia-production-error

Oct 18 2018

cscott added a comment to T206066: Wikimedia Technical Conference 2018 Session - Identifying the requirements and goals for the parser.

Re question #3 wrt JS-vs-PHP: it's important to understand the role of the social ecosystem involved. One reason why markdown is widespread (and wikitext is not used at all outside the WMF environment) is that we've never had a really good/fast standalone parser. Even our own WMF research department uses mwparserfromhell (written in Python) instead of either of the two "official" WMF parsers. (And they probably won't migrate to an official parser even once/if Parsoid is in PHP.)

Oct 18 2018, 7:43 PM · Parsing-Team, Wikimedia-Technical-Conference-2018
cscott added a comment to T206066: Wikimedia Technical Conference 2018 Session - Identifying the requirements and goals for the parser.

Re the first question, I feel the "significance" section presupposes a particular answer (i.e., that WYSIWYG editors aren't appropriate for templates, and so wikitext editing should be limited). I don't agree (T114454), but regardless, here's my attempt at rephrasing the prompt in a neutral manner:

Oct 18 2018, 7:17 PM · Parsing-Team, Wikimedia-Technical-Conference-2018
cscott updated the task description for T206066: Wikimedia Technical Conference 2018 Session - Identifying the requirements and goals for the parser.
Oct 18 2018, 7:09 PM · Parsing-Team, Wikimedia-Technical-Conference-2018
cscott added a comment to T197690: Check the status of v8.js PHP extension, and assess its applicability for the needs of service-side rendering of Wikibase UI.

Any updates on this task? As I said, I'd like to write Scribunto/JS using v8js at some point. If v8js would be helpful to wikibase, perhaps we can pool efforts.

Oct 18 2018, 6:41 PM · Wikidata, Wikidata-Frontend
cscott added a comment to T204945: Deprecate one of the Preprocessor implementations for 1.33.

It's not happening in 1.32, we just branched that. Hopefully for 1.33!

Oct 18 2018, 6:40 PM · Technical-Debt (Deprecation), MediaWiki-Parser, Patch-For-Review
cscott added subtasks for T204945: Deprecate one of the Preprocessor implementations for 1.33: T176370: Migrate to PHP 7 in WMF production, T192166: Drop HHVM support from MediaWiki.
Oct 18 2018, 6:37 PM · Technical-Debt (Deprecation), MediaWiki-Parser, Patch-For-Review
cscott added a parent task for T176370: Migrate to PHP 7 in WMF production: T204945: Deprecate one of the Preprocessor implementations for 1.33.
Oct 18 2018, 6:37 PM · Patch-For-Review, Core Platform Team Backlog (Watching / External), TechCom-RFC (TechCom-Approved), User-ArielGlenn, HHVM, Operations
cscott added a parent task for T192166: Drop HHVM support from MediaWiki: T204945: Deprecate one of the Preprocessor implementations for 1.33.
Oct 18 2018, 6:37 PM · Core Platform Team Backlog (Watching / External), Patch-For-Review, HHVM
cscott renamed T204945: Deprecate one of the Preprocessor implementations for 1.33 from Deprecate one of the Preprocessor implementations for 1.32 to Deprecate one of the Preprocessor implementations for 1.33.
Oct 18 2018, 6:36 PM · Technical-Debt (Deprecation), MediaWiki-Parser, Patch-For-Review

Oct 17 2018

cscott committed rECKTca01b6929470: Replace deprecated untidy OutputPage::addWikiText() method (authored by cscott).
Replace deprecated untidy OutputPage::addWikiText() method
Oct 17 2018, 4:40 PM
cscott added a comment to T205972: Fixup Phan errors in SecurePoll.

I filed T207297: Phan SecurityCheck-XSS and SecurityCheck-SQLInjection errors in SecurePoll extension out of an abundance of caution, but it's probably a dup. Slightly different errors, though...

Oct 17 2018, 4:28 PM · MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), Patch-For-Review, phan-taint-check-plugin, MediaWiki-extensions-SecurePoll
cscott updated subscribers of T207297: Phan SecurityCheck-XSS and SecurityCheck-SQLInjection errors in SecurePoll extension.
Oct 17 2018, 4:26 PM · Patch-For-Review, MediaWiki-extensions-SecurePoll, Security
cscott added a project to T207297: Phan SecurityCheck-XSS and SecurityCheck-SQLInjection errors in SecurePoll extension: MediaWiki-extensions-SecurePoll.
Oct 17 2018, 4:18 PM · Patch-For-Review, MediaWiki-extensions-SecurePoll, Security
cscott created T207297: Phan SecurityCheck-XSS and SecurityCheck-SQLInjection errors in SecurePoll extension.
Oct 17 2018, 4:17 PM · Patch-For-Review, MediaWiki-extensions-SecurePoll, Security
cscott added a comment to T100841: Support for dynamically enabling new wikis.

@Osnard Could you see if my updated patchset works for you? Rather than explicitly test reverseMwApiMap again, I just unconditionally called ParsoidConfig#getPrefixFor() to set the prefix (even if it was already set). That's a little more consistent with the direction we really want to go (T206764: Remove `prefix` from Parsoid and use `domain` consistently as configuration key).

Oct 17 2018, 1:43 PM · Patch-For-Review, Parsoid
cscott updated the task description for T206066: Wikimedia Technical Conference 2018 Session - Identifying the requirements and goals for the parser.
Oct 17 2018, 2:37 AM · Parsing-Team, Wikimedia-Technical-Conference-2018
cscott added a comment to T63993: Babel language codes should be normalised to lower case when used in categories.

In https://gerrit.wikimedia.org/r/446766 I introduced BabelLanguageCodes::getCategoryCode(), which maps mediawiki-internal language codes to appropriate category names. The current algorithm is to use the (lowercased) mediawiki internal code if it doesn't contain a hyphen (e.g. en, simple, de), otherwise use the properly-capitalized BCP 47 code (zh-Hans, etc.). This matched previous expectations as codified in the extension's PHPUnit tests. If we wanted some other behavior for category codes, it ought to be straightforward to patch getCategoryCode() accordingly.
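
This Python sketch illustrates the rule described above. It is not the PHP from the patch. A hyphen-free internal code is simply lowercased. A hyphenated code is mapped to BCP 47 and given the conventional BCP 47 casing: lowercase language, title-case four-letter script, uppercase two-letter region. The mapping table and the `bcp47_case` helper are assumptions for illustration. The casing helper just lowercases everything after a singleton such as x-, skipping the finer points of extension and private-use subtags.

```python
# Assumed, illustrative subset of internal -> BCP 47 mappings.
INTERNAL_TO_BCP47 = {"sr-ec": "sr-Cyrl", "sr-el": "sr-Latn"}


def bcp47_case(tag: str) -> str:
    """Apply conventional BCP 47 casing (RFC 5646 conventions), simplified."""
    parts = tag.split("-")
    out = [parts[0].lower()]  # primary language subtag: lowercase
    after_singleton = False
    for sub in parts[1:]:
        if after_singleton or len(sub) == 1:
            after_singleton = True  # extension/private-use: just lowercase
            out.append(sub.lower())
        elif len(sub) == 4 and sub.isalpha():
            out.append(sub.title())  # script, e.g. Hans
        elif len(sub) == 2 and sub.isalpha():
            out.append(sub.upper())  # region, e.g. TW
        else:
            out.append(sub.lower())  # extlang, numeric region, variants
    return "-".join(out)


def get_category_code(internal: str) -> str:
    """Sketch of the category-code rule described above."""
    if "-" not in internal:
        return internal.lower()  # e.g. en, simple, de
    bcp47 = INTERNAL_TO_BCP47.get(internal.lower(), internal)
    return bcp47_case(bcp47)


print(get_category_code("zh-hans"), get_category_code("sr-ec"))
```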

Oct 17 2018, 2:30 AM · MW-1.28-release (WMF-deploy-2016-08-16_(1.28.0-wmf.15)), MediaWiki-extensions-Babel
cscott added a comment to T207088: Remex double-decodes HTML entities on PHP (not HHVM).

The patch is merged, but we're going to need a new version of remex released to composer and mediawiki-core updated to require the new version before this bug is actually fixed in production uses of PHP 7 (i.e., when running the PHP 7 Jenkins tests).

Oct 17 2018, 2:12 AM · MW-1.31-release-notes, MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), MW-1.32-release, MW-1.31-release, PHP 7.3 support, PHP 7.0 support, Patch-For-Review, RemexHtml
cscott updated the task description for T93715: [EPIC] Make Parsoid HTML output completely deterministic.
Oct 17 2018, 2:10 AM · Parsoid

Oct 16 2018

cscott added a comment to T207168: Provide JSON-LD support for Wikidata.

We only emit the "@graph" form in purtle if the API is used to annotate more than a single entity: https://github.com/wikimedia/purtle/blob/master/src/JsonLdRdfWriter.php#L37
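
A minimal sketch of that branching. It assumes the writer has already collected its top-level nodes, and it emits a single node directly while wrapping several in "@graph". This is not purtle's actual code, and `to_jsonld` is a hypothetical name. Purtle's real criterion for switching between the two forms may be more involved.

```python
import json

# The wd: prefix, as used in the Q100 example elsewhere on this page.
CONTEXT = {"wd": "http://www.wikidata.org/entity/"}


def to_jsonld(nodes: list[dict], context: dict) -> dict:
    """Emit one top-level node directly; wrap several in "@graph"."""
    if len(nodes) == 1:
        return {"@context": context, **nodes[0]}
    return {"@context": context, "@graph": nodes}


print(json.dumps(to_jsonld([{"@id": "wd:Q100"}], CONTEXT)))
```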

Oct 16 2018, 7:52 PM · MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), Wikidata, MediaWiki-extensions-WikibaseRepository
cscott added a comment to T189966: Audit and simplify MediaWiki initialisation code (Spring 2018).
  • Special CDN integration with the Key header (formerly X-Vary-Options). X-Vary-Options was a feature I introduced to avoid splitting the CDN cache between IE and everything else, which were sending slightly different Accept-Encoding request headers (differing by a space)
Oct 16 2018, 2:23 PM · MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), Patch-For-Review, Core Platform Team Backlog (Watching / External), Technical-Debt, MediaWiki-General-or-Unknown, Performance-Team
cscott added a comment to T206940: Quote marks in "alt" text break media attribute parsing.

This turned into a little rathole, but I've come out the other end fixing (a) how Parsoid parses alt/link options (wikitext markup including <nowiki> is allowed), (b) how Parsoid renders alt/link options (consistent stripping), (c) how core renders link options (<nowiki> expansion and stripping consistent with alt), (d) how core handles ampersands in alt/link options (bug in remex), and (e) how Parsoid handles ampersands in alt/link options. Now we've just got to get those three patches merged, starting with the remex bug (T207088: Remex double-decodes HTML entities on PHP (not HHVM)) because the newly-added test cases won't pass on jenkins until remex is fixed and the fix is packaged and released so composer can get to it.
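
The ampersand failure mode named in T207088 can be reproduced with Python's html.unescape, though this example does not use Remex itself. An author who writes &amp;lt; wants the literal text "&lt;". One decoding pass produces that, and a second pass turns it into "<".

```python
from html import unescape

# The author literally wants the text "a &lt; b", so the source escapes the ampersand.
source = "a &amp;lt; b"

decoded_once = unescape(source)  # correct: 'a &lt; b'
decoded_twice = unescape(decoded_once)  # double-decoding bug: 'a < b'

print(decoded_once)
print(decoded_twice)
```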

Oct 16 2018, 1:34 PM · MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), Parsoid, VisualEditor

Oct 15 2018

cscott updated subscribers of T207088: Remex double-decodes HTML entities on PHP (not HHVM).
Oct 15 2018, 8:32 PM · MW-1.31-release-notes, MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), MW-1.32-release, MW-1.31-release, PHP 7.3 support, PHP 7.0 support, Patch-For-Review, RemexHtml
cscott created T207088: Remex double-decodes HTML entities on PHP (not HHVM).
Oct 15 2018, 8:15 PM · MW-1.31-release-notes, MW-1.32-notes, MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), MW-1.32-release, MW-1.31-release, PHP 7.3 support, PHP 7.0 support, Patch-For-Review, RemexHtml
cscott added a comment to T206940: Quote marks in "alt" text break media attribute parsing.

Turns out there's a bug in how core PHP parses [[File:Foo.jpg|link=Foo''s bar''s]] (which is a valid title) or [[File:Foo.jpg|link=''Main Page'']] (where the italics apparently should be stripped). So now I've got a patch for core as well as one for Parsoid...

Oct 15 2018, 5:50 PM · MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), Parsoid, VisualEditor
cscott added a comment to T206940: Quote marks in "alt" text break media attribute parsing.

Well, a combination of things. It's triggered by the quote marks, but then we're not handling the <nowiki> in the serialized version either. [[File:Foo.jpg|alt=<nowiki>''alt''</nowiki>]] ought to be the "correct" way to get embedded single quotes into the alt value. It looks like we might have a similar issue with the link option as well. I've got a working patch for alt, working on understanding the link issue.
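
Here is a minimal sketch of the html2wt side. It assumes the serializer wraps an alt value in <nowiki> whenever the value contains sequences that wikitext would otherwise interpret. The trigger list is a hypothetical, non-exhaustive assumption; for instance, it does not handle a literal "</nowiki>" inside the value. `serialize_alt` and `serialize_file_link` are illustrative names, not Parsoid functions.

```python
# Hypothetical, non-exhaustive list of sequences that would be parsed as markup.
WIKITEXT_TRIGGERS = ("''", "[[", "]]", "{{", "}}")


def serialize_alt(alt: str) -> str:
    """Serialize an alt option, protecting markup-like text with <nowiki>."""
    if any(trigger in alt for trigger in WIKITEXT_TRIGGERS):
        alt = f"<nowiki>{alt}</nowiki>"
    return f"alt={alt}"


def serialize_file_link(target: str, alt: str) -> str:
    return f"[[File:{target}|{serialize_alt(alt)}]]"


print(serialize_file_link("Foo.jpg", "''alt''"))
```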

Oct 15 2018, 5:26 PM · MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), Parsoid, VisualEditor
cscott added a comment to T206940: Quote marks in "alt" text break media attribute parsing.

I'm working on a patch.

Oct 15 2018, 1:59 PM · MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), Parsoid, VisualEditor
cscott created T207032: Lint away 'pxpx'.
Oct 15 2018, 1:53 PM · Parsoid-Linter
cscott added a comment to T206940: Quote marks in "alt" text break media attribute parsing.

Hm, line breaks appear to be a red herring, see: https://en.wikipedia.org/w/index.php?title=User:Cscott/T206940&oldid=864158059

Oct 15 2018, 1:29 PM · MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), Parsoid, VisualEditor
cscott added a comment to T206940: Quote marks in "alt" text break media attribute parsing.

Hey, look at this:

$ (echo "[[File:Foo.jpg|caption|alt=''"; echo "alt'']]") | php maintenance/parse.php 
<p><a href="/~cananian/mediawiki/index.php/File:Foo.jpg" class="image" title="alt= alt"><img alt="alt= alt" src="/~cananian/mediawiki/images/0/06/Foo.jpg" width="400" height="267" /></a>
</p>

PHP doesn't handle this either. That is, it *is* an invalid alt, and *should* be treated as a caption AFAICT. However, it shouldn't break round-tripping...

Oct 15 2018, 1:19 PM · MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), Parsoid, VisualEditor
cscott renamed T206940: Quote marks in "alt" text break media attribute parsing from Media "alt" text wrongly interpreted as caption to Newlines in "alt" text between double-quotes breaks media attribute parsing.
Oct 15 2018, 1:11 PM · MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), Parsoid, VisualEditor
cscott added a comment to T206940: Quote marks in "alt" text break media attribute parsing.

And in particular it seems to be the newline between the double-quotes in the caption which is causing the alt attribute to fail.

$ (echo "[[File:Foo.jpg|caption|alt=''alt'']]") | bin/parse.js --normalize=parsoid
Oct 15 2018, 1:07 PM · MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), Parsoid, VisualEditor
cscott added a comment to T206940: Quote marks in "alt" text break media attribute parsing.

There's a misfeature in the PHP parser where "anything which doesn't properly parse as an option" is assumed to be a caption. Parsoid mimics this behavior, for better or for worse. (Predictably, see a new syntax proposal here.)
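
The following simplified Python sketch shows that fallback. Recognized options are consumed, and anything else becomes the caption. The sketch assumes the last unrecognized piece wins. The keyword and prefix lists are a small hypothetical subset, since the real parsers use localized magic words. The naive split on "|" also ignores pipes nested inside links or templates, which real parsing has to handle.

```python
import re

# Small hypothetical subset; the real parsers use localized magic words.
KEYWORDS = {"thumb", "thumbnail", "frame", "frameless", "border",
            "left", "right", "center", "none"}
PREFIXED = ("alt=", "link=", "upright=", "page=")
SIZE_RE = re.compile(r"(?:\d+|\d*x\d+)px")  # 200px, 200x100px, x100px


def parse_media_options(link_text: str) -> dict:
    """Parse 'File:Foo.jpg|opt|opt|...' (naive split on '|')."""
    target, *parts = link_text.split("|")
    result = {"target": target, "options": {}, "caption": None}
    for part in parts:
        if part in KEYWORDS:
            result["options"][part] = True
        elif SIZE_RE.fullmatch(part):
            result["options"]["size"] = part
        else:
            for prefix in PREFIXED:
                if part.startswith(prefix):
                    result["options"][prefix[:-1]] = part[len(prefix):]
                    break
            else:
                # Not a recognized option: treat it as the caption.
                # This sketch assumes the last such piece wins.
                result["caption"] = part
    return result


print(parse_media_options("File:Foo.jpg|thumb|200px|alt=Bar|A caption"))
```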

Oct 15 2018, 1:02 PM · MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), Parsoid, VisualEditor
cscott added a comment to T103624: Semantic media roles.

I've been using T90914 as the main task for this problem.

Oct 15 2018, 12:59 PM · Commons, Parsing-Team, Multimedia, MediaWiki-File-management

Oct 11 2018

cscott added a comment to T122711: Appending ".json" to an entity url should work (Feature Request).

...and adding .jsonld at the end should also work, right?

Oct 11 2018, 7:02 PM · goodfirstbug, MediaWiki-extensions-WikibaseRepository, Wikidata
cscott added a comment to T198946: Add Schema property 'sameAs' pointing to Wikidata entries.

I also worked on proper JSON-LD statements in T44063: [Epic] Provide a plain linked data interface for accessing entities and T164655: Store and serve annotations in W3C standard format. FWIW, with https://gerrit.wikimedia.org/r/384050 you get the following JSON-LD for Q100:

{
    "@graph": [
        {
            "@id": "wdata:Q100",
            "@type": "schema:Dataset",
            "about": "wd:Q100",
            "license": "http://creativecommons.org/publicdomain/zero/1.0/",
            "softwareVersion": "0.1.0",
            "version": 2799,
            "dateModified": "2018-06-21T00:08:11Z",
            "statements": 85,
            "identifiers": 0,
            "sitelinks": 184
        },
        {
            "@id": "wd:Q100",
            "@type": "wikibase:Item"
        },
        {
            "@id": "https://sv.wikivoyage.org/wiki/Boston",
            "@type": "schema:Article",
            "about": "wd:Q100",
            "inLanguage": "sv",
            "isPartOf": "https://sv.wikivoyage.org/",
            "name": {
                "@language": "sv",
                "@value": "Boston"
            }
        },
  ...etc...
}
Oct 11 2018, 3:39 PM · Performance-Team (Radar), MW-1.33-notes (1.33.0-wmf.1; 2018-10-23), Wikidata-Campsite, Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), Wikidata, MediaWiki-extensions-WikibaseClient, SEO
cscott added a comment to T206764: Remove `prefix` from Parsoid and use `domain` consistently as configuration key.

We should really have a 'Parsoid-Code-Debt' tag?

Oct 11 2018, 1:59 PM · Parsoid
cscott created T206764: Remove `prefix` from Parsoid and use `domain` consistently as configuration key.
Oct 11 2018, 1:59 PM · Parsoid
cscott updated the task description for T206574: Replace `addWikiText( $this->msg(....)->text() )`.
Oct 11 2018, 12:33 PM · MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), Patch-For-Review, MediaWiki-Parser
cscott closed T206738: Wikibase appears to be failing phan on all builds now as Resolved.

Well, I rebased my two patches on top of @Legoktm's patch and the builds are passing now, so I'll call that fixed.

Oct 11 2018, 12:31 PM · MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Patch-For-Review, Wikidata
cscott added a comment to T206738: Wikibase appears to be failing phan on all builds now.

Ah.

Oct 11 2018, 5:56 AM · MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Patch-For-Review, Wikidata
cscott created T206738: Wikibase appears to be failing phan on all builds now.
Oct 11 2018, 5:30 AM · MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Patch-For-Review, Wikidata

Oct 9 2018

cscott created T206574: Replace `addWikiText( $this->msg(....)->text() )`.
Oct 9 2018, 8:33 PM · MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), Patch-For-Review, MediaWiki-Parser
cscott added a comment to T100841: Support for dynamically enabling new wikis.

Thanks for the patch, looks good! And I'm fine with amending the original patch. I made a few minor suggestions. Thanks for testing this out in practice!

Oct 9 2018, 5:02 PM · Patch-For-Review, Parsoid

Oct 4 2018

cscott created T206243: Core tests aren't run with extensions installed.
Oct 4 2018, 5:21 PM · Continuous-Integration-Infrastructure (shipyard)
cscott added a comment to T197469: Migrate gated extensions jobs to Quibble/Docker.

This bug is referenced from https://github.com/wikimedia/integration-config/blob/41e1e9a25d416e20ef4d236d5bff68c1e815538a/jjb/mediawiki.yaml#L265 :

# We do not run mediawiki/core tests with extensions installed
# https://phabricator.wikimedia.org/T197469#4293142

...but why?

Oct 4 2018, 5:16 PM · MW-1.32-notes (WMF-deploy-2018-06-12 (1.32.0-wmf.8)), Patch-For-Review, Epic, Release-Engineering-Team (Kanban), releng-201718-q3, Continuous-Integration-Infrastructure (shipyard)
cscott added a comment to T205834: Parts of text where language variant conversion is disabled are missing in VE.

It is a bit interesting that they are invisible, though. I would expect this issue to just make the variant annotations non-editable. Apparently Parsoid represents these nodes as empty <span> tags: https://sr.wikipedia.org/api/rest_v1/page/html/Papirus_1

<span typeof="mw:LanguageVariant" data-mw-variant="{&quot;disabled&quot;:{&quot;t&quot;:&quot;I&quot;}}" id="mwHQ"></span>
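
The Python sketch below shows where the text went. It decodes the data-mw-variant attribute from the span above using the standard library html.parser, which unescapes the &quot; entities in attribute values. The variant payload, whose "t" value we read as the protected text, turns up inside the JSON attribute. The span itself contains no text. `VariantSpanInspector` is a hypothetical helper.

```python
import json
from html.parser import HTMLParser

SPAN = ('<span typeof="mw:LanguageVariant" '
        'data-mw-variant="{&quot;disabled&quot;:{&quot;t&quot;:&quot;I&quot;}}" '
        'id="mwHQ"></span>')


class VariantSpanInspector(HTMLParser):
    """Hypothetical helper: collects variant payloads and any text content."""

    def __init__(self) -> None:
        super().__init__()
        self.variants: list[dict] = []
        self.text: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "mw:LanguageVariant" in (attr_map.get("typeof") or "").split():
            # html.parser has already turned &quot; back into '"' here.
            self.variants.append(json.loads(attr_map["data-mw-variant"]))

    def handle_data(self, data):
        self.text.append(data)


inspector = VariantSpanInspector()
inspector.feed(SPAN)
inspector.close()
print(inspector.variants)
print(repr("".join(inspector.text)))
```
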
Oct 4 2018, 3:54 PM · Editing QA, MW-1.32-notes (WMF-deploy-2018-09-25 (1.32.0-wmf.23)), VisualEditor (Current work), VisualEditor-ContentLanguage
cscott updated the task description for T93715: [EPIC] Make Parsoid HTML output completely deterministic.
Oct 4 2018, 2:51 PM · Parsoid
cscott updated the task description for T93715: [EPIC] Make Parsoid HTML output completely deterministic.
Oct 4 2018, 2:48 PM · Parsoid
cscott renamed T93715: [EPIC] Make Parsoid HTML output completely deterministic from Make HTML output as deterministic / stable as possible to [EPIC] Make Parsoid HTML output completely deterministic.
Oct 4 2018, 2:44 PM · Parsoid
cscott renamed T206222: Make "about" attribute IDs deterministic from Make about IDs deterministic to Make "about" attribute IDs deterministic.
Oct 4 2018, 2:10 PM · Parsoid
cscott triaged T206222: Make "about" attribute IDs deterministic as Normal priority.
Oct 4 2018, 2:10 PM · Parsoid
cscott added a comment to T93715: [EPIC] Make Parsoid HTML output completely deterministic.

This was brought up again as a desideratum due to caching/storage concerns. Since we don't actually provide data-parsoid to VE, we currently need to *guarantee* persistent storage of a matched set of Parsoid HTML/data-parsoid for that HTML for the entire duration of an editing session, to be certain that the html2wt phase can get back the appropriate data-parsoid.

Oct 4 2018, 2:07 PM · Parsoid
cscott closed T187848: Fix token transformer return types as Resolved.

I think I've fixed this! October's a little later than July but...

Oct 4 2018, 1:58 PM · Patch-For-Review, Performance, Technical-Debt, Parsoid

Oct 3 2018

cscott added a comment to T100841: Support for dynamically enabling new wikis.

@Krenair SIGKILL is more reliable and would cost about the same as SIGHUP. Adding SIGHUP support would probably involve a lot of tedious tracking down of various bits of cached data. Besides, much of this configuration management code is going away with the upcoming Parsoid/PHP port & integration with core.

Oct 3 2018, 3:30 PM · Patch-For-Review, Parsoid

Oct 2 2018

cscott merged task T206038: Use proper ES6 classes into T204622: Use native Javascript (ES6) classes instead of prototype-based definition pattern in the Parsoid codebase.
Oct 2 2018, 9:09 PM · Patch-For-Review, Parsoid
cscott merged T206038: Use proper ES6 classes into T204622: Use native Javascript (ES6) classes instead of prototype-based definition pattern in the Parsoid codebase.
Oct 2 2018, 9:09 PM · Patch-For-Review, Parsoid-PHP
cscott renamed T204622: Use native Javascript (ES6) classes instead of prototype-based definition pattern in the Parsoid codebase from Use native Javascript classes instead of prototype-based definition pattern in the Parsoid codebase to Use native Javascript (ES6) classes instead of prototype-based definition pattern in the Parsoid codebase.
Oct 2 2018, 9:08 PM · Patch-For-Review, Parsoid-PHP