
cscott (C. Scott Ananian)
Parser whisperer

User Details

User Since
Oct 21 2014, 6:47 PM (256 w, 6 d)
Availability
Available
IRC Nick
cscott
LDAP User
Unknown
MediaWiki User
Cscott

Editor since 2005; WMF developer since 2013. I work on Parsoid and OCG, and dabble with VE, real-time collaboration, and OOjs.

On github: https://github.com/cscott

See https://en.wikipedia.org/wiki/User:cscott for more.

Recent Activity

Thu, Sep 19

cscott added a comment to T232183: Potential bugs in SiteConfig.php regular expressions used in WikitextSerializer.

The JS equivalent regexp is:

/^((?:(?:#REDIRECT|#redirect)[ \t\n\r\x0c]*(?::[ \t\n\r\x0c]*)?\[\[[^\]]+\]\])?(?:\[\[(?:Category)\:[^\]]*?\]\]|__(?:NOGLOBAL|DISAMBIG|NOCOLLABORATIONHUBTOC|nocollaborationhubtoc|NOTOC|notoc|NOGALLERY|nogallery|FORCETOC|forcetoc|TOC|toc|NOEDITSECTION|noeditsection|NOTITLECONVERT|notitleconvert|NOTC|notc|NOCONTENTCONVERT|nocontentconvert|NOCC|nocc|NEWSECTIONLINK|NONEWSECTIONLINK|HIDDENCAT|EXPECTUNUSEDCATEGORY|INDEX|NOINDEX|STATICREDIRECT)__|<!--(?:[^-]|-(?!->))*-->)*)(<nowiki>\s+<\/nowiki>)([^\n]*(?:\n|$))/im

Note that the PHP version, in addition to adding unnecessary ?P<a_notoc> groupings, is also doubling the leading and trailing underscores, so instead of matching __NOTOC__ the PHP version is looking for ____NOTOC____ etc.
The part of the regexp which matches the HTML comment is also over-escaped in PHP, so it's matching literal parens instead of grouping.
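Both failure modes are easy to reproduce in isolation. The following is a minimal sketch, not the SiteConfig.php code: the alias list and the way the alternation is built are assumptions made for illustration, and the over-escaped pattern is only one possible form such a bug could take.

```javascript
// Hypothetical magic-word aliases, already wrapped in their double
// underscores (an assumption about how the config hands them over).
const aliases = ['__NOTOC__', '__FORCETOC__'];

// Buggy construction: wraps the alternation in another pair of underscores.
const doubled = new RegExp('__(?:' + aliases.join('|') + ')__');

// Fixed construction: strip the underscores first, then wrap once.
const bare = aliases.map((a) => a.replace(/^__|__$/g, ''));
const single = new RegExp('__(?:' + bare.join('|') + ')__');

// Comment matcher taken from the JS regexp, next to one possible
// over-escaped form in which \( and \) match literal parentheses.
const comment = /<!--(?:[^-]|-(?!->))*-->/;
const overEscaped = /<!--\(?:[^-]|-\(?!->\)\)*-->/;

const results = {
  doubledOnPlain: doubled.test('__NOTOC__'),
  singleOnPlain: single.test('__NOTOC__'),
  doubledOnQuad: doubled.test('____NOTOC____'),
  commentOk: comment.test('<!-- hidden -->'),
  overEscapedOk: overEscaped.test('<!-- hidden -->'),
};
console.log(results);
```

The doubled pattern only accepts the four-underscore spelling, and the over-escaped comment pattern no longer matches an ordinary HTML comment.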

Thu, Sep 19, 7:02 PM · Patch-For-Review, Parsoid-PHP
cscott added a comment to T232183: Potential bugs in SiteConfig.php regular expressions used in WikitextSerializer.
Thu, Sep 19, 6:50 PM · Patch-For-Review, Parsoid-PHP

Wed, Sep 18

cscott closed T233062: Tag new release of remex-html, a subtask of T233012: Make MediaWiki core compatible with PHP 7.4, as Resolved.
Wed, Sep 18, 4:16 PM · MW-1.34-notes (1.34.0-wmf.23; 2019-09-17), MediaWiki-General, Patch-For-Review, PHP 7.4 support
cscott closed T233062: Tag new release of remex-html as Resolved.

Tagged and pushed 2.1.0 to composer.

Wed, Sep 18, 4:16 PM · MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), Patch-For-Review, RemexHtml

Tue, Sep 17

cscott added a comment to T231945: UTF-8 validity assertion failure.

T233136: Code debt: should extension API know about frames? is related.

Tue, Sep 17, 7:13 PM · Parsoid-PHP
cscott created T233136: Code debt: should extension API know about frames?.
Tue, Sep 17, 5:56 PM · Parsoid
cscott updated the task description for T230653: Use a parser function to encapsulate signatures.
Tue, Sep 17, 4:43 PM · OWC2020, MediaWiki-Parser
cscott updated the task description for T230653: Use a parser function to encapsulate signatures.
Tue, Sep 17, 4:42 PM · OWC2020, MediaWiki-Parser
cscott added a comment to T230653: Use a parser function to encapsulate signatures.

@Catrope: oops, sorry, I had it in my head that {T204371: Replace initial colon in (hash-prefixed) parser function invocation with vertical bar} was already implemented, instead of still future work.

Tue, Sep 17, 4:41 PM · OWC2020, MediaWiki-Parser
Restricted Application updated subscribers of T201878: Add `#or:` parser function.
Tue, Sep 17, 4:40 PM · MediaWiki-extensions-Variables, MediaWiki-Templates, ParserFunctions

Thu, Sep 12

cscott added a comment to T171073: Promote FLOSS libraries developed by the Foundation/movement.

There's also wikimedia/zest-css
Perhaps even wikimedia/assert as well.

Thu, Sep 12, 12:02 AM · Librarization, Wikimedia-Blog-Content

Mon, Sep 9

cscott added a comment to T232390: Remex does not use \DOMElement::setIdAttribute('id') by default.

This is a workaround for https://bugs.php.net/bug.php?id=77686 and other issues related to inconsistent indexing behavior. See also the implementation of DOMCompat::getElementById in 10dedbbeb626aa614b5931c07035675e01590259 and the implementation of getElementById in Zest, for example in d118dcc45b803917e5e34d60712b893b3892d5ac, and the discussion in T215000#4993172 and T215000#4994273 and the PHP documentation in https://www.php.net/manual/en/domdocument.getelementbyid.php and https://www.php.net/manual/en/domelement.setidattribute.php

Mon, Sep 9, 9:36 PM · RemexHtml
cscott added a comment to T232180: Zest.php: Pagebundle routes timeout for some pages.

(More of the workaround is in T232390: Remex does not use \DOMElement::setIdAttribute('id') by default.)

Mon, Sep 9, 9:28 PM · Parsoid-PHP
cscott added a comment to T230861: PHP 7.2 is very slow on an allocation-intensive benchmark.

Might be related to T232390: Remex does not use \DOMElement::setIdAttribute('id') by default.

Mon, Sep 9, 9:25 PM · PHP 7.3 support, PHP 7.2 support, serviceops, Operations
cscott added a comment to T232180: Zest.php: Pagebundle routes timeout for some pages.

So this is another case where T215000: Fill gaps in PHP DOM's functionality bites us again and we need T217867: Port domino (or another spec-compliant DOM library) to PHP for a proper fix. But I've got a reasonable workaround.

Mon, Sep 9, 9:03 PM · Parsoid-PHP
cscott renamed T232390: Remex does not use \DOMElement::setIdAttribute('id') by default from Remex does not use \DOMDocument::setIdAttribute('id') by default to Remex does not use \DOMElement::setIdAttribute('id') by default.
Mon, Sep 9, 8:38 PM · RemexHtml
cscott created T232390: Remex does not use \DOMElement::setIdAttribute('id') by default.
Mon, Sep 9, 8:28 PM · RemexHtml
cscott added a comment to T232180: Zest.php: Pagebundle routes timeout for some pages.

No, I found it. From DOMDataUtils.php:

	public static function storeInPageBundle( DOMElement $node, Env $env, stdClass $data ): void {
		$uid = $node->getAttribute( 'id' ) ?? '';
		$document = $node->ownerDocument;
		$pb = self::getPageBundle( $document );
		$docDp = $pb->parsoid;
		$origId = $uid ?: null;
		if ( array_key_exists( $uid, $docDp->ids ) ) {
			$uid = null;
			// FIXME: Protect mw ids while tokenizing to avoid false positives.
			$env->log( 'info', 'Wikitext for this page has duplicate ids: ' . $origId );
		}
		if ( !$uid ) {
			do {
				$docDp->counter += 1;
				$uid = 'mw' . PHPUtils::counterToBase64( $docDp->counter );
			} while ( DOMCompat::getElementById( $document, $uid ) );
			self::addNormalizedAttribute( $node, 'id', $uid, $origId );
		}
		$docDp->ids[$uid] = $data->parsoid;
		if ( isset( $data->mw ) ) {
			$pb->mw->ids[$uid] = $data->mw;
		}
	}

Note the loop in the middle, which calls getElementById() for every id it assigns, in order to ensure it's unused.
But DOMCompat::getElementById() calls ZestInst::getElementsById, and here's that code:

	public static function getElementsById( DOMNode $context, string $id ): array {
		$doc = ( $context instanceof \DOMDocument ) ?
			$context : $context->ownerDocument;
		// PHP doesn't provide an DOMElement-scoped version of
		// getElementById, so we can't call this directly on $context --
		// but that's okay because (1) IDs should be unique, and
		// (2) we verify the scope of the returned element below
		// anyway (to work around bugs with deleted-but-not-gc'ed
		// nodes).
		$r = $doc->getElementById( $id );
		// Note that $r could be null here because the
		// DOMDocument hasn't had an "id attribute" set, even if the id
		// exists in the document. See:
		// http://php.net/manual/en/domdocument.getelementbyid.php
		if ( $r !== null ) {
			// Verify that this node is actually rooted in the
			// document (or in the context), since the element
			// isn't removed from the index immediately when it
			// is deleted. (Also PHP's call is not scoped.)
			for ( $parent = $r; $parent; $parent = $parent->parentNode ) {
				if ( $parent === $context ) {
					return [ $r ];
				}
			}
			// It's possible a deleted-but-still-indexed element was
			// shadowing a later-added element, so we can't return
			// null here directly; fallback to a full search.
		}
		// Do an xpath search, which is still a full traversal of the tree
		// (sigh) but 25% faster than traversing it wholly in PHP.
		$xpath = new \DOMXPath( $doc );
		$query = './/*[@id=' . self::xpathQuote( $id ) . ']';
		return iterator_to_array( $xpath->query( $query, $context ) );
	}
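
To see why this adds up, here is a rough cost model in JavaScript. It is a toy sketch under stated assumptions, not Parsoid code and not necessarily the workaround that was adopted: the "document" is a flat array of ids, counter.toString(36) stands in for PHPUtils::counterToBase64, and both function names are made up. It compares a full scan per availability check with a set of known ids.

```javascript
// Toy model: every availability check walks the whole "document".
function assignByScan(existingIds, count) {
  const ids = existingIds.slice();
  const assigned = [];
  let counter = 0;
  let visited = 0;
  for (let i = 0; i < count; i++) {
    let uid;
    do {
      counter += 1;
      uid = 'mw' + counter.toString(36);
      visited += ids.length; // worst case: full traversal per check
    } while (ids.includes(uid));
    ids.push(uid);
    assigned.push(uid);
  }
  return { assigned, visited };
}

// Same assignment policy, but checks go through a set of used ids.
function assignByIndex(existingIds, count) {
  const used = new Set(existingIds);
  const assigned = [];
  let counter = 0;
  let lookups = 0;
  for (let i = 0; i < count; i++) {
    let uid;
    do {
      counter += 1;
      uid = 'mw' + counter.toString(36);
      lookups += 1; // one hash lookup per check
    } while (used.has(uid));
    used.add(uid);
    assigned.push(uid);
  }
  return { assigned, lookups };
}

const scan = assignByScan(['mw1', 'x'], 3);
const index = assignByIndex(['mw1', 'x'], 3);
console.log(scan, index);
```

Both variants hand out the same ids. The scanning variant's work grows with the document size on every check, which is quadratic overall on a page with many elements.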
Mon, Sep 9, 8:10 PM · Parsoid-PHP
cscott added a comment to T232180: Zest.php: Pagebundle routes timeout for some pages.

The only obvious use of Zest in the pagebundle path I can see is:

		$dpScriptElt = DOMCompat::getElementById( $doc, 'mw-pagebundle' );

from DOMDataUtils::extractPageBundle(), called from ContentUtils::extractDpAndSerialize(), called from Parsoid::wikitext2html(), called from ParsoidHandler::wt2html().

Mon, Sep 9, 7:51 PM · Parsoid-PHP
cscott claimed T231945: UTF-8 validity assertion failure.
Mon, Sep 9, 4:15 PM · Parsoid-PHP

Fri, Sep 6

cscott added a comment to T232180: Zest.php: Pagebundle routes timeout for some pages.
$ time php bin/parse.php --pageName 'Ken Schrader' --domain en.wikipedia.org < /dev/null
real	0m35.269s
user	0m7.081s
sys	0m0.332s
Fri, Sep 6, 9:48 PM · Parsoid-PHP
cscott added a comment to T232180: Zest.php: Pagebundle routes timeout for some pages.

Is there any way to get a fuller stack trace for the failure?

Fri, Sep 6, 9:29 PM · Parsoid-PHP
cscott added a comment to T198214: Deprecate and remove non-remex Tidy modes of the core parser.

I have a multi-stage plan in my head for the conversion: 1) delete existing html/php clauses where we already have an html/php+tidy clause; 2) if a marker is found at the top of the file, always read html/php as html/php+tidy; 3) one by one, in core and extensions, add the marker to the top of the file and update the tests to match (which should ensure that the diff is readable and only includes actual differences introduced by tidying the output, not bookkeeping changes); and then 4) require the marker/normalize the naming of the clauses/remove the no-tidy paths in the test case.

Fri, Sep 6, 9:25 PM · MW-1.35-release, MediaWiki-Parser, Technical-Debt (Deprecation), Patch-For-Review, Tidy, Parsing-Team

Aug 22 2019

cscott added a comment to T230659: Automatically-assigned id attributes for list items.

In T230683#5432585 @Anomie proposes using the revision ID of the edit that creates the comment as its persistent identifier.

Aug 22 2019, 10:46 PM · OWC2020, MediaWiki-Parser
cscott added a comment to T230683: New syntax for multiline list items / talk page comments.

The fact is that we have a huge amount of preexisting content, and editors trained on our current markup. Regardless of how you might *prefer* comments be formatted, the existing content on our talk pages uses list item syntax, and new comments by our existing editors will use list item syntax as well. I'm not opposed to adding new syntax for comments, but it needs to interoperate really well with existing-and-future list-item talk page markup. That, combined with the long-standing bugs and wishes around existing single-line list item syntax, seems a reasonable basis for considering general improvements.

Aug 22 2019, 10:44 PM · MediaWiki-Parser
cscott added a comment to T230665: Multilingual JavaScript.

This is prevented by the runtime type system. What looks like a string is actually a Symbol; you can't add symbols or concatenate them.

You're proposing a language with no string concatenation? That seems nearly unusable for something to be used with wikitext.
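
For reference, this is how plain JavaScript Symbols behave today. The snippet only shows standard language behavior; the helper name tryConcat is made up for the example, and nothing here is taken from the proposal itself.

```javascript
// A Symbol works as a property key, much like a string key would.
const bar = Symbol('bar');
const obj = {};
obj[bar] = 3;

// Helper (illustrative name): report what the + operator does.
function tryConcat(a, b) {
  try {
    return String(a + b);
  } catch (e) {
    return e instanceof TypeError ? 'TypeError' : 'other error';
  }
}

const lookup = obj[bar];                       // property access works
const symbolPlusString = tryConcat(bar, 'x');  // throws TypeError
const symbolPlusSymbol = tryConcat(bar, bar);  // throws TypeError
const stringPlusString = tryConcat('ba', 'r'); // ordinary concatenation
console.log(lookup, symbolPlusString, symbolPlusSymbol, stringPlusString);
```

Lookups through a Symbol key work, but any attempt to add or concatenate Symbols is rejected at runtime, which is the property the quoted reply relies on.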

Aug 22 2019, 7:14 PM · Developer-Advocacy, MediaWiki-extensions-Scribunto
cscott added a comment to T230683: New syntax for multiline list items / talk page comments.

Reasons are in T230658: in particular it provides extensibility for outdent or other special formatting,

As I said there, IMO manual outdenting should just go away. No idea what "other special formatting" someone might want on a comment itself rather than the content of the comment (which then belongs as wikitext in the content of the comment).

Aug 22 2019, 6:57 PM · MediaWiki-Parser
cscott added a comment to T230659: Automatically-assigned id attributes for list items.

For completeness, another proposal is to automatically scan for a trailing signature (perhaps using the {{#~|user|date}} syntax from T230653) and use the timestamp from it as part of the automatically-generated ID. I'm not a huge fan of this proposal because it (a) requires non-local effects on list item markup, and (b) seems to be too talk-page specific, but it's certainly worth mentioning for discussion that automatic ID generation based on content doesn't have to be quite as simplistic as the ID generation for headings is.
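
A rough sketch of what such a scan could look like, in JavaScript. Everything here is hypothetical: the {{#~|user|date}} form is only a proposal, and the pattern, the function name idFromSignature and the "c-user-timestamp" id format are invented for illustration.

```javascript
// Assumed shape of the proposed signature call at the end of a list item.
const SIGNATURE = /\{\{#~\|([^|{}]+)\|([^|{}]+)\}\}\s*$/;

// Derive an id from a trailing signature, or null if there is none.
function idFromSignature(listItemText) {
  const m = SIGNATURE.exec(listItemText);
  if (!m) {
    return null; // no trailing signature: some other scheme is needed
  }
  const user = m[1].trim().replace(/\s+/g, '_');
  const stamp = m[2].trim().replace(/[^0-9A-Za-z]+/g, '');
  return 'c-' + user + '-' + stamp; // hypothetical id format
}

const withSig = idFromSignature('* I agree. {{#~|Example User|2019-08-22T18:38Z}}');
const withoutSig = idFromSignature('* An unsigned item');
console.log(withSig, withoutSig);
```

The sketch also makes objection (a) concrete: the id of the list item depends on text at the far end of its content.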

Aug 22 2019, 6:38 PM · OWC2020, MediaWiki-Parser
cscott renamed T230683: New syntax for multiline list items / talk page comments from New syntax for multiline talk page comments to New syntax for multiline list items / talk page comments.
Aug 22 2019, 6:35 PM · MediaWiki-Parser
cscott added a subtask for T230654: Parser support for talk pages: T231037: Add <div> around non-nested list item content.
Aug 22 2019, 6:33 PM · OWC2020, Parsoid, MediaWiki-Parser
cscott added a parent task for T231037: Add <div> around non-nested list item content: T230654: Parser support for talk pages.
Aug 22 2019, 6:33 PM · MediaWiki-Parser
cscott created T231037: Add <div> around non-nested list item content.
Aug 22 2019, 6:33 PM · MediaWiki-Parser
cscott renamed T230659: Automatically-assigned id attributes for list items from id attributes for list items to Automatically-assigned id attributes for list items.
Aug 22 2019, 6:10 PM · OWC2020, MediaWiki-Parser
cscott added a comment to T230658: Syntax for list item attributes.

That's a reasonable alternative; I've added it to the list in the task description. There are some weird corner cases w/r/t properly closing the list; I think we want some sort of multiline list syntax anyway (T230683: New syntax for multiline list items / talk page comments), so it might make sense to tie the attribute syntax to that. But there are multiple proposals for multiline lists, too.

Aug 22 2019, 6:09 PM · OWC2020, MediaWiki-Parser
cscott updated the task description for T230658: Syntax for list item attributes.
Aug 22 2019, 6:00 PM · OWC2020, MediaWiki-Parser
cscott added a comment to T230653: Use a parser function to encapsulate signatures.

My point is that there are a number of levels of indirection between local username and some stable notion of "real user". Over time the local username can map to a different local userid, and in turn the local userid can map to a different SUL userid. I don't think we should try to solve the global identity problem.

Aug 22 2019, 5:57 PM · OWC2020, MediaWiki-Parser
cscott added a comment to T230659: Automatically-assigned id attributes for list items.

Edit to clarify: If this were expected for all or most comments I would consider it a non-starter. If this is some rare special purpose feature, equivalent to how on rare occasion we put an anchor link on a section, then I withdraw that concern.

Aug 22 2019, 5:51 PM · OWC2020, MediaWiki-Parser
cscott updated the task description for T230659: Automatically-assigned id attributes for list items.
Aug 22 2019, 5:42 PM · OWC2020, MediaWiki-Parser

Aug 20 2019

cscott added a comment to T230683: New syntax for multiline list items / talk page comments.

Also it needs a companion proposal for T230658 that's not awful. Since > doesn't have a unique "list start" indicator (the same character is used for line continuation) it seems awkward to specify where list item properties would or would not belong.

Why? This isn't a list at all, it's markup for a comment on a talk page. Why would you need class and data and such on a talk page comment (rather than in its contents)?

Aug 20 2019, 10:35 AM · MediaWiki-Parser
cscott added a comment to T230653: Use a parser function to encapsulate signatures.

We usually leave a redirect page in place for user renames. For example: https://en.wikipedia.org/w/index.php?title=User:Cananian&redirect=no

I'm not sure a redirect would be the best thing to do. What if someone retargets it? What if someone usurps the old name?

Aug 20 2019, 10:22 AM · OWC2020, MediaWiki-Parser

Aug 19 2019

cscott added a comment to T230659: Automatically-assigned id attributes for list items.

If you do new syntax, then you still have to figure out how to add attributes (at least outdent information, and ideally an id as well) to that new syntax.

Aug 19 2019, 7:11 PM · OWC2020, MediaWiki-Parser
cscott added a comment to T230665: Multilingual JavaScript.
Aug 19 2019, 7:03 PM · Developer-Advocacy, MediaWiki-extensions-Scribunto
cscott added a comment to T230653: Use a parser function to encapsulate signatures.

We usually leave a redirect page in place for user renames. For example: https://en.wikipedia.org/w/index.php?title=User:Cananian&redirect=no

Aug 19 2019, 6:42 PM · OWC2020, MediaWiki-Parser
cscott added a comment to T230683: New syntax for multiline list items / talk page comments.

Two open questions for > syntax are: how to actually express a break between list items, since usually

>>> This is an item
Aug 19 2019, 6:38 PM · MediaWiki-Parser
cscott removed a project from T149659: Grunge, or "zoom": Wikimedia-Developer-Summit (2017).
Aug 19 2019, 7:48 AM · MediaWiki-Parser, Parsing-Team
cscott reopened T149659: Grunge, or "zoom", a subtask of T151950: Wikitext 2.0 Session at Wikidev'17, as Open.
Aug 19 2019, 7:47 AM · MediaWiki-Templates, MediaWiki-Parser, Wikimedia-Developer-Summit (2017), Parsing-Team
cscott reopened T149659: Grunge, or "zoom" as "Open".

Re-opening, to stand for the idea of a cleaner wikitext 2.0 syntax, which keeps coming up from time to time. The desired properties in the task description are still valid.

Aug 19 2019, 7:47 AM · MediaWiki-Parser, Parsing-Team

Aug 18 2019

cscott added a comment to T230665: Multilingual JavaScript.

I've added this as a subtask of T150417. That task seemed to focus specifically on localizing Lua; I'm looking at JavaScript in this task. Because we don't (yet) have a Scribunto API for JavaScript (T61101), we're not quite as constrained by backwards compatibility of existing code.

Aug 18 2019, 5:18 PM · Developer-Advocacy, MediaWiki-extensions-Scribunto
cscott added a subtask for T150417: Allow users to code in localized programming languages: T230665: Multilingual JavaScript.
Aug 18 2019, 5:14 PM · MediaWiki-extensions-Scribunto, I18n, MediaWiki-Internationalization
cscott added a parent task for T230665: Multilingual JavaScript: T150417: Allow users to code in localized programming languages.
Aug 18 2019, 5:14 PM · Developer-Advocacy, MediaWiki-extensions-Scribunto
cscott added a comment to T230653: Use a parser function to encapsulate signatures.

It would also be nice if the parser function output contained this structured data, e.g. <span data-timestamp="YYYYMMDDHHMMSS" data-user="Bob">....</span>
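
A minimal sketch of output in that shape, in JavaScript. The function names, the use of UTC for the YYYYMMDDHHMMSS value, and the visible text inside the span are all assumptions; the suggestion above only specifies the two data attributes.

```javascript
// Zero-pad a number to a fixed width.
function pad(n, width) {
  return String(n).padStart(width, '0');
}

// Format a Date as YYYYMMDDHHMMSS, assuming UTC.
function toTimestamp(date) {
  return pad(date.getUTCFullYear(), 4) +
    pad(date.getUTCMonth() + 1, 2) +
    pad(date.getUTCDate(), 2) +
    pad(date.getUTCHours(), 2) +
    pad(date.getUTCMinutes(), 2) +
    pad(date.getUTCSeconds(), 2);
}

// Escape text for use in HTML content or a double-quoted attribute.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical renderer for the signature span.
function renderSignature(user, date, visibleText) {
  return '<span data-timestamp="' + toTimestamp(date) +
    '" data-user="' + escapeHtml(user) + '">' +
    escapeHtml(visibleText) + '</span>';
}

const html = renderSignature('Bob', new Date(Date.UTC(2019, 7, 18, 17, 12, 0)), 'Bob (talk)');
console.log(html);
```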

Aug 18 2019, 5:12 PM · OWC2020, MediaWiki-Parser
cscott updated the task description for T230683: New syntax for multiline list items / talk page comments.
Aug 18 2019, 1:41 PM · MediaWiki-Parser
cscott updated the task description for T230653: Use a parser function to encapsulate signatures.
Aug 18 2019, 1:40 PM · OWC2020, MediaWiki-Parser
cscott updated the task description for T230683: New syntax for multiline list items / talk page comments.
Aug 18 2019, 1:36 PM · MediaWiki-Parser
cscott updated the task description for T230683: New syntax for multiline list items / talk page comments.
Aug 18 2019, 1:33 PM · MediaWiki-Parser
cscott edited parent tasks for T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments), added: T230683: New syntax for multiline list items / talk page comments; removed: T230654: Parser support for talk pages.
Aug 18 2019, 11:54 AM · TechCom-RFC (TechCom-Approved), Patch-For-Review, Parsing-Team, Wikimedia-Developer-Summit-2016
cscott removed a subtask for T230654: Parser support for talk pages: T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments).
Aug 18 2019, 11:54 AM · OWC2020, Parsoid, MediaWiki-Parser
cscott added a subtask for T230683: New syntax for multiline list items / talk page comments: T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments).
Aug 18 2019, 11:54 AM · MediaWiki-Parser
cscott added a subtask for T230654: Parser support for talk pages: T230683: New syntax for multiline list items / talk page comments.
Aug 18 2019, 11:49 AM · OWC2020, Parsoid, MediaWiki-Parser
cscott added a parent task for T230683: New syntax for multiline list items / talk page comments: T230654: Parser support for talk pages.
Aug 18 2019, 11:49 AM · MediaWiki-Parser
cscott created T230683: New syntax for multiline list items / talk page comments.
Aug 18 2019, 11:49 AM · MediaWiki-Parser

Aug 17 2019

ToBeFree awarded T118517: [RFC] Use <figure> for media a Like token.
Aug 17 2019, 11:23 PM · Accessibility, Parsing-Team, Wikipedia-Android-App-Backlog, MediaWiki-Parser, TechCom-RFC
cscott updated the task description for T230659: Automatically-assigned id attributes for list items.
Aug 17 2019, 10:24 PM · OWC2020, MediaWiki-Parser
cscott updated the task description for T230659: Automatically-assigned id attributes for list items.
Aug 17 2019, 10:19 PM · OWC2020, MediaWiki-Parser
cscott updated the task description for T230659: Automatically-assigned id attributes for list items.
Aug 17 2019, 10:08 PM · OWC2020, MediaWiki-Parser
cscott updated the task description for T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments).
Aug 17 2019, 10:06 PM · TechCom-RFC (TechCom-Approved), Patch-For-Review, Parsing-Team, Wikimedia-Developer-Summit-2016
cscott updated the task description for T230653: Use a parser function to encapsulate signatures.
Aug 17 2019, 10:02 PM · OWC2020, MediaWiki-Parser
cscott added a comment to T230665: Multilingual JavaScript.

The basic idea is to remap:

var foo = "bar";
var obj = {};
obj.bar = 3;
return obj[foo];

to

const $1 = Symbol("bar");
Aug 17 2019, 9:05 PM · Developer-Advocacy, MediaWiki-extensions-Scribunto
cscott created T230665: Multilingual JavaScript.
Aug 17 2019, 8:57 PM · Developer-Advocacy, MediaWiki-extensions-Scribunto
cscott added a comment to T149659: Grunge, or "zoom".

Slides presenting some syntax examples at
https://commons.wikimedia.org/wiki/File:Wikitext_2.0.wikimedia.devsummit.2017.pdf

Aug 17 2019, 3:13 PM · MediaWiki-Parser, Parsing-Team
cscott added a comment to T230654: Parser support for talk pages.

Very short presentation on this: https://commons.wikimedia.org/wiki/File:Wikimania_2019_-_New_Wikitext_for_Chat_Pages.pdf

Aug 17 2019, 2:25 PM · OWC2020, Parsoid, MediaWiki-Parser
cscott updated the task description for T230658: Syntax for list item attributes.
Aug 17 2019, 2:17 PM · OWC2020, MediaWiki-Parser
cscott added a subtask for T230654: Parser support for talk pages: T230659: Automatically-assigned id attributes for list items.
Aug 17 2019, 2:15 PM · OWC2020, Parsoid, MediaWiki-Parser
cscott added a parent task for T230659: Automatically-assigned id attributes for list items: T230654: Parser support for talk pages.
Aug 17 2019, 2:15 PM · OWC2020, MediaWiki-Parser
cscott created T230659: Automatically-assigned id attributes for list items.
Aug 17 2019, 2:15 PM · OWC2020, MediaWiki-Parser
cscott added a subtask for T230654: Parser support for talk pages: T230658: Syntax for list item attributes.
Aug 17 2019, 2:09 PM · OWC2020, Parsoid, MediaWiki-Parser
cscott added a parent task for T230658: Syntax for list item attributes: T230654: Parser support for talk pages.
Aug 17 2019, 2:09 PM · OWC2020, MediaWiki-Parser
cscott created T230658: Syntax for list item attributes.
Aug 17 2019, 2:08 PM · OWC2020, MediaWiki-Parser
cscott added a parent task for T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments): T230654: Parser support for talk pages.
Aug 17 2019, 1:32 PM · TechCom-RFC (TechCom-Approved), Patch-For-Review, Parsing-Team, Wikimedia-Developer-Summit-2016
cscott added a subtask for T230654: Parser support for talk pages: T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments).
Aug 17 2019, 1:32 PM · OWC2020, Parsoid, MediaWiki-Parser
cscott added a subtask for T230654: Parser support for talk pages: T230653: Use a parser function to encapsulate signatures.
Aug 17 2019, 1:31 PM · OWC2020, Parsoid, MediaWiki-Parser
cscott added a parent task for T230653: Use a parser function to encapsulate signatures: T230654: Parser support for talk pages.
Aug 17 2019, 1:31 PM · OWC2020, MediaWiki-Parser
cscott created T230654: Parser support for talk pages.
Aug 17 2019, 1:30 PM · OWC2020, Parsoid, MediaWiki-Parser
cscott created T230653: Use a parser function to encapsulate signatures.
Aug 17 2019, 1:27 PM · OWC2020, MediaWiki-Parser
cscott created T230652: Tilde stripping in signatures is inadequate.
Aug 17 2019, 12:23 PM · MediaWiki-Parser
cscott added a comment to T198970: Epic: Implement SEO improvements suggested by Go Fish Digital.

ZZ from Google is at Wikimania 2019 and will be on the panel at https://wikimania.wikimedia.org/wiki/2019%3AQuality/Idea_jam_on_quality on Sunday.

Aug 17 2019, 10:25 AM · SEO, Epic

Aug 15 2019

cscott added a comment to T156876: Structured data side channel for wikitext.

(And I agree with @Tgr's response to @Lydia_Pintscher that this issue isn't directly related to MCR. In my mind it is about what "output types" and "input types" are available for templates/extensions during preprocessing. Currently extensions can output a wikitext string or raw HTML, and as parameters they can only have a string (usually interpreted as a wikitext string). This task broadens both the possible output types and the possible input type to include structured data.)

Aug 15 2019, 3:15 AM · Wikidata, MediaWiki-Parser, SDC General, Developer-Wishlist (2017)

Aug 14 2019

cscott updated the task description for T230372: Extension:Translate workshop at Wikimania hackathon 2019.
Aug 14 2019, 11:08 PM · Wikimania-Hackathon-2019
cscott added a comment to T28396: Permit all lowercase (uncapitalized) usernames and user pages.

You can use the {{lowercase}} template on your user name, which will ensure that the first letter is displayed lowercase on your user page at least. This uses the {{DISPLAYTITLE}} magic word. See https://web.archive.org/web/20190410170303/https://en.wikipedia.org/wiki/User:cscott .

Aug 14 2019, 4:57 PM · WorkType-NewFunctionality, MediaWiki-General
cscott closed T230318: Undeploy LanguageConverter from unilingual uniscript projects as Resolved.

Duplicate for the -{ change and intended feature for the Special:MyLanguage change.

Aug 14 2019, 4:55 PM · MediaWiki-Language-converter, Wikimedia-Site-requests
cscott added a comment to T230318: Undeploy LanguageConverter from unilingual uniscript projects.

I suspect Special:MyLanguage was enabled for all projects by T68762: Move "Special:MyLanguage" from Extension:Translate to MediaWiki core in June 2014.

Aug 14 2019, 4:54 PM · MediaWiki-Language-converter, Wikimedia-Site-requests
cscott added a comment to T189095: Template inclusion with -{ (minus followed by an opening curly brace) stops parser and prevents inclusion.

Tech news notifications documented in T165175, which is linked from the page I cited above.

Aug 14 2019, 2:33 PM · MediaWiki-Templates, MediaWiki-Parser
cscott added a comment to T230318: Undeploy LanguageConverter from unilingual uniscript projects.

Duplicate of T189095 for the first part. Not a new change.

Aug 14 2019, 2:31 PM · MediaWiki-Language-converter, Wikimedia-Site-requests

Aug 13 2019

Dalba awarded T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments) a Dislike token.
Aug 13 2019, 11:33 AM · TechCom-RFC (TechCom-Approved), Patch-For-Review, Parsing-Team, Wikimedia-Developer-Summit-2016

Aug 12 2019

cscott added a comment to T189095: Template inclusion with -{ (minus followed by an opening curly brace) stops parser and prevents inclusion.

This was an intentional change to make preprocessor behavior more uniform. Just entity-escape the - or the { if you don't want language converter syntax, or wrap the -{ in a <nowiki>. See T54661: Preprocessor/Parser irregularities with -{...}- variant constructs. and T146304: Preprocessor should handle -{...}- variant constructs in template arguments. We had wikilint volunteers fix these issues with our existing content; they can probably help you out with tools if that's helpful. https://www.mediawiki.org/wiki/Parsoid/Language_conversion/Preprocessor_fixups discusses the wikitext fixup process in more detail.
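
As a sketch of the two escapes mentioned above (function names are made up, and this is not the tooling the volunteers used): the first replaces the brace with its numeric character reference, the second wraps the sequence in <nowiki>. Both are naive and rewrite every -{ they see, including any that are meant as language converter markup or that already sit inside a <nowiki>.

```javascript
// Replace the brace of every "-{" with &#123;, the character reference for "{".
function entityEscapeVariantOpen(wikitext) {
  return wikitext.replace(/-\{/g, '-&#123;');
}

// Alternative: wrap every "-{" in <nowiki> so it is not parsed as markup.
function nowikiVariantOpen(wikitext) {
  return wikitext.replace(/-\{/g, '<nowiki>-{</nowiki>');
}

const sample = '{{foo|a-{b}}';
const escaped = entityEscapeVariantOpen(sample);
const wrapped = nowikiVariantOpen(sample);
console.log(escaped);
console.log(wrapped);
```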

Aug 12 2019, 5:55 PM · MediaWiki-Templates, MediaWiki-Parser

Aug 8 2019

cscott added a comment to T156876: Structured data side channel for wikitext.

FWIW, in T196440#5341715 I moot around the idea of a specialized "arglist" data type to be returned by a template, to make certain argument-list manipulations easier/more robust. I think this "structured data" input/output type would work well for that. {{#arglist}} would emit the arguments to the current template in the side channel key-value format, and {{#filter-arglist|....}} would accept that side channel format as input.

Aug 8 2019, 8:41 AM · Wikidata, MediaWiki-Parser, SDC General, Developer-Wishlist (2017)

Aug 3 2019

cscott added a comment to T213494: Installing composer modules for deployment.

I bet we'd want to go with #3, at least initially. I think we'd prefer not to add Parsoid and its dependencies to mediawiki/vendor's composer.json (#1) because of the HHVM/PHP 7.2 thing. #2 is attractive, and I suspect we might want to do that once we move past testing and start putting Parsoid into production, but I bet we don't want to jump on the deploy train quite yet.

Aug 3 2019, 3:39 AM · Patch-For-Review, Release-Engineering-Team-TODO, Release-Engineering-Team (Deployment services), Parsoid-PHP

Aug 2 2019

cscott added a comment to T228346: PHP 7.2 garbage collector segfault.

...or we could stop using the DOM extension, I suppose (T217867). I'm not a fan of the quality of that code....

Aug 2 2019, 9:46 PM · Patch-For-Review, Parsoid-PHP, PHP 7.2 support
cscott added a comment to T214651: The extension api seems to want a dom-diff-handler.

I don't know if it needs direct access to dom diff. But if it recursively invokes the serializer for a subtree, it should be a SelectiveSerializer; ie we should be doing selser and domdiff on that subtree. And we should be smart enough that if a subtree of an extension tag mismatches, we should pop up and reserialize the entire extension tag (using the extension tag's handler), not try to splice it in ourselves.

Aug 2 2019, 6:00 PM · Parsoid-Read-Views

Aug 1 2019

cscott added a comment to T213494: Installing composer modules for deployment.

I think there are some longer-term planning considerations here:

Aug 1 2019, 7:43 PM · Patch-For-Review, Release-Engineering-Team-TODO, Release-Engineering-Team (Deployment services), Parsoid-PHP