cscott (C. Scott Ananian)
Parser whisperer

User Details

User Since
Oct 21 2014, 6:47 PM (204 w, 9 h)
Availability
Available
IRC Nick
cscott
LDAP User
Unknown
MediaWiki User
Cscott [ Global Accounts ]

Editor since 2005; WMF developer since 2013. I work on Parsoid and OCG, and dabble with VE, real-time collaboration, and OOjs.

On github: https://github.com/cscott

See https://en.wikipedia.org/wiki/User:cscott for more.

Recent Activity

Yesterday

cscott added a comment to T204608: Use a bag-on-the-side implementation, rather than an internal .dataobject for node data.

https://stackoverflow.com/questions/8707235/how-to-create-new-property-dynamically has some benchmarks showing how slow "expando" properties are.
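
A minimal sketch of what a "bag on the side" could look like, written in Python rather than Parsoid's JavaScript. The Node and NodeData classes and the "dsr" key are hypothetical stand-ins for illustration, not Parsoid's actual API:

```python
import weakref


class Node:
    """Hypothetical stand-in for a DOM node; not Parsoid's actual class."""

    def __init__(self, name):
        self.name = name


class NodeData:
    """Side table mapping nodes to their data, instead of a property on the node."""

    def __init__(self):
        # Weak keys: an entry goes away once its node is garbage-collected.
        self._bag = weakref.WeakKeyDictionary()

    def get(self, node):
        # Return the node's data dict, creating an empty one on first access.
        return self._bag.setdefault(node, {})


bag = NodeData()
p = Node("p")
bag.get(p)["dsr"] = [0, 12]  # stored beside the node, not on it
```

The node object itself never gains a dynamically added property, which is the point of keeping the data in a separate table.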

Tue, Sep 18, 3:40 PM · Parsoid-PHP
cscott added a comment to T204624: Parsoid is misbehaving in Beta cluster.

The last Parsoid deploy to beta was Wed or Thursday of last week. Assuming this problem started recently it's probably not a code or configuration change on the Parsoid end...

Tue, Sep 18, 3:22 PM · User-Ryasmeen, Services (done), Parsoid, Beta-Cluster-Infrastructure, VisualEditor
cscott added a comment to T161278: Add default gadget styling to Parsoid's output.

Other users of the BeforePageDisplay hook, via codesearch:

  • AdvancedMeta, for example
    • Uses: $out->addJsConfigVars(), $out->getTitle()
    • Don't move to new hook: $out->addMeta(), $out->setIndexPolicy(), $out->setHTMLTitle()
  • Wikibase, for example
    • Uses: $out->addJsConfigVars(), $out->getProperty( 'wikibase_item' ), $skin, $out->addModules()
  • CentralAuth, for example
    • Uses: $out->addJsConfigVars(), $out->addModules()
    • Don't move: $out->addHeadItem(), $out->addHTML(), $out->getRequest()->getSessionData()
  • ContentTranslation, for example
    • Uses: $title, $user, $out->getRequest()->getCookie(), $out->addModules(), Action::getActionName( $out->getContext() )
  • Echo, for example
    • Uses: $user, $skin->getSkinName(), $out->addModules(), $out->addModuleStyles()
  • EventLogging, for example
    • Uses: $out->addModules(), $user, $out->getRevisionId(), $out->addSubtitle() (don't move this one?)
  • MobileFrontend, for example
    • Don't move? $out->addLink(), $out->addVaryHeader(), etc
  • MultiLanguageManager, for example
    • usual stuff, jsvars, modules, modulestyles, $title
  • ORES, for example
    • Uses: $out->getProperty( 'oresData' ), Helpers::isHighlightEnabled( $out ) ($user/$title, aka $context)
  • PageImages, for example
    • Don't move: $out->addMeta()
Tue, Sep 18, 2:27 PM · Parsoid-Read-Views, Patch-For-Review, Gadgets, MediaWiki-API
cscott added a comment to T161278: Add default gadget styling to Parsoid's output.

@Krinkle, @Legoktm does the above plan seem reasonable? There's a WIP patch in gerrit, but I don't want to develop it further before getting signoff on the theory.

Tue, Sep 18, 1:28 PM · Parsoid-Read-Views, Patch-For-Review, Gadgets, MediaWiki-API

Mon, Sep 17

cscott added a comment to T189261: Lightweight parse mode where roundtripping is not required.

OK. We should probably do a memory audit at some point; we'll probably be forced to do it by the PHP port, since we'll have to pre-declare all our token fields instead of just adding them on-the-fly like JavaScript allows. Paying attention to removing unused fields should help with memory usage, and reducing memory usage should help with performance ... my intuition is that this is another 10-20% though, not like 2x speedup or something.
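
As a rough analogy only (Python, not the PHP port), pre-declared fields can be illustrated with __slots__: instances carry no per-object dictionary, and assigning an undeclared field fails instead of silently growing the object. The Token class and its field names here are hypothetical:

```python
class Token:
    # Hypothetical field names, for illustration only.
    __slots__ = ("name", "attribs")

    def __init__(self, name, attribs=None):
        self.name = name
        self.attribs = attribs if attribs is not None else []


tok = Token("a")
try:
    tok.rt_info = {}  # undeclared field: rejected rather than added on the fly
    added = True
except AttributeError:
    added = False
```

An audit of which declared fields are actually used then becomes a matter of reading one list per class.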

Mon, Sep 17, 8:22 PM · VisualEditor, VisualEditor-MediaWiki-2017WikitextEditor, Parsoid
cscott added a comment to T189261: Lightweight parse mode where roundtripping is not required.

@ssastry I wonder if removing rt-related fields from the token objects would also help, for not too much more surgery? I suspect memory allocation is a surprisingly large fraction of our runtime costs, and I'm guessing that turning off the computations you mention didn't actually remove the related fields from the tokens? But maybe your costs already included slimming down the token objects....

Mon, Sep 17, 8:11 PM · VisualEditor, VisualEditor-MediaWiki-2017WikitextEditor, Parsoid
cscott added a comment to T204370: Behavior switch/magic word uniformity.

The preprocessor refactoring is a distraction, and it's not needed for this task. (And the dispatching actually occurs in Parser::doBraceExpansion / Parser::doDoubleUndercoreExpanion / etc, not the preprocessor.)

Mon, Sep 17, 6:14 PM · MediaWiki-Parser, Parsoid
cscott added a comment to T204566: cloudvps: wikitextexp project trusty deprecation.

Most of the setup work was IIUC manual import of big chunks of existing wikis (with associated dependent templates, modules, etc) so that we can render pages faithfully and thus detect regressions. Is there a better way to do that now, maybe a "standard" way to get a read-only snapshot of the master DB onto a labs machine? We don't want to point directly at the master DBs, since then ordinary user edits create rendering changes that look like regressions in our rendering; but we'd like to pull the newest DB snapshot from time to time to ensure we don't fall too far behind the current state of the projects.

Mon, Sep 17, 5:48 PM · Parsing-Team, Cloud-VPS
cscott added a comment to T204370: Behavior switch/magic word uniformity.

@Anomie The refactoring I am proposing would just make that distinction clearer: the "dispatcher" component would have zero wikitext specifics baked into it, so the "parse the wikitext looking for template invocations" code would be completely separate from the "register a namespace and dispatch invocations with arguments" code.
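
A minimal sketch of the separation described here, in Python. The Dispatcher class, its method names, and the "#uc" handler are assumptions made for illustration; they are not MediaWiki's Parser API:

```python
class Dispatcher:
    """Maps invocation names to handlers; knows nothing about wikitext syntax."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        # Record the callable that implements this name.
        self._handlers[name] = handler

    def dispatch(self, name, args):
        # Look up the handler and call it with already-parsed arguments.
        handler = self._handlers.get(name)
        if handler is None:
            raise KeyError(f"no handler registered for {name!r}")
        return handler(args)


dispatcher = Dispatcher()
dispatcher.register("#uc", lambda args: args[0].upper())
result = dispatcher.dispatch("#uc", ["hello"])
```

Whatever code scans the wikitext for invocations would only call dispatch() with a name and an argument list, so the two halves can change independently.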

Mon, Sep 17, 5:45 PM · MediaWiki-Parser, Parsoid
cscott added a comment to T204375: Wikitext 2.0 as low-bandwidth transport for client-side rendering.

This is very much WIP and not settled. I just wanted to document the conversations we had, since I think the bandwidth-reduction part is a genuinely new use-case/design criterion that we hadn't encountered before, and it could also motivate some of the "types for templates" ideas @ssastry has been working on.

Mon, Sep 17, 5:43 PM · Parsoid

Fri, Sep 14

cscott updated subscribers of T204366: Better varargs for templates.
Fri, Sep 14, 8:44 PM · MediaWiki-Parser, Parsoid
cscott updated the task description for T204375: Wikitext 2.0 as low-bandwidth transport for client-side rendering.
Fri, Sep 14, 7:58 PM · Parsoid
cscott added a comment to T204370: Behavior switch/magic word uniformity.

I'm cheating and squeezing other parts of my Evil Master Plan with that {{#expand}} thing. The goal there is to refactor the template expansion away from the preprocessor, which would just be a dispatcher. I can do that without actually making {{#expand}} callable by mere mortals... but why not?

Fri, Sep 14, 7:32 PM · MediaWiki-Parser, Parsoid
cscott updated the task description for T204375: Wikitext 2.0 as low-bandwidth transport for client-side rendering.
Fri, Sep 14, 6:50 PM · Parsoid
cscott created T204375: Wikitext 2.0 as low-bandwidth transport for client-side rendering.
Fri, Sep 14, 6:49 PM · Parsoid
cscott added a comment to T204370: Behavior switch/magic word uniformity.

Also at the stage where it handles parsing the arguments from the wikitext it doesn't know whether {{foo:bar is going to be referring to a variable or parser function named "foo" (with "bar" as the parameter) versus Template:Foo:bar versus page 'bar' in namespace Foo (versus, maybe, interwiki transclusion from site 'foo').

Fri, Sep 14, 6:31 PM · MediaWiki-Parser, Parsoid
cscott added a comment to T204370: Behavior switch/magic word uniformity.

Note that double-underscore magic words are automatically added as page properties (in the page_props table). You should be sure not to lose that behavior if someone uses your alternative syntax. Considering that they can't structurally have arguments, the justification for changing them given here ("with reliable quoting for arguments, instead of having a weird collection of ad hoc mechanisms for allowing arguments") doesn't seem to apply.

Fri, Sep 14, 6:29 PM · MediaWiki-Parser, Parsoid
cscott renamed T204371: Replace initial colon in (hash-prefixed) parser function invocation with vertical bar from Replace initial colon in (hash-prefiexed) parser function invocation with vertical bar to Replace initial colon in (hash-prefixed) parser function invocation with vertical bar.
Fri, Sep 14, 6:19 PM · MediaWiki-Parser, Parsoid
cscott renamed T204371: Replace initial colon in (hash-prefixed) parser function invocation with vertical bar from Replace initial colon in parser function invocation with vertical bar to Replace initial colon in (hash-prefiexed) parser function invocation with vertical bar.
Fri, Sep 14, 6:19 PM · MediaWiki-Parser, Parsoid
cscott added a comment to T204371: Replace initial colon in (hash-prefixed) parser function invocation with vertical bar.

Yeah, there's a separate issue (T204370) which I think you've already seen which tries to ensure there are equivalent hash-prefixed and pipe-separated forms for the oddball magic words/parser functions. So I don't think I can/should/want to touch the "legacy" forms; if you want uniform syntax, use the hash-prefixed alternatives.

Fri, Sep 14, 6:18 PM · MediaWiki-Parser, Parsoid
cscott added a comment to T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments).

While thinking through the details of argument expansion, it's probably important to figure out how to pass heredoc-quoted arguments through to child templates safely. That is, if Template:Foo is:

{{SomeOtherTemplate|{{{1}}}}}

and I invoke it like:

{{Foo|<<<bar=bat>>>}}

I probably want to select two different behaviors: (a) deliberately unquoted, so SomeOtherTemplate is given the named argument bar, and (b) deliberately quoted, so SomeOtherTemplate is given a single unnamed argument with the literal value bar=bat.
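
The two behaviors can be sketched as follows, assuming the child template receives either the bare text or the text re-wrapped in the proposed <<< >>> quotes. The parse_argument helper is hypothetical and the RFC syntax is not settled:

```python
def parse_argument(raw):
    """Return (name, value); name is None for an unnamed argument."""
    if raw.startswith("<<<") and raw.endswith(">>>"):
        # Deliberately quoted: the whole body is one literal unnamed value.
        return None, raw[3:-3]
    name, sep, value = raw.partition("=")
    if sep:
        # Deliberately unquoted: the first "=" splits name from value.
        return name, value
    return None, raw


unquoted = parse_argument("bar=bat")        # case (a)
quoted = parse_argument("<<<bar=bat>>>")    # case (b)
```

Case (a) yields the named argument bar with value bat; case (b) yields a single unnamed argument whose value is the literal text bar=bat.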

Fri, Sep 14, 6:15 PM · Patch-For-Review, Parsing-Team, Wikimedia-Developer-Summit-2016, TechCom-RFC
cscott updated the task description for T204370: Behavior switch/magic word uniformity.
Fri, Sep 14, 6:04 PM · MediaWiki-Parser, Parsoid
cscott added a comment to T204370: Behavior switch/magic word uniformity.

@Dinoguy1000 thanks! Yeah, this is still a WIP, I'm floating it now especially to get feedback on what appropriate names might be. These names can be/are localized, too, so that's got to be considered when thinking of new names.

Fri, Sep 14, 5:53 PM · MediaWiki-Parser, Parsoid
cscott updated the task description for T204371: Replace initial colon in (hash-prefixed) parser function invocation with vertical bar.
Fri, Sep 14, 5:30 PM · MediaWiki-Parser, Parsoid
cscott renamed T204371: Replace initial colon in (hash-prefixed) parser function invocation with vertical bar from Replace initial colons in parser function invocation with vertical bar to Replace initial colon in parser function invocation with vertical bar.
Fri, Sep 14, 5:30 PM · MediaWiki-Parser, Parsoid
cscott added a comment to T204307: Parser Functions should support named parameters.

When opting in to "standard" named argument parsing, you might want to opt-in to replacing the first colon with a vertical bar to look even more like a "standard" template invocation: T204371: Replace initial colon in (hash-prefixed) parser function invocation with vertical bar.

Fri, Sep 14, 5:30 PM · MediaWiki-Parser, Parsoid
cscott created T204371: Replace initial colon in (hash-prefixed) parser function invocation with vertical bar.
Fri, Sep 14, 5:29 PM · MediaWiki-Parser, Parsoid
cscott created T204370: Behavior switch/magic word uniformity.
Fri, Sep 14, 5:22 PM · MediaWiki-Parser, Parsoid
cscott updated the task description for T204366: Better varargs for templates.
Fri, Sep 14, 4:52 PM · MediaWiki-Parser, Parsoid
cscott updated subscribers of T204366: Better varargs for templates.
Fri, Sep 14, 4:52 PM · MediaWiki-Parser, Parsoid
cscott added a comment to T204307: Parser Functions should support named parameters.

Thanks for the feedback!

Fri, Sep 14, 4:49 PM · MediaWiki-Parser, Parsoid
cscott updated subscribers of T204366: Better varargs for templates.
Fri, Sep 14, 4:47 PM · MediaWiki-Parser, Parsoid
cscott created T204366: Better varargs for templates.
Fri, Sep 14, 4:47 PM · MediaWiki-Parser, Parsoid
cscott added a comment to T204283: Serializing extension tags using TemplateData.

we might eventually see:

{{#tag:gallery
|<<<File:Detroit Publishing Co. - A Yeoman of the Guard (N.B. actually a Yeoman Warder), full restoration.jpg|1>>>
|<<<File:Official_program_-_Woman_suffrage_procession_March_3,_1913_-_crop.jpg|2>>>
|<<<File:Thurston, the famous magician - East Indian Rope Trick.jpg|3>>>
|<<<File:Joseph Ferdinand Keppler - The Pirate Publisher - Puck Magazine - Restoration by Adam Cuerden.jpg|4>>>
}}

Personally, I don't see that as being better. I find it more confusing, not less.

Fri, Sep 14, 4:14 PM · VisualEditor, TemplateData, Parsoid
cscott added a comment to T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments).

@Anomie is the expansion memoized? Presumably if I include {{1}} twice I'm guaranteed the same contents? (I should check this myself but I'm on mobile at the moment.)

Fri, Sep 14, 4:05 PM · Patch-For-Review, Parsing-Team, Wikimedia-Developer-Summit-2016, TechCom-RFC
cscott added a comment to T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments).

@Anomie ah, yes. Current template semantics expand the arguments before evaluating the template. I'd been thinking that would be maintained, but for general use it might be worth thinking through alternatives. For example, if you want to use heredoc syntax for parser functions (and I do: T204283, T204307) then you need to pass the parser function the raw unexpanded text. I think I can deal with treating "standard wikitext template expansion" as a bit of a special case (eager expansion of arguments). In theory {{Foo|{{bar}}}} could be sugar for {{#expand|Template:Foo|{{bar}}}}, where the implementation of the #expand parser function does an explicit eager expansion of its arguments before evaluating it further.

Fri, Sep 14, 9:42 AM · Patch-For-Review, Parsing-Team, Wikimedia-Developer-Summit-2016, TechCom-RFC
cscott created T204307: Parser Functions should support named parameters.
Fri, Sep 14, 9:29 AM · MediaWiki-Parser, Parsoid
cscott added a comment to T204283: Serializing extension tags using TemplateData.

Quick note (which maybe belongs better with {T90914: Provide semantic wiki-configurable styles for media display}) that ideally we could unify media layout options with templatedata as well; something like:

{{#media|File:Foobar.jpg|caption=baz}}

would be the desugaring of [[File:Foobar.jpg|baz]]. This would allow more interesting wikitext to be easily embedded in captions as well.

Fri, Sep 14, 9:06 AM · VisualEditor, TemplateData, Parsoid
cscott added a comment to T54607: TemplateData: Implement hook for extensions to document magic words and parser functions.

We should probably define a parser function which will invoke a behavior switch (if there isn't one already) so that we can handle behavior switches/magic words/parser functions uniformly: all macro insertions could be expressed using the {{...}} syntax, and the templatedata would be named based on the name inside the braces.

Fri, Sep 14, 1:35 AM · VisualEditor-MediaWiki, VisualEditor, TemplateData

Thu, Sep 13

cscott added a comment to T192037: Writeup some sort of position statement against subsets of wikitext.

My intuition is that the way we will eventually handle subsets will fall out of the way we add strong types to "trees" and "holes". That is, if a particular extension wants something that fits an "inline"-shaped hole, but is given something which is a "block", there will be a well-defined conversion mechanism (stripping the block tags, presumably). Similarly, if the hole is in a "link caption" context but we're given a "link", we'll strip the link to avoid a link-in-link error.

Thu, Sep 13, 11:09 PM · Parsing-Team
cscott added a comment to T187958: Parsoid and PHP parser parse <gallery caption="…"> differently.

<gallery caption="Foo&#10;&#10;bar"> ... </gallery> might be an interesting test case.

Thu, Sep 13, 11:01 PM · Parsoid-Read-Views, Patch-For-Review, MediaWiki-Parser
cscott updated the task description for T204283: Serializing extension tags using TemplateData.
Thu, Sep 13, 10:59 PM · VisualEditor, TemplateData, Parsoid
cscott updated the task description for T204283: Serializing extension tags using TemplateData.
Thu, Sep 13, 10:48 PM · VisualEditor, TemplateData, Parsoid
cscott renamed T204283: Serializing extension tags using TemplateData from TemplateData for extension tags to Serializing extension tags using TemplateData.
Thu, Sep 13, 10:48 PM · VisualEditor, TemplateData, Parsoid
cscott added a comment to T204283: Serializing extension tags using TemplateData.

Yeah. I guess this part of the bug could (eventually) be made more specific to "serializing extensions using TemplateData (and heredoc syntax)".

Thu, Sep 13, 10:47 PM · VisualEditor, TemplateData, Parsoid
cscott updated the task description for T204283: Serializing extension tags using TemplateData.
Thu, Sep 13, 10:34 PM · VisualEditor, TemplateData, Parsoid
cscott created T204283: Serializing extension tags using TemplateData.
Thu, Sep 13, 9:55 PM · VisualEditor, TemplateData, Parsoid
cscott updated the task description for T204279: Fine-grained Sanitizer control.
Thu, Sep 13, 9:09 PM · Security, Parsing-Team, Parsoid
cscott created T204279: Fine-grained Sanitizer control.
Thu, Sep 13, 9:05 PM · Security, Parsing-Team, Parsoid
cscott added a comment to T161278: Add default gadget styling to Parsoid's output.

The other thing the Gadget extension does in the BeforePageDisplay hook is add warnings about using legacy APIs as raw HTML concatenated to the end of the display page. This is the sort of thing we probably *don't* want to expose/allow in the Parse API?

Thu, Sep 13, 7:57 PM · Parsoid-Read-Views, Patch-For-Review, Gadgets, MediaWiki-API
cscott claimed T161278: Add default gadget styling to Parsoid's output.
Thu, Sep 13, 5:37 PM · Parsoid-Read-Views, Patch-For-Review, Gadgets, MediaWiki-API
cscott added a comment to T161278: Add default gadget styling to Parsoid's output.

Strawman: what if we (a) deprecate using BeforePageDisplay to add modules, but (b) add a new hook (AddParserModules?) which is run at basically the same time, but which is also run during the Parse API when useskin is enabled. That allows us to incrementally migrate users of BeforePageDisplay as a module-hook, without risking breakage by running BeforePageDisplay in the ParseAPI where existing callers may not expect.

Thu, Sep 13, 5:36 PM · Parsoid-Read-Views, Patch-For-Review, Gadgets, MediaWiki-API

Wed, Sep 12

cscott added a project to T202905: Outreach-17 Project: Add a new Linter Category: Links-in-Links: Outreachy (Round 17).
Wed, Sep 12, 6:41 PM · Parsoid-Linter, Outreach-Programs-Projects, Outreachy (Round 17), MediaWiki-extensions-Linter

Mon, Sep 10

cscott added a comment to T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments).

inside a template and immediately follows a | or = with no whitespace *and* the >>> is immediately followed by a | or }}

Even if you want to be conservative in the initial version, template arguments need to accept surrounding whitespace. If you don't, people will be very confused why it's broken. There's a strong expectation that we can use spaces and newlines to format template parameters. A parameter may be surrounded by spaces, and parameters are often split on individual lines.
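
To make the difference concrete, here is a rough regex sketch in Python of the strict rule quoted above next to a whitespace-tolerant variant. Both patterns are simplifications written for illustration and are not the tokenizer rules from the patch:

```python
import re

# Strict rule: <<< directly after | or =, and >>> directly before | or }}.
STRICT = re.compile(r"[|=]<<<(.*?)>>>(?=\||\}\})", re.DOTALL)
# Tolerant variant: allow whitespace between the separator and the delimiters.
TOLERANT = re.compile(r"[|=]\s*<<<(.*?)>>>\s*(?=\||\}\})", re.DOTALL)

compact = "{{Foo|<<<bar=bat>>>}}"
spaced = "{{Foo\n| <<<bar=bat>>>\n}}"

strict_on_spaced = STRICT.findall(spaced)      # the strict rule misses this form
tolerant_on_spaced = TOLERANT.findall(spaced)  # the tolerant rule accepts it
```

The spaced form is how parameters are commonly laid out, one per line, which is why the strict rule would surprise people.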

Mon, Sep 10, 10:22 PM · Patch-For-Review, Parsing-Team, Wikimedia-Developer-Summit-2016, TechCom-RFC
cscott added a comment to T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments).

and the whole point of this syntax is that you should be able to surround literally anything with no additional escaping needed

Doesn't that statement conflict with allowing expansion of templates inside a quoted-argument?

Mon, Sep 10, 10:17 PM · Patch-For-Review, Parsing-Team, Wikimedia-Developer-Summit-2016, TechCom-RFC
Gerrit Code Review <gerrit@wikimedia.org> committed rJWTH862b4bec1350: Merge "Add lunr search" (authored by cscott).
Merge "Add lunr search"
Mon, Sep 10, 7:16 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rJWTH1d4195e26d64: Merge "Open member description when using hash fragments to link to it" (authored by cscott).
Merge "Open member description when using hash fragments to link to it"
Mon, Sep 10, 7:16 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rJWTH6f61509a7e43: CI is not set up, so add 'submit' permission. (authored by cscott).
CI is not set up, so add 'submit' permission.
Mon, Sep 10, 5:16 PM
cscott added a comment to T202905: Outreach-17 Project: Add a new Linter Category: Links-in-Links.

This could be a good outreachy task...

Mon, Sep 10, 3:15 PM · Parsoid-Linter, Outreach-Programs-Projects, Outreachy (Round 17), MediaWiki-extensions-Linter

Thu, Sep 6

cscott added a comment to T203583: {{subst:REVISIONUSER}} no longer substitutes into the current user name, but the username of the last revision.

@Tgr agreed, but I was assuming @Bawolff's issue was just a risk, not an actual vulnerability; I was just putting forward a strawman guess at what he was thinking. Maybe @Bawolff can create a new issue for the privacy risk (security-tagged if it's a vulnerability), and we can discuss whether or how we might mitigate them or deprecate the REVISION* magic words independent of this particular MCR regression.

Thu, Sep 6, 7:54 PM · MW-1.32-release-notes (WMF-deploy-2018-09-04 (1.32.0-wmf.20)), User-notice, Patch-For-Review, Multi-Content-Revisions (MCR-SDC File Caption Support - phase 2), Regression, MediaWiki-Parser, MediaWiki-Page-editing
cscott added a comment to T203583: {{subst:REVISIONUSER}} no longer substitutes into the current user name, but the username of the last revision.

Privacy risk => because a cross-site request can do a GET to an action api endpoint with {{subst}} and get back the current logged in user? I'm guessing here. Maybe @Bawolff can elaborate.

Thu, Sep 6, 7:37 PM · MW-1.32-release-notes (WMF-deploy-2018-09-04 (1.32.0-wmf.20)), User-notice, Patch-For-Review, Multi-Content-Revisions (MCR-SDC File Caption Support - phase 2), Regression, MediaWiki-Parser, MediaWiki-Page-editing

Tue, Aug 28

cscott added a comment to T194651: Smooth out quirks in the <poem> tag to ease consistent rendering across parsers.

My latest reimplementation of <poem> is indent-pre with the following differences:

  • <poem> is displayed as normal text with white-space: pre-wrap, not in monospace with white-space: pre and a box around it. And because it's a tag, it can have inline styles, classes, etc.
  • Multiple consecutive intra-line space characters are folded into a single space. (We generally agree that we want to get rid of this difference in the long term.)
  • Block-level #*:; formatting and horizontal rules ---- are permitted.
Tue, Aug 28, 9:22 PM · Patch-For-Review, MediaWiki-extensions-Poem
cscott added a comment to T202905: Outreach-17 Project: Add a new Linter Category: Links-in-Links.

From the perspective of the wikitext preprocessor, [ and ] are not currently "seen" by the preprocessor. So any [... [[ ... ]] ... ] construct is an invalid link, but the preprocessor can't tell that currently. So that's the specific case which https://gerrit.wikimedia.org/r/396049 would help with.

Tue, Aug 28, 9:16 PM · Parsoid-Linter, Outreach-Programs-Projects, Outreachy (Round 17), MediaWiki-extensions-Linter
cscott added a comment to T193366: Unidirectional Hanyu Pinyin output for Chinese LanguageConverter, as a proof-of-concept/testing system.

I'm working on LanguageConverter these days, so this would indeed be an interesting project. I could advise, but I don't have the linguistic expertise to actually develop the converter.

Tue, Aug 28, 4:55 PM · MediaWiki-Language-converter, I18n, Chinese-Sites
cscott added a comment to T165882: New namespace for zh-min-nan Wikipedia.

It would be interesting to explore the use of LanguageConverter for this wiki, as @Liuxinyu970226 suggests in T193366.

Tue, Aug 28, 4:54 PM · Wikimedia-Site-requests
cscott added a comment to T202905: Outreach-17 Project: Add a new Linter Category: Links-in-Links.

https://gerrit.wikimedia.org/r/396049 could be used to "fix" the behavior in the core parser, once we've wikilinted the problem away.

Tue, Aug 28, 3:44 PM · Parsoid-Linter, Outreach-Programs-Projects, Outreachy (Round 17), MediaWiki-extensions-Linter

Wed, Aug 22

cscott added a comment to T199941: Fatal MWException in Babel: "Language::isValidBuiltInCode must be passed a string".
  • Started after https://gerrit.wikimedia.org/r/442200 got merged: this caused LanguageCode::bcp47('nrm') == 'nrf', among other normalizations.
  • As @Krinkle noted on https://gerrit.wikimedia.org/r/446766 the fatal exception came from mGenerateContent calling LanguageBabelBox::render() for a LanguageBabelBox instance having an invalid language code that Language::factory rejects.
    • In mParseParameter(): BabelLanguageCodes::getCode('nrm') == 'nrm' ("Narom")
    • In LanguageBabelBox::__construct(): bcp47('nrm') == 'nrf' ("Norman")
    • In LanguageBabelBox::render(): getCode('nrf') == false (not in LanguageCode::deprecatedLanguageCodeMapping() nor MediaWiki\Languages\Data\Names)
    • In LanguageBabelBox::render(): Language::factory(false) => CRASH
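
The chain above can be replayed with a small Python model. The lookup tables contain only the codes named in the trace, and the function names are simplified stand-ins for the PHP methods, not the real implementations:

```python
# Hypothetical lookup tables holding only the codes from the trace above.
BABEL_CODES = {"nrm": "nrm"}  # BabelLanguageCodes::getCode knows 'nrm' ("Narom")
BCP47_MAP = {"nrm": "nrf"}    # bcp47() normalizes 'nrm' to 'nrf' ("Norman")


def get_code(code):
    # Returns False for unknown codes, like the PHP lookup in the trace.
    return BABEL_CODES.get(code, False)


def bcp47(code):
    return BCP47_MAP.get(code, code)


def factory(code):
    # Stand-in for the language factory: rejects anything that is not a string.
    if not isinstance(code, str):
        raise TypeError("Language::isValidBuiltInCode must be passed a string")
    return code


def render(user_input):
    code = get_code(user_input)  # mParseParameter(): 'nrm'
    code = bcp47(code)           # __construct(): 'nrf'
    code = get_code(code)        # render(): False, since 'nrf' is unknown
    return factory(code)         # raises: False is not a string
```

Calling render("nrm") raises, because the second lookup runs on the normalized code rather than the code that was originally validated.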
Wed, Aug 22, 9:35 PM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Patch-For-Review, Language-Team, Editing-team, Regression, Wikimedia-production-error, MediaWiki-extensions-Babel
cscott awarded T202481: Parser should have a msg() helper function so people don't localize messages improperly a Like token.
Wed, Aug 22, 2:05 PM · Google-Code-in-2018, MediaWiki-Parser

Tue, Aug 21

cscott added a comment to T197879: Fix mw:DisplaySpace to match PHP "armorFrenchSpaces".

From T106561, it appears that our strategy so far (for the space before colons) ends up leaving empty span tags in some VE edits, probably related to some copy/paste operation that isn't preserving the "this is a french space" metadata.

Tue, Aug 21, 10:24 PM · Parsoid-Read-Views, Parsoid-Rendering
cscott updated subscribers of T197879: Fix mw:DisplaySpace to match PHP "armorFrenchSpaces".

@ssastry suggested in comments on https://gerrit.wikimedia.org/r/441410 that Parsoid should probably do this as a post-processing step on the DOM, instead of trying to do this in the tokenizer. That sounds reasonable to me.

Tue, Aug 21, 10:19 PM · Parsoid-Read-Views, Parsoid-Rendering
cscott updated subscribers of T181441: Percent symbol not preceded by non-breaking space..

I agree with @Arlolra's resolution as dup. The "french space" armoring was tweaked in T197902, but has always added non-breaking spaces before ?:;!%». Parsoid hasn't ever really supported this properly.
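
A minimal Python sketch of only the rule stated here, a non-breaking space before ?:;!%». The function name is hypothetical, and the real armoring applies further conditions (see T197902) that this does not model:

```python
import re

NBSP = "\u00a0"


def armor_french_spaces(text):
    """Replace a plain space that precedes ?:;!%» with a non-breaking space."""
    return re.sub(r" (?=[?:;!%»])", NBSP, text)


armored = armor_french_spaces("Vraiment ? Oui : 50 %")
```

Run as a post-processing pass over text nodes, a substitution like this would not need any support in the tokenizer.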

Tue, Aug 21, 10:15 PM · Parsoid

Aug 15 2018

cscott added a comment to T58656: Create a new, nice logo for Parsoid.

Note: zazzle's indexer is slow; it will probably take 24hrs (from 2018-08-15 2100 UTC) before the above URL has non-broken links.

Aug 15 2018, 9:00 PM · Design, Parsoid

Aug 14 2018

cscott added a comment to T201572: Publish Source Code for wikimediafoundation.org.

Even though it might not be the "preferred" representation of our content, as a matter of good faith we could publish a dump of the wordpress CMS content, either as a raw database dump or in whatever "better" format wordpress may support.

Aug 14 2018, 9:35 PM · wikimediafoundation.org
cscott awarded T200742: partial German translation shows up in English text on wikimediafoundation.org a Heartbreak token.
Aug 14 2018, 3:47 PM · wikimediafoundation.org

Aug 8 2018

cscott added a comment to T198477: [he.wiki] Reference list is emptied when adding a new reference on a wiki that uses template generated references (not <ref> tags).

From the parser side, I think I'd prefer supporting template-generated tags properly, rather than making tag parsing locale-specific. Using templates for localization works both for localizing "global templates" as well as localizing tags, so it seems worth making our template system work better rather than invent new parser features. More at https://phabricator.wikimedia.org/T30980#3618579

Aug 8 2018, 5:49 PM · VisualEditor (Current work)

Aug 7 2018

cscott added a comment to T198511: VisualEditor losing Media: links.

Looks like Media handling was added back in T151277; we probably should have ensured VE handled these correctly at the same time, but it looks like our focus was on read-view functionality.

Aug 7 2018, 2:33 AM · Patch-For-Review, Parsoid, VisualEditor-Media, VisualEditor (Current work)

Aug 6 2018

cscott added a comment to T198511: VisualEditor losing Media: links.

You should look at the resource attribute of the <a> tag, I believe, instead of the title attribute. And you shouldn't be looking at Media:.., you should be looking at VE's link type or a boolean probably.

Aug 6 2018, 11:30 PM · Patch-For-Review, Parsoid, VisualEditor-Media, VisualEditor (Current work)
cscott added a comment to T198511: VisualEditor losing Media: links.

re: "on VE side no way to distinguish" -- I mean from the UX perspective. Parsoid does (and should, IMO) return different HTML for the two cases, of course. And in fact the href and label are tightly constrained by the functional aspects, so there's not much room for Parsoid to return different HTML. VE should/must handle these better, with improved UX. It can/should provide the UX from your first example even when given the HTML for your second example (which again, we can't really change much because of the functional requirements on the href).

Aug 6 2018, 10:48 PM · Patch-For-Review, Parsoid, VisualEditor-Media, VisualEditor (Current work)
cscott added a comment to T198511: VisualEditor losing Media: links.

I expect selective serialization is in play here as well: the original poster and @matmarex were likely testing on a setup w/o RESTBase, which means selective serialization might not have been enabled. That would lead to dirty diffs when even null edits were made. On WMF wikis, I don't expect we'd change/corrupt Media: links unless you actually edited the link itself.

Sorry to derail the discussion a little bit, but is selective serialization something that I should have on now? It defaults to false in the configuration file.

Aug 6 2018, 9:18 PM · Patch-For-Review, Parsoid, VisualEditor-Media, VisualEditor (Current work)
cscott added a comment to T198511: VisualEditor losing Media: links.

To clarify, I'm suggesting this is primarily (solely?) a VE bug.

Aug 6 2018, 8:33 PM · Patch-For-Review, Parsoid, VisualEditor-Media, VisualEditor (Current work)
cscott added a comment to T198511: VisualEditor losing Media: links.

From a UX standpoint, I'd expect that "link to file page on commons" vs "link directly to media download" is a checkbox in the link options somehow. I think we'd need some insight from a designer on how to best express this w/in the current VE framework. The magical checkbox would determine whether VE gave us back the mw:MediaLink or mw:ExtLink type.

Aug 6 2018, 7:28 PM · Patch-For-Review, Parsoid, VisualEditor-Media, VisualEditor (Current work)

Aug 3 2018

cscott added a comment to T201208: zhwiki in beta fails with a PHP fatal on main page..

A few more details from a cursory investigation:

Aug 3 2018, 9:50 PM · Beta-Cluster-Infrastructure
cscott created T201208: zhwiki in beta fails with a PHP fatal on main page..
Aug 3 2018, 9:43 PM · Beta-Cluster-Infrastructure
cscott closed T184573: Re-enable documentation for ES6-using classes, a subtask of T138401: Replace jsduck with JSDoc3 across all Wikimedia code bases, as Resolved.
Aug 3 2018, 9:35 PM · Epic, Readers-Web-Backlog (Tracking), Technical-Debt (RW-Tech-Debt), Front-end-Standards-Group, MobileFrontend, Documentation
cscott closed T184573: Re-enable documentation for ES6-using classes as Resolved.

Fixed, we're using jsdoc and the jsdoc-wmf-theme now.

Aug 3 2018, 9:35 PM · Parsoid
cscott closed T184573: Re-enable documentation for ES6-using classes, a subtask of T156469: jsduck doesn't support ES6, as Resolved.
Aug 3 2018, 9:35 PM · Documentation, MediaWiki-Documentation
cscott closed T197902: Be more selective in applying French Space armoring as Resolved.
Aug 3 2018, 9:34 PM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Patch-For-Review, MediaWiki-Parser
cscott added a comment to T200142: Create a logo for LanguageConverter.

No worries, I was adding it to the queue for the next hackathon or whatever. @Shizhao has some nice ideas focused around the original traditional/simplified Chinese application. I'd love to see a slightly broader take understandable to folks who don't recognize the difference between simplified and traditional Chinese characters. ;) Low priority, but I pledge to use any new logo (a) in the tech talk I will give "soon" about Language Converter, and (b) on a t-shirt I will wear to future Wikimanias and other conferences. :)

Aug 3 2018, 9:31 PM · MediaWiki-Language-converter

Jul 25 2018

cscott added a comment to T146397: Edit interface should be nicer.

Some progress at https://wikimania2018.wikimedia.org/wiki/Program/Hackathon_Showcase including https://www.youtube.com/watch?v=j3GBJt6lL7s and https://en.wikipedia.org/wiki/File:FileAnnotation_with_Wikidata_statements_demonstration.ogg

Jul 25 2018, 7:00 PM · Structured-Data-Commons, Wikidata, Multimedia, FileAnnotations (Production release)

Jul 22 2018

cscott added a comment to T58656: Create a new, nice logo for Parsoid.

The teapot is the "fun" logo, for team T-shirts. The one @ssastry likes would be the "official" logo, for the wiki, etc. The version with just the brackets and the sunflower (without the word "parsoid") would be appropriate for compact use, for example on the sheet of little hexagon logo stickers that somehow appear at every wikimania/hackathon/allhands.

Jul 22 2018, 4:55 AM · Design, Parsoid
cscott added a comment to T58656: Create a new, nice logo for Parsoid.

I thought I'd see what it would look like with the stems angled the other way...

Jul 22 2018, 3:09 AM · Design, Parsoid
cscott created T200142: Create a logo for LanguageConverter.
Jul 22 2018, 12:45 AM · MediaWiki-Language-converter
cscott updated subscribers of T58656: Create a new, nice logo for Parsoid.

I think @Trevor-at-Wikia probably should get one as well, considering we (ab)used his wordmark.

Jul 22 2018, 12:40 AM · Design, Parsoid

Jul 21 2018

cscott added a comment to T58656: Create a new, nice logo for Parsoid.

@Isarra and @cscott think that a purple T-shirt with the terrible teapot on the front, and <parsoid> on the back would be pretty ... "terrible"? In the best sense of the word.

Jul 21 2018, 11:15 PM · Design, Parsoid
cscott added a comment to T58656: Create a new, nice logo for Parsoid.

I slightly tweaked @Isarra's teapot logo to mash it together with Trevor's logo because, uh, I like slanted stems on my p and d's? So, hey, choices. https://commons.wikimedia.org/wiki/File:Parsoid_terrible_logo.svg

Jul 21 2018, 11:03 PM · Design, Parsoid
cscott added a comment to T58656: Create a new, nice logo for Parsoid.

For the historical record, @Isarra pointed me at https://commons.wikimedia.org/wiki/File:Parsoid_terrible_logo.svg -- terrible horrible wikitext goes into the pot, beautiful sparklies come out.

Jul 21 2018, 10:37 PM · Design, Parsoid

Jul 20 2018

cscott added a comment to T198970: Epic: Implement SEO improvements suggested by Go Fish Digital.

Z. Z. from Google is at Wikimania. He confirmed they still spider the site at a low rate, but only to check errors (i.e. sanity-check their internal representation against what the site actually displays to keep us honest/validate our parsing/validate their internal pipeline). They use a variety of sources to build their representation, including ORES, Wikidata, RESTBase, the recentchanges feed, and direct queries to the action API.

Jul 20 2018, 8:47 AM · SEO, Epic
cscott created T200057: Separate dumps for Items and Properties.
Jul 20 2018, 3:47 AM · Wikidata
cscott updated subscribers of T199941: Fatal MWException in Babel: "Language::isValidBuiltInCode must be passed a string" .

There's a patch with a C+1, but it's not merged (waiting for review from @Nikerabbit I think). I'm not sure what "recent change" @Nemo_bis is referring to; if you wanted to revert core patches I think you'd need to revert both I807dd55d49e9bd19443329231326a5b0d3e6c453 and I8468a56d5b88f5786abd0a17b67bda2f1687fd0c (the latter is on top of the former).

Jul 20 2018, 2:15 AM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Patch-For-Review, Language-Team, Editing-team, Regression, Wikimedia-production-error, MediaWiki-extensions-Babel
cscott added a comment to T159014: Disambiguator parser tests broken due to parser tests using dummy parser.

I remember when Tim wrote that patch (https://gerrit.wikimedia.org/r/314490). My vague memory is that it was motivated by the difficulty trying to use a parser test to debug the parser. You couldn't set breakpoints or emit sensible logging because every !! article added triggered the parser (to parse the completely-automatically-generated-and-meaningless revision comment), so you'd have to dig through a couple hundred parser invocations before you got to the place where the actual parser test you were interested in was being run.

Jul 20 2018, 12:05 AM · Patch-For-Review, MW-1.32-release-notes (WMF-deploy-2018-07-31 (1.32.0-wmf.15)), MediaWiki-Parser, MediaWiki-extensions-Disambiguator