
Parsoid doesn't handle Japanese behavior switches with U+FF3F (full width underscore)
Closed, Resolved (Public)

Description

Parsoid assumes that all behavior switches start with __, which isn't the case in practice. We should deprecate behavior switches that don't start with either __ or U+FF3F U+FF3F (the full-width equivalent), and add support to Parsoid for handling behavior switches that start with U+FF3F.
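To make the two accepted prefixes concrete, here is a minimal Python sketch of the check the description implies. It is not Parsoid's actual (PHP) code. An alias counts as well-formed only if it is delimited by `__` or by U+FF3F U+FF3F, and known aliases can then be located in wikitext by literal match. The function names `is_well_formed` and `find_switches` are hypothetical, requiring the same delimiter at both ends is an assumption of this sketch, and the jawiki-style alias `＿＿目次非表示＿＿` is used purely as an illustration.

```python
import re

# The two delimiters the task description accepts: ASCII "__" and the
# full-width equivalent, two U+FF3F FULLWIDTH LOW LINE characters.
ASCII_DELIM = "__"
FULLWIDTH_DELIM = "\uFF3F\uFF3F"
DELIMS = (ASCII_DELIM, FULLWIDTH_DELIM)


def is_well_formed(alias: str) -> bool:
    """Return True if `alias` starts and ends with the same accepted delimiter
    and has at least one character between the two delimiters.

    Aliases failing this check are the kind the task proposes to deprecate.
    (Requiring the same delimiter at both ends is an assumption of this sketch.)
    """
    for delim in DELIMS:
        if (
            len(alias) > 2 * len(delim)
            and alias.startswith(delim)
            and alias.endswith(delim)
        ):
            return True
    return False


def find_switches(wikitext: str, aliases: set[str]) -> list[str]:
    """Return every known alias occurring in `wikitext`, in document order.

    Longer aliases are tried first so that a short alias cannot shadow a
    longer one sharing the same prefix.
    """
    if not aliases:
        return []
    ordered = sorted(aliases, key=len, reverse=True)
    matcher = re.compile("|".join(re.escape(a) for a in ordered))
    return [m.group(0) for m in matcher.finditer(wikitext)]


if __name__ == "__main__":
    # Illustrative alias set: one ASCII-delimited, one full-width-delimited.
    known = {"__NOTOC__", "\uFF3F\uFF3F目次非表示\uFF3F\uFF3F"}
    text = "Intro __NOTOC__ and \uFF3F\uFF3F目次非表示\uFF3F\uFF3F here."
    print(find_switches(text, known))
    print(is_well_formed("NOTOC"))  # no delimiters: a deprecation candidate
```

A tokenizer that only looks for a leading `__` would never reach the second alias above, which is the gap this task describes.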

Details

Related Changes in Gerrit:
Repo                            Branch             Lines +/-
mediawiki/vendor                master             +27 K -20 K
mediawiki/services/parsoid      master             +25 K -18 K
mediawiki/core                  master             +43 -19
mediawiki/core                  REL1_39            +4 -4
mediawiki/core                  REL1_39            +323 -84
mediawiki/core                  REL1_43            +4 -4
mediawiki/core                  REL1_44            +4 -4
mediawiki/core                  REL1_44            +56 -37
mediawiki/core                  REL1_43            +56 -37
mediawiki/core                  master             +36 -0
mediawiki/core                  REL1_45            +4 -4
mediawiki/core                  wmf/1.45.0-wmf.25  +4 -4
mediawiki/core                  master             +4 -4
mediawiki/vendor                REL1_45            +11 K -572
mediawiki/services/parsoid      master             +11 K -545
mediawiki/extensions/Scribunto  master             +13 -2
mediawiki/core                  master             +60 -37
mediawiki/core                  master             +18 -1

Event Timeline

Change #1196178 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] Support U+FF3F as behavior switch prefix, as used on jawiki

https://gerrit.wikimedia.org/r/1196178

Change #1196126 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] parser: Ensure that double-underscore magic words actually begin with `__`

https://gerrit.wikimedia.org/r/1196126

Change #1196182 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] i18n: all double-underscore magic words must start with double-underscore

https://gerrit.wikimedia.org/r/1196182

Bugreporter2 renamed this task from "Parsoid doesn't handle japanese behavior switches with U+FF3F (full width underscore)" to "Parsoid doesn't handle Japanese behavior switches with U+FF3F (full width underscore)". Oct 15 2025, 12:37 PM
Bugreporter2 updated the task description.

Change #1196527 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Add "double wide underscore" to wikitext escape list

https://gerrit.wikimedia.org/r/1196527
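The idea behind adding the double-wide underscore to the wikitext escape list can be sketched as follows: when arbitrary text is escaped for safe inclusion in wikitext, any `＿＿` is broken up, just as `__` is, so it can no longer form a behavior switch. This Python sketch is hypothetical. The replacement strings (`_&#95;` and `＿&#xFF3F;`) and the name `escape_switch_delims` are assumptions chosen for illustration, and the escape table in MediaWiki core may use different entities.

```python
import re

# Hypothetical escape table: keep the first delimiter character and replace
# the second with a numeric character reference, so the output no longer
# contains a literal "__" or "＿＿". The exact entities are assumptions.
ESCAPES = {
    "__": "_&#95;",
    "\uFF3F\uFF3F": "\uFF3F&#xFF3F;",
}

# One alternation over all delimiters; re.sub scans left to right and
# replaces non-overlapping matches.
_ESCAPE_RE = re.compile("|".join(re.escape(k) for k in ESCAPES))


def escape_switch_delims(text: str) -> str:
    """Break up every "__" and "＿＿" in `text` so no behavior switch can form."""
    return _ESCAPE_RE.sub(lambda m: ESCAPES[m.group(0)], text)


if __name__ == "__main__":
    print(escape_switch_delims("__NOTOC__"))
    print(escape_switch_delims("\uFF3F\uFF3F目次非表示\uFF3F\uFF3F"))
```

Breaking the pair with a character reference keeps the rendered text identical while ensuring that the parser never sees two adjacent delimiter characters.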

Change #1198419 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] WIP: Get the list of behavior switch magic words from core

https://gerrit.wikimedia.org/r/1198419

Change #1198597 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/Scribunto@master] Add "double wide underscore" to wikitext escape list

https://gerrit.wikimedia.org/r/1198597

Change #1198600 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] i18n: Remove deprecated aliases for double-underscore magic words

https://gerrit.wikimedia.org/r/1198600

Change #1196527 merged by jenkins-bot:

[mediawiki/core@master] Add "double wide underscore" to wikitext escape list

https://gerrit.wikimedia.org/r/1196527

Change #1196182 merged by jenkins-bot:

[mediawiki/core@master] i18n: deprecate double-underscore magic words which don't start/end with __

https://gerrit.wikimedia.org/r/1196182

Change #1198597 merged by jenkins-bot:

[mediawiki/extensions/Scribunto@master] Add "double wide underscore" to wikitext escape list

https://gerrit.wikimedia.org/r/1198597

Change #1196178 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Support U+FF3F as behavior switch prefix, as used on jawiki

https://gerrit.wikimedia.org/r/1196178

Change #1199793 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@REL1_45] Bump wikimedia/parsoid to 0.22.0

https://gerrit.wikimedia.org/r/1199793

Change #1199793 merged by C. Scott Ananian:

[mediawiki/vendor@REL1_45] Bump wikimedia/parsoid to 0.22.0

https://gerrit.wikimedia.org/r/1199793

Change #1200091 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep

https://gerrit.wikimedia.org/r/1200091

Change #1200091 merged by jenkins-bot:

[mediawiki/core@master] i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep

https://gerrit.wikimedia.org/r/1200091

Change #1201070 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@wmf/1.45.0-wmf.25] i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep

https://gerrit.wikimedia.org/r/1201070

Change #1201070 merged by jenkins-bot:

[mediawiki/core@wmf/1.45.0-wmf.25] i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep

https://gerrit.wikimedia.org/r/1201070

Mentioned in SAL (#wikimedia-operations) [2025-11-03T14:53:50Z] <lucaswerkmeister-wmde@deploy2002> Started scap sync-world: Backport for [[gerrit:1201069|i18n: all behavior switches should start/end with __ (part 2)]], [[gerrit:1201070|i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep (T407289)]]

Mentioned in SAL (#wikimedia-operations) [2025-11-03T14:55:58Z] <lucaswerkmeister-wmde@deploy2002> cscott, lucaswerkmeister-wmde: Backport for [[gerrit:1201069|i18n: all behavior switches should start/end with __ (part 2)]], [[gerrit:1201070|i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep (T407289)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-11-03T15:03:36Z] <lucaswerkmeister-wmde@deploy2002> Finished scap sync-world: Backport for [[gerrit:1201069|i18n: all behavior switches should start/end with __ (part 2)]], [[gerrit:1201070|i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep (T407289)]] (duration: 09m 45s)

Change #1201091 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@REL1_45] i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep

https://gerrit.wikimedia.org/r/1201091

Change #1201091 merged by jenkins-bot:

[mediawiki/core@REL1_45] i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep

https://gerrit.wikimedia.org/r/1201091

Change #1196126 abandoned by C. Scott Ananian:

[mediawiki/core@master] parser: Ensure that double-underscore magic words actually begin with `__`

Reason:

Abandoned in favor of I8dd522d605c2b9e5310d169d7c51bcf424089497

https://gerrit.wikimedia.org/r/1196126

Change #1201166 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@REL1_44] i18n: deprecate double-underscore magic words which don't start/end with __

https://gerrit.wikimedia.org/r/1201166

Change #1201169 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@REL1_44] i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep

https://gerrit.wikimedia.org/r/1201169

Change #1201170 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@REL1_43] i18n: deprecate double-underscore magic words which don't start/end with __

https://gerrit.wikimedia.org/r/1201170

Change #1201172 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@REL1_43] i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep

https://gerrit.wikimedia.org/r/1201172

Change #1201174 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@REL1_39] i18n: deprecate double-underscore magic words which don't start/end with __

https://gerrit.wikimedia.org/r/1201174

Change #1201170 merged by jenkins-bot:

[mediawiki/core@REL1_43] i18n: deprecate double-underscore magic words which don't start/end with __

https://gerrit.wikimedia.org/r/1201170

Change #1201166 merged by jenkins-bot:

[mediawiki/core@REL1_44] i18n: deprecate double-underscore magic words which don't start/end with __

https://gerrit.wikimedia.org/r/1201166

Change #1201169 merged by jenkins-bot:

[mediawiki/core@REL1_44] i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep

https://gerrit.wikimedia.org/r/1201169

Change #1201180 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@REL1_39] i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep

https://gerrit.wikimedia.org/r/1201180

Change #1201172 merged by jenkins-bot:

[mediawiki/core@REL1_43] i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep

https://gerrit.wikimedia.org/r/1201172

Change #1201174 merged by jenkins-bot:

[mediawiki/core@REL1_39] i18n: deprecate double-underscore magic words which don't start/end with __

https://gerrit.wikimedia.org/r/1201174

Change #1201180 merged by jenkins-bot:

[mediawiki/core@REL1_39] i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep

https://gerrit.wikimedia.org/r/1201180

Change #1198600 merged by jenkins-bot:

[mediawiki/core@master] i18n: Remove deprecated aliases for double-underscore magic words

https://gerrit.wikimedia.org/r/1198600

Change #1198419 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Get the list of behavior switch magic words from core

https://gerrit.wikimedia.org/r/1198419

MSantos triaged this task as Medium priority. Nov 21 2025, 10:26 AM

Change #1210725 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a6

https://gerrit.wikimedia.org/r/1210725

Change #1210725 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a6

https://gerrit.wikimedia.org/r/1210725