Page MenuHomePhabricator

Remove whitelist/blacklist in Parsoid
Open, MediumPublic

Description

Use better/more descriptive terms.

Event Timeline

Change 603574 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Rename 'TagWhiteList'

https://gerrit.wikimedia.org/r/603574

Change 603578 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Rename WikiLinkHandler::isWhitelistedOpt

https://gerrit.wikimedia.org/r/603578

Change 603580 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Replace a few occurences of "whitelist" in comments w/ more appropriate terms

https://gerrit.wikimedia.org/r/603580

Change 603581 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Fix an incorrect 'whitelist' in XMLSerializer

https://gerrit.wikimedia.org/r/603581

Change 603574 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Rename 'TagWhiteList'

https://gerrit.wikimedia.org/r/603574

Change 603578 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Rename WikiLinkHandler::isWhitelistedOpt

https://gerrit.wikimedia.org/r/603578

ssastry triaged this task as Medium priority.

Change 603580 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Replace a few occurences of "whitelist" in comments w/ more appropriate terms

https://gerrit.wikimedia.org/r/603580

Change 603581 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Fix an incorrect 'whitelist' in XMLSerializer

https://gerrit.wikimedia.org/r/603581

Change 603599 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Fix an incorrect 'whitelist' in XMLSerializerTest

https://gerrit.wikimedia.org/r/603599

FYI: @Arlolra already submitted https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/602518 and I merged it last week.

Excellent! Thanks, @Arlolra.

It appears that the remaining work here is in Sanitizer, which I'd like to fix upstream first (since our code is an incomplete/extended copy of upstream) and T75581: Parsoid doesn't support image allow/deny configuration, which I'd like to fix via T254802: Make the external image block list / allow list into an extension if I can -- that is, if I'm going to be deprecating a bunch of old names for config variables and functions anyway, I think I might as well move that code out of core in a way which lets Parsoid use it (via a new clean hook with a new clean name) at the same time.

Change 603599 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Fix an incorrect 'whitelist' in XMLSerializerTest

https://gerrit.wikimedia.org/r/603599

Change 603610 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Remove one more reference to 'whitelist', this time in a parserTest comment

https://gerrit.wikimedia.org/r/603610

Change 603610 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Remove one more reference to 'whitelist', this time in a parserTest comment

https://gerrit.wikimedia.org/r/603610

Change 605678 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] Bump Parsoid to 0.12.0-a17

https://gerrit.wikimedia.org/r/605678

Change 605678 merged by jenkins-bot:
[mediawiki/vendor@master] Bump Parsoid to 0.12.0-a17

https://gerrit.wikimedia.org/r/605678

Reedy added a subscriber: Reedy.

Re-opening... There seems to be several usages still around

I know some of these can't be changed until other dependancies are done...

Targets
    Occurrences of '(white|black)[ \-]?list' in Directory /Users/reedy/PhpstormProjects/mediawiki/core/vendor/wikimedia/parsoid
Found Occurrences  (103 usages found)
    core  (103 usages found)
        vendor  (103 usages found)
            wikimedia  (103 usages found)
                parsoid  (103 usages found)
                    baseconfig  (51 usages found)
                        2  (34 usages found)
                            arwiki.json  (2 usages found)
                                2406 "imagewhitelistenabled": false,
                                10691 "GlobalBlockWhitelist",
                            be-taraskwiki.json  (2 usages found)
                                2293 "imagewhitelistenabled": false,
                                10481 "GlobalBlockWhitelist",
                            cawiki.json  (2 usages found)
                                2470 "imagewhitelistenabled": false,
                                10870 "GlobalBlockWhitelist",
                            cswiki.json  (2 usages found)
                                2510 "imagewhitelistenabled": false,
                                10999 "GlobalBlockWhitelist",
                            dewiki.json  (2 usages found)
                                2399 "imagewhitelistenabled": false,
                                10670 "GlobalBlockWhitelist",
                            enwiki.json  (2 usages found)
                                2209 "imagewhitelistenabled": false,
                                10333 "GlobalBlockWhitelist",
                            eswiki.json  (2 usages found)
                                2545 "imagewhitelistenabled": false,
                                10888 "GlobalBlockWhitelist",
                            fawiki.json  (2 usages found)
                                2645 "imagewhitelistenabled": false,
                                10898 "GlobalBlockWhitelist",
                            fiwiki.json  (2 usages found)
                                2323 "imagewhitelistenabled": false,
                                10568 "GlobalBlockWhitelist",
                            frwiki.json  (2 usages found)
                                2444 "imagewhitelistenabled": false,
                                10992 "GlobalBlockWhitelist",
                            iswiki.json  (2 usages found)
                                2178 "imagewhitelistenabled": false,
                                10395 "GlobalBlockWhitelist",
                            kaawiki.json  (2 usages found)
                                2672 "imagewhitelistenabled": false,
                                11073 "GlobalBlockWhitelist",
                            lnwiki.json  (2 usages found)
                                2351 "imagewhitelistenabled": false,
                                10902 "GlobalBlockWhitelist",
                            nlwiki.json  (2 usages found)
                                2377 "imagewhitelistenabled": false,
                                10702 "GlobalBlockWhitelist",
                            srwiki.json  (2 usages found)
                                2661 "imagewhitelistenabled": false,
                                11037 "GlobalBlockWhitelist",
                            trwiki.json  (2 usages found)
                                2418 "imagewhitelistenabled": false,
                                10698 "GlobalBlockWhitelist",
                            zhwiki.json  (2 usages found)
                                2750 "imagewhitelistenabled": false,
                                11271 "GlobalBlockWhitelist",
                        arwiki.json  (1 usage found)
                            9760 "GlobalBlockWhitelist",
                        be-taraskwiki.json  (1 usage found)
                            9551 "GlobalBlockWhitelist",
                        cawiki.json  (1 usage found)
                            9934 "GlobalBlockWhitelist",
                        cswiki.json  (1 usage found)
                            10058 "GlobalBlockWhitelist",
                        dewiki.json  (1 usage found)
                            9740 "GlobalBlockWhitelist",
                        enwiki.json  (1 usage found)
                            9385 "GlobalBlockWhitelist",
                        eswiki.json  (1 usage found)
                            9945 "GlobalBlockWhitelist",
                        fawiki.json  (1 usage found)
                            9953 "GlobalBlockWhitelist",
                        fiwiki.json  (1 usage found)
                            9623 "GlobalBlockWhitelist",
                        frwiki.json  (1 usage found)
                            10052 "GlobalBlockWhitelist",
                        iswiki.json  (1 usage found)
                            9465 "GlobalBlockWhitelist",
                        kaawiki.json  (1 usage found)
                            10147 "GlobalBlockWhitelist",
                        lnwiki.json  (1 usage found)
                            9976 "GlobalBlockWhitelist",
                        nlwiki.json  (1 usage found)
                            9772 "GlobalBlockWhitelist",
                        srwiki.json  (1 usage found)
                            10103 "GlobalBlockWhitelist",
                        trwiki.json  (1 usage found)
                            9764 "GlobalBlockWhitelist",
                        zhwiki.json  (1 usage found)
                            10334 "GlobalBlockWhitelist",
                    src  (18 usages found)
                        Core  (17 usages found)
                            Sanitizer.php  (17 usages found)
                                71 * Blacklist for evil uris like javascript:
                                72 * WARNING: DO NOT use this in any place that actually requires blacklisting
                                73 * for security reasons. There are NUMEROUS[1] ways to bypass blacklisting, the
                                74 * only way to be secure from javascript: uri based xss vectors is to whitelist
                                98 * Strip them before further processing so blacklists and such work.
                                129 * Fetch the whitelist of acceptable attributes for a given element name.
                                134 public static function attributeWhitelist( string $element ): array {
                                136 $lists = self::setupAttributeWhitelist();
                                146 private static function setupAttributeWhitelist(): array {
                                147 static $whitelist;
                                149 if ( $whitelist !== null ) {
                                150 return $whitelist;
                                207 $whitelist = [
                                301 # whitelist is used from the Parser object
                                368 return $whitelist;
                                857 $wlist = self::attributeWhitelist( $tag );
                                890 // Bypass RDFa/whitelisting checks for Parsoid-inserted attrs
                        Ext  (1 usage found)
                            Gallery  (1 usage found)
                                Opts.php  (1 usage found)
                                    48 $validUlAttrs = Sanitizer::attributeWhitelist( 'ul' );
                    tests  (34 usages found)
                        parser  (3 usages found)
                            legacyMediaParserTests.txt  (2 usages found)
                                3088 # blacklist and we'll catch selser regressions based on changes to the
                                3089 # blacklist entries for selser tests.
                            parserTests.txt  (1 usage found)
                                13600 ### For now, we are blacklisting them, and using this to test selser.
                        phpunit  (1 usage found)
                            Parsoid  (1 usage found)
                                Config  (1 usage found)
                                    Api  (1 usage found)
                                        data  (1 usage found)
                                            siteinfo.reqdata  (1 usage found)
                                                2 mysql","dbversion":"10.1.39-MariaDB","imagewhitelistenabled":false,"langconversion":true,"titleconversion":true,"linkprefixcharset":"","linkprefix":"",
                        TestUtils.js  (30 usages found)
                            363 * @param {boolean} blacklistChanged
                            366 var reportSummary = function(modesRan, stats, file, loggedErrorCount, testFilter, blacklistChanged) {
                            442 // If the blacklist changed, complain about it.
                            443 if (blacklistChanged) {
                            444 console.log("Blacklist changed!".red);
                            450 if (blacklistChanged) {
                            452 console.log("Use `bin/parserTests.js --rewrite-blacklist` to update blacklist.");
                            490 * @param {boolean} expectFail Whether this test was expected to fail (on blacklist).
                            492 * @param {Object} bl BlackList.
                            508 var blacklisted = false;
                            509 if (ScriptUtils.booleanOption(options.blacklist) && expectFail) {
                            513 blacklisted = true;
                            530 if (blacklisted) {
                            531 console.log('UNEXPECTED BLACKLIST FAIL'.red.inverse + ': ' + extTitle.yellow);
                            532 console.log('Blacklisted, but the output changed!'.red);
                            566 * @param {boolean} expectSuccess Whether this success was expected (or was this test blacklisted?).
                            576 if (ScriptUtils.booleanOption(options.blacklist) && !expectSuccess) {
                            658 * @param {Object} bl BlackList.
                            770 var reportSummaryXML = function(modesRan, stats, file, loggedErrorCount, testFilter, blacklistChanged) {
                            794 var blacklisted = false;
                            795 if (ScriptUtils.booleanOption(options.blacklist) && expectFail) {
                            797 blacklisted = (bl[title][mode] === actual.raw);
                            799 if (!blacklisted) {
                            932 'blacklist': {
                            933 description: 'Compare against expected failures from blacklist',
                            937 'rewrite-blacklist': {
                            938 description: 'Update parserTests-blacklist.json with failing tests.',
                            1007 if (ScriptUtils.booleanOption(options['rewrite-blacklist'])) {
                            1008 // turn on all modes by default for --rewrite-blacklist
                            1012 console.log("\nERROR> can't combine --rewrite-blacklist with --filter, --maxtests or --exit-unexpected");

The JS code (TestUtils.js) may be dead or may have lot of dead code that can be purged. baseconfig/* may potentially be dead code. If not, it will have to wait for core changes to land.
Sanitizer changes are blocked on core (since we've tried to keep that code closely synced with the core version).

The only fix we can make right now is in tests/parser/legacyMeidaParserTests.txt.

Should we switch from "master" to "main" as the default branch everywhere?

This comment was removed by Reedy.

Should we switch from "master" to "main" as the default branch everywhere?

I made this change just now. So far, it looks like we'll need close any open changes in gerrit before being able to remove the branch entirely.

I made this change just now

For posterity, so far,

  • Created the branch and git pushed it
  • In gerrit's repository admin ui, selected main as the HEAD
  • In github, changed main to the default branch
  • Retargeted a few PSs on gerrit

Change 681466 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@main] Make the default branch main for git review

https://gerrit.wikimedia.org/r/681466

Change 681466 merged by jenkins-bot:

[mediawiki/services/parsoid@main] Make the default branch main for git review

https://gerrit.wikimedia.org/r/681466

Should we switch from "master" to "main" as the default branch everywhere?

This T279612: Gerrit: default branch for new projects should be "main". But doing this just on Parsoid alone is just going to end up being disruptive because tooling, documentation, etc. don't support it yet :(

Can you provide some concrete examples for Parsoid?

Change 681467 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@main] Substitute a few more places to "main"

https://gerrit.wikimedia.org/r/681467

Can you provide some concrete examples for Parsoid?

It's not yet supported in LibUp for one: T273237: LibUp needs to handle "main" branches nicely.

Change 681713 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@main] Rename some things to primary

https://gerrit.wikimedia.org/r/681713

Change 681467 merged by jenkins-bot:

[mediawiki/services/parsoid@main] Substitute a few more places to "main"

https://gerrit.wikimedia.org/r/681467

Change 681713 merged by jenkins-bot:

[mediawiki/services/parsoid@main] Rename some things to primary

https://gerrit.wikimedia.org/r/681713

Change 683902 had a related patch set uploaded (by Arlolra; author: Arlolra):

[integration/docroot@master] Point Parsoid's documentation to main

https://gerrit.wikimedia.org/r/683902

Should we switch from "master" to "main" as the default branch everywhere?

No, absolutely not, it risks breaking too many things. See RelEng's guidance about this. They're going to switch it for every repo as part of the GitLab migration.

Change 683902 abandoned by Arlolra:

[integration/docroot@master] Point Parsoid's documentation to main

Reason:

https://gerrit.wikimedia.org/r/683902

Change 683969 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Revert "Make the default branch main for git review"

https://gerrit.wikimedia.org/r/683969

Change 683970 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Revert "Substitute a few more places to "main""

https://gerrit.wikimedia.org/r/683970

No, absolutely not, it risks breaking too many things. See RelEng's guidance about this. They're going to switch it for every repo as part of the GitLab migration.

Reverted everywhere; sorry everyone

Change 683969 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Revert "Make the default branch main for git review"

https://gerrit.wikimedia.org/r/683969

Change 683970 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Revert "Substitute a few more places to "main""

https://gerrit.wikimedia.org/r/683970

Arlolra removed a project: Patch-For-Review.

Change 693970 had a related patch set uploaded (by C. Scott Ananian; author: Jforrester):

[mediawiki/core@REL1_36] composer: Lock Parsoid version to specific 0.13.0 release

https://gerrit.wikimedia.org/r/693970

Change 693972 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@REL1_36] Bump wikimedia/parsoid to "release version" 0.13.0

https://gerrit.wikimedia.org/r/693972

Change 693971 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@REL1_36] Bump wikimedia/parsoid to "release version" 0.13.0

https://gerrit.wikimedia.org/r/693971

Change 693972 abandoned by C. Scott Ananian:

[mediawiki/vendor@REL1_36] Bump wikimedia/parsoid to "release version" 0.13.0

Reason:

in favor of I2bfdd568ba24489830fb0d6799524058b3b6d65f

https://gerrit.wikimedia.org/r/693972

Change 693971 merged by Jforrester:

[mediawiki/vendor@REL1_36] Bump wikimedia/parsoid to "release version" 0.13.0

https://gerrit.wikimedia.org/r/693971

Change 693970 merged by jenkins-bot:

[mediawiki/core@REL1_36] composer: Lock Parsoid version to specific 0.13.0 release

https://gerrit.wikimedia.org/r/693970