Page MenuHomePhabricator

Application Security Review Request : Vega 5 and related dependencies for ext:Graph
Closed, ResolvedPublic

Description

Project Information

Description of the tool/project:

Upgrades for the ext:Graph extension and related dependencies

Description of how the tool will be used at WMF:

ext:Graph is already in use within Wikimedia production, but is currently disabled, see:

https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/909407
https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/909623

(and related)

Dependencies

This will focus specifically on migrating to Vega 5. So whatever dependencies are involved in that process.

Has this project been reviewed before?

https://phabricator.wikimedia.org/T71623 (very old)

https://phabricator.wikimedia.org/T182536 (never resolved)

Working test environment

See MediaWiki-Docker or https://patchdemo.wmflabs.org/

Post-deployment

Undetermined at this time, though for now, the WMF Technology and Product teams.

Details

Risk Rating
Low
Author Affiliation
WMF Technology Dept

Event Timeline

sbassett created this task.
sbassett added a project: user-sbassett.
sbassett moved this task from Backlog to In Progress on the user-sbassett board.
sbassett moved this task from Incoming to In Progress on the secscrum board.
sbassett added a project: SecTeam-Processed.

Security Review Summary - T335051 - 2023-04-25

Overall, the current vendor code under consideration looks to be in decent shape from a security perspective. The following security analyses were performed in the review below:

The Security-Team has set an overall risk rating of Vega 5.25.0 as used with ext:Graph at: low.

General Security Information

Statistic/InfoValueRisk
Repositoryhttps://github.com/vega/vega none
Relevant tag/branchv5.25.0 none
Last commit reviewed (if relevant)3290eae none
Recent contributions to code (6 months)6 low
Active developers with > 10 commits17 low
Current overall usage10,400 stars , rated popular by Snyk low
Current open security issues0 none
Methods for reporting security issuesvia github low

Vulnerable Packages
As reported via npm/yarn audit (gh advisories), snyk, osv.dev, et al. These are all found within dev packages and indirect dependencies, and do not require immediate action. And the more serious ones for tar and request are within the year-old+ @definitelytyped/dtslint@0.0.111 package, which could stand to be upgraded. It might be worthwhile filing an upstream bug for that as v0.0.159 at least mitigates the tar issues. Risk: low.

VulnerabilityPackageNotesServiceRemediationRisk
CVE-2023-28155: CWE-918: Server...requestadvisory linkauditjs[see details within advisory links] medium
CVE-2021-32804: CWE-22: Imprope...taradvisory linkauditjs[see details within advisory links] high
CVE-2023-26115: CWE-1333word-wrapadvisory linkauditjs[see details within advisory links] medium
Regular Expression Denial of Servic...ansi-regexadvisory link rollup-plugin-bundle-size@1.0.3snyk[see details within advisory links] high
Arbitrary File Creation/Overwrite due to ...tar@definitelytyped/dtslint > @definitelytyp...npm auditadvisory link high
Arbitrary File Creation/Overwrite via ins...tar@definitelytyped/dtslint > @definitelytyp...npm auditadvisory link high
Arbitrary File Creation/Overwrite via ins...tar@definitelytyped/dtslint > @definitelytyp...npm auditadvisory link high
Arbitrary File Creation/Overwrite on Wind...tar@definitelytyped/dtslint > @definitelytyp...npm auditadvisory link high
Arbitrary File Creation/Overwrite via ins...tar@definitelytyped/dtslint > @definitelytyp...npm auditadvisory link high

Outdated Packages
As reported via npm outdated. No explicit vulnerabilities reported, simply noting for completeness' sake. There are a large number of dated dependencies here though, which is perhaps a marginally greater risk, though unsurprising given the size of vega and its sub-packages. Risk: low.

PackageCurrentWantedLatestDepended By
@babel/core7.21.07.21.47.21.4vega
@babel/plugin-transform-runtime7.21.07.21.47.21.4vega
@babel/preset-env7.20.27.21.47.21.4vega
@definitelytyped/dtslint0.0.1110.0.1110.0.159vega
@rollup/plugin-node-resolve15.0.115.0.215.0.2vega
@rollup/plugin-terser0.4.00.4.10.4.1vega
@types/estree1.0.01.0.11.0.1vega-expression@5.1.0
@types/node18.14.618.16.118.16.1vega
canvas2.11.02.11.22.11.2vega-cli@5.25.0
canvas2.11.02.11.22.11.2vega
d3-array3.2.23.2.33.2.3vega-crossfilter@4.1.1
d3-array3.2.23.2.33.2.3vega-encode@4.9.2
d3-array3.2.23.2.33.2.3vega-format@1.1.1
d3-array3.2.23.2.33.2.3vega-functions@5.13.2
d3-array3.2.23.2.33.2.3vega-geo@4.4.1
d3-array3.2.23.2.33.2.3vega-regression@1.2.0
d3-array3.2.23.2.33.2.3vega-scale@7.3.0
d3-array3.2.23.2.23.2.3vega-selections@5.4.1
d3-array3.2.23.2.33.2.3vega-statistics@1.9.0
d3-array3.2.23.2.33.2.3vega-time@2.1.1
d3-array3.2.23.2.33.2.3vega-transforms@4.10.2
d3-array3.2.23.2.33.2.3vega-view@5.11.1
d3-delaunay6.0.26.0.46.0.4vega-voronoi@4.2.1
eslint8.35.08.39.08.39.0vega
jsdom21.1.021.1.121.1.1vega
lerna6.5.16.6.16.6.1vega
node-fetch2.6.92.6.93.3.1vega-loader@4.5.1
prettier2.8.42.8.82.8.8vega
rimraf4.3.15.0.05.0.0vega
rollup3.20.43.21.03.21.0vega
typescript4.9.55.0.45.0.4vega
vega-datasets2.5.42.7.02.7.0vega
vega-typings0.24.00.24.10.24.1vega@5.25.0

Scorecard score
6.2 / 10 - Risk: medium
(see raw output: P47294)

Snyk health status
98 / 100 - Risk: low
(see: https://snyk.io/advisor/npm-package/vega)

Potential HTTP and protocol Leaks

  1. None found via this semgrep rule - Risk: low

Security Features Findings

  1. Data loaded via URLs, if enabled, should continue to use allow-lists of domains (see c910613) and make use of more limiting, internal protocols such as wikiraw:, wikiapi:, et al via the Graph extension. http: should be disabled or avoided, though the exact level of enforcement is still being discussed in c913984 - Risk: low.
  2. Vega expressions, primarily used with Signals (though which can also be used with filters and formulas) remain Vega's largest attack surface, though much has been done within recent versions of Vega to better guard against potential threats. First, expressions are far more restricted in their values and types via Vega's latest schema; from Vega's Expressions documentation:

    "The expression language is a restricted subset of JavaScript. All basic arithmetic, logical and property access expressions are supported, as are boolean, number, string, object ({}) and array ([]) literals. Ternary operators (ifTest ? thenValue : elseValue) and a special if(test, thenValue, elseValue) function are supported.

    To keep the expression language simple, secure and free of unwanted side effects, the following elements are not allowed: assignment operators (=, += etc), pre/postfix updates (++), new expressions, and most control flow statements (for, while, switch, etc). In addition, function calls involving nested properties (foo.bar()) are not allowed. Instead, the expression language supports a collection of functions defined in the top-level scope."

    This, along with numerous XSS fixes, should provide more safety from previous, outdated versions of Vega regarding the expressions functionality. Further protection would be available through the use of the vega interpreter, which should provide additional security via its means of traversing an Abstract Syntax Tree as opposed to previous parsing methods. It should be noted that this does however incur a modest performance penalty. This is being actively worked upon within c910605 and once implemented should reduce this potential vulnerability to a low risk.

Static Analysis Findings

  1. lockfile-lint returned no results. Risk: low
  2. gitleaks returned no results. Risk: low
  3. whispers returned no results. Risk: low
  4. git secrets returned no results. Risk: low
  5. packj found 0 "risky" dependencies. Risk: low
  6. scan.sh returned no true positive results. Risk: low
  7. bearer SAST found 2 issues, see P47295. An upstream bug should likely be filed to harden this code. Risk: low
  8. semgrep SAST found found 1 issue, see P47296. Risk: low

Data loaded via URLs, if enabled, should continue to use allow-lists of domains (see c910613) and make use of more limiting, internal protocols such as wikiraw:, wikiapi:, et al via the Graph extension.

To clarify, do you mean that the current arbitrary URL access (well, arbitrary aside from a protocol and domain allowlist) should be disabled eventually, in favor of the internal protocols? We had debated how/if to prioritize that so I want to clarify if this is a requirement.

I have never read these security reports, could someone tell me why is canvas repeated twice and d3-array 12 times in the outdated packages table?

To clarify, do you mean that the current arbitrary URL access (well, arbitrary aside from a protocol and domain allowlist) should be disabled eventually, in favor of the internal protocols? We had debated how/if to prioritize that so I want to clarify if this is a requirement.

I think the current URL access (which is basically a port of what existed in VegaWrapper) along with the various pseudo-schemes like wikiraw: should be low risk as-is. It would potentially be nice to restrict that access even further, at least within Wikimedia production, but I don't think that would substantially reduce the current risk level.

I have never read these security reports, could someone tell me why is canvas repeated twice and d3-array 12 times in the outdated packages table?

Ah, that was due to them being depended on by several sub-packages of vega. I've added an extra column that hopefully clarifies this. Most of the vendor dependencies we review aren't structurally organized this way, so the outdated dependencies are usually obvious. But not in vega's case. Thanks for pointing that out.

I think the current URL access (which is basically a port of what existed in VegaWrapper) along with the various pseudo-schemes like wikiraw: should be low risk as-is. It would potentially be nice to restrict that access even further, at least within Wikimedia production, but I don't think that would substantially reduce the current risk level.

If I read the code correctly, VegaWrapper (mw-graph-shared) only accepted custom schemes (wikiapi:/wikirest:/wikiraw:/tabular:/map:/wikifile:/wikidatasparql:/geoshape:/geoline:/mapsnapshot:). The current implementation allows arbitrary HTTPS URLs in Wikimedia domains (and no custom schemes ATM although that will probably change), so in that sense security is being relaxed. I just want to make sure there are no security concerns about that.

If I read the code correctly, VegaWrapper (mw-graph-shared) only accepted custom schemes (wikiapi:/wikirest:/wikiraw:/tabular:/map:/wikifile:/wikidatasparql:/geoshape:/geoline:/mapsnapshot:). The current implementation allows arbitrary HTTPS URLs in Wikimedia domains (and no custom schemes ATM although that will probably change), so in that sense security is being relaxed. I just want to make sure there are no security concerns about that.

As previously discussed, I don't believe there are any potentially destructive operations available via query params to production MediaWiki URLs. If there are any that we'd find particularly concerning, then we'd want to potentially reconsider this risk rating.

Historically i think the concern was you load a normal page as a csv file, parse out the edit token from the html source, then leak the edit channel via a side channel to an attacker. [Otoh good luck executing csrf style attacks on a modern browser, so maybe that specific scenario is less relavent now, although similar scenarios may still matter]

Much more private stuff in the api. Which was the reasoning behind the magic header to make api response be logged out. However if general urls are exposed probably a lot of that stuff is available from get requests in the web interface. I think but am not sure you can get revdel'd revisions from the index.php interface with just a get.

sbassett moved this task from In Progress to Our Part Is Done on the secscrum board.

I've filed one upstream security issue around dependencies, which was, at best, a minor concern. But, the vega folks appear to be merging that soon. The other issue around html sanitization that bearer SAST found seems like, at worst, a non-issue given how the current module forces attribute creation or maybe a very minor opportunity for code-hardening, so I'm not going to file that as a security issue at this time.

Much more private stuff in the api. Which was the reasoning behind the magic header to make api response be logged out. However if general urls are exposed probably a lot of that stuff is available from get requests in the web interface. I think but am not sure you can get revdel'd revisions from the index.php interface with just a get.

Good to know. I think there's a bit more of the old vega wrapper code we should likely port over. That's also being discussed within T335325.