Draft: RFC: Evaluate alternative Node package managers for improved package security
Open, Stalled, Needs Triage · Public

Description

NOTE: This is a DRAFT RfC in the preparation phase. Please discuss the contents of this RFC on the talk page and respond to the RFC here, preferably after it's finalized. See "Motivation", "Status quo", "A few examples of NPM incidents" sections in the detailed RFC on wiki.

Descoped from T199004: RFC: Add a frontend build step to skins/extensions to our deploy process
Previous steps in the sequence:
T257072: Determine Node package auditing workflows
T257061: Evaluate the workflows and workload of managing a Node package repository

This RFC is about evaluating alternative package managers with regard to:

  • security benefits
  • impact on development and deployment workflows
  • migration workload

The purpose is to test these PMs without changing or disrupting established workflows. One or both PMs can be introduced as an optional alternative in the codebase for developers to test, without impacting production.
This testing phase might take months and would involve compatibility updates to Wikimedia projects and upstream packages.
If the test is successful, migrating production workflows from NPM to the chosen alternative can be considered.

The package managers in scope (details below, feel free to expand):

  • Yarn 2 (berry)
  • Pnpm

Scope

This RfC is only concerned with the tool distributing already audited packages. Code auditing practices and tools, and the tracking of vetted package versions, are in the scope of T257072; a central package store is discussed in T257061.
In this ticket it is assumed that libraryupgrader2 (or similar) would be used to track all packages in our dependency trees, down to the last leaf.

Package managers

Without aiming for completeness, there are 2 recent solutions that target specifically the question of security:

Yarn 2 (berry)
  1. npm i -g yarn@berry
  2. Introductory article
  3. Yarn 2 was released in 2020. It is a recent, fundamentally different version from (Classic) Yarn 1 (offline mirror), which made only minor improvements to npm.
  4. A central offline cache can be synced to individual nodes through a git repository, or presumably other means, thus guaranteeing complete control over outside source-code entering the Wikimedia ecosystem.
  5. With Yarn 2, package files are loaded directly from the offline cache (doc), thereby eliminating the 'node_modules' folders and the duplicated packages, and speeding up the install step. This breaks many packages that assume the presence of 'node_modules'.
  6. Packages using resolve 1.9.0 (published 2 years ago) are compatible (popular packages).
  7. Some upstream packages and Wikimedia projects need to be upgraded to use 'resolve'. In the meantime an adapter called 'pnpify' can be used to execute these. If 'pnpify' fails, the package can be 'unplugged', or a traditional 'node_modules' can be created with the nodeLinker: node-modules setting (migration).
  8. In strict mode only direct dependencies are reachable by packages. This is the correct behavior, but NPM's flattening of node_modules allowed developers to load indirect dependencies too. Packages relying on that need to be upgraded, or the pnpMode: loose setting must be enabled.
  9. Migration: CLI, Q&A (https://yarnpkg.com/advanced/qa#how-easy-should-you-expect-the-migration-from-classic-to-modern-to-be)
  10. Yarn supports workspaces, but uses a confusing vocabulary where Project > Worktree > Workspace. Our git repos (skins, extensions) would be called "workspace". The "core" repo would be the "project" or "worktree".
  11. Script names containing ':' are global, can be executed in any "workspace" of the "project". Scripts like 'svgmin' need to be defined only in the root package.json.
  12. '.yarn/cache' is stored only in the root.
  13. The classic version was created by Facebook, Exponent, Google, and Tilde (article); GitHub lists 511 contributors. The current active developers are not Facebook employees (ref: Q&A). Yarn 2 was released just half a year ago; its popularity should help with its main drawback, incompatibility.
Pnpm
  1. npm i -g pnpm
  2. Pnpm package store.
  3. Pnpm was released in 2017, after Classic Yarn.
  4. Pnpm deduplicates packages using symlinks and recreates node_modules, but includes only direct (top-level) dependencies, without flattening the dependency tree.
  5. Packages can load only direct dependencies. Other than that there aren't many complications and setup is easier: if a tool is missing a package, add it to 'devDependencies'.
  6. Supports workspaces with a simple vocabulary: Workspace > Project. Node packages are linked to the instance in the workspace root if the version constraints allow it. This is the case when libraryupgrader2 keeps dependency versions consistent across projects.
  7. Drawback of its workspace implementation: 'pnpm install' anywhere in the workspace attempts to create/fix the 'node_modules' folder of all projects in the workspace, even if the developer only needs to do that for one project. If installing a package fails in any of the repos, the install process fails for all repos, creating an unusable state. This might be a reason to not use workspaces, or find a way around this behavior.
  8. Pnpm was created by one person. It is sponsored by 2 companies, used by quite a few known companies, and GitHub lists 61 contributors.
  9. An article briefly mentioning why Pnpm workspaces were simpler to set up than Yarn 2's.
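
The strictness described for both Yarn 2 and Pnpm can be illustrated with a small model. This is only a sketch under simplifying assumptions: a flattened 'node_modules' is approximated as the full transitive closure of dependencies (real NPM hoisting also depends on version conflicts), and the package names and the `registry` mapping are hypothetical.

```python
def direct_deps(manifest):
    """Names a package declares itself: what a strict PM lets it load."""
    names = set()
    for field in ("dependencies", "devDependencies"):
        names.update(manifest.get(field, {}))
    return names


def hoisted_deps(manifest, registry):
    """Names visible under a flattened node_modules (approximation):
    everything reachable through runtime dependencies."""
    seen = set()
    stack = list(direct_deps(manifest))
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        # Only the runtime dependencies of installed packages are installed.
        stack.extend(registry.get(name, {}).get("dependencies", {}))
    return seen


def phantom_deps(manifest, registry):
    """Packages loadable with flattening but not under strict resolution."""
    return hoisted_deps(manifest, registry) - direct_deps(manifest)


# Hypothetical data: 'example-tool' pulls in 'example-lib' indirectly.
registry = {
    "example-tool": {"dependencies": {"example-lib": "1.0.0"}},
    "example-lib": {},
}
manifest = {"devDependencies": {"example-tool": "2.0.0"}}

print(sorted(phantom_deps(manifest, registry)))
```

Any name reported here that the project actually loads would have to be added to 'devDependencies' (or 'dependencies') before switching to a strict PM.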
npm
  • For completeness it should be mentioned that full control over package versions is achievable with plain old NPM by generating 'package-lock.json' from libraryupgrader2, or using a 'package.json' generated with the exact version of all dependencies, direct and indirect. If developers use npm ci instead of npm i, their installed version will match too.
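
As a rough illustration of the exact-version approach, the sketch below flags entries in a 'package.json' that are not pinned to a single version. It assumes "exact" means a plain `major.minor.patch` string with optional pre-release or build suffix; the function name and the sample manifest are hypothetical, and only direct dependencies are covered (indirect ones would still need a lock file or a fully generated 'package.json').

```python
import re

# Assumption: an exact spec is a bare semver string such as "1.2.3" or
# "3.0.0-beta.1"; ranges ("^1.2.3", "~2.0.0", "*"), tags and URLs are not.
EXACT = re.compile(
    r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)


def unpinned(manifest):
    """Return {name: spec} for every direct dependency that is not exact."""
    loose = {}
    for field in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(field, {}).items():
            if not EXACT.match(spec):
                loose[name] = spec
    return loose


# Hypothetical manifest content, as loaded with json.load().
sample = {
    "dependencies": {"pkg-a": "1.2.3", "pkg-b": "^1.2.3"},
    "devDependencies": {"pkg-c": "~2.0.0", "pkg-d": "3.0.0-beta.1"},
}

print(unpinned(sample))
```

A check like this could run in CI next to npm ci, so that a range specifier is caught before it reaches a build.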
Common

The configuration and lock files of all 3 PMs can be checked into the same repository without conflict. The PM to use on any box can be chosen independently, allowing the testing of an alternative PM while NPM is used in production. Depending on usability and personal preference, it's also an option for developers to choose their favored PM. The workspace management features and simple setup might be reasons to choose one PM over the other.

  1. 'package.json' is shared, the package versions installed are expected to be the same if run at the same time as NPM (same state of NPM repo).
  2. 'package-lock.json' is NOT shared: there are individual 'yarn.lock' and 'pnpm-lock.yaml' files, which need to be generated separately.
  3. Plugins often use 'peerDependencies' to refer to their main package (e.g. eslint-plugin-*, stylelint-plugin-*). Both PMs are stricter than NPM and require that the main package is properly declared as a dependency alongside the plugins. Some Wikimedia 'package.json' files need to be updated to satisfy this constraint, most notably by adding 'eslint', 'stylelint' and their plugins. This strictness might be unwelcome at first, but it is quickly adapted to.
  4. Grunt currently can't load plugins with either PM. Issue submitted upstream. Grunt has lost momentum in recent years, so I'll be looking into submitting a PR.
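
The 'peerDependencies' constraint from point 3 can be checked ahead of a migration. The following is a minimal sketch, not how either PM implements the check: it assumes the peer requirements of each installed plugin have already been collected into a `peer_map` (for example from each plugin's own 'package.json'), and all package names and versions are hypothetical.

```python
def missing_peers(manifest, peer_map):
    """Return {peer: [plugins needing it]} for peers the project itself
    does not declare in 'dependencies' or 'devDependencies'."""
    declared = set()
    for field in ("dependencies", "devDependencies"):
        declared.update(manifest.get(field, {}))

    missing = {}
    for name in sorted(declared):
        for peer in peer_map.get(name, {}):
            if peer not in declared:
                missing.setdefault(peer, []).append(name)
    return missing


# Hypothetical project: the stylelint plugin is fine, the eslint one is not.
project = {
    "devDependencies": {
        "eslint-plugin-example": "1.0.0",
        "stylelint": "13.0.0",
        "stylelint-plugin-example": "1.0.0",
    }
}
peers = {
    "eslint-plugin-example": {"eslint": ">=7"},
    "stylelint-plugin-example": {"stylelint": ">=13"},
}

print(missing_peers(project, peers))
```

Each key in the result is a main package that would need to be added to the project's 'package.json'.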
Questions
  1. Security:
    1. How much control does it give over package versions and code integrity?
    2. How can it be integrated with libraryupgrader2?
    3. How much delay is there between a version being published on NPM and being delivered to 1) developers, 2) CI, 3) pre-deploy build?
    4. How long is a version exposed to CI before being delivered to 1) developers, 2) pre-deploy build?
    5. How long is a version exposed to developers before being used for the pre-deploy build?
  2. Progressive transition:
    1. Is the PM usable in parallel with NPM to allow preparation without disrupting the established processes?
    2. What features require committing to changing those processes, what's the impact?
    3. What's the impact of changing PM on CI nodes, developer machines?
  3. Changes to package loading:
    1. How to "unplug" (Yarn terminology) packages incompatible with the alternative package store (node_modules)?
    2. Which packages need to be unplugged or upgraded to load dependencies (those using custom loaders, not require.resolve())?
  4. What steps are necessary to make wikimedia projects compatible with the PM, in detail:
    1. The install process works without warnings: $PM install
    2. Installed packages work (load) as expected.
    3. Scripts work as with NPM: $PM run <scriptname>
  5. What's the resource usage:
    1. Storage space required for package store (central), local cache (each CI node and developer box), installed packages (in workspace and each project).
    2. Network usage vs cache usage.
    3. Install and update time.
  6. Usability:
    1. What changes for developers?
    2. What are the common problems encountered by users?
    3. Is the developer experience improved?
  7. Workspaces:
    1. Is it suitable for the multi-repository setup?
    2. What benefits does it give for managing the multiple projects that make up a MediaWiki developer instance?

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jul 3 2020, 5:25 PM
Aklapper changed the task status from Open to Stalled. · Jul 3 2020, 6:34 PM
Aklapper added a project: User-Demian.

(stalled because "draft")

Demian updated the task description. · Jul 3 2020, 10:31 PM
Demian moved this task from Incoming to In progress on the User-Demian board. · Jul 3 2020, 11:00 PM

Hi! Thanks for this draft and for untangling the parent RFC, I think it's very helpful. When you have this to your liking, you're welcome to tag it as TechCom-RFC and put it into P1. You can keep drafting there, that's what that first phase is for.

Demian updated the task description. · Jul 6 2020, 7:59 PM
Demian updated the task description.
Demian added a project: TechCom-RFC.

Thank you, @Milimetric! Time to get it rolling 😃 Suggestions about how to prepare it are very welcome!

Demian updated the task description. · Jul 7 2020, 1:47 PM
Demian updated the task description.