
Adopt Software Bill of Materials (SBOM) for MediaWiki
Open, Needs Triage, Public

Description

I spent some time experimenting with and researching this.

Why?
SBOMs are great tools for detecting and mitigating supply chain attacks. They are considered so critical that there is an executive order by the US government recommending them (they are similarly recommended by Germany's BSI and by other governments such as the UK, Australia, Canada and many more I haven't looked up). The biggest motivation for pushing SBOMs seems to be the log4j and SolarWinds incidents.

The most important use case is that we can feed SBOMs to third-party tools (like Intel's cve-bin-tool), which check them against databases of CVEs and tell us which libraries have potentially dangerous vulnerabilities that could compromise MediaWiki's security.

The other nice feature of SBOMs is that they can be composed from dependencies. For example, each extension can have its SBOM produced automatically in release branches (as an extra commit), and a tool could then easily read and aggregate the SBOMs of all extensions deployed in production.
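As a rough illustration of the aggregation idea, here is a minimal sketch that merges several per-extension SBOMs into one. It assumes CycloneDX-style JSON documents (a top-level "components" list whose entries carry "name", "version" and optionally "purl") only because that layout is flat; the same idea would apply to SPDX packages. The extension names, library names and versions below are made up for the example.

```python
def aggregate_sboms(sboms):
    """Merge the component lists of several CycloneDX-style SBOM dicts.

    Components are de-duplicated by their purl, falling back to
    "name@version" when no purl is present.
    """
    merged = {}
    for sbom in sboms:
        for component in sbom.get("components", []):
            key = component.get("purl") or "{}@{}".format(
                component["name"], component.get("version", "")
            )
            # Keep the first occurrence of each component.
            merged.setdefault(key, component)
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [merged[key] for key in sorted(merged)],
    }


# Hypothetical SBOMs for two extensions that share one library.
extension_a = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"type": "library", "name": "example/lib-one", "version": "1.0.0",
         "purl": "pkg:composer/example/lib-one@1.0.0"},
    ],
}
extension_b = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"type": "library", "name": "example/lib-one", "version": "1.0.0",
         "purl": "pkg:composer/example/lib-one@1.0.0"},
        {"type": "library", "name": "example/lib-two", "version": "2.3.0",
         "purl": "pkg:composer/example/lib-two@2.3.0"},
    ],
}

combined = aggregate_sboms([extension_a, extension_b])
```

De-duplicating on the purl rather than the bare name keeps two different versions of the same library as separate entries, which is probably what a vulnerability scanner would want to see.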

Where it can be used:

  • Standardizing the "foreign-resources" files we create in extensions when we vendor a dependency (the list of them), basically turning them into SPDX SBOM files instead of reinventing the wheel.
  • Providing an SBOM with tarball releases and release branches (both weekly and actual releases).
  • Building an SBOM builder for MediaWiki in production (by parsing Special:Version's API result, going through the extensions, and appending foreign resources and the libs in the vendor repo)
    • For the sake of our security, we don't need to publish this. Can be behind NDA firewall somewhere.
  • Automatically addressing and dealing with T190891: Develop canonical/single record of origin, machine readable list of all repos deployed to WMF sites (at least partially)
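To make the "SBOM builder for production" idea a bit more concrete, here is a minimal sketch that turns a Special:Version-style API result into a flat component list. It assumes the response shape of the siteinfo query (action=query&meta=siteinfo&siprop=extensions|libraries), i.e. a "query" object with "libraries" and "extensions" lists; the exact field names used below ("name", "version", "vcs-version") are assumptions that should be checked against a real response, and the sample data is invented. Foreign resources are not covered here.

```python
def components_from_siteinfo(siteinfo):
    """Build a flat component list from a siteinfo-style API response dict."""
    query = siteinfo.get("query", {})
    components = []

    # Composer libraries from the vendor repo: these map naturally to purls.
    for lib in query.get("libraries", []):
        version = lib.get("version", "")
        purl = "pkg:composer/" + lib["name"]
        if version:
            purl += "@" + version
        components.append(
            {"name": lib["name"], "version": version, "purl": purl, "source": "vendor"}
        )

    # Extensions: fall back to the VCS revision when no version is declared.
    for ext in query.get("extensions", []):
        version = ext.get("version") or ext.get("vcs-version", "")
        components.append(
            {"name": ext["name"], "version": version, "source": "extension"}
        )

    return components


# Invented sample response, shaped like the assumed API result.
sample_response = {
    "query": {
        "libraries": [{"name": "example/some-lib", "version": "3.1.0"}],
        "extensions": [
            {"name": "ExampleExtension", "vcs-version": "abc123"},
            {"name": "OtherExtension", "version": "1.2"},
        ],
    }
}

components = components_from_siteinfo(sample_response)
```

A real builder would fetch the response over HTTP and then append each extension's own SBOM or foreign-resources entries to this list.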

Which standard:
There are two major SBOM standards, CycloneDX and SPDX. SPDX is built by the Linux Foundation and seems to be more aligned with what we have.

Complexities:
It seems the tooling for PHP is a bit subpar and we probably need to make some changes to popular upstream tools for this (or wait for them to get better).

Here is an example. I wanted to build an SBOM from the composer.lock file we have in the vendor/ repo:

  • There is a tool that builds an SPDX SBOM from a composer.lock file: https://github.com/opensbom-generator/spdx-sbom-generator
    • It clearly says it's under construction. The Dockerfile needed many changes to start working for composer, and after going back and forth I gave up.
  • There is another tool that can build an SBOM from a composer file: https://github.com/CycloneDX/cyclonedx-php-composer. It is itself a composer library, but at least it is an official CycloneDX lib, and it works out of the box.
  • Intel's cve-bin-tool I mentioned above works very well in some cases
    • But since it only uses the "name" part of a dependency to look it up in CVE databases (and not the PURL), a lot of the PHP libs we have clash with unrelated libs in other languages, producing a basically useless report. I put an example in one of their open issues.
    • I admit there might be other tools but I haven't looked in depth.
  • There seems to be a tendency in some libs to issue a CVE for every bug. Some research shows that only 2% of CVEs are exploitable (source: Someone said it in a presentation at FOSDEM 2024 so it must be true). This can become overwhelming.
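The name clash described above can be illustrated with a small sketch. It compares matching on the bare package name with matching on the package URL (purl), which also encodes the ecosystem. This is not how cve-bin-tool is implemented; the advisory record, its identifier and the package names are invented for the example, and the purl parser is deliberately minimal (it ignores qualifiers and subpaths).

```python
def parse_purl(purl):
    """Split a package URL into (type, name, version). Minimal sketch only."""
    if not purl.startswith("pkg:"):
        raise ValueError("not a package URL: " + purl)
    rest = purl[len("pkg:"):]
    # Drop the optional subpath and qualifiers.
    rest = rest.split("#", 1)[0].split("?", 1)[0]
    path, _, version = rest.partition("@")
    ptype, _, name = path.partition("/")
    return ptype.lower(), name, version


def match_by_name(purl, advisories):
    """Naive matching: compare only the last segment of the package name."""
    _, name, _ = parse_purl(purl)
    short_name = name.rsplit("/", 1)[-1]
    return [a["id"] for a in advisories if a["name"] == short_name]


def match_by_purl(purl, advisories):
    """Stricter matching: ecosystem and full name must both agree."""
    ptype, name, _ = parse_purl(purl)
    return [
        a["id"] for a in advisories
        if a["ecosystem"] == ptype and a["name"] == name
    ]


# Invented advisory for a Python package that happens to be called "parser".
advisories = [{"id": "EXAMPLE-0001", "ecosystem": "pypi", "name": "parser"}]

# Invented PHP dependency whose short name is also "parser".
php_dependency = "pkg:composer/example/parser@1.0.0"

false_positives = match_by_name(php_dependency, advisories)
strict_matches = match_by_purl(php_dependency, advisories)
```

With name-only matching the PHP library is flagged for an advisory that belongs to a different ecosystem, while the purl-based comparison reports nothing for it.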

Ideas
We can have two SBOMs for each release, one for prod and one for dev (which would also include require-dev, npm dependencies, etc.). Not sure how viable it is.
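For the composer side, the prod/dev split could look roughly like the sketch below. It relies on composer.lock keeping runtime dependencies under "packages" and require-dev dependencies under "packages-dev"; the package names and versions are invented, and npm dependencies would need a separate pass that is not shown here.

```python
def split_composer_lock(lock):
    """Return (prod_components, dev_components) from a parsed composer.lock.

    The dev list is a superset: runtime packages plus require-dev packages.
    """
    def to_components(packages):
        return [
            {
                "name": package["name"],
                "version": package["version"],
                "purl": "pkg:composer/{}@{}".format(package["name"], package["version"]),
            }
            for package in packages
        ]

    prod = to_components(lock.get("packages", []))
    dev = prod + to_components(lock.get("packages-dev", []))
    return prod, dev


# Invented lock file content; a real one would be loaded with
# json.load(open("composer.lock")).
sample_lock = {
    "packages": [{"name": "example/runtime-lib", "version": "2.0.1"}],
    "packages-dev": [{"name": "example/test-tool", "version": "9.5.0"}],
}

prod_components, dev_components = split_composer_lock(sample_lock)
```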


Event Timeline


SBOMs are great tools to detect supply chain attacks and mitigating them.

To nitpick a little bit, this is more about finding vulnerable dependencies than supply chain attacks. I suppose it's true that it helps with mitigation once a supply chain attack becomes known.

There seems to be a tendency in some libs to issue CVE for every bug. Some research shows that only 2% of CVEs are exploitable (source: Someone said it in a presentation in FOSDEM 2024 so it must be true). This can become overwhelming.

It's not just that. Many vulnerabilities are context dependent. What is a vulnerability in one context is not necessarily one in another. Most CVEs have some sort of risk rating, but the ratings are often detached from reality: software authors have motivation to underplay things and vuln researchers have motivation to overplay them. Anyway, getting a list is just the first step. Managing that list and ensuring things are updated as appropriate is a huge amount of work, which shouldn't be underestimated.

Hey @Ladsgroup, thanks for filing this. As I noted on Slack, I kind of agree with @Bawolff's take above. SBOMs can be useful tools to assist in finding vulnerable dependencies (I've seen that term and supply chain attacks used interchangeably despite them being slightly different concepts). Just finding some tooling to create SBOMs from various lockfiles and potentially bundling them with MediaWiki, extensions, etc. is fairly trivial and doesn't create much value on its own IMO. But as you imply in this task, using them to help find vulnerable dependencies and related issues would be valuable. My issue is that we already do this with LibUp, our GitLab AppSec Pipeline and our manual security review process. We don't necessarily generate SBOMs all of the time, but that's only because most tools that scan for CVEs within dependencies and similar issues readily support a number of lockfile formats out of the box (e.g. osv-scanner). So I guess a good question might be "what is the end goal of generating SBOMs?" Is it to improve some of the above processes that already accomplish similar goals? Or is it to create new processes or tooling to be run via CI, by developers themselves or via some other form of automation?

Thanks for the comments. I thought about this a bit and I think the case that can bring the most benefit to us is to migrate foreign-resources.yaml files to an SBOM (I don't care which standard for now), since we vendor those dependencies and they aren't discoverable by automated scanning tools. For example, see https://gerrit.wikimedia.org/g/mediawiki/extensions/3D/+/da7659bae31845008ef5d1fa6c2c47be3c77ed73/modules/lib/foreign-resources.yaml, whose dependency is stored in, and served in production from, https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/3D/+/da7659bae31845008ef5d1fa6c2c47be3c77ed73/modules/lib/three/

Does that sound good to you as the first step?

Does that sound good to you as the first step?

Sounds fine to me.

bd808 renamed this task from Adopt SBOMs for MediaWiki to Adopt Software Bill of Materials (SBOM) for MediaWiki. Apr 15 2024, 4:50 PM

Regarding how we could triage the many CVEs that SBOM-based scanning would report, this might be useful: https://ieeexplore.ieee.org/document/9527965. It is probably too far in the future to be very useful right now, but worth putting here.