I spent some time experimenting with this and researching it.
Why?
SBOMs are great tools for detecting and mitigating supply chain attacks. They are considered critical enough that a US government executive order recommends them (Germany's BSI recommends them as well, as do other governments such as the UK, Australia, Canada and many more I haven't looked up). The biggest motivation for pushing SBOMs seems to be the Log4j and SolarWinds incidents.
The most important use case is that we can feed SBOMs to third-party tools (like Intel's cve-bin-tool), which check them against CVE databases and tell us about libraries with potentially dangerous vulnerabilities that could compromise MediaWiki's security.
The other nice feature of SBOMs is that they can be composed from the SBOMs of dependencies. For example, each extension can have an SBOM produced automatically in its release branches (as an extra commit), and a tool could then read and aggregate them for the extensions deployed in production.
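To illustrate the aggregation idea, here is a minimal sketch that merges the `packages` arrays of several SPDX JSON documents and deduplicates by name and version. The function name `merge_spdx_packages` and the document name `aggregated-sbom` are made up for this example. The result is not a complete SPDX document (it lacks `documentNamespace`, `creationInfo` and relationships), and a real aggregator would also need to handle colliding SPDXIDs.

```python
def merge_spdx_packages(documents):
    """Merge the `packages` arrays of several parsed SPDX JSON documents.

    Packages are deduplicated by (name, versionInfo); the first occurrence wins.
    The returned dict is only a skeleton, not a fully valid SPDX document.
    """
    merged = {}
    for doc in documents:
        for pkg in doc.get("packages", []):
            key = (pkg.get("name"), pkg.get("versionInfo"))
            merged.setdefault(key, pkg)

    return {
        "spdxVersion": "SPDX-2.3",
        "dataLicense": "CC0-1.0",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": "aggregated-sbom",  # hypothetical name, chosen for this sketch
        "packages": sorted(
            merged.values(),
            key=lambda p: (p.get("name") or "", p.get("versionInfo") or ""),
        ),
    }
```

In practice the input documents would come from `json.load()` on each extension's SBOM file.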
Where it can be used:
- Standardizing the "foreign-resources" files we create in extensions when we vendor a dependency (they are lists of vendored dependencies), basically turning them into SPDX SBOM files instead of reinventing the wheel.
- Providing an SBOM with tarball releases and release branches (both weekly and actual releases).
- Building an SBOM builder for MediaWiki in production (by parsing Special:Version's API result, going through the extensions, and appending foreign resources and the libs in the vendor repo).
  - For the sake of our security, we don't need to publish this. It can live behind the NDA firewall somewhere.
- Automatically addressing, at least partially, T190891: Develop canonical/single record of origin, machine readable list of all repos deployed to WMF sites.
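As a rough sketch of the production SBOM builder idea: assuming the extension list comes from the siteinfo API (`action=query&meta=siteinfo&siprop=extensions`), and assuming each entry carries `name`, `version`, `vcs-version` and `vcs-url` keys, the entries could be mapped to SPDX-style package records as below. The function name and the choice of fallbacks are mine, not an existing tool, and foreign resources and vendor libs are not covered here.

```python
import re


def extensions_to_spdx_packages(siteinfo):
    """Map a parsed siteinfo response to a list of SPDX-style package dicts.

    Assumes the response shape {"query": {"extensions": [...]}}; any missing
    key is tolerated and simply left out or marked NOASSERTION.
    """
    packages = []
    for ext in siteinfo.get("query", {}).get("extensions", []):
        name = ext.get("name", "unknown")
        package = {
            # SPDX identifiers may only contain letters, digits, "." and "-".
            "SPDXID": "SPDXRef-" + re.sub(r"[^A-Za-z0-9.-]", "-", name),
            "name": name,
            "downloadLocation": ext.get("vcs-url") or "NOASSERTION",
        }
        # Prefer the declared version, fall back to the VCS revision.
        version = ext.get("version") or ext.get("vcs-version")
        if version:
            package["versionInfo"] = version
        packages.append(package)
    return packages
```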
Which standard:
There are two major SBOM standards, CycloneDX and SPDX. SPDX is built by the Linux Foundation and seems to be more aligned with what we have.
Complexities:
The tooling for PHP seems a bit subpar, and we will probably need to make some changes to popular upstream tools (or wait for them to get better).
Here is an example. I wanted to build an SBOM from the composer.lock file we have in the vendor/ repo:
- There is a tool that builds an SPDX SBOM from a composer.lock file: https://github.com/opensbom-generator/spdx-sbom-generator
  - It clearly says it's under construction, and the Dockerfile needed many changes to start working for Composer. After going back and forth I gave up.
- There is another tool that can build an SBOM from a Composer file. It is itself a Composer library, but at least it's an official library by CycloneDX, and it works out of the box: https://github.com/CycloneDX/cyclonedx-php-composer
  - The biggest problem is that it produces a CycloneDX (CDX) SBOM, not SPDX. I considered using a tool that converts CDX to SPDX (e.g. https://github.com/spdx/cdx2spdx/tree/main), but cve-bin-tool works with both, so meh.
- Intel's cve-bin-tool, which I mentioned above, works very well in some cases.
  - But since it only uses the "name" part of a dependency to look it up in CVE databases (and not the PURL), a lot of our PHP libs clash with random libs in other languages, producing a basically useless report. I put an example in one of their open issues.
  - The good news is that they seem to be working on a fix: https://github.com/intel/cve-bin-tool/issues/3771
- I admit there might be other tools, but I haven't looked in depth.
- There seems to be a tendency in some libs to issue a CVE for every bug. Some research suggests that only 2% of CVEs are exploitable (source: someone said it in a presentation at FOSDEM 2024, so it must be true). This can become overwhelming.
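The name-versus-PURL problem is easy to see in a toy example. The sketch below is not how cve-bin-tool is implemented; the advisory records, their IDs and the package names are all hypothetical, and the PURL parsing is simplified (no percent-decoding). It only contrasts matching on the bare name with matching on the ecosystem plus full name taken from the PURL.

```python
def parse_purl(purl):
    """Split e.g. pkg:composer/vendor/name@1.2.3 into ("composer", "vendor/name")."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a package URL: {purl!r}")
    # Drop the optional subpath (#...) and qualifiers (?...).
    rest = purl[len("pkg:"):].split("#", 1)[0].split("?", 1)[0]
    pkg_type, _, path = rest.partition("/")
    # Drop the optional version (@...).
    path = path.rsplit("@", 1)[0]
    return pkg_type.lower(), path


def match_by_name(component, advisories):
    """Match on the bare package name only, ignoring ecosystem and vendor."""
    short = component["name"].rsplit("/", 1)[-1]
    return [a["id"] for a in advisories if a["name"].rsplit("/", 1)[-1] == short]


def match_by_purl(component, advisories):
    """Match on ecosystem and full name derived from the component's PURL."""
    key = parse_purl(component["purl"])
    return [a["id"] for a in advisories if (a["ecosystem"], a["name"]) == key]


# Hypothetical data: a PHP library and two made-up advisories.
component = {"name": "parser", "purl": "pkg:composer/example-vendor/parser@1.0.0"}
advisories = [
    {"id": "HYPOTHETICAL-0001", "ecosystem": "npm", "name": "parser"},
    {"id": "HYPOTHETICAL-0002", "ecosystem": "composer", "name": "example-vendor/parser"},
]

print(match_by_name(component, advisories))  # also reports the unrelated npm package
print(match_by_purl(component, advisories))  # reports only the Composer package
```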
Ideas
We could have two SBOMs for each release, one for prod and one for dev (which would also include require-dev, npm dependencies, etc.). Not sure how viable that is.
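For the Composer side, the prod/dev split could be derived from the `packages` and `packages-dev` sections of composer.lock. This is only a sketch: the function name is made up, npm dependencies are out of scope, and the version strings are copied into the PURLs as-is without any normalization or percent-encoding.

```python
def composer_lock_to_purls(lock):
    """Return PURL lists for a prod and a dev SBOM from parsed composer.lock data.

    "prod" covers the `packages` section; "dev" covers `packages` plus
    `packages-dev`, since a dev install contains both.
    """
    def purls(section):
        return [
            f"pkg:composer/{pkg['name']}@{pkg['version']}"
            for pkg in lock.get(section, [])
        ]

    prod = purls("packages")
    return {
        "prod": sorted(prod),
        "dev": sorted(prod + purls("packages-dev")),
    }
```

The input would come from `json.load()` on the composer.lock file in the vendor/ repo.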
Extra links