Rationale
As part of the ongoing work on T296847, there is a need to understand how many Gadgets and User scripts would be impacted by the policy. This data will inform further discussions, especially during the upcoming policy consultation.
Need for inputs
So far, data has been collected using several methods, including Logstash queries, global-search.toolforge.org, and mwgrep. The initial data is shown below alongside the methodology used, so that errors can be spotted and data quality improved. The methodology is still largely manual and could be further automated. Further input, corrections, and suggestions are welcome.
Initial findings
Methodology
The raw list of reported CSP violations was obtained from a Logstash query and covers reports from February to April 2023. The number of Gadgets and User scripts involved in those CSP violations was then estimated by (a) trimming the reported URLs to obtain the list of domains involved in CSP violations, and (b) finding occurrences of those domains across the Gadgets and User namespaces of all Wikimedia projects using https://global-search.toolforge.org and/or mwgrep, discarding noise such as "eval" and "data" results.
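The two steps above could be partly automated along the lines of the sketch below. It assumes the Logstash export has already been reduced to a plain list of blocked URIs, and it replaces the global-search.toolforge.org / mwgrep lookup with a naive substring search over a hypothetical local dict of page title to source text. The extra noise keywords "inline" and "blob", and all sample data, are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Optional
from urllib.parse import urlsplit

# Keyword values that CSP reports can carry instead of a URL. "eval" and
# "data" come from the methodology; "inline" and "blob" are assumed extras.
NOISE = {"eval", "data", "inline", "blob"}


def extract_domain(blocked_uri: str) -> Optional[str]:
    """Step (a): trim a reported URL to its host name, or None for noise."""
    value = blocked_uri.strip()
    if value.lower() in NOISE:
        return None
    # urlsplit lower-cases the host; values without a host (e.g. "data:...") give None.
    return urlsplit(value).hostname


def unique_domains(blocked_uris: Iterable[str]) -> List[str]:
    """Sorted list of distinct domains found in the reports."""
    return sorted({d for d in map(extract_domain, blocked_uris) if d})


def pages_mentioning(domains: Iterable[str], pages: Dict[str, str]) -> Dict[str, List[str]]:
    """Step (b), done locally: map each domain to the titles whose source contains it.

    `pages` (title -> source text) is a hypothetical local dump standing in for
    global-search.toolforge.org or mwgrep results. Plain substring matching is
    naive and may need manual review, as in the original process.
    """
    return {
        d: sorted(title for title, source in pages.items() if d in source)
        for d in domains
    }


# Hypothetical sample input.
reports = [
    "https://tools-static.wmflabs.org/cdnjs/ajax/libs/foo.js",
    "eval",
    "https://translate.googleapis.com/translate_a/element.js",
    "data",
]
pages = {
    "MediaWiki:Gadget-Example.js": "mw.loader.load('https://tools-static.wmflabs.org/cdnjs/ajax/libs/foo.js');",
    "User:Example/common.js": "importScriptURI('https://translate.googleapis.com/translate_a/element.js');",
}
domains = unique_domains(reports)
print(domains)  # ['tools-static.wmflabs.org', 'translate.googleapis.com']
print(pages_mentioning(domains, pages))
```

Running the search locally against a dump would make the counts reproducible, at the cost of maintaining that dump.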
Top domains violating CSP restrictions
When grouped by origin domain, the URLs that most often violate CSP rules appear to come from around 50 domains.
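The grouping itself amounts to a frequency count per host. A minimal sketch, assuming the reports are available as a list of blocked-URI strings (the sample values below are hypothetical), could use collections.Counter:

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


def top_domains(blocked_uris: Iterable[str], n: int = 50) -> List[Tuple[str, int]]:
    """Count CSP reports per origin domain and return the n most frequent."""
    counts: Counter = Counter()
    for uri in blocked_uris:
        host = urlsplit(uri.strip()).hostname
        if host:  # keyword values such as "eval" or "data" have no host and are skipped
            counts[host] += 1
    return counts.most_common(n)


# Hypothetical sample of blocked URIs.
sample = [
    "https://translate.yandex.net/api/v1.5/tr.json/translate",
    "https://translate.yandex.net/api/v1.5/tr.json/detect",
    "https://tools.wmflabs.org/example/script.js",
    "eval",
]
print(top_domains(sample, n=2))
# [('translate.yandex.net', 2), ('tools.wmflabs.org', 1)]
```

Counting reports rather than distinct URLs weights domains by how often they trigger violations, which is one reasonable reading of "the most"; counting distinct URLs per domain would be an alternative.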
Observations on Gadgets loading third-party resources
Generally speaking, translation tools and applications hosted on Wikimedia Cloud Services (WMCS) seem to account for many of the top domains involved in CSP violations. Around 90 gadgets appear to load resources from WMCS, while around 80 load resources from non-WMCS domains, including the Google Translate and Yandex APIs.
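Separating WMCS from non-WMCS domains can be done by suffix matching. The sketch below assumes toolforge.org, wmcloud.org and wmflabs.org as the WMCS suffixes (a possibly incomplete list) and a hypothetical mapping from gadget page to the domains it loads. A gadget loading both kinds is counted in both buckets; how the figures above treat such overlaps is not stated, so this is an assumption.

```python
from typing import Dict, Set

# Assumed (possibly incomplete) list of WMCS-hosted domain suffixes.
WMCS_SUFFIXES = ("toolforge.org", "wmcloud.org", "wmflabs.org")


def is_wmcs(host: str) -> bool:
    """True if host is one of the suffixes or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in WMCS_SUFFIXES)


def count_by_hosting(gadget_domains: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count gadgets loading at least one WMCS domain and at least one non-WMCS domain."""
    wmcs = sum(1 for ds in gadget_domains.values() if any(is_wmcs(d) for d in ds))
    non_wmcs = sum(1 for ds in gadget_domains.values() if any(not is_wmcs(d) for d in ds))
    return {"wmcs": wmcs, "non_wmcs": non_wmcs}


# Hypothetical gadget -> loaded domains mapping.
gadgets = {
    "MediaWiki:Gadget-A.js": {"tools-static.wmflabs.org"},
    "MediaWiki:Gadget-B.js": {"translate.googleapis.com"},
    "MediaWiki:Gadget-C.js": {"xtools.wmcloud.org", "translate.yandex.net"},
}
print(count_by_hosting(gadgets))  # {'wmcs': 2, 'non_wmcs': 2}
```

Matching on "." + suffix rather than a bare endswith avoids misclassifying look-alike hosts such as "evilwmflabs.org".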
Observations on User scripts loading third-party resources (in progress)
Most of the User scripts related to CSP violations load non-WMCS resources, including Facebook Connect and Google Analytics. Google Fonts is also among the most frequently loaded external resources.
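To rank the external domains used by User scripts separately from Gadgets, matched pages could be split by title. This sketch assumes English canonical titles ("MediaWiki:Gadget-" for Gadgets, "User:...js" for User scripts); localized namespace names on other wikis would need extra handling. The page-to-domains mapping is hypothetical sample data.

```python
from collections import Counter
from typing import Dict, Set


def page_kind(title: str) -> str:
    """Classify a page by title (assumes English canonical namespace names)."""
    if title.startswith("MediaWiki:Gadget-"):
        return "gadget"
    if title.startswith("User:") and title.endswith(".js"):
        return "user_script"
    return "other"


def domains_by_kind(page_domains: Dict[str, Set[str]], kind: str) -> Counter:
    """Count, per domain, how many pages of the given kind load it."""
    counts: Counter = Counter()
    for title, domains in page_domains.items():
        if page_kind(title) == kind:
            counts.update(domains)  # each page counts once per domain
    return counts


# Hypothetical page -> loaded domains mapping.
page_domains = {
    "User:Alice/common.js": {"fonts.googleapis.com", "connect.facebook.net"},
    "User:Bob/vector.js": {"fonts.googleapis.com", "www.google-analytics.com"},
    "MediaWiki:Gadget-X.js": {"tools.wmflabs.org"},
}
print(domains_by_kind(page_domains, "user_script").most_common(1))
# [('fonts.googleapis.com', 2)]
```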