
Get stats on Gadgets and Users scripts loading third-party resources
Open, In Progress, Medium, Public

Description

Rationale

As part of the ongoing work on T296847, there is a need to understand how many Gadgets and User scripts would be impacted by the policy. This data will inform further discussions, especially during the upcoming policy consultation.

Need for inputs

For now, some data were collected using various methods, including Logstash queries, global-search.toolforge.org, and mwgrep. To improve the data quality and address errors, the initial data gathered is shown below alongside the methodology followed. Overall, the methodology is still largely manual and could be a bit more automated. Further inputs, corrections, and suggestions are warmly welcome and appreciated.

Initial findings

Methodology

The raw list of reported CSP violations was obtained from a Logstash query and covers reports from February to April 2023. The number of Gadgets and User scripts involved in those CSP violations was found by (a) trimming the URLs to obtain the list of domains involved in CSP violations, and (b) finding the occurrences of those domains across the Gadgets and User namespaces of all Wikimedia projects using https://global-search.toolforge.org and/or mwgrep, discarding noise such as "eval" and "data" results.
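Step (a) could be automated along the following lines. This is only a minimal sketch: the function names, the sample input, and every entry of the noise list other than "eval" and "data" are assumptions for illustration, not the script actually used for this task.

```python
from collections import Counter
from urllib.parse import urlsplit

# "eval" and "data" are the noise values named in the methodology;
# the other entries are assumed here and may need adjusting.
NOISE = {"", "eval", "data", "inline", "blob", "self"}


def blocked_domain(blocked_uri):
    """Return the lower-cased host of a reported URL, or None for noise."""
    value = blocked_uri.strip().lower()
    if value in NOISE or value.startswith(("data:", "blob:")):
        return None
    if "://" not in value:
        # Bare host such as "yandex.ru": give urlsplit a network location.
        value = "//" + value
    return urlsplit(value).hostname or None


def count_domains(blocked_uris):
    """Group reported URLs by domain and count the reports per domain."""
    hosts = (blocked_domain(uri) for uri in blocked_uris)
    return Counter(host for host in hosts if host)
```

Calling `count_domains(reports).most_common(50)` on the exported list of reported URLs would then give a ranking of domains that can be fed into step (b).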

Top domains violating CSP restrictions
When grouped by domain of origin, the URLs that most often violate CSP rules seem to come from around 50 domains.

Observations on Gadgets loading third-party resources

Generally speaking, translation tools and WMCS-hosted applications seem to be among the top domains involved in CSP violations. Around 90 gadgets appear to load resources from Wikimedia Cloud Services, while around 80 use resources from outside WMCS, including Google Translate and Yandex APIs.

| # | wiki | gadget | domain |
| --- | --- | --- | --- |
| 1 | az.wikipedia | MediaViki:SidebarTranslate.js | translate.google.com |
| 2 | be.wikipedia | MediaWiki:Gadget-GoogleTrans.js | translate.google.com |
| 3 | bjn.wikipedia | MediaWiki:Gadget-GoogleTrans.js | translate.google.com |
| 4 | ckb.wikipedia | میدیاویکی:Gadget-GoogleTrans.js | translate.google.com |
| 5 | ckb.wikipedia | میدیاویکی:Gadget-LinkTranslator.js | translate.google.com |
| 6 | fa.wikiquote | مدیاویکی:Gadget-googletranslator.js | translate.google.com |
| 7 | fa.wikisource | مدیاویکی:GoogleTranslator.js | translate.google.com |
| 8 | fa.wiktionary | مدیاویکی:Gadget-googletranslator.js | translate.google.com |
| 9 | gom.wikipedia | मिडियाविकी:Gadget-SidebarTranslate.js | translate.google.com |
| 10 | hy.wikipedia | MediaWiki:Gadget-ArticleTranslator.js | translate.google.com |
| 11 | mk.wikisource | МедијаВики:Gadget-GoogleTrans.js | translate.google.com |
| 12 | ml.wikipedia | മീഡിയവിക്കി:Gadget-GoogleTrans.js | translate.google.com |
| 13 | no.wikipedia | MediaWiki:Interwiki-links.js | translate.google.com |
| 14 | pnb.wikipedia | میڈیا وکی:Gadget-SidebarTranslate.js | translate.google.com |
| 15 | ps.wikipedia | ميډياويکي:Gadget-SidebarTranslate.js | translate.google.com |
| 16 | shn.wikipedia | မီႇတီႇယႃႇဝီႇၶီႇ:Gadget-SidebarTranslate.js | translate.google.com |
| 17 | so.wikipedia | MediaWiki:Gadget-GoogleTrans.js | translate.google.com |
| 18 | ur.wikipedia | میڈیاویکی:Editnotice-4-ویکی منصوبہ تخلیق مضامین شہر-درخواست تخلیق | translate.google.com |
| 19 | ur.wikipedia | میڈیاویکی:Gadget-SidebarTranslate.js | translate.google.com |
| 20 | zh.wikipedia | MediaWiki:Gadget-fixlinkstyle.js | translate.google.com |
| 21 | zh.wikivoyage | MediaWiki:Gadget-fixlinkstyle.js | translate.google.com |
| 22 | vi.wikipedia | MediaWiki:Gadget-SidebarTranslate.js | translate.google.com |
| 23 | ar.wikipedia | ميدياويكي:Gadget-LinkTranslator.js | translate.google.com |
| 24 | en.wikipedia | MediaWiki:Gadget-SidebarTranslate.js | translate.google.com |
| 25 | ban.wikipedia | MédiaWiki:Gadget-citations.js | tools.wmflabs.org |
| 26 | ca.wikipedia | MediaWiki:Gadget-scribe.js | tools.wmflabs.org |
| 27 | es.wikivoyage | MediaWiki:Kartographer.js | tools.wmflabs.org |
| 28 | fi.wikivoyage | Järjestelmäviesti:Kartographer.js | tools.wmflabs.org |
| 29 | fr.wikivoyage | MediaWiki:Kartographer.js | tools.wmflabs.org |
| 30 | gom.wikipedia | मिडियाविकी:Gadget-citations.js | tools.wmflabs.org |
| 31 | he.wikivoyage | מדיה ויקי:Kartographer.js | tools.wmflabs.org |
| 32 | ja.wikivoyage | MediaWiki:Kartographer.js | tools.wmflabs.org |
| 33 | ms.wikipedia | MediaWiki:Gadget-citations.js | tools.wmflabs.org |
| 34 | ru.wikivoyage | MediaWiki:Kartographer.js | tools.wmflabs.org |
| 35 | test2.wikipedia | MediaWiki:Kartographer.js | tools.wmflabs.org |
| 36 | test.wikipedia | MediaWiki:Gadget-citations.js | tools.wmflabs.org |
| 37 | test.wikipedia | MediaWiki:Gadget-scribe.js | tools.wmflabs.org |
| 38 | test.wikipedia | MediaWiki:Gadget-scribe-v2.js | tools.wmflabs.org |
| 39 | ur.wikipedia | میڈیاویکی:Gadget-citations.js | tools.wmflabs.org |
| 40 | vec.wikipedia | MediaWiki:Gadget-scribe.js | tools.wmflabs.org |
| 41 | yo.wikipedia | MediaWiki:Gadget-citations.js | tools.wmflabs.org |
| 42 | zh.wikivoyage | MediaWiki:Kartographer.js | tools.wmflabs.org |
| 43 | sv.wikipedia | MediaWiki:Kartographer.js | tools.wmflabs.org |
| 44 | ar.wikipedia | ميدياويكي:Gadget-scribe.js | tools.wmflabs.org |
| 45 | de.wikivoyage | MediaWiki:ListingEditor-es.js | wikivoyage.toolforge.org |
| 46 | de.wikivoyage | MediaWiki:Gadget-ListingEditor.js | wikivoyage.toolforge.org |
| 47 | de.wikivoyage | MediaWiki:Kartographer.js | wikivoyage.toolforge.org |
| 48 | de.wikivoyage | MediaWiki:Gadget-MapTools.js | wikivoyage.toolforge.org |
| 49 | en.wikivoyage | MediaWiki:Gadget-ListingEditor2.js | wikivoyage.toolforge.org |
| 50 | en.wikivoyage | MediaWiki:Gadget-ListingEditor.js | wikivoyage.toolforge.org |
| 51 | en.wikivoyage | MediaWiki:Kartographer.js | wikivoyage.toolforge.org |
| 52 | es.wikivoyage | MediaWiki:ListingEditor.js | wikivoyage.toolforge.org |
| 53 | it.wikivoyage | MediaWiki:Gadget-MapFrame.js | wikivoyage.toolforge.org |
| 54 | it.wikivoyage | MediaWiki:Gadget-ListingEditor.js | wikivoyage.toolforge.org |
| 55 | it.wikivoyage | MediaWiki:Gadget-ListingEditorBeta.js | wikivoyage.toolforge.org |
| 56 | it.wikivoyage | MediaWiki:Kartographer.js | wikivoyage.toolforge.org |
| 57 | ja.wikivoyage | MediaWiki:Kartographer.js | wikivoyage.toolforge.org |
| 58 | ja.wikivoyage | MediaWiki:Gadget-ListingEditor.js | wikivoyage.toolforge.org |
| 59 | ru.wikivoyage | MediaWiki:MapFrame.js | wikivoyage.toolforge.org |
| 60 | ru.wikivoyage | MediaWiki:Gadget-ListingEditor.js | wikivoyage.toolforge.org |
| 61 | shn.wikivoyage | မီႇတီႇယႃႇဝီႇၶီႇ:Gadget-ListingEditor.js | wikivoyage.toolforge.org |
| 62 | shn.wikivoyage | မီႇတီႇယႃႇဝီႇၶီႇ:Gadget-ListingEditor2.js | wikivoyage.toolforge.org |
| 63 | shn.wikivoyage | မီႇတီႇယႃႇဝီႇၶီႇ:Kartographer.js | wikivoyage.toolforge.org |
| 64 | fi.wikipedia | Järjestelmäviesti:Gadget-socialMedia.js | connect.facebook.net |
| 65 | af.wiktionary | MediaWiki:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 66 | bn.wikisource | মিডিয়াউইকি:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 67 | bn.wiktionary | মিডিয়াউইকি:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 68 | ca.wikisource | MediaWiki:Gadget-TemplateScript.js | tools-static.wmflabs.org |
| 69 | el.wikisource | MediaWiki:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 70 | en.wikisource | MediaWiki:TemplateScript/proofreading.js | tools-static.wmflabs.org |
| 71 | en.wikisource | MediaWiki:TemplateScript/typography.js | tools-static.wmflabs.org |
| 72 | en.wikisource | MediaWiki:Gadget-RegexMenuFramework-Cleanup.js | tools-static.wmflabs.org |
| 73 | en.wikisource | MediaWiki:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 74 | en.wiktionary | MediaWiki:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 75 | es.wikibooks | MediaWiki:Gadget-AjaxSysop.js | tools-static.wmflabs.org |
| 76 | gag.wikipedia | MediaWiki:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 77 | hi.wikipedia | मीडियाविकि:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 78 | hi.wikibooks | मीडियाविकि:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 79 | hi.wikisource | मीडियाविकि:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 80 | id.wiktionary | MediaWiki:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 81 | kn.wikisource | ಮೀಡಿಯವಿಕಿ:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 82 | kn.wikisource | ಮೀಡಿಯವಿಕಿ:Gadget-RegexMenuFramework-Cleanup.js | tools-static.wmflabs.org |
| 83 | wikitech.wikimedia | MediaWiki:Gadget-mobileVector.js | tools-static.wmflabs.org |
| 84 | mediawiki | MediaWiki:Gadget-mobileVector.js | tools-static.wmflabs.org |
| 85 | mr.wikisource | मिडियाविकी:TemplateScript/typography.js | tools-static.wmflabs.org |
| 86 | mr.wikisource | मिडियाविकी:TemplateScript/proofreading.js | tools-static.wmflabs.org |
| 87 | or.wiktionary | ମିଡ଼ିଆଉଇକି:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 88 | pa.wikipedia | ਮੀਡੀਆਵਿਕੀ:Gadget-cleanup.js | tools-static.wmflabs.org |
| 89 | pa.wikisource | ਮੀਡੀਆਵਿਕੀ:Gadget-Blockcenter.js | tools-static.wmflabs.org |
| 90 | pa.wikisource | ਮੀਡੀਆਵਿਕੀ:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 91 | pa.wikisource | ਮੀਡੀਆਵਿਕੀ:Gadget-Cleanup.js | tools-static.wmflabs.org |
| 92 | pa.wikisource | ਮੀਡੀਆਵਿਕੀ:Gadget-pathoschild.templatescript.js | tools-static.wmflabs.org |
| 93 | pt.wikibooks | MediaWiki:Gadget-Informação adicional.js | tools-static.wmflabs.org |
| 94 | ru.wikisource | MediaWiki:Gadget-convenientDiscussions.js | tools-static.wmflabs.org |
| 95 | shn.wiktionary | မီႇတီႇယႃႇဝီႇၶီႇ:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 96 | sr.wikipedia | Медијавики:Gadget-Poruke.js | tools-static.wmflabs.org |
| 97 | ta.wikiquote | மீடியாவிக்கி:Gadget-Ajax sysop.js | tools-static.wmflabs.org |
| 98 | ta.wiktionary | மீடியாவிக்கி:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 99 | test2.wikipedia | MediaWiki:Gadget-MobileTW.js | tools-static.wmflabs.org |
| 100 | te.wikisource | మీడియావికీ:Gadget-RegexMenuFramework.js | tools-static.wmflabs.org |
| 101 | te.wikisource | మీడియావికీ:Gadget-RegexMenuFramework-Cleanup.js | tools-static.wmflabs.org |
| 102 | vi.wikisource | MediaWiki:TemplateScript/proofreading.js | tools-static.wmflabs.org |
| 103 | vi.wikisource | MediaWiki:TemplateScript/typography.js | tools-static.wmflabs.org |
| 104 | zh.wikipedia | MediaWiki:Gadget-webfont.js | tools-static.wmflabs.org |
| 105 | es.wikipedia | MediaWiki:Gadget-TemplateScript.js | tools-static.wmflabs.org |
| 106 | commons.wikimedia | MediaWiki:Gadget-TabularImportExport.js | tools-static.wmflabs.org |
| 107 | test.wikipedia | MediaWiki:Gadget-wikilabels.js | labels.wmflabs.org |
| 108 | test.wikipedia | MediaWiki:Gadget-wikilabels-loader.js | labels.wmflabs.org |
| 109 | meta.wikimedia | MediaWiki:Gadget-WikiLabels-loader.js | labels.wmflabs.org |
| 110 | be-tarask.wikipedia | MediaWiki:Common.js/coordinates.js | yandex.ru |
| 111 | ce.wikipedia | MediaWiki:Googlesearch | yandex.ru |
| 112 | ce.wikipedia | MediaWiki:ExtSearchPanel.js | yandex.ru |
| 113 | ce.wikipedia | MediaWiki:Gadget-common-special-search.js | yandex.ru |
| 114 | ka.wikipedia | მედიავიკი:Search.js | yandex.ru |
| 115 | ky.wikipedia | МедиаВики:Search.js | yandex.ru |
| 116 | lez.wikipedia | MediaWiki:Search.js | yandex.ru |
| 117 | myv.wikipedia | MediaWiki:Common.js | yandex.ru |
| 118 | ru.wikipedia | MediaWiki:Gadget-yandex-speechrecognition.js | yandex.ru |
| 119 | ru.wikisource | MediaWiki:Common.js | yandex.ru |
| 120 | test.wikipedia | MediaWiki:Gadget-ruwiki-common-special-search.js | yandex.ru |
| 121 | tg.wikipedia | Медиавики:Gadget-common-special-search.js | yandex.ru |
| 122 | tg.wikipedia | Медиавики:Gadget-yandex-speechrecognition.js | yandex.ru |
| 123 | tg.wikipedia | Медиавики:Search.js | yandex.ru |
| 124 | tg.wikipedia | Медиавики:Gadget-yandex-tts.js | yandex.ru |
| 125 | tt.wikibooks | МедиаВики:Search.js | yandex.ru |
| 126 | ru.wikipedia | MediaWiki:Powersearchtext | yandex.ru |
| 127 | uk.wikipedia | MediaWiki:Gadget-SpeedyDeletion.js | yandex.ru |
| 128 | meta.wikimedia | MediaWiki:Gadget-common-special-search.js | yandex.ru |
| 129 | ru.wikipedia | MediaWiki:Googlesearch | yandex.ru |
| 130 | ru.wikipedia | MediaWiki:ExtSearchPanel.js | yandex.ru |
| 131 | ru.wikipedia | MediaWiki:Gadget-yandex-tts.js | yandex.ru |
| 132 | ru.wikipedia | MediaWiki:Gadget-common-special-search.js | yandex.ru |
| 133 | ar.wikipedia | ميدياويكي:Gadget-Timeless-Dark.css | fonts.googleapis.com |
| 134 | bn.wikibooks | মিডিয়াউইকি:Common.css/Typo.css | fonts.googleapis.com |
| 135 | pnb.wikipedia | میڈیا وکی:Gadget-NotoNastaleeqMobile.css | fonts.googleapis.com |
| 136 | ur.wikipedia | میڈیاویکی:Gadget-NotoNastaleeqMobile.css | fonts.googleapis.com |
| 137 | zh.wikipedia | MediaWiki:Gadget-webfontloader.js | fonts.googleapis.com |
| 138 | meta.wikimedia | MediaWiki:Centralnotice-template-trilogy dsk p1 lg monthly pitch1 extFont | fonts.googleapis.com |
| 139 | bn.wikibooks | মিডিয়াউইকি:Common.css/Typo.css | cdn.rawgit.com |
| 140 | fa.wikipedia | مدیاویکی:LoadTopRevisionsByRevertScore.js | cdn.rawgit.com |
| 141 | ar.wikipedia | ميدياويكي:Gadget-QuickEdit.js | zh.moegirl.org |

Observations on User scripts loading third-party resources (in progress)

Most of the User scripts related to CSP violations load non-WMCS resources, including Facebook Connect and Google Analytics. It is also worth noting that Google Fonts are among the most loaded external resources.

Event Timeline

Restricted Application added subscribers: Reception123, Stang, Aklapper.

Can the stats table be split by userscripts and gadgets? The latter certainly have far more exposure, especially when counting userscripts of defunct users in the user table.

> Can the stats table be split by userscripts and gadgets? The latter certainly have far more exposure, especially when counting userscripts of defunct users in the user table.

I plan to put the data regarding user scripts in a separate section (see the very bottom of the description)

User scripts loading third-party resources
TBD

@Xaosflux, is that what you were asking for? Or did you suggest that the User script stats be put in the same stats table, under separate columns?

Either a separate column, or a separate table is fine; I think there may be some exceptions to add as well, for example the page https://meta.wikimedia.org/wiki/MediaWiki:Gadget-common-special-search.js is on the list above, pointing to a yandex.ru link, however it isn't actually importing that, that is inside a comment - not sure how much effort would be needed or what the expected benefit of excluding comments would be

sguebo_WMF changed the task status from Open to In Progress. May 4 2023, 10:31 AM
sguebo_WMF triaged this task as Medium priority.
sguebo_WMF updated the task description.

> Either a separate column, or a separate table is fine; I think there may be some exceptions to add as well, for example the page https://meta.wikimedia.org/wiki/MediaWiki:Gadget-common-special-search.js is on the list above, pointing to a yandex.ru link, however it isn't actually importing that, that is inside a comment - not sure how much effort would be needed or what the expected benefit of excluding comments would be

A separate table makes sense to me as well. I've adjusted the description accordingly. I agree that in some cases comments create some noise in the total count. I haven't figured out a much cleaner way to discard comments yet, since I am using data from global-search.toolforge.org. For now I am considering tweaking a copy of mwgrep to filter out comments and other patterns of noise, but I am open to suggestions.
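One possible way to discard comments before searching a script for a domain is sketched below. It is not part of mwgrep or global-search; the function names are hypothetical. The scanner skips `//` and `/* */` comments while leaving string literals alone, so that the `//` inside a quoted URL is not mistaken for a comment. Regular-expression literals and `${...}` expressions inside template strings are not handled, so some noise would remain.

```python
def strip_js_comments(source):
    """Remove // and /* */ comments, leaving string literals untouched."""
    out = []
    i, n = 0, len(source)
    quote = None  # quote character of the string we are inside, if any
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if quote:
            out.append(ch)
            if ch == "\\" and nxt:
                # Keep the escaped character so an escaped quote
                # does not end the string early.
                out.append(nxt)
                i += 2
                continue
            if ch == quote:
                quote = None
            i += 1
        elif ch in "'\"`":
            quote = ch
            out.append(ch)
            i += 1
        elif ch == "/" and nxt == "/":
            # Line comment: skip up to (not including) the newline.
            while i < n and source[i] != "\n":
                i += 1
        elif ch == "/" and nxt == "*":
            # Block comment: skip past the closing "*/", or to the end.
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def mentions_domain(source, domain):
    """True if the domain appears outside of comments."""
    return domain in strip_js_comments(source)
```

With this, a page that only names yandex.ru inside a comment would no longer be counted, while a page that passes a yandex.ru URL to a loader call still would.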

In T335892, @sguebo_WMF wrote:

> It is also good to note that Google fonts are among the most loaded external resources.

Yes, one important learning here is that webfont support is strongly desired, sometimes surely for mere "fun" reasons ("I think it looks prettier"), but also for substantial reasons. e.g. T166138

This desire was why a "webfont" component was created (way back in the mists of time), but due to changing priorities it morphed into an i18n system (ULS), and staffing and mandate do not allow that team to actually address webfont-related issues.

The alternative was the GFont proxy hosted on WMCS (by a volunteer who has signed the NDA), but that's nixed by the Privacy Policy that due to some technical—legal brainfart treats the HTTP User-Agent header as directly equivalent to your social security number and bank account number. GFonts needs that header to send you the right font file (i.e. the standard protocol content negotiation), so it cannot work without it, but everything else is stripped from the request.

The stats gathered above show: 1) that there is a genuine unmet need for webfonts, and 2) that work on the Privacy Policy and a Third-Party Resources policy should have enabling this use case as a primary concern (within necessary strictures).

We need some way to, safely, enable a small number of webfonts by default on a project (including for non-logged in users). Locked down to interface admins, or even needing a site-request like other config, with explicit whitelisting of each font, run through an anonymizing proxy that puts User-Agent headers into buckets, etc. as necessary; but some way to support the use case.
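To make the "buckets" idea concrete, a proxy could reduce each incoming User-Agent to a coarse browser family before anything is sent upstream. The sketch below is purely illustrative: the bucket names and marker strings are assumptions, and no existing proxy is claimed to work this way.

```python
# Order matters: Chromium-based browsers also send "Safari/" in their
# User-Agent, so the "chromium" markers are checked before "safari".
BUCKETS = (
    ("firefox", ("firefox/",)),
    ("chromium", ("chrome/", "chromium/", "edg/", "opr/")),
    ("safari", ("safari/",)),
)


def ua_bucket(user_agent):
    """Map a raw User-Agent string to a coarse browser-family label."""
    ua = (user_agent or "").lower()
    for name, markers in BUCKETS:
        if any(marker in ua for marker in markers):
            return name
    return "other"
```

The proxy would then send one canonical User-Agent per bucket upstream instead of the original header, so the upstream only learns the browser family.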

It's orthogonal to this task, but the stats gathered here should inform secondary learnings like this, not least in terms of what the goals of the TPR process should be.

Toolforge does already have a google fonts proxy: https://fontcdn.toolforge.org/

We just need to convince people to use that instead.

> Toolforge does already have a google fonts proxy: https://fontcdn.toolforge.org/

> We just need to convince people to use that instead.

Toolforge is considered third-party (even if the maintainer has signed the NDA), and the Privacy Policy considers the User-Agent header to be PII that cannot be sent to Google even if everything else is stripped. See T166138. And, yes, I did check with WMF Legal. It’s currently Catch 22-impossible.

Either the Privacy Policy needs to permit a risk-assessment based softening for HTTP headers used for content negotiation, or the WMF (not volunteers) needs to implement a Google Fonts-alike in production (not Toolforge/WMCS). There are no other options open that I’ve been able to identify.

> Toolforge does already have a google fonts proxy: https://fontcdn.toolforge.org/

> We just need to convince people to use that instead.

> Toolforge is considered third-party (even if the maintainer has signed the NDA), and the Privacy Policy considers the User-Agent header to be PII that cannot be sent to Google even if everything else is stripped. See T166138. And, yes, I did check with WMF Legal. It’s currently Catch 22-impossible.

While it's not great, it's still much better to hit toolforge than hitting Google.

> Toolforge does already have a google fonts proxy: https://fontcdn.toolforge.org/

> Toolforge is considered third-party (even if the maintainer has signed the NDA), and the Privacy Policy considers the User-Agent header to be PII that cannot be sent to Google even if everything else is stripped. See T166138. And, yes, I did check with WMF Legal. It’s currently Catch 22-impossible.

> While it's not great, it's still much better to hit toolforge than hitting Google.

I don't disagree. But the Privacy Policy does, and WMF Legal (and whoever they have for technical advice) does. And AIUI the third-party resources policy currently does treat WMCS as third-party (I may be wrong), meaning it requires opt-in consent, meaning we can't enable fonts from there by default.

See T209998 for a different example. Note that the alternatives proposed on that task require there to be some team at the WMF whose responsibility it is to add fonts. The only such team is the Language team. The Language team does not have the resources to fulfill such requests, and has been scoped down to only deal with i18n issues (i.e. the workarounds are outside their current scope so they can't spend time on it).

So… we can't have webfonts through volunteer tools (fontcdn) due to the Privacy Policy, and we can't have fonts installed in WMF production because it's outside every team's scope. In other words: Catch 22. It's tearing-my-hair-out level frustrating, but that's where we are.

I'm kinda derailing this task now (sorry), so my main point is that since we now have the stats above, and they point out fonts as among the main third-party resources people load, we should make sure we try to address that in some way (find a reasonable way to enable that use case without unreasonably raising the risk). It's currently bright-line prohibited, but the fontcdn solution has properties that reduce that risk to what I assert is an acceptable level. It's just not zero, which is what the current bright-line policy requires.

> Toolforge does already have a google fonts proxy: https://fontcdn.toolforge.org/

> We just need to convince people to use that instead.

> Toolforge is considered third-party (even if the maintainer has signed the NDA), and the Privacy Policy considers the User-Agent header to be PII that cannot be sent to Google even if everything else is stripped. See T166138. And, yes, I did check with WMF Legal.

Is there a phab task where you discussed this with WMF Legal? I wonder if WMF Legal might reconsider the User-Agent header as PII for Chrome browser at least, given Google-Chrome-User-Agent-Deprecation.

> Is there a phab task where you discussed this with WMF Legal?

It was email direct to privacy@ (tagged 36012 in their Zendesk, answered by Aeryn).

> my main point is that since we now have the stats above, and they point out fonts as among the main third-party resources people load, we should make sure we try to address that in some way (find a reasonable way to enable that use case without unreasonably raising the risk). It's currently bright-line prohibited, but the fontcdn solution has properties that reduce that risk to what I assert is an acceptable level. It's just not zero, which is what the current bright-line policy requires.

For the specific case of fontcdn, I'd like to note that the issue of sharing User Agent information with Google would still remain, as it was pointed out by others in the past (T166138#7228181). That being said, I agree with the need to explore "a reasonable way to enable that use case without unreasonably raising the risk" in light of the data above. Speaking of that, the conversation about the policy draft is now open with some initial questions, including whether WMCS-hosted resources (eg: fonts) should be treated as third-parties: https://meta.wikimedia.org/wiki/Talk:Third-party_resources_policy

Hope to hear your thoughts there!

> Toolforge is considered third-party (even if the maintainer has signed the NDA), and the Privacy Policy considers the User-Agent header to be PII that cannot be sent to Google even if everything else is stripped. See T166138. And, yes, I did check with WMF Legal. It’s currently Catch 22-impossible.

> While it's not great, it's still much better to hit toolforge than hitting Google.

Apparently this toolforge tool still forwards user-agents to google, because the google fonts vary by user agent. I suggest we adapt that tool to simply override the user agent, as it is no longer required to vary on user-agent as all modern webbrowsers support woff and woff2.
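The override could look roughly like the sketch below, written as a plain function so that it does not depend on how the proxy is actually configured. Everything named here is an assumption: `FIXED_UA` is a placeholder, and which value makes the upstream return woff/woff2 would have to be checked; the set of forwarded headers is likewise only an example.

```python
# Placeholder value, not a tested one: the upstream may need to see a
# string it recognises as a modern browser before it serves woff2.
FIXED_UA = "fontcdn-proxy (fixed user agent)"

# Assumed allowlist of client headers that are safe to pass through.
FORWARDED = {"accept", "accept-encoding"}


def upstream_headers(client_headers):
    """Build the headers sent upstream, replacing the client's User-Agent."""
    headers = {
        name: value
        for name, value in client_headers.items()
        if name.lower() in FORWARDED
    }
    headers["User-Agent"] = FIXED_UA
    return headers
```

Because the client's own User-Agent never leaves the proxy, every request reaching the upstream looks the same in that respect, which is the property the suggestion relies on.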

> Apparently this toolforge tool still forwards user-agents to google, because the google fonts vary by user agent. I suggest we adapt that tool to simply override the user agent, as it is no longer required to vary on user-agent as all modern webbrowsers support woff and woff2.

Should that be a dedicated task (under Tools and Privacy) about fontcdn? As is too often the case, I have no idea where its source is located, though...

Definitely would be pro-overriding the user-agent for fontcdn (and cdnjs); that would make it significantly easier to argue that they should be considered ok to allowlist for third-party resources.

> […] As too often I have no idea where its source is located though...

AIUI it's mostly custom config on the proxies (by special arrangement), so there's no real code to speak of. But @zhuyifei1999 can presumably clarify.

> Definitely would be pro-overriding the user-agent for fontcdn (and cdnjs); that would make it significantly easier to argue that they should be considered ok to allowlist for third-party resources.

It's logged above, but just for clarity: the cdnjs UA issue is in T210959.