Investigate 3rd party assets loaded by techblog
Closed, Resolved, Public

Description

In T243399#5964463, @Aklapper wrote:

Currently the test web site loads data from third-party stats.wp.com and fonts.googleapis.com. Should either investigate if we can remove these, or if not then need to check if an existing privacy policy could be adapted / reused.

In addition to the assets noticed by @Aklapper which were loading on all pages, some pages also show loads of New Relic performance tracking assets:

Event Timeline

bd808 added a parent task: Restricted Task.
bd808 added subscribers: Bmueller, srodlund.

I dug around a bit to figure out where various requests were coming from. Here's what I found so far:

  • stats.wp.com
  • pixel.wp.com
    • Tracking beacon used by the stats.wp.com JavaScript to report data back to WP
    • Should go away if we get the Jetpack JS above disabled
  • fonts.googleapis.com & fonts.gstatic.com
    • Font stack loaded by the 'modern' theme we are using
    • The theme provides three filter hooks which allow tweaking the font stack, but ultimately it always constructs a fonts.googleapis.com URL with them.
    • Theme does have a "use custom fonts" setting which says it will disable the built-in font stack of the theme. We would instead need to install another (probably custom) WP font plugin.

We should be able to take care of this by disabling the built-in font stack and adding custom CSS with fonts loaded from https://tools.wmflabs.org/fontcdn/.

Disabling stats.wp.com is going to take a bit more work. First we need to find out from the WPVIP host how to disable the tracker, since they have disabled the admin UI for it. Then we need to look into Piwik integration to get some basic traffic stats.

  • stats.wp.com

I have reached out to WPVIP and received information on how to disable the tracker.

Rather than proxying traffic through the Toolforge CDN tool, I decided to make a really tiny WordPress extension to host the desired fonts directly with techblog. Testing also showed that the Modern theme still loaded the CSS for the remote fonts even when configured to use a custom font CDN. I added a small amount of additional code in the plugin to filter out that CSS if it is injected into the renderer layer.

https://github.com/bd808/wpvip-wikimedia-techblog/commit/0a6bde438cf9c558b6a4e400168a3269652bb32f
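The filtering idea can be illustrated roughly as follows. This is only a language-neutral sketch in Python, not the code from the commit above (the real plugin is a WordPress extension). The function name `drop_remote_font_styles`, the handle names, and the shape of the stylesheet mapping are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hosts whose stylesheets should never reach the rendered page.
BLOCKED_FONT_HOSTS = {"fonts.googleapis.com", "fonts.gstatic.com"}


def drop_remote_font_styles(stylesheets):
    """Return a copy of a handle -> URL mapping without remote font CSS.

    `stylesheets` is a hypothetical stand-in for the list of stylesheets
    a theme has queued for output.
    """
    return {
        handle: url
        for handle, url in stylesheets.items()
        if urlparse(url).hostname not in BLOCKED_FONT_HOSTS
    }


# Hypothetical queue: one remote font stylesheet, one locally hosted one.
queued = {
    "theme-google-fonts": "https://fonts.googleapis.com/css?family=Example",
    "techblog-fonts": "/wp-content/plugins/fonts/fonts.css",
}
filtered = drop_remote_font_styles(queued)
```

Relative URLs have no hostname, so locally hosted stylesheets always pass through the filter.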

bd808 triaged this task as High priority. Mar 17 2020, 11:19 PM

In addition to the assets noticed by @Aklapper which were loading on all pages, some pages also show loads of New Relic performance tracking assets:

https://github.com/bd808/wpvip-wikimedia-techblog/commit/adfc2a4f35b319903146842bb2024655e6fa6ff7

@Aklapper I think I have hunted down and removed all the trackers and external assets. I would really appreciate it if you took a pass yourself at checking things on the test site to see if you can verify that things are looking cleaner for you too.
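One rough way to double-check a page is to save its HTML and list every host referenced by asset tags. The Python sketch below does that with the standard library only. The host `techblog.example.org` and the sample markup are placeholders, not the real site. It inspects static markup only, so requests issued later by JavaScript (such as a tracking beacon) would still need to be checked in the browser's network panel.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class AssetHostCollector(HTMLParser):
    """Collect hosts referenced by src/href attributes of asset tags."""

    ASSET_TAGS = {"script", "link", "img", "iframe"}

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in self.ASSET_TAGS:
            return
        for name, value in attrs:
            if name in ("src", "href") and value:
                # Relative URLs have no hostname and are skipped.
                host = urlparse(value).hostname
                if host:
                    self.hosts.add(host)


def third_party_hosts(html, first_party):
    """Return sorted hosts that are neither first_party nor its subdomains."""
    collector = AssetHostCollector()
    collector.feed(html)
    return sorted(
        h for h in collector.hosts
        if h != first_party and not h.endswith("." + first_party)
    )


# Placeholder markup standing in for a saved copy of a blog page.
SAMPLE = """
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Example">
<script src="//stats.wp.com/e.js"></script>
<img src="https://secure.gravatar.com/avatar/abc">
<script src="/wp-includes/js/local.js"></script>
<img src="https://techblog.example.org/logo.png">
"""

found = third_party_hosts(SAMPLE, "techblog.example.org")
print(found)
```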

I'm now wondering about adding a Content-Security-Policy header for the site, using an allow list approach to block other things from accidentally creeping in. I will do a bit of research to see how easy/hard CSP headers are to add in WP.
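As a sketch of what an allow list policy could look like, the Python snippet below assembles a Content-Security-Policy header value from a mapping of directives to allowed sources. The directives and sources chosen here are assumptions for illustration, not a policy proposed for the site, and how the header would actually be sent from WordPress is left open.

```python
def build_csp(allowed):
    """Build a Content-Security-Policy value from a directive -> sources mapping."""
    parts = []
    for directive, sources in allowed.items():
        parts.append(directive + " " + " ".join(sources))
    return "; ".join(parts)


# Hypothetical allow list: only same-origin assets, plus inline data: images.
POLICY = {
    "default-src": ["'self'"],
    "img-src": ["'self'", "data:"],
    "font-src": ["'self'"],
}

header_value = build_csp(POLICY)
print("Content-Security-Policy: " + header_value)
```

With a policy like this, a browser would refuse to load a script, font, or image from any host that is not listed, which is the "block things from creeping in" behavior described above.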

@bd808: Thanks for all your work. The only thing left to mention is that comments below a specific blog post try to pull the avatar image of the comment author from secure.gravatar.com (quite common). Data collected by Gravatar can be used for advertising.
A less popular alternative could be Libravatar. Or no avatars. Or ignoring this. :P

Great find @Aklapper. I'll look into the Gravatar integration and what it would take to remove it or switch providers.

This one was super easy! It is just a checkbox in the WordPress settings.

Screen Shot 2020-03-18 at 16.25.41.png (227×1 px, 43 KB)

I would still like to figure out a good way to add a strong CSP, but I will open a separate task for that. We can launch without that level of defense in depth.