Set up a Gravatar proxy in Wikimedia production
Open, Needs Triage, Public

Description

Gravatar is a widely used web service for attaching faces or other avatars to email addresses. It is useful for humanizing web interfaces without having to develop an avatar upload system and without sending users through the hassle of uploading (and keeping up to date) an image for yet another system.

The simple way to use Gravatar is to take the user's email address, hash it, prefix the hash with Gravatar's URL, add a query parameter for size, and use the result as an image URL. That presents two privacy problems: 1) the reader's browser connects to Gravatar directly, exposing the reader's IP address and similar personal information to a third party; 2) the email address can potentially be recovered from the hash in the URL (for example, by hashing candidate addresses and comparing). The second issue is often irrelevant because the email address is public anyway; this task is about dealing with the first one.
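For reference, the URL construction described above can be sketched roughly as follows. This assumes Gravatar's long-standing scheme of an MD5 hex digest of the trimmed, lowercased address and the `s` size parameter; the function name and the default size are illustrative only.

```python
import hashlib
from urllib.parse import urlencode

# Base URL of Gravatar's avatar endpoint.
GRAVATAR_BASE = "https://www.gravatar.com/avatar/"


def gravatar_url(email: str, size: int = 80) -> str:
    """Build a direct (unproxied) Gravatar image URL for an email address."""
    # Gravatar hashes the address after trimming whitespace and lowercasing.
    normalized = email.strip().lower()
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    # "s" is the requested image size in pixels.
    return f"{GRAVATAR_BASE}{digest}?{urlencode({'s': size})}"
```

Because the hash depends only on the normalized address, anyone who can guess the address can confirm it against the URL, which is the second privacy problem mentioned above.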

The obvious solution is to set up a proxy under our control, so that Gravatar only sees the proxy's IP, and probably to add some sort of caching (for better performance, and to avoid putting too much burden on a third-party service). Bonus points if it can also support FLOSS or self-hosted alternatives to Gravatar (such as Libravatar).
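One way to picture the proxy's core job is as a strict mapping from a local path to an upstream URL. The sketch below is only an illustration: the `/avatar/<service>/<hash>` path layout, the `UPSTREAMS` table, and the `ALLOWED_SIZES` buckets are hypothetical, and the Libravatar base URL is an assumption that should be checked against Libravatar's own documentation.

```python
import re
from typing import Optional

# Hypothetical upstream table; the Libravatar base URL is an assumption.
UPSTREAMS = {
    "gravatar": "https://www.gravatar.com/avatar/",
    "libravatar": "https://seccdn.libravatar.org/avatar/",
}

# Hypothetical local layout: /avatar/<service>/<md5 or sha256 hex digest>
_PATH_RE = re.compile(r"/avatar/(?P<service>[a-z]+)/(?P<digest>[0-9a-f]{32}|[0-9a-f]{64})")

# Snapping sizes to a few buckets keeps the number of cached variants small.
ALLOWED_SIZES = (32, 64, 128, 256)


def upstream_url(path: str, size: int) -> Optional[str]:
    """Map a proxy request path to an upstream avatar URL, or None if invalid."""
    match = _PATH_RE.fullmatch(path)
    if match is None:
        return None  # reject anything that is not exactly a known-shape hash
    base = UPSTREAMS.get(match["service"])
    if base is None:
        return None  # unknown avatar service
    # Pick the smallest allowed size that covers the request, else the largest.
    snapped = next((s for s in ALLOWED_SIZES if s >= size), ALLOWED_SIZES[-1])
    return f"{base}{match['digest']}?s={snapped}"
```

Accepting only fixed-length hex digests means the proxy cannot be used to fetch arbitrary upstream paths, and supporting a second service is a matter of adding a table entry.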

Use cases include:

Event Timeline

Not sure what the right component would be for this request. Serviceops? Would it require some RfC-style thing first?

Some options:

  • Libravatar: a federated FLOSS Gravatar clone that can also fall back to Gravatar (i.e. act as a proxy). It's written in Python (there is also a minimal PHP implementation but it seems unmaintained).
  • commons-gravatar: written in Java
  • write our own web service; it is a fairly trivial task
  • set it up purely via webserver config: make some Apache or Nginx server proxy URLs with a given prefix, and put it behind Varnish/ATS for caching.

GitLab, if we end up migrating there from Gerrit

Noting that GitLab doesn't require Gravatar support for avatars, so we likely don't need to worry about that one.

There are plenty of user identities in a git repo where the user won't upload an image (commits written by long-inactive users, repos imported from outside the community, code that's maintained on GitHub, etc.). Gravatar can still be useful for those (besides making things a bit more streamlined for everyone else).

Some options:

  • Libravatar: a federated FLOSS Gravatar clone that can also fall back to Gravatar (i.e. act as a proxy). It's written in Python (there is also a minimal PHP implementation but it seems unmaintained).

This is a Django app, but looking at https://git.linux-kernel.at/oliver/ivatar/-/blob/master/requirements.txt there are a lot of dependencies. Also, there's no differentiation between dev dependencies and real ones...

  • commons-gravatar: written in Java

I think we'd prefer to avoid Java.

  • write our own web service; it is a fairly trivial task

If the Apache/nginx proxy doesn't work out, I think this would be easier than deploying libravatar.
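To give a rough sense of how small such a service could be, here is a minimal sketch of the fetch-and-cache core. It is an illustration under assumptions, not a proposed implementation: `CachedFetcher`, `fetch_upstream`, the one-hour TTL, and the `User-Agent` string are all hypothetical, and an in-process dictionary stands in for the Varnish/ATS layer mentioned in the last option.

```python
import time
import urllib.request
from typing import Callable, Dict, Tuple


def fetch_upstream(url: str) -> bytes:
    """Fetch an avatar with a fresh request built by the proxy itself."""
    # Nothing from the reader's request (IP, cookies, Referer) is forwarded;
    # the upstream only sees the proxy and this generic User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "avatar-proxy-sketch"})
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.read()


class CachedFetcher:
    """Wrap a fetch function with a simple in-memory, time-based cache."""

    def __init__(
        self,
        fetch: Callable[[str], bytes] = fetch_upstream,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps upstream URL -> (time stored, image bytes).
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no upstream request at all
        body = self._fetch(url)
        self._entries[url] = (now, body)
        return body
```

A real deployment would also need an HTTP front end, error handling for upstream failures, and a bound on cache size; those are left out here to keep the sketch short.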

  • set it up purely via webserver config: make some Apache or Nginx server proxy URLs with a given prefix, and put it behind Varnish/ATS for caching.

If this is doable, I think it would be the best. It doesn't require any custom code, just configuration.

Hi!

I'd heavily vote for Libravatar (for obvious reasons). Libravatar was recently moved to new hardware, so performance is great again, and there is no need to deploy it yourself: just use the existing instance!
If there are any questions about Libravatar, I'd be happy to answer them!

Oliver

Regardless of whichever service we end up using, we would still want to proxy it ourselves to avoid leaking viewers' IP addresses to a third-party server not under our privacy policy.

OK. I understand that of course. For libravatar I've created an example script:

https://git.linux-kernel.at/oliver/ivatar/-/blob/master/libravatarproxy.py

It doesn't feature federation and one may need to further adjust it to their needs, but as a start, it might come in handy?!

Oliver

  • set it up purely via webserver config: make some Apache or Nginx server proxy URLs with a given prefix, and put it behind Varnish/ATS for caching.

If this is doable, I think it would be the best. It doesn't require any custom code, just configuration.

@thcipriani does Release-Engineering-Team have any interest in doing this? It would be useful for Gerrit but potentially also for GitLab (https://docs.gitlab.com/ee/administration/libravatar.html). I did see @brennen's comment in T263161#6503156, but it still seems like Libravatar/Gravatar have usefulness for GitLab?

I won't speak for anyone else, but I think for me personally, Gravatar is a privacy / data exfiltration can of worms I'd be fine leaving unopened. (I acknowledge it's not that big a deal in the scheme of things, but why borrow trouble.)