
Investigate using a Squid based man in the middle proxy to cache package manager SSL connections
Closed, Declined · Public

Description

I took a shot at using Squid as a man-in-the-middle proxy to cache HTTPS requests made by package managers; following the CI weekly check-in on 2015-10-20, this needed to be written up.

Overview

The idea is to have package managers use a central proxy that handles both HTTP and HTTPS.

When using a proxy for HTTPS, the client issues a CONNECT so the proxy opens a direct connection to the server and simply bridges bytes between the remote server and the client. The traffic therefore stays encrypted end to end and cannot be cached.

With Squid 3.3/3.4, we can use a feature known as SslBump server-first (http://wiki.squid-cache.org/Features/BumpSslServerFirst). The TLS connection is terminated by Squid, which queries the remote server, generates a certificate on the fly signed by a local CA, and serves that back to the client.

Since Squid acts as a man in the middle, it sees the plain-text responses and can cache them properly.

Teaser with curl

I gave it a try on an instance named pmcache.integration.eqiad.wmflabs. Example:

curl --verbose --cacert /etc/ssl/localcerts/integration.crt  \
    --proxy https://pmcache.integration.eqiad.wmflabs:8081/ \
    https://www.wikipedia.org/

The client connects as usual with a CONNECT:

* Connected to pmcache.integration.eqiad.wmflabs (10.68.22.133) port 8081 (#0)
* Establish HTTP proxy tunnel to www.wikipedia.org:443
> CONNECT www.wikipedia.org:443 HTTP/1.1
> Host: www.wikipedia.org:443
> User-Agent: curl/7.38.0
> Proxy-Connection: Keep-Alive
> 
< HTTP/1.1 200 Connection established
< 
* Proxy replied OK to CONNECT request

It then switches to TLS, and curl shows the server certificate, which has been signed by our custom CA certificate:

* SSL connection using TLSv1.2 / AES256-GCM-SHA384
* Server certificate:
* 	 subject: C=US; ST=California; L=San Francisco; O=Wikimedia Foundation, Inc.; CN=*.wikipedia.org
* 	 start date: 2015-06-23 18:37:07 GMT
* 	 expire date: 2017-02-19 12:00:00 GMT
* 	 subjectAltName: www.wikipedia.org matched

vvvvvvvvvvvvv MAN IN THE MIDDLE vvvvvvvvvvvvvvvvvvvvvvvv
* 	 issuer: C=US; ST=California; L=San Francisco; O=Wikimedia Foundation Inc.;
                 OU=Release Engineering; CN=pmcache.integration.eqiad.wmflabs
^^^^^^^^^^^^ MAN IN THE MIDDLE ^^^^^^^^^^^^^^^^^^^

* 	 SSL certificate verify ok.
* SSLv2, Unknown (23):

The rest proceeds as usual:

> GET / HTTP/1.1
> Host: www.wikipedia.org
* SSLv2, Unknown (23):
{ [data not shown]
< HTTP/1.1 200 OK
...
< X-Cache: HIT from pmcache
< X-Cache-Lookup: HIT from pmcache:8080

Dirty hands

The instance runs Debian Jessie. Due to a licensing issue, Squid cannot legally be linked against OpenSSL, so the Debian package ships without any SSL support. Squid 3.5 (not in Jessie) might support GnuTLS.

So I rebuilt the Jessie package with OpenSSL support:

apt-get install -y build-essential fakeroot libssl-dev openssl devscripts
apt-get source -y squid3
apt-get build-dep -y squid3

Edit debian/rules and add to DEB_CONFIGURE_EXTRA_FLAGS:

--enable-ssl \
--with-open-ssl="/etc/ssl/openssl.cnf"

Rebuild and install:

debuild -us -uc
dpkg -i squid3-common_3.4.8-6+deb8u1_all.deb squid3_3.4.8-6+deb8u1_amd64.deb squidclient_3.4.8-6+deb8u1_amd64.deb
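
To confirm the rebuilt binary actually got SSL support, the configure options it reports can be checked; a quick sanity check, not part of the original notes:

# Print the version banner with configure options and look for the SSL flag
squid3 -v | grep -- '--enable-ssl'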

SSL cert

Something like:

echo -e "US\nCalifornia\nSan Francisco\nWikimedia Foundation Inc.\nRelease Engineering\n`hostname --fqdn`\n\n" \
   | openssl req -x509 -nodes -days 3650 -newkey rsa:2048 -keyout integration.key -out integration.crt
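
To double check what was generated, the certificate can be inspected; again a sanity check rather than something from the original run:

# Show the subject and validity window of the freshly generated CA certificate
openssl x509 -in integration.crt -noout -subject -dates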

I copied both the .key and .crt to /etc/ssl/localcerts, but maybe they should go under /etc/ssl/certs/ so they are looked up automatically.
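
On Debian, the usual way to have a local CA land in /etc/ssl/certs/ and be picked up automatically is the ca-certificates machinery; a sketch, not something I did on the instance:

# Register the local CA with the system trust store so clients that rely on the
# default bundle pick it up without an explicit --cacert/--cafile
cp /etc/ssl/localcerts/integration.crt /usr/local/share/ca-certificates/integration.crt
update-ca-certificates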

Squid Configuration

Archived at P2211:

# SSL Bumping
# -----------

# Allow bumping of the connection. Establish a secure connection with the
# server first, then establish a secure connection with the client, using a
# mimicked server certificate. Works with both CONNECT requests and intercepted
# SSL connections.
#
# http://www.squid-cache.org/Versions/v3/3.4/cfgman/ssl_bump.html
ssl_bump server-first all

debug_options ALL,1

http_port 8080
http_port 8081 ssl-bump cert=/etc/ssl/localcerts/integration.crt key=/etc/ssl/localcerts/integration.key generate-host-certificates=on
always_direct allow all

# LOGGING
# -------
access_log /var/log/squid3/access.log squid
cache_log /var/log/squid3/cache.log
logfile_rotate 5
log_mime_hdrs on
strip_query_terms off  # Log query terms

# CLIENT
# ------
request_header_max_size 8 KB
request_body_max_size 8 KB
reply_body_max_size 200 MB all

read_ahead_gap 1024 KB
quick_abort_min 0 KB
quick_abort_max 0 KB
quick_abort_pct 100


# MEMORY CACHE
# ------------
cache_mem 1 GB
maximum_object_size_in_memory 4096 KB
memory_replacement_policy heap GDSF  # Greedy-Dual Size Frequency

# DISK CACHE
# ----------
# Least Frequently Used with Dynamic Aging
cache_replacement_policy heap LFUDA
maximum_object_size 32 MB
cache_dir aufs /srv/squid3/cache 10000 16 256


# 'Hide' ourself
via off
forwarded_for off
follow_x_forwarded_for deny all

# ACCESS LISTS
# ------------
acl SSL_ports port 443
acl Safe_ports port 21 80 443
acl method_purge method PURGE
acl method_connect method CONNECT

# RULES
# -----
http_access allow manager localhost
http_access deny manager
http_access allow method_purge localhost
http_access deny method_purge

http_access deny !Safe_ports

http_access deny method_connect !SSL_ports
#http_access deny !method_connect SSL_ports

http_access deny to_localhost

http_reply_access allow all
icp_access deny all

You end up with a regular proxy on port 8080 and one that does SSL bumping on port 8081.
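
A quick way to check the proxy answers at all is the cache manager interface via squidclient (installed above); the configuration only allows manager requests from localhost, so run it on the instance itself:

# Ask the cache manager for general runtime information on the plain proxy port
squidclient -h localhost -p 8080 mgr:info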

Usage with package managers

Roughly common to all of them (a combined sketch follows the list):

  • http_proxy=. so nothing goes over plain HTTP
  • https_proxy=https://`hostname --fqdn`:8081
    • Can probably just point to http:// instead.
  • Get them to trust our CA via the public certificate /etc/ssl/localcerts/integration.crt
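
Put together, a job could export something like this before invoking its package manager; the proxy host and certificate path are the ones used above, and whether https_proxy should be http:// or https:// varies per tool (see the gotchas below):

# Bogus value so nothing silently goes over plain HTTP
export http_proxy=.
# Route HTTPS requests through the SSL bumping port
export https_proxy=http://pmcache.integration.eqiad.wmflabs:8081
# CA that signs the certificates Squid forges on the fly
export SSL_CERT_FILE=/etc/ssl/localcerts/integration.crt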

To verify what hits the cache, the best way is to tail /var/log/squid3/access.log.
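
For instance, something like this keeps only the hit/miss lines while a package manager runs:

# Follow the access log and show only cache hit/miss entries
tail -f /var/log/squid3/access.log | grep -E 'TCP_(MEM_)?HIT|TCP_MISS'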

npm

rm -fR ~/.npm; http_proxy=. https_proxy=http://`hostname --fqdn`:8081 npm --verbose --cafile /etc/ssl/localcerts/integration.crt install grunt-cli

Squid:

TCP_MISS/200 22154 GET https://registry.npmjs.org/grunt-cli
TCP_MEM_HIT/200 5152 GET https://registry.npmjs.org/grunt-cli/-/grunt-cli-0.1.13.tgz
^^^ served from Squid memory

Gotcha: https_proxy needs an http:// URL.
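
Alternatively, the same settings can be persisted in the npm configuration instead of passing environment variables and --cafile on every run; a sketch, not what was run above:

# Persist the proxy and CA settings in ~/.npmrc
npm config set https-proxy http://pmcache.integration.eqiad.wmflabs:8081
npm config set cafile /etc/ssl/localcerts/integration.crt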

pip

rm -fR ~/.cache/pip; https_proxy=https://`hostname --fqdn`:8081 pip install --cert /etc/ssl/localcerts/integration.crt --target . PyYAML

Squid:

TCP_MISS/200 1872 GET https://pypi.python.org/simple/pyyaml/
TCP_MEM_HIT/200 249337 GET https://pypi.python.org/packages/source/P/PyYAML/PyYAML-3.11.tar.gz
^^^ served from Squid memory
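
The CA can also be persisted in pip's configuration file so --cert is not needed on every invocation; a sketch, and note the config file location varies across pip versions:

# Write the equivalent of --cert to pip's per-user configuration;
# the proxy itself can keep coming from the https_proxy environment variable
mkdir -p ~/.pip
cat > ~/.pip/pip.conf <<EOF
[global]
cert = /etc/ssl/localcerts/integration.crt
EOF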

bundler

I tested it with mediawiki/selenium.git but could not get it to work with the cert in localcerts, e.g.:

SSL_CERT_FILE=/etc/ssl/localcerts/integration.crt http_proxy=. https_proxy=https://`hostname --fqdn`:8081 bundle install --verbose --path vendor/bundle

Network error while fetching https://rubygems.org/quick/Marshal.4.8/rubocop-0.29.1.gemspec.rz

With curl:

http_proxy=. https_proxy=https://`hostname --fqdn`:8081 curl --verbose --cacert /etc/ssl/localcerts/integration.crt https://rubygems.org/quick/Marshal.4.8/rubocop-0.29.1.gemspec.rz
< Location: https://rubygems.global.ssl.fastly.net/quick/Marshal.4.8/rubocop-0.29.1.gemspec.rz

If we follow the redirect (curl -L), it shows a hit from the Squid cache.
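
For reference, the follow-redirects variant of the same request:

http_proxy=. https_proxy=https://`hostname --fqdn`:8081 curl --location --verbose \
    --cacert /etc/ssl/localcerts/integration.crt \
    https://rubygems.org/quick/Marshal.4.8/rubocop-0.29.1.gemspec.rz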

I have tried various combinations but I am hitting a wall here.
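
One more thing that might be worth trying is bundler's own CA setting instead of overriding SSL_CERT_FILE globally (untested here):

# Point bundler at the local CA via its own configuration
bundle config ssl_ca_cert /etc/ssl/localcerts/integration.crt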

composer

Not covered.
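
For whoever picks this up: composer honours the same https_proxy environment variable and has a cafile configuration option, so a starting point might look like this (untested):

# Trust the local CA for composer's TLS connections (packagist and dist downloads)
composer config --global cafile /etc/ssl/localcerts/integration.crt
# Then run with the proxy environment set as for the other package managers
https_proxy=http://`hostname --fqdn`:8081 composer install --verbose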

Event Timeline

hashar claimed this task.
hashar raised the priority of this task from to Medium.
hashar updated the task description.
hashar set Security to None.

I have updated the task description with my attempts at using Squid as a man in the middle proxy.

Declining this for now. I went with a lame central rsync server and I think it is good enough.

Did a first pass using a cache store/restore system based on rsync; investigated as part of T116017.