
Phabricator should cache tasks for a few minutes for logged-out users
Open, Medium, Public

Description

Phabricator got the HN/Twitter kiss of death this morning because of T273741 going viral. I noticed when accessing that task while logged out that our frontend caches don't cache tasks at all for logged-out users. This puts a lot of unnecessary pressure on PHP and MariaDB when a Phabricator task goes viral.

Either Phabricator itself, Apache, or our frontend caches should add a rule so that Phabricator tasks are cached in our frontend caches for X minutes for logged-out users. People will still be able to see the latest version of the task and its comments by logging in.

According to @Krinkle this might only be solvable upstream, as Phabricator gives all viewers a session and CSRF tokens.

Based on 1/128-sampled requests, the traffic started at roughly midnight, after https://twitter.com/chrisalbon/status/1358890731981611009 (which ironically refers to a different traffic surge). Most hits are for phab.wmfusercontent.org.

phab_traffic.png (558×778 px, 50 KB)

The esams cache traffic got moved to eqiad, which would have caused all hits from Europe to land on eqiad. Eventually Phabricator ran out of local sockets (30k limit):

phab_socket_usage.png (496×1 px, 37 KB)

This led to issues such as: "can't reach its db m3-master.eqiad.wmnet: failed with error #2002: Cannot assign requested address".

Event Timeline

@epriestley seeing your old comment on a related issue:

https://phabricator.wikimedia.org/T219978#5346100

We now have a demonstration that excessive read traffic can melt down Phabricator and we would like to have the ability to cache logged-out traffic in our caches fronting Phabricator.

While I imagine it's not a common need for Phabricator instances, our fully public one is likely to run into this issue again. We need an easy way to tell logged-out traffic apart in our caching layer. Alternatively, Phabricator itself could issue a meaningful, configurable Cache-Control header for logged-out users.
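As a purely illustrative sketch (not something Phabricator emits today), a response to a logged-out request could carry something like:

cache-control: public, s-maxage=300, max-age=0

so that shared frontend caches may keep the page for a few minutes while browsers still revalidate on every load, and logged-in responses keep whatever they send today.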

Vgutierrez triaged this task as Medium priority. Feb 9 2021, 10:47 AM
Vgutierrez moved this task from Triage to Caching on the Traffic board.

Phabricator also doesn't seem to be caching user profile pictures at all; those are likewise stored in the phabricator_file database.

We need an easy way to tell logged-out traffic apart in our caching layer.

Can your caching layer examine cookie values?

If the phsid cookie is absent, the user is logged out.

If the phsid cookie is present, but the first character of the cookie value is A, the user is logged out.

Responses to logged out users may vary on other headers. For example, Phabricator sends a mobile-layout-by-default page to clients it guesses are mobile devices by inspecting their User Agent header, and a desktop-layout-by-default page to clients it guesses are not. Today, I believe these changes are generally not substantial enough to worry about very much. Phabricator does not send a broken-in-other-browsers page to IE6 clients or anything like that. However, in the future, these changes may become more significant: for example, Phabricator might deliver a dark-themed, mobile-targeted, German-language page to one logged-out user and a light-themed, desktop-targeted, French-language page to another logged-out user. If these features are implemented, the most likely implementation would let you turn them off, and I have no particular plans to implement more features in this vein in the near term, but this may eventually become an entangling concern.

The cached response will contain CSRF tokens particular to the phsid cookie that Phabricator read from the request, or, if it did not read a phsid in the request, the phsid it expected to send in the response. This may break forms on the page. If you cache the Set-Cookie header in the response, this may effect a session fixation attack on all users who receive the cached page, although escalating this attack seems very challenging.

Because forms on cached pages may be broken, accidentally or intentionally caching the login page may prevent any user from logging in.
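To make this concrete, here is a minimal sketch of such a rule in Varnish VCL (Varnish is only an assumption about the edge layer here; the paths, TTL, and cookie regexes are illustrative and would need hardening before real use):

vcl 4.0;

backend default {
    .host = "127.0.0.1";    # placeholder origin
    .port = "80";
}

sub vcl_recv {
    # Never cache authentication pages (see the login-page caveat above).
    if (req.url ~ "^/(auth|login)/") {
        return (pass);
    }

    # Logged out per the rules above: no phsid cookie, or a phsid value starting with "A".
    if (req.http.Cookie !~ "phsid=" || req.http.Cookie ~ "phsid=A") {
        # Drop cookies so these requests can share one cached object.
        unset req.http.Cookie;
    } else {
        # Logged in: always fetch from the backend.
        return (pass);
    }
}

sub vcl_backend_response {
    if (!bereq.http.Cookie) {
        # Never store Phabricator's Set-Cookie (anonymous session / CSRF material)
        # in the shared object, and cap the shared-cache lifetime at a few minutes.
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 5m;
    }
}

Stripping Set-Cookie from the stored object sidesteps the session-fixation concern above, at the cost that anonymous viewers of a cached page don't receive a session cookie from it.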

Phabricator also doesn't seem to be caching user profile pictures at all; those are likewise stored in the phabricator_file database.

Can you explain what you're seeing in more detail? For example, my page source for your profile image has this URI:

https://phab.wmfusercontent.org/file/data/t23adkvlh46dktaotbnj/PHID-FILE-t3qexa2o5bjmcouiagbc/profile

This appears to be properly cached to me:

$ curl --verbose https://phab.wmfusercontent.org/file/data/t23adkvlh46dktaotbnj/PHID-FILE-t3qexa2o5bjmcouiagbc/profile --output /dev/null
...
> GET /file/data/t23adkvlh46dktaotbnj/PHID-FILE-t3qexa2o5bjmcouiagbc/profile HTTP/2
> Host: phab.wmfusercontent.org
> User-Agent: curl/7.64.1
> Accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 200 
...
< cache-control: max-age=2592000, public
< expires: Thu, 11 Mar 2021 02:03:23 GMT
< x-content-type-options: nosniff
< content-length: 106831
< content-type: image/png
< age: 53714
< x-cache: cp4028 hit, cp4027 hit/441
< x-cache-status: hit-front
< server-timing: cache;desc="hit-front"
...
* Closing connection 0

In particular, it appears that the "Cache-Control" and "Expires" headers are set to allow caching, and (assuming that I'm interpreting "x-cache-status: hit-front" correctly) that the request was itself served from an edge cache, i.e. this response was cached properly in practice. What am I missing?

Indeed I think it's caching static files correctly.

Eventually Phabricator ran out of local sockets (30k limit):

I'm not familiar with the particulars of your infrastructure, but if this was on the database machine (i.e., too many connections to the database), other changes might release this bottleneck. There are four general changes available:

  1. Enable the persistent option for connections (hidden deep in the advanced documentation), if it isn't already enabled.
  2. Configure partitioning.
  3. Overlays: Wait for the upstream to complete https://secure.phabricator.com/T11908.
  4. Read Replicas: Wait for the upstream to complete https://secure.phabricator.com/T11056.

(1) The persistent option enables connection pooling from the web application layer and primarily has the effect of dramatically reducing the number of ports Phabricator holds at once on the database host, especially under high-volume read workloads like this one. This option should probably be on by default, but it only helps large installs (and may slightly harm small installs) and one install reported some kind of issue with it that I never triaged. The problem resolved by persistent usually presents as port exhaustion on the database host, preventing it from accepting new connections, but the vast majority of these ports are in TIME_WAIT (i.e., waiting before allowing port reuse after closing a previous connection).

If you don't have this option enabled (and I'm not misunderstanding what local socket exhaustion is), it's likely worth trying. It may provide a huge amount of capacity.
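If it helps to confirm whether this is what's happening, one quick check (assuming standard Linux tooling and the default MySQL port 3306; adjust for any proxy in between) is to count TCP connections from the web host toward the database, grouped by state. A large TIME-WAIT pile-up points at ephemeral-port exhaustion rather than a MySQL-side connection limit:

$ ss -tan 'dport = :3306' | awk 'NR > 1 {print $1}' | sort | uniq -c | sort -rn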

(2) Configuring partitioning allows you to separate logical databases onto different physical database servers. For example, the phabricator_user and phabricator_maniphest databases can be moved to different servers. This is complicated and not likely to give you more than ~2X capacity in the best case. I think it's the least attractive option; I mention it only because it's a building block for (3).

(3) Phabricator currently opens a separate connection to each logical database, even if those logical databases are on the same physical (or virtual, or whatever) server. For example, if you have not partitioned your databases and all your databases are on the same physical server, Phabricator will open separate connections to phabricator_maniphest, phabricator_user, etc. This is almost always unnecessary.

Phabricator does this because, early on, it represented groundwork for partitioning: by making sure each application isolated its connections, I aimed to avoid having a huge pile of technical debt that needed to be cleaned up to support partitioning. This phase of the design worked: partitioning has been implemented and works correctly (secure has been running partitioned for a long time now). Since partitioning works, it is no longer necessary for Phabricator to continue isolating connections to the same degree, and it could reasonably share a single connection per physical host across databases in the vast majority of cases. This likely represents a ~10X reduction in connections for most pages.

This is tracked upstream in https://secure.phabricator.com/T11908, and some work has been completed, although a meaningful amount of work is still required before it can be exposed as an option.

(4) Phabricator supports master/replica configuration but currently never sends traffic to the replica unless the master is unreachable. See https://secure.phabricator.com/T11056. The biggest barrier to this is handling read-after-write for logged-in users. This isn't a concern for logged-out traffic, and sending logged-out traffic to replicas might be entirely straightforward.


Upshot: if this is some equivalent to port/connection exhaustion, you may gain ~10X capacity by enabling persistent, another ~10X on top of that once the upstream completes connection overlays, and then as much capacity as you're willing to throw hardware at once upstream traffic routing becomes slightly smarter.

Hmm. I'm looking at the Firefox developer console, and it looks like this:

image.png (625×1 px, 66 KB)

Note that styles and scripts are marked as "cached" instead of having some bytes transferred. The cache-control header looks the same on both types of requests (cache-control: max-age=2592000, public).

I suspect you're seeing that because when you reload a page by issuing a "Reload" command in your browser, most (all?) modern browsers interpret that to mean "Reload the page, skipping some/most/all caches". That is, the cache behavior of these sequences differs:

  1. Visit page A, press "Reload".
  2. Visit page A, click the URI bar, press return.

This (bypassing caches on explicit reload) is helpful in most situations where users would reload (e.g., it fixes the page if the old page really had bad cache headers), but can lead to confusion when actually trying to understand cache behavior. For this page, here's what the Network tab reports if I do (1):

Screen Shot 2021-02-09 at 10.57.40 AM.png (689×1 px, 227 KB)

...but here's what I get if I do (2):

Screen Shot 2021-02-09 at 10.57.24 AM.png (391×1 px, 88 KB)

The first request involves a lot more network traffic, but that's because I pressed "Reload", which the browser interprets to mean "Reload, with different cache behavior".

I think this is made even more confusing because older browsers had different behavior, and/or required certain invocations (reload multiple times, shift-reload?) to affect cache behavior.


Here, we also don't necessarily care about the browser cache behavior for purposes of reducing load on the database. A successful implementation might look like an edge cache serving a cached resource which is not itself cacheable by browsers. If this implementation was selected, browser behavior would be the same (the same number of requests, same cacheability behavior for responses) but load on the database would be dramatically reduced.

Put another way, just looking at browser behavior is insufficient to conclude that a request is served by phabricator_file or generates any database load, even if every time you load the page your browser legitimately retrieves a new copy of the resource: it might be served from an edge cache, or from an application-level APC cache, or from an application-level database cache (phabricator_cache), etc.
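For what it's worth, the same kind of curl probe as above can answer this without a browser: if the response shows a nonzero age and a front-edge cache hit, that particular request never reached PHP or MariaDB (the URI and header names below are taken from the earlier example in this thread):

$ curl -s -D - -o /dev/null https://phab.wmfusercontent.org/file/data/t23adkvlh46dktaotbnj/PHID-FILE-t3qexa2o5bjmcouiagbc/profile | grep -iE '^(cache-control|age|x-cache|x-cache-status|server-timing):'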

@epriestley: Thanks for the tips. I think we may have already enabled persistent connections; I will double check.

As for caching, our Phabricator servers are already behind our own globally distributed caching reverse-proxy CDN. The only problem is that logged-out requests get issued a session instead of a cacheable response. I don't know if there is a way to take advantage of the cache layer that's already in place without major changes to Phab, but it would be nice if that were possible. Given how often Phab tasks are likely to change, I suppose pages shouldn't ever be cached with a very long TTL, so it might actually not be that useful.

Although I thought we were using cluster.databases to configure mysql connections, it appears that we are not.

Related: T112776

We were (and still are) blocked by https://secure.phabricator.com/T11056. Currently we already have HA by using a proxy, but not load balancing.

@jcrespo: I think we can use cluster.databases in phabricator's config even with a single mysql server name/password. Do you know if the proxy supports persistent connections and whether that would be worthwhile to enable?

There is HTTP reuse functionality (https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#4.2-http-reuse), but I don't think there is something like that for L3 (non-HTTP) connections. We would need to use an SQL-aware proxy instead, e.g. ProxySQL.

But based on the above comments, the issue seems to be unclear or not properly debugged yet; wouldn't it make sense to find the exact bottleneck (e.g. TCP overhead, number of concurrent db connections at the client side, at the server side, etc.) rather than making large architectural changes? It could be just a single configuration limit on the client, server or proxy.
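For instance (a rough sketch using standard MySQL status counters; the host is the one from the error above, and real access may need to go through the usual admin path), comparing the server-side connection limit against actual usage would already tell a server-side ceiling apart from client-side port exhaustion:

$ mysql -h m3-master.eqiad.wmnet -e \
    "SHOW GLOBAL VARIABLES LIKE 'max_connections';
     SHOW GLOBAL STATUS LIKE 'Threads_connected';
     SHOW GLOBAL STATUS LIKE 'Max_used_connections';
     SHOW GLOBAL STATUS LIKE 'Aborted_connects'"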

@jcrespo: sure, no argument from me. I was just curious if we could easily implement epriestley's advice (or whether we already had)

It's perfectly fine to configure cluster.databases with one service, and Phabricator internally (in effect) builds a one-service cluster.databases configuration (by looking at mysql.user, mysql.host, etc) if your configuration is empty. Swapping from mysql.user to cluster.databases causes (or, at least, should cause) no change in behavior -- and is happening behind the scenes anyway -- it just gives you access to the persistent flag.

Configuring with cluster.databases instead of mysql.* is informally recommended, and Phabricator may require you to do it in the future, similar to how cluster.mailers became recommended in 2018 and required in 2019 (even for installs with zero or one outbound mailer). The cluster.* configuration options are, to varying degrees, somewhat misleadingly named, and all just mean "list of services, which may sometimes contain zero or one entries, that Phabricator should use to do a thing".
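For illustration, the swap might look something like this (the host is the one mentioned earlier in this task; the other values are placeholders, and the exact entry keys, including persistent, should be double-checked against the current Cluster: Databases documentation):

$ ./bin/config set cluster.databases '[
  {
    "host": "m3-master.eqiad.wmnet",
    "port": 3306,
    "user": "phabricator",
    "pass": "xxxxxxxx",
    "role": "master",
    "persistent": true
  }
]'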

Since the persistent pool is (as far as I understand it) normally held at the Apache or PHP-FPM level, I'd expect the flag will work (i.e., have some effect and not explode) regardless of the behavior of HAProxy. HAProxy will just see, for example, webserver-a.example.com establishing 10 connections with an average duration of 100 seconds instead of 1,000 connections with an average duration of 1 second. I can only imagine this being an issue if HAProxy is configured with a very short maximum connection duration (like 1 second), but even then you should be no worse off with persistent enabled.

Of course, it would be great to understand the root cause first. I don't think I can be of much help there, but let me know if you'd benefit from guidance on, e.g., how to write a Phabricator script that holds as many database connections open as possible or whatever else might be useful in reproducing the problem. But I'd guess you can reproduce it without anything fancier than ab, and most of the tricky part is setting up an environment where reproduction isn't disruptive.