
[Spike] Is Varnish caching ORES responses?
Open, Medium, Public, Spike

Description

We don't want Varnish front-end caching, because Varnish currently can't purge scores when the ML models are regenerated. However, I see signs that our responses might be getting cached in Varnish.

We should be including the following headers with our responses:
https://github.com/wiki-ai/ores/blob/master/ores/wsgi/util.py#L93

However, when I hit localhost with cURL from ores-web-04, the headers don't appear:

curl -D headers.txt http://localhost:8080/v2/scores/enwiki/damaging/745065890/

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 328
Access-Control-Allow-Origin: *

{
  "scores": {
    "enwiki": {
      "damaging": {
        "scores": {
          "745065890": {
            "prediction": false,
            "probability": {
              "false": 0.9712963561026546,
              "true": 0.028703643897345314
            }
          }
        },
        "version": "0.1.1"
      }
    }
  }
}
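
Since `curl -D headers.txt` saves the response headers to a file, the missing header can be checked without reading the dump by eye. The following is only a sketch: `has_header` is a hypothetical helper name, and it assumes the header we expect is Cache-Control, as the VCL match on our Cache-Control header suggests.

```shell
# Hypothetical helper: succeed if the header dump FILE ($2) contains
# a header named NAME ($1). HTTP header names are case-insensitive,
# so the match ignores case.
has_header() {
    grep -qi "^$1:" "$2"
}

# Example (assumes headers.txt was written by the curl command above):
#   has_header Cache-Control headers.txt || echo "Cache-Control is missing"
```

For the response shown above, this check would report Cache-Control as missing, since only Content-Type, Content-Length and Access-Control-Allow-Origin are present.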

The old model version is also extremely suspicious. Maybe I'm on an older web box?

Here's the VCL configuration which should match on our Cache-Control header and prevent caching:
https://github.com/wikimedia/operations-puppet/blob/production/modules/varnish/templates/vcl/wikimedia-common.inc.vcl.erb#L331

The production responses worry me. They include headers like these:

https://ores.wikimedia.org/v2/scores/enwiki/damaging/745065890/

Accept-Ranges: bytes
Age: 2582
Content-Encoding: gzip
Content-Length: 176
Content-Type: application/json
Date: Wed, 26 Oct 2016 17:35:42 GMT
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Vary: Accept-Encoding
Via: 1.1 varnish-v4, 1.1 varnish-v4, 1.1 varnish-v4, 1.1 varnish-v4
X-Cache: cp1061 miss, cp2012 miss, cp4002 miss, cp4001 hit/2
X-Firefox-Spdy: h2
access-control-allow-origin: *
x-analytics: WMF-Last-Access=26-Oct-2016;https=1
x-cache-status: hit
x-client-ip: <redacted>
x-varnish: 26581384, 24460985, 9979155, 819430 469613

I don't like the cp4001 hit, especially as Ganglia reports that machine is down. A hard refresh changes nothing.
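
To make this check repeatable, a small shell helper can read a header dump and say whether the response looks like it came from cache. This is a rough sketch rather than an authoritative test: `cache_verdict` is a hypothetical name, and it assumes that an X-Cache-Status of `hit` or a non-zero Age means a cached response, which fits the headers above but is not derived from the VCL itself.

```shell
# Hypothetical helper: read HTTP response headers on stdin and print
# "cached" if they look like a Varnish cache hit, else "not cached".
# Assumption: X-Cache-Status "hit" or Age > 0 indicates a cached response.
cache_verdict() {
    awk -F': *' '
        { sub(/\r$/, "") }   # strip the CR from HTTP line endings
        tolower($1) == "x-cache-status" && tolower($2) == "hit" { cached = 1 }
        tolower($1) == "age" && $2 + 0 > 0 { cached = 1 }
        END { print (cached ? "cached" : "not cached") }
    '
}

# Example:
#   curl -sI https://ores.wikimedia.org/v2/scores/enwiki/damaging/745065890/ | cache_verdict
```

Header names are compared case-insensitively because the same header shows up as both `X-Cache-Status` and `x-cache-status` depending on the client.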

Also, the cached score doesn't exist on ores-redis-02 port 6380:

keys get ores:enwiki:damaging:745065890:*

to be continued...

Event Timeline

I think I must be looking at staging boxes. Can someone help me access the production cluster?

Halfak triaged this task as Medium priority. Nov 10 2016, 3:27 PM
Halfak moved this task from Unsorted to Research & analysis on the Machine-Learning-Team board.
Ladsgroup renamed this task from Spike: Is Varnish caching ORES responses? to [Spike] Is Varnish caching ORES responses?. Nov 10 2016, 3:28 PM
Ladsgroup added a project: Spike.

$ curl -I https://ores.wikimedia.org/v2/scores/enwiki/damaging/745065890/ | grep X-Cache-Status
X-Cache-Status: pass

so it's not being cached anymore.

In any case, requests sent to localhost cannot be cached in Varnish (unless the local host is one of the Varnish boxes).

Restricted Application changed the subtype of this task from "Task" to "Spike". Sep 23 2020, 4:36 PM