Right now we don't allow Varnishes to cache any content, but we plan to start allowing this soon. At that point, internal RESTBase metrics like http://grafana.wikimedia.org/#/dashboard/db/restbase?panelId=8&fullscreen will only show the cache misses. For our purposes it would be super useful to keep track of total requests matching /api/rest_v1/. This will let us track overall API usage, which is going to be our primary KPI for now.
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Report RESTBase traffic metrics to Graphite | analytics/refinery/source | master | +272 -14
Status | Assigned | Task
---|---|---
Resolved | madhuvishy | T109547 Create a metric for overall RESTBase request rates from Varnish logs {hawk} [13 pts]
Resolved | madhuvishy | T110691 Productionize the Spark job that sends RESTBase stats to Graphite {hawk}
Event Timeline
Couple of questions:
- What time granularity do the metrics need: hourly, daily, or monthly?
- We should check where /api/rest_v1/ requests show up in webrequest - only in mobile, text, or if anywhere else
Live graphs & selectable time scales similar to grafana would be awesome, but my understanding is that the analytics infrastructure is better set up to do batch analysis. Daily would be good enough to allow us to track the effect of activating new end points & clients. If hourly is possible without much extra effort, then that would be great too.
- We should check where /api/rest_v1/ requests show up in webrequest - only in mobile, text, or if anywhere else
RESTBase is behind the text varnishes only at this point. I'm not 100% sure if all requests to those Varnishes are logged (if not, we might have to tweak the config), but if they are then it will be for text.
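A minimal sketch of the hourly counting discussed above, in plain Python rather than the Spark job from the patch. The record layout (`ts`, `uri_path`) and the sample rows are hypothetical and only illustrate the idea of filtering on the `/api/rest_v1/` prefix and bucketing by hour.

```python
from collections import Counter
from datetime import datetime

API_PREFIX = "/api/rest_v1/"


def hourly_restbase_counts(records):
    """Count requests whose path starts with API_PREFIX, bucketed by hour.

    `records` is an iterable of dicts with hypothetical keys:
      ts       - ISO 8601 timestamp string with a UTC offset
      uri_path - request path
    """
    counts = Counter()
    for rec in records:
        if not rec["uri_path"].startswith(API_PREFIX):
            continue
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(rec["ts"]).replace(
            minute=0, second=0, microsecond=0
        )
        counts[hour] += 1
    return dict(counts)


# Made-up sample rows, for illustration only.
sample = [
    {"ts": "2015-08-01T00:05:00+00:00", "uri_path": "/api/rest_v1/page/html/Foo"},
    {"ts": "2015-08-01T00:59:59+00:00", "uri_path": "/api/rest_v1/page/html/Bar"},
    {"ts": "2015-08-01T00:30:00+00:00", "uri_path": "/wiki/Foo"},
    {"ts": "2015-08-01T01:10:00+00:00", "uri_path": "/api/rest_v1/page/html/Baz"},
]

for hour, count in sorted(hourly_restbase_counts(sample).items()):
    print(hour.isoformat(), count)
```

Daily totals could be derived from the same buckets by summing the hours of each day.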
Change 234453 had a related patch set uploaded (by Madhuvishy):
[WIP] Report RESTBase traffic metrics to Graphite
@madhuvishy: Did you see any RESTBase requests in the current request logs? I'm not 100% certain that our Varnish setup does indeed send those as well. If they are missing, we might have to tweak the Varnish config a bit.
@GWicke Yes! I'm actually plotting a graph on graphite for today. Will post that in a bit
@GWicke - Check out graphite.wikimedia.org - test.restbase.requests. I plotted the first 12 hours of today. (Graph Options->Line Mode-> Connected Line gives a clearer picture)
If you haven't seen the patch, what we're trying to do is calculate hourly request counts and send them directly to Graphite from Spark (without statsd, because it won't accept a timestamp and is meant only for real-time stats). This seems to work fine.
Bypassing statsd means we won't get derived metrics like count, min, and max that you would otherwise see in Graphite. Let me know if this approach works for you. We also need to fix the name of the metric, because once this is productionized it should go to the restbase namespace; you can pick a name if restbase.requests won't work.
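For illustration, here is roughly what sending a timestamped count straight to Graphite can look like. This is a sketch, not the code from the patch. It uses Graphite's plaintext protocol (one `path value timestamp` line per data point, conventionally on TCP port 2003). The helper names, the host argument, and the value in the usage lines are assumptions.

```python
import socket
from datetime import datetime, timezone


def graphite_line(metric, value, when):
    """Format one data point in Graphite's plaintext protocol.

    `when` must be a timezone-aware datetime; it is sent as a Unix timestamp,
    which is what lets a batch job backfill past hours.
    """
    return f"{metric} {value} {int(when.timestamp())}\n"


def send_to_graphite(lines, host, port=2003):
    """Send pre-formatted lines to a Carbon plaintext listener.

    The host and port are assumptions; 2003 is only the conventional default.
    """
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall("".join(lines).encode("ascii"))


# Illustrative value only; the metric name is the test name used in this thread.
hour = datetime(2015, 8, 1, 0, tzinfo=timezone.utc)
line = graphite_line("test.restbase.requests", 1200, hour)
print(line, end="")
```

Because each line carries its own timestamp, no aggregation happens on the way in, which is why the statsd-derived series (count, min, max) are absent with this approach.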
Very nice! I just verified the request rates, and found out that our main third-party consumers (Kiwix and Googlebot) are indeed still hitting rest.wikimedia.org, which is served by two different Varnishes that don't report to the text logs. I'll tell them to switch.
Thanks again!
This is done, and the job is scheduled in production. Graphite will be updated with hourly numbers starting from Aug 1 2015; they can be seen on graphite.wikimedia.org under restbase.requests.varnish_requests.
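As a sketch of how the series might be pulled programmatically, the snippet below builds a URL for Graphite's standard render API. Whether that endpoint is reachable on graphite.wikimedia.org, and the `https` scheme, are assumptions; `render_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def render_url(base, target, start, fmt="json"):
    """Build a Graphite render API URL for one target.

    `start` uses Graphite's absolute date form YYYYMMDD.
    """
    query = urlencode({"target": target, "from": start, "format": fmt})
    return f"{base}/render?{query}"


# Assumed base URL; the metric name and start date come from the comment above.
url = render_url(
    "https://graphite.wikimedia.org",
    "restbase.requests.varnish_requests",
    "20150801",
)
print(url)
```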