
Analyze Range requests on cache_upload frontend
Closed, Resolved · Public

Description

We want to get an idea of how many Range requests we currently receive, as a percentage of the total.

The data-gathering procedure should only include meaningful Range requests. For example, Range: bytes=0- is not a particularly interesting one, as it ends up requesting the whole file. Same goes for Range: bytes=0-N for a file of length N+1.

Finally, we want to find out what percentage of those are high-range requests beyond some bytes cutoff (say 32M).
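The three buckets described above can be sketched as a small classifier. This is only an illustration under stated assumptions: the function name, the bucket labels, and the choice of 32 MiB for "32M" are mine, not part of the task, and suffix ranges (bytes=-N) or multi-range requests are simply left in the generic bucket.

```python
import re
from typing import Optional

# Assumption: "32M" is read as 32 MiB; the task only says "say 32M".
HIGH_RANGE_CUTOFF = 32 * 1024 * 1024

# Matches a single "bytes=START-" or "bytes=START-END" range.
_SINGLE_RANGE = re.compile(r"^bytes=(\d+)-(\d*)$")


def classify_range(range_header: Optional[str], file_length: int,
                   cutoff: int = HIGH_RANGE_CUTOFF) -> str:
    """Return 'norange', 'silly', 'high' or 'range' for one request."""
    if range_header is None or range_header.strip() in ("", "-"):
        return "norange"
    match = _SINGLE_RANGE.match(range_header.strip())
    if match is None:
        # Suffix ranges and multi-range requests are not analysed here.
        return "range"
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else None
    # bytes=0- and bytes=0-N with N >= length-1 both return the whole file.
    if start == 0 and (end is None or end >= file_length - 1):
        return "silly"
    if start >= cutoff:
        return "high"
    return "range"
```

Note that the sketch needs the full length of the file, which is not necessarily the Content-Length of a partial response.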

Event Timeline

ema created this task. Aug 4 2016, 8:40 AM
Restricted Application removed a project: Patch-For-Review. Aug 4 2016, 8:40 AM
ema updated the task description. Aug 4 2016, 9:51 AM

I've collected 30 minutes of frontend GET requests on cp1048 as follows:

varnishncsa -m 'RxRequest:GET' -F '%{Range}i %{Content-Length}o %r' -n frontend
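For reference, each line produced by that format string has three space-separated parts: the Range request header, the response Content-Length, and the request line, with "-" standing in for a missing value. A minimal parsing sketch follows; the record type, the function name and the sample URL in the usage note are hypothetical, and it assumes the header values themselves contain no spaces.

```python
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    range_header: Optional[str]    # None when the client sent no Range header
    content_length: Optional[int]  # None when the response had no Content-Length
    request: str                   # e.g. "GET <url> HTTP/1.1"


def parse_line(line: str) -> LogRecord:
    """Split one '%{Range}i %{Content-Length}o %r' line into its fields."""
    # Assumption: the first two fields contain no spaces, so two splits suffice.
    range_field, length_field, request = line.rstrip("\n").split(" ", 2)
    range_header = None if range_field == "-" else range_field
    content_length = int(length_field) if length_field.isdigit() else None
    return LogRecord(range_header, content_length, request)
```

For example, parse_line("bytes=0- 1234 GET http://upload.example/a.ogv HTTP/1.1") yields a record with range_header "bytes=0-" and content_length 1234.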

Out of 1029692 total requests we got:

|norange    |1026679|99.71% of total
|range      |   3012|0.29% of total
|sillyrange |    540|17.93% of range
|highrange  |     10|0.33% of range

Numbers obtained with:

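# Input fields: $1 = Range request header ("-" if absent), $2 = response Content-Length
/^- / { norange++ }
/^bytes=/ { range++ }
# Digit-pattern approximation of a start offset of roughly 32M or more
/^bytes=[3-9][2-9][0-9][0-9][0-9][0-9][0-9][0-9]/ { high++ }
# Open-ended range starting at byte 0: the whole file
/^bytes=0- / { silly++ }
# bytes=0-N where N is Content-Length - 1
/^bytes=0-[0-9]/ && $1 ~ "bytes=0-"$2-1 { silly++ }
END {
    print "Total " NR
    printf "|norange    |%7d|%.2f%% of total\n", norange, norange * 100 / NR
    printf "|range      |%7d|%.2f%% of total\n", range, range * 100 / NR
    printf "|sillyrange |%7d|%.2f%% of range\n", silly, silly * 100 / range
    printf "|highrange  |%7d|%.2f%% of range\n", high, high * 100 / range
}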
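One caveat on the highrange rule: it matches on digits rather than on the numeric value of the start offset, so offsets whose second digit is 0 or 1 (40000000, for instance) or that start with 1 or 2 and have nine or more digits are not counted. The highrange figures may therefore be somewhat low. The sketch below compares the same character-class pattern with a numeric check; the function names are hypothetical and the 32000000 cutoff is an assumption read off the pattern.

```python
import re

# Same character-class pattern as the awk rule.
HIGH_PATTERN = re.compile(r"^bytes=[3-9][2-9][0-9]{6}")


def high_by_pattern(range_header: str) -> bool:
    """Digit-based test, equivalent to the awk regular expression."""
    return HIGH_PATTERN.match(range_header) is not None


def high_by_value(range_header: str, cutoff: int = 32_000_000) -> bool:
    """Numeric test on the start offset (cutoff is an assumed value)."""
    match = re.match(r"^bytes=(\d+)-", range_header)
    return match is not None and int(match.group(1)) >= cutoff


# Compare the two tests on a few start offsets.
for header in ("bytes=32000000-", "bytes=40000000-", "bytes=100000000-"):
    print(header, high_by_pattern(header), high_by_value(header))
```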

Note that for some media file types, such as Ogg (the container format), it's impossible to know the file's duration in seconds from the header of the file. Browsers that want to show the file's duration in their little duration bar deal with this by seeking to the end of the file with a Range header.
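To make the seek-to-end pattern concrete, here is a rough sketch of the kind of Range header such a client might send. The helper name and the 64 KiB tail size are purely illustrative assumptions; actual browsers choose their own offsets.

```python
def tail_range_header(file_length: int, tail_bytes: int = 64 * 1024) -> str:
    """Build a Range header value asking for the last tail_bytes of a file."""
    # Clamp at 0 so that small files degrade to a whole-file request.
    start = max(file_length - tail_bytes, 0)
    return f"bytes={start}-"


# A 100 MB file yields a start offset far beyond a 32M cutoff.
print(tail_range_header(100_000_000))
```

For large media files the start offset of such a request lands well past the 32M cutoff from the task description, which is why these requests matter for the high-range count.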

Until recently, the workaround was to send the X-Content-Duration header, which we precalculate and store in Swift. Mozilla has deprecated this since Firefox 41, though, so the original behavior of seek-to-end might apply again.

Unlike most of the web, we used to have most of our media in Ogg/Ogv, as it was the only unencumbered format. In the days before WebM it was very common to see these kinds of Range requests on the upload varnishes. X-Content-Duration support in MediaWiki/Swift was added at our request, precisely to avoid these annoyingly expensive Ranges.

These days there's WebM, so hopefully Ogg/Ogv usage has declined. More data about this would probably be useful, though, especially in light of the recent Firefox 41 changes.

This is a 24-hour sample collected on hosts in ulsfo, eqiad and esams.

cp4005 - Total 123158948

|norange    |122827836|99.73% of total
|range      | 331004|0.27% of total
|sillyrange |  61451|18.57% of range
|highrange  |   3054|0.92% of range

cp1048 - Total 97091403

|norange    |96825644|99.73% of total
|range      | 265584|0.27% of total
|sillyrange |  84313|31.75% of range
|highrange  |   1776|0.67% of range

cp3034 - Total 154112575

|norange    |153814761|99.81% of total
|range      | 297688|0.19% of total
|sillyrange |  81431|27.35% of range
|highrange  |   4142|1.39% of range
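To get a single picture across the three hosts, the per-host counts can be summed and the percentages recomputed the same way the awk script does (range as a share of total, sillyrange and highrange as a share of range). A small sketch using only the figures reported above; the variable and function names are mine.

```python
# Counts copied from the three 24-hour samples above.
samples = {
    "cp4005": {"total": 123158948, "range": 331004, "silly": 61451, "high": 3054},
    "cp1048": {"total": 97091403, "range": 265584, "silly": 84313, "high": 1776},
    "cp3034": {"total": 154112575, "range": 297688, "silly": 81431, "high": 4142},
}


def pct(part: int, whole: int) -> float:
    """Percentage of part relative to whole."""
    return part * 100 / whole


# Sum each counter over all hosts.
combined = {
    key: sum(host[key] for host in samples.values())
    for key in ("total", "range", "silly", "high")
}

print("Total", combined["total"])
print("range      %.2f%% of total" % pct(combined["range"], combined["total"]))
print("sillyrange %.2f%% of range" % pct(combined["silly"], combined["range"]))
print("highrange  %.2f%% of range" % pct(combined["high"], combined["range"]))
```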
ema moved this task from Triage to Done on the Traffic board. Aug 9 2016, 2:42 PM

Change 304802 had a related patch set uploaded (by Ema):
cache_upload backend: convert range requests into pass

https://gerrit.wikimedia.org/r/304802

Change 304802 merged by Ema:
cache_upload backend: convert range requests into pass

https://gerrit.wikimedia.org/r/304802

ema closed this task as Resolved. Aug 15 2016, 3:46 PM