
Something in WMF infrastructure corrupts responses with certain lengths
Closed, Duplicate (Public)

Description

While investigating T132123, I discovered that responses whose lengths are at or just below a multiple of 32768 bytes have their final bytes corrupted. A response of exactly 32768 bytes has its final 8 bytes corrupted (in my test, they were replaced with 27 a5 ae 23 8b 7f 00 00). A file 1–7 bytes shorter than such a multiple has its final 7–1 bytes corrupted, respectively.
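The pattern above amounts to: the corrupted bytes are the ones that fall inside the last 8 bytes before a 32768-byte boundary. A minimal Python sketch of that arithmetic, useful for choosing test file sizes; it assumes that lengths just above a multiple are unaffected and that larger multiples (e.g. 65536) behave the same way, neither of which the report states, and the function name is hypothetical:

```python
# Hypothetical model of the reported pattern, not a description of the actual bug.
BLOCK = 32768  # boundary size observed in the report
WINDOW = 8     # largest number of corrupted bytes observed (exact multiple)


def predicted_corrupted_tail(length: int) -> int:
    """Number of trailing bytes the reported pattern predicts are corrupted."""
    if length <= 0:
        return 0
    # Bytes missing before the next multiple of BLOCK (0 for an exact multiple).
    shortfall = (-length) % BLOCK
    if shortfall >= WINDOW:
        # Assumption: responses ending outside the 8-byte window are intact.
        return 0
    return min(WINDOW - shortfall, length)


if __name__ == "__main__":
    for n in (32760, 32761, 32767, 32768, 32769, 65536):
        print(n, predicted_corrupted_tail(n))
```

Under this model, 32768 maps to 8 corrupted bytes, 32767 to 7, 32761 to 1 and 32760 to 0, matching the lengths described above.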

Steps to reproduce:

  1. Create a file whose length is a multiple of 32768 somewhere it will be served by our infrastructure, e.g. using dd if=/dev/zero of=/srv/mediawiki/w/test.txt bs=1 count=32768 on mw1017.
  2. Fetch the file from the public Internet without using compression, e.g. using curl --header 'X-Wikimedia-Debug: backend=mw1017.eqiad.wmnet' -v "https://de.wikipedia.org/w/test.txt"
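To check a fetched copy mechanically rather than by eye, the response body can be saved with curl's -o option and compared against the zero bytes dd wrote in step 1. A minimal Python sketch, assuming a zero-filled test file and a hypothetical script name check_zero_file.py:

```python
import sys


def corrupted_offsets(body: bytes, expected_len: int) -> list[int]:
    """Offsets where a fetched all-zero test file differs from what dd wrote."""
    if len(body) != expected_len:
        raise ValueError(f"expected {expected_len} bytes, got {len(body)}")
    # With a zero-filled file every non-zero byte is corruption; corrupted
    # bytes that happen to be 0x00 cannot be detected this way.
    return [i for i, b in enumerate(body) if b != 0]


def main(path: str, expected_len: int) -> int:
    with open(path, "rb") as f:
        body = f.read()
    bad = corrupted_offsets(body, expected_len)
    if not bad:
        print("no corruption detected")
        return 0
    first = bad[0]
    print(f"{len(bad)} non-zero byte(s), first at offset {first}")
    print("tail from first bad byte:", body[first:].hex(" "))
    return 1


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], int(sys.argv[2])))
```

For example, save the body with curl's -o test.bin alongside the headers from step 2, then run python3 check_zero_file.py test.bin 32768. Note that with the bytes reported above (27 a5 ae 23 8b 7f 00 00), only the first 6 corrupted bytes show up as non-zero, so the count understates the 8-byte corruption.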

I've tested this with backends mw1017, mw1099, and mw2099 (frontend via eqiad) with identical results. Going through different load balancers (e.g. using curl -k --header 'Host: en.wikipedia.org' --header 'X-Wikimedia-Debug: backend=mw2099.codfw.wmnet' -v "https://text-lb.esams.wikimedia.org/w/test.txt") also produces the corruption, but the specific values of the corrupted bytes are different.

Event Timeline

This problem is a recent regression; it didn't happen a month ago.

akosiaris added a project: Traffic.
akosiaris added subscribers: BBlack, ema, akosiaris.

Triaging as high in case this is more widespread. Adding Traffic and subscribers as well.