WDQS: Log `x-ja3n` and `x-is-browser` in nginx
Closed, Resolved (Public)

Description

The CODFW WDQS environment was disrupted today from 12:19-13:31 UTC.

While we don't yet know the root cause of the incident, adding the x-ja3n and x-is-browser headers to the nginx logs should help us identify bot traffic in the future.

Creating this ticket to:

  • Add the above fields to the nginx logs
  • Verify operation

Event Timeline

Change #1201734 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/puppet@production] WDQS: Log `x-ja3n` `x-is-browser` `x-is-client-ip`in nginx

https://gerrit.wikimedia.org/r/1201734

Change #1201734 merged by Bking:

[operations/puppet@production] WDQS: Log `x-ja3n` `x-is-browser` `x-is-client-ip`in nginx

https://gerrit.wikimedia.org/r/1201734

Unfortunately, the above patch only created the file resource. I forgot to include code that actually puts the file resource on the hosts. So we'll need at least one more follow-up patch.

Change #1202733 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] w[cd]qs: Log more provenance headers

https://gerrit.wikimedia.org/r/1202733

Change #1202733 merged by Bking:

[operations/puppet@production] w[cd]qs: Log more provenance headers

https://gerrit.wikimedia.org/r/1202733

Change #1202753 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] wdqs: fix access log formatting, don't log monitoring traffic

https://gerrit.wikimedia.org/r/1202753

Change #1202753 merged by Bking:

[operations/puppet@production] wdqs: fix access log formatting, don't log monitoring traffic

https://gerrit.wikimedia.org/r/1202753

After merging the above patch, I can confirm that nginx has been reloaded and the logs now have the headers we're seeking.

The formatting is not perfect, but it should be good enough to give us some help when WDQS is falling over. Something like the following gives an idea of whether the traffic hitting a host is considered bot-like or not:

  tac access.log | awk -F'X-Is-Browser:' '{ print $2 }' | awk '{ print $1 }' | sort | uniq -c | sort -rn
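
If this kind of tally is needed for more than one header, the one-liner above could be wrapped in a small helper. The following is only a sketch: the function name `tally_header` is made up for illustration, and it assumes, as the one-liner does, that the log writes each header as its label followed by a colon and then the value, with whitespace ending the value.

```shell
#!/bin/sh
# tally_header NAME [FILE...]
# Hypothetical helper: counts the values that follow "NAME:" on each log line
# and prints "count value" pairs, most frequent first.
# Assumption: the access log writes the header as "NAME:" followed by its value.
tally_header() {
    name=$1
    shift
    awk -v label="${name}:" '
        {
            pos = index($0, label)
            if (pos == 0) next                 # header not present on this line
            rest = substr($0, pos + length(label))
            n = split(rest, tok, " ")          # " " splits on runs of whitespace
            if (n > 0) count[tok[1]]++         # first token after the label is the value
        }
        END { for (v in count) print count[v], v }
    ' "$@" | sort -rn
}
```

Usage would look like `tally_header 'X-Is-Browser' access.log`. The same function could be pointed at the ja3n header, using whatever label the log format actually writes for it.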

bking claimed this task.
bking updated the task description.