
Create scripts to estimate Kafka queue size per wiki
Closed, Declined · Public

Description

Until we figure out how to report per-wiki metrics without destroying Graphite (T175952), we need a way to estimate the queue size per wiki. This is fairly easy to do with kafkacat, jq, and some bash scripting.
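Roughly, the idea is to consume the not-yet-processed tail of a job topic and group the events by their meta.domain field. A minimal sketch of that pipeline (the job type, start offset, and message count below are placeholders, not real values):

#!/bin/bash
# Minimal sketch: count backlogged jobs per wiki by reading a slice of the
# job topic and grouping on each event's .meta.domain field.
# 123456 (start offset) and 10000 (message count) are placeholder values.
kafkacat -b localhost:9092 -p 0 \
    -t eqiad.mediawiki.job.refreshLinks \
    -o 123456 -c 10000 2> /dev/null \
  | jq -r '.meta.domain' | sort | uniq -c | sort -rn | head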

Event Timeline

Restricted Application added a subscriber: Aklapper.

Created a simple bash script that can be run on the Kafka boxes to find out the Kafka JobQueue backlog per job type, per wiki. It's available in my home directory on kafka1001, under /home/ppchelko/check_queue.sh. I'll let it bake for a while (it probably needs more parameterization, etc.), and then we can move it to a more appropriate location.

#!/bin/bash
# Estimate the JobQueue backlog for a given job type, optionally broken
# down per wiki domain, by reading the unprocessed tail of the Kafka topic.

if [ "$1" = "-h" ] || [ "$#" -lt 1 ]; then
    echo "Usage: check_queue.sh [ -d domain ] jobtype"
    exit 0
fi

if [ "$1" = "-d" ]; then
    DOMAIN="$2"
    JOB_TYPE="$3"
else
    JOB_TYPE="$1"
fi

# Describe the change-prop consumer group for this job type's topic.
# In the comma-separated describe output, the 4th field is the committed
# offset and the 6th field is the lag (i.e. the backlog).
KAFKA_INFO="$(kafka run-class kafka.admin.ConsumerGroupCommand \
    --describe \
    --group change-prop-${JOB_TYPE} \
    --bootstrap-server localhost:9092 \
    --new-consumer | grep eqiad.mediawiki.job.${JOB_TYPE})"

COMMITTED_OFFSET="$(echo ${KAFKA_INFO} | tr ',' '\n' | sed -n 4p | tr -d '[:space:]')"
BACKLOG="$(echo ${KAFKA_INFO} | tr ',' '\n' | sed -n 6p | tr -d '[:space:]')"

echo "Committed offset for ${JOB_TYPE}: ${COMMITTED_OFFSET}"
echo "Full backlog for ${JOB_TYPE}: ${BACKLOG}"

if [ "x$DOMAIN" = "x" ]; then
    # No domain given: consume the whole backlog and count jobs per domain.
    echo "Top 10 domains by job backlog count:"
    kafkacat -b localhost:9092 \
        -p 0 \
        -t eqiad.mediawiki.job.${JOB_TYPE} \
        -o ${COMMITTED_OFFSET} \
        -c ${BACKLOG} 2> /dev/null | jq -c '.meta.domain' | sort | uniq -ic | sort -n -k 1 -r | head -n 10
else
    # Domain given: count only the backlogged jobs belonging to that wiki.
    echo -n "Job count for ${DOMAIN}: "
    kafkacat -b localhost:9092 \
        -p 0 \
        -t eqiad.mediawiki.job.${JOB_TYPE} \
        -o ${COMMITTED_OFFSET} \
        -c ${BACKLOG} 2> /dev/null | jq -c "select(.meta.domain == \"${DOMAIN}\") | ." | wc -l
fi
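For reference, a couple of hypothetical invocations (the job type and domain are just examples):

# Top 10 wikis by backlog for the refreshLinks job type
./check_queue.sh refreshLinks

# refreshLinks backlog for a single wiki
./check_queue.sh -d en.wikipedia.org refreshLinks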

mobrovac subscribed.

@Pchelolo let's add it to ops/puppet?