
Kafka 0.9's partitions rebalance causes data log mtime reset messing up with time based log retention
Closed, Resolved | Public | 0 Estimated Story Points

Description

Since upgrading to Kafka 0.9 we have observed an odd behavior: right after a broker restart, the mtime of the data log files gets reset to the current time. This badly disrupts time-based log retention, because the Kafka log cleaner uses each data log file's mtime to decide when the file can be deleted from disk.
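
To illustrate why the reset matters, here is a minimal Python sketch of an mtime-based retention check. This is not Kafka's actual code: `is_expired`, the 7-day retention value and the sample timestamps are assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone


def is_expired(segment_mtime: datetime, now: datetime, retention_ms: int) -> bool:
    """Return True if the segment's mtime is older than the retention window."""
    return (now - segment_mtime) > timedelta(milliseconds=retention_ms)


# Hypothetical 7-day retention, expressed in milliseconds like retention.ms.
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000

now = datetime(2016, 6, 4, 9, 17, 54, tzinfo=timezone.utc)
real_mtime = now - timedelta(days=8)     # segment really written 8 days ago
reset_mtime = now - timedelta(hours=1)   # same segment, mtime reset by a restart

print(is_expired(real_mtime, now, SEVEN_DAYS_MS))   # True: would be deleted
print(is_expired(reset_mtime, now, SEVEN_DAYS_MS))  # False: kept for 7 more days
```

With the mtime reset, a segment that should already have been deleted looks brand new, so disk usage keeps growing until the retention window elapses again.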

We sent an email to kafka-users@; the discussion is tracked in this email thread.

One user pointed us to the following upstream bug: https://issues.apache.org/jira/browse/KAFKA-1379

The only workaround proposed so far is to set the retention.bytes configuration option, which caps the maximum size of a topic partition. We are currently using a similar trick, but with retention.ms:

https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Kafka/Administration#Temporarily_Modify_Per_Topic_Retention_Settings

This Phabricator task is meant to track upstream changes and to decide how we should handle Kafka restarts from now on.

Event Timeline

After the cluster restart, kafka1022 doesn't look good:

elukey@kafka1022:/var/spool/kafka/b/data/webrequest_text-16$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             10M     0   10M   0% /dev
tmpfs           9.5G  146M  9.3G   2% /run
/dev/md3         28G  3.5G   23G  14% /
tmpfs            24G     0   24G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            24G     0   24G   0% /sys/fs/cgroup
/dev/sde1       1.8T  410G  1.4T  23% /var/spool/kafka/e
/dev/sdc1       1.8T  841G  993G  46% /var/spool/kafka/c
/dev/sdg1       1.8T  536G  1.3T  30% /var/spool/kafka/g
/dev/sdh1       1.8T  1.2T  699G  62% /var/spool/kafka/h
/dev/sdf1       1.8T  1.1T  785G  58% /var/spool/kafka/f
/dev/sdd1       1.8T  313G  1.5T  18% /var/spool/kafka/d
/dev/sda3       1.8T  620G  1.2T  35% /var/spool/kafka/a
/dev/sdl1       1.8T  619G  1.2T  34% /var/spool/kafka/l
/dev/sdj1       1.8T  1.1T  777G  58% /var/spool/kafka/j
/dev/sdi1       1.8T  542G  1.3T  30% /var/spool/kafka/i
/dev/sdk1       1.8T  1.4T  472G  75% /var/spool/kafka/k
/dev/sdb3       1.8T  1.6T  240G  87% /var/spool/kafka/b

Checked on the host:

elukey@kafka1022:/var/spool/kafka/b/data/webrequest_text-16$ ls -lht
[...]
-rw-r--r-- 1 kafka kafka 512M Jun  3 08:35 00000000012543942505.log
-rw-r--r-- 1 kafka kafka 137K Jun  3 08:35 00000000012543942505.index
elukey@kafka1022:/var/spool/kafka/b/data/webrequest_text-16$ date
Sat Jun  4 09:17:54 UTC 2016
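
To make this check repeatable, a rough Python sketch that lists the segment files in a partition directory with their mtime-based age could look like the following. `segment_ages` is a hypothetical helper, it assumes segment files end in `.log`, and the demo uses a throwaway directory with made-up file names and ages rather than real broker data.

```python
import os
import tempfile
import time
from pathlib import Path


def segment_ages(partition_dir, now=None):
    """Return (file name, age in hours) for each *.log file, oldest first."""
    now = time.time() if now is None else now
    ages = [
        (path.name, (now - path.stat().st_mtime) / 3600.0)
        for path in Path(partition_dir).glob("*.log")
    ]
    return sorted(ages, key=lambda item: item[1], reverse=True)


# Demo on a temporary directory; names and ages are invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    now = time.time()
    samples = [("00000000000000000000.log", 200), ("00000000000000000100.log", 25)]
    for name, age_hours in samples:
        path = Path(tmp) / name
        path.touch()
        stamp = now - age_hours * 3600
        os.utime(path, (stamp, stamp))  # set atime and mtime explicitly
    demo = segment_ages(tmp, now=now)

for name, hours in demo:
    print(f"{name}  {hours:.1f}h")
```

On a broker one would call `segment_ages` on a real partition directory instead; if the oldest age reported is far below the configured retention, the mtimes have probably been reset.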

A workaround to get safely through the weekend and the next few days would be:

kafka configs --alter --entity-type topics --entity-name webrequest_upload --add-config retention.ms=86400000
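
For reference, retention.ms is expressed in milliseconds, so 86400000 corresponds to 24 hours. A small Python sketch for deriving such values (the `retention_ms` helper is hypothetical, not part of any Kafka tooling):

```python
from datetime import timedelta


def retention_ms(**duration) -> int:
    """Convert a duration (days=, hours=, ...) to whole milliseconds."""
    return timedelta(**duration) // timedelta(milliseconds=1)


print(retention_ms(hours=24))  # 86400000
print(retention_ms(days=4))    # 345600000
```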

Mentioned in SAL [2016-06-04T09:38:07Z] <elukey> Lowering down temporarily the Analytics kafka upload retention time to 24h to free space (T136690)

Mentioned in SAL [2016-06-04T09:38:22Z] <elukey> Lowering down temporarily the Analytics kafka upload retention time to 24h to free space (T136690)

Mentioned in SAL [2016-06-04T09:47:50Z] <elukey> removed temporary Analytics Kafka upload retention override (T136690)

Milimetric triaged this task as Unbreak Now! priority. Jun 6 2016, 4:40 PM
Milimetric edited projects, added Analytics-Kanban; removed Analytics.

Mentioned in SAL [2016-06-08T08:15:38Z] <elukey> lowering down webrequest_text kafka topic retention time from 7 days to 4 days to free disk space (T136690)

Mentioned in SAL [2016-06-08T08:45:33Z] <elukey> removed temporary retention override for kafka webrequest_text topic (T136690)

Somebody was able to narrow down the problem in https://issues.apache.org/jira/browse/KAFKA-3802, identifying the exact code change that seems to have caused the issue!

Change 293270 had a related patch set uploaded (by Elukey):
Limit the maximum broker topic log size to 10TB.

https://gerrit.wikimedia.org/r/293270

Mentioned in SAL [2016-06-08T16:02:54Z] <elukey> temporary set a 10TB upperbound to the Kafka webrequest_text topic to free space (T136690)

Mentioned in SAL [2016-06-09T15:31:43Z] <elukey> added topic override retention.bytes=536870912000 to Kafka webrequest_text (T136690)
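
For reference, the retention.bytes value in the override above is 500 GiB written out in bytes; retention.bytes applies to each partition of the topic, not to the topic as a whole. A quick Python check (the `gib_to_bytes` helper is hypothetical):

```python
GIB = 1024 ** 3  # bytes in one GiB


def gib_to_bytes(gib: int) -> int:
    """Convert a size in GiB to bytes, the unit retention.bytes expects."""
    return gib * GIB


retention_bytes = gib_to_bytes(500)
print(retention_bytes)         # 536870912000
print(536870912000 // GIB)     # 500
```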

Change 293270 merged by Elukey:
Limit the maximum Kafka topic partition size to 500GB.

https://gerrit.wikimedia.org/r/293270

elukey lowered the priority of this task from Unbreak Now! to Medium. Jun 13 2016, 9:37 PM

https://gerrit.wikimedia.org/r/293270 worked fine and we are now automatically limiting Kafka topic partitions to 500GB. The next step is to keep following up in https://issues.apache.org/jira/browse/KAFKA-3802 for a more permanent fix.

Milimetric set the point value for this task to 0. Jun 28 2016, 4:02 PM