Hi,
Can you please create (or point how to create) the mediawiki.httpd.accesslog discussed in the parent task on both kafka-logging clusters as well as the related ingestion and dashboard?
Expected volume is 10-15k messages per second.
Thanks :)
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | None | | T198901 Migrate production services to kubernetes using the pipeline |
| Open | None | | T238770 Deploy MediaWiki to Wikimedia production in containers |
| Open | None | | T238771 Get production MW-land images built and published |
| Duplicate | None | | T238773 Create initial production MW-land images with blubber |
| Open | None | | T238747 Migrate www.wikipedia.org (and other www portals) to be its own service |
| Resolved | | akosiaris | T238774 Provide the official production base images for Wikimedia use |
| Resolved | | Joe | T265324 Create the base container images for running MediaWiki in a production environment |
| Resolved | | Clement_Goubert | T265876 Logging options for apache httpd in k8s |
| Resolved | | Clement_Goubert | T324439 New mediawiki.httpd.accesslog topic on kafka-logging + logstash and dashboard |
Thank you for reaching out @Clement_Goubert! Re: topic creation, IIRC it is open (i.e. the topic will be auto-created on first push). As for adding ingestion, the relevant file is modules/profile/manifests/logstash/production.pp, which has examples of how to configure new topics to be picked up by logstash (cc @colewhite for confirmation). Hope that helps!
As noted in the parent task, one important piece of information I forgot in the task description is the expected volume: 10-15k messages per second.
From irc discussion:
I think it's probably better if we keep each partition at a max of 2-3k msg/s, but we'd also need to consider the number of brokers and the downstream consumers. We should have the same number of partitions on each broker (so traffic in/out is spread evenly), and the consumers should be able to leverage the high number of partitions (like having one thread/process for each partition). Logstash should be fine without a lot of fine tuning, but Cole will likely have more insights on that pipeline.
I imagine the partition count can't be set on first push with open topic creation. That's not a problem since it can be changed dynamically, but we should start with a reasonable default to avoid a costly topic rebalancing in the future.
The volume recommendation is apparently ~2-3k msg/s per partition, so we may want 5 partitions, before considering broker balance and consumer spread.
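As a back-of-the-envelope check on that sizing (the throughput figures are from this thread; rounding up to a multiple of the broker count is my own assumption, to keep partitions spread evenly across brokers):

```python
import math

def suggested_partitions(peak_msgs_per_sec, per_partition_msgs_per_sec, brokers):
    """Minimum partitions to stay under the per-partition rate, rounded up
    to a multiple of the broker count so partitions spread evenly."""
    minimum = math.ceil(peak_msgs_per_sec / per_partition_msgs_per_sec)
    return math.ceil(minimum / brokers) * brokers

# Expected peak is 15k msg/s, target ~2-3k msg/s per partition,
# kafka-logging has 3 brokers per datacenter.
print(suggested_partitions(15_000, 2_500, 3))  # 6
```

With the 3-broker clusters we have, both the conservative (2.5k/partition) and relaxed (3k/partition) targets round up to 6 partitions.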
The topic can be created with the number of partitions we want through this command on a kafka node:
kafka topics --create --topic mediawiki.http.accesslog --partitions 5 --replication-factor 3
@Joe It may be better to create it that way before starting to send messages to it, rather than rebalance later? In any case, we can wait for @colewhite's opinion.
I'm not a kafka expert, but this seems like a reasonable place to start. Pre-creating the topics is definitely the way to go.
At the beginning, we should configure logstash to consume from the topic and store only 1% of the logs. Once we have data on actual volume and figure out data retention requirements, we can then raise the threshold to meet the need.
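One way to keep only a sample is logstash's `drop` filter, which supports a `percentage` option. A minimal sketch, assuming the events are tagged with a type we can match on (the conditional and field name here are placeholders, not the actual pipeline config):

```
filter {
  if [type] == "mediawiki-httpd-accesslog" {
    # Drop 99% of matching events, keeping roughly a 1% sample.
    drop {
      percentage => 99
    }
  }
}
```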
I've dug into it a bit, and we have 3 brokers per datacenter for kafka-logging, so for balance's sake I'll create the topic with 6 partitions.
kafka topics --create --topic mediawiki.http.accesslog --partitions 6 --replication-factor 3
eqiad:
```
cgoubert@kafka-logging1001:~$ kafka topics --create --topic mediawiki.http.accesslog --partitions 6 --replication-factor 3
kafka-topics --zookeeper conf1007.eqiad.wmnet,conf1008.eqiad.wmnet,conf1009.eqiad.wmnet/kafka/logging-eqiad --create --topic mediawiki.http.accesslog --partitions 6 --replication-factor 3
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic "mediawiki.http.accesslog".
```
codfw:
```
cgoubert@kafka-logging2001:~$ kafka topics --create --topic mediawiki.http.accesslog --partitions 6 --replication-factor 3
kafka-topics --zookeeper conf2004.codfw.wmnet,conf2005.codfw.wmnet,conf2006.codfw.wmnet/kafka/logging-codfw --create --topic mediawiki.http.accesslog --partitions 6 --replication-factor 3
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic "mediawiki.http.accesslog".
```
Change 867136 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):
[operations/puppet@production] P:logstash::production: mediawiki-http-accesslog
I've done my best to add sensible logstash::input::kafka, but couldn't find how to configure logstash to only store 1% of the logs. Can you point me towards where that should be set?
Change 867630 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] logstash: heavily restrict mediawiki http accesslog during initial onboarding
Change 867630 merged by Cwhite:
[operations/puppet@production] logstash: heavily restrict mediawiki http accesslog during initial onboarding
Change 867136 merged by Clément Goubert:
[operations/puppet@production] P:logstash::production: mediawiki-http-accesslog
Changed kafka topic retention time to 2 days instead of the default 7.
```
cgoubert@kafka-logging1001:~$ kafka topics --alter --config retention.ms=172800000 --topic mediawiki.http.accesslog
cgoubert@kafka-logging2001:~$ kafka topics --alter --config retention.ms=172800000 --topic mediawiki.http.accesslog
```
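For reference, `retention.ms` is in milliseconds, so the value above works out to exactly two days:

```python
# 2 days expressed in milliseconds, as passed to retention.ms
two_days_ms = 2 * 24 * 60 * 60 * 1000
print(two_days_ms)  # 172800000
```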
Change 880895 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):
[operations/puppet@production] logstash: Fix typo in mediawiki.httpd.accesslog
A typo was made when creating the topics (mediawiki.http.accesslog instead of mediawiki.httpd.accesslog).
The above CR fixes the logstash ingestion side.
For the kafka topics themselves, the topic already existed in eqiad, so I set its retention to two days (same as above), but it doesn't exist in codfw. Creating it with the required replication factor is blocked by kafka-logging2002 being behind b2: https://phabricator.wikimedia.org/T327001
Change 880895 merged by Clément Goubert:
[operations/puppet@production] logstash: Fix typo in mediawiki.httpd.accesslog