
Send Mediawiki Kafka logs to Kafka jumbo cluster with TLS encryption
Closed, Declined · Public

Description

With the planned switch of traffic from eqiad to codfw we will start shipping these log messages over the internet. As they contain PII (the user queries and IP addresses, but not names), we should look into encrypting the traffic. It looks like modern versions of Kafka support TLS encryption.

The library we use in PHP would have to be adjusted to support TLS; it currently uses a direct fsockopen call. We can likely abstract the stream out into plaintext and TLS variants. PHP supports TLS sockets via the stream API, roughly as follows:

// Illustrative example; the port number and certificate path are placeholders.
$context = stream_context_create();
stream_context_set_option($context, 'ssl', 'local_cert', '/path/to/keys.pem');
$socket = stream_socket_client('tls://'.$host.':443', $errno, $errstr, 30, STREAM_CLIENT_CONNECT, $context);
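
A minimal sketch of what the plaintext/TLS split could look like, assuming a hypothetical helper (the function and parameter names are illustrative, not the actual client code):

function openKafkaStream($host, $port, $useTls) {
    // Choose the transport: plain TCP today, TLS once the brokers support it.
    $scheme = $useTls ? 'tls' : 'tcp';
    return stream_socket_client("$scheme://$host:$port", $errno, $errstr, 30,
        STREAM_CLIENT_CONNECT, stream_context_create());
}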

Event Timeline

EBernhardson raised the priority of this task from to Needs Triage.
EBernhardson updated the task description.
EBernhardson added subscribers: EBernhardson, Ottomata, Gehel.

TLS support arrived in Kafka 0.9, and we aren't planning on upgrading soon. Depending on your volume, it may be possible to use the main (non-analytics) Kafka cluster in each DC to produce this data. This would let you avoid cross-DC produces, which aren't recommended anyway. The data would then be mirrored to the analytics-eqiad Kafka cluster via MirrorMaker.

Which reminds me, we need to look into encrypting cross-DC MirrorMaker traffic. We can probably do that with IPsec, like we do for the cross-DC produces from the caches now.

Hm.

If MirrorMaker won't work, perhaps we can do app server IPsec? Oof, doesn't sound nice. Will poke @BBlack about it.
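
For illustration only, keeping produces within the local DC might look roughly like this on the PHP side (the broker names and the $localDc variable are hypothetical):

// Hypothetical: produce to the same-DC main cluster; cross-DC replication
// is then MirrorMaker's job, not the producer's.
$brokersByDc = [
    'eqiad' => 'kafka-main-eqiad.example.org:9092',
    'codfw' => 'kafka-main-codfw.example.org:9092',
];
$brokers = $brokersByDc[$localDc];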

Milimetric triaged this task as Medium priority. Feb 11 2016, 6:05 PM
Milimetric set Security to None.
Milimetric moved this task from Incoming to Event Platform on the Analytics board.

The current volume for CirrusSearch logging alone peaks at ~3k messages/s. This might increase greatly when the API team starts using the same code path for logging API feature usage. I'm not sure what their volume will be, but I think it will be similar to the api.log on fluorine, which at this particular moment (probably not the busiest time of day) is doing ~5k/s. If I had to guess, expecting a peak of 15k logs per second would be reasonable.

Can the local Kafka instances in codfw handle 15k/s on top of what they currently do?
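
As a rough, illustrative sizing, assuming an average message size of about 1 KB (an assumption, not a measurement): 15,000 msg/s × 1 KB ≈ 15 MB/s ≈ 120 Mbit/s of produce traffic, before replication overhead.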

@EBernhardson, time has passed and this task is now a child of https://phabricator.wikimedia.org/T152015, so we probably need to discuss next steps for encrypting traffic from MW to the new "Analytics" Kafka cluster (it will be called Jumbo and will be more general purpose :).

We'll offer two ports initially, one for plaintext traffic and one for TLS, so nothing should change immediately, but the sooner we discuss this the better.
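
For the TLS port, the client would also need to verify the broker certificate against our CA; a minimal PHP sketch, where the 9093 port and the CA path are assumptions rather than confirmed values:

// Hypothetical: connect to the TLS listener and verify the broker certificate.
$context = stream_context_create([
    'ssl' => [
        'cafile' => '/path/to/internal-ca.pem', // assumed CA bundle location
        'verify_peer' => true,
    ],
]);
$socket = stream_socket_client("tls://$host:9093", $errno, $errstr, 30,
    STREAM_CLIENT_CONNECT, $context);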

Ottomata renamed this task from "Look into encrypting logs sent between mediawiki app servers and kafka" to "Send Mediawiki Kafka logs to Kafka jumbo cluster with TLS encryption". Jan 17 2018, 8:59 PM

Change 404870 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[mediawiki/vagrant@master] Update Kafka to 1.0 with SSL support

https://gerrit.wikimedia.org/r/404870

Change 404870 merged by Ottomata:
[mediawiki/vagrant@master] Update Kafka to 1.0 with SSL support

https://gerrit.wikimedia.org/r/404870

OK, the new Kafka Jumbo cluster is ready with TLS support. I think we need to change the PHP Kafka clients if we want to support TLS properly.
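
If the client were switched to the php-rdkafka extension (an assumption; this task doesn't settle on a client library), TLS would mostly be a matter of librdkafka configuration, roughly:

// Hypothetical sketch using php-rdkafka; the broker host/port and CA path are assumptions.
$conf = new RdKafka\Conf();
$conf->set('metadata.broker.list', 'kafka-jumbo1001.eqiad.wmnet:9093');
$conf->set('security.protocol', 'ssl');
$conf->set('ssl.ca.location', '/path/to/internal-ca.pem');
$producer = new RdKafka\Producer($conf);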

We'll go ahead and move MediaWiki over to Jumbo before figuring out the TLS piece.

mforns raised the priority of this task from Medium to Needs Triage. Apr 16 2018, 4:15 PM
mforns moved this task from Kafka Work to Deprioritized on the Analytics board.

If we do this, it will be from eventgate-analytics, not from MediaWiki.