Fri, Jan 19
HMMM. If this is JSON data and the schema is consistent, we could use JSONRefine to build the table, rather than doing all those Hive table/Oozie job steps.
Hm, this sounds right to me. The question now is: why is the processor process restarting?
Wow nice etherpad plan, <3
Thu, Jan 18
Hm, could we possibly use EventLogging (or similar?) system for this? Incoming valid EventLogging data goes to a Kafka topic anyway. The data would then be available in Hive/Hadoop/Spark for historical querying (although we'd have to whitelist it from purging). The Kafka topic could then be consumed by some process that would then somehow emit to (or be pulled from) Prometheus. Perhaps a streaming aggregator of some kind? The proposed Stream Data Platform program (final name still TBD) next year might make this kinda stuff way easier.
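To make the "streaming aggregator" part of that idea concrete, here's a toy Python sketch. The Kafka consumption itself is omitted, and the metric name, label, and schema field are all illustrative assumptions, not anything that exists today:

```python
from collections import Counter

def aggregate(events):
    """Count valid EventLogging-style events per schema (a toy stand-in
    for a streaming aggregator that a Prometheus exporter could scrape)."""
    return Counter(e["schema"] for e in events if "schema" in e)

def to_prometheus_text(counts):
    """Render the counts in the Prometheus text exposition format."""
    lines = ["# TYPE eventlogging_events_total counter"]
    for schema, n in sorted(counts.items()):
        lines.append('eventlogging_events_total{schema="%s"} %d' % (schema, n))
    return "\n".join(lines)

# Hypothetical events, as they might arrive from the Kafka topic.
events = [{"schema": "Edit"}, {"schema": "Edit"}, {"schema": "PageView"}]
print(to_prometheus_text(aggregate(events)))
```

A real version would consume from the Kafka topic continuously and expose the metrics over HTTP for Prometheus to pull, but the aggregation shape would look roughly like this.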
Another question: does Kafka use a different port for its TLS service?
The first step on the frack side is to whitelist the new hosts at the firewalls. Can you point me to the list, and I'll add a Phabricator task?
deployment-kafka01 has been deleted.
deployment-kafka03 has been deleted.
I'm guessing we're talking about a new pool of Kafka hosts.
Yup! Mostly just changing settings and bouncing the kafkatee instances, but we'll have to coordinate it. If y'all use any of kafkatee's offset storage features, we'll have to wipe those and start with new offsets.
Wed, Jan 17
@Jgreen FYI, we'll need to coordinate this soon :)
If all is still well tomorrow, I will delete the analytics instances in deployment-prep.
Yeehaw, FYI, all Kafka clients have been ported from analytics to jumbo in deployment-prep in Cloud VPS. EventLogging was a breeze there.
I'd prefer to pin the version in puppet, then restrict it everywhere for SCB. If we fix up that patch to something more acceptable, would you be ok with that @mobrovac?
Tue, Jan 16
Do we still want to do this?
It is also installed on cp1008.wikimedia.org (cache canary) and used by varnishkafka there.
We've also got librdkafka 0.11 backported for Jessie in our apt repo now. I don't see any blockers to using it; I've tested EventStreams with it locally.
BTW, +1 for this. It'd be especially cool if we applied the same puppet profile in labs and got the same grafana dashboards there.
Ah, the doc was incorrect, analytics-users gives access to both stat1004 and stat1005. Just updated the doc.
K cool, sounds good :)
I suppose a restart without a config check would be dangerous, right? So just changing the subscribe behavior of the Puppet service resource isn't quite right. Should we add an exec that does something like configtest && restart with refreshonly, and then notify that exec on config change?
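A rough Puppet sketch of that exec-with-refreshonly idea. All the paths, commands, and resource names here are hypothetical, just to illustrate the shape:

```puppet
# Hypothetical sketch only: command paths and resource names are illustrative.
exec { 'myservice-configtest-restart':
    command     => '/usr/local/bin/myservice-configtest && /usr/sbin/service myservice restart',
    refreshonly => true,
}

file { '/etc/myservice/myservice.conf':
    ensure => file,
    source => 'puppet:///modules/myservice/myservice.conf',
    # Notify the exec instead of the service, so a restart only happens
    # after the config test passes.
    notify => Exec['myservice-configtest-restart'],
}
```

With refreshonly => true, the exec is a no-op on normal runs and only fires when the config file resource notifies it, and the && means a failed config test short-circuits the restart.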
Just curious, why not use git fat? We have a git-fat store available already, and it can be used by scap:
Just to confirm, since this specifically mentions 'pageviews' not 'webrequests', it is likely that analytics-users will be sufficient. Aggregated pageviews are generally public data.
I believe just analytics-privatedata-users would be appropriate for this access.
analytics-privatedata-users and researchers is probably appropriate here.
Mon, Jan 15
Thu, Jan 11
Oook, I've set this [restricted certpath algorithms] on all jumbo Kafka brokers.
Great, that'll do just fine! Assigned to @faidon for approval.
/usr/lib/bigtop-utils/bigtop-detect-javahome seems to favor Java 7 over Java 8.
Strange that it favors Java 7 even if update-java-alternatives chooses Java 8. Hm.
Here's a Q:
event.data in Python 2 is an instance of unicode.
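For reference, this is the Python 2 vs. 3 difference being discussed: decoded JSON string values are unicode under Python 2 and str under Python 3. A minimal illustration (the event dict here is made up):

```python
import json
import sys

# JSON string values decode to `unicode` under Python 2 and `str` under
# Python 3; pick the right text type for the running interpreter.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 -- only defined under Python 2

# Hypothetical event payload, standing in for event.data.
event = json.loads('{"data": "caf\\u00e9"}')

assert isinstance(event["data"], text_type)
```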
Hm, you are right.
Wed, Jan 10
Oook, I've set this on all jumbo Kafka brokers. @BBlack anything else?
Does that mean SHA1 is disabled, except in cases where the root cert of the chain is in the JDK's default CA store (e.g. the list of public CAs)?
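For context, the JDK controls this via the jdk.certpath.disabledAlgorithms property in java.security. My understanding (worth double-checking against the JDK docs) is that the jdkCA qualifier applies the SHA1 restriction only to chains that terminate at a trust anchor in the JDK's bundled cacerts store, so chains anchored at a privately-installed CA are unaffected. The shipped default looks roughly like:

```
jdk.certpath.disabledAlgorithms=MD2, MD5, SHA1 jdkCA & usage TLSServer, \
    RSA keySize < 1024, DSA keySize < 1024, EC keySize < 224
```

So under this reading, a SHA1 cert issued by our own internal CA would still validate, while SHA1 certs chaining to a public CA in cacerts would be rejected for TLS server usage.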
Tue, Jan 9
OO I have done some research!
If we do celery workers, it will be as a different task.