Fri, Jan 18
I don't think writing from Hadoop directly to M2 master is a good idea. But it is not really my call.
Thu, Jan 17
I'm opening this task now, as we are starting to design some new schemas as part of our goals for Q3 2018-2019. We plan to migrate the Mediawiki monolog avro schemas and events over to JSON Schema and use the new EventGate stream intake service. T214080: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate tracks this work.
to have a daemon on the mysql hosts
To clarify, it is unlikely these scripts would run on the mysql servers themselves. If not Hadoop, they'd run somewhere else, either a Ganeti instance as Dzahn suggests, or perhaps colocated with the recommendation-api service.
FYI, the symlinked (or copied, or packaged .tgz) chart is necessary in the charts/ dir. Leaving it as a symlink since it is there for development.
We should have a clear separation of concerns: while the Hadoop cluster is in charge of computing the data, the task of updating the db needs to live on the service side.
@bmansurov I think we can eventually figure out a way to get your dump files out of analytics to somewhere that can access mysql. Where and how to run your importer script is a different question that we probably need to talk with SRE about. Could we automate running it on the boxes where your service actually runs? Not sure.
nope! def not. must be some super legacy thang.
@Pchelolo, so aside from the eventual HTTP based schema registry idea, we will still need (at least) one more git schema repository for analytics. This repo should use the same CI pipeline we build for event-schemas, but more people will have commit and merge access to it.
In our discussion yesterday, we mentioned both git lfs and swift as options for this. I turned that idea down, but it seems it has been partially explored by RelEng. We should at least look into it.
For the development phase, I'll use the wmfdebug image. For prod deployment (outside of staging k8s), we'll build the schema repo clone into the app image itself.
Hm, well, I did consider doing this for final prod deployment.........yar ok. Nevermind. I'll DO IT!
Yar ok. I don't really want to put the schemas into EventGate, soooo I'll make a deploy repo after all! :)
Wed, Jan 16
@fselles, I'm not able to get the requirements.yaml repository field to work. The only reason my symlink works, I think, is that the dependency is looked for in charts/ by default. No matter what I put for repository, if I don't have the symlink in charts/, I get
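For reference, this is the kind of requirements.yaml stanza I've been trying (chart name, version, and path are illustrative, not our actual chart):

```yaml
dependencies:
  - name: eventgate
    version: 0.0.1
    # A file:// repository should let helm resolve a chart from a local
    # path after `helm dependency update`, but so far only the symlink
    # in charts/ has actually worked for me.
    repository: "file://../eventgate"
```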
That works for a java developer, but nobody else (including Quarry). Quarry is a Python app which is why I was looking for support in their python client.
I was trying to figure out how this will actually look for the end user
Tue, Jan 15
Kafka doesn't support SRV. Hence my Round Robin DNS patch. After more discussion with @BBlack, I think I've decided to abandon this idea and just hardcode the Kafka brokers for now. @akosiaris mentioned that they have identified this problem (config management in helm charts) in other areas too, so I'll just hardcode for now and hope for a better future.
Tomorrow (Jan 15) we have a meeting with some SRE folks to revisit this. We've got the cloud-analytics Hadoop and Presto cluster up and running in Cloud VPS (thanks Andrew!). But as Brian says
No no for me, all I want is an alias for the list of Kafka brokers in a given Kafka cluster. I don't need any DC failover stuff. Perhaps discovery is not the right word here. Round Robin DNS might be enough for me.
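To sketch what I mean (the hardcoded hostname below is a hypothetical placeholder, not a real broker): a round-robin DNS alias is enough because the client can just expand the A records behind one name into a bootstrap list, with the hardcoded list as the fallback we'd actually ship in the chart values for now:

```python
import socket

def expand_brokers(alias, port=9092):
    """Expand a round-robin DNS alias into a Kafka bootstrap broker list.

    gethostbyname_ex returns all A records behind the alias, so a single
    DNS name can stand in for the whole broker set; no SRV support needed.
    """
    _, _, addresses = socket.gethostbyname_ex(alias)
    return sorted("%s:%d" % (addr, port) for addr in addresses)

# The fallback for now: a hardcoded list in the helm chart values
# (hostname below is hypothetical).
HARDCODED_BROKERS = ["kafka-main1001.example.wmnet:9092"]
```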
Yeah, this isn't the first time we've had this problem. It isn't actually that easy to solve, because the Kafka consumer's position doesn't advance if there are no messages in the topic, so we have no way of knowing whether there are simply no messages or the consumer is stalled/broken.
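A minimal sketch of the disambiguation we'd need, assuming a monitor can sample the broker-side log-end offset alongside the consumer's committed position (names and logic here are illustrative, not an existing tool):

```python
def classify_consumer(prev, curr):
    """Compare two (log_end_offset, consumer_offset) samples taken some
    interval apart and guess the consumer's state.

    Consumer position alone is ambiguous: a position that isn't moving can
    mean either "no new messages" or "consumer is stuck". Only the
    broker-side log-end offset disambiguates the two.
    """
    end_grew = curr[0] > prev[0]
    consumer_grew = curr[1] > prev[1]
    if not end_grew and not consumer_grew:
        return "idle"       # nothing produced, nothing to consume
    if end_grew and not consumer_grew:
        return "stalled"    # messages arriving, consumer not advancing
    return "consuming"
```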
Mon, Jan 14
Hm. In the Kafka main clusters, we handle bursts of around 1000 msgs/sec on a single-partition topic.
Analytics is not planning on doing anything like this. If we were to do this, I think it should be a larger effort between RelEng and the SRE/Service Operations team.
Thanks for filing Tilman, I'm refining this data now, and Oozie is scheduling the jobs now:
Fri, Jan 11
Seems like it would work, but it doesn't look like this provides much beyond the different variants in the blubber.config files. Could the stage and directory keys just be built into the variant config? Or, does that couple the blubber format to our CI pipeline in a way we don't want?
Wed, Jan 9
@Capt_Swing The issue is the presence of the .my.cnf file in your home dir on stat1006. It's being read by default and overriding the research-client.cnf conf file. Move or rename it out of the way and it will work!
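The layering behaves like this (a minimal configparser sketch of the override, with illustrative file contents; mysql reads ~/.my.cnf after the extra defaults file, so the later file wins for any option both define):

```python
import configparser

def merged_client_options(*option_files):
    """Merge [client] sections in read order; files read later override
    earlier ones, mirroring how ~/.my.cnf overrides research-client.cnf."""
    cfg = configparser.ConfigParser()
    for contents in option_files:
        cfg.read_string(contents)
    return dict(cfg["client"])

# Illustrative contents, not the real credential files:
research = "[client]\nuser = research\npassword = secret\n"
my_cnf = "[client]\nuser = capt_swing\n"  # read later, so it wins

print(merged_client_options(research, my_cnf)["user"])  # capt_swing
```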
OK cool, so then it sounds like at least for event schema stuff, adding partial block info to the event is not a blocker for partial blocks deployment; we can do that whenever we get around to it/need it.
Hi all! There are a lot of tickets about this stuff, and I'm just learning about it from Moriel's email so forgive me if I don't have a full understanding. Just an FYI, this will affect both eventbus and change-prop stuff. The EventBus extension is emitting user/blocks-change events on the BlockIpComplete hook. The user/blocks-change schema includes the blocks as they were before the change and as they are after.
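For illustration only (these field names are a hypothetical sketch, not the actual user/blocks-change schema), carrying both the prior and new block state lets a consumer diff the two to see what changed:

```python
# Hypothetical sketch of a user/blocks-change event; field names are
# illustrative only, not the real schema.
event = {
    "meta": {"topic": "mediawiki.user-blocks-change"},
    "user_text": "ExampleUser",
    "prior_state": {"blocks": {"name": False, "email": True}},
    "blocks": {"name": False, "email": True, "partial": []},
}

# A consumer comparing the two states to detect what changed:
changed = {
    k: (event["prior_state"]["blocks"].get(k), v)
    for k, v in event["blocks"].items()
    if event["prior_state"]["blocks"].get(k) != v
}
```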
Tue, Jan 8
I think kubectl describe pod is the most helpful. I'm onto something great here!
Ok, I'm pretty close. I've got the charts deployed in minikube via helm. It seems my setup isn't quite right though, I think the image doesn't start properly. Got any tips for debugging? Not finding much help via helm / kubectl commands. (Tried both kubectl logs and attach)
OH! Never mind, I see: that isn't an instruction, but a summary of what we are doing!
@Pchelolo, hmmmm. eventgate in prod will need to have the event-schemas repo(s) available somehow. I'm working on getting the docker images and helm charts figured out. For the initial deployment prototype, I'm considering just making a blubber and CI based docker image that will be included in the eventgate docker image somewhere. This will work for a trial, but will be a bit inflexible, since it will mean that a new schema will require a rebuild and redeploy of eventgate.
Ok thanks @sbasset. I've brought the form template over to this task and filled it out. @charlotteportero let me know if I'm missing anything.
Hm, why was '0.26.3' in frozen-requirements.txt if you cherry picked it? It should be pointing at the github fork link. https://gerrit.wikimedia.org/r/#/c/analytics/superset/deploy/+/481054/1/frozen-requirements.txt
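For comparison, the two pinning styles in a frozen requirements file (the package name is assumed from context, and the fork URL and ref are hypothetical placeholders):

```text
# Plain PyPI pin: resolves the upstream release and drops the cherry-pick.
superset==0.26.3

# Git pin against the fork: keeps the patched code.
git+https://github.com/example/incubator-superset@patched-ref#egg=superset
```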
@charlotteportero I don't think any of us knew there was a security review form. Can you link to it please?
the "Pull-request to Change"
Mon, Jan 7
Hmm, I'm not sure if it would work, but it might. We should be able to at least install the python 3.6 binary (we have deb packages for it now :) ). I don't think anyone would mind if we just did that, and you could try it on your own (since you are an ADVANCED SWAPPER :D )
If something like ^ worked nicely, my pro github arguments would be all moot and I'd be fine with gerrit all the way. :)
Are you likely to get many contributors?
Q: would blubber's variants be enough to support the wsgi vs celery use case?
Most of the time I'm unopinionated and am all for gerrit, especially when the target audience of the software is users who are (or should be) already used to using gerrit.
The next time we build/deploy EventStreams, KafkaSSE from diffusion will no longer be used. Can we close this? Should we delete the repo from diffusion?