Aug 28 2020
Thanks for clarifying. A correction from my end: the extra dimensions would actually take significantly less than 6 hours, since they would not be included as part of Refine but as part of the augmentation job we would be adding, which would run as soon as the hourly events are available in Hive.
@CDanis that makes sense. In that case, what we propose is adding an intermediate data augmentation step that adds these dimensions about 6-7 hours after the data is ingested in real time, with the intention of later adding a streaming job that adds them in real time.
Aug 27 2020
@JAllemandou and I just had a chat about these changes. Before proceeding with any of the approaches Joseph described above, @faidon: how important is it that this dataset remains real time? Nuria mentioned DoS prevention, so presumably it's important to keep it real time. In any case, this task will require adding a data augmentation step before ingesting into Druid, so using Druid lookups to get the region/site dimension won't be necessary.
Aug 24 2020
Aug 21 2020
PR created in Github: https://github.com/wikimedia/jsonschema-tools/pull/16
Aug 3 2020
it seems you didn't add the text of the feature request?
Jul 27 2020
Jul 15 2020
OK, now it has more bars, so it doesn't look sad. Closing.
Jul 13 2020
Ping @Ottomata: did we add throttling to the public EventGate instance?
Jun 18 2020
We're not sure this can be removed; let's look into it.
@Tgr can you confirm the correct data is there?
For this site, the Puppet configuration needs to skip TLS deployment.
@jeblad hi! Unfortunately, right now we have a pretty big backlog, and given the relatively low usage of annotations, I don't think we'll be able to dedicate time to this in the near future. But if you're up for it, I'll definitely review and provide timely feedback if you want to submit a CR for it.
One thing to consider is that this is not the final form and feedback is appreciated. @Milimetric gave two pieces of feedback that aren't fully implemented in these mocks but that will be in the final form:
Jun 15 2020
According to the way edits_hourly is defined, the data as presented is correct. This seems more of a visualization problem. Maybe try running a Presto query that explodes the tags array and visualize that instead? If you think this is a problem, we can file the bug upstream.
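A rough sketch of what such a query could look like, assuming edits_hourly has an array-typed `tags` column and a numeric `edit_count` measure (the column names here are illustrative, not confirmed against the actual schema); Presto flattens arrays with CROSS JOIN UNNEST rather than Hive's LATERAL VIEW explode:

```sql
-- Hypothetical Presto sketch: one row per tag, so each tag can be
-- visualized as its own bar. Column names are assumptions.
SELECT t.tag,
       SUM(e.edit_count) AS edits
FROM edits_hourly AS e
CROSS JOIN UNNEST(e.tags) AS t (tag)
GROUP BY t.tag
ORDER BY edits DESC;
```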