Fri, Nov 17
Seems to me that the DROP DATABASE list is correct.
Tue, Nov 14
Here are the wiki pages where the dump's docs have historically lived:
Fri, Nov 10
No, all data will be purged the same.
Thu, Nov 9
We'll undo the change. This looks like a really good argument.
Our Alpha launch will not have internationalization; we are aiming to implement it for our final launch.
From duplicate task:
Actually, no need to invite me. We can follow up on Legal's position after the meeting. Thanks!
Please invite me to the meeting as well. Regardless of the outcome, I'd like to know Legal's point of view on this subject, so that we can apply it to similar scenarios in the future.
Tue, Nov 7
We encountered a couple of difficulties reconciling the way Pivot works with the nature of NavigationTiming measures:
- NavigationTiming's metrics are time measures in milliseconds. They are "inverted", because the lower the value the better, and "bounded", because the minimum is 0. The problem is that NavigationTiming's fields are not required and can be NULL. Druid ingestion transforms NULL values for numerical metrics into 0s. For time measures we cannot count the absence of a metric as 0, because 0 is not a neutral value (it's the "best" value a metric can have). There are ways to work around this in Druid, but in Pivot it cannot be solved easily.
- Raw NavigationTiming metrics are not of much value in Pivot. Pivot will show you the sum of all time measures for a given metric over a given time range, but we are not interested in the absolute sum; we want a percentile value. There are ways to configure average metrics in Pivot using the YAML config file, but those are not scalable and would only provide averages, which are probably not interesting for performance measures.
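To make the first point concrete, here is a minimal Python sketch of how zero-filling missing values drags a "lower is better" metric down. The sample values are made up for illustration, and the zero-filling line only imitates the effect of the NULL-to-0 conversion described above; it is not Druid code.

```python
from statistics import median

# Hypothetical load times in milliseconds; None marks a missing (NULL) field.
samples = [1200, None, 950, None, 1800, None, 1100]

# Imitates the effect of ingestion turning NULLs into 0s.
zero_filled = [0 if s is None else s for s in samples]

# What we would actually want: leave missing values out entirely.
present = [s for s in samples if s is not None]

median_zero_filled = median(zero_filled)  # pulled down by the artificial 0s
median_present = median(present)          # based on real measurements only
sum_present = sum(present)                # the kind of total Pivot would display

print(median_zero_filled, median_present, sum_present)
```

With these numbers the zero-filled median looks better than the real one, even though nothing got faster, and the sum says little about how a typical page load went.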
A couple of things we can try to solve those issues:
- The latest development version of Druid has "approximate histograms" (http://druid.io/docs/latest/development/extensions-core/approximate-histograms.html), which might help in ingesting percentile metrics that can be displayed in Pivot.
- We could pre-compute percentiles in the Scala ingestion job, so that Druid would be able to display them as regular metrics. One drawback of this approach is that we'd be forced to choose a granularity for the pre-computation (e.g. minutely) and the metric would be frozen at that granularity. It would be somewhat contradictory, since Druid is all about aggregating.
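The drawback of the second option can be sketched as follows. This is Python for brevity, not the actual Scala job; the event tuples, the nearest-rank percentile definition, and the function names are all assumptions made for the example.

```python
import math
from collections import defaultdict

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def per_minute_percentile(events, p):
    """Group (epoch_seconds, ms) events by minute and compute one percentile per minute."""
    buckets = defaultdict(list)
    for ts, ms in events:
        buckets[ts // 60].append(ms)
    return {minute: percentile(vals, p) for minute, vals in sorted(buckets.items())}

# Hypothetical (epoch_seconds, load_time_ms) events spanning two minutes.
events = [(0, 100), (10, 200), (20, 300), (50, 5000), (60, 400), (70, 500), (119, 600)]

minutely_p50 = per_minute_percentile(events, 50)
overall_p50 = percentile([ms for _, ms in events], 50)
average_of_minutely = sum(minutely_p50.values()) / len(minutely_p50)

print(minutely_p50, overall_p50, average_of_minutely)
```

The per-minute medians cannot be combined back into the median of the whole period: averaging them gives a different number than computing the median over all events. That is the sense in which the metric would be frozen at the chosen granularity.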
In conclusion, we can look into "approximate histograms", updating our Druid version, etc. to get NavigationTiming into Pivot. But before that, we'd like to have a simpler schema being ingested periodically from Hive to Druid to Pivot. We'll pause this task until we've finished the pipeline that ingests simple Hive schemas, and then resume it to fix these problems.
Mon, Oct 30
Oh, I understand! I didn't know that. Thanks for the explanation.
Then yes, you're totally right, skin = (vector | minerva | other) will be fine.
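For reference, a minimal sketch of that bucketing in Python. The function name is hypothetical, and the lowercasing and the handling of missing values are my own assumptions, not something agreed in the task.

```python
# Skins reported as-is; everything else collapses into "other".
ALLOWED_SKINS = {"vector", "minerva"}

def bucket_skin(skin):
    """Map a raw skin name to one of: vector, minerva, other."""
    # Assumption: normalize case and whitespace, treat a missing value as "other".
    normalized = (skin or "").strip().lower()
    return normalized if normalized in ALLOWED_SKINS else "other"

print(bucket_skin("Vector"), bucket_skin("monobook"), bucket_skin(None))
```

Uncommon skins never reach the stored data this way, so they cannot help narrow an event down to a handful of people.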
Fri, Oct 27
Thanks @pmiazga for pushing this forward.
Thu, Oct 26
We can try to use the same timestamp (minimum between revision timestamp and move timestamp) to correct both the redirect page creation and the original page move event.
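A rough sketch of the idea in Python, with made-up timestamps and hypothetical event dictionaries (the real events live in the history reconstruction job and look different):

```python
from datetime import datetime

def corrected_timestamp(revision_ts, move_ts):
    """Return the earlier of the revision timestamp and the move timestamp."""
    return min(revision_ts, move_ts)

# Hypothetical values where the move was logged slightly before the revision.
revision_ts = datetime(2017, 10, 26, 12, 0, 5)
move_ts = datetime(2017, 10, 26, 12, 0, 3)

shared_ts = corrected_timestamp(revision_ts, move_ts)

# Both events get the same corrected timestamp, so they stay consistent with each other.
redirect_creation_event = {"type": "create", "timestamp": shared_ts}
page_move_event = {"type": "move", "timestamp": shared_ts}

print(shared_ts)
```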
Mon, Oct 23
Oct 18 2017
Awesome, will check for incoming data during the afternoon.
Oct 17 2017
Oct 16 2017
Oct 15 2017
Oct 13 2017
Oct 11 2017
Oct 10 2017
Speaking with the team, we agreed that Schema:Print's skin field is a tricky case, and that we needed to dig a little bit more. So looking deeper into it:
Oct 9 2017
@JAllemandou Will do that :]
Oct 6 2017
You're right, one would have to know which skin I use. For example: if someone suspects that I printed a certain article (and they have access to the Schema:Print data), then they can potentially confirm their suspicion by, e.g., peeking at my laptop and seeing my Wikipedia skin.
I know the possibility of this actually happening is rather microscopic, but the theory is there.
@Jdlrobson Answering your comment on Gerrit:
Could you elaborate on this by giving an example? Why would this (the skin field) be potentially identifying?
I think the skin field could be potentially identifying, because some skins might be uncommon enough that they are used by very few people.
For instance: imagine we have an event collected from Catalan Wikipedia with skin="blah", and there are only 3 people in the world with that skin (me and 2 more). I'm from Barcelona and the other 2 live in New Zealand, so this may indicate that the event was generated by me. This, of course, depends on the frequency (distribution) of the skins.
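One way to check that distribution, sketched in Python. The per-user skin list and the threshold are invented for the example; real numbers would have to come from the actual user preferences data.

```python
from collections import Counter

# Hypothetical skin preference per user (one entry per user).
skins_by_user = ["vector"] * 50 + ["minerva"] * 20 + ["blah"] * 3

# Hypothetical threshold: skins with fewer users than this are treated as risky.
MIN_USERS = 10

counts = Counter(skins_by_user)
rare_skins = sorted(skin for skin, n in counts.items() if n < MIN_USERS)

print(counts, rare_skins)
```

Any skin that shows up in `rare_skins` is a candidate for being dropped or merged into a generic value before the data is stored.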
Oct 5 2017
Oct 4 2017
Oct 3 2017
I get an error when selecting any wiki in the dashboard or detail page:
bundle.js:29171 Uncaught (in promise) SyntaxError: Unexpected token u in JSON at position 0
    at JSON.parse (<anonymous>)
    at new GraphModel (bundle.js:29171)
    at bundle.js:57450
    at <anonymous>
The dashboard and detail page don't show any data.
LGTM! Awesome visuals and code structure. I couldn't find any errors.
Oct 2 2017
BTW, thanks a lot for taking the time and effort to create that puppet change!
Sep 28 2017
This is probably because stat1005 is running on the new Debian Stretch.
Let's change banner_activity_minutely to hyphens and that's that.
Hi @Nirmos, I'm not super familiar with AbuseFilters. Can you please explain the flow that you are seeing and the one that you'd expect to see? Thanks!