Yes, no renames will be needed! We'll find a solution to the array field and implement it soon.
The only caveat is that high-cardinality fields (dimensions) such as pageToken, sessionToken, pageTitle and pageIdSource perform very badly in Druid, so I would blacklist them from Druid ingestion if possible.
Fri, Sep 14
@Nuria, I totally agree that "1-Month" is too short to perceive any change, and the graph becomes totally useless.
On the other hand, "All" has too many data points for this particular metric (even if monthly), and you cannot distinguish the last data points.
You can see the slope, but not follow the recent progress of the metric.
Thu, Sep 13
Hi all :]
Mon, Sep 10
There are also about 20 citationUsage events per minute dropped client-side before sending (about 1% of potential events), due to urlSize constraints.
Resolved! Kudos to Sahil
Let's 1) stop purging, 2) drop all echo tables in the events and events sanitized databases, and 3) start purging again.
Aug 22 2018
Aug 21 2018
Aug 20 2018
As part of this task, we should also add/change a/the puppet cron job to launch this properly.
Aug 14 2018
Aug 13 2018
Should we sort the array before stringifying?
Like: [1,2] and [2,1] generate the same string "[1,2]"?
This would reduce the number of possible string values, and make differently ordered arrays match.
However, if the order in the array has a specific semantics, that would be lost in the stringification.
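To make the trade-off concrete, here is a minimal sketch of the two options. The helper name `stringify_array` and its `sort` flag are hypothetical, and it assumes the array elements are mutually comparable (e.g. all integers or all strings).

```python
import json


def stringify_array(values, sort=True):
    """Serialize a list to a compact JSON string.

    With sort=True, differently ordered arrays produce the same string,
    at the cost of losing any meaning carried by the original order.
    """
    items = sorted(values) if sort else list(values)
    return json.dumps(items, separators=(",", ":"))


print(stringify_array([2, 1]))              # [1,2]
print(stringify_array([1, 2]))              # [1,2]
print(stringify_array([2, 1], sort=False))  # [2,1]
```

Sorting only makes sense for fields where the order is known to be irrelevant, so it would probably have to be decided per field rather than globally.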
Aug 9 2018
Aug 6 2018
Aug 2 2018
You can also reproduce the error just by selecting a monthly time range and then changing to a daily time range.
The selected period is inconsistent with the shown data.
Aug 1 2018
Jul 30 2018
Let's concentrate on the popups fix, and then if we have time, we can work on this. Is this OK?
Jul 27 2018
Jul 20 2018
Jul 19 2018
Makes sense to me!
Jul 18 2018
Jul 16 2018
Already resolved by Sahil and Amit, thaaanks!
Jul 13 2018
I think a minimal amount of errors is expected.
AFAIK we only shorten the source_url, but there are other potentially long fields, like source_title, that combined can still make the event overflow the 2000-char maximum.
So the errors should be a lot less frequent, but not disappear, no?
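A rough way to see why shortening a single field is not enough: the sketch below measures the URL-encoded size of a whole event. It assumes the event is sent as URL-encoded JSON and uses the approximate 2000-char limit mentioned here; the function names and the sample events are hypothetical.

```python
import json
from urllib.parse import quote

MAX_URL_LENGTH = 2000  # approximate limit discussed in this thread


def encoded_length(event):
    """Length of the event once serialized to JSON and percent-encoded."""
    payload = json.dumps(event, separators=(",", ":"))
    return len(quote(payload, safe=""))


def fits(event, limit=MAX_URL_LENGTH):
    """True if the encoded event stays within the limit."""
    return encoded_length(event) <= limit


short_event = {"source_url": "https://example.org/a", "source_title": "Short title"}
# source_url is short here, but a very long source_title still overflows.
long_event = {"source_url": "https://example.org/a", "source_title": "x" * 3000}

print(fits(short_event), fits(long_event))
```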
Jul 12 2018
Base64 includes a-z, A-Z, 0-9, + and /. So all except / are 'legal'. I bet pivot/turnilo URI-encode the base64 string to avoid problems with the /. This would add 2 extra chars for every / (its frequency being 1/64 on average), so a ~3% increase. Theoretically, then, using base64+uriEncoding would be ~17% (not ~20%) shorter than using uriEncoding only.
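The ~3% figure can be checked with a small sketch. It follows the assumption above that only / gets escaped (to %2F); if + were escaped as well, the expected overhead would be roughly double. The random payload is just an illustrative stand-in for real data.

```python
import base64
import random
from urllib.parse import quote

ALPHABET_SIZE = 64
EXTRA_CHARS_PER_SLASH = len("%2F") - len("/")  # 2

# Expected relative growth if "/" appears with frequency 1/64.
expected_overhead = EXTRA_CHARS_PER_SLASH / ALPHABET_SIZE  # 0.03125

# Empirical check on pseudo-random bytes (length is a multiple of 3,
# so the base64 output has no "=" padding).
rng = random.Random(0)
raw = bytes(rng.getrandbits(8) for _ in range(30000))
encoded = base64.b64encode(raw).decode("ascii")

# safe="+" keeps "+" unescaped, so only "/" is percent-encoded here.
escaped = quote(encoded, safe="+")
measured_overhead = len(escaped) / len(encoded) - 1

print(f"expected {expected_overhead:.4f}, measured {measured_overhead:.4f}")
```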
I just merged the change, because it got a +1 from Nuria and a +1 from Chelsy.
We'll deploy that with the next refinery deployment.
I think we can close this task as resolved.
Jul 11 2018
@chelsyx Oh, ok! Sorry for the confusion :P
Yes, I'm aware that the length of the URL may be more than 2000 chars in some extreme cases (e.g. the user selects many languages). But I don't have another solution except putting it into another schema. Do you have any suggestions?
I cannot reproduce this error; it might be a race condition.
Jul 10 2018
When working on T195269, I saw that a new field was added to MobileWikiAppiOSUserHistory: feed_enabled_list. This field is a "2-level" nested object with arrays at its leaves; while theoretically this is supported by the EL pipeline, we might see some issues. A couple of comments on it:
- As MobileWikiAppiOSUserHistory is already blacklisted for MySQL insertion, there will not be problems inserting events for this schema to MySQL and/or sanitizing these events in MySQL.
- However, as this field can potentially become very long, it might contribute to the whole event overflowing the max URL length of approx. 2000 chars. In that case, the events will fail validation in the EL processors. I saw that the subfield names were shortened on purpose, so I assume you are already aware of this.
- Fields with complex types are not supported in Druid, so this schema, as is, cannot be fully imported into Druid (or turnilo).
- I think the schema does not follow the JSON Schema spec when defining the 'ena' and 'dis' sub-fields. I think the [ and ] are not supposed to be there, but I might be wrong.
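For the Druid point, one possible workaround is to flatten the nested object into scalar fields before ingestion. This is only a sketch of the idea, not what the pipeline does: the `flatten` helper, the `_` separator and the sample value (including the "en" key) are hypothetical; only the 'ena' and 'dis' names come from the schema.

```python
import json


def flatten(obj, prefix="", sep="_"):
    """Flatten nested dicts into one level; lists become compact JSON strings."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the field name.
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list):
            # Arrays at the leaves are stringified so the field stays scalar.
            flat[name] = json.dumps(value, separators=(",", ":"))
        else:
            flat[name] = value
    return flat


event = {"feed_enabled_list": {"en": {"ena": [0, 1], "dis": [2]}}}
print(flatten(event))
# {'feed_enabled_list_en_ena': '[0,1]', 'feed_enabled_list_en_dis': '[2]'}
```

Note that stringified arrays can themselves become high-cardinality dimensions, so this does not remove the need to check how such fields behave in Druid.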
Jul 9 2018
Jul 6 2018
- have the popup be the same component for all charts, that receives the data.
If the first problem ("diagonal zoom") takes a lot of time to solve, don't bother.
- keep 3 colors designed for sections: reading, contributing, content (we can use different shades if it looks better)
- remove black border from line charts
- maybe thicken the lines
Jul 5 2018
We could include here also changing the colors of the bar charts, because some of the colors currently used are too strong.
This happens in both bar-chart and line-chart.
This problem was solved by another task already.
Jul 3 2018
For this particular case, let's not whitelist the os_minor field for now. We released the new version about a week ago and I need to verify that we are collecting data from users as expected. After that, I will run another analysis using the data from this table and check the bucket size to see how small it is.
Jul 2 2018
Jun 29 2018
Also forgive me for the late response, your thorough report gave me a lot to think about.
Jun 27 2018
Oh yea, I meant the tiny inconsistency created by the normalization rounding. I added a one-liner to the docs:
Yes, it's better for the map component in Superset to have standard country names, and the names in the original dataset were far from standard (e.g. for Curaçao I found three variations: Curaçao, Curacao and Cura?ao).
Jun 26 2018
Sorry, this change was not meant to be linked to this task... please ignore.
Jun 25 2018
Jun 22 2018
Hey @mpopov, sure I can try.
Jun 21 2018
I'm still a little cautious about adding additional logic to the client, even a simple one. I'm happy to be more aggressive with the cutoff, e.g. 1000 characters, and then see what happens.