Fri, Jun 28
Thu, Jun 27
The README says that if Prometheus doesn't see a metric for 5 minutes, it considers it stale. However, a metric pushed to the Pushgateway will stay there until deleted, so Prometheus will never consider the metric stale when it scrapes the Pushgateway. With Graphite/statsd you push the metric and that's it; if there are no datapoints, the metric will have holes where there haven't been pushes.
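To illustrate, here is a minimal stdlib-only Python sketch of what a push looks like. The hostname, port, job name, and metric name are made up; the Pushgateway's documented HTTP API accepts PUT/POST of text-exposition-format metrics at /metrics/job/<job>, and a DELETE on the same path removes the group:

```python
import urllib.request

# Hypothetical metric and Pushgateway address, for illustration only.
body = b'backup_last_success_unixtime 1561680000\n'  # Prometheus text exposition format

req = urllib.request.Request(
    url='http://localhost:9091/metrics/job/backup_job',
    data=body,
    method='PUT',  # PUT replaces every metric in this job's group
)
# urllib.request.urlopen(req)  # would send the push (needs a running Pushgateway)

# The pushed metric stays in the Pushgateway until explicitly deleted, e.g.:
# urllib.request.urlopen(urllib.request.Request(
#     'http://localhost:9091/metrics/job/backup_job', method='DELETE'))
```

Until that DELETE happens, every Prometheus scrape of the Pushgateway re-reads the same sample, which is why staleness never kicks in.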
Oh, I see! Thanks for the clarification.
Wed, Jun 26
Should we do these changes for the dashboard as well?
I think we should! And everywhere in Wikistats, no?
Maybe we could factor this out into a single place that affects the whole app?
Going by https://stats.wikimedia.org/wikimedia/animations/wivivi/wivivi.html I think 50.3M should probably be 50M? And 50.6M gets shown as 51M?
I think, philosophically, 3 significant digits (50.3M) is more consistent with the fact that we're already simplifying big numbers with K, M, etc. abbreviations.
Right now, we simplify 534208 to 534K (3 significant digits).
If we used only 2 significant digits, 534805 would be simplified to 530K instead, right?
So following this rule, we can apply the same to numbers that acquire a decimal part, no? 50345719 -> 50.3M, 4378452 -> 4.38M
That said... practically, I think both 2-significant-digits and 3-significant-digits are good for the Wikistats2 case.
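For the record, the significant-digit rule described above can be sketched like this (a minimal Python illustration; Wikistats2 itself is JavaScript, and the function name is made up):

```python
from math import floor, log10

def abbreviate(n, sig=3):
    """Abbreviate n with K/M/G suffixes, keeping `sig` significant digits."""
    for suffix, factor in (('G', 1e9), ('M', 1e6), ('K', 1e3)):
        if abs(n) >= factor:
            scaled = n / factor
            # how many decimals are needed to keep exactly `sig` significant digits
            decimals = sig - floor(log10(abs(scaled))) - 1
            rounded = round(scaled, decimals)
            if decimals <= 0:
                return f"{int(rounded)}{suffix}"
            return f"{rounded:.{decimals}f}{suffix}"
    return str(n)

# With sig=3: 534208 -> "534K", 50345719 -> "50.3M", 4378452 -> "4.38M"
# With sig=2: 534805 -> "530K", 50600000 -> "51M"
```

This matches the examples above: 3 significant digits yields 50.3M, while 2 yields 50M.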
@fgiunchedi thanks a lot for the help!
Mon, Jun 24
This is done!
Closing because now the user can choose the chart type anytime.
This data set's size in Druid is 100GB per week.
We can increase it to a month with our current capacity.
Would that be OK?
So, closing then as invalid.
Is there anything in Casey's home folders in HDFS and on the stat/notebook machines that cannot be deleted?
Otherwise, we'll proceed to delete it all.
Jun 21 2019
Jun 14 2019
Merged it, thanks for the clarifications!
Jun 13 2019
@MMiller_WMF no I agree with you, it seems OK to me to keep that information.
Jun 12 2019
I added a design document here: https://docs.google.com/document/d/1gL7igq1AtsbZZL_5lQrAE7ak30lYrhXPPz1s-fdZREM
The questions that I think are still open are marked in orange.
Please, feel free to comment and modify!
Thanks @nettrom_WMF for bringing this up.
If the mentor-mentee relation is already public on wiki and users know that (as @SBisson said), I think it's OK to keep that information in the events!
No need to bucket edit_counts or time since last activity.
Jun 11 2019
Jun 10 2019
@kzimmerman You're right, I think T221338 is not ready yet.
Now, once the data is fixed, we won't need to do anything here: edit_hourly in Hive and edits_hourly in Druid will update automatically.
I'd say it's safe to close, no?
I launched the Oozie coordinator to precompute the edit_hourly table in Hive last Thursday.
And I forgot to launch the other Oozie coordinator, to load it to Druid.
It's running now; if there are no issues, it should be live within the next hour.
Jun 7 2019
I've backfilled those 4 schemas from 2019-04-01 until 2019-05-31.
Backfilling March was sadly not possible, because we don't have the old salt for Q1.
The improvement to saltrotate that keeps that extra salt around for a couple of weeks was introduced after 2019-04-01.
I also didn't backfill June, because we still need to deploy the fix T225178.
After we deploy it, we can backfill the remaining days of June.
To do so, we can execute this command, e.g. on stat1007.eqiad.wmnet (adjust the --until param):
sudo -u analytics spark2-submit \
  --name backfill_refine_sanitize_eventlogging_analytics \
  --class org.wikimedia.analytics.refinery.job.refine.EventLoggingSanitization \
  --files /etc/hive/conf/hive-site.xml,/home/mforns/refine_sanitize_eventlogging_analytics_delayed.properties,/srv/deployment/analytics/refinery/artifacts/hive-jdbc-1.1.0-cdh5.10.0.jar,/srv/deployment/analytics/refinery/artifacts/hive-service-1.1.0-cdh5.10.0.jar \
  --master yarn \
  --deploy-mode cluster \
  --queue production \
  --driver-memory 16G \
  --conf spark.driver.extraClassPath=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:hive-jdbc-1.1.0-cdh5.10.0.jar:hive-service-1.1.0-cdh5.10.0.jar \
  --conf spark.dynamicAllocation.maxExecutors=128 \
  --conf spark.ui.retainedStage=20 \
  --conf spark.ui.retainedTasks=1000 \
  --conf spark.ui.retainedJobs=100 \
  /srv/deployment/analytics/refinery/artifacts/refinery-job.jar \
  --config_file refine_sanitize_eventlogging_analytics_delayed.properties \
  --since 2019-06-01T00:00:00 \
  --until 2019-06-16T00:00:00
This last change is meant for when the netflow data ingestion is fixed, so that ingestion happens periodically, every hour.
Here's the sample data loaded to Druid:
I deleted the old "netflow" datasource from Druid.
However, there's some config left in the Turnilo yaml config.
Will create a patch to remove that.
Jun 6 2019
@Amire80 the dashboard should now be showing up-to-date data; please check that everything looks good.
Jun 4 2019
Per Nuria's CR comment, I believe she is referring to the MAX_UA_LENGTH limit and not MAX_UA_DIGIT_COUNT when she mentions changing it to 400.
I think @Nuria is talking about the length threshold, no?
The example in the task description has 623 digits in the number after AppleWebKit/.
What do you think is a good threshold?
So I guess this is the reason why the chart ends on 2019-05-12 at the moment?
Yes, you can follow progress to fix that in T224948.
Jun 3 2019
Looking at the generated reports, the header was the following:
date, project, percent_interlanguage_navigation, weekly_navigation_count.project
But the data had only 3 columns:
From the start of the data until 2018-10, the second column was empty and the last column contained the project dimension;
starting from 2018-11, the last column was empty and the second column contained the project dimension.
Dashiki was configured to use weekly_navigation_count.project as the project dimension,
so from 2018-11 on there were no counts for projects, only nulls.
Looking into this
May 24 2019
This might be related to the recent change of user.
Before, all those crons and systemd timers were executed by the hdfs user.
This past week everything was migrated to the analytics user.
However, it's odd, because the Puppet code specifies the analytics user,
and the log file mentioned in the message also belongs to that user.
Those seem correct.