mforns (Marcel Ruiz Forns)
Software Engineer @ Analytics

User Details

User Since
Nov 7 2014, 8:52 PM (253 w, 5 d)
Availability
Available
IRC Nick
mforns
LDAP User
Mforns
MediaWiki User
Unknown

Recent Activity

Yesterday

mforns added a comment to T204735: Move the Analytics Refinery to Python 3.

Thanks Luca for the pastes.
I changed the syntax a bit to adapt it to Python 3,
and tested that everything is OK :].
Luckily, the checksums match the ones generated by python2,
so we won't need to change all the checksums.
Here's the change:

Wed, Sep 18, 5:40 PM · Analytics-Kanban, Analytics

Tue, Sep 17

mforns added a comment to T226663: Develop a tool or integrate feature in existing one to visualize WMCS edits data.

@Nuria yep, makes sense.

Tue, Sep 17, 9:46 PM · Cloud-Services, Developer-Advocacy (Jul-Sep 2019)
mforns added a comment to T226663: Develop a tool or integrate feature in existing one to visualize WMCS edits data.

Yes! if we manage to use that UDF and populate a hive table with it, I think it would be easy to configure a reportupdater job to generate the desired reports.

Tue, Sep 17, 8:53 PM · Cloud-Services, Developer-Advocacy (Jul-Sep 2019)

Mon, Sep 16

mforns moved T229674: Set up automatic deletion for netflow datasource in Druid from Ready to Deploy to Paused on the Analytics-Kanban board.
Mon, Sep 16, 3:03 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns added a comment to T204735: Move the Analytics Refinery to Python 3.

@elukey OK thanks!
Will add those as a TODO for me.

Mon, Sep 16, 2:22 PM · Analytics-Kanban, Analytics
mforns added a comment to T204735: Move the Analytics Refinery to Python 3.

The following scripts should be removable after migration to drop-older-than:

refinery-drop-banner-activity-partitions:#!/usr/bin/env python --> seems unused in puppet
refinery-drop-eventlogging-partitions:#!/usr/bin/env python --> seems unused in puppet
refinery-drop-hourly-partitions:#!/usr/bin/env python  --> seems unused in puppet

I think these two should be removable too. I wonder why they are still in puppet. Did I forget to migrate them?

refinery-drop-hive-partitions:#!/usr/bin/env python
refinery-drop-webrequest-partitions:#!/usr/bin/env python
Mon, Sep 16, 2:12 PM · Analytics-Kanban, Analytics

Fri, Sep 13

mforns created T232852: Reload Hive2Druid datasources from already indexed data instead of raw data.
Fri, Sep 13, 4:05 PM · Analytics
mforns added a comment to T208612: Release edit data lake data as a public json dump /mysql dump, other?.

See the final format of the dumps, chosen after the community survey, here: T224459#5491080

Fri, Sep 13, 3:58 PM · Analytics-Kanban, Research-Backlog, Analytics
mforns added a comment to T224459: Recommend the best format to release public data lake as a dump.

So, this is the final format of the MediaWiki history dumps:

Fri, Sep 13, 3:57 PM · Research, Analytics
mforns added a comment to T224459: Recommend the best format to release public data lake as a dump.

Hi all!

Fri, Sep 13, 3:20 PM · Research, Analytics
mforns created T232844: Release wikimedia history dumps sorted by user ID and page ID.
Fri, Sep 13, 3:19 PM · Analytics
mforns created T232843: Can we add ORES data so it can be easily retrieved per revision present on mediawiki history?.
Fri, Sep 13, 3:16 PM · Analytics

Thu, Sep 12

mforns moved T208612: Release edit data lake data as a public json dump /mysql dump, other? from In Progress to In Code Review on the Analytics-Kanban board.
Thu, Sep 12, 8:50 PM · Analytics-Kanban, Research-Backlog, Analytics
mforns moved T208612: Release edit data lake data as a public json dump /mysql dump, other? from Paused to In Progress on the Analytics-Kanban board.
Thu, Sep 12, 5:11 PM · Analytics-Kanban, Research-Backlog, Analytics
mforns updated subscribers of T230743: Create a repository and user for Product Analytics Oozie jobs?.

@kzimmerman
Could Product Analytics please, as Greg suggests, request a repository in Gerrit to store your team's Oozie jobs?
https://www.mediawiki.org/wiki/Gerrit/New_repositories/Requests
I can do that as well, but I thought you'd like to choose the repo name, owner and type.
Cheers

Thu, Sep 12, 2:45 PM · Repository-Admins, Release-Engineering-Team, Product-Analytics
mforns added a comment to T230743: Create a repository and user for Product Analytics Oozie jobs?.

Can you please reply to the previous comment? Thanks! (Also, what is a "user" in which specific system?)

Thu, Sep 12, 2:39 PM · Repository-Admins, Release-Engineering-Team, Product-Analytics

Wed, Sep 11

mforns added a comment to T229674: Set up automatic deletion for netflow datasource in Druid.

That patch should do the trick,
but we should wait about 2 months before merging.
Netflow data from 90 days ago still has the old schema and would produce useless and confusing data.
In 60 days, we can merge this and it will hopefully work.

Wed, Sep 11, 8:20 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns moved T229674: Set up automatic deletion for netflow datasource in Druid from In Progress to In Code Review on the Analytics-Kanban board.
Wed, Sep 11, 8:19 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns moved T229436: Add --skip-trash arg to refinery-drop-older-than calls in data_purge.pp from Ready to Deploy to Done on the Analytics-Kanban board.
Wed, Sep 11, 4:02 PM · Analytics-Kanban, Analytics

Mon, Sep 9

mforns added a comment to T232226: wmf_netflow cube in Turnilo missing bytes and packets measures.

I'm not exactly sure how to fix, but perhaps the Turnilo config needs an explicit wmf_netflow dataCube declared?

Mon, Sep 9, 12:58 PM · Patch-For-Review, Analytics-Kanban, Analytics

Wed, Sep 4

mforns added a comment to T231874: Rename oozie edit_hourly job.

I thought the 'hourly' in pageview_hourly meant aggregated hourly, not updated hourly.
In general I would name a data set after what it contains, rather than how it is processed or when it is updated.
Now, edit_hourly is partitioned by snapshot, not by hour. So it's structurally different from pageview_hourly.
We could mirror that in the name. Maybe edit_history_hourly? To be a bit shorter than edits_history_aggregated_hourly?
Question: Should we have the 's' at the end of edits or not? I didn't put it there because other Hive data sets seem to lean towards the singular word.

Wed, Sep 4, 12:57 PM · Analytics
mforns added a comment to T224459: Recommend the best format to release public data lake as a dump.

I reviewed the survey responses yesterday, both on survey results and comments in the Phabricator task(s).
There were good insights!

Wed, Sep 4, 12:24 PM · Research, Analytics

Mon, Sep 2

mforns moved T229436: Add --skip-trash arg to refinery-drop-older-than calls in data_purge.pp from In Progress to In Code Review on the Analytics-Kanban board.
Mon, Sep 2, 6:04 PM · Analytics-Kanban, Analytics
mforns moved T229436: Add --skip-trash arg to refinery-drop-older-than calls in data_purge.pp from Next Up to In Progress on the Analytics-Kanban board.
Mon, Sep 2, 6:01 PM · Analytics-Kanban, Analytics

Wed, Aug 28

mforns added a comment to T231339: Set up automatic deletion for netflow data set in Hive.

@Nuria

We can do it but let's tackle that once we have done the dataset releases we have as high priority for this quarter. Does that sound good?

Sure! Makes sense.

Wed, Aug 28, 12:34 PM · Analytics
mforns added a comment to T230963: Turnilo: Remove count metric for edit_hourly data cube.

@MNeisler the count metric was removed from Turnilo, let me know if there are any problems. Thanks!

Wed, Aug 28, 12:32 PM · Analytics-Kanban, Analytics
mforns added a comment to T231339: Set up automatic deletion for netflow data set in Hive.

@Nuria, who would be responsible for migrating netflow data set to be ingested into the event pipeline?

Wed, Aug 28, 12:20 PM · Analytics
mforns added a comment to T229674: Set up automatic deletion for netflow datasource in Druid.

@ayounsi
Great, thanks.
I'm not sure if we can change the granularity of the data within a single data set, say have the latest 3 months be minutely, and the rest be 5-minutely. I assume not.
But I will start testing how Druid/Turnilo behave when overriding existing data with new data that does not contain the fields you mentioned.

Wed, Aug 28, 12:16 PM · Patch-For-Review, Analytics-Kanban, Analytics

Tue, Aug 27

mforns moved T230963: Turnilo: Remove count metric for edit_hourly data cube from In Code Review to Done on the Analytics-Kanban board.
Tue, Aug 27, 4:02 PM · Analytics-Kanban, Analytics
mforns moved T231017: Geoeditors_private deletion scripts scheduled day conflicts with retention period from In Code Review to Done on the Analytics-Kanban board.
Tue, Aug 27, 3:59 PM · Analytics-Kanban, Analytics
mforns added a comment to T224459: Recommend the best format to release public data lake as a dump.

Cool!

Tue, Aug 27, 3:58 PM · Research, Analytics
mforns removed a project from T231339: Set up automatic deletion for netflow data set in Hive: Analytics-Kanban.
Tue, Aug 27, 3:54 PM · Analytics
mforns added a comment to T231339: Set up automatic deletion for netflow data set in Hive.

So, the idea for solution #2 (see task description) is the following:

Tue, Aug 27, 3:53 PM · Analytics
mforns added a comment to T231339: Set up automatic deletion for netflow data set in Hive.

Please, review this task and see if it makes sense.
I assume from what Luca told me, unless you tell me otherwise, that keeping the data in Hive/HDFS for only 3 months is not enough.

Tue, Aug 27, 3:48 PM · Analytics
mforns created T231339: Set up automatic deletion for netflow data set in Hive.
Tue, Aug 27, 3:46 PM · Analytics
mforns moved T229674: Set up automatic deletion for netflow datasource in Druid from Next Up to In Progress on the Analytics-Kanban board.
Tue, Aug 27, 3:39 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns added a comment to T229674: Set up automatic deletion for netflow datasource in Druid.

Please, review this task and let us know how long you would like to keep the netflow data in Druid/Turnilo.
Or, put another way, how interesting is it to you to have the netflow data accessible in Druid/Turnilo for a long time?
This task is not about the netflow data in Hive/HDFS, I'll create another one for that :-)

Tue, Aug 27, 3:37 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns renamed T229674: Set up automatic deletion for netflow datasource in Druid from Set up a deletion timer for netflow data set to Set up automatic deletion for netflow datasource in Druid.
Tue, Aug 27, 3:34 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns added a comment to T231017: Geoeditors_private deletion scripts scheduled day conflicts with retention period.

@JAllemandou
I created this page in Wikitech; it explains a bit how data_purge.pp works and how the retention period and the timer interval interact.
Please, feel free to modify!

Tue, Aug 27, 3:26 PM · Analytics-Kanban, Analytics
mforns moved T231017: Geoeditors_private deletion scripts scheduled day conflicts with retention period from Next Up to In Code Review on the Analytics-Kanban board.
Tue, Aug 27, 10:29 AM · Analytics-Kanban, Analytics
mforns added a project to T231017: Geoeditors_private deletion scripts scheduled day conflicts with retention period: Analytics-Kanban.
Tue, Aug 27, 10:29 AM · Analytics-Kanban, Analytics

Mon, Aug 26

mforns moved T230963: Turnilo: Remove count metric for edit_hourly data cube from Next Up to In Code Review on the Analytics-Kanban board.
Mon, Aug 26, 9:34 PM · Analytics-Kanban, Analytics
mforns claimed T230963: Turnilo: Remove count metric for edit_hourly data cube.
Mon, Aug 26, 9:34 PM · Analytics-Kanban, Analytics
mforns created T231248: Wikistats2 time related bugs.
Mon, Aug 26, 7:33 PM · Analytics
mforns added a comment to T231111: Access to HUE for cchen.

Cool!

Mon, Aug 26, 6:48 PM · Analytics-Kanban, SRE-Access-Requests, Operations, Analytics
mforns moved T231111: Access to HUE for cchen from Next Up to Done on the Analytics-Kanban board.
Mon, Aug 26, 6:19 PM · Analytics-Kanban, SRE-Access-Requests, Operations, Analytics
mforns added a project to T231111: Access to HUE for cchen: Analytics-Kanban.
Mon, Aug 26, 6:19 PM · Analytics-Kanban, SRE-Access-Requests, Operations, Analytics
mforns added a comment to T231111: Access to HUE for cchen.

@cchen You should be able to access Hue now.
Please, reach out if you have any problems.
Cheers!

Mon, Aug 26, 6:18 PM · Analytics-Kanban, SRE-Access-Requests, Operations, Analytics

Fri, Aug 23

mforns claimed T131280: Make aggregate data on editors per country per wiki publicly available.
Fri, Aug 23, 3:51 PM · Patch-For-Review, Product-Analytics, Analytics-Kanban
mforns moved T208612: Release edit data lake data as a public json dump /mysql dump, other? from In Progress to Paused on the Analytics-Kanban board.
Fri, Aug 23, 3:47 PM · Analytics-Kanban, Research-Backlog, Analytics
mforns added a comment to T231017: Geoeditors_private deletion scripts scheduled day conflicts with retention period.
  • IIRC the retention policy is about keeping AT MOST 90 days, so I'd rather keep 65 days, making sure we always have 2 months of data when the geoeditors job runs, and try not to go over, instead of keeping 90 days for sure and deleting when there are at most 90+31 = 121 days.
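
The 90+31 = 121 figure quoted above can be checked with a minimal Python sketch. It assumes the deletion timer fires every 31 days and that each run drops everything older than the retention period; the function name and the day-by-day model are hypothetical and are not taken from data_purge.pp or refinery-drop-older-than.

```python
def max_age_before_deletion(retention_days: int, interval_days: int,
                            total_days: int = 730) -> int:
    """Simulate periodic deletion runs and return the largest age (in days)
    that the oldest kept data reaches just before a run.

    Hypothetical model: one partition per day, a run every `interval_days`,
    and each run drops partitions older than `retention_days`.
    """
    oldest_kept_day = 0   # day index of the oldest partition still stored
    max_age = 0
    for today in range(1, total_days + 1):
        if today % interval_days == 0:
            # Age of the oldest data right before this run deletes anything.
            max_age = max(max_age, today - oldest_kept_day)
            # The run keeps only partitions that are at most retention_days old.
            oldest_kept_day = max(oldest_kept_day, today - retention_days)
    return max_age


print(max_age_before_deletion(90, 31))  # monthly timer: 121
print(max_age_before_deletion(90, 1))   # daily timer: 91
```

Under this model the worst case is always the retention period plus the timer interval, so a shorter interval or a shorter retention period are the two ways to stay closer to the 90-day limit.
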
Fri, Aug 23, 11:52 AM · Analytics-Kanban, Analytics

Thu, Aug 22

mforns added a comment to T231017: Geoeditors_private deletion scripts scheduled day conflicts with retention period.

We should ensure that we keep at least the last 90 days.
And delete the data as soon as possible after that.

Thu, Aug 22, 5:07 PM · Analytics-Kanban, Analytics
mforns created T231017: Geoeditors_private deletion scripts scheduled day conflicts with retention period.
Thu, Aug 22, 3:06 PM · Analytics-Kanban, Analytics

Aug 19 2019

mforns added a project to T230741: Apply hive2-server fix to command line: Product-Analytics.
Aug 19 2019, 5:58 PM · Analytics-Kanban, Product-Analytics, Analytics
mforns added a project to T230742: Ensure Wikitech page about custom jupyter notebooks exists and is up to date: Product-Analytics.
Aug 19 2019, 5:58 PM · Analytics-Kanban, Product-Analytics, Analytics
mforns added a project to T230743: Create a repository and user for Product Analytics Oozie jobs?: Product-Analytics.
Aug 19 2019, 5:58 PM · Repository-Admins, Release-Engineering-Team, Product-Analytics
mforns created T230743: Create a repository and user for Product Analytics Oozie jobs?.
Aug 19 2019, 5:56 PM · Repository-Admins, Release-Engineering-Team, Product-Analytics
mforns created T230742: Ensure Wikitech page about custom jupyter notebooks exists and is up to date.
Aug 19 2019, 5:56 PM · Analytics-Kanban, Product-Analytics, Analytics
mforns moved T229143: Access to HUE for Mayakpwiki from Ops Week to Incoming on the Analytics board.
Aug 19 2019, 5:50 PM · Operations, Analytics
mforns created T230741: Apply hive2-server fix to command line.
Aug 19 2019, 5:50 PM · Analytics-Kanban, Product-Analytics, Analytics
mforns moved T220542: Update R from 3.3.3 to 3.6.0 on stat and notebook machines from Operational Excellence to Incoming on the Analytics board.
Aug 19 2019, 5:45 PM · Analytics, Product-Analytics
mforns moved T212591: Provide Python 3.6+ on SWAP from Jupyter to Incoming on the Analytics board.
Aug 19 2019, 5:45 PM · Analytics, Analytics-SWAP, Contributors-Analysis, Product-Analytics

Aug 14 2019

mforns moved T229669: Oozie queries that use 'reflect("org.json.simple.JSONObject"...' need refinery_hive jar from Ready to Deploy to Done on the Analytics-Kanban board.
Aug 14 2019, 4:12 PM · Analytics-Kanban, Analytics

Aug 8 2019

mforns added a comment to T229682: Add more dimensions to netflow's druid ingestion specs.

Is that in general or for longer retention time? Maybe we can store aggregated data (without source/dest IP) for long term storage?
We can also use IP prefixes instead of IP addresses, less ideal for us and I don't know if it helps much with the cardinality issue.
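
As a rough illustration of the IP-prefix idea, the Python sketch below truncates addresses to a fixed prefix length and compares the number of distinct values before and after. The prefix lengths (/24 for IPv4, /48 for IPv6), the function name and the sample addresses are assumptions for the example; nothing here comes from the actual netflow ingestion spec.

```python
import ipaddress


def to_prefix(ip: str, v4_bits: int = 24, v6_bits: int = 48) -> str:
    """Return the network prefix that contains `ip`, as CIDR text.

    The default prefix lengths are illustrative assumptions.
    """
    addr = ipaddress.ip_address(ip)
    bits = v4_bits if addr.version == 4 else v6_bits
    # strict=False makes ip_network() zero out the host bits for us.
    return str(ipaddress.ip_network(f"{addr}/{bits}", strict=False))


# Documentation-range addresses (RFC 5737), used purely as sample data.
sample_ips = ["198.51.100.7", "198.51.100.37", "198.51.100.200", "203.0.113.9"]
prefixes = {to_prefix(ip) for ip in sample_ips}

print(len(set(sample_ips)), "distinct addresses ->", len(prefixes), "distinct prefixes")
```

How much this helps in practice depends on how the real traffic is spread across prefixes, which this sketch does not model.
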

Aug 8 2019, 2:12 PM · Patch-For-Review, Analytics-Kanban, Analytics

Aug 7 2019

mforns added a comment to T229682: Add more dimensions to netflow's druid ingestion specs.
  • ip_src - ip address cardinality
  • ip_dst - ip address cardinality
Aug 7 2019, 2:02 PM · Patch-For-Review, Analytics-Kanban, Analytics

Aug 6 2019

mforns moved T229669: Oozie queries that use 'reflect("org.json.simple.JSONObject"...' need refinery_hive jar from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Aug 6 2019, 3:55 PM · Analytics-Kanban, Analytics

Aug 2 2019

mforns moved T208612: Release edit data lake data as a public json dump /mysql dump, other? from Paused to In Progress on the Analytics-Kanban board.
Aug 2 2019, 7:01 PM · Analytics-Kanban, Research-Backlog, Analytics
mforns moved T226862: Make timers that delete data use the new deletion script from Ready to Deploy to Done on the Analytics-Kanban board.
Aug 2 2019, 7:00 PM · Analytics-Kanban, Analytics
mforns moved T225314: Load Netflow to Druid from In Progress to Done on the Analytics-Kanban board.
Aug 2 2019, 7:00 PM · Analytics-Kanban, Analytics
mforns moved T229669: Oozie queries that use 'reflect("org.json.simple.JSONObject"...' need refinery_hive jar from In Progress to In Code Review on the Analytics-Kanban board.
Aug 2 2019, 7:00 PM · Analytics-Kanban, Analytics
mforns created T229674: Set up automatic deletion for netflow datasource in Druid.
Aug 2 2019, 4:19 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns moved T229669: Oozie queries that use 'reflect("org.json.simple.JSONObject"...' need refinery_hive jar from Next Up to In Progress on the Analytics-Kanban board.
Aug 2 2019, 3:38 PM · Analytics-Kanban, Analytics
mforns added a project to T229669: Oozie queries that use 'reflect("org.json.simple.JSONObject"...' need refinery_hive jar: Analytics-Kanban.
Aug 2 2019, 3:38 PM · Analytics-Kanban, Analytics
mforns created T229669: Oozie queries that use 'reflect("org.json.simple.JSONObject"...' need refinery_hive jar.
Aug 2 2019, 3:31 PM · Analytics-Kanban, Analytics

Aug 1 2019

mforns moved T229254: API Request for unique devices for all wikipedia families is only showing data up to November 2018 from Ready to Deploy to Done on the Analytics-Kanban board.
Aug 1 2019, 3:38 PM · Patch-For-Review, Analytics, Analytics-Kanban

Jul 31 2019

mforns created T229436: Add --skip-trash arg to refinery-drop-older-than calls in data_purge.pp.
Jul 31 2019, 3:19 PM · Analytics-Kanban, Analytics
mforns moved T226862: Make timers that delete data use the new deletion script from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Jul 31 2019, 1:24 PM · Analytics-Kanban, Analytics
mforns moved T229254: API Request for unique devices for all wikipedia families is only showing data up to November 2018 from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Jul 31 2019, 1:24 PM · Patch-For-Review, Analytics, Analytics-Kanban
mforns moved T226514: Map doesn't redraw when returning from table view from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Jul 31 2019, 1:24 PM · Analytics-Kanban, Analytics

Jul 30 2019

mforns claimed T229254: API Request for unique devices for all wikipedia families is only showing data up to November 2018.
Jul 30 2019, 4:03 PM · Patch-For-Review, Analytics, Analytics-Kanban
mforns moved T229254: API Request for unique devices for all wikipedia families is only showing data up to November 2018 from Next Up to In Code Review on the Analytics-Kanban board.
Jul 30 2019, 4:02 PM · Patch-For-Review, Analytics, Analytics-Kanban
mforns added a comment to T229254: API Request for unique devices for all wikipedia families is only showing data up to November 2018.

Both daily and monthly unique devices per project family are backfilled now.
What remains is to merge and deploy the changes to the queries and restart the bundle.

Jul 30 2019, 4:01 PM · Patch-For-Review, Analytics, Analytics-Kanban

Jul 29 2019

mforns moved T208612: Release edit data lake data as a public json dump /mysql dump, other? from In Progress to Paused on the Analytics-Kanban board.
Jul 29 2019, 2:52 PM · Analytics-Kanban, Research-Backlog, Analytics

Jul 25 2019

mforns moved T225314: Load Netflow to Druid from Paused to Done on the Analytics-Kanban board.
Jul 25 2019, 3:29 PM · Analytics-Kanban, Analytics
mforns added a comment to T228982: Deletion of limn-edit-data repository.

The limn-edit-data/edit/ folder contains several RU queries and config, however they are not currently scheduled in puppet for execution.
I think those are the reports used in the old compare Wikitext vs VisualEditor dashboard that we disabled a couple years ago.
I don't think we'll use those again, but let @Jdforrester-WMF decide.

Jul 25 2019, 11:12 AM · Cleanup, Editing-team, Analytics
mforns updated subscribers of T228979: Deletion of limn-ee-data repository.

The limn-ee-data/ee-migration folder contains several RU queries and config, but they are not currently scheduled for execution in puppet.
I don't know if we can delete them though, maybe @Catrope knows?

Jul 25 2019, 11:01 AM · Analytics
mforns added a comment to T225314: Load Netflow to Druid.

Just for the record, @elukey and I looked into this, and we confirmed that we can merge the puppet patch that will launch the druid loading job.

Jul 25 2019, 10:50 AM · Analytics-Kanban, Analytics

Jul 23 2019

mforns moved T215863: Coarse alarm on data quality for refined data based on entrophy calculations from In Progress to Paused on the Analytics-Kanban board.
Jul 23 2019, 5:00 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns moved T215863: Coarse alarm on data quality for refined data based on entrophy calculations from Paused to In Progress on the Analytics-Kanban board.
Jul 23 2019, 4:10 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns moved T225314: Load Netflow to Druid from Ready to Deploy to Paused on the Analytics-Kanban board.
Jul 23 2019, 4:09 PM · Analytics-Kanban, Analytics

Jun 28 2019

mforns moved T226862: Make timers that delete data use the new deletion script from Next Up to In Code Review on the Analytics-Kanban board.
Jun 28 2019, 7:52 PM · Analytics-Kanban, Analytics
mforns created T226862: Make timers that delete data use the new deletion script.
Jun 28 2019, 7:16 PM · Analytics-Kanban, Analytics
mforns moved T226835: Fix Hive partition thresholding in refinery-drop-older-than from In Code Review to Done on the Analytics-Kanban board.
Jun 28 2019, 7:12 PM · Analytics-Kanban, Analytics
mforns moved T226835: Fix Hive partition thresholding in refinery-drop-older-than from Next Up to In Code Review on the Analytics-Kanban board.
Jun 28 2019, 2:39 PM · Analytics-Kanban, Analytics
mforns created T226835: Fix Hive partition thresholding in refinery-drop-older-than.
Jun 28 2019, 2:31 PM · Analytics-Kanban, Analytics

Jun 27 2019

mforns added a comment to T215863: Coarse alarm on data quality for refined data based on entrophy calculations.

The README says that if Prometheus itself doesn't see a metric for 5 minutes, it considers the metric stale. However, a metric pushed to the pushgateway stays there until deleted, so Prometheus will never consider the metric stale when it pulls metrics from the pushgateway. With Graphite / statsd you push the metric and that's it: if there are no datapoints, the metric will have holes where there haven't been pushes.

Oh, I see! Thanks for the clarification.

Jun 27 2019, 12:14 PM · Patch-For-Review, Analytics-Kanban, Analytics

Jun 26 2019

mforns added a comment to T200070: Wikistats2: Values in map view show unnecessary decimal digits.

Should we do these changes for the dashboard as well?

I think we should! And everywhere in Wikistats, no?
Maybe we could factor this out into a single place that affects the whole app?

Jun 26 2019, 3:16 PM · Analytics-Kanban, Analytics, Analytics-Wikistats
mforns added a comment to T200070: Wikistats2: Values in map view show unnecessary decimal digits.

Going with https://stats.wikimedia.org/wikimedia/animations/wivivi/wivivi.html I think 50.3M should be probably 50M? and 50.6 M gets shown as 51M?

I think, philosophically, 3 significant digits (50.3M) is more coherent with the fact that we already are simplifying big numbers by way of K, M, etc. abbreviations.
Right now, we simplify 534208 to 534K (3 significant digits).
If we did only 2 significant digits, 534805 would rather be simplified to 530K, right?
So following this rule, we can apply the same to numbers that acquire a decimal part, no? 50345719 -> 50.3M, 4378452 -> 4.38M
That said... practically, I think both 2-significant-digits and 3-significant-digits are good for the Wikistats2 case.
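
The significant-digits rule described above can be sketched as follows. This is a Python illustration only, not the Wikistats2 code; the function name, the half-up rounding and the stripping of trailing zeros are assumptions made for the example.

```python
def abbreviate(value: int, sig_digits: int = 3) -> str:
    """Round a non-negative integer to `sig_digits` significant digits and
    abbreviate it with a K / M / B suffix."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")

    # Round to the requested number of significant digits (half-up),
    # using integer arithmetic to avoid floating-point surprises.
    exponent = len(str(value)) - sig_digits
    factor = 10 ** exponent if exponent > 0 else 1
    rounded = (value + factor // 2) // factor * factor

    for unit, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if rounded >= unit:
            # Show only as many decimals as the significant digits allow.
            integer_digits = len(str(rounded // unit))
            decimals = max(sig_digits - integer_digits, 0)
            text = f"{rounded / unit:.{decimals}f}"
            if "." in text:
                # Assumption: drop trailing zeros, e.g. "4.00" becomes "4".
                text = text.rstrip("0").rstrip(".")
            return text + suffix
    return str(rounded)


print(abbreviate(534208))                 # 534K
print(abbreviate(534805, sig_digits=2))   # 530K
print(abbreviate(50345719))               # 50.3M
print(abbreviate(4378452))                # 4.38M
print(abbreviate(50345719, sig_digits=2)) # 50M
```

Keeping the rule in one helper like this is also one way to apply the same formatting in a single place for the whole app.
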

Jun 26 2019, 3:14 PM · Analytics-Kanban, Analytics, Analytics-Wikistats
mforns added a comment to T215863: Coarse alarm on data quality for refined data based on entrophy calculations.

@fgiunchedi thanks a lot for the help!

Jun 26 2019, 2:22 PM · Patch-For-Review, Analytics-Kanban, Analytics

Jun 24 2019

mforns moved T225232: Backfill EL new schemas sanitization after ownership issue fixed from In Progress to Done on the Analytics-Kanban board.
Jun 24 2019, 7:10 PM · Analytics-Kanban, Analytics