mforns (Marcel Ruiz Forns)
Software Engineer @ Analytics

User Details

User Since
Nov 7 2014, 8:52 PM (219 w, 1 d)
Availability
Available
IRC Nick
mforns
LDAP User
Mforns
MediaWiki User
Unknown

Recent Activity

Fri, Jan 18

mforns added a project to T214136: event_pageissues Turnilo view contains no valid data from before January 5: Analytics-Kanban.
Fri, Jan 18, 3:26 PM · Analytics-Kanban, Page-Issue-Warnings, Analytics
mforns added a comment to T214136: event_pageissues Turnilo view contains no valid data from before January 5.

@Tbayer Thanks for the heads up. I think I know what happened.

Fri, Jan 18, 3:25 PM · Analytics-Kanban, Page-Issue-Warnings, Analytics
mforns claimed T214136: event_pageissues Turnilo view contains no valid data from before January 5.
Fri, Jan 18, 3:07 PM · Analytics-Kanban, Page-Issue-Warnings, Analytics

Thu, Jan 17

mforns moved T213969: Citation Usage: run third round of data collection from Incoming to Radar on the Analytics board.
Thu, Jan 17, 6:14 PM · Analytics, Research, Knowledge-Integrity, Epic
mforns moved T213996: New MongoDB version is not DFSG-compatible, dropped by Debian from Incoming to Radar on the Analytics board.
Thu, Jan 17, 6:14 PM · VisualEditor, Software-Licensing, Performance-Team, Operations
mforns moved T214057: Broken disk on analytics1056 from Incoming to Radar on the Analytics board.
Thu, Jan 17, 6:13 PM · ops-eqiad, Operations, Patch-For-Review, Analytics
mforns moved T213566: Transferring data from Hadoop to production MySQL database from Incoming to Operational Excellence on the Analytics board.
Thu, Jan 17, 6:13 PM · User-Marostegui, Operations, Article-Recommendation, Analytics, Research
mforns removed projects from T213996: New MongoDB version is not DFSG-compatible, dropped by Debian: Analytics, Analytics-EventLogging.
Thu, Jan 17, 6:06 PM · VisualEditor, Software-Licensing, Performance-Team, Operations
mforns added a comment to T213996: New MongoDB version is not DFSG-compatible, dropped by Debian.

We do not use MongoDB in EventLogging production. Thanks for the heads up. Removing EventLogging and Analytics.

Thu, Jan 17, 6:05 PM · VisualEditor, Software-Licensing, Performance-Team, Operations
mforns added projects to T214057: Broken disk on analytics1056: Operations, ops-eqiad.
Thu, Jan 17, 6:04 PM · ops-eqiad, Operations, Patch-For-Review, Analytics
mforns moved T213748: swap a2-eqiad PDU with on-site spare from Incoming to Radar on the Analytics board.
Thu, Jan 17, 6:03 PM · Patch-For-Review, DBA, Analytics, ops-eqiad, Operations
mforns lowered the priority of T213923: Create staging environment for superset from High to Normal.
Thu, Jan 17, 6:02 PM · Analytics-Kanban, Analytics
mforns triaged T213741: Use MaxMind DB in piwik geo-location as Low priority.
Thu, Jan 17, 6:01 PM · Analytics
mforns triaged T213770: Remove Zero support in analytics as Normal priority.
Thu, Jan 17, 5:59 PM · Analytics-Kanban, Technical-Debt, Analytics
mforns assigned T213770: Remove Zero support in analytics to JAllemandou.
Thu, Jan 17, 5:59 PM · Analytics-Kanban, Technical-Debt, Analytics
mforns triaged T213800: [Wikistats v2] Default selection for (active) editors is confusing for inexperienced users as Normal priority.
Thu, Jan 17, 5:58 PM · Analytics, Analytics-Wikistats
mforns triaged T213910: Add user_properties mysql table data to hadoop cluster as Normal priority.
Thu, Jan 17, 5:53 PM · Analytics
mforns added a comment to T213910: Add user_properties mysql table data to hadoop cluster.

user_properties is not the best case for one-off sqoop, because it is constantly updated.
We would benefit from a real time approach, but this is not going to happen in the near future.

Thu, Jan 17, 5:52 PM · Analytics
mforns moved T213976: Workflow to be able to move data files computed in jobs from analytics cluster to production from Incoming to Operational Excellence on the Analytics board.
Thu, Jan 17, 5:51 PM · Discovery, Analytics
mforns triaged T213976: Workflow to be able to move data files computed in jobs from analytics cluster to production as Normal priority.
Thu, Jan 17, 5:50 PM · Discovery, Analytics

Tue, Jan 15

mforns added a comment to T212493: Clean up staging db.

@elukey Definitely the tables prefixed with mforns_ can be deleted.

Tue, Jan 15, 3:23 PM · Analytics-Kanban, Analytics
mforns moved T212014: Sanitization should be run a second time from In Progress to In Code Review on the Analytics-Kanban board.
Tue, Jan 15, 9:33 AM · Patch-For-Review, Analytics, Analytics-Kanban

Fri, Jan 11

mforns moved T208332: Add EditAttemptStep properties to the schema whitelist from Next Up to In Code Review on the Analytics-Kanban board.
Fri, Jan 11, 9:50 PM · Analytics-Kanban, Analytics-Data-Quality, Analytics, Patch-For-Review, Growth-Team, Product-Analytics
mforns added a project to T208332: Add EditAttemptStep properties to the schema whitelist: Analytics-Kanban.
Fri, Jan 11, 9:50 PM · Analytics-Kanban, Analytics-Data-Quality, Analytics, Patch-For-Review, Growth-Team, Product-Analytics

Wed, Jan 9

mforns created T213290: Add Chinese Wikiversity edit-related metrics to Wikistats 2.
Wed, Jan 9, 3:31 PM · Chinese-Sites, Analytics-Kanban, Patch-For-Review, Analytics

Tue, Jan 8

mforns moved T210099: druid ingestion should calculate 1/sample rate to be able to normalize event counts from Ready to Deploy to Done on the Analytics-Kanban board.
Tue, Jan 8, 4:21 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns added a comment to T208332: Add EditAttemptStep properties to the schema whitelist.

The white-list patch above (merged on Dec 11th) is perfectly fine. The problem is that we (Analytics) failed to deploy refinery and activate the changes at that moment; sorry for that. A deployment of refinery was made on Jan 6th, and since then the sanitized events have been written into event_sanitized.editattemptstep correctly. Please check that the data looks good to you.

Tue, Jan 8, 3:41 PM · Analytics-Kanban, Analytics-Data-Quality, Analytics, Patch-For-Review, Growth-Team, Product-Analytics
mforns added a comment to T208332: Add EditAttemptStep properties to the schema whitelist.

@Neil_P._Quinn_WMF looking into this right now.

Tue, Jan 8, 1:32 PM · Analytics-Kanban, Analytics-Data-Quality, Analytics, Patch-For-Review, Growth-Team, Product-Analytics

Mon, Jan 7

mforns moved T202429: [EL sanitization] Make cron send alert emails if job fails before calling refine from Ready to Deploy to Done on the Analytics-Kanban board.
Mon, Jan 7, 7:03 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban, Analytics
mforns moved T209050: Print schema is whitelisting both session ids and page ids from Ready to Deploy to Done on the Analytics-Kanban board.
Mon, Jan 7, 6:58 PM · Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog, Analytics
mforns moved T209822: Add new wikis to analytics from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Mon, Jan 7, 6:56 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns moved T210099: druid ingestion should calculate 1/sample rate to be able to normalize event counts from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Mon, Jan 7, 3:53 PM · Patch-For-Review, Analytics-Kanban, Analytics

Dec 20 2018

mforns added a comment to T212451: Create Spark code to compare DateTimes with partition columns.

Also, it seems that calling UDFs from Spark SQL is possible, no? https://stackoverflow.com/questions/40369170/registering-hive-custom-udf-with-spark-spark-sql-2-0-0

Dec 20 2018, 7:50 PM · Analytics
mforns added a comment to T212451: Create Spark code to compare DateTimes with partition columns.

Note that the example code treats both the since and until DateTimes as inclusive; we might want to consider whether until should be exclusive.

Dec 20 2018, 7:49 PM · Analytics
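
The inclusive-versus-exclusive question raised in the comment above can be illustrated with a minimal sketch. It is plain Python rather than the Spark code the task refers to, and the partition columns (year, month, day, hour) and the function name are assumptions made for illustration only.

```python
from datetime import datetime


def partition_in_range(year, month, day, hour, since, until, until_inclusive=False):
    """Return True if an hourly partition falls within the requested range.

    The partition is identified by hypothetical (year, month, day, hour)
    columns. `since` is always inclusive; `until` is exclusive unless
    `until_inclusive` is set.
    """
    partition_start = datetime(year, month, day, hour)
    if until_inclusive:
        return since <= partition_start <= until
    return since <= partition_start < until
```

With an exclusive until, consecutive ranges such as [Dec 1, Dec 2) and [Dec 2, Dec 3) never select the same partition twice, which is one reason to prefer that convention.
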
mforns created T212451: Create Spark code to compare DateTimes with partition columns.
Dec 20 2018, 7:48 PM · Analytics

Dec 19 2018

mforns added a comment to T211833: [BUG] userAgent missing from all EventLogging analytics Hive tables between 2018-11-29 and 2018-11-14.

@Ottomata Coool, will do.

Dec 19 2018, 4:28 PM · Patch-For-Review, Analytics-Kanban, Product-Analytics, Analytics
mforns moved T212014: Sanitization should be run a second time from Next Up to In Progress on the Analytics-Kanban board.
Dec 19 2018, 4:07 PM · Patch-For-Review, Analytics, Analytics-Kanban

Dec 14 2018

mforns moved T210099: druid ingestion should calculate 1/sample rate to be able to normalize event counts from In Progress to In Code Review on the Analytics-Kanban board.
Dec 14 2018, 3:27 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

@Nuria, yes the docs were updated a couple months ago to include that, see: https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Data_retention_and_auto-purging

Dec 14 2018, 3:14 PM · Analytics-EventLogging, Analytics-Kanban

Dec 13 2018

mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

@Miriam Yes, Monday is good! Thanks :]

Dec 13 2018, 3:28 PM · Analytics-EventLogging, Analytics-Kanban

Dec 12 2018

mforns added a comment to T208589: [EventLoggingToDruid] Add support for ingesting subfields of map columns.

Hm! good point...
I think part of it has been solved by the recent changes in T210099.
Namely, there was a bug in accessing capsule fields that had underscores in them, like geocoded_data. This is solved.
However, there are still some additions needed:

Dec 12 2018, 8:30 PM · Analytics
mforns added a comment to T207207: [EL2Druid] Make RefineTarget compatible with Druid and use it from EventLoggingToDruid.

@Nuria
Are you referencing the "double ingestion" (first hourly, then after a couple days daily), that is supposed to reduce the backfilling ingestion problems?
If so, I believe we agreed that was an interim solution that would buy us time to develop this task, no?

Dec 12 2018, 8:20 PM · Analytics

Dec 11 2018

mforns moved T199836: [EL sanitization] Write and productionize script to drop partitions older than 90 days in events database from In Code Review to Done on the Analytics-Kanban board.
Dec 11 2018, 4:06 PM · Patch-For-Review, Analytics, Analytics-Kanban

Dec 10 2018

mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

Oh, and if you guys need any help copying/formatting that data, you can ping me and I'll try to help.

Dec 10 2018, 7:29 PM · Analytics-EventLogging, Analytics-Kanban
mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

@leila and @Miriam, can you please leave me a couple days to execute the deletion script before the Christmas vacation kicks in (end of quarter)? Thanks!

Dec 10 2018, 7:28 PM · Analytics-EventLogging, Analytics-Kanban

Dec 7 2018

mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

Hi @leila! Have you guys copied the data you need? Thanks

Dec 7 2018, 3:12 PM · Analytics-EventLogging, Analytics-Kanban

Dec 3 2018

mforns created T211036: Give access to Superset to Pau.
Dec 3 2018, 6:03 PM · Analytics

Nov 30 2018

mforns moved T210099: druid ingestion should calculate 1/sample rate to be able to normalize event counts from Next Up to In Progress on the Analytics-Kanban board.
Nov 30 2018, 4:28 PM · Patch-For-Review, Analytics-Kanban, Analytics

Nov 29 2018

mforns added a comment to T189475: Identify common abuse filters that affect translations.

@Amire80 Oh! No, the table you created will not be accessible on the web. What is accessible on the web are reports created by RU. As we used RU in this alternate way to insert data into another database, that data will not be copied over to any report, and thus will not appear on the web.

Nov 29 2018, 11:18 AM · Language-Team (Language-2019-January-March), CX-analytics

Nov 28 2018

mforns added a comment to T189475: Identify common abuse filters that affect translations.

I just checked and data looks good in analytics-slave::staging.cx_abuse_filter_daily.

Nov 28 2018, 3:30 PM · Language-Team (Language-2019-January-March), CX-analytics

Nov 25 2018

mforns added a comment to T210297: Auto-redirect to HTTPS.

@MusikAnimal Indeed! He passed me an http link that I clicked to check.
Now, using https, all appears fine.
Thank you!

Nov 25 2018, 12:26 PM · Tool-Pageviews, Community-Tech

Nov 23 2018

mforns added a comment to T189475: Identify common abuse filters that affect translations.

I think this looks really good!
I might be missing some detail, but it seems that it would work.

Nov 23 2018, 4:55 PM · Language-Team (Language-2019-January-March), CX-analytics
mforns created T210297: Auto-redirect to HTTPS.
Nov 23 2018, 3:33 PM · Tool-Pageviews, Community-Tech
mforns updated subscribers of T202429: [EL sanitization] Make cron send alert emails if job fails before calling refine.

Yesterday @JAllemandou and I discovered a bug in profig, the library that ConfigHelper uses to parse property files.
I created an issue on their github repo: https://github.com/outr/profig/issues/24

Nov 23 2018, 12:37 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban, Analytics
mforns moved T210110: [EventLogging Sanitization] Fix passing of input_path_regex params to Refine from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Nov 23 2018, 12:35 PM · Patch-For-Review, Analytics-Kanban, Analytics

Nov 21 2018

mforns moved T202429: [EL sanitization] Make cron send alert emails if job fails before calling refine from In Progress to In Code Review on the Analytics-Kanban board.
Nov 21 2018, 10:52 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban, Analytics
mforns moved T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive from Next Up to In Progress on the Analytics-Kanban board.
Nov 21 2018, 9:38 PM · Analytics-EventLogging, Analytics-Kanban
mforns moved T210110: [EventLogging Sanitization] Fix passing of input_path_regex params to Refine from Next Up to In Code Review on the Analytics-Kanban board.
Nov 21 2018, 9:37 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns renamed T210110: [EventLogging Sanitization] Fix passing of input_path_regex params to Refine from [EventLogging Sanitization] Fix Refine parameters and cdh jars to unbreak production to [EventLogging Sanitization] Fix passing of input_path_regex params to Refine.
Nov 21 2018, 9:36 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns added a comment to T210110: [EventLogging Sanitization] Fix passing of input_path_regex params to Refine.
Nov 21 2018, 9:29 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns created T210110: [EventLogging Sanitization] Fix passing of input_path_regex params to Refine.
Nov 21 2018, 9:27 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns added a comment to T203669: Return to real time banner impressions in Druid.

@AndyRussG
We discussed this in our daily meeting, and decided to modify our codebase to adapt to your needs, so that we can ingest 1/sampleRate as a new Druid measure.
So, you would not need to add anything to the schema. See: T210099

Nov 21 2018, 6:03 PM · Patch-For-Review, Fundraising-Backlog, Analytics-Kanban, User-Elukey, Analytics
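
The idea of ingesting 1/sampleRate as a measure can be sketched as follows. This is only an illustration of the arithmetic, not the EventLoggingToDruid code; it assumes the sample rate is a fraction in (0, 1], and the function name is hypothetical.

```python
def normalized_event_count(sample_rates):
    """Estimate the unsampled event count from per-event sample rates.

    Each sampled event stands in for 1/sample_rate real events, so summing
    that weight over all sampled events estimates the true total even when
    different events were sampled at different rates.
    """
    total = 0.0
    for rate in sample_rates:
        if not 0 < rate <= 1:
            raise ValueError(f"sample rate must be in (0, 1], got {rate}")
        total += 1.0 / rate
    return total
```

For example, two events sampled at 0.5 and one sampled at 0.25 contribute 2 + 2 + 4, for an estimated total of 8 events.
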
mforns added a comment to T203669: Return to real time banner impressions in Druid.

@elukey

Confirmed that it works! I used recordImpressionEventSampleRate as measure, everything works like a charm (caveat: the datasource in turnilo needs to be set with no introspection).

I think you can set introspection: autofill-dimensions-only and then at least you don't need to configure dimensions, only measures (that are few..)

Nov 21 2018, 5:12 PM · Patch-For-Review, Fundraising-Backlog, Analytics-Kanban, User-Elukey, Analytics

Nov 20 2018

mforns reassigned T209179: Update log_namespace, page_namespace from bigint to int from mforns to JAllemandou.
Nov 20 2018, 4:05 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns moved T202429: [EL sanitization] Make cron send alert emails if job fails before calling refine from In Code Review to In Progress on the Analytics-Kanban board.
Nov 20 2018, 4:05 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban, Analytics
mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

On the other hand, delaying the sanitization significantly sounds like it would be a big annoyance.

Makes sense. We'll take this into account.

Nov 20 2018, 3:39 PM · Analytics-EventLogging, Analytics-Kanban

Nov 15 2018

mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

@Tbayer

Just to double-check: The information in the documentation that "Sanitization happens right after events are generated (with a couple hours lag)" is still current, right? In that case I don't think this will be a concern (although we will need to update some queries - CCing @Groceryheist regarding ReadingDepth).

Nov 15 2018, 2:02 AM · Analytics-EventLogging, Analytics-Kanban

Nov 14 2018

mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

Yup, keeping time range as a filter, but also potentially dropping other fields which we may not need, if any. I will coordinate with Miriam off-this-task and we will give you the clear signal soon.

Nov 14 2018, 10:00 PM · Analytics-EventLogging, Analytics-Kanban
mforns updated subscribers of T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

Also, @Neil_P._Quinn_WMF, @nettrom_WMF and @chelsyx, please check out this task. I don't recall there was any pending issue on your side before we can proceed, but just in case. Thanks!

Nov 14 2018, 8:06 PM · Analytics-EventLogging, Analytics-Kanban
mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

Also, @mpopov, we should probably fix the white-list to include the recent (or any other) renames to EL schema fields T209087, and backfill sanitization before we activate the purging script. Otherwise, data will be lost for those renamed fields.

Nov 14 2018, 8:03 PM · Analytics-EventLogging, Analytics-Kanban
mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

@leila Yes, makes sense to me! When you say "copy specific parts of the table" you mean specific time ranges, no? Sure, let's do that.

Nov 14 2018, 7:52 PM · Analytics-EventLogging, Analytics-Kanban
mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

@mpopov @Tbayer @leila @bmansurov
Oh forgot... Please, feel free to subscribe other people that you think might be interested in participating in this discussion to this task. Thanks!

Nov 14 2018, 5:00 PM · Analytics-EventLogging, Analytics-Kanban
mforns updated subscribers of T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

@leila @bmansurov
Hi! I'd also like to confirm with you guys that it's OK to activate the script that will delete all events older than 90 days from Hive's event database (unsanitized), so you'll be left only with the sanitized version of it in event_sanitized database. I believe the main point we want to discuss here is how we keep unsanitized CitationUsage events while we figure out a way to handle those with Legal. On my end, I'd be happy to white-list all fields temporarily, given that we continue an active conversation with Legal to solve this in the short term. Would that be OK with you? Do you see any other concerns in activating the script regarding EL data belonging to the Research team?

Nov 14 2018, 4:57 PM · Analytics-EventLogging, Analytics-Kanban
mforns updated subscribers of T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

@mpopov @Tbayer
Hi! I'd like to confirm that you guys are OK with us activating the script that will delete all events older than 90 days in Hive's 'event' database (unsanitized data).
As we discussed in earlier threads:

  • All instances of the app_install_id field are being kept indefinitely in a sanitized form: salted hash with rotating salt every 3 months, see: T198426, T199902.
  • As requested per @mpopov, the old salt is being kept for 2 extra weeks after salt rotation (end of quarter) to allow for consistent backfilling of the event_sanitized database in case of issues: T199899, T199900.
Nov 14 2018, 4:47 PM · Analytics-EventLogging, Analytics-Kanban
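
The salted hash with a quarterly rotating salt and a two-week grace period, as described in the bullets above, might look roughly like the sketch below. It is not the production sanitization code: the use of HMAC-SHA256, the function names, and the way the grace period is checked are all assumptions chosen for illustration.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Assumed grace period during which the previous quarter's salt is retained.
GRACE_PERIOD = timedelta(days=14)


def quarter_start(day):
    """Return the first day of the calendar quarter containing `day`."""
    first_month = 3 * ((day.month - 1) // 3) + 1
    return date(day.year, first_month, 1)


def hash_id(value, salt):
    """Hash an identifier with the given salt (HMAC-SHA256, hex-encoded)."""
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()


def old_salt_still_available(today):
    """True while the previous quarter's salt would still be retained."""
    return today - quarter_start(today) < GRACE_PERIOD
```

Within one quarter the same identifier always maps to the same hash, so events can still be grouped per install; once the salt rotates and the old one is discarded, hashes from different quarters can no longer be linked.
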
mforns triaged T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive as Normal priority.
Nov 14 2018, 4:34 PM · Analytics-EventLogging, Analytics-Kanban
mforns added a comment to T196066: Add prometheus metrics for varnishkafka instances running on caching hosts.

Thank you @elukey!

Nov 14 2018, 2:53 PM · Analytics-Kanban, Traffic, Operations, Analytics

Nov 13 2018

mforns moved T196066: Add prometheus metrics for varnishkafka instances running on caching hosts from Next Up to In Progress on the Analytics-Kanban board.
Nov 13 2018, 2:18 PM · Analytics-Kanban, Traffic, Operations, Analytics
mforns moved T199836: [EL sanitization] Write and productionize script to drop partitions older than 90 days in events database from In Progress to In Code Review on the Analytics-Kanban board.
Nov 13 2018, 2:18 PM · Patch-For-Review, Analytics, Analytics-Kanban

Nov 12 2018

mforns renamed T209087: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas from [EventLogging Sanitization] Update EL sanitization whit-elist for field renames in EL schemas to [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas.
Nov 12 2018, 5:22 PM · Product-Analytics, Reading-analysis, Analytics

Nov 9 2018

mforns added a comment to T209087: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas.

@Tbayer
I should have looked who that schema's maintainer was, sorry for that.
I intended it just as a heads up, and thought of you, given that you've been our main point of contact in the past, regarding EL Reading schemas in general.
Please, feel free to reassign the task! I also added other analysts to the task so that they can chime in.

Nov 9 2018, 4:59 PM · Product-Analytics, Reading-analysis, Analytics

Nov 8 2018

mforns updated subscribers of T209087: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas.
Nov 8 2018, 6:52 PM · Product-Analytics, Reading-analysis, Analytics
mforns created T209087: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas.
Nov 8 2018, 6:51 PM · Product-Analytics, Reading-analysis, Analytics

Nov 6 2018

mforns created T208872: [EventLoggingToDruid] Add explicit types to numeric dimensions so that they are ingested as such.
Nov 6 2018, 6:04 PM · Patch-For-Review, Analytics-Kanban, Analytics

Nov 2 2018

mforns added a subtask for T203669: Return to real time banner impressions in Druid: T208589: [EventLoggingToDruid] Add support for ingesting subfields of map columns.
Nov 2 2018, 2:14 PM · Patch-For-Review, Fundraising-Backlog, Analytics-Kanban, User-Elukey, Analytics
mforns added a parent task for T208589: [EventLoggingToDruid] Add support for ingesting subfields of map columns: T203669: Return to real time banner impressions in Druid.
Nov 2 2018, 2:14 PM · Analytics
mforns created T208589: [EventLoggingToDruid] Add support for ingesting subfields of map columns.
Nov 2 2018, 2:14 PM · Analytics
mforns added a comment to T189475: Identify common abuse filters that affect translations.

Oh, so I should just simply create my own table like that, in an SQL script scheduled with report updater? I thought I'd need to do it with a DBA or something :)

Nov 2 2018, 12:04 PM · Language-Team (Language-2019-January-March), CX-analytics

Oct 29 2018

mforns added a comment to T189475: Identify common abuse filters that affect translations.

I think it looks good to start!

Oct 29 2018, 8:23 PM · Language-Team (Language-2019-January-March), CX-analytics

Oct 26 2018

mforns added a comment to T189475: Identify common abuse filters that affect translations.

@Amire80 In one of our last stand-up meetings, we brought up this task, and some of our team members recalled that Superset was not working properly with labs MySQL databases. We are making sure that's true. @JAllemandou, you said log db was working for you in Superset?

Oct 26 2018, 10:38 AM · Language-Team (Language-2019-January-March), CX-analytics

Oct 23 2018

mforns moved T206342: Finalize eventlogging to druid ingestion from In Code Review to In Progress on the Analytics-Kanban board.
Oct 23 2018, 2:03 PM · Patch-For-Review, Analytics, Analytics-Kanban
mforns renamed T206342: Finalize eventlogging to druid ingestion from Finalize eventlogging to druid ingestion with a whitelist instead of a blacklist to Finalize eventlogging to druid ingestion.
Oct 23 2018, 2:03 PM · Patch-For-Review, Analytics, Analytics-Kanban
mforns moved T166414: Explore NavigationTiming by faceted properties - EventLogging refine from In Code Review to Done on the Analytics-Kanban board.
Oct 23 2018, 2:02 PM · Analytics-Kanban, Performance-Team (Radar), Analytics, Patch-For-Review
mforns moved T205562: Ingest data aggregate ReadingDepth data into Druid from In Code Review to Done on the Analytics-Kanban board.
Oct 23 2018, 2:02 PM · Readers-Web-Backlog (Tracking), Patch-For-Review, Analytics-Kanban, Analytics
mforns moved T202751: Ingest data from PageIssues EventLogging schema into Druid from Ready to Deploy to Done on the Analytics-Kanban board.
Oct 23 2018, 2:02 PM · Patch-For-Review, Analytics-Kanban, Reading-analysis, Product-Analytics, Readers-Web-Backlog (Tracking), Page-Issue-Warnings, Analytics
mforns added a comment to T205562: Ingest data aggregate ReadingDepth data into Druid.

I backfilled the last 3 months of data. This is now productionized!
Data will continue to be imported automatically every hour
(with a 5 hour lag to allow for previous collection and refinement of EL events into Hive).
Next steps are:

  • Write comprehensive documentation about EventLoggingToDruid ingestion.
  • Remove the confusing Count metric from the datasource in Turnilo, or at least uncheck it by default (and make the default the actual eventCount).
  • Try to add a new metric to the datasource, eventCountPercentage, that normalizes eventCount splits by the total aggregate, so that time measure buckets become percentage-of-total values, instead of frequencies. This way they will not vary with throughput changes or seasonality, and will be a lot easier to follow. (not sure if this will be possible, though)

In any case these items will not be part of this task, I will tackle them as part of T206342.
Will move this task to Done in Analytics-Kanban.
Cheers!

Oct 23 2018, 2:02 PM · Readers-Web-Backlog (Tracking), Patch-For-Review, Analytics-Kanban, Analytics
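
The proposed eventCountPercentage metric from the comment above can be illustrated with a small sketch of the normalization for a single time bucket. The comment itself notes this may not be possible in the datasource, so this only shows the intended arithmetic; the function name and input shape are hypothetical.

```python
def split_percentages(counts_by_split):
    """Convert per-split event counts in one time bucket to percent of total.

    `counts_by_split` maps a split value (for example a browser family) to
    its eventCount within the bucket. Because each bucket is divided by its
    own total, the result does not change when overall throughput changes.
    """
    total = sum(counts_by_split.values())
    if total == 0:
        return {split: 0.0 for split in counts_by_split}
    return {split: 100.0 * count / total for split, count in counts_by_split.items()}
```

A bucket with counts {"a": 1, "b": 3} and a busier bucket with {"a": 100, "b": 300} both yield 25% and 75%, which is what makes the series easier to follow across seasonal swings.
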
mforns added a comment to T202751: Ingest data from PageIssues EventLogging schema into Druid.

I backfilled the last 3 months of data. This is now productionized!
Data will continue to be imported automatically every hour
(with a 5 hour lag to allow for previous collection and refinement of EL events into Hive).
Next steps are:

  • Write comprehensive documentation about EventLoggingToDruid ingestion.
  • Remove the confusing Count metric from the datasource in Turnilo, or at least uncheck it by default (and make the default the actual eventCount).
  • Try to add a new metric to the datasource, eventCountPercentage, that normalizes eventCount splits by the total aggregate, so that time measure buckets become percentage-of-total values, instead of frequencies. This way they will not vary with throughput changes or seasonality, and will be a lot easier to follow. (not sure if this will be possible, though)

In any case these items will not be part of this task, I will tackle them as part of T206342.
Will move this task to Done in Analytics-Kanban.
Cheers!

Oct 23 2018, 2:02 PM · Patch-For-Review, Analytics-Kanban, Reading-analysis, Product-Analytics, Readers-Web-Backlog (Tracking), Page-Issue-Warnings, Analytics
mforns added a comment to T166414: Explore NavigationTiming by faceted properties - EventLogging refine.

I backfilled the last 3 months of data. This is now productionized!
Data will continue to be imported automatically every hour
(with a 5 hour lag to allow for previous collection and refinement of EL events into Hive).
Next steps are:

  • Write comprehensive documentation about EventLoggingToDruid ingestion.
  • Remove the confusing Count metric from the datasource in Turnilo, or at least uncheck it by default (and make the default the actual eventCount).
  • Try to add a new metric to the datasource, eventCountPercentage, that normalizes eventCount splits by the total aggregate, so that time measure buckets become percentage-of-total values, instead of frequencies. This way they will not vary with throughput changes or seasonality, and will be a lot easier to follow. (not sure if this will be possible, though)

In any case these items will not be part of this task, I will tackle them as part of T206342.
Will move this task to Done in Analytics-Kanban.
Cheers!

Oct 23 2018, 2:01 PM · Analytics-Kanban, Performance-Team (Radar), Analytics, Patch-For-Review
mforns renamed T206342: Finalize eventlogging to druid ingestion from Parametize eventlogging to druid ingestion with a whitelist instead of a blacklist to Finalize eventlogging to druid ingestion with a whitelist instead of a blacklist.
Oct 23 2018, 2:00 PM · Patch-For-Review, Analytics, Analytics-Kanban

Oct 19 2018

mforns added a project to T196066: Add prometheus metrics for varnishkafka instances running on caching hosts: Analytics-Kanban.
Oct 19 2018, 3:19 PM · Analytics-Kanban, Traffic, Operations, Analytics
mforns moved T206342: Finalize eventlogging to druid ingestion from Done to In Code Review on the Analytics-Kanban board.
Oct 19 2018, 2:41 PM · Patch-For-Review, Analytics, Analytics-Kanban