mforns (Marcel Ruiz Forns)
Software Engineer @ Analytics

User Details

User Since
Nov 7 2014, 8:52 PM (236 w, 5 d)
Availability
Available
IRC Nick
mforns
LDAP User
Mforns
MediaWiki User
Unknown

Recent Activity

Yesterday

mforns added a comment to T219323: Add additional dimensions to edits_hourly in Turnilo and Superset .

Hi @MNeisler and @Neil_P._Quinn_WMF, thank you for all the feedback.

Wed, May 22, 1:30 PM · Analytics, Product-Analytics

Mon, May 20

mforns moved T190840: EventLogging requests we get from non-wiki* hostnames or apps should be filtered at refine time from Paused to In Progress on the Analytics-Kanban board.
Mon, May 20, 7:20 PM · Patch-For-Review, Analytics-Kanban, Analytics-Data-Quality, Analytics

Fri, May 17

mforns added a comment to T217271: Some event data (like the one that comes from mediawiki events such us revision create) should not get sanitized.

Idea looks fine but I do not think it will be wise to change naming at this stage.

I think changing naming now will indeed be a bit of work (database, jobs, coordinate deployment, documentation, notify people, etc.).
And it's likely that we mess up and have to do backfilling and such.
But I think the advantages of this approach are also significant:

  • No need for blacklisting in the deletion-after-90-days script (which is dangerous: if it fails, non-EL data could be deleted from the event database).
  • No need for blacklisting in the sanitization process (not so dangerous, but avoidable).
  • Better organization of data, which would make it easier to have more data sets deleted after 90 days and sanitized, and would avoid confusion overall.
Fri, May 17, 6:39 PM · Analytics
mforns added a comment to T217271: Some event data (like the one that comes from mediawiki events such us revision create) should not get sanitized.

I believe, as Andrew suggested once, that instead of having an "event" db and an "event_sanitized" db, we should have an "event_unsanitized" db and an "event" db.
This way, event_unsanitized would contain only temporary data that will be deleted after 90 days, and the event db will contain final (sanitized if necessary) data.
The data sets that we control and do not need sanitization could be directly ingested into the final event database.
This would make everything easier.

Fri, May 17, 5:02 PM · Analytics

Thu, May 16

mforns triaged T223284: Remove deprecated using schema.* syntax from MultimediaViewer as High priority.
Thu, May 16, 5:08 PM · MW-1.34-notes (1.34.0-wmf.6; 2019-05-21), Analytics-EventLogging, Multimedia, Analytics-Kanban, Analytics
mforns triaged T223285: Remove deprecated using schema.* syntax from WikimediaEvents as High priority.
Thu, May 16, 5:08 PM · MW-1.34-notes (1.34.0-wmf.6; 2019-05-21), MediaWiki-extensions-WikimediaEvents, Analytics-Kanban, Analytics-EventLogging, Analytics
mforns moved T223284: Remove deprecated using schema.* syntax from MultimediaViewer from Incoming to Operational Excellence on the Analytics board.
Thu, May 16, 5:08 PM · MW-1.34-notes (1.34.0-wmf.6; 2019-05-21), Analytics-EventLogging, Multimedia, Analytics-Kanban, Analytics
mforns moved T223285: Remove deprecated using schema.* syntax from WikimediaEvents from Incoming to Operational Excellence on the Analytics board.
Thu, May 16, 5:08 PM · MW-1.34-notes (1.34.0-wmf.6; 2019-05-21), MediaWiki-extensions-WikimediaEvents, Analytics-Kanban, Analytics-EventLogging, Analytics
mforns triaged T223286: Remove deprecated using schema.* syntax from WikiEditor as High priority.
Thu, May 16, 5:08 PM · MW-1.34-notes (1.34.0-wmf.6; 2019-05-21), WikiEditor, Analytics-Kanban, Analytics-EventLogging, Analytics
mforns moved T223286: Remove deprecated using schema.* syntax from WikiEditor from Incoming to Operational Excellence on the Analytics board.
Thu, May 16, 5:07 PM · MW-1.34-notes (1.34.0-wmf.6; 2019-05-21), WikiEditor, Analytics-Kanban, Analytics-EventLogging, Analytics
mforns added a comment to T223387: CamusPartitionChecker email alert should be more descriptive.

We think this is the related change:
https://gerrit.wikimedia.org/r/#/c/analytics/refinery/source/+/510598/

Thu, May 16, 5:07 PM · Analytics-Kanban, Analytics
mforns added a comment to T223387: CamusPartitionChecker email alert should be more descriptive.

@Ottomata, this is done, right?

Thu, May 16, 5:06 PM · Analytics-Kanban, Analytics
mforns triaged T223414: Move reportupdater reports that pull data from eventlogging mysql to pull data from hadoop as High priority.
Thu, May 16, 5:05 PM · Analytics, Analytics-EventLogging
mforns moved T223414: Move reportupdater reports that pull data from eventlogging mysql to pull data from hadoop from Incoming to Smart Tools for Better Data on the Analytics board.
Thu, May 16, 5:05 PM · Analytics, Analytics-EventLogging
mforns triaged T223444: Update geo-editors job to use tags and report desktop/mobile edits as Normal priority.
Thu, May 16, 5:04 PM · Product-Analytics, Analytics
mforns raised the priority of T223444: Update geo-editors job to use tags and report desktop/mobile edits from Normal to Needs Triage.
Thu, May 16, 5:04 PM · Product-Analytics, Analytics
mforns triaged T223444: Update geo-editors job to use tags and report desktop/mobile edits as Normal priority.
Thu, May 16, 5:03 PM · Product-Analytics, Analytics
mforns added a project to T223444: Update geo-editors job to use tags and report desktop/mobile edits: Product-Analytics.
Thu, May 16, 5:03 PM · Product-Analytics, Analytics

Wed, May 15

mforns moved T211173: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset from In Progress to In Code Review on the Analytics-Kanban board.
Wed, May 15, 4:09 PM · Patch-For-Review, Analytics-Kanban, Better Use Of Data, Product-Analytics, Analytics
mforns added a comment to T220092: Set up edit_hourly data set in Hive.

Added some docs on Wikitech:
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Edit_hourly

Wed, May 15, 2:45 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns changed the point value for T220092: Set up edit_hourly data set in Hive from 5 to 13.
Wed, May 15, 2:45 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns updated the task description for T220092: Set up edit_hourly data set in Hive.
Wed, May 15, 2:44 PM · Patch-For-Review, Analytics-Kanban, Analytics

Tue, May 14

mforns added a comment to T219323: Add additional dimensions to edits_hourly in Turnilo and Superset .

The new datasource is available in Turnilo!
Please, have a look :]

Tue, May 14, 4:00 PM · Analytics, Product-Analytics

Mon, May 13

mforns moved T211173: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset from Ready to Deploy to In Progress on the Analytics-Kanban board.
Mon, May 13, 2:40 PM · Patch-For-Review, Analytics-Kanban, Better Use Of Data, Product-Analytics, Analytics
mforns moved T220092: Set up edit_hourly data set in Hive from Ready to Deploy to In Progress on the Analytics-Kanban board.
Mon, May 13, 2:40 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns moved T190840: EventLogging requests we get from non-wiki* hostnames or apps should be filtered at refine time from In Progress to Paused on the Analytics-Kanban board.
Mon, May 13, 2:11 PM · Patch-For-Review, Analytics-Kanban, Analytics-Data-Quality, Analytics

Thu, May 9

mforns moved T212014: Sanitization should be run a second time from Ready to Deploy to Done on the Analytics-Kanban board.
Thu, May 9, 9:30 PM · Patch-For-Review, Analytics, Analytics-Kanban
mforns moved T212014: Sanitization should be run a second time from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Thu, May 9, 9:29 PM · Patch-For-Review, Analytics, Analytics-Kanban
mforns moved T191964: Clickstream dataset for Persian Wikipedia only includes external values from Ready to Deploy to Done on the Analytics-Kanban board.
Thu, May 9, 6:49 PM · Analytics-Kanban, Analytics
mforns moved T222425: Fix jobs after mediawiki-history refactor from Ready to Deploy to Done on the Analytics-Kanban board.
Thu, May 9, 6:48 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns moved T222603: Fix oozie banner_impression monthly job from Ready to Deploy to Done on the Analytics-Kanban board.
Thu, May 9, 6:48 PM · Analytics-Kanban, Analytics
mforns moved T222422: Mandatory success_email_to parameter in mediawiki_history_check coordinator from Ready to Deploy to Done on the Analytics-Kanban board.
Thu, May 9, 6:48 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns moved T213770: Remove Zero support in analytics from Ready to Deploy to Done on the Analytics-Kanban board.
Thu, May 9, 6:48 PM · Patch-For-Review, Analytics-Kanban, Technical-Debt, Analytics
mforns moved T222460: 15.wikipedia.org missclassified as a pageview, same for query.wikidata.org from Ready to Deploy to Done on the Analytics-Kanban board.
Thu, May 9, 6:48 PM · Patch-For-Review, Analytics-Kanban, Analytics

Wed, May 8

mforns added a comment to T209868: Extend CX2 translations graph to show also published translations that need review.

@Amire80
When we use RU for Hive, we have to use a script instead of a query.
That is because RU doesn't have a Hive client yet, so we use a bash script that calls hive -e "<query>".
The way RU passes dates (and other params) to the script is different from the way it passes dates to SQL files.
In a nutshell, to add a date column in a Hive query (bash script) use:

SELECT
    ...
    '$1' AS date,
    ...

$1 is the first parameter that RU passes to the script, which is the date in question.
You can find this and other information in the RU documentation:
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Reportupdater
Also, take a look at this example of another Hive-based RU report:
https://github.com/wikimedia/analytics-limn-language-data/blob/master/interlanguage/percent_interlanguage_navigation_curr
You can basically copy the way hive is called (hive -e "..." 2> /dev/null | grep -v parquet.hadoop).
And also, copy the way $1 is used.
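
The pattern described above could be sketched roughly as follows. This is only an illustration, not the linked report's actual script: the function name build_query, the table some_db.some_table, and the column day_column are hypothetical placeholders, and the hive call is guarded so the script can be dry-run on a machine without Hive.

```shell
#!/bin/bash
# Hypothetical Reportupdater (RU) script for a Hive-based report.
# $1 is the first parameter RU passes to the script: the date in question.
# Table and column names below are placeholders, not a real data set.

build_query() {
    local report_date="$1"
    # The date parameter is injected as a literal and exposed as a column.
    cat <<EOF
SELECT
    '${report_date}' AS date,
    COUNT(*) AS events
FROM some_db.some_table
WHERE day_column = '${report_date}';
EOF
}

# Run the query only when hive is installed and a date was given.
# stderr is dropped and parquet.hadoop log lines are filtered out,
# as in the example report linked above.
if command -v hive > /dev/null 2>&1 && [ -n "$1" ]; then
    hive -e "$(build_query "$1")" 2> /dev/null | grep -v parquet.hadoop
fi
```

Keeping the query in a function makes it easy to print and inspect the generated SQL (for example, build_query 2019-05-08) before wiring the script into RU.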

Wed, May 8, 2:19 PM · Patch-For-Review, MW-1.34-notes (1.34.0-wmf.1; 2019-04-16), Language-Team (Language-2019-April-June), MW-1.33-notes (1.33.0-wmf.8; 2018-12-11), CX-analytics
mforns moved T222460: 15.wikipedia.org missclassified as a pageview, same for query.wikidata.org from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Wed, May 8, 12:13 PM · Patch-For-Review, Analytics-Kanban, Analytics

Tue, May 7

mforns added a comment to T222460: 15.wikipedia.org missclassified as a pageview, same for query.wikidata.org.

OK, I think I have some conclusions.

Tue, May 7, 9:09 PM · Patch-For-Review, Analytics-Kanban, Analytics

Mon, May 6

mforns moved T222460: 15.wikipedia.org missclassified as a pageview, same for query.wikidata.org from Next Up to In Code Review on the Analytics-Kanban board.
Mon, May 6, 9:27 PM · Patch-For-Review, Analytics-Kanban, Analytics

Thu, Apr 25

mforns moved T211173: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Thu, Apr 25, 9:10 PM · Patch-For-Review, Analytics-Kanban, Better Use Of Data, Product-Analytics, Analytics
mforns moved T220092: Set up edit_hourly data set in Hive from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Thu, Apr 25, 9:10 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns merged T221866: Add edit tags when available to edit_hourly dataset on turnilo into T219323: Add additional dimensions to edits_hourly in Turnilo and Superset .
Thu, Apr 25, 7:12 PM · Analytics, Product-Analytics
mforns merged task T221866: Add edit tags when available to edit_hourly dataset on turnilo into T219323: Add additional dimensions to edits_hourly in Turnilo and Superset .
Thu, Apr 25, 7:12 PM · Analytics-Kanban, Better Use Of Data, Product-Analytics, Analytics
mforns added a comment to T221866: Add edit tags when available to edit_hourly dataset on turnilo.

@Nuria There is already one task (T219323) that Megan created which corresponds to the second iteration on that data set. Will make this a duplicate of that.

Thu, Apr 25, 7:11 PM · Analytics-Kanban, Better Use Of Data, Product-Analytics, Analytics
mforns added a subtask for T211173: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset: T219323: Add additional dimensions to edits_hourly in Turnilo and Superset .
Thu, Apr 25, 7:10 PM · Patch-For-Review, Analytics-Kanban, Better Use Of Data, Product-Analytics, Analytics
mforns added a parent task for T219323: Add additional dimensions to edits_hourly in Turnilo and Superset : T211173: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset.
Thu, Apr 25, 7:10 PM · Analytics, Product-Analytics
mforns added a project to T219323: Add additional dimensions to edits_hourly in Turnilo and Superset : Analytics.
Thu, Apr 25, 3:26 PM · Analytics, Product-Analytics

Apr 22 2019

Neil_P._Quinn_WMF awarded T219177: Add user_is_bot_by to MediaWiki history a Love token.
Apr 22 2019, 8:10 PM · Patch-For-Review, Analytics-Kanban, Analytics-Wikistats, Analytics
mforns added a comment to T220485: Add "Top used photos" metric.

Thanks @Urbanecm,
We'll look at this task and prioritize it with the team this Thursday.

Apr 22 2019, 7:48 PM · Analytics, Analytics-Wikistats
mforns triaged T155014: Import 2001 wikipedia data as Low priority.
Apr 22 2019, 4:03 PM · Analytics
mforns raised the priority of T155014: Import 2001 wikipedia data from Low to Needs Triage.
Apr 22 2019, 4:03 PM · Analytics
mforns added a comment to T155014: Import 2001 wikipedia data.

Can we do this in a hackathon?

Apr 22 2019, 4:03 PM · Analytics
mforns lowered the priority of T204737: Verify what Python 2 packages deployed to Analytics hosts are needed from Normal to Low.
Apr 22 2019, 4:02 PM · Analytics
mforns lowered the priority of T204736: Move Analytics Report Updater to Python 3 from Normal to Low.
Apr 22 2019, 4:00 PM · Analytics
mforns lowered the priority of T204735: Move the Analytics Refinery to Python 3 from Normal to Low.
Apr 22 2019, 4:00 PM · Analytics
mforns raised the priority of T203811: Sqoop e-mail is emailing errors in try1 for actions that suceeed in try 3 from Normal to High.
Apr 22 2019, 4:00 PM · Analytics
mforns lowered the priority of T203811: Sqoop e-mail is emailing errors in try1 for actions that suceeed in try 3 from High to Normal.
Apr 22 2019, 3:59 PM · Analytics
mforns raised the priority of T203811: Sqoop e-mail is emailing errors in try1 for actions that suceeed in try 3 from Normal to High.
Apr 22 2019, 3:59 PM · Analytics
mforns triaged T203132: Streamline Superset signup and authentication as Normal priority.
Apr 22 2019, 3:58 PM · Analytics, Contributors-Analysis, Product-Analytics
mforns raised the priority of T203132: Streamline Superset signup and authentication from Normal to Needs Triage.
Apr 22 2019, 3:58 PM · Analytics, Contributors-Analysis, Product-Analytics
mforns triaged T202312: Transform EventLoggingToDruid job to read schemas to ingest from a whitelist and process them all as Normal priority.
Apr 22 2019, 3:57 PM · Analytics
mforns raised the priority of T202312: Transform EventLoggingToDruid job to read schemas to ingest from a whitelist and process them all from Normal to Needs Triage.
Apr 22 2019, 3:57 PM · Analytics
mforns added a comment to T202312: Transform EventLoggingToDruid job to read schemas to ingest from a whitelist and process them all.

Waiting for a schema registry, so we can implement this.

Apr 22 2019, 3:57 PM · Analytics
mforns merged T218319: automatic ingestion from annotations on schemas into druid into T202312: Transform EventLoggingToDruid job to read schemas to ingest from a whitelist and process them all.
Apr 22 2019, 3:57 PM · Analytics
mforns merged task T218319: automatic ingestion from annotations on schemas into druid into T202312: Transform EventLoggingToDruid job to read schemas to ingest from a whitelist and process them all.
Apr 22 2019, 3:57 PM · Product-Analytics, Analytics
mforns closed T202292: Unify logic in partition dropping scripts as Resolved.
Apr 22 2019, 3:53 PM · Analytics
mforns lowered the priority of T200904: Use Snakebite instead of subprocess.Popen in HdfsUtils from Normal to Low.
Apr 22 2019, 3:53 PM · Analytics
mforns triaged T194058: Sesssion reconstruction - evaluate privacy threat as Normal priority.
Apr 22 2019, 3:50 PM · Analytics
mforns raised the priority of T194058: Sesssion reconstruction - evaluate privacy threat from Normal to Needs Triage.
Apr 22 2019, 3:50 PM · Analytics
mforns lowered the priority of T193524: Publish data on seen page previews from Normal to Low.
Apr 22 2019, 3:48 PM · Analytics
mforns closed T217041: Use Z UTC suffix in EventBus emitted events rather than +00:00, a subtask of T212529: Standardize datetimes/timestamps in the Data Lake, as Invalid.
Apr 22 2019, 3:47 PM · MW-1.33-notes (1.33.0-wmf.21; 2019-03-12), Analytics, Product-Analytics
mforns closed T217041: Use Z UTC suffix in EventBus emitted events rather than +00:00 as Invalid.

@Ottomata, this task can be closed, right? Because of the changes we're doing on EventGate.
Closing it. Please, reopen if I'm wrong.

Apr 22 2019, 3:47 PM · Patch-For-Review, Core Platform Team Backlog (Watching / External), Services (watching), EventBus, Analytics, Product-Analytics
mforns lowered the priority of T217040: Add UTC 'Z' suffix to webrequest `dt` field. from High to Normal.
Apr 22 2019, 3:46 PM · Analytics, Product-Analytics
mforns moved T215616: Improve interlingual links across wikis through Wikidata IDs from Smart Tools for Better Data to Radar on the Analytics board.
Apr 22 2019, 3:45 PM · MediaWiki-Database, Wikidata, Analytics, Research
mforns added a comment to T215616: Improve interlingual links across wikis through Wikidata IDs.

@diego Hi! Is there anything additional for us in Analytics here? Thanks

Apr 22 2019, 3:43 PM · MediaWiki-Database, Wikidata, Analytics, Research
mforns triaged T215001: Revisions missing from mediawiki_revision_create as High priority.
Apr 22 2019, 3:40 PM · Growth-Team, Product-Analytics, Analytics
mforns raised the priority of T215001: Revisions missing from mediawiki_revision_create from High to Needs Triage.
Apr 22 2019, 3:40 PM · Growth-Team, Product-Analytics, Analytics
mforns updated subscribers of T215001: Revisions missing from mediawiki_revision_create.
Apr 22 2019, 3:40 PM · Growth-Team, Product-Analytics, Analytics
mforns updated subscribers of T215001: Revisions missing from mediawiki_revision_create.

@JAllemandou, is that something you're working on? cc @Milimetric

Apr 22 2019, 3:39 PM · Growth-Team, Product-Analytics, Analytics
mforns moved T220484: Add "Top linked article" metric from Pageview API and AQS to Deprioritized on the Analytics board.
Apr 22 2019, 3:36 PM · Analytics, Analytics-Wikistats
mforns added a comment to T220484: Add "Top linked article" metric.

Why do you think a list of most linked articles would be useful?
We can see the value of the top images by num. of appearances, because the image uploaders might be interested in that.
Can you elaborate? Thanks!

Apr 22 2019, 3:35 PM · Analytics, Analytics-Wikistats
mforns added a comment to T220485: Add "Top used photos" metric.

@Urbanecm Do you mean images used in articles?

Apr 22 2019, 3:33 PM · Analytics, Analytics-Wikistats
mforns moved T220483: Add "Number of stub articles" metric from Pageview API and AQS to Deprioritized on the Analytics board.
Apr 22 2019, 3:31 PM · Analytics, Analytics-Wikistats
mforns moved T220482: Add "Top large articles" metric from Pageview API and AQS to Deprioritized on the Analytics board.
Apr 22 2019, 3:29 PM · Analytics, Analytics-Wikistats

Apr 17 2019

mforns moved T196066: Add prometheus metrics for varnishkafka instances running on caching hosts from In Progress to Paused on the Analytics-Kanban board.
Apr 17 2019, 3:45 PM · Patch-For-Review, Analytics-Kanban, Traffic, Analytics, Operations
mforns moved T190840: EventLogging requests we get from non-wiki* hostnames or apps should be filtered at refine time from Next Up to In Progress on the Analytics-Kanban board.
Apr 17 2019, 3:45 PM · Patch-For-Review, Analytics-Kanban, Analytics-Data-Quality, Analytics

Apr 16 2019

mforns moved T212014: Sanitization should be run a second time from In Progress to In Code Review on the Analytics-Kanban board.
Apr 16 2019, 8:37 PM · Patch-For-Review, Analytics, Analytics-Kanban
mforns added a comment to T211173: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset.

Cool! Glad that you guys liked it.
Yes, I left user_tenure_bucket for next iteration as per Nuria's suggestion in the doc.
User_tenure_bucket was a bit more complex than the other fields, but I checked and I believe it's feasible.

Apr 16 2019, 6:31 PM · Patch-For-Review, Analytics-Kanban, Better Use Of Data, Product-Analytics, Analytics

Apr 12 2019

mforns added a comment to T220627: QuickSurveys EventLogging missing ~10% of interactions.

I found https://www.mediawiki.org/wiki/Extension:QuickSurveys,
and it explains that the code for the survey is loaded dynamically, so disabled JS is not the cause.
DNT is also not the cause, because when it's on, the surveys don't even show.

Apr 12 2019, 3:57 PM · Readers-Web-Backlog (Tracking), Analytics, Analytics-EventLogging, QuickSurveys
mforns added a comment to T220627: QuickSurveys EventLogging missing ~10% of interactions.

I've been looking into this for a bit.
Is there any documentation I can read on the flow of the surveys?
Does the user click on a link on-wiki, that opens a Google/Qualtrics form?
And when do events for QuickSurveyInitiation and QuickSurveysResponses trigger?

Apr 12 2019, 3:47 PM · Readers-Web-Backlog (Tracking), Analytics, Analytics-EventLogging, QuickSurveys

Apr 11 2019

mforns added a comment to T189475: Identify common abuse filters that affect translations.

@Amire80 I couldn't find any other task that refers to fixing the broken job.
Maybe it was in an email... or conversation? I couldn't find them either.
We can use this task for that anyway, no?

Apr 11 2019, 11:01 AM · Language-Team (Language-2019-April-June), CX-analytics

Apr 9 2019

mforns moved T211173: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset from In Progress to In Code Review on the Analytics-Kanban board.
Apr 9 2019, 2:30 PM · Patch-For-Review, Analytics-Kanban, Better Use Of Data, Product-Analytics, Analytics
mforns moved T220092: Set up edit_hourly data set in Hive from In Progress to In Code Review on the Analytics-Kanban board.
Apr 9 2019, 2:30 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns added a comment to T211173: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset.

This is corrected now. See: https://turnilo.wikimedia.org/#edits_hourly

Apr 9 2019, 2:27 PM · Patch-For-Review, Analytics-Kanban, Better Use Of Data, Product-Analytics, Analytics

Apr 8 2019

mforns created T220410: Hash all pageTokens or temporary identifiers from the EL Sanitization white-list.
Apr 8 2019, 3:04 PM · Analytics

Apr 5 2019

mforns added a comment to T211173: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset.

It's a problem with the generation of edit_hourly in Hive.
The timestamp is not well formatted, I was using:

FROM_UNIXTIME(
    UNIX_TIMESTAMP(event_timestamp, 'yyyy-MM-dd hh:mm:ss.sss'),
    'yyyy-MM-dd hh:00:00.0'
) AS dt

But that converts everything to am-only hours; it's an easy fix.
Will fix that on Monday.
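
For illustration: in SimpleDateFormat-style patterns, lowercase hh is the 12-hour clock hour and uppercase HH is the 24-hour one, so the likely fix is to use HH in both patterns (an assumption; the actual patch is not shown here). The same effect can be reproduced outside Hive with GNU date, where %I plays the role of hh and %H the role of HH. The sample timestamp is made up.

```shell
#!/bin/bash
# Shell analogue of the hh vs. HH mix-up, assuming GNU date is available.
# %I = 12-hour clock hour (like hh), %H = 24-hour clock hour (like HH).

ts="2019-04-05 19:40:00"  # made-up sample timestamp, treated as UTC

# Truncating to the hour with the 12-hour field drops the am/pm
# information, so an evening edit lands in a morning bucket.
twelve_hour="$(date -u -d "$ts" +"%Y-%m-%d %I:00:00")"

# Truncating with the 24-hour field keeps the correct hour.
twenty_four_hour="$(date -u -d "$ts" +"%Y-%m-%d %H:00:00")"

echo "12-hour pattern: $twelve_hour"
echo "24-hour pattern: $twenty_four_hour"
```

With the 12-hour field, every hour from 13 to 23 collapses onto hours 01 to 11, which matches the am-only buckets described above.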

Apr 5 2019, 7:40 PM · Patch-For-Review, Analytics-Kanban, Better Use Of Data, Product-Analytics, Analytics
mforns added a comment to T211173: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset.

Wow, that's weird.

Apr 5 2019, 7:18 PM · Patch-For-Review, Analytics-Kanban, Better Use Of Data, Product-Analytics, Analytics
mforns added a comment to T211173: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset.

Hey all!

Apr 5 2019, 7:06 PM · Patch-For-Review, Analytics-Kanban, Better Use Of Data, Product-Analytics, Analytics
mforns added a comment to T220092: Set up edit_hourly data set in Hive.

Thanks @Neil_P._Quinn_WMF! Forgot to do that.
As you see, we had a slight change of plans in the implementation.
We encountered an issue in Druid, which does not allow applying transforms to fields that are not listed as dimensions, for Hive tables stored in Parquet format.
So we decided to create this intermediate table in Hive called edit_hourly (maybe edit_daily, if we find that hourly reveals poor performance).
This way we won't need to use druid transforms (transforms will happen in Hadoop via HiveSQL).
Also, we can take advantage of having the Hive version of the data set for more detailed querying.
Druid developers are fixing this issue in the new version, but it will still take some time until we upgrade to that.
In any case, it won't harm to have that intermediate table in Hive.

Apr 5 2019, 2:32 PM · Patch-For-Review, Analytics-Kanban, Analytics

Apr 4 2019

mforns moved T220092: Set up edit_hourly data set in Hive from Next Up to In Progress on the Analytics-Kanban board.
Apr 4 2019, 4:11 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns updated the task description for T220092: Set up edit_hourly data set in Hive.
Apr 4 2019, 2:07 PM · Patch-For-Review, Analytics-Kanban, Analytics