mforns (Marcel Ruiz Forns)
Software Engineer @ Analytics

User Details

User Since
Nov 7 2014, 8:52 PM (227 w, 4 d)
Availability
Available
IRC Nick
mforns
LDAP User
Mforns
MediaWiki User
Unknown

Recent Activity

Mon, Mar 18

Milimetric awarded T218045: Wikistats: Change Mercator Projection to Eckert IV a Pirate Logo token.
Mon, Mar 18, 3:45 PM · Analytics

Wed, Mar 13

mforns added a comment to T215289: Update reportupdater to be able to query the new db cluster that will substitute 1002.

@chelsyx thanks for the clarification!

Wed, Mar 13, 3:13 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns added a comment to T210706: Move AQS to nodejs 10.

we could do it one of these EU evenings?

Sure! I'll be on vacation, though, starting this Fri 15th (inclusive), and will be back on Mon 25th.

Wed, Mar 13, 3:05 PM · Patch-For-Review, Analytics-Kanban, Analytics

Tue, Mar 12

mforns added a comment to T211173: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset.

@MNeisler
Great, I already accepted the invite. Thank you!

Tue, Mar 12, 12:43 PM · Analytics-Kanban, Better Use Of Data, Product-Analytics, Analytics

Mon, Mar 11

mforns renamed T200070: Wikistats2: Values in map view show unnecessary decimal digits from Issues with page view map in Wikistats 2 to Wikistats2: Values in map view show unnecessary decimal digits.
Mon, Mar 11, 4:32 PM · Analytics-Wikistats, Analytics
mforns added a comment to T200070: Wikistats2: Values in map view show unnecessary decimal digits.

Issues #2 and #3 are tackled in other tasks: T187212 and T218045 respectively. Let's keep this task about issue #1.
@Jsc39, please feel free to take this task :] Ping me if you need some help!

Mon, Mar 11, 4:31 PM · Analytics-Wikistats, Analytics
mforns created T218045: Wikistats: Change Mercator Projection to Eckert IV.
Mon, Mar 11, 4:29 PM · Analytics
mforns added a comment to T200070: Wikistats2: Values in map view show unnecessary decimal digits.

@ezachte Issue 1 has changed since your initial description.
Now we do not have intervals; rather, we round to the 1000s. However, there are still some non-human-friendly values like 304.33M.
What do you think would be a good format?

Mon, Mar 11, 4:27 PM · Analytics-Wikistats, Analytics
mforns moved T199571: Increase topojson resolution: Singapore does not appear on wikistats map from Wikistats Beta to Deprioritized on the Analytics board.
Mon, Mar 11, 4:21 PM · Analytics
mforns added a comment to T199432: Consider disabling automatic topic creation in main-kafka.

@elukey We were here in grosking and agreed to look into this to be sure there are no action items left before closing.

Mon, Mar 11, 4:20 PM · User-Elukey, Core Platform Team Backlog (Designing), ChangeProp, EventBus, WMF-JobQueue, Services (designing), Analytics
mforns added a project to T199432: Consider disabling automatic topic creation in main-kafka: User-Elukey.
Mon, Mar 11, 4:20 PM · User-Elukey, Core Platform Team Backlog (Designing), ChangeProp, EventBus, WMF-JobQueue, Services (designing), Analytics
mforns added a comment to T197896: Make various auth libraries available on stat* machines.

@mpopov Hi! Do you need google-auth here, or did you manage with the others?

Mon, Mar 11, 4:14 PM · Patch-For-Review, Product-Analytics, Analytics, SEO
mforns moved T197277: Pageviews agent=bot is always 0 from Data Quality to Forgotten Documentation on the Analytics board.
Mon, Mar 11, 4:10 PM · good first bug, Pageviews-API, Analytics, Tool-Pageviews
mforns moved T192847: Under construction page in wikistats to take site down from Wikistats Beta to Wikistats Production on the Analytics board.
Mon, Mar 11, 4:05 PM · Analytics
mforns moved T190915: Improve scoping of CSS from Wikistats Beta to Deprioritized on the Analytics board.
Mon, Mar 11, 4:04 PM · Patch-For-Review, Analytics-Wikistats, Analytics
mforns added a comment to T190840: Spike: Quantify how many EventLogging requests we get from non-wiki* hostnames or apps .

Much of this data may be coming from bots as well, see: T210006

Mon, Mar 11, 4:03 PM · Analytics-Data-Quality, Analytics
mforns added a comment to T189044: Mediawiki History: moves counted twice in Revision.

@JAllemandou @Milimetric Was something done in this regard, during data quality work?

Mon, Mar 11, 3:59 PM · Analytics
mforns added a comment to T187102: Vagrant's /var/log/daemon.log filling up with kafka errors.

We have not been able to reproduce this issue. But maybe @elukey has an idea of how to solve it?

Mon, Mar 11, 3:55 PM · Core Platform Team Backlog (Watching / External), Services (watching), Analytics, MediaWiki-Vagrant
mforns assigned T187102: Vagrant's /var/log/daemon.log filling up with kafka errors to elukey.
Mon, Mar 11, 3:54 PM · Core Platform Team Backlog (Watching / External), Services (watching), Analytics, MediaWiki-Vagrant
mforns updated subscribers of T185342: Wikistats 2: New Pages split by editor type wrongly claims no anonymous users create pages.

@Milimetric, @JAllemandou
(grosking)
Do we know what the cause of this issue is, after the recent quality work?

Mon, Mar 11, 3:51 PM · Analytics, Analytics-Wikistats
mforns raised the priority of T183303: Decomission old analytics kafka cluster from Normal to High.
Mon, Mar 11, 3:48 PM · Analytics
mforns added a comment to T178591: Feedback on hive table mediawiki_history by Erik Z.

Hi @JAllemandou :]
(in grosking)
Can you outline what remains to be done here please?

Mon, Mar 11, 3:47 PM · Analytics-Wikistats, Analytics
mforns reassigned T178591: Feedback on hive table mediawiki_history by Erik Z from Nuria to JAllemandou.
Mon, Mar 11, 3:47 PM · Analytics-Wikistats, Analytics
mforns lowered the priority of T175461: Port Kafka clients to new jumbo cluster from High to Normal.
Mon, Mar 11, 3:45 PM · Analytics, Patch-For-Review, Analytics-Cluster
mforns raised the priority of T175461: Port Kafka clients to new jumbo cluster from Normal to High.
Mon, Mar 11, 3:43 PM · Analytics, Patch-For-Review, Analytics-Cluster
mforns lowered the priority of T172009: Add referer to WebrequestData from Normal to Low.
Mon, Mar 11, 3:43 PM · Product-Analytics, Analytics, Discovery-Analysis, Discovery
mforns removed a subtask for T159840: Alarm on data quality issues : T168648: Productionize analysis of editcount vs per_user_revision_count.
Mon, Mar 11, 3:42 PM · Analytics
mforns removed a parent task for T168648: Productionize analysis of editcount vs per_user_revision_count: T159840: Alarm on data quality issues .
Mon, Mar 11, 3:42 PM · Analytics
mforns added a subtask for T159840: Alarm on data quality issues : T215863: Coarse alarm on data quality for refined data based on entrophy calculations.
Mon, Mar 11, 3:42 PM · Analytics
mforns added a parent task for T215863: Coarse alarm on data quality for refined data based on entrophy calculations: T159840: Alarm on data quality issues .
Mon, Mar 11, 3:42 PM · Analytics
mforns set the point value for T159840: Alarm on data quality issues to 0.
Mon, Mar 11, 3:42 PM · Analytics
mforns lowered the priority of T117945: Add alarms for high volume of views to pages with replacement characters from Normal to Low.
Mon, Mar 11, 3:40 PM · Analytics, Analytics-Data-Quality, Datasets-Webstatscollector, Language-Team
mforns lowered the priority of T214043: Make edit data lake data available as a snapshot on dump hosts from High to Normal.
Mon, Mar 11, 3:39 PM · Analytics
mforns placed T213716: Alarms for virtualpageview should exist (probably in oozie) for jobs that have been idle too long up for grabs.
Mon, Mar 11, 3:37 PM · Analytics
mforns lowered the priority of T212879: Anomalous statistics results in eu.wikipedia siteviews from High to Normal.
Mon, Mar 11, 3:36 PM · Pageviews-Anomaly, Analytics-Data-Quality, Analytics
mforns lowered the priority of T212529: Standardize datetimes/timestamps in the Data Lake from High to Normal.
Mon, Mar 11, 3:32 PM · MW-1.33-notes (1.33.0-wmf.21; 2019-03-12), Patch-For-Review, Analytics, Product-Analytics
mforns lowered the priority of T211836: Enable Security (stronger authentication and data encryption) for the Analytics Hadoop cluster and its dependent services from High to Normal.
Mon, Mar 11, 3:31 PM · User-Elukey, Analytics
mforns added a comment to T211173: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset.

Hi @MNeisler! We'd like to have this done by the end of this quarter. Is there anything we can do? I can help you build a job that loads that data. Maybe we can have a meeting and you can pass me the requirements of the data set.

Mon, Mar 11, 3:29 PM · Analytics-Kanban, Better Use Of Data, Product-Analytics, Analytics
mforns lowered the priority of T210313: Statistics for views of individual Wikimedia images from High to Normal.
Mon, Mar 11, 3:27 PM · Analytics, Tool-Pageviews
mforns added a project to T210006: Event counts from Mysql and Hive don't match. Refine is persisting data from crawlers. : Analytics-Kanban.
Mon, Mar 11, 3:27 PM · Analytics-Kanban, Product-Analytics, Analytics
mforns moved T199928: Quantify volume of traffic on piwik with DNT header set from Data Quality to Blocked on the Analytics board.
Mon, Mar 11, 3:26 PM · Analytics

Fri, Mar 8

mforns moved T215289: Update reportupdater to be able to query the new db cluster that will substitute 1002 from In Progress to Done on the Analytics-Kanban board.
Fri, Mar 8, 6:47 PM · Patch-For-Review, Analytics-Kanban, Analytics
mforns moved T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive from Ready to Deploy to Done on the Analytics-Kanban board.
Fri, Mar 8, 6:09 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban
mforns updated subscribers of T215289: Update reportupdater to be able to query the new db cluster that will substitute 1002.

Hi @chelsyx and @Amire80

Fri, Mar 8, 4:51 PM · Patch-For-Review, Analytics-Kanban, Analytics

Thu, Mar 7

mforns lowered the priority of T212172: Provide feature parity between the wiki replicas and the Analytics Data Lake from High to Normal.
Thu, Mar 7, 6:12 PM · User-Elukey, Epic, Analytics, Contributors-Analysis, Product-Analytics
mforns lowered the priority of T211950: Add partial blocks to mediawiki history tables from High to Normal.
Thu, Mar 7, 6:11 PM · Product-Analytics, Anti-Harassment, Analytics
mforns raised the priority of T210741: EventStreams process occasionally OOMs from High to Needs Triage.
Thu, Mar 7, 6:08 PM · Core Platform Team Backlog (Watching / External), Services (watching), Patch-For-Review, Wikimedia-Stream, Analytics
mforns lowered the priority of T210423: Wikistats2 metric: top article creators from High to Normal.
Thu, Mar 7, 6:06 PM · Analytics
mforns moved T210423: Wikistats2 metric: top article creators from Wikistats Beta to Wikistats Production on the Analytics board.
Thu, Mar 7, 6:06 PM · Analytics
mforns raised the priority of T208230: Update pageview_hourly to include timestamp for better druid indexation from High to Needs Triage.
Thu, Mar 7, 6:03 PM · Analytics
mforns lowered the priority of T205809: Percentage increase should be removed from"all" time range on wikistats UI from High to Normal.
Thu, Mar 7, 6:02 PM · Analytics
mforns lowered the priority of T205730: Group pageview data per family in AQS so we can surface it in wikistats per-family pageview metrics from High to Normal.
Thu, Mar 7, 6:01 PM · Analytics-Wikistats, Analytics
mforns raised the priority of T205617: Update datasets to have explicit timestamp for druid indexation facilitation from High to Needs Triage.
Thu, Mar 7, 6:01 PM · Analytics
mforns added a project to T205437: Resurrect eventlogging_EventError logging to in logstash: Analytics-Kanban.
Thu, Mar 7, 6:00 PM · Analytics-Kanban, Patch-For-Review, Analytics, Analytics-EventLogging
mforns reopened T205437: Resurrect eventlogging_EventError logging to in logstash as "Open".

Re-opening; we need to check whether it is deployed and restarted.

Thu, Mar 7, 5:59 PM · Analytics-Kanban, Patch-For-Review, Analytics, Analytics-EventLogging
mforns closed T205437: Resurrect eventlogging_EventError logging to in logstash as Resolved.

This task is done. Resolving.

Thu, Mar 7, 5:57 PM · Analytics-Kanban, Patch-For-Review, Analytics, Analytics-EventLogging
mforns lowered the priority of T203824: Fix download-project-namespace-map script to send alert if it fails from High to Normal.
Thu, Mar 7, 5:55 PM · Analytics
mforns assigned T199928: Quantify volume of traffic on piwik with DNT header set to Nuria.
Thu, Mar 7, 5:53 PM · Analytics
mforns lowered the priority of T189623: AQS edits API should not allow queries without time bounds from High to Normal.
Thu, Mar 7, 5:52 PM · Analytics
mforns lowered the priority of T168103: heirloom-mailx fails trying to send out email from SWAP notebook from High to Low.
Thu, Mar 7, 5:51 PM · Analytics-SWAP, Analytics
mforns moved T215289: Update reportupdater to be able to query the new db cluster that will substitute 1002 from In Code Review to In Progress on the Analytics-Kanban board.
Thu, Mar 7, 3:46 PM · Patch-For-Review, Analytics-Kanban, Analytics

Wed, Mar 6

mforns moved T215289: Update reportupdater to be able to query the new db cluster that will substitute 1002 from In Progress to In Code Review on the Analytics-Kanban board.
Wed, Mar 6, 3:57 PM · Patch-For-Review, Analytics-Kanban, Analytics

Tue, Mar 5

mforns moved T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Tue, Mar 5, 4:04 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban
mforns moved T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive from In Progress to In Code Review on the Analytics-Kanban board.
Tue, Mar 5, 2:58 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban

Thu, Feb 28

mforns updated subscribers of T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

@GoranSMilovanovic I found 2 schemas that I wasn't aware of, that I believe belong to WMDE:
WMDEBannerEvents and WMDEBannerSizeIssue.
I didn't delete them yet, in case their owner wasn't aware of the 90-day deletion policy.
Are you the owner of those schemas?

Thu, Feb 28, 4:54 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban
mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

The deletion is finished now. The event database now contains only the last 90 days of data.
We're still deleting the corresponding Hive partitions (meta-data), which may cause some warnings when querying data.
The partition meta-data sync is estimated to complete in 24 hours.

Thu, Feb 28, 4:51 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban
mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

Ok, starting the deletion now.

Thu, Feb 28, 2:23 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban
mforns added a comment to T209087: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas.

@chelsyx thanks for the check!

Thu, Feb 28, 11:42 AM · Analytics-Kanban, Patch-For-Review, Product-Analytics, Reading-analysis, Analytics
mforns added a comment to T216096: Whitelist sample flags and page/rev ID fields for ReadingDepth schema.

@Tbayer thanks for the check!

Thu, Feb 28, 11:41 AM · Reading Depth, Analytics-Kanban, Readers-Web-Backlog (Tracking), Product-Analytics, Analytics

Wed, Feb 27

mforns updated subscribers of T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.
Wed, Feb 27, 2:31 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban
mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

Here's the plan for tomorrow:

  1. Collect the names of all tables in the event database that belong to the EL pipeline.
  2. For each table T, delete /wmf/data/event/T/year=2017 and /wmf/data/event/T/year=2018/month=M with M in (1,2,...10).
  3. For each table T, execute an msck repair table T command in Hive.
  4. Execute the deletion script (refinery-drop-older-than) once with --older-than=90 to delete the remaining days that are due.
  5. Productionize a systemd timer that calls the deletion script periodically (every day) in puppet.
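
Steps 2 and 3 of the plan above can be sketched roughly as follows. This is only an illustration of which paths and statements the plan implies: the helper names and the table name "exampletable" are hypothetical, and the sketch builds strings only, it does not touch HDFS or Hive.

```python
# Hypothetical helpers: they only build the paths and statements named in the plan.
BASE_PATH = "/wmf/data/event"


def paths_to_delete(table, base=BASE_PATH):
    """Return the directories from step 2 for one table:
    all of 2017, plus months 1 through 10 of 2018."""
    paths = [f"{base}/{table}/year=2017"]
    paths += [f"{base}/{table}/year=2018/month={m}" for m in range(1, 11)]
    return paths


def repair_statement(table):
    """Return the Hive statement from step 3 for one table."""
    return f"msck repair table {table};"


if __name__ == "__main__":
    # "exampletable" is a placeholder, not a real table in the event database.
    for path in paths_to_delete("exampletable"):
        print(path)
    print(repair_statement("exampletable"))
```

Generating the list first and reviewing it before deleting anything matches the dry-run spirit of the deletion script mentioned in step 4.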
Wed, Feb 27, 2:30 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban
mforns awarded T212396: eventlogging fails flake8 due to new upstream version, breaking CI a Stroopwafel token.
Wed, Feb 27, 1:58 PM · Analytics-Kanban, Patch-For-Review, Analytics, Analytics-EventLogging
mforns reassigned T212396: eventlogging fails flake8 due to new upstream version, breaking CI from mforns to hashar.
Wed, Feb 27, 1:57 PM · Analytics-Kanban, Patch-For-Review, Analytics, Analytics-EventLogging
mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

@Miriam Thanks!

Wed, Feb 27, 1:55 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban

Mon, Feb 25

mforns updated the task description for T211835: Sunset Wikimetrics .
Mon, Feb 25, 6:11 PM · Analytics-Kanban, Analytics-Wikimetrics, Analytics
mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

@leila, +1 to Nuria, but I think you're just confused by my latest message.
We already discussed this from the 14th of November onwards.
As I understood it from @Miriam's comments, you guys copied the CitationUsage schemas over to another location, right?
So, the script won't affect that copy.
I was just giving a last ping after final deletion.
Please confirm :]

Mon, Feb 25, 3:47 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban

Fri, Feb 22

mforns updated the task description for T211835: Sunset Wikimetrics .
Fri, Feb 22, 6:42 PM · Analytics-Kanban, Analytics-Wikimetrics, Analytics
mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

@leila, it's this one: https://github.com/wikimedia/analytics-refinery/blob/master/static_data/eventlogging/whitelist.yaml

Fri, Feb 22, 4:23 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban
mforns added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

Sorry for the mass ping.

Fri, Feb 22, 4:15 PM · Patch-For-Review, Analytics-EventLogging, Analytics-Kanban
mforns moved T209087: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas from Next Up to Done on the Analytics-Kanban board.
Fri, Feb 22, 4:07 PM · Analytics-Kanban, Patch-For-Review, Product-Analytics, Reading-analysis, Analytics
mforns claimed T209087: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas.
Fri, Feb 22, 4:07 PM · Analytics-Kanban, Patch-For-Review, Product-Analytics, Reading-analysis, Analytics
mforns added a comment to T209087: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas.

@chelsyx @mpopov @Neil_P._Quinn_WMF (cc @Nuria)
I backfilled both schemas since the discussed dates: event_sanitized.VisualEditorFeatureUse since 2018-10-24 and event_sanitized.MobileWikiAppShareAFact since 2018-06-21. I vetted the resulting data and it all looks good to me, but please do a quick check to confirm.
Thanks for the changes you guys made! Now I will proceed to productionize the purging script that will delete events older than 90 days from the raw events database (event). This will happen on Feb 28th.
Cheers!

Fri, Feb 22, 4:07 PM · Analytics-Kanban, Patch-For-Review, Product-Analytics, Reading-analysis, Analytics
mforns moved T216096: Whitelist sample flags and page/rev ID fields for ReadingDepth schema from Next Up to Done on the Analytics-Kanban board.
Fri, Feb 22, 4:00 PM · Reading Depth, Analytics-Kanban, Readers-Web-Backlog (Tracking), Product-Analytics, Analytics
mforns claimed T216096: Whitelist sample flags and page/rev ID fields for ReadingDepth schema.
Fri, Feb 22, 3:59 PM · Reading Depth, Analytics-Kanban, Readers-Web-Backlog (Tracking), Product-Analytics, Analytics
mforns added a comment to T216096: Whitelist sample flags and page/rev ID fields for ReadingDepth schema.

@Tbayer event_sanitized.readingdepth is backfilled using the new whitelist.
I have vetted the resulting data and it looks good to me, but please do a quick check.
Note that on the 28th of this month (in one week) we'll execute the purging script to delete data older than 90 days on the event database, and backfills like this will no longer be possible for data older than 90 days.
Cheers!

Fri, Feb 22, 3:59 PM · Reading Depth, Analytics-Kanban, Readers-Web-Backlog (Tracking), Product-Analytics, Analytics

Thu, Feb 21

mforns added a comment to T211173: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset.

@MNeisler Cool :]
Here's the Druid transforms expression list, so that you know the possibilities and the limitations: http://druid.io/docs/latest/misc/math-expr.html
Let me know if I can help!

Thu, Feb 21, 11:53 AM · Analytics-Kanban, Better Use Of Data, Product-Analytics, Analytics

Wed, Feb 20

mforns updated subscribers of T211173: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset.

@kzimmerman @MNeisler
Sure, we can discuss here or have a meeting, whichever is better for you. I also just talked to @Neil_P._Quinn_WMF about whether we should extract the data from mediawiki_history to an intermediate Hive table and then load from that one, or just use Druid transforms to ingest directly from mediawiki_history. I lean towards the second option, because it doesn't need the extra step (a table that would have to be maintained). But let's discuss!

Wed, Feb 20, 4:13 PM · Analytics-Kanban, Better Use Of Data, Product-Analytics, Analytics
mforns added a comment to T212396: eventlogging fails flake8 due to new upstream version, breaking CI.

@hashar Thanks a lot for all the dedicated work!!

Wed, Feb 20, 4:03 PM · Analytics-Kanban, Patch-For-Review, Analytics, Analytics-EventLogging

Tue, Feb 19

mforns added a comment to T214384: [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema.

@Ottomata I tried the simple ALTER TABLE, and it works, provided the field whose type you want to change is a top-level field. In our case, we want to change the type of a subfield of the event struct. When you do this, the whole event field becomes unreadable for all the data that was in the table. I think the problem lies in the serde (parquet serde?). That is why we needed to backfill the entire table since 2017-11. However, we only had raw events for the last 3 months, so part of the backfilling has been done with a temp copy of the old data.

Tue, Feb 19, 4:49 PM · Analytics-Kanban, Patch-For-Review, Performance-Team (Radar), Analytics
mforns added a comment to T214384: [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema.

I think this could work, but I believe it has a couple of drawbacks that we can avoid:

  • I think it adds some extra cognitive load, especially because different fields with different orders of magnitude will need different decimal shifts. We can store the length of the shift in the schema field description, but it's still potentially confusing, I think.
  • Also, it would need an explicit conversion on the client (1.2345 => 12345). And the shift length per field would have to be stored somewhere or hardcoded in the instrumentation, no?
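
To make the drawback concrete, here is a minimal sketch of what such a decimal-shift conversion could look like. It is written in Python only for illustration (the real client instrumentation would live elsewhere), and the function names and the per-field shift of 4 are assumptions, not part of any schema.

```python
from decimal import Decimal


def shift_encode(value, shift):
    """Move the decimal point `shift` places to the right and return an int.
    `value` is passed as a string to avoid binary floating-point noise."""
    scaled = Decimal(value).scaleb(shift)
    if scaled != scaled.to_integral_value():
        # The chosen shift is too small to represent this value exactly.
        raise ValueError(f"{value} needs more than {shift} decimal places")
    return int(scaled)


def shift_decode(encoded, shift):
    """Inverse operation: the reader must know the same shift for the field."""
    return Decimal(encoded).scaleb(-shift)


if __name__ == "__main__":
    print(shift_encode("1.2345", 4))   # the example from the comment above
    print(shift_decode(12345, 4))
```

Both the writer and every reader need the same shift value per field, which is the extra bookkeeping the bullets above point at.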
Tue, Feb 19, 8:57 AM · Analytics-Kanban, Patch-For-Review, Performance-Team (Radar), Analytics
mforns added a comment to T216414: Purge wikitext snapshots.

@JAllemandou Yes. The checksum does not change when the dates change, because the --older-than=90 parameter still remains the same.

Tue, Feb 19, 8:24 AM · Patch-For-Review, Analytics-Kanban, Analytics

Mon, Feb 18

mforns moved T214384: [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema from In Progress to Done on the Analytics-Kanban board.
Mon, Feb 18, 11:57 PM · Analytics-Kanban, Patch-For-Review, Performance-Team (Radar), Analytics
mforns added a comment to T214384: [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema.

Hey all :]

Mon, Feb 18, 11:57 PM · Analytics-Kanban, Patch-For-Review, Performance-Team (Radar), Analytics
mforns moved T212396: eventlogging fails flake8 due to new upstream version, breaking CI from In Progress to Paused on the Analytics-Kanban board.
Mon, Feb 18, 8:04 PM · Analytics-Kanban, Patch-For-Review, Analytics, Analytics-EventLogging
mforns moved T211835: Sunset Wikimetrics from Next Up to In Progress on the Analytics-Kanban board.
Mon, Feb 18, 8:04 PM · Analytics-Kanban, Analytics-Wikimetrics, Analytics
mforns moved T214384: [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema from Next Up to In Progress on the Analytics-Kanban board.
Mon, Feb 18, 8:04 PM · Analytics-Kanban, Patch-For-Review, Performance-Team (Radar), Analytics
mforns moved T212014: Sanitization should be run a second time from In Code Review to In Progress on the Analytics-Kanban board.
Mon, Feb 18, 8:04 PM · Patch-For-Review, Analytics, Analytics-Kanban
mforns added a project to T211835: Sunset Wikimetrics : Analytics-Kanban.
Mon, Feb 18, 8:02 PM · Analytics-Kanban, Analytics-Wikimetrics, Analytics
mforns added a comment to T189475: Identify common abuse filters that affect translations.

@Amire80, sure, feel free to schedule one!

Mon, Feb 18, 6:59 PM · Language-Team (Language-2019-January-March), CX-analytics
mforns added a comment to T216414: Purge wikitext snapshots.

@JAllemandou
You need to run the script once without the --execute flag (dry run). The checksum will be printed by the script at the end.
To *really* execute the script, run it again with the same parameters and with --execute <checksum>.
The puppet systemd-timer should thus include the checksum.
This checksum thing was intended to force us to do a dry run first and hopefully check the printlns before executing it for real.
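
The dry-run-then-execute pattern described above can be sketched like this. It is not the refinery script's code: how the real script derives its checksum is not stated here, so hashing the sorted parameters with SHA-256 is an assumption, and the function and parameter names are hypothetical.

```python
import hashlib


def params_checksum(params):
    """Derive a checksum from the run parameters only (assumed scheme).
    The current date is not included, so the value stays stable across days."""
    canonical = "&".join(f"{key}={params[key]}" for key in sorted(params))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run(params, execute=None):
    """Without `execute`, do a dry run and report the checksum.
    With `execute`, act only if it matches the checksum for these parameters."""
    expected = params_checksum(params)
    if execute is None:
        return f"DRY RUN with {params}; checksum: {expected}"
    if execute != expected:
        raise ValueError("checksum mismatch: do a dry run with these parameters first")
    return "EXECUTED"


if __name__ == "__main__":
    params = {"older_than": 90, "database": "event"}
    print(run(params))                                   # dry run prints the checksum
    print(run(params, execute=params_checksum(params)))  # real run
```

Because the checksum depends only on the parameters, a value obtained from one dry run can be hardcoded in a periodic job, as long as the parameters do not change.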

Mon, Feb 18, 6:15 PM · Patch-For-Review, Analytics-Kanban, Analytics