
Milimetric (Dan Andreescu)
Staff Engineer (Data Engineering)

User Details

User Since
Oct 8 2014, 5:48 PM (402 w, 4 d)
Availability
Available
IRC Nick
Milimetric
LDAP User
Milimetric
MediaWiki User
Milimetric (WMF)

Recent Activity

Thu, Jun 16

Milimetric created T310824: Automatically monitor schema changes that would break sqoop.
Thu, Jun 16, 5:34 PM · Data-Engineering
Milimetric closed T309731: Remove unused Gerrit repository mediawiki/services/aqs/deploy as Resolved.

Sorry, Andre, I didn't even know there was a Gerrit tag. I'm marking this as resolved for now. If we ever come up with a different way of handling inactive repositories, I'll circle back and apply it here.

Thu, Jun 16, 4:44 PM · Gerrit, Data-Engineering

Wed, Jun 15

Milimetric added a comment to T310593: Experiencing pipeline failure due to disk-space issues.

Another example: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/jobs/21076

Wed, Jun 15, 2:48 PM · Data-Engineering, GitLab
Milimetric moved T307714: Custom Metadata ingestion from In Code Review to Ready to Deploy on the Data-Engineering-Kanban board.
Wed, Jun 15, 2:16 PM · Patch-For-Review, Data-Engineering-Kanban, Data-Catalog
Milimetric moved T309806: The effect of sqooping large tables on mediawiki history from In Code Review to Done on the Data-Engineering-Kanban board.
Wed, Jun 15, 2:16 PM · Data-Engineering-Kanban
Milimetric moved T309987: Mediawiki History delayed 2022-05 from In Code Review to Ready to Deploy on the Data-Engineering-Kanban board.
Wed, Jun 15, 1:51 PM · Patch-For-Review, Data-Engineering, Data-Engineering-Kanban

Tue, Jun 14

Milimetric awarded T286963: Prototype a Vue SSR implementation using a Node service a Love token.
Tue, Jun 14, 2:25 PM · Design-Systems-Team

Mon, Jun 13

srishakatux awarded T310317: Data missing on the hierarchical view on the wmcs-edits tool a Love token.
Mon, Jun 13, 9:35 PM · Data-Engineering, Data-Engineering-Kanban, Developer-Advocacy, Cloud-Services
Milimetric removed a project from T217343: Package dictionaries better for ORES models: Analytics.
Mon, Jun 13, 5:10 PM · artificial-intelligence, ORES, Machine-Learning-Team
Milimetric removed a project from T216246: [Discuss] ORES model development and deployment processes: Analytics.
Mon, Jun 13, 5:10 PM · Machine-Learning-Team
Milimetric removed a project from T280107: Generate dump of scored-revisions from 2018-2020 for Wikis except English Wikipedia: Analytics.
Mon, Jun 13, 5:10 PM · artificial-intelligence, editquality-modeling, ORES, Machine-Learning-Team

Fri, Jun 10

Milimetric moved T310317: Data missing on the hierarchical view on the wmcs-edits tool from Ready to Deploy to Done on the Data-Engineering-Kanban board.

OK, the jobs ran and the dashboard looks fine again, so I think it's solved. Ping me again if anything seems weird.

Fri, Jun 10, 8:06 PM · Data-Engineering, Data-Engineering-Kanban, Developer-Advocacy, Cloud-Services
Milimetric moved T310317: Data missing on the hierarchical view on the wmcs-edits tool from In Progress to Ready to Deploy on the Data-Engineering-Kanban board.

The logs showed consistent errors since 2021-03, but I think it was just because this file had a trailing half-empty row (just the date and no output). So I reran the jobs, they seem ok... so weird. I think this means the data will be fixed soon. I'll move to Done if I'm right.

Fri, Jun 10, 3:07 PM · Data-Engineering, Data-Engineering-Kanban, Developer-Advocacy, Cloud-Services
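
A minimal sketch of how a trailing half-empty row like the one described in the comment above could be caught before a dashboard tries to load the file. It assumes a tab-separated report with a header line and the date in the first column; that layout and the function name `find_incomplete_rows` are illustrative assumptions, not taken from the actual job.

```python
def find_incomplete_rows(lines, delimiter="\t"):
    """Return (row_number, fields) for data rows that are missing their output.

    Row numbers are 1-based and count non-blank lines only, with the
    header as row 1. A row is flagged when it has fewer fields than the
    header or when every field after the first (the date) is empty.
    """
    rows = [line.rstrip("\n").split(delimiter) for line in lines if line.strip()]
    if not rows:
        return []
    expected = len(rows[0])  # the header defines the expected width
    incomplete = []
    for number, row in enumerate(rows[1:], start=2):
        values = row[1:]
        if len(row) < expected or not any(v.strip() for v in values):
            incomplete.append((number, row))
    return incomplete


sample = ["date\tedits\n", "2022-06-08\t10\n", "2022-06-09\t\n"]
print(find_incomplete_rows(sample))
```

Running a check like this on the report file before publishing would flag the date-only row instead of leaving the dashboard to fail on it.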

Thu, Jun 9

Milimetric claimed T310317: Data missing on the hierarchical view on the wmcs-edits tool.
Thu, Jun 9, 10:16 PM · Data-Engineering, Data-Engineering-Kanban, Developer-Advocacy, Cloud-Services
Milimetric edited projects for T310317: Data missing on the hierarchical view on the wmcs-edits tool, added: Data-Engineering-Kanban, Data-Engineering; removed Analytics.

Yeah, it looks like the queries have been failing and the data that Dashiki is trying to load is corrupted. But I ran the queries manually and they don't fail. So I'll take this as a bug and work on it as soon as I can. It's weird :)

Thu, Jun 9, 10:15 PM · Data-Engineering, Data-Engineering-Kanban, Developer-Advocacy, Cloud-Services
Milimetric closed T305556: Drop UploadWizard* data as Resolved.
Thu, Jun 9, 7:03 PM · Data-Engineering-Kanban, Data-Engineering
Milimetric closed T307774: Drop GettingStarted* data as Resolved.
Thu, Jun 9, 7:03 PM · Data-Engineering-Kanban, Data-Engineering

Tue, Jun 7

Milimetric added a comment to T307701: Adding Datasets: MediaWiki History.

I added my draft at https://wikitech.wikimedia.org/wiki/User:Milimetric/Notebook/MediaWiki_History. Shall we edit there before moving it to DataHub, or shall we edit on DataHub? I don't think it's useful to craft text on DataHub, since that pollutes the history, but let me know what you think.

Tue, Jun 7, 9:43 PM · Data-Catalog
Milimetric moved T309987: Mediawiki History delayed 2022-05 from Done to In Code Review on the Data-Engineering-Kanban board.
Tue, Jun 7, 3:13 PM · Patch-For-Review, Data-Engineering, Data-Engineering-Kanban
Milimetric moved T309987: Mediawiki History delayed 2022-05 from In Progress to Done on the Data-Engineering-Kanban board.
Tue, Jun 7, 3:12 PM · Patch-For-Review, Data-Engineering, Data-Engineering-Kanban

Mon, Jun 6

Milimetric added a comment to T309731: Remove unused Gerrit repository mediawiki/services/aqs/deploy.

I'm not sure how we can remove it: https://www.mediawiki.org/wiki/Gerrit/Inactive_projects seems to say we just mark repositories as "Read Only". Is this enough? Does someone know if we have a more permanent removal? It's indeed just a repo that was never really used. @Aklapper: any advice?

Mon, Jun 6, 8:00 PM · Gerrit, Data-Engineering
Milimetric renamed T309987: Mediawiki History delayed 2022-05 from Mediawiki History delayed 2022-06 to Mediawiki History delayed 2022-05.
Mon, Jun 6, 7:43 PM · Patch-For-Review, Data-Engineering, Data-Engineering-Kanban
Milimetric claimed T309806: The effect of sqooping large tables on mediawiki history.
Mon, Jun 6, 3:18 PM · Data-Engineering-Kanban
Milimetric moved T309718: [Airflow] Migrate Oozie's mediawiki_history_load jobs to Airflow from In Code Review to In Progress on the Data-Engineering-Kanban board.
Mon, Jun 6, 3:18 PM · Data-Engineering-Kanban, Airflow
Milimetric moved T309718: [Airflow] Migrate Oozie's mediawiki_history_load jobs to Airflow from Next Up to In Code Review on the Data-Engineering-Kanban board.
Mon, Jun 6, 3:18 PM · Data-Engineering-Kanban, Airflow
Milimetric moved T309987: Mediawiki History delayed 2022-05 from Incoming to Datasets on the Data-Engineering board.
Mon, Jun 6, 3:17 PM · Patch-For-Review, Data-Engineering, Data-Engineering-Kanban
Milimetric reassigned T308766: Fix airflow interlanguage job from NOkafor-WMF to JAllemandou.
Mon, Jun 6, 3:15 PM · Airflow, Data-Engineering-Kanban, Data-Engineering
JAllemandou awarded T309738: Move Mediawiki QueryPages computation to Hadoop a Party Time token.
Mon, Jun 6, 3:14 PM · Data-Persistence (Consultation), Data-Engineering
Milimetric moved T300021: Low Risk Oozie Migration: 4 wikidata metrics jobs from Ready to Deploy to In Code Review on the Data-Engineering-Kanban board.
Mon, Jun 6, 3:06 PM · Data-Engineering-Kanban, Data-Engineering, Airflow
Milimetric moved T300021: Low Risk Oozie Migration: 4 wikidata metrics jobs from In Code Review to Ready to Deploy on the Data-Engineering-Kanban board.
Mon, Jun 6, 3:06 PM · Data-Engineering-Kanban, Data-Engineering, Airflow
Milimetric created T309987: Mediawiki History delayed 2022-05.
Mon, Jun 6, 2:44 PM · Patch-For-Review, Data-Engineering, Data-Engineering-Kanban
Milimetric updated the task description for T309738: Move Mediawiki QueryPages computation to Hadoop.
Mon, Jun 6, 2:37 PM · Data-Persistence (Consultation), Data-Engineering
Milimetric updated the task description for T309738: Move Mediawiki QueryPages computation to Hadoop.
Mon, Jun 6, 2:37 PM · Data-Persistence (Consultation), Data-Engineering
Milimetric added a comment to T309738: Move Mediawiki QueryPages computation to Hadoop.

@Milimetric: to evaluate the impact of doing this work, do we have info on how frequently these queries run, how long they take, and what resources are allocated to computing them?

Mon, Jun 6, 2:34 PM · Data-Persistence (Consultation), Data-Engineering

Thu, Jun 2

Milimetric moved T309806: The effect of sqooping large tables on mediawiki history from Next Up to In Code Review on the Data-Engineering-Kanban board.
Thu, Jun 2, 7:52 PM · Data-Engineering-Kanban
Milimetric added a project to T309806: The effect of sqooping large tables on mediawiki history: Data-Engineering-Kanban.
Thu, Jun 2, 7:52 PM · Data-Engineering-Kanban
Milimetric created T309806: The effect of sqooping large tables on mediawiki history.
Thu, Jun 2, 7:34 PM · Data-Engineering-Kanban
Ladsgroup awarded T309738: Move Mediawiki QueryPages computation to Hadoop a Love token.
Thu, Jun 2, 5:58 PM · Data-Persistence (Consultation), Data-Engineering

Wed, Jun 1

Milimetric created T309738: Move Mediawiki QueryPages computation to Hadoop.
Wed, Jun 1, 9:02 PM · Data-Persistence (Consultation), Data-Engineering
Milimetric added a comment to T307711: User Experience: Authentication.

Emil is still having a problem authenticating. When he logs in, his username doesn't have the groups that I add for user echetty.

Wed, Jun 1, 7:25 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric moved T307711: User Experience: Authentication from Done to In Progress on the Data-Catalog board.
Wed, Jun 1, 7:25 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric updated the task description for T307716: Spike: Evaluate datahub schema versioning support.
Wed, Jun 1, 6:51 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric updated the task description for T307716: Spike: Evaluate datahub schema versioning support.
Wed, Jun 1, 6:51 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric created T309717: Event Utilities partially downloads schemas.
Wed, Jun 1, 2:33 PM · Data-Engineering

Tue, May 31

Milimetric moved T307774: Drop GettingStarted* data from Ready to Deploy to Done on the Data-Engineering-Kanban board.
drop table event_sanitized.gettingstartedredirectimpression;
drop table event.gettingstartedredirectimpression;
Tue, May 31, 8:24 PM · Data-Engineering-Kanban, Data-Engineering
Milimetric moved T305556: Drop UploadWizard* data from Ready to Deploy to Done on the Data-Engineering-Kanban board.
drop table event_sanitized.uploadwizarderrorflowevent;
drop table event_sanitized.uploadwizardexceptionflowevent;
drop table event_sanitized.uploadwizardflowevent;
drop table event_sanitized.uploadwizardstep;
drop table event_sanitized.uploadwizardtutorialactions;
drop table event_sanitized.uploadwizarduploadflowevent;
Tue, May 31, 8:24 PM · Data-Engineering-Kanban, Data-Engineering
Milimetric moved T307774: Drop GettingStarted* data from Next Up to Ready to Deploy on the Data-Engineering-Kanban board.
Tue, May 31, 7:20 PM · Data-Engineering-Kanban, Data-Engineering
Milimetric moved T305556: Drop UploadWizard* data from Next Up to Ready to Deploy on the Data-Engineering-Kanban board.
Tue, May 31, 7:20 PM · Data-Engineering-Kanban, Data-Engineering
Milimetric assigned T307774: Drop GettingStarted* data to Snwachukwu.
Tue, May 31, 7:20 PM · Data-Engineering-Kanban, Data-Engineering
Milimetric claimed T305556: Drop UploadWizard* data.
Tue, May 31, 7:02 PM · Data-Engineering-Kanban, Data-Engineering
Milimetric moved T309000: Check home/HDFS leftovers of razzi from Next Up to Ready to Deploy on the Data-Engineering-Kanban board.
Tue, May 31, 4:03 PM · Data-Engineering-Kanban, Data-Engineering
Milimetric claimed T309000: Check home/HDFS leftovers of razzi.
Tue, May 31, 4:03 PM · Data-Engineering-Kanban, Data-Engineering
Milimetric updated subscribers of T309000: Check home/HDFS leftovers of razzi.

I've reviewed everything above and it can all be safely deleted. An admin needs to do this with cumin; see instructions (ping @Ottomata). The HDFS and Hive stuff is done, I took care of it.

Tue, May 31, 3:34 PM · Data-Engineering-Kanban, Data-Engineering
Milimetric added a comment to T309000: Check home/HDFS leftovers of razzi.
====== stat1004 ======
total 513244
drwxr-xr-x  2 26051 wikidev      4096 Jul 20  2021 hdfs-namenode-fsimage
-rw-rw-r--  1 26051 wikidev   1245367 Jan 10 16:42 part.txt
-rw-r--r--  1 26051 wikidev      3155 Oct 28  2020 razzi-key.txt
drwxrwxr-x 11 26051 wikidev      4096 Mar 16  2021 refinery
-rw-r--r--  1 root  root    524288000 May 18  2021 test.img
drwxrwxr-x  6 26051 wikidev      4096 Dec  7  2020 venv
drwxrwxr-x  6 26051 wikidev      4096 Dec  7  2020 venv3
Tue, May 31, 2:37 PM · Data-Engineering-Kanban, Data-Engineering

May 27 2022

Milimetric added a comment to T240860: AQS `edited-pages/new` metric does not make clear that the value is net of deletions.

@EChetty: how does this get prioritized though? Is this a bug affecting users? (I think it is, but not sure how we're defining that)

May 27 2022, 6:25 PM · Platform Engineering, Data-Engineering, Product-Analytics
Milimetric added a comment to T241180: RFC: Adopt a modern JavaScript framework for use with MediaWiki.

@Jasonkhanlar: thanks very much, I hadn't heard of that before. I'll consider it for our wikistats refactor, and maybe @egardner would be interested when he gets back.

May 27 2022, 6:18 PM · Front-end-Standards-Group, Design-Systems-team-20200324-20220422, TechCom-RFC (TechCom-RFC-Closed), Security-Team
Milimetric edited projects for T308294: Grant Access to `wmf` for `Dmantena`, added: Data-Engineering; removed Analytics.
May 27 2022, 6:14 PM · Data-Engineering, SRE, LDAP-Access-Requests
Milimetric added a comment to T308294: Grant Access to `wmf` for `Dmantena`.

@Tsevener is right, and that's the access that @RhinosF1 pointed to. @Dmantena: unfortunately, due to how authentication and authorization work more broadly at WMF, this is the only way that we can manage access right now. Desiree Abad is leading an effort to improve that, you can connect with her for more details. But I totally agree with you that there should be a way to get this access without all the other implications. For your peace of mind, you can read the User Responsibilities section. You'll notice that you're very unlikely to get in trouble if you're going through the use case you describe here.

May 27 2022, 6:14 PM · Data-Engineering, SRE, LDAP-Access-Requests
Milimetric added a comment to T309382: DataHub rights assignment is case-sensitive.

Seems like a bug to me. If this is a requirement of the system, it should just lowercase the name transparently to the user.

May 27 2022, 5:53 PM · Data-Engineering, Data-Catalog
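
A minimal sketch of the "lowercase transparently" idea from the comment above: usernames are normalized once, on both write and lookup, so that the case a user or admin types does not matter. The class and function names here are hypothetical and this is not DataHub's API; it only illustrates the technique.

```python
def normalize_username(name: str) -> str:
    """Normalize a username so that comparisons ignore case and stray spaces."""
    return name.strip().casefold()


class GroupAssignments:
    """Hypothetical in-memory store mapping usernames to group names."""

    def __init__(self):
        self._groups = {}

    def add(self, username: str, group: str) -> None:
        # Normalize on write so "EChetty" and "echetty" share one entry.
        self._groups.setdefault(normalize_username(username), set()).add(group)

    def groups_for(self, username: str) -> set:
        # Normalize on read with the same function used on write.
        return set(self._groups.get(normalize_username(username), set()))


assignments = GroupAssignments()
assignments.add("EChetty", "data-stewards")
print(assignments.groups_for("echetty"))
```

Keeping the normalization in a single function means the write path and the read path cannot drift apart, which is the failure mode a case-sensitive assignment exposes.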
Milimetric moved T307711: User Experience: Authentication from Paused to Ready to Deploy on the Data-Engineering-Kanban board.
May 27 2022, 5:52 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric moved T307711: User Experience: Authentication from Blocked to Done on the Data-Catalog board.

Has anyone carried out any work on ascertaining who will be our data stewards?

May 27 2022, 5:51 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric added a comment to T252227: Mobile redirects drop provenance parameters.

I'm very intrigued, @Milimetric, by your comment about reinstrumenting pageviews in a declarative way (that sounds like it could help with some of our work around differential privacy too), though I assume that's a very large project.

May 27 2022, 5:48 PM · Data-Engineering, Traffic-Icebox, SRE

May 25 2022

Milimetric moved T307717: Spike: Evaluate interaction of manual description edits and automatic description reimport from In Code Review to Done on the Data-Engineering-Kanban board.
May 25 2022, 8:29 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric moved T307716: Spike: Evaluate datahub schema versioning support from In Code Review to Done on the Data-Engineering-Kanban board.
May 25 2022, 8:29 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric added a comment to T307714: Custom Metadata ingestion.

Jobs are up for review at https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/63, tested in prod

May 25 2022, 8:29 PM · Patch-For-Review, Data-Engineering-Kanban, Data-Catalog

May 24 2022

Milimetric added a comment to T299703: Evaluate DataHub as a Data Catalog.
source:
  type: "kafka"
  config:
    connection:
      bootstrap: "kafka-jumbo1001.eqiad.wmnet:9092"
      schema_registry_url: "http://localhost:8081"

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
May 24 2022, 4:43 PM · Data-Catalog, Data-Engineering-Kanban, Data-Engineering

May 23 2022

Milimetric created T309046: Airflow: pin dependency versions to prevent long installs.
May 23 2022, 7:57 PM · Data-Engineering-Kanban, Airflow, Data-Engineering

May 19 2022

Milimetric added a comment to T308253: Is it possible to setup wikistats for a new wiki?.

In short: it would be very hard. There's a complicated data pipeline leading to the UI. It depends on how much value you would get out of such a tool. It's not a priority for us to make this generic beyond the scope of WMF projects, but it's not an inflexible piece of code.

May 19 2022, 4:42 PM · NFDI, Data-Engineering, Analytics-Wikistats
Milimetric moved T308610: Use dedicated Phabricator bug report / feature request forms from Incoming to Visualize on the Data-Engineering board.
May 19 2022, 4:16 PM · Data-Engineering-Kanban, Data-Engineering, Analytics-Wikistats
Milimetric moved T308597: Split turnilo staging off of an-tool1005 from Incoming to Ops on the Data-Engineering board.
May 19 2022, 4:16 PM · Patch-For-Review, Data-Engineering-Kanban, Data-Engineering

May 17 2022

Milimetric added a comment to T308294: Grant Access to `wmf` for `Dmantena`.

Indeed, RhinosF1 is right. Take a look at that link; I believe you need analytics-privatedata-users to run queries and access Presto-backed dashboards.

May 17 2022, 8:45 PM · Data-Engineering, SRE, LDAP-Access-Requests

May 13 2022

Milimetric moved T307714: Custom Metadata ingestion from Next Up to In Code Review on the Data-Engineering-Kanban board.
May 13 2022, 7:44 PM · Patch-For-Review, Data-Engineering-Kanban, Data-Catalog
Milimetric renamed T307714: Custom Metadata ingestion from Custom Metadata ingestion: to Custom Metadata ingestion.
May 13 2022, 7:44 PM · Patch-For-Review, Data-Engineering-Kanban, Data-Catalog
Milimetric moved T307714: Custom Metadata ingestion from MVP to In Review on the Data-Catalog board.
May 13 2022, 7:43 PM · Patch-For-Review, Data-Engineering-Kanban, Data-Catalog
Milimetric moved T307717: Spike: Evaluate interaction of manual description edits and automatic description reimport from Next Up to In Review on the Data-Catalog board.
May 13 2022, 7:43 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric moved T307716: Spike: Evaluate datahub schema versioning support from Next Up to In Review on the Data-Catalog board.
May 13 2022, 7:43 PM · Data-Engineering-Kanban, Data-Catalog

May 11 2022

Milimetric added a comment to T301895: Help with data that's not appearing on charts.

@Mayakp.wiki: I think we should build all new line charts using Apache ECharts (Time Series Line Chart in this case). Whenever the migration CLI is ready, we can use it. Until then, ECharts seems strictly better (let me know if I'm wrong). So maybe by the time the migration CLI comes out, we'll be naturally migrated anyway.

May 11 2022, 8:36 PM · Data-Engineering-Kanban, Superset, Data-Engineering, Product-Analytics
Milimetric added a comment to T263973: Wikistats Bug - easy to understand language for pageviews.

@Kipala & @TheresNoTime: I recently updated the language here as part of another task, can you take a look and see if it makes more sense? If not, please feel free to suggest a change and I can incorporate it: https://stats.wikimedia.org/#/sw.wikipedia.org/reading/total-page-views/normal|bar|1-year|~total|monthly

May 11 2022, 6:45 PM · Data-Engineering, User-TheresNoTime, Voice & Tone, good first task, Analytics, Analytics-Wikistats
Milimetric updated subscribers of T307245: Swift for differential privacy data publication.

@Htriedman: I know you're talking to @EChetty about this, we're triaging it to this column which is like a task "incubator". Once this is fully formed and we know what the pipeline looks like, we can help you expand this into the necessary tasks. When you're done, you can move this back to incoming to effectively ping us.

May 11 2022, 6:36 PM · Data-Persistence, SRE-swift-storage, Privacy Engineering, Data-Engineering

May 10 2022

Milimetric updated the task description for T307716: Spike: Evaluate datahub schema versioning support.
May 10 2022, 9:27 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric moved T307716: Spike: Evaluate datahub schema versioning support from In Progress to In Code Review on the Data-Engineering-Kanban board.
May 10 2022, 9:22 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric updated the task description for T307716: Spike: Evaluate datahub schema versioning support.
May 10 2022, 9:22 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric updated the task description for T307716: Spike: Evaluate datahub schema versioning support.
May 10 2022, 9:20 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric moved T307717: Spike: Evaluate interaction of manual description edits and automatic description reimport from In Progress to In Code Review on the Data-Engineering-Kanban board.

TODO: validate with @EChetty that the description here is what we want to evaluate (it looks more like what we want to know about schemas). And if not, see what else we need to understand about descriptions.

May 10 2022, 9:19 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric added a comment to T307717: Spike: Evaluate interaction of manual description edits and automatic description reimport.

Ok, got a sense for how this works:

May 10 2022, 9:18 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric moved T308052: Upgrade Datahub from MVP to Backlog on the Data-Catalog board.
May 10 2022, 8:24 PM · Patch-For-Review, Data-Engineering, Data-Engineering-Kanban, Data-Catalog
Milimetric moved T307711: User Experience: Authentication from Next Up to In Progress on the Data-Engineering-Kanban board.
May 10 2022, 8:24 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric assigned T307711: User Experience: Authentication to BTullis.

@BTullis: I'm doling out these tasks per our grooming session today, just to expedite the process. We decided there are only a few of us and we can stay in a tight loop. This was the top infrastructure thing we needed to look into. Emil said he validated that users without ssh access can log in to DataHub, but that it's confusing knowing which username to use. I guess maybe some clarity on the approach here, like a simple wiki article that we can link to, would be useful? Ping me if you want to brainbounce.

May 10 2022, 8:24 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric added a comment to T307710: Tagging Policy - Strategy.

@EChetty: how should we do this? Do you want to draft a policy and set up a meeting to discuss? Would you like me to have a first draft? Your call, I'm happy either way.

May 10 2022, 8:20 PM · Data-Catalog
Milimetric moved T308052: Upgrade Datahub from Backlog to MVP on the Data-Catalog board.
May 10 2022, 8:18 PM · Patch-For-Review, Data-Engineering, Data-Engineering-Kanban, Data-Catalog
Milimetric moved T307717: Spike: Evaluate interaction of manual description edits and automatic description reimport from Next Up to In Progress on the Data-Engineering-Kanban board.
May 10 2022, 8:18 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric claimed T307717: Spike: Evaluate interaction of manual description edits and automatic description reimport.

I will work on this in parallel with the schema spike since repeated ingestion will tell us about both.

May 10 2022, 8:18 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric moved T307716: Spike: Evaluate datahub schema versioning support from Next Up to In Progress on the Data-Engineering-Kanban board.
May 10 2022, 8:17 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric claimed T307716: Spike: Evaluate datahub schema versioning support.

I will work on this first, using my hive database, milimetric, and reporting findings here.

May 10 2022, 8:17 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric added a comment to T307716: Spike: Evaluate datahub schema versioning support.

This looks to be available behind the scenes but just not surfaced in the UI yet? https://datahubproject.io/docs/dev-guides/timeline/

May 10 2022, 6:01 PM · Data-Engineering-Kanban, Data-Catalog
Milimetric added a comment to T307944: Evaluate Kafka Stretch cluster potential, and if possible, request hardware ASAP.

Quick stats check on revision sizes and diff sizes:

May 10 2022, 4:00 PM · Data-Engineering-Kanban, Event-Platform, Data-Engineering, Generated Data Platform

May 3 2022

Milimetric updated subscribers of T252227: Mobile redirects drop provenance parameters.

@BBlack: this was never our pipeline. It looks like @dr0ptp4kt's original idea was to remove wprov so it doesn't fragment the cache. We don't particularly care one way or another, it doesn't affect our datasets directly. But obviously if the mechanism chosen here creates duplicate data, we should consider what we could add to the duplicate requests so they can be filtered out later. Personally, I think it's way overdue that we just instrument pageviews in a declarative way instead of parsing them out of webrequest.

May 3 2022, 2:10 AM · Data-Engineering, Traffic-Icebox, SRE
Milimetric added a comment to T301895: Help with data that's not appearing on charts.

I was wondering if we could disable the Line Chart type then, if it's deprecated, and did some digging but it doesn't seem to be easy to do. So this is a good workaround until we can get a Superset build without the buggy charts and replace all the existing dashboards. @Iflorez let us know what you think.

May 3 2022, 1:51 AM · Data-Engineering-Kanban, Superset, Data-Engineering, Product-Analytics

Apr 18 2022

Milimetric updated subscribers of T299897: Connect MVP to Hive metastore [Mile Stone 4].

When I get back I'll write an airflow job that does the ingestion on a regular basis. For now I'd just like @EChetty and @odimitrijevic to take a look and let me know their thoughts on the set of databases we chose to ingest (event, event_sanitized, wmf, wmf_raw, canonical_data), the frequency that we think we want to do this at, and anything else that comes to mind.

Apr 18 2022, 10:34 AM · Data-Engineering-Kanban, Data-Engineering, Data-Catalog
Milimetric moved T299897: Connect MVP to Hive metastore [Mile Stone 4] from In Progress to In Review on the Data-Catalog board.
Apr 18 2022, 10:31 AM · Data-Engineering-Kanban, Data-Engineering, Data-Catalog