odimitrijevic (Olja Dimitrjevic)
User

User Details

User Since
Apr 28 2021, 12:42 AM (136 w, 4 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
ODimitrijevic (WMF) [ Global Accounts ]

Recent Activity

Thu, Dec 7

odimitrijevic added a comment to T259163: Migrate legacy metawiki schemas to Event Platform.

Decommissioning EventLogging would be EPIC!

Thu, Dec 7, 9:40 PM · Data-Engineering, Better Use Of Data, Product-Analytics, MW-1.36-notes (1.36.0-wmf.18; 2020-11-17), Product-Data-Infrastructure, Event-Platform

Wed, Dec 6

odimitrijevic updated subscribers of T352879: Update the sqoop configuration for mediawiki to obtain linktarget from the production replicas, instead of wikireplicas.
Wed, Dec 6, 5:55 PM · Data-Platform-SRE, Data-Engineering

Tue, Dec 5

odimitrijevic added a comment to T346463: Identify and label prefetch proxy data in our traffic.

Can the header be translated into an x-analytics value?

Tue, Dec 5, 4:37 PM · Patch-For-Review, Traffic, Movement-Insights, Data-Engineering

Sat, Dec 2

odimitrijevic updated subscribers of T300102: Upgrade Kafka to from 1.x to later version.
Sat, Dec 2, 1:00 AM · Data-Platform-SRE, Event-Platform, Epic, Data-Engineering, SRE, observability, serviceops
odimitrijevic added a comment to T300102: Upgrade Kafka to from 1.x to later version.

A few questions:

  • While we ought to consider an upgrade for all 4 clusters, from what I understand Jumbo can be upgraded independently. Are there any concerns with that approach?
  • What are the upgrade considerations for Kafka clients?
  • Specifically are there clients that publish to Kafka Jumbo directly or do all Kafka topics get mirrored from main (possibly logging?)?
Sat, Dec 2, 12:59 AM · Data-Platform-SRE, Event-Platform, Epic, Data-Engineering, SRE, observability, serviceops

Fri, Dec 1

odimitrijevic added a comment to T351387: Requesting access to wmf and analytics-privatedata-users for EHughes (superset access with no server access).

Approved

Fri, Dec 1, 6:17 PM · Patch-For-Review, SRE, SRE-Access-Requests
odimitrijevic reassigned T351387: Requesting access to wmf and analytics-privatedata-users for EHughes (superset access with no server access) from odimitrijevic to eoghan.
Fri, Dec 1, 6:17 PM · Patch-For-Review, SRE, SRE-Access-Requests
odimitrijevic reassigned T350918: Requesting access to WMF LDAP group and deployment and analytics-privatedata-users shell access group for Grace (ecarg) from odimitrijevic to eoghan.
Fri, Dec 1, 6:17 PM · SRE, SRE-Access-Requests
odimitrijevic added a comment to T350918: Requesting access to WMF LDAP group and deployment and analytics-privatedata-users shell access group for Grace (ecarg).

Approved

Fri, Dec 1, 6:16 PM · SRE, SRE-Access-Requests
odimitrijevic added a comment to T352098: Requesting access to "researchers" and "analytics-privatedata-users" for Xiao Xiao.

Approved

Fri, Dec 1, 6:08 AM · SRE, SRE-Access-Requests

Oct 31 2023

odimitrijevic closed T310229: Data Catalog Documentation Style Guide as Resolved.

This was delivered as part of the "documentathon": https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/DataHub/Data_Catalog_Documentation_Guide

Oct 31 2023, 4:34 PM · Documentation, Data-Engineering, Data-Catalog
odimitrijevic closed T310229: Data Catalog Documentation Style Guide, a subtask of T349103: Define dataset documentation strategy, as Resolved.
Oct 31 2023, 4:34 PM · Goal, Tech-Docs-Team, Data-Engineering

Oct 25 2023

odimitrijevic added a comment to T344199: Grant Access to analytics-privatedata-users for ATsay-WMF.

Approved!

Oct 25 2023, 8:00 PM · SRE-Access-Requests, SRE, LDAP-Access-Requests

Oct 4 2023

odimitrijevic created T348204: Indicate cluster location metadata for Druid datasets.
Oct 4 2023, 10:45 PM · Data-Engineering

Sep 22 2023

odimitrijevic added a comment to T345726: Requesting Creation of a new POSIX group and system user for the Analytics WMDE team..

approved

Sep 22 2023, 9:07 PM · Data-Platform-SRE, SRE, SRE-Access-Requests

Sep 19 2023

odimitrijevic added a comment to T346694: Requesting access to analytics and search resources for dr0ptp4kt.

Approved

Sep 19 2023, 4:50 PM · SRE, SRE-Access-Requests

Sep 8 2023

odimitrijevic added a comment to T345959: Requesting access to analytics-privatedata-users for ahoelzl.

Approved!

Sep 8 2023, 9:41 PM · SRE, SRE-Access-Requests

Sep 6 2023

odimitrijevic moved T345390: eventutilities-python: cookicutter template example should be updated from In progress to In Review on the Data Engineering and Event Platform Team (Sprint 1) board.
Sep 6 2023, 3:05 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform

Sep 5 2023

odimitrijevic moved T345193: Document the onboarding journey on Event Platfrom from In progress to In Review on the Data Engineering and Event Platform Team (Sprint 1) board.
Sep 5 2023, 5:58 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform
odimitrijevic added a comment to T341848: Consult on investigation of relation between UA deprecation and increase in automated traffic.

@MGerlach does the pre-fetch traffic have headers that can identify it as such as it comes through as webrequests?

Sep 5 2023, 3:44 PM · Research (FY2023-24-Research-July-September)

Sep 1 2023

odimitrijevic added a comment to T345455: Requesting access to analytics-admins for cjming.

Approved

Sep 1 2023, 5:16 PM · SRE, SRE-Access-Requests

Aug 24 2023

odimitrijevic added a comment to T333011: [Maintenance] Define Migration/Deprecation Plan for Hue.

It looks like the request is also in PyHive with the following PR still open: https://github.com/dropbox/PyHive/pull/328

Bug closed because too old, and not fixed: https://github.com/apache/superset/issues/3243

Aug 24 2023, 11:56 PM · Data-Engineering, Data Pipelines (Sprint 14)

Aug 23 2023

odimitrijevic added a comment to T333011: [Maintenance] Define Migration/Deprecation Plan for Hue.

@JAllemandou is the limitation in data formatting coming from Presto or Superset (or both :) ?

Aug 23 2023, 6:30 PM · Data-Engineering, Data Pipelines (Sprint 14)
odimitrijevic moved T341277: mediawiki page_content_change should generate new meta.id field from In Review to Done on the Data Engineering and Event Platform Team (Sprint 1) board.
Aug 23 2023, 3:06 PM · Data Engineering and Event Platform Team (Sprint 1), Data-Engineering, Event-Platform
odimitrijevic updated subscribers of T318863: [Event Platform] Event Platform and DataHub Integration.

@BTullis we'll need the SRE team's help with the deployment of the event platform schema ingestion into Datahub. The deployment involves a) creating the event streams custom platform and b) deploying the ingestion code/transformer.

Aug 23 2023, 6:08 AM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform

Aug 18 2023

odimitrijevic edited projects for T326002: [Event Platform] eventgate-wikimedia occasionally fails to produce events due to stream config fetch errors, added: Data Engineering and Event Platform Team (Sprint 2); removed Data Engineering and Event Platform Team.

The failure of this job requires a manual rerun, and based on a recent assessment this happens with some frequency (on average once daily). Let's bring this into the current sprint and continue to troubleshoot.

Aug 18 2023, 11:07 PM · Data-Engineering (Sprint 6), MW-1.41-notes (1.41.0-wmf.28; 2023-09-26), Event-Platform, Data Pipelines
odimitrijevic created Data Engineering and Event Platform Team (Sprint 2).
Aug 18 2023, 10:36 PM
odimitrijevic added a comment to T344199: Grant Access to analytics-privatedata-users for ATsay-WMF.

I approve

Aug 18 2023, 10:27 PM · SRE-Access-Requests, SRE, LDAP-Access-Requests

Aug 17 2023

odimitrijevic added a comment to T344257: Requesting membership in analytics-privatedata-users group, sql_lab role, Kerberos Principal for Omari Sefu.

Approving group membership

Aug 17 2023, 10:30 PM · SRE, SRE-Access-Requests

Aug 16 2023

odimitrijevic added a comment to T318863: [Event Platform] Event Platform and DataHub Integration.

Here are some considerations that we discussed, that we need to further explore and decide on:

  • Explore creating a custom platform for Event Streams
  • Add top level event schema description as the dataset documentation. TBD on how to accomplish this given import options.
  • The schema import automatically adds subgroups under kafka based on the first dot segment of the schema name. In the production instance of DataHub there are also streams with the naming analytics/mediawiki/web_ab_test_enrollment. Can “/” be used as a separator to designate the top level category?
  • Can we import goblin lineage to propagate lineage from kafka > hive?
  • There would be value in importing the hive event_raw database for completion of lineage events
  • Can we add a link to the event platform schema/datahub documentation to hive tables in event and event_sanitized? Lineage would be one way to trace this. Another would be to add links in the documentation to datasets with equivalent schema both upstream and downstream. This falls into the larger consideration on how to propagate metadata between equivalent datasets stored across different platforms and refinements.
  • Some of the kafka topics are remnants of tests and misconfiguration/misnamings. Ideally we'd delete these in Kafka; otherwise there is an option to add them to an exclusion list.
  • Given that the prod datahub has the event streams' current Kafka metadata, can we delete and reimport all the Kafka metadata? If a fresh backup is not available it would be good to have one handy.
  • Is there a way to add ownership data to event schema json and import it from there? This would benefit Metrics Platform work and allow alerting the right parties about event publishing errors. Some discussion about adding this data already happened https://phabricator.wikimedia.org/T201063#4546544
  • What is the best way to ingest the metadata? Datahub transformer vs airflow vs TBD?
Aug 16 2023, 10:25 PM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform
odimitrijevic updated the task description for T318863: [Event Platform] Event Platform and DataHub Integration.
Aug 16 2023, 10:08 PM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform
odimitrijevic updated the task description for T318863: [Event Platform] Event Platform and DataHub Integration.
Aug 16 2023, 9:45 PM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform
odimitrijevic added a comment to T318863: [Event Platform] Event Platform and DataHub Integration.

@tchin as discussed today, that sounds like a good approach. Before deploying to production, let's wipe out the kafka metadata given that the original POC was imported under the kafka platform. I'll add these to the acceptance criteria.

Aug 16 2023, 9:43 PM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform
odimitrijevic closed T288254: Explore Containerization Solutions for DE Applications as Resolved.
Aug 16 2023, 4:24 PM · Data-Platform-SRE, Data Engineering and Event Platform Team, Spike, Data Pipelines, Data-Engineering
odimitrijevic added a comment to T288254: Explore Containerization Solutions for DE Applications.

The work related to this has been done as part of standing up the DSE K8s cluster. I will go ahead and close the ticket.

Aug 16 2023, 4:24 PM · Data-Platform-SRE, Data Engineering and Event Platform Team, Spike, Data Pipelines, Data-Engineering

Aug 14 2023

odimitrijevic added a comment to T330834: Check home/HDFS leftovers of echetty.

@BTullis These are good to be removed

Aug 14 2023, 7:01 PM · Data-Engineering

Aug 10 2023

odimitrijevic added a comment to T343988: Security Issue Access Request for odimitrijevic.

Done. Are there any recovery keys to be had in case I am not able to access my phone for whatever reason?

Aug 10 2023, 11:50 PM · Security-Team, Security

Aug 2 2023

odimitrijevic added a comment to T342797: Requesting access to analytics-privatedata-users for Maryana Pinchuk.

Approved.

Aug 2 2023, 10:04 PM · SRE, SRE-Access-Requests

Aug 1 2023

odimitrijevic added a comment to T318863: [Event Platform] Event Platform and DataHub Integration.

@Htriedman we are picking this work up again. Is the POC that you did available in a repository on gitlab?

Aug 1 2023, 5:39 PM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform
odimitrijevic added a comment to T342878: GeoIP2-Anonymous-IP Subscription expired.

Thank you @jbond!

Aug 1 2023, 5:15 AM · Data-Platform-SRE, Infrastructure-Foundations, Data-Engineering, Puppet-Infrastructure

Jul 28 2023

odimitrijevic added a comment to T343039: Requesting access to Wiki Replicas end-to-end tiers for dr0ptp4kt.

Approved

Jul 28 2023, 11:55 PM · SRE, SRE-Access-Requests

Jul 27 2023

odimitrijevic added a comment to T315024: Creating a Spark session causes a torrent of log spam.

@Mayakp.wiki @nshahquinn-wmf Is this still an issue?

Jul 27 2023, 7:33 PM · Data-Engineering, Product-Analytics
odimitrijevic added a project to T342878: GeoIP2-Anonymous-IP Subscription expired: Data-Platform-SRE.
Jul 27 2023, 3:21 PM · Data-Platform-SRE, Infrastructure-Foundations, Data-Engineering, Puppet-Infrastructure
odimitrijevic added a comment to T342878: GeoIP2-Anonymous-IP Subscription expired.

This dataset is no longer subscribed to. We should remove the database from the download list.

Jul 27 2023, 3:19 PM · Data-Platform-SRE, Infrastructure-Foundations, Data-Engineering, Puppet-Infrastructure

Jul 26 2023

odimitrijevic added a comment to T342535: Requesting access to analytics_privatedata_users, deployment_members for Mabualruz.

Approved

Jul 26 2023, 11:26 PM · SRE, SRE-Access-Requests

Jul 11 2023

odimitrijevic added a comment to T336357: Grant temporary access to web based Data Engineering tools to Bishop Fox.

@BTullis do the permissions need to be removed before closing the task?

Jul 11 2023, 9:45 PM · Shared-Data-Infrastructure (2022-23 Q4 Wrap up), Data-Platform-SRE, Data-Engineering, LDAP-Access-Requests
odimitrijevic closed T326598: Ingest feature Hive schema into datahub as Resolved.
Jul 11 2023, 9:29 PM · Data-Engineering, Data-Catalog
odimitrijevic closed T290203: Discussion of Event Driven Systems as Resolved.
Jul 11 2023, 9:28 PM · Data-Engineering, Data Engineering and Event Platform Team, Event-Platform
odimitrijevic moved T340861: Implement a backfill job for the dumps hourly table from Next Up to In progress on the Data Engineering and Event Platform Team (Sprint 0) board.
Jul 11 2023, 5:40 PM · Data Products (Sprint 01)

Jul 7 2023

odimitrijevic placed T333011: [Maintenance] Define Migration/Deprecation Plan for Hue up for grabs.
Jul 7 2023, 8:30 PM · Data-Engineering, Data Pipelines (Sprint 14)
odimitrijevic placed T336745: Split Cassandra Airflow dags by dataset up for grabs.
Jul 7 2023, 8:30 PM · Data Engineering and Event Platform Team (Sprint 0), Data Pipelines (Sprint 14)
odimitrijevic reassigned T340463: [Iceberg Migration] P.O.C. on Iceberg sensor using Iceberg table to keep status of updates from mforns to JAllemandou.
Jul 7 2023, 8:28 PM · Data-Engineering
odimitrijevic updated subscribers of T304889: Airflow CI/CD Documentation.

@Antoine does this still need to be implemented?

Jul 7 2023, 7:49 PM · Data Engineering and Event Platform Team, Documentation, Data Pipelines
odimitrijevic removed a parent task for T304889: Airflow CI/CD Documentation: T304409: [Airflow] Implement CI/CD pipelines for shared infrastructure..
Jul 7 2023, 7:48 PM · Data Engineering and Event Platform Team, Documentation, Data Pipelines
odimitrijevic removed a subtask for T304409: [Airflow] Implement CI/CD pipelines for shared infrastructure.: T304889: Airflow CI/CD Documentation.
Jul 7 2023, 7:48 PM · Epic, Data Engineering and Event Platform Team, Data Pipelines
odimitrijevic added a subtask for T295199: [Airflow] User manual and documentation: T304889: Airflow CI/CD Documentation.
Jul 7 2023, 7:48 PM · Data-Engineering-Kanban, Epic, Data Pipelines, Data-Engineering
odimitrijevic added a parent task for T304889: Airflow CI/CD Documentation: T295199: [Airflow] User manual and documentation.
Jul 7 2023, 7:48 PM · Data Engineering and Event Platform Team, Documentation, Data Pipelines

Jul 6 2023

odimitrijevic assigned T340059: Flink k8s operator in staging sometimes will not sync changes to FlinkDeployments to gmodena.
Jul 6 2023, 5:35 PM · serviceops-radar, Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B)
odimitrijevic moved T340059: Flink k8s operator in staging sometimes will not sync changes to FlinkDeployments from Next Up to Blocked/Paused on the Data Engineering and Event Platform Team (Sprint 0) board.
Jul 6 2023, 2:42 PM · serviceops-radar, Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B)
odimitrijevic moved T340746: mw-page-content-change-enrich should bump page_change schema from Next Up to In Review on the Data Engineering and Event Platform Team (Sprint 0) board.
Jul 6 2023, 2:41 PM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B)
odimitrijevic moved T341134: Investigate drift between `dt` and `meta.dt` from Next Up to In progress on the Data Engineering and Event Platform Team (Sprint 0) board.
Jul 6 2023, 2:26 PM · MW-1.41-notes (1.41.0-wmf.20; 2023-08-01), Data Products (Sprint 0)
odimitrijevic added a project to T341134: Investigate drift between `dt` and `meta.dt`: Data Engineering and Event Platform Team (Sprint 0).
Jul 6 2023, 2:25 PM · MW-1.41-notes (1.41.0-wmf.20; 2023-08-01), Data Products (Sprint 0)
odimitrijevic moved T338169: mw-page-content-change-enrich should partition by and process by wiki_id,page_id from In progress to In Review on the Data Engineering and Event Platform Team (Sprint 0) board.
Jul 6 2023, 2:18 PM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B)
odimitrijevic edited projects for T341229: ProduceCanaryEvents job should be scheduled by Airflow, added: Data Engineering and Event Platform Team (Sprint 0); removed Data Engineering and Event Platform Team.
Jul 6 2023, 2:00 PM · Data-Engineering
odimitrijevic added a project to T341229: ProduceCanaryEvents job should be scheduled by Airflow: Data Engineering and Event Platform Team.
Jul 6 2023, 1:59 PM · Data-Engineering
odimitrijevic edited projects for T336084: [SPIKE] Model impact of User-Agent deprecation on top line metrics, added: Data Products; removed Data Engineering and Event Platform Team.
Jul 6 2023, 1:41 PM · Data Products (Sprint 01), Data Pipelines (Sprint 14), Google-Chrome-User-Agent-Deprecation, Product-Analytics (Kanban), Data-Engineering
odimitrijevic edited projects for T340673: [Airflow Migration] Update Airflow Documentation, added: Data Engineering and Event Platform Team (Sprint 0); removed Data Engineering and Event Platform Team.
Jul 6 2023, 1:26 PM · Data-Engineering, Data Pipelines
odimitrijevic closed T282033: Airflow collaborations as Resolved.
Jul 6 2023, 1:25 PM · Data Engineering and Event Platform Team, Epic, Data Pipelines, Data-Engineering, Platform Team Workboards (Image Suggestion API)
odimitrijevic closed T271429: Replace Oozie with better workflow scheduler, a subtask of T282033: Airflow collaborations, as Resolved.
Jul 6 2023, 1:23 PM · Data Engineering and Event Platform Team, Epic, Data Pipelines, Data-Engineering, Platform Team Workboards (Image Suggestion API)
odimitrijevic closed T271429: Replace Oozie with better workflow scheduler as Resolved.

So gratifying to be able to be closing this task!

Jul 6 2023, 1:23 PM · Data Engineering and Event Platform Team, Data Pipelines, Data-Engineering, Epic, Product-Analytics
odimitrijevic closed T299074: Migrate Oozie jobs to Airflow, a subtask of T271429: Replace Oozie with better workflow scheduler, as Resolved.
Jul 6 2023, 1:23 PM · Data Engineering and Event Platform Team, Data Pipelines, Data-Engineering, Epic, Product-Analytics
odimitrijevic closed T299074: Migrate Oozie jobs to Airflow as Resolved.
Jul 6 2023, 1:22 PM · Data Engineering and Event Platform Team, Epic, Patch-For-Review, Data-Engineering, Data Pipelines
odimitrijevic added a project to T336084: [SPIKE] Model impact of User-Agent deprecation on top line metrics: Data Engineering and Event Platform Team.
Jul 6 2023, 1:21 PM · Data Products (Sprint 01), Data Pipelines (Sprint 14), Google-Chrome-User-Agent-Deprecation, Product-Analytics (Kanban), Data-Engineering

Jun 28 2023

odimitrijevic added a subtask for T336739: Post Oozie -> Airflow migration refactorings: T340673: [Airflow Migration] Update Airflow Documentation.
Jun 28 2023, 4:34 PM · Data-Engineering, Epic, Data Pipelines
odimitrijevic added a parent task for T340673: [Airflow Migration] Update Airflow Documentation: T336739: Post Oozie -> Airflow migration refactorings.
Jun 28 2023, 4:34 PM · Data-Engineering, Data Pipelines
odimitrijevic created T340673: [Airflow Migration] Update Airflow Documentation.
Jun 28 2023, 4:32 PM · Data-Engineering, Data Pipelines

May 19 2023

odimitrijevic updated subscribers of T318346: Add Python Linter Checks to CI.

@lbowmaker now that we are on the other side of the Oozie migration can we prioritize for the next sprint?

May 19 2023, 2:48 AM · Patch-For-Review, Data Pipelines (Sprint 14), Data-Engineering-Planning

Apr 26 2023

odimitrijevic added a comment to T335445: Add user xcollazo to archiva-deployers LDAP group.

Yes, confirming the above. I approve the request.

Apr 26 2023, 7:30 PM · LDAP-Access-Requests, SRE

Apr 14 2023

odimitrijevic closed T334750: Superset is showing Database Error when trying to query against presto_analytics_hive as Resolved.
Apr 14 2023, 3:55 PM · Data-Engineering
odimitrijevic added a comment to T334750: Superset is showing Database Error when trying to query against presto_analytics_hive.

It turns out that the sql_labs role was missing. Unfortunately the error message did not indicate a permissions issue.

Apr 14 2023, 3:55 PM · Data-Engineering
odimitrijevic added a comment to T334750: Superset is showing Database Error when trying to query against presto_analytics_hive.

@HShaikh Can you please post the SQL statement that you are running?

Apr 14 2023, 3:46 PM · Data-Engineering
odimitrijevic added a project to T332953: Migrate PipelineLib repos to GitLab: Shared-Data-Infrastructure.
Apr 14 2023, 3:16 PM · Wikimedia-Developer-Portal, cloud-services-team, Data Engineering and Event Platform Team, Data-Platform-SRE, API Platform, Patch-For-Review, [DEPRECATED] wdwb-tech, Wikidata, Security-Team, SRE, Wikidata-Campsite, Anti-Harassment, Wikispeech, Structured-Data-Backlog, Platform Engineering, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, Editing-team, Content-Transform-Team, Metrics Platform Backlog, Machine-Learning-Team, GitLab (Project Migration), Release-Engineering-Team (Priority Backlog 📥)

Apr 3 2023

odimitrijevic added a comment to T329310: Deprecate old mobile datasets.

Consider implementing together with https://phabricator.wikimedia.org/T329978

Apr 3 2023, 4:11 PM · Data Pipelines (Sprint 14), Data-Engineering-Planning
odimitrijevic added a comment to T329978: Delete empty tables unique_devices_*_wide_*.

Consider implementing together with https://phabricator.wikimedia.org/T329310

Apr 3 2023, 4:11 PM · Data Pipelines (Sprint 12), Data-Engineering-Planning
odimitrijevic added a comment to T328049: Investigate the effects of IP Masking on Data Eng systems.

Related conversation: https://phabricator.wikimedia.org/T332420

Apr 3 2023, 4:08 PM · Data Pipelines (Sprint 12)

Mar 21 2023

odimitrijevic added a comment to T331647: Grant Hal deployment rights.

Approved

Mar 21 2023, 3:10 PM · SRE, SRE-Access-Requests

Mar 16 2023

odimitrijevic created T332321: Cleanup HDFS folders for departed users.
Mar 16 2023, 3:25 PM · Data-Platform-SRE

Mar 15 2023

odimitrijevic updated subscribers of T310541: [NEEDS GROOMING] We should improve and automate python linting.

@xcollazo @Antoine_Quhen Does this apply to the airflow ci/cd that DE manages? Are there any improvements that we wish to adopt or expand?

Mar 15 2023, 3:47 PM · Data-Engineering, Data Pipelines
odimitrijevic added a comment to T325611: Add TikTok's in-app browser to ua-parser library.

Another ping to @Maryana and @MMiller_WMF . Do you have opinions on asking the TikTok team vs parsing the user-agent string as is?

Mar 15 2023, 3:40 PM · Data-Engineering, Data Pipelines, Product-Analytics

Mar 9 2023

odimitrijevic added a comment to T331125: Security Issue Access Request for nfraison.

Approved

Mar 9 2023, 9:09 PM · SecTeam-Processed, Security-Team, Security

Mar 3 2023

odimitrijevic created T331160: Assign Superset sql_labs access through customer roles.
Mar 3 2023, 5:54 PM · Data Pipelines, Data-Engineering-Planning
odimitrijevic added a comment to T327027: Massive spike in pageviews for a few enwiki pages beginning with "Index".

Is this the same issue reported in https://phabricator.wikimedia.org/T328127?

Mar 3 2023, 12:25 AM · Product-Analytics, Data Pipelines, Data-Engineering-Planning, Pageviews-Anomaly

Feb 17 2023

odimitrijevic added a comment to T327458: Document Traffic Datasets in Datahub.

unique_devices_project_wide_daily and unique_devices_project_wide_monthly have no data and have been marked as deprecated. Ticket to delete: https://phabricator.wikimedia.org/T329978

Feb 17 2023, 10:53 PM · Data Pipelines (Sprint 11), Data-Catalog
odimitrijevic added a comment to T329310: Deprecate old mobile datasets.

Consider doing this at the same time as https://phabricator.wikimedia.org/T329978

Feb 17 2023, 10:48 PM · Data Pipelines (Sprint 14), Data-Engineering-Planning
odimitrijevic moved T329978: Delete empty tables unique_devices_*_wide_* from Ops Week to Pipelines on the Data-Engineering-Planning board.
Feb 17 2023, 10:48 PM · Data Pipelines (Sprint 12), Data-Engineering-Planning
odimitrijevic created T329978: Delete empty tables unique_devices_*_wide_*.
Feb 17 2023, 9:14 PM · Data Pipelines (Sprint 12), Data-Engineering-Planning
odimitrijevic closed T178891: German derivative of Wikistats report shows marked difference for new editors in Aug vs Sep as Declined.
Feb 17 2023, 6:39 PM · Data-Engineering, Analytics-Radar, Data-Engineering-Wikistats

Feb 14 2023

odimitrijevic added a comment to T327458: Document Traffic Datasets in Datahub.

Currently only unique_devices_per_domain_monthly has the dataset description.
For unique devices, let's document all of:

  • unique_devices_per_domain_daily
  • unique_devices_per_domain_monthly
  • unique_devices_per_project_family_daily
  • unique_devices_per_project_family_monthly
  • unique_devices_project_wide_daily
  • unique_devices_project_wide_monthly
Feb 14 2023, 12:31 AM · Data Pipelines (Sprint 11), Data-Catalog

Feb 10 2023

odimitrijevic closed T37196: WikiStats should recognize global bots as Declined.
Feb 10 2023, 5:18 PM · Data-Engineering-Icebox

Feb 8 2023

odimitrijevic updated subscribers of T327458: Document Traffic Datasets in Datahub.
  • The [[ mediawiki API request entry | https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,event.mediawiki_api_request,PROD)/Schema?is_lineage_mode=false ]] is lacking information for the normalized host. Not sure why that's the case given that the rest of the structs are filled in. The other aspects look good. Who would be the best person to fill it in? Who should be assigned as the data owner?
  • Are there links to external documentation that can be added?
  • The field is_wmf_domain talks about how it is derived but not what the field means. Is it a boolean that indicates if the request came from a wmf domain vs externally e.g. bot or toolforge tool?
  • Who should be the owner of this dataset?
Feb 8 2023, 8:50 PM · Data Pipelines (Sprint 11), Data-Catalog